De-serializing Kafka Messages With a Union-Defined Field

I was recently writing a Kafka consumer application for a proof-of-concept (POC) project when I ran into some weird de-serialization behavior: whenever the consumer reached a certain field, it failed with an error, no matter what. This happened even though I had checked repeatedly that the Kafka producer was serializing and publishing the object to the topic correctly.

Initially, I thought the JSON library I was using was parsing the message incorrectly. However, both GSON and Jackson failed across several parsing attempts. That made me think the JSON libraries were not at fault, although I was not totally convinced, so I searched the Internet to see whether anybody had encountered something similar and how they had fixed it.

To give some context on the issue, the error said that the parser expected a boolean for that field but got an object instead:

java.lang.IllegalStateException: Expected a boolean but was BEGIN_OBJECT at line XX column YY path...

For further context, I defined that particular field (well, there were two fields like this in the Avro schema) with a union type. In Avro, every field must have a type, and a field can be declared as a union of several types when more than one is expected or allowed.

In my case I wanted to have a boolean field that can also have a null value.

 {
 	"name": "foobar",
 	"type": ["boolean", "null"]
 }

Yes, I know: a boolean with a null value? What was I thinking?

As quirky as this may sound, Java allows it if you use the Boolean wrapper class instead of the boolean primitive.

Running the consumer in debug mode, this is what the message looked like:

{
	"someField": "hello world",
	"dependencyField": true,
	"foobar": {
		"boolean": false
	}
}

But the expected format is the following (and this is how the Avro message is published to the Kafka topic before consumption):

{
	"someField": "hello world",
	"dependencyField": true,
	"foobar": false
}
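For what it is worth, the wrapped shape matches how the Avro specification describes the JSON encoding of unions: a null value is written as null, and any other value is written as an object with a single entry keyed by the type name. One possible consumer-side workaround is a small helper that accepts both shapes. The following is only a minimal sketch, assuming the message has already been parsed into generic Java maps (something both GSON and Jackson can produce). UnionUnwrapper and unwrapNullableBoolean are hypothetical names, not part of Avro or either JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Avro, GSON, or Jackson.
final class UnionUnwrapper {

    private UnionUnwrapper() {}

    /**
     * Returns the plain value of a ["boolean", "null"] field.
     * Accepts null, a bare Boolean, or a single-entry map such as {"boolean": false}.
     */
    static Boolean unwrapNullableBoolean(Object raw) {
        if (raw == null) {
            return null;
        }
        if (raw instanceof Boolean) {
            return (Boolean) raw;
        }
        if (raw instanceof Map) {
            Map<?, ?> wrapper = (Map<?, ?>) raw;
            Object inner = wrapper.get("boolean");
            if (wrapper.size() == 1 && inner instanceof Boolean) {
                return (Boolean) inner;
            }
        }
        throw new IllegalArgumentException("Not a nullable boolean: " + raw);
    }

    public static void main(String[] args) {
        // Rebuild the wrapped message from the debug output by hand.
        Map<String, Object> wrapper = new LinkedHashMap<>();
        wrapper.put("boolean", false);

        Map<String, Object> message = new LinkedHashMap<>();
        message.put("someField", "hello world");
        message.put("dependencyField", true);
        message.put("foobar", wrapper);

        Boolean foobar = unwrapNullableBoolean(message.get("foobar"));
        System.out.println("foobar = " + foobar); // prints "foobar = false"
    }
}
```

The helper throws on anything it does not recognize instead of guessing, so a schema change on the producer side would surface as an error rather than as a silently wrong value.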

The reason I wanted a nullable boolean is that the field, which I call foobar in this example, depends on another boolean field: if the latter is true, then and only then should foobar have a true or false value. Simply defaulting it to false might signal the wrong state to processors further down the line.
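The dependency rule can be sketched in a few lines of Java. This is only an illustration under my own assumptions: FoobarState and effectiveFoobar are hypothetical names, and only foobar and dependencyField come from the example message.

```java
import java.util.Optional;

// Hypothetical class for illustration; only the field names come from the example message.
final class FoobarState {

    private FoobarState() {}

    /** foobar carries meaning only when dependencyField is true. */
    static Optional<Boolean> effectiveFoobar(boolean dependencyField, Boolean foobar) {
        if (!dependencyField) {
            // Not applicable, whatever foobar happens to hold.
            return Optional.empty();
        }
        // A null foobar is also reported as empty here; a stricter version might reject it.
        return Optional.ofNullable(foobar);
    }

    public static void main(String[] args) {
        System.out.println(effectiveFoobar(true, false));  // prints "Optional[false]"
        System.out.println(effectiveFoobar(false, null));  // prints "Optional.empty"
    }
}
```

Returning an Optional keeps "not applicable" distinct from false, which is exactly the distinction a false default would erase.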

At some point I even toyed with the idea of converting the field to a String with three possible values: “YES”, “NO” and “N/A”. That would have been more logical, but I did not do it. I stuck with boolean.
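For comparison, the three-valued alternative could look roughly like the enum below on the Java side. It is a sketch of the road not taken, not something from the project: Applicability and its methods are hypothetical names, and the labels simply mirror the three String values mentioned above.

```java
// Hypothetical enum for illustration; the labels mirror the three String values above.
enum Applicability {
    YES("YES"),
    NO("NO"),
    NOT_APPLICABLE("N/A");

    private final String label;

    Applicability(String label) {
        this.label = label;
    }

    /** The String that would be written to the message. */
    String label() {
        return label;
    }

    /** Maps a nullable Boolean onto the three states. */
    static Applicability fromNullableBoolean(Boolean value) {
        if (value == null) {
            return NOT_APPLICABLE;
        }
        return value ? YES : NO;
    }

    /** Parses the String form back into the enum, rejecting unknown labels. */
    static Applicability fromLabel(String label) {
        for (Applicability candidate : values()) {
            if (candidate.label.equals(label)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("Unknown label: " + label);
    }
}
```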

Ultimately, I knew I had to change the schema to get past that error. I did not want to, but since this was still a POC project, I thought I could live with it. I took out the union type and defined the field as boolean only, and the issue went away.

This did not answer the question, though, of why a field whose type is a union of boolean and null is interpreted differently than expected.

UPDATE:

So I just found out that there is a JIRA issue about this: https://issues.apache.org/jira/browse/AVRO-1582

It looks like this issue has been around for some time and is still unresolved. According to its timestamp, it was reported in September 2014.
