
How to convert `org.apache.hadoop.io.Text` to byte array?



This class has a "sweet" method named getBytes. It looks like we can use it to convert the text directly to a byte array.

If you had the same idea, don't do it! It took me almost an hour to track down this stupid bug.

According to the official documentation:


Returns the raw bytes; however, only data up to getLength() is valid

So the array you get back can be longer than the text itself, and everything past getLength() may be invalid data!

I think one solution is through Bytes.toByte
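
Another option that needs nothing beyond the JDK is to copy only the valid prefix of the raw array. The sketch below does not use Hadoop at all: it simulates a reused backing buffer with a plain byte array, and the class name ValidBytesDemo and the helper validBytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ValidBytesDemo {
    // Hypothetical helper: copy only the first `length` bytes of a
    // backing array that may be larger than the data it holds.
    static byte[] validBytes(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    public static void main(String[] args) {
        // Simulate a reused buffer: it first held "hello world" ...
        byte[] backing = "hello world".getBytes(StandardCharsets.UTF_8);
        // ... and was then overwritten in place with the shorter value "hi".
        byte[] shorter = "hi".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(shorter, 0, backing, 0, shorter.length);
        int length = shorter.length;

        // The raw array still carries stale bytes after the valid prefix.
        System.out.println(new String(backing, StandardCharsets.UTF_8)); // hillo world
        // Trimming to the valid length recovers the intended value.
        System.out.println(new String(validBytes(backing, length), StandardCharsets.UTF_8)); // hi
    }
}
```

With a real Text object, the equivalent call would be Arrays.copyOf(text.getBytes(), text.getLength()). Newer Hadoop releases also offer Text.copyBytes(), which returns a copy of exactly the valid length; check that your version has it before relying on it.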



Who, or what kind of application, would use the getBytes method? I have absolutely no idea.

What can we learn from this?

As a library developer,

  • Don't expect everyone to read the whole documentation, especially when your library contains tens of thousands of methods.

  • Never expose dangerous methods like getBytes. Or at least name them differently, e.g. getBytesUnsafe, to get the attention of library users.
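
To make the naming advice concrete, here is a rough sketch of what such an API could look like. GrowableBytes is a hypothetical class written for this post, not Hadoop's Text; it only imitates the "reuse the backing array" behaviour that causes the problem.

```java
import java.util.Arrays;

// Hypothetical buffer class, for illustration only.
public class GrowableBytes {
    private byte[] bytes = new byte[0];
    private int length = 0;

    // Reuse the backing array when it is large enough; otherwise replace it.
    public void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    // Number of valid bytes at the start of the backing array.
    public int getLength() {
        return length;
    }

    // Zero-copy access to the backing array. It may be longer than
    // getLength(), and the name is meant to make callers stop and check.
    public byte[] getBytesUnsafe() {
        return bytes;
    }

    // Safe default: a copy containing exactly the valid bytes.
    public byte[] copyBytes() {
        return Arrays.copyOf(bytes, length);
    }
}
```

A caller who just wants the data reaches for copyBytes, while the raw accessor stays available, under a louder name, for code that presumably wants to avoid the extra copy.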

As a programmer,

  • Be familiar with every method you call.