0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

First step to fast protocol buffers: Varint

Last updated at Posted at 2023-12-24

Today's article marks my initial step on a journey to create a fast protocol buffer implementation − Introducing protobuf-varint.

This article assumes some basic knowledge of programming and JavaScript but should also be engaging for computer enthusiasts.

Aside from delving into the basics and internals of protobufs, you will also learn a few tricks on how to write faster JavaScript codecs.

Today's article is my first step on a journey to write a fast protocol buffer implementation.

:mortar_board:   Background

JSON, the most common format for sending data, is designed to outperform XML. XML was slow, challenging for humans to read, and introduced considerable overhead.

JSON proves excellent unless you're dealing with a lot of data. It doesn't stream well, and human readability becomes unnecessary when handling data beyond a human's lifetime reading capacity.

A company that extensively transfers data — Google — developed the protobuf transfer format for improved efficiency. Unlike JSON, it is a binary format that supports streaming and aims to minimize transferred data amount.

Google's JavaScript library, protobuf.js, is okay. However, my experience with hypercore revealed its relatively slow performance, while Mafintosh's protocol-buffers is notably faster and more lightweight.

Upon examining the internals, I discovered that protocol-buffers utilizes varintin the background. varint is a handy little library that I thought works well for most use cases, but I have encountered some issues with it... :scream:

:sunglasses:   Varint is cool

Before delving into the code, let's have a primer on var-ints. The protocol buffers specification has four integer data types: 32-bit and 64-bit integers. Both in signed (positive and negative) and unsigned variants (exclusively positive).

type bit byte min value max value
int32 32 4 -0x8000_0000 +0x7fff_ffff
uint32 32 4 +0x0000_0000 +0xffff_ffff
int64 64 8 -0x8000_0000_0000_0000 +0x7fff_ffff_ffff_ffff
uint64 64 8 +0x0000_0000_0000_0000 +0xffff_ffff_ffff_ffff

Actually, there there are indeed two signed encodings: the standard int32 and int64 encodings, along with "zig-zag" variants, sint32 and sint64.
A dedicated section about this topic is reserved for the end of this article. Until then, the main focus of the article revolves mostly around uint32 and uint64.

All var-int codecs employ the same intriguing trick. Instead of directly storing the 4 or 8 bytes, var-int encoding is clever. :bulb:

Out of the 8 bits in a byte, it utilizes only 7. The last bit, also known as the "most significant bit" (MSB), comes into play when additional bits are required.

For unsigned numbers ranging from 0~127 (0b00000000~0b01111111), a single byte suffices. Numbers starting from 128 (0b10000000) onwards use the MSB, necessitating at least two bytes. For example, encoding 128 is encoded as [0b10000000, 0b00000001].
The first byte comprises the initial 7 bits of 128 (all 0) along with the MSB, while the second byte contains the remaining bit needed for 128.

As the numbers grow, like 314_151_314 (0b00010010_10111001_10010001_10010010), additional bytes become essential for encoding. Let's examine this number, grouped by 7 and 8 bits.

                               4         3          2          1
                          ┏━━━━━━━━┓ ┏━━━━━━━━┓ ┏━━━━━━━━┓ ┏━━━━━━━━┓
  8 bit encoded:           00010010   10111001   10010001   10010010
                           ╭──╯│  │╭──╯  │   │╭──╯ │    │╭──╯│
  7 bit blocks:         0001   0010101   1100110   0100011   0010010
  varint encoded:    0000001  10010101  11100110  10100011  10010010
                    ┗━━┷━━━━┛┗┷━━━━━━━┛┗┷━━━━━━━┛┗┷━━━━━━━┛┗┷━━━━━━━┛
                        5        4         3         2         1

In the best-case scenario, this means that up to 75% of data can be omitted for the downside that in a worst case scenario 20% more are required. :heart_eyes:

The impact can be particularly significant, when dealing with sensor data that typically falls within a specific range but may have peaks and valleys.

However, since this isn't universally the case, protocol buffers require you to choose which data encoding fits the nature of the data you are working with.

Note: Varint is similar to UTF-8 encoding.

JavaScript Number Notation

There are various ways to represent numbers in JavaScript. The default assumes a decimal system: 326 is written as 326. In the example above, I utilized the 0x... notation for numbers in a hexadecimal system: 326 is represented as 0x146.

While working on this project, I found the binary notation particularly useful: 326 can be also written as 0b101000110.

Numeric separators prove quite handy as well. The underscore _ can be employed to enhance number readability, and you can place it wherever you prefer: 3_123_4172_93 is equivalent to 3123417293. I used it to group bytes in binary numbers for improved readability: 0b11111111_11111111 is can also be written like 0xFF_FF.

:turtle:  Slow JavaScript code

The varint library is not just faster than Google's library; it is also smaller and more versatile. Aside from functional issues (that I will address later), there are quite a few other things that seem like bad decisions for processing a lot of data.

varint/decode.js#L14~L18
do {
  if (counter >= l || shift > 49) {
    read.bytes = 0
    throw new RangeError('Could not decode varint')
  }
  // ..

For every byte that is read, the if (...) condition is executed. counter >= lchecks the length of an array. Before we read the contents of the array, it is unknown how many bytes we should be reading and if all the bytes are in the binary data chunk. This is a problem, particularly for streaming, where usually an arbitrary amount of bytes is received at a time, and we don't know how much more we have received.

for await (const chunk of connection) {
   // How many numbers are in the chunk?
}

To eliminate this check, I recalled that neither Uint8Array nor Array throws an error when you access data outside the bounds.

(new Uint8Array)[1] === undefined // No problem.

When you have, let's say, a 5MB buffer to decode: almost all numbers can be read without the check. Only at the very end of the buffer, reading could become an issue. To speed up the process, it would be better to assume everything is okay for the common case and conduct a check of the last read number.

const numbers = []
const chunk = //... your input buffer (uint8array)
let offset = 0
while (offset < buffer.byteLength) {
   numbers.push(decode(buffer, offset))
   off += decode.bytes
}
if (off > l) { // read over the end
    numbers.pop() // last number wasn't read correctly
    off -= decode.bytes
}

After all the data is read, we check whether the last number actually used more bytes than are in the chunk and in that case, remove the last number from the list.

So, it would be faster (and no problem) to simply not check counter > l of the buffer. Before we can remove the if entierely, we need to contend with the other check: shift > 49. This check is needed because without it, the loop might create unusuable numbers:

varint/decode.js#L14~L24
do {
    if (/* ... */ shift > 49) {
       // …
    }
    // …
    shift += 7
} while (b >= MSB)

The > 49 and b >= MSB conditions indicate that the loop will iterate either until there is a number without the MSB or if the number read contains more than 49 bits.

However, since each iteration uses shift += 7, it implies that the maximum number of loops is 5 for 32-bit numbers and 10 for 64-bit numbers.

We can skip the entire if (...) block by copying the code 5 times! After all, the encoded number cannot be higher than 32-bit anyway.

b = buf[counter++]
res += shift < 28
  ? (b & REST) << shift
  : (b & REST) * Math.pow(2, shift)
shift += 7
// ---
b = buf[counter++]
res += shift < 28
  ? (b & REST) << shift
  : (b & REST) * Math.pow(2, shift)
shift += 7
// ... 3 more times.

There are several advantages to simply copying the code 5 times. For one, we know the values of counter and shift: 0~4 and 0, 7, 14, 21, 28. So we can fill these in:

b = buf[counter]
if (b < MSB) {
  return b
}
let res = b & REST
b = buf[counter + 1]
if (b < MSB) {
  return res + (b << 7)
}
res += (b & REST) << 7
b = buf[counter + 2]
// ... etc.

This code avoids quite a few code branchs (if/for/while) and also inlines several operations that the computer doesnt need to perform anymore.

Here is the complete uint32 decode function I came up with:
martinheidegger/protobuf-varint/uint32#L17-L44

:eyes:   JavaScript's missing 64-bit number

You may have noticed earlier that the varint library only supports up to 49 bits. This limitation exists because number operations in JavaScript can involve both floating-point numbers and integers. That's why we have the Number.MAX_SAFE_INTEGER constant, which is 0x1f_ffff_ffff_ffff (a 53-bit number).

Using the varint library for 64-bit de-/encoding is a bad idea. Don't use it!

If your protocol buffer implementation returns regular numbers for 64-bit types, that likely indicates it is not compatible!

There is an easy way to work around this: use two numbers.
low for the first 32 bits and high for the second (higher) 32 bits.

This is a very common solution. Google's protobuf.js implements the LongBits class that has a lo and hi property. The much more popular library long also implements 64-bit numbers.

However, my recommendation to work with long numbers is my own, long-compatible,
library: → longfn

The advantage of longfn is that you don't need any custom classes and can simply use regular javascript objects. Not only is this faster, it also comes with fewer dependencies. :trophy:

import { util } from 'protobufjs'
import Long from 'long'

let int64;
int64 = new util.LongBits(low, high) // slow
int64 = new Long(0, 0) // ... also slow
int64 = { low: 0, high: 0 } // Fast and no dependencies

Without any dependency, protobuf-varint will return a simple object.

import { decode } from 'protobuf-varint/int64'
import { add, fromInt } from 'longfn'

const three = fromInt(3)
const long = decode([0])
// long === { low: 0, high: 0, unsigned: false }
add(long, three, long) // long + 3

This way, you have a long number object that you can use both for manual operations and together with any library that you like.

About BigInt
Some time ago, BigInt numbers were introduced to JavaScript engines.

When I wrote the code, I contemplated whether to use BigInt or longfn objects. Two factors were crucial in my decision to not choose BigInt. Firstly, BigInt is a bit older, and some of the older packing libraries don't support its syntax. Secondly, and more importantly, BigInt numbers - by definition - need bound checks and other operations that are make them slower than two regular number operations.

longfn comes with a useful fromBigInt and toBigInt helper in case you prefer that.

:package:   Bring your own memory (BYOM)

The naive way to encode binary data is to always return a new Uint8Array of the needed size.

function myDummyEncoder (input: number): Uint8Array {
   const buf = new Uint8Array(2)
   buf[0] = input & 0xff
   buf[1] = (input >> 8) & 0xff
   return buf
}

const asByte = myDummyEncoder(123)

The problem with this approach is that, typically, when you encode data, you need more than one value. In typical protocol buffers, there are values of mixed type. What you would then need to do is:

const result = new Uint8Array([
  ...myDummyEncoder(123),
  ...myDummyEncoder(77)
])

Every time the function is called, memory gets allocated, is used for a bit, and then garbage collected. This is terribly slow.

If you use any kind of binary encoding you want to provide your own Memory (Uint8Array) to encode the value into. It should also provide an offset from where to start writing.

function myEncoder (input: number, target: Uint8Array, offset: number): Uint8Array {
   target[offset] = input & 0xff
   target[offset + 1] = (input >> 8) & 0xff
   return target
}

const asByte = new Uint8Array(2)
myEncoder(123, asByte, 0)

For binary encoding this is long common. The term BYOB (bring your own buffer) is often used for that. I decided to also use that technique for longfn and for decoding. You can simply pass the target value to the decoder!

import { decode } from 'protobuf-varint/int64'

const long = { low: 0, high: 0, unsigned: false }
decode([123], 0, long) // Will write directly into the long value

This way we can avoid garbage collection and are very fast both when encoding and decoding. :race_car:

:dancers::dancer:   Execute once not twice

By using the previous BYOM technique we run into the issue that we need to know before we encode the value if enough memory is allocated.

import { encode } from 'protobuf-varint/uint32'

encode(
  41415256,
  new Uint8Array(/* How many bytes do we need? */),
)

The common way for codecs is to implement a encodeLength function for a given type to evaluate the size before writing it.

import { encode, encodeLength } from 'protobuf-varint/uint32'

const num = 41415256
encode(
    num,
    new Uint8Array(encodeLength(num))
)

The problem with this approach is that the entire logic of encodeLength needs to be executed again when calling encode.

Through a change of an API I was able to improve the performance by yet another bit for encoding:

import { fixedEncoder } from 'protobuf-varint/uint32'

const num = 41415256
const encoder = fixedEncoder(num)
encoder.encode(
  num,
  new Uint8Array(encoder.bytes)
)

This way the fixedEncoder function will figure out which of each encoder is to be used for the encode call.

Here is how the 1 byte encoder looks like that you get with fixedEncoder(0):

protobuf-varint/lib/uint32.mjs#L64-L70
{
  bytes: 1,
  encode(int, out = [], offset = 0) {
    out[offset] = int
    return out
  }
},

This is as fast as it gets in JavaScript. No branches, no operations. In my first tests this approach is so fast that it is quicker than a lookup for values. :rocket:

:japanese_goblin: Treacherous Code Reuse

Until now I havn't gone into detail in the difference between uint32 (unsigned) encoding and its signed variants int32 and sint32. Before we start, an important note:

sint32 is much better than int32
int32 encodes negative numbers with a bit in the last byte.

+1 → [0b0_0000001]
-1 → [0b1_1111111, 0b1_1111111, 0b1_1111111, 0b1_1111111, 0b0_0001111]

This means that while positive numbers can start with few bytes and become more: every negative numbers will use all 5 bytes.

sint32 uses zig-zag encoding. It will flip the first bit if the number is negative and then will start the encoding from the second bit. This means both positive and negative numbers will grow similarly.

  starts here ↘︎   ↙︎ 1 if negative, 0 if positive
    MSB↘       ↓ ↓
+1 → [0b0_000001_0]
-1 → [0b0_000000_1]
+5 → [0b0_000100_0]
-5 → [0b0_000100_1]

Practically, this means you should almost never use int32 or int64 and instead you should use sint32 or sint64.

Putting that big notice out of the way. In many implementations you might notice that the zig-zag encoding is implemented like in the google documentation:

function encode32(num) {
  return (num << 1) ^ (num >> 31)
}

The problem with this code is that it doesn't take JavaScript number limitations into account.

encode32(+1) === 2
encode32(-1) === 1
encode32(+5) === 10
encode32(-5) === 9
// ↑ Looks good

// ↓ Oops
encode32(-0x80000000) === -1

To avoid this issue and still be fast in the de-/encoding process, I implemented the variants separately to make sure that all edges work well!

:sweat_smile:   Wrapping it up

I have a fondness for protocol buffers and the insights I've gained so far while working on the protobuf-varint implementation. Currently, it's not useful for my work, and chances are it'll remain in its initial state for a while. However, if you find value in this and share the enthusiasm for creating a fast, next-level protobuf implementation, I would be thrilled to collaborate with you on it. :hugging:

2023 has been quite a year for me, filled with numerous events. OWDDM has become more active, and I was grateful to rejoin JSConf.jp as staff for the first offline event in years. Having danced on the edge of burnout for a while, I wasn't able to wrap up this article before Christmas as well as I'd have liked. Still, I'm happy to have managed this, but in 2024, I need to work on my energy planning.

Thank you for reading, and I hope you found it enjoyable and I wish you all the best in the next year!

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?