The One Billion Row Challenge with Zig

The 1brc challenge is finishing today; here is my experience with it.

What is 1brc?

The One Billion Row Challenge (1brc) is a challenge in which you must process a file of one billion rows as quickly as possible. You are given a CSV file of weather station temperature readings. As output, you need to calculate the min, mean, and max for each weather station, rounded to one fractional digit, and print the results sorted alphabetically by station name.

1brc primarily targets Java solutions, but why not implement it in other languages? It’s an excellent way to learn new tricks in the performance optimization space.

I’ve chosen Zig because it’s a modern general-purpose programming language that allows low-level control over the system. Zig has all the tools one may need to solve such tasks.

A baseline solution that reads the file line by line and processes each line takes ~2 min to complete. Let’s optimize it.
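The baseline can be sketched roughly like this, assuming Zig 0.11-era std APIs and the challenge’s `StationName;12.3` line format (the file name and the accumulator update are placeholders):

```zig
const std = @import("std");

// Minimal sketch of the line-by-line baseline: a buffered reader over
// the file, splitting each line on ';' and parsing the temperature.
pub fn main() !void {
    var file = try std.fs.cwd().openFile("measurements.txt", .{});
    defer file.close();

    var buffered = std.io.bufferedReader(file.reader());
    var reader = buffered.reader();

    var line_buf: [128]u8 = undefined;
    while (try reader.readUntilDelimiterOrEof(&line_buf, '\n')) |line| {
        // Each line looks like "StationName;12.3".
        const sep = std.mem.indexOfScalar(u8, line, ';') orelse continue;
        const name = line[0..sep];
        const temp = try std.fmt.parseFloat(f64, line[sep + 1 ..]);
        _ = name;
        _ = temp;
        // ...update per-station min/mean/max here...
    }
}
```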

First, we need efficient access to the file. Since the one-billion-row file is ~14 GB, loading it all into RAM is not a good idea. Instead, we can use mmap to create a memory mapping; with a memory mapping we also know the file bounds, which enables our next optimization: processing chunks in parallel.
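A sketch of the mapping step, assuming a POSIX target and the `std.posix.mmap` API of recent Zig versions (it lived under `std.os` in older releases):

```zig
const std = @import("std");

// Map the whole input file read-only; the OS pages it in on demand,
// so we never copy 14 GB into RAM up front.
pub fn main() !void {
    var file = try std.fs.cwd().openFile("measurements.txt", .{});
    defer file.close();

    const size = (try file.stat()).size;
    const data = try std.posix.mmap(
        null,
        size,
        std.posix.PROT.READ,
        .{ .TYPE = .PRIVATE },
        file.handle,
        0,
    );
    defer std.posix.munmap(data);

    // `data` is a []u8 over the entire file, so chunk boundaries can
    // be computed from its length without any extra I/O.
    std.debug.print("mapped {d} bytes\n", .{data.len});
}
```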

Let’s leverage modern multi-core CPUs to process chunks in parallel. First, we query the number of CPU cores; that gives us the number of workers, each of which processes one chunk of the file.
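A sketch of the fan-out, assuming `std.Thread.getCpuCount` and `std.Thread.spawn` from recent Zig; the `worker` body here is a placeholder (in practice chunk boundaries must also be aligned to newline characters):

```zig
const std = @import("std");

// Spawn one worker per CPU core, each handling one chunk of the file.
pub fn main() !void {
    const n_workers = try std.Thread.getCpuCount();

    const threads = try std.heap.page_allocator.alloc(std.Thread, n_workers);
    defer std.heap.page_allocator.free(threads);

    for (threads, 0..) |*t, i| {
        t.* = try std.Thread.spawn(.{}, worker, .{i});
    }
    for (threads) |t| t.join();
}

fn worker(chunk_index: usize) void {
    // Each worker parses its chunk into a private hash map,
    // so no locking is needed until the final merge.
    std.debug.print("worker {d} running\n", .{chunk_index});
}
```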

After all the chunks are processed, we merge the per-worker results into one final result.
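The merge is cheap because each station’s statistics combine pointwise. A hypothetical per-station accumulator could look like this (the `Stats` type and its fields are illustrative, not the author’s actual code):

```zig
const std = @import("std");

// Per-station accumulator: merging two partial results is just
// min/max over the extremes and a sum over counts.
const Stats = struct {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,

    fn merge(self: *Stats, other: Stats) void {
        self.min = @min(self.min, other.min);
        self.max = @max(self.max, other.max);
        self.sum += other.sum;
        self.count += other.count;
    }

    fn mean(self: Stats) f64 {
        return self.sum / @as(f64, @floatFromInt(self.count));
    }
};
```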

The output needs to be sorted alphabetically. To do that, I used a B-tree to keep the keys in sorted order.

These optimizations reduced the time from ~2 min to ~4 sec. However, we can do even better.

The most performant solutions used many clever tricks and observations to squeeze out even more speed.

Some of them are:

  • hash station names
  • use SIMD for parsing and comparing the strings
  • parse numbers as integers instead of floats
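The last trick exploits the input format: every temperature has exactly one fractional digit, so “-12.3” can be handled as the integer -123 in tenths, deferring all floating-point work to the final print. A sketch (`parseTenths` is an illustrative name, not from any solution):

```zig
const std = @import("std");

// Parse "-12.3" as the integer -123 (tenths of a degree),
// avoiding floating-point arithmetic in the hot loop.
fn parseTenths(s: []const u8) i32 {
    var i: usize = 0;
    var sign: i32 = 1;
    if (s[0] == '-') {
        sign = -1;
        i = 1;
    }
    var value: i32 = 0;
    while (i < s.len) : (i += 1) {
        if (s[i] == '.') continue; // skip the decimal point
        value = value * 10 + (s[i] - '0');
    }
    return sign * value;
}

test "parseTenths" {
    try std.testing.expectEqual(@as(i32, -123), parseTenths("-12.3"));
    try std.testing.expectEqual(@as(i32, 56), parseTenths("5.6"));
}
```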

Thanks, Gunnar Morling, for a great challenge!

For more info about the challenge, see

My code: