Define Zig Enums from Data

I am currently working on a small Zig library which I am not quite ready to talk about yet. However, it involves parsing a big table of data and then doing operations based on that data.

For the remainder of this article, I shall pretend the library is about wind directions.

Note that this article uses Zig 0.14.1. Big changes have landed on the master branch since that release, though the parts discussed here are probably still the same.

Library Structure Outline

There is a data file called WindData.txt. It looks something like this.1

00;Mistral;N
01;Bora;N
02;Sirocco;S
03;Khamaseen;S
04;Chinook;W
05;Zonda;W
06;Föhn;S
07;Santa Ana;E
08;Harmattan;E
09;Leste;E
0A;Pampero;S
0B;Nor'easter;N
0C;Southerly Buster;S
0D;Berg Wind;E
0E;Vendavales;S

This text file is parsed by a generate.zig file that organizes winds by their direction. The data is put into a wind_data.zon file, which the build system adds as an import, so the code base can simply import it and run calculations on it.

The enum that describes wind directions is duplicated between the main code base and generate.zig.

const Direction = enum { N, E, S, W };

Problem Statement

What if a new version of the data adds a new wind direction: U (for Upwards)? Now the code fails! It needs a way to create the type out of the data.

A rational person would just add the new direction manually, because the algorithms depending on it are going to need to change anyway.

The normal solution (in the widely used Rust crate wind-direction, for example) is good old-fashioned code generation: a generate.py script, run manually every once in a while by the maintainers, generates a tables.rs file that includes the above enum.

But this is Zig. Things are not done with code generation. Things are done with comptime and build systems!


generate.zig

I will spare you the boring mechanical parts of generate.zig that read a file, iterate over each line, and collect the data into some sort of hashmap. The interesting bit I want to talk about is generating type-definition.zon.

During file reading, all the direction strings are collected into a set to make sure there are no repeats. This is simple enough code.

var set: std.StringArrayHashMapUnmanaged(void) = .empty;

// later in the loop,
try set.put(gpa, wind_direction_string, {});
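For the curious, here is a minimal sketch of how that collection step could look end to end. It is not the real generate.zig: the sample text, the collectDirections helper, and the choice to skip malformed lines are all assumptions made for illustration. It only relies on the set keeping insertion order and ignoring repeated keys.

```zig
const std = @import("std");

// Hypothetical sample in the WindData.txt layout: code;name;direction
const sample =
    \\00;Mistral;N
    \\02;Sirocco;S
    \\04;Chinook;W
    \\07;Santa Ana;E
    \\0B;Nor'easter;N
;

/// Collects the third field of every line into an insertion-ordered set.
/// Keys are slices into `text`, so `text` must outlive the returned set.
/// (Hypothetical helper, not part of the library.)
fn collectDirections(
    gpa: std.mem.Allocator,
    text: []const u8,
) !std.StringArrayHashMapUnmanaged(void) {
    var set: std.StringArrayHashMapUnmanaged(void) = .empty;
    errdefer set.deinit(gpa);

    var lines = std.mem.tokenizeScalar(u8, text, '\n');
    while (lines.next()) |line| {
        var parts = std.mem.splitScalar(u8, line, ';');
        _ = parts.next() orelse continue; // code
        _ = parts.next() orelse continue; // name
        const direction = parts.next() orelse continue; // skip short lines
        try set.put(gpa, direction, {}); // a repeated key is simply overwritten
    }
    return set;
}

test "repeated directions collapse into one key" {
    const gpa = std.testing.allocator;
    var set = try collectDirections(gpa, sample);
    defer set.deinit(gpa);

    try std.testing.expectEqual(4, set.count());
    try std.testing.expectEqualStrings("N", set.keys()[0]);
}
```

Since the keys come back in first-seen order, the order of the generated enum fields follows the order in which directions first appear in the data file.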

After the set is collected, it needs to be serialized into ZON (Zig Object Notation, Zig's own JSON-like data format). This is as simple as outputting text to a file, but the Zig standard library provides some helpful functions.2 This is the easiest way I found to serialize the strings to what Zig calls enum_literals. (I will get back to that.)

const out = std.io.getStdOut().writer(); // <-- replace with output file

// after finishing up the set.
const directions = set.keys();

var sz = std.zon.stringify.serializer(out, .{});
var container = try sz.beginTuple(.{});
for (directions, 0..) |direction, idx| {
    if (idx != 0) try sz.writer.writeByte(','); // a bit hacky but w/e
    try sz.ident(direction);
}
try container.end();
// maybe should print a new line here too?

And voila! Here is the happy data.zon. Note that this is written to an output file, not stdout like the code snippet above.

.{.N,.E,.S,.W}

build.zig

This is the least interesting part. It is just plumbing. Here is the code. It would get invoked whenever zig build commands are called, and cached accordingly, hopefully?

// calling the generate script
const generate_wind_data = b.addExecutable(.{
    .name = "wind_dir_data",
    .root_source_file = b.path("tools/generate.zig"),
    .target = b.graph.host,
});
const generate_step = b.addRunArtifact(generate_wind_data);
generate_step.addFileArg(b.path("WindData.txt"));
const wind_zon = generate_step.addOutputFileArg("data.zon");

// Main library.
const lib_mod = b.createModule(.{
    .root_source_file = b.path("src/lib.zig"),
    .target = target,
    .optimize = optimize,
});

// piping the two
lib_mod.addAnonymousImport(
    "wind_tables",
    .{ .root_source_file = wind_zon },
);

And that's it. Fairly simple code if you're familiar with the Zig build system.

The Library

Here is the cool stuff, if I may. Since this is for defining a normal, non-generic type, we shall start it thus. All the code in the following snippets goes inside this block.

const WindData = WD: {
    // All following code is in here.
};

First, we need to @import the data. That's why it is in a zon file to begin with! But import as what type? In 0.14.1, a zon import still needs an explicit result type (a restriction I understand is lifted in master).

Zig has a special comptime-only type called enum_literal. Oddly enough, unlike comptime_int and comptime_float, you cannot just do this:

// does not compile
const foo: enum_literal = .bar;

The type can actually be written out as @TypeOf(.enum_literal). Sure. A const slice of those would be []const @TypeOf(.enum_literal). Note that you need the const here for deserialization, as you cannot deserialize into a mutable slice.

const data: []const @TypeOf(.enum_literal) = @import("test.zon");
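To make the enum_literal business a bit more concrete, here is a small sketch that can be run with zig test. The EnumLiteral alias and the inlined list are my own stand-ins (assumptions, not code from the library); the list replaces the imported file so the snippet is self-contained.

```zig
const std = @import("std");

// There is no `enum_literal` keyword; the type is only reachable through
// @TypeOf applied to some literal. The literal's name does not matter.
const EnumLiteral = @TypeOf(.any_name_works);

test "enum literals are comptime values that carry a name" {
    const foo: EnumLiteral = .bar;
    try std.testing.expectEqualStrings("bar", @tagName(foo));
    comptime std.debug.assert(@typeInfo(EnumLiteral) == .enum_literal);

    // Hypothetical inlined list with the same declared type as the import.
    const data: []const EnumLiteral = &.{ .N, .E, .S, .W };
    try std.testing.expectEqual(4, data.len);
}
```

The @tagName call is the important part: it is what turns each literal back into the string an enum field needs.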

Then we need to iterate over the literals and convert them, one by one, to another type. This time it is std.builtin.Type.EnumField, an array of which will become our final enum. We create a BoundedArray at home with just an index and an iterator. This code was pretty much lifted from Mitchell Hashimoto's 'Tagged Union Subsets with Comptime in Zig' article, which is actually the inspiration for this post.3

var i: usize = 0;
var fields: [data.len]std.builtin.Type.EnumField = undefined;

// outer: // unneeded block label. see below
for (data) |literal| {
    const name = @tagName(literal);

    // // Deduplication code. It is not needed because the file is
    // // generated from a hash set.
    // for (fields[0..i]) |f|
    //     if (std.mem.eql(u8, f.name, name))
    //         continue :outer;

    fields[i] = .{
        .name = name,
        .value = i,
    };
    i += 1;
}

Now we simply create the enum with the helpful builtin @Type.

break :WD @Type(.{ .@"enum" = .{
    .tag_type = usize,
    .fields = fields[0..i],
    .decls = &.{},
    .is_exhaustive = true, // maybe?
} });

Fairly simple. One small trick remains. tag_type is not an optional value, so we need to specify it. It is the integer type that backs the enum's values. usize is playing it safe, because the field count is extremely unlikely to exceed whatever the largest usize is. But that is a very big tag for something that can potentially fit into one byte. So it is incumbent upon us to create the smallest integer possible.

const backing_int: type = @Type(.{ .int = .{
    .bits = // what to put here??
    .signedness = .unsigned,
} });

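To calculate a number of bits that can hold our field count, two simple (for computers at least) math operations are needed. First, i + 1 (our counter plus one) is rounded up to the nearest power of two. Then the log2 of that is taken. Boom. Two std.math function calls and that is IT. One caveat: because of the + 1, a field count that is itself a power of two gets one spare bit, so four fields end up in a u3 even though their values 0 through 3 would fit in a u2.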

const next_p_of_2 = std.math.ceilPowerOfTwoAssert(usize, i + 1);
const bits = std.math.log2_int(usize, next_p_of_2);
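A quick way to convince yourself of the arithmetic is to wrap the two calls in a function and try a few counts. The articleBits helper below is hypothetical and exists only for this check. The last two lines use std.math.IntFittingRange, which is an alternative I am naming here rather than what the snippet above does; it returns the tightest integer type for a value range directly.

```zig
const std = @import("std");

/// The two-step calculation from above, wrapped in a hypothetical helper.
fn articleBits(count: usize) usize {
    const next_p_of_2 = std.math.ceilPowerOfTwoAssert(usize, count + 1);
    return std.math.log2_int(usize, next_p_of_2);
}

test "bit widths for a few field counts" {
    try std.testing.expectEqual(2, articleBits(3)); // 3 + 1 = 4 -> 2 bits
    try std.testing.expectEqual(3, articleBits(4)); // 4 + 1 = 5 -> 8 -> 3 bits
    try std.testing.expectEqual(3, articleBits(7)); // 7 + 1 = 8 -> 3 bits

    // Alternative: the tightest unsigned type for the values 0..count-1.
    try std.testing.expect(std.math.IntFittingRange(0, 3) == u2);
    try std.testing.expect(std.math.IntFittingRange(0, 4) == u3);
}
```

Either approach yields a valid tag type. The difference only shows up when the field count is exactly a power of two.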

And this does it. Here is the full code.

const WindData = WD: {
    const data: []const @TypeOf(.enum_literal) = @import("test.zon");

    var i: usize = 0;
    var fields: [data.len]std.builtin.Type.EnumField = undefined;

    for (data) |literal| {
        const name = @tagName(literal);

        fields[i] = .{
            .name = name,
            .value = i,
        };
        i += 1;
    }

    const next_p_of_2 = std.math.ceilPowerOfTwoAssert(usize, i + 1);
    const bits = std.math.log2_int(usize, next_p_of_2);

    break :WD @Type(.{ .@"enum" = .{
        .tag_type = @Type(.{ .int = .{
            .bits = bits,
            .signedness = .unsigned,
        } }),
        .fields = fields[0..i],
        .decls = &.{},
        .is_exhaustive = true,
    } });
};

Your enum is ready. Now your data definition is resilient to new types added to the database. You still need to fix all the switch statements, tho.
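Here is a condensed, self-contained sketch of the whole thing in use, assuming the literal list is inlined instead of imported so it can be run with zig test on its own. The opposite function is a hypothetical consumer, there only to show what "fixing the switch statements" means in practice.

```zig
const std = @import("std");

const WindData = WD: {
    // Assumption: inlined stand-in for the imported ZON file.
    const data: []const @TypeOf(.enum_literal) = &.{ .N, .E, .S, .W };

    var fields: [data.len]std.builtin.Type.EnumField = undefined;
    for (data, 0..) |literal, i| {
        fields[i] = .{ .name = @tagName(literal), .value = i };
    }

    const next_p_of_2 = std.math.ceilPowerOfTwoAssert(usize, data.len + 1);
    const bits = std.math.log2_int(usize, next_p_of_2);

    break :WD @Type(.{ .@"enum" = .{
        .tag_type = @Type(.{ .int = .{
            .bits = bits,
            .signedness = .unsigned,
        } }),
        .fields = &fields,
        .decls = &.{},
        .is_exhaustive = true,
    } });
};

/// Hypothetical consumer: an exhaustive switch over the generated enum.
fn opposite(d: WindData) WindData {
    return switch (d) {
        .N => .S,
        .E => .W,
        .S => .N,
        .W => .E,
        // A direction added to the data without a matching prong here
        // becomes a compile error, not a silent gap.
    };
}

test "generated enum behaves like a hand-written one" {
    try std.testing.expectEqual(WindData.S, opposite(.N));
    try std.testing.expectEqualStrings("W", @tagName(WindData.W));
    try std.testing.expect(@typeInfo(WindData).@"enum".tag_type == u3);
}
```

Because the enum is exhaustive, the compiler is the one that tells you which switches need a new prong when the data grows.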


Conclusion

In the actual library I just put the enum definition into its own separate file that is imported by both generate.zig and the library. But hey, this was a fun exercise.

Until later.