
Writing a C Compiler, Chapter 2, in Zig

So, with the first chapter of Writing a C Compiler behind us, it is time to start on the second chapter: Unary Operators. But first, a word from build.zig land.

Running the tests with zig build test

The tests for this Book live in their own repository, which I have dutifully cloned and run tests in.

The process I followed to run the tests is fairly manual:

  1. navigate to the test directory.
  2. run the command arch -x86_64 zsh (which I have cleverly aliased to x86). This runs an x86 version of zsh.
  3. run the tests.

When I was doing my Rust implementation, I kept a terminal accessible with a global hotkey and I kept the inner shell running at all times.

With build.zig, I looked for a way to do that automatically. The Build object, which is central to build.zig, has a nice addSystemCommand method, which I could use to run my stuff in. That left the problem of how to run the tests in the x86 shell.

My first thought was to have the command as arch -x86_64 test_compiler ..etc (test_compiler being the test script provided). While it worked, as in the tests for chapter 1 all ran fine, upon further reading I realized that arch actually chooses which binary to run from a macOS Universal Binary. It has no effect on Python scripts, and the simple tests for chapter 1 worked fine either way. So I asked around, and some kind soul pointed me to the right thing to search for: subshells.

See, this might seem obvious if you've lived in Unix land all your life, but shells are applications too. You can pass command line arguments to them!! So bash -c "echo foo" runs bash, runs the command echo foo in bash, then exits. Perfect.
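To see the mechanics concretely, here is a small demonstration (plain bash; zsh's -c behaves the same way, and the names here are made up):

```shell
# -c runs the quoted string as the command; the next word becomes $0,
# and anything after that becomes the positional parameters $1, $2, ...
bash -c 'echo "shell: $0, arg: $1"' subshell hello
# prints: shell: subshell, arg: hello
```

This is exactly what the build step leans on: the whole test invocation gets joined into one string and handed to the subshell as its command.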

Putting these two together, this was the added bit to build.zig:

const test_step = b.step("test", "Run tests");

// subshells. how do they work.
const inner_command = try std.mem.join(b.allocator, " ", &.{
    "../writing-a-c-compiler-tests/test_compiler",
    b.pathJoin(&.{ b.exe_dir, "paella" }),
    try std.mem.join(
        b.allocator,
        " ",
        b.args orelse &.{""},
    ),
});

// does this work like i think it does?
const test_command = b.addSystemCommand(
    &.{ "arch", "-x86_64", "zsh", "-c", inner_command },
);

test_command.step.dependOn(b.getInstallStep());
test_step.dependOn(&test_command.step);

I am unreasonably happy with this. Using regular Zig standard library tools like std.mem.join allowed me to join together the commands I am passing to the zsh subshell, and voila!!

Now the tests (say for chapter 1) can be run "simply" with zig build test -- --chapter 1. Great success!


Lexer

There are three tokens to add in this chapter: ~, -, and --. The decrement operator is only being tokenized here to reject illegal C syntax like return --2, but otherwise will not be implemented.

So our Tag enum grows1 with two new tags: tilde and hyphen. As I am not planning to implement the book's extra credit this time around, I will just have the double hyphen lex into the invalid token. This would reject lexing valid code like x--, but I do not think that is going to be present in the test cases. We will see.

The State enum is a bit more interesting. The lexer cannot accept two consecutive hyphens, so seeing a hyphen puts it into the hyphen state, where if it sees another hyphen it puts out Tag.invalid, and Tag.hyphen otherwise.

So, if the state is at .start, and the lexer encounters a '-' character, it just switches gear to the .hyphen state. Labelled switch in action. Then in there, triage happens. The code is honestly copied, again, from the Zig compiler, with adjustments.

state: switch (State.start) {
    .start => switch (self.buffer[self.index]) {
        '~' => {
            result.tag = .tilde;
            self.index += 1;
        },
        '-' => continue :state .hyphen,
        // etc
    },
    .hyphen => {
        self.index += 1;
        switch (self.buffer[self.index]) {
            '-' => result.tag = .invalid,
            else => result.tag = .hyphen,
        }
    },
    // etc
}

This should be it. Time to put the new testing command into action.

Ran 43 tests in 5.432s

OK

Both of them work. Time for AST and parsing.

Parsing

Changes to the AST are fairly minimal. There are two unary operations: negation and complement. Instead of adding a separate UnaryOp enum as the Book suggests, I am going to embed them right into Expr as separate tags, updating the pretty printer while I am at it.2

pub const Expr = union(enum) {
    constant: u64,
    unop_negate: *Expr,
    unop_complement: *Expr,

    pub fn format(
        self: @This(),
        comptime _: []const u8,
        _: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        switch (self) {
            .constant => |c| try writer.print("{d}", .{c}),
            .unop_negate => |e| try writer.print(" -{}", .{e}),
            .unop_complement => |e| try writer.print(" ~{}", .{e}),
        }
    }
};

Updating the parsing functions is a bit more involved, however. Now is the start of PEMDAS, and time to start our recursive descent into madness.3 This is pretty much the same function provided in the book, except that the operation is chosen immediately.

fn parse_expr(
    alloc: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) !*ast.Expr {
    const current = tokens.next() orelse
        return error.ExpectExpr;

    switch (current.tag) {
        .number_literal => {
            const lit = tokens.buffer[current.loc.start..current.loc.end];
            const res = try std.fmt.parseInt(u64, lit, 10);

            //         vvv this helper function just allocates and sets
            return try create(ast.Expr, alloc, .{ .constant = res });
        },
        .hyphen => {
            const inner_exp = try parse_expr(alloc, tokens);
            return try create(ast.Expr, alloc, .{ .unop_negate = inner_exp });
        },
        .tilde => {
            const inner_exp = try parse_expr(alloc, tokens);
            return try create(ast.Expr, alloc, .{ .unop_complement = inner_exp });
        },
        .l_paren => {
            const inner_exp = try parse_expr(alloc, tokens);
            try expect(.r_paren, tokens);

            return inner_exp;
        },
        else => return error.ExpectExpr,
    }
}

Now the parsing is done, but to test it a small annoyance needs to be fixed. In asm_gen.zig, there is a switch over expressions, and it needs to handle the new modes, even though it is not being called right now. This tedium, and the need to repeat it for every step, is partially why I stopped the Rust implementation. Adding an else => @panic("unimplemented"), to that switch smoothes things over for now.

Ran 43 tests in 6.363s

OK

Yuppee.

Last but not least, even though it is not needed for the first chapters, as the only identifier is main, I figured I'd try some comptime magic to improve my expectation experience. I changed expect a little bit so it returns a string when I am asking for an identifier, but void otherwise.4

fn expect(
    comptime expected: lexer.Token.Tag,
    tokens: *lexer.Tokenizer,
) !if (expected == .identifier)
    []const u8
else
    void {
    if (tokens.next()) |actual| {
        if (actual.tag != expected)
            return error.SyntaxError;
        if (expected == .identifier)
            return tokens.buffer[actual.loc.start..actual.loc.end];
    } else return error.SyntaxError;
}

Intermediate Representation

The Book uses its own version of Intermediate Representation, called TACKY. It is a variation of a popular IR strategy called three-address code, hence the name.

TACKY uses its own AST, and it would slot between the program's AST and the assembly's AST. This is the first pass. So many strings.

const std = @import("std");

pub const Prgm = struct {
    func_def: *FuncDef,
};

pub const FuncDef = struct {
    name: []const u8,
    // for similar reasons to its use in assembly AST
    instrs: std.ArrayListUnmanaged(Inst),
};

pub const Instr = union(enum) {
    ret: Value,

    // decided to splat the operator here, like discussed in chapter 1.
    unop_negate: Unary,
    unop_complement: Unary,

    // to avoid excessive typing
    pub const Unary = struct {
        src: Value,
        dst: Value,

        pub fn init(src: Value, dst: Value) @This() {
            return .{ .src = src, .dst = dst };
        }
    };
};

pub const Value = union(enum) {
    constant: u64,
    variable: []const u8,
};

The goal of this IR is to disentangle nested expressions by separating them into what are, at least in IR form, separate operations. This means that an expression like 1 + 2 * 35 is transformed into the following representation:

tmp0 = 2 * 3
tmp1 = 1 + tmp0
return tmp1

The Book-provided expr_emit_ir, with my splatted data structures, looks like the following:

fn expr_emit_ir(
    alloc: std.mem.Allocator,
    expr: *ast.Expr,
    instrs: *std.ArrayListUnmanaged(ir.Instr),
) !ir.Value {
    switch (expr.*) {
        .constant => |c| return .{ .constant = c },
        .unop_negate => |e| {
            const src = try expr_emit_ir(alloc, e, instrs);
            const dst_name = try make_temporary(alloc, "neg");
            const dst: ir.Value = .{ .variable = dst_name };
            try instrs.append(alloc, .{ .unop_negate = .init(src, dst) });
            return dst;
        },
        .unop_complement => |e| {
            const src = try expr_emit_ir(alloc, e, instrs);
            const dst_name = try make_temporary(alloc, "cml");
            const dst: ir.Value = .{ .variable = dst_name };
            try instrs.append(alloc, .{ .unop_complement = .init(src, dst) });
            return dst;
        },
    }
}

A bit repetitive, isn't it? So I put together the repeated logic into its own function, unary_helper:

fn unary_helper(
    alloc: std.mem.Allocator,
    expr: *ast.Expr,
    instrs: *std.ArrayListUnmanaged(ir.Instr),
    comptime prefix: []const u8,
) !struct { ir.Value, ir.Value } {
    const src = try expr_emit_ir(alloc, expr, instrs);
    const dst_name = try make_temporary(alloc, prefix);
    const dst: ir.Value = .{ .variable = dst_name };

    return .{ src, dst };
}

Nothing fancy. make_temporary over here is just a function with an ever-increasing static variable for creating always-different variable names. However, here I hit the weirdest Zig compile error I have seen yet: error: unable to resolve inferred error set. What gives?

Turns out when two recursive functions call each other, both with an inferred error set, Zig throws in the towel. Swift would never have given up, and would have compiled, eventually.

Apparently I had to specify the error type myself.

const helper_error = error{
    MakeTemporary, // return error value of `make_temporary`
    UnaryHelper,
};

... and changed unary_helper to this:

fn unary_helper(
    alloc: std.mem.Allocator,
    expr: *ast.Expr,
    instrs: *std.ArrayListUnmanaged(ir.Instr),
    comptime prefix: []const u8,
) helper_error!struct { ir.Value, ir.Value } {
    const src = expr_emit_ir(alloc, expr, instrs) catch
        return error.UnaryHelper;
    const dst_name = try make_temporary(alloc, prefix);
    const dst: ir.Value = .{ .variable = dst_name };

    return .{ src, dst };
}

And now it compiles. The rest of ir_gen.zig is rather a straightforward one-to-one mapping between the two trees, and rather boring to implement. Instead, I will show you make_temporary. Zig has a weird way to define local static variables.

fn make_temporary(
    alloc: std.mem.Allocator,
    comptime prefix: []const u8,
) helper_error![]const u8 {

    // zig static variables
    const coun = struct {
        var ter: usize = 0;
    };

    const name = std.fmt.allocPrint(
        alloc,
        if (prefix.len == 0) "temp" else prefix ++ ".{}",
        .{coun.ter},
    ) catch return error.MakeTemporary;
    coun.ter += 1;

    return name;
}

First of all, non-Atomic is fine. All this is single threaded anyway. Second of all, any variable directly declared in a struct scope is static. So if you need a local static variable you define a struct with a .. variable. This is exactly the same semantics as declaring a global variable (or at file scope) except the scope here is a lot smaller.6 Mind bending a bit, but makes sense.

To save on pretty printing this time, I figured I'd try std.json.stringify to get the IR output. This is the C file I am working on. So small and non-threatening.

int main(void) {
    return ~-3;
}

And this is the IR output:

{
	"func_def": {
		"name": "main",
		"instrs": {
			"items": [
				{
					"unop_negate": {
						"src": {
							"constant": 3
						},
						"dst": {
							"variable": "neg.0"
						}
					}
				},
				{
					"unop_complement": {
						"src": {
							"variable": "neg.0"
						},
						"dst": {
							"variable": "cml.1"
						}
					}
				},
				{
					"ret": {
						"variable": "cml.1"
					}
				}
			],
			"capacity": 5
		}
	}
}

Very long-winded, with some useless info. Why do I need to know the ArrayListUnmanaged capacity? Time to work on pretty printing and allocation bookkeeping. Not very interesting, to be honest, so I will skip writing about it.

Except for string interning. that is interesting.

String Interning

String interning is having a single storage for strings that are used throughout the program. Keeping the source around is not enough, because of the new strings for the IR's temporary variables. Giving ownership of the strings to ir.Value is fine, except it is really easy to double free the strings when cleaning up. In fact, I did exactly that when I was trying not to implement the string interner, and Zig's DebugAllocator dutifully caught it.

The basic idea behind the interner is to have all strings owned by one repository that you can free at once at the end. One can use an arena, but the interner's structure is more efficient because it also avoids duplication of strings. It can be backed by a hashmap, so it can also attach information to each string. All in all it is a useful memory and resource management technique.

The friendly people at the Zig discord, especially InKryption, not only helped me understand the idea, but actually provided me with a complete implementation.

It makes use of Zig's, honestly weird, sentinel-terminated arrays: a generalization of C's null-terminated strings.

The idea is as follows: a regular slice (say the []const u8 we use and love) stores its length within it. It knows what length it is at all times. So if you read from it or iterate over it, it stops dutifully at the length it knows. A null-terminated string, however, would be [:0]const u8. When you read from it or iterate over it, it keeps going until it hits the 0. (Obviously you can use any sentinel value, but 0 is very convenient for strings.)

So the String Interner keeps as its backing data a regular ArrayListUnmanaged(u8) like the ones I have been using before. (Say it is called bytes.) For any new string in the application, it appends it to the ArrayList, then adds a 0 byte at the end, and I keep its starting index around. When I want to use it, I take a slice like so: bytes[starting_idx..:0], and voila .. I have a slice of my string.

The remaining pieces of the puzzle are two items in the Zig standard library: StringIndexContext and StringIndexAdapter. They're designed for this specific use case. Who knew?

A Context in Zig's hash maps is a simple enough idea. It just defines the eql and hash functions said hash map will use, with an optional backing data structure. An Adapter is used to override these two functions for specific operations. The StringIndex duo have slightly different implementations of the eql function, and if you're curious you can check them out in the standard library's code.7

Without further ado, here is the full StringInterner thankfully provided by InKryption. I honestly had to read it multiple times to wrap my head around what's going on, so I commented some parts of it.

const std = @import("std");
const StringInterner = struct {
    // state
    bytes: std.ArrayListUnmanaged(u8),
    map: std.HashMapUnmanaged(
        Idx,
        void,
        std.hash_map.StringIndexContext,
        std.hash_map.default_max_load_percentage,
    ),

    // management
    pub const init: StringInterner = .{
        .bytes = .empty,
        .map = .empty,
    };

    pub fn deinit(self: *StringInterner, allocator: std.mem.Allocator) void {
        self.bytes.deinit(allocator);
        self.map.deinit(allocator);
    }

    // type alias. store this in stuff.
    pub const Idx = u32;

    // checks if a string is in or not. returns ID
    pub fn getIdx(
        self: *const StringInterner,
        string: []const u8,
    ) ?Idx {
        return self.map.getKeyAdapted(string, self.hmAdapter());
    }

    // get string from id
    pub fn getString(
        self: *const StringInterner,
        idx: Idx,
    ) ?[:0]const u8 {
        if (!self.map.containsContext(idx, self.hmCtx())) return null;

        // cast is necessary for type inference reasons.
        const slice_sentinel: [:0]const u8 = @ptrCast(self.bytes.items[idx..]);
        return std.mem.sliceTo(slice_sentinel, 0);
    }

    // the insert function. returns either an existing idx or a new idx.
    pub fn getOrPut(
        self: *StringInterner,
        allocator: std.mem.Allocator,
        string: []const u8,
    ) std.mem.Allocator.Error!Idx {
        // reserves capacity in both the backing array and the map
        try self.bytes.ensureUnusedCapacity(allocator, string.len + 1);
        try self.map.ensureUnusedCapacityContext(allocator, 1, self.hmCtx());

        // getOrPut returns a reference to the key and the value. If it existed, they're
        // valid pointers. If not, their contents are undefined waiting for you to fill them.
        const gop = self.map.getOrPutAssumeCapacityAdapted(string, self.hmAdapter());

        gop.value_ptr.* = {}; // this is void, but it can be anything!! note void is zero-width
        if (gop.found_existing) return gop.key_ptr.*;

        // unlikely to hit this. the source files are tiny.
        if (self.bytes.items.len > std.math.maxInt(Idx)) return error.OutOfMemory;
        const new_idx: Idx = @intCast(self.bytes.items.len);

        // insertion happens here
        self.bytes.appendSliceAssumeCapacity(string);

        // don't forget to append the null byte !!
        self.bytes.appendAssumeCapacity(0);

        // update the map through the pointer.
        gop.key_ptr.* = new_idx;
        return new_idx;
    }

    // helper functions to avoid a self referential struct
    fn hmCtx(self: *const StringInterner) std.hash_map.StringIndexContext {
        return .{ .bytes = &self.bytes };
    }
    fn hmAdapter(self: *const StringInterner) std.hash_map.StringIndexAdapter {
        return .{ .bytes = &self.bytes };
    }
};

I will currently use this structure as is, only changing the names to be snake_case as I cannot read camelCase. Later on I can use it to attach data to the strings. Fairly useful.

My first go at this was to replace every string with StringInterner.Idx. Just make sure to pass a pointer to the interner when constructing these structures.

pub const Prgm = struct {
    func_def: *FuncDef,
};

pub const FuncDef = struct {
    name: utils.StringInterner.Idx,
    instrs: std.ArrayListUnmanaged(Instr),
};

// and so on.

While this works and compiles fine, it turned out to be problematic when pretty printing. Idx is just a u32 and stores no reference to the interner. And the format functions do not have an allocator or a user-space reference to use. So instead, I am changing the type to [:0]const u8, then storing slices from the interner rather than indices.

Then our ir_gen functions are adjusted to pass a pointer to the interner through all of them. That, or creating a makeshift Ctx type. Time will tell. As an example, here are the unary_helper mentioned earlier and the new make_temporary, with the new structure threaded in.


fn unary_helper(
    alloc: std.mem.Allocator,
    interner: *utils.StringInterner,
    expr: *ast.Expr,
    instrs: *std.ArrayListUnmanaged(ir.Instr),
    comptime prefix: []const u8,
) helper_error!struct { ir.Value, ir.Value } {
    const src = expr_emit_ir(alloc, interner, expr, instrs) catch
        return error.UnaryHelper;
    const dst_name = try make_temporary(alloc, interner, prefix);
    const dst: ir.Value = .{ .variable = dst_name };

    return .{ src, dst };
}

fn make_temporary(
    alloc: std.mem.Allocator,
    interner: *utils.StringInterner,
    comptime prefix: []const u8,
) helper_error![:0]const u8 {
    const static = struct {
        var counter: usize = 0;
    };

    const name = std.fmt.allocPrint(
        alloc,
        if (prefix.len == 0) "temp" else prefix ++ ".{}",
        .{static.counter},
    ) catch return error.MakeTemporary;
    defer alloc.free(name);

    const name_idx = interner.get_or_put(alloc, name) catch
        return error.MakeTemporary;
    const name_res = interner.get_string(name_idx).?; // <- unwrap optional

    static.counter += 1;

    return name_res;
}

So this all compiles and runs and passes tests just fine. All that is left is implementing the pretty printing myself, as the json module seemed to break for reasons I do not understand (related to the interner).8 The technique to do so was discussed in the Chapter 1 article, so no point in repeating it here. It is time to move on to the next section.


Codegen

The first task here is almost invariably to update the assembly AST to include more instructions. Again deviating from the book, I will splat the unary operator into the instructions. This will totally bite me later. The updated parts of the new AST look as follows:

pub const Instr = union(enum) {
    mov: Mov,
    ret: void,

    // unary operations
    neg: Operand,
    not: Operand,

    allocate_stack: i64, // same type as `.stack` below.

    const Mov = struct {
        src: Operand,
        dst: Operand,

        pub fn init(src: Operand, dst: Operand) @This() {
            return .{ .src = src, .dst = dst };
        }
    };
};

pub const Operand = union(enum) {
    imm: u64,
    reg: Register,
    stack: i64,

    pseudo: [:0]const u8, // these get replaced

    pub const Register = enum { AX, R10 };
};

Oh there is a string there. Our string interner will come in handy.

Another note: I am not perfectly happy with the way the book represents registers. Later in the book, when long and pointers are implemented, different names have to be chosen for each register, and I had to encode that info in the Register type itself, which led to a weird amount of code duplication. I have not looked at how the reference OCaml implementation does it. But I will cross that bridge when I come to it. So far all registers are 32 bits wide.

Now asm_gen.zig is reworked to take the IR AST as input instead. This is almost a complete rewrite, except for the first, trivial, Prgm transformation. Complicating the matter a bit is that IR instructions and assembly instructions do not map one to one; each IR instruction can become one or more assembly instructions. So I have to return a slice. Manual memory management strikes again.

It is easier to start from the bottom up. This is IR values to Assembly operands. No allocator needed! Just copy the string's reference as is.

fn value_to_asm(
    value: ir.Value,
) assembly.Operand {
    switch (value) {
        .constant => |c| return .{ .imm = c },
        .variable => |v| return .{ .pseudo = v },
    }
}

And this is the more involved Instr porter:

fn instr_to_asm(
    alloc: std.mem.Allocator,
    instr: ir.Instr,
) ![]assembly.Instr {
    switch (instr) {
        .ret => |v| {
            const src = value_to_asm(v);
            const ret = try alloc.dupe(assembly.Instr, &.{
                .{ .mov = .init(src, .{ .reg = .AX }) },
                .ret,
            });

            return ret;
        },
        .unop_complement => |u| {
            const src = value_to_asm(u.src);
            const dst = value_to_asm(u.dst);

            const ret = try alloc.dupe(assembly.Instr, &.{
                .{ .mov = .init(src, dst) },
                .{ .not = dst },
            });

            return ret;
        },
        .unop_negate => |u| {
            const src = value_to_asm(u.src);
            const dst = value_to_asm(u.dst);

            const ret = try alloc.dupe(assembly.Instr, &.{
                .{ .mov = .init(src, dst) },
                .{ .neg = dst },
            });

            return ret;
        },
    }
}

Not very DRY, and a lot is going on. Certainly a lot of (what are going to be very temporary) allocations. But returning pointers to stack space is a rookie mistake, hence the dupe calls. (Thanks, Rust!)

The classic solution to a lot of temporary allocations in similar programs is to use an arena allocator, and then free it all at once when the function is done. This would be set up from func_def_to_asm. So here is the first attempt at it:

fn func_def_to_asm(
    alloc: std.mem.Allocator,
    func_def: ir.FuncDef,
) !assembly.FuncDef {
    var arena_allocator = std.heap.ArenaAllocator.init(alloc);
    defer arena_allocator.deinit();
    const arena = arena_allocator.allocator();

    var instrs = std.ArrayListUnmanaged(assembly.Instr).empty;

    for (func_def.instrs.items) |instr| {
        // note the different allocators for each function.
        const ret = try instr_to_asm(arena, instr);
        try instrs.appendSlice(alloc, ret);
    }

    return .{
        .name = func_def.name,
        .instrs = instrs,
    };
}

And finally Prgm is trivial.

pub fn prgm_to_asm(
    alloc: std.mem.Allocator,
    prgm: ir.Prgm,
) !assembly.Prgm {
    const func_def = try utils.create(
        assembly.FuncDef,
        alloc,
        try func_def_to_asm(alloc, prgm.func_def.*),
    );

    return .{ .func_def = func_def };
}

And the proof of the pudding is in the eating. But not before filling in all the @panic("todo")s and cleaning up the allocated memory. The code so far works and produces the expected result on the previously mentioned tiny C program, but codegen is not over yet.

Fixing Up Instructions

There are two final tasks before actually generating the assembly. First is to replace all them .pseudo operands with .stack operands. The Book used the pseudo operands as placeholders, as the compiler does not have the stack info at hand yet. It needs to count the number of intermediate, temporary variables created during IR generation.

The second is to rewrite illegal assembly instructions, such as moving from a stack to a stack, into legal ones.

These are essentially additional compiler passes, and I am loath to do the same thing I did in my Rust implementation and stuff everything into one file. So a new directory is born, titled asm_passes, with two new Zig files in tow. If you are keeping track, this is my current folder structure.

./src
├── asm_gen.zig
├── asm_passes
│   ├── fixup.zig
│   └── pseudos.zig
├── assembly.zig
├── ast.zig
├── ir_gen.zig
├── ir.zig
├── lexer.zig
├── main.zig
├── parser.zig
└── utils.zig

Most of these are self descriptive. utils is where the string interner definition lives alongside, currently, one create function that allocates a value on the heap.

Pseudoregisters

This pass simply goes over the instrs array stored in FuncDef to swap out the operands. The interesting challenge here is maintaining a map from identifiers to stack offsets. Storing the strings as map keys is horribly wasteful, so the string interner's Idx type is used instead, and the interner is passed in.

I probably made this a lot harder for myself. I tried to implement my own hash map adapter type, then struggled with Zig's syntax for references. Eventually, good prevailed and the code compiled.

The building block is this "simple" function, with a minimal number of parameters. getOrPut is a standard library function that almost does what it says on the tin, but gives you pointers to place the key and value yourself as you please. I opted to use a managed (not Unmanaged) hash map here to spare myself an allocator parameter. I could shave off the strings parameter too if I manage to figure out how Context works.9

fn pseudo_to_stack(
    op: assembly.Operand,
    strings: *utils.StringInterner,
    map: *std.AutoHashMap(
        utils.StringInterner.Idx,
        assembly.Operand.StackDepth,
    ),
) !assembly.Operand {
    const static = struct {
        var counter: assembly.Operand.StackDepth = 0;
    };

    switch (op) {
        .reg, .imm, .stack => return op,
        .pseudo => |name| {
            const idx = strings.get_idx(name).?;
            const gop = try map.getOrPut(idx);

            if (gop.found_existing) {
                return .{ .stack = gop.value_ptr.* };
            } else {
                static.counter -= 4;

                gop.key_ptr.* = idx;
                gop.value_ptr.* = static.counter;

                return .{ .stack = gop.value_ptr.* };
            }
        },
    }
}

The public facing function that runs on assembly.Prgm comes next. The & and * and .* sprinkled over the loop's variables took a lot of trial and error and Zig discord help to get right. But the shared payload between .neg and .not does not need to be duplicated.

pub fn replace_pseudos(
    alloc: std.mem.Allocator,
    strings: *utils.StringInterner,
    prgm: *assembly.Prgm,
) !void {
    var pseudo_map: std.AutoHashMap(
        utils.StringInterner.Idx,
        assembly.Operand.StackDepth,
    ) = .init(alloc);
    defer pseudo_map.deinit();

    for (prgm.func_def.instrs.items) |*instr| {
    //   ^ no address operator as it is already a slice
        switch (instr.*) {
            .mov => |*m| {
                //   ^ this was the tricky one to find
                const src = try pseudo_to_stack(m.src, strings, &pseudo_map);
                const dst = try pseudo_to_stack(m.dst, strings, &pseudo_map);

                m.* = .init(src, dst);
            },
            .neg, .not => |*v| v.* = try pseudo_to_stack(v.*, strings, &pseudo_map),

            .ret, .allocate_stack => {},
        }
    }
}

It works. Thankfully. This little bit made me tear my hair out.10

As a small comparison, this is the C file in question, followed by the pretty printed codegen before and after replacing pseudoregisters.

// to save you a scroll
int main(void) {
    return ~-3;
}
# before
PROGRAM
	FUNCTION main
		mov	imm 3 -> pseudo neg.0
		neg	pseudo neg.0
		mov	pseudo neg.0 -> pseudo cml.1
		not	pseudo cml.1
		mov	pseudo cml.1 -> AX
		ret
# after
PROGRAM
	FUNCTION main
		mov	imm 3 -> stack -4
		neg	stack -4
		mov	stack -4 -> stack -8
		not	stack -8
		mov	stack -8 -> AX
		ret

This is a horribly inefficient compiler, but it clearly works as intended. Optimization comes later in Part 3 of the book.

Fixing Instructions

The first fixup is to insert the .allocate_stack instruction with the depth of the stack. But this depth is not being returned anywhere! So a slight rejiggering of pseudo_to_stack to use a passed-in counter instead of a local static one, and having replace_pseudos return the stack depth, solves the problem. Simple, if annoying, code change.

Then adding the first instruction this way becomes fairly simple. Bad performance for this one step but I am not paying too much attention to that.

const depth = try pass_pseudo.replace_pseudos(alloc, strings, prgm_asm);
try prgm_asm.func_def.instrs.insert(
    alloc,
    0,
    .{ .allocate_stack = @intCast(@abs(depth)) },
    //                            ^ this returns an unsigned int.
);
// next step gets called here

The other, more involved step is fixing up illegal instructions. If you know anything about x86 assembly, you would know that you cannot have a mov instruction between two stack locations. There are reasons for that that I do not particularly care about. Why couldn't the Book compile to a saner assembly language like WASM? I am not even on an x86 machine! Thankfully, this is the only thing to deal with in this chapter. Here is my first pass at the code.

pub fn fixup_instrs(
    alloc: std.mem.Allocator,
    prgm: *assembly.Prgm,
) !void {
    var out: std.ArrayListUnmanaged(assembly.Instr) = try .initCapacity(
        alloc,
        prgm.func_def.instrs.capacity,
    );
    defer {
        std.mem.swap(
            std.ArrayListUnmanaged(assembly.Instr),
            &out,
            &prgm.func_def.instrs,
        );
        out.deinit(alloc);
    }

    for (prgm.func_def.instrs.items) |instr| {
        switch (instr) {
            .mov => |m| switch (m.src) {
                .stack => switch (m.dst) {
                    .stack => try out.appendSlice(alloc, &.{
                        .{ .mov = .init(m.src, .{ .reg = .R10 }) },
                        .{ .mov = .init(.{ .reg = .R10 }, m.dst) },
                    }),
                    else => try out.append(alloc, instr),
                },
                else => try out.append(alloc, instr),
            },
            else => try out.append(alloc, instr),
        }
    }
}

So much rightwards drift. It would perhaps be more readable to express it as a state machine, like the tokenizer does. I will make sure this works as intended first, then try to streamline it, as this is untenable. Time to run the codegen tests.

Ran 43 tests in 5.153s

OK

Code Emission

Before we get into the state machine, it is better to be done with code emission. It is a fairly simple and mechanical exercise, as it just requires updating the printing code, but it can expose pretty much any bug in the program.

and ...

Ran 43 tests in 5.147s

FAILED (errors=19)

Oh come on. This turns out to be a combination of some code I had left commented out while updating the compiler driver to include the IR, and dumb spelling mistakes I made in this fairly simple and mechanical task, such as writing & instead of %. Simple mistakes, really.

All tests pass now:

Ran 43 tests in 14.731s

OK
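For the record, the emission step really is just string formatting. Here is a hedged Rust sketch of roughly what the new cases look like (hypothetical types, not my actual Zig code; assuming AT&T syntax, and eliding the function epilogue that a real ret needs):

```rust
// Hypothetical mirror of this chapter's assembly instructions.
enum Operand {
    Imm(i64),
    Reg(&'static str), // register name without the % sigil
    Stack(i64),        // offset from %rbp
}

enum Instr {
    Mov { src: Operand, dst: Operand },
    Neg(Operand),
    Not(Operand),
    AllocateStack(u64),
    Ret,
}

fn operand(op: &Operand) -> String {
    match op {
        Operand::Imm(n) => format!("${n}"),
        Operand::Reg(r) => format!("%{r}"), // the fateful % sign
        Operand::Stack(off) => format!("{off}(%rbp)"),
    }
}

fn emit(instr: &Instr) -> String {
    match instr {
        Instr::Mov { src, dst } => {
            format!("\tmovl\t{}, {}", operand(src), operand(dst))
        }
        Instr::Neg(op) => format!("\tnegl\t{}", operand(op)),
        Instr::Not(op) => format!("\tnotl\t{}", operand(op)),
        Instr::AllocateStack(n) => format!("\tsubq\t${n}, %rsp"),
        Instr::Ret => "\tret".to_string(), // epilogue omitted for brevity
    }
}
```

So, for example, emit(&Instr::Neg(Operand::Stack(-4))) produces "\tnegl\t-4(%rbp)". Mechanical, like I said, and exactly the kind of code where an & sneaks in where a % belongs.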

State Machine

Following from the tokenizer's code, I defined a simple State enum:

const State = enum {
    start,
    mov_stack_stack,
    legal,
};

And the main switch inside the loop turns into a labelled switch over State, in which each state has its own logic, and any legal instruction just funnels to .legal. To be honest, writing this was easier than I expected it to be. It even worked on the first iteration, once I fixed the variable names. This is the second.

The jury is still out on whether the state machine is more readable than the nested switches, but I think I will stick with it for the time being. It passes all tests.

for (prgm.func_def.instrs.items) |instr| {
    state: switch (State.start) {
        .start => switch (instr) {
            .mov => |m| if (m.src == .stack and m.dst == .stack)
                continue :state .mov_stack_stack
            else
                continue :state .legal,
            else => continue :state .legal,
        },

        .mov_stack_stack => try out.appendSlice(alloc, &.{
            .{ .mov = .init(instr.mov.src, .{ .reg = .R10 }) },
            .{ .mov = .init(.{ .reg = .R10 }, instr.mov.dst) },
        }),

        .legal => try out.append(alloc, instr),
    }
}

Just for comparison's sake, this is the equivalent code in my Rust implementation.

for instr in std::mem::take(&mut self.instrs) {
    match instr {
        Instr::Mov {
            src: src @ Operand::Stack(_),
            dst: dst @ Operand::Stack(_),
        } => out.extend([
            Instr::Mov {
                src,
                dst: Operand::Reg(Register::R10),
            },
            Instr::Mov {
                src: Operand::Reg(Register::R10),
                dst,
            },
        ]),
        other => out.push(other),
    }
}

Pattern matching is the best. And this concludes Chapter 2.


Lessons Learned

  1. Building a cool state machine from zero.
  2. More manual memory management magic tricks. (Couldn't push the alliteration further.)
  3. Subshells, and a more automated testing process.
  4. First taste of comptime. ULTIMATE POWER.

Maybe next time I will compile the tests to always run in ReleaseSafe instead of Debug mode. That would certainly make things go more smoothly. Need to figure that one out.