
Writing a C Compiler, Chapter 5, in Zig

Reflecting on the chapters so far, I am starting to think that this project is perhaps the perfect vehicle for exploring new languages. Right off the bat, one deals with file systems, cross-compiling, data structures, memory management, pretty formatting, and what have you.

But as the project goes on, especially if you are familiar with the problem (read: have done it before in a previous language), most of the tasks become purely mechanical. Refactoring, updating requirements, satisfying compiler errors. It shows you how the language handles churn, and whether that is pleasant or not. It also shows how helpful the community around the language is when you inevitably have to debug your language knowledge and hit weird snags.

So, yeah. If you want to learn a new programming language, get yourself Writing a C Compiler and do it once (at least halfway) in your favorite language. Then you can redo it in any new language as a test run. Way less math than a ray tracer.

Anyway, now for Chapter 5. Chapter 5 is about, let me look, local variables. Oh, finally, some new statements.


Lexer and Parser

One new token, =. Adding it is trivial, so I am sparing you the details. But there is one thing to update that, while it belongs to the parser, is placed in lexer.zig: the precedence chart.

Operator Associativity

Previously, all operators were left associative. This one, however, is right associative. One could make a special case for it, but other upcoming operators1 are right associative as well. So care must be taken.
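
For instance, chained assignment only makes sense when it nests to the right:

a = b = 5;   // must parse as a = (b = 5); the other grouping assigns to a non-lvalue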

The binop_precedence function is implemented as a method on Token.Tag, and this is how it looks after adding =. It now returns a tuple instead of just a u8. The reason left is a u8 as well is that the two numbers get added later, and changing from bool to u8 in Zig is torture. Even if (left) 1 else 0 does not work, as it actually returns a comptime_int ....

pub fn binop_precedence(self: @This()) ?struct { u8, u8 } {
    return switch (self) {
        .asterisk, // *
        .f_slash, // /
        .percent, // %
        => .{ 50, 1 },
        .hyphen, // -
        .plus, // +
        => .{ 45, 1 },
        .lesser_than,
        .lesser_equals,
        .greater_than,
        .greater_equals,
        => .{ 35, 1 },
        .double_equals,
        .bang_equals,
        => .{ 30, 1 },
        .double_ambersand => .{ 10, 1 },
        .double_pipe => .{ 5, 1 },
        .equals => .{ 1, 0 },
        else => null,
    };
}

It would perhaps be more idiomatic to create a proper struct, give the fields proper names, and have a proper enum for left vs right associativity but, eh. Too much boilerplate for a one-use function.
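
For the record, a sketch of what that more proper version might look like (the names are mine, and the table is abbreviated):

const Assoc = enum(u8) { right = 0, left = 1 };
const BinopInfo = struct { prec: u8, assoc: Assoc };

pub fn binop_info(self: @This()) ?BinopInfo {
    return switch (self) {
        .asterisk, .f_slash, .percent => .{ .prec = 50, .assoc = .left },
        .hyphen, .plus => .{ .prec = 45, .assoc = .left },
        // ... same table as above ...
        .equals => .{ .prec = 1, .assoc = .right },
        else => null,
    };
}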

In parse_expr, the rules are slightly amended. Instead of this while loop:

while (next_token.tag.binop_precedence()) |prec| {
    if (prec < min_prec) break;

    const rhs = try parse_expr(alloc, tokens, prec + 1);
    // etc

It now becomes this:

while (next_token.tag.binop_precedence()) |r| {
    const prec, const left = r; // destructuring yay
    if (prec < min_prec) break;

    const rhs = try parse_expr(alloc, tokens, prec + left);
    // etc

And that ... is pretty much it: a left-associative operator recurses with prec + 1, so another operator of equal precedence ends the loop, while = recurses with prec + 0, so it nests to the right. Just map the new operator to the new expression that I have not written yet.

New Statements and Expressions!

Two new expressions:

pub const Expr = union(enum) {
    constant: u64,

    @"var": [:0]const u8,
    assignment: BinOp,
    // etc

And two new statements, finally. This is the entire Stmt type, with its formatter, too.

pub const Stmt = union(enum) {
    @"return": *Expr,
    expr: *Expr,
    null: void,

    pub fn format(
        self: @This(),
        comptime _: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        const w = options.width orelse 0;
        try writer.writeByteNTimes('\t', w);

        switch (self) {
            .@"return" => |expr| try writer.print("RETURN {}", .{expr}),
            .expr => |expr| try writer.print("{};", .{expr}),
            .null => try writer.print(";", .{}),
        }
    }
};

There is a new AST node, too! Decl is a variable declaration. (Maybe it should be VarDecl considering what is to come?) It is a simple one.

pub const Decl = struct {
    // no type because there is only one type so far: int
    name: []const u8,
    init: ?*Expr,
};

Here is a fun note. In the AST proper, all the strings are backed by the source code, so they are not null-terminated ([]const u8). In the following stages, however, the strings are backed by the StringInterner, so they are null-terminated ([:0]const u8). I could just as well use normal slices or null-terminated pointers ([*:0]const u8) in the second case, but extra safety does not hurt a toy compiler like this one.
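
A quick illustration of why this split is convenient: the sentinel-terminated slice coerces to the plain one, never the reverse.

const interned: [:0]const u8 = "main";
const plain: []const u8 = interned; // fine: the sentinel guarantee is simply dropped
// const back: [:0]const u8 = plain; // compile error: no sentinel to be found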

Also, since function bodies can include either declarations or statements, there is a need for a simple BlockItem union. Here it is with its formatter, which just delegates to the subtype. I will probably never have to touch this again.

pub const BlockItem = union(enum) {
    D: Decl,
    S: Stmt,

    pub fn format(
        self: @This(),
        comptime fmt: []const u8,
        options: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        switch (self) {
            inline else => |i| try i.format(fmt, options, writer),
        }
    }
};

And then update FuncDef to hold a collection of BlockItems, rather than a pointer to one Stmt. Since everything is arena backed, it makes little sense for this to be an ArrayList: the collection reallocates a lot, and the arena never reclaims the abandoned buffers. A better idea is either a linked list, or a segmented list, which is halfway between the two. The main downside is that I cannot just take a slice of either of these two collections. I will cross that bridge when I come to it.

Zig's SinglyLinkedList type has an API change on master. So, to avoid potential churn in a future where I upgrade Zig versions, I am going to go with SegmentedList.
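
A minimal sketch of the difference, assuming a recent-ish std (both collections take the allocator per call):

test "arena growth sketch" {
    var arena_state = std.heap.ArenaAllocator.init(std.testing.allocator);
    defer arena_state.deinit();
    const arena = arena_state.allocator();

    // ArrayList: each regrow allocates a bigger buffer and abandons
    // the old one inside the arena, where it sits until deinit.
    var list: std.ArrayListUnmanaged(u32) = .empty;
    for (0..100) |i| try list.append(arena, @intCast(i));

    // SegmentedList: it grows by adding segments, so old memory stays
    // in use and element pointers remain stable across appends.
    var seg: std.SegmentedList(u32, 0) = .{};
    for (0..100) |i| try seg.append(arena, @intCast(i));
}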

Here I also revise an earlier design decision. Previously, every parsing function returned a pointer to its result. But the new FuncDef makes no use of those pointers, as the items are added to the SegmentedList by value. So I rewrote all the functions to return their results directly, and to allocate only where a pointer is actually needed, at the assignment site. Plenty of refactoring.

Because of extensive changes throughout almost every function in parser.zig, here is the entire file in its current state. I have not tried compiling it yet, so beware of typos.

const std = @import("std");
const ast = @import("ast.zig");
const lexer = @import("lexer.zig");

const utils = @import("utils.zig");

pub fn parse_prgm(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!ast.Prgm {
    const func_def = try parse_func_def(arena, tokens);
    const func_ptr = try utils.create(ast.FuncDef, arena, func_def);

    // now that we are done, check the tokenizer is empty.
    if (tokens.next()) |_| return error.ExtraJunk;

    return .{ .func_def = func_ptr };
}

fn parse_func_def(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!ast.FuncDef {
    try expect(.type_int, tokens);

    const name = try expect(.identifier, tokens);

    try expect(.l_paren, tokens);
    try expect(.keyword_void, tokens);
    try expect(.r_paren, tokens);

    try expect(.l_brace, tokens);

    var body: std.SegmentedList(ast.BlockItem, 0) = .{};

    while (true) {
        const next_token = tokens.next() orelse
            return error.SyntaxError;
        if (next_token.tag == .r_brace) break;
        tokens.put_back(next_token);

        const item = try parse_block_item(arena, tokens);
        try body.append(arena, item);
    }

    return .{ .name = name, .body = body };
}

fn parse_block_item(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!ast.BlockItem {
    const next_token = tokens.next() orelse
        return error.SyntaxError;

    switch (next_token.tag) {
        .type_int => {
            const name = try expect(.identifier, tokens);
            const new_token = tokens.next() orelse
                return error.SyntaxError;

            const expr: ?*ast.Expr = switch (new_token.tag) {
                .equals => ret: {
                    const res = try parse_expr(arena, tokens, 0);
                    const expr_ptr = try utils.create(ast.Expr, arena, res);

                    try expect(.semicolon, tokens);
                    break :ret expr_ptr;
                },
                .semicolon => null,
                else => return error.SyntaxError,
            };

            return .{ .D = .{ .name = name, .init = expr } };
        },
        // statements
        .semicolon => return .{ .S = .null },
        .keyword_return => {
            const expr = try parse_expr(arena, tokens, 0);
            const expr_ptr = try utils.create(ast.Expr, arena, expr);
            try expect(.semicolon, tokens);
            return .{ .S = .{ .@"return" = expr_ptr } };
        },
        else => {
            tokens.put_back(next_token);
            const expr = try parse_expr(arena, tokens, 0);
            const expr_ptr = try utils.create(ast.Expr, arena, expr);
            try expect(.semicolon, tokens);
            return .{ .S = .{ .expr = expr_ptr } };
        },
    }
}

fn parse_expr(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
    min_prec: u8,
) Error!ast.Expr {
    var lhs = try parse_factor(arena, tokens);
    const lhs_ptr = try utils.create(ast.Expr, arena, lhs);

    var next_token = tokens.next() orelse
        return error.SyntaxError;

    while (next_token.tag.binop_precedence()) |r| {
        const prec, const left = r;
        if (prec < min_prec) break;

        const rhs = try parse_expr(arena, tokens, prec + left);
        const rhs_ptr = try utils.create(ast.Expr, arena, rhs);

        const bin_op: ast.Expr.BinOp = .{ lhs_ptr, rhs_ptr };
        lhs = switch (next_token.tag) {
            .plus => .{ .binop_add = bin_op },
            .hyphen => .{ .binop_sub = bin_op },
            .asterisk => .{ .binop_mul = bin_op },
            .f_slash => .{ .binop_div = bin_op },
            .percent => .{ .binop_rem = bin_op },

            .equals => .{ .assignment = bin_op },

            .double_ambersand => .{ .binop_and = bin_op },
            .double_pipe => .{ .binop_or = bin_op },
            .double_equals => .{ .binop_eql = bin_op },
            .bang_equals => .{ .binop_neq = bin_op },
            .lesser_than => .{ .binop_lt = bin_op },
            .greater_than => .{ .binop_gt = bin_op },
            .lesser_equals => .{ .binop_le = bin_op },
            .greater_equals => .{ .binop_ge = bin_op },

            else => unreachable,
        };

        next_token = tokens.next() orelse
            return error.SyntaxError;
    }

    tokens.put_back(next_token);
    return lhs;
}

fn parse_factor(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!ast.Expr {
    const current = tokens.next() orelse
        return error.SyntaxError;

    switch (current.tag) {
        .identifier => return .{
            .@"var" = tokens.buffer[current.loc.start..current.loc.end],
        },
        .number_literal => {
            const lit = tokens.buffer[current.loc.start..current.loc.end];
            const res = std.fmt.parseInt(u64, lit, 10) catch
                return error.InvalidInt;

            return .{ .constant = res };
        },
        .hyphen => {
            const inner_exp = try parse_factor(arena, tokens);
            return .{ .unop_neg = try utils.create(ast.Expr, arena, inner_exp) };
        },
        .tilde => {
            const inner_exp = try parse_factor(arena, tokens);
            return .{ .unop_not = try utils.create(ast.Expr, arena, inner_exp) };
        },
        .bang => {
            const inner_exp = try parse_factor(arena, tokens);
            return .{ .unop_lnot = try utils.create(ast.Expr, arena, inner_exp) };
        },
        .l_paren => {
            const inner_exp = try parse_expr(arena, tokens, 0);
            try expect(.r_paren, tokens);

            return inner_exp;
        },
        else => return error.SyntaxError,
    }
}

inline fn expect(
    comptime expected: lexer.Token.Tag,
    tokens: *lexer.Tokenizer,
) Error!ExpectResult(expected) {
    if (tokens.next()) |actual| {
        if (actual.tag != expected)
            return error.SyntaxError;
        switch (expected) {
            .identifier => return tokens.buffer[actual.loc.start..actual.loc.end],
            else => {},
        }
    } else return error.SyntaxError;
}

fn ExpectResult(comptime expected: lexer.Token.Tag) type {
    switch (expected) {
        .identifier => return []const u8,
        else => return void,
    }
}

const Error =
    std.mem.Allocator.Error ||
    error{
        SyntaxError,
        InvalidInt,
        ExtraJunk,
    };

test "precedence" {
    const t = std.testing;

    var a_a = std.heap.ArenaAllocator.init(t.allocator);
    defer a_a.deinit();
    const a = a_a.allocator();

    {
        const src = "3 * 4 + 5;";
        var tokens = lexer.Tokenizer.init(src);
        const result = try parse_expr(a, &tokens, 0);

        try t.expect(result == .binop_add);
        try t.expectFmt("(+ (* 3 4) 5)", "{}", .{result});
    }
    {
        const src = "4 + 3 && -17 * 4 < 5;";
        var tokens = lexer.Tokenizer.init(src);
        const result = try parse_expr(a, &tokens, 0);

        try t.expect(result == .binop_and);
        try t.expectFmt("(&& (+ 4 3) (< (* (- 17) 4) 5))", "{}", .{result});
    }
}

Well, that was a trip. The ir_gen functions are broken too, so they were replaced by a stub for now.

Here is the C file I am using as my science mouse this chapter:

int main(void) {
    int a = 2147483646;
    int b = 0;
    int c = a / 6 + !b;
    return c * 2 == a - 1431655762;
}

And this is the parser output. It obviously does not cover every path the code can take, but that's what a test suite is for. And the tests all pass, so hooray.

PROGRAM
	FUNCTION main
		int a <- 2147483646;
		int b <- 0;
		int c <- (+ a (! b));
		RETURN (== c (- a 1431655762))

It is important to keep in mind that the test suite at this stage does not check for correct logic, only whether the compiler fails the programs it should fail. Correct logic is checked both by custom unit tests (which I discussed last chapter) and by testing the final compiler output.

Wait hold on. Where did the a/6 go?

Unit Testing Considered Useful

Checking the unit tests I did last chapter, they are failing too. What gives? The code is all up there. Can you spot the error?

Poking around, I realized the error crept in when I changed all the functions to return their nodes directly rather than pointers to them. It is this pair of lines at the beginning of parse_expr:

var lhs = try parse_factor(arena, tokens);
const lhs_ptr = try utils.create(ast.Expr, arena, lhs);

And then I keep that first and only lhs_ptr around, assigning it as the left side of every following expression: the (/ a 6) was built into lhs, but the next iteration still pointed at the original a. Bah. Running the fix on the C code snippet above gets me this result now.

PROGRAM
	FUNCTION main
		int a <- 2147483646;
		int b <- 0;
		int c <- (+ (/ a 6) (! b));
		RETURN (== (* c 2) (- a 1431655762))

This is the new and improved parse_expr.

fn parse_expr(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
    min_prec: u8,
) Error!ast.Expr {
    var lhs = try parse_factor(arena, tokens);

    var next_token = tokens.next() orelse
        return error.SyntaxError;
    defer tokens.put_back(next_token);

    while (next_token.tag.binop_precedence()) |r| {
        const prec, const left = r;
        if (prec < min_prec) break;

        const lhs_ptr = try utils.create(ast.Expr, arena, lhs);  // <-- fix

        const rhs = try parse_expr(arena, tokens, prec + left);
        const rhs_ptr = try utils.create(ast.Expr, arena, rhs);

        const bin_op: ast.Expr.BinOp = .{ lhs_ptr, rhs_ptr };
        lhs = switch (next_token.tag) {
            // same as before
        };

        next_token = tokens.next() orelse
            return error.SyntaxError;
    }

    return lhs;
}

Semantic Analysis

Oh, that's a new one. This is where things like 2 = a * 3 are rejected. Other compilers seem to call this pass sema, so I am adding a sema.zig. This file is concerned with transforming and checking the AST: double declarations, undeclared variables, shadowing, scoping, what have you.

The pseudocode functions in the Book have a variable_map input. It maps each name, apparently, to a unique name (which I assume will be interned later?). I am unsure how this will look eventually, but for now I will be naive and use a std.StringHashMap([]const u8).

Here is the Book's resolve_decl pseudocode, translated into Zig-ish pseudocode:

fn resolve_decl(
    decl: *ast.Decl,
    variable_map: std.StringHashMap([]const u8),
) Error!void {
    if (variable_map.contains(decl.name))
        return error.DuplicateVariableDecl;

    const unique_name = make_temporary(); // todo

    try variable_map.put(decl.name, unique_name);

    const init: ?*Expr = if (decl.init) |*expr|
        try resolve_expr(expr, variable_map) // todo
    else
        null;

    // where is the expr allocated?
    return .{ .name = unique_name, .init = init };
}

So many todos and questions. This requires filling up piece by piece.

New Names for Old Variables

First of all, the new unique_name. This is going to be used, as is, in the next compiler passes. So it makes sense for it to be on the StringInterner. Which means, the StringInterner and its backing allocator should be passed into the function. I am going to save myself some trouble and just go ahead and create a Boilerplate struct like I did in ir_gen previously.

// first pass
const Boilerplate = struct {
    gpa: std.mem.Allocator,
    strings: *utils.StringInterner,
    variable_map: *std.StringHashMapUnmanaged([:0]const u8),

    fn make_temporary(
        self: @This(),
        comptime prefix: []const u8,
    ) Error![:0]const u8 {
        // todo
    }
};

The next problem: to make sure the new name is unique across future compiler passes, it should use the counter. The counter is currently self-contained within ir_gen's make_temporary, but now it has to be elevated into its own thing. So I am making it a method of StringInterner, and having both the new and old make_temporary functions use that one.

This is the new method on StringInterner:

pub fn make_temporary(
    self: *StringInterner,
    gpa: std.mem.Allocator,
    prefix: []const u8,
) ![:0]const u8 {
    // zig static variables
    const static = struct {
        var counter: usize = 0;
    };

    var buf: [16]u8 = undefined;
    const name_buf = try std.fmt.bufPrint(
        &buf,
        "{s}.{}",
        .{ (if (prefix.len == 0) "tmp" else prefix), static.counter },
    );

    const name = try self.get_or_put(gpa, name_buf);
    static.counter += 1;

    return name.string;
}

One change from the previous function: prefix is no longer comptime known, as it can be a user-specified variable name. I am not quite sure, to be honest, if this code compiles, but I will find out by the end of this section. Below is the new make_temporary in both ir_gen and the new sema. Note that self (and Error, for that matter) are different types in the two files, but the interface is the same.

fn make_temporary(
    self: @This(),
    prefix: []const u8,
) Error![:0]const u8 {
    return try self.strings.make_temporary(self.alloc, prefix);
}
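
In use, the generated names look something like this (assuming the counter starts fresh; these match the output later in this post):

const n0 = try bp.make_temporary("a"); // "a.0"
const n1 = try bp.make_temporary("");  // "tmp.1"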

Resolutions

This is the new resolve_decl after the nip and tuck, adjusted to change the Decl in place:

fn resolve_decl(
    bp: Boilerplate,
    decl: *ast.Decl,
) Error!void {
    if (bp.variable_map.contains(decl.name))
        return error.DuplicateVariableDecl;

    const unique_name = try bp.make_temporary(decl.name);
    try bp.variable_map.put(bp.gpa, decl.name, unique_name);

    if (decl.init) |expr|
        try resolve_expr(bp, expr);
}

Only resolve_expr remains a stub. So naturally the next step is the relatively simple resolve_stmt.

fn resolve_stmt(
    bp: Boilerplate,
    stmt: *ast.Stmt,
) Error!void {
    switch (stmt.*) {
        .@"return", .expr => |expr| try resolve_expr(bp, expr),
        .null => {},
    }
}

And now this is resolve_expr. To be frank I am not quite sure I still have the right incantations regarding * and .* and &. I trust the compiler will correct me when it comes to it.2

fn resolve_expr(
    bp: Boilerplate,
    expr: *ast.Expr,
) Error!void {
    switch (expr.*) {
        .constant => {},
        .assignment => |*b| {
            if (b.@"0".* != .@"var") return error.InvalidLValue;
            try resolve_expr(bp, b.@"0");
            try resolve_expr(bp, b.@"1");
        },
        .@"var" => |name| expr.* = if (bp.variable_map.get(name)) |un|
            .{ .@"var" = un }
        else
            return error.UndeclaredVariable,
        .unop_neg,
        .unop_not,
        .unop_lnot,
        => |u| try resolve_expr(bp, u),
        .binop_add,
        .binop_sub,
        .binop_mul,
        .binop_div,
        .binop_rem,
        .binop_and,
        .binop_or,
        .binop_eql,
        .binop_neq,
        .binop_ge,
        .binop_gt,
        .binop_le,
        .binop_lt,
        => |b| {
            try resolve_expr(bp, b.@"0");
            try resolve_expr(bp, b.@"1");
        },
    }
}

This leaves resolve_prgm and resolve_func_def, which are barely interesting, except for setting up the Boilerplate.

pub fn resolve_prgm(
    gpa: std.mem.Allocator,
    strings: *utils.StringInterner,
    prgm: *ast.Prgm,
) Error!void {
    var variable_map: std.StringHashMapUnmanaged([:0]const u8) = .empty;
    const bp: Boilerplate = .{
        .gpa = gpa,
        .strings = strings,
        .variable_map = &variable_map,
    };
    try resolve_func_def(bp, prgm.func_def);
}

fn resolve_func_def(
    bp: Boilerplate,
    func_def: *ast.FuncDef,
) Error!void {
    var iter = func_def.body.iterator(0);
    while (iter.next()) |item| switch (item.*) {
        .S => |*s| try resolve_stmt(bp, s),
        .D => |*d| try resolve_decl(bp, d),
    };
}
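
As a sanity check, here is a hedged test sketch of the 2 = a * 3 rejection (assuming the lexer and parser imports are in scope, and that StringInterner initializes with .{}; the arena doubles as the gpa so nothing needs a manual deinit):

test "sema rejects assignment to rvalue" {
    const t = std.testing;

    var arena_state = std.heap.ArenaAllocator.init(t.allocator);
    defer arena_state.deinit();
    const arena = arena_state.allocator();

    var strings: utils.StringInterner = .{}; // assuming default init
    const src = "int main(void) { int a = 1; 2 = a * 3; return a; }";
    var tokens = lexer.Tokenizer.init(src);

    var prgm = try parse_prgm(arena, &tokens);
    try t.expectError(
        error.InvalidLValue,
        resolve_prgm(arena, &strings, &prgm),
    );
}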

Now, the proof of the pudding is in the eating. Let's wire it into the compiler driver (with an all-new --validate option), and see the results.

NoSpaceLeft

Running the test suite came up with odd and confusing errors and long info dumps. Time to investigate.

First of all, I run the new code against the sample C program from before, and it fails with a ... memory leak? Turning off the leak detection (which is simple enough with Zig's DebugAllocator), it passes fine, but the variable names seem unchanged. Hmm.

Trying the test suite again with leak checking turned off (just to see if it passes), however, also fails, with NoSpaceLeft. One file seems to be responsible, and it is the following:

int main(void) {
    int first_variable = 1;
    int second_variable = 2;
    return first_variable + second_variable;
}

Looks simple enough. Running it through, the Zig compiler gives me a very helpful, and very very long, stack trace. It fails in make_temporary. Oh, the buffer does not have enough space: I made it 16 bytes to avoid a heap allocation, and "second_variable.1" is 17 bytes. Silly me. Let's change it to a 64 byte buffer.3

// was
var buf: [16]u8 = undefined;
const name_buf = try std.fmt.bufPrint(
    &buf,
    "{s}.{}",
    .{ (if (prefix.len == 0) "tmp" else prefix), static.counter },
);
// becomes
var buf: [64]u8 = undefined;
const name_buf = try std.fmt.bufPrint(
    &buf,
    "{s}.{}",
    .{ (if (prefix.len == 0) "tmp" else prefix), static.counter },
);

Well, this fixes the issue at least. This is the output of --validate on that file now:

PROGRAM
	FUNCTION main
		int first_variable <- 1;
		int second_variable <- 2;
		RETURN (+ first_variable.0 second_variable.1)

As you can see, the names are updated alright, but apparently not in Decl's name field. That's a silly oversight. Let's fix that now.

    // end of resolve_decl
    decl.name = unique_name; // <-- new line
}

Well, this takes care of that. Now let me figure out the memory leak.

Leak

Let's run the same program again and look through the call trace Zig very helpfully provides.

Oh, I forgot to deinit variable_map. Silly me. The deinit goes in resolve_prgm, because I have no use for the map afterwards right now.

Yay, no memory leaks. That was easy.


IR Generation

No changes to the IR syntax tree today, only generating IR for the new AST nodes. This also means there are no assembly changes either. Almost at a stable IR backend.4

The most interesting thing in here is that variable declarations are simply discarded. They have done their job in semantic analysis, and now they are shown the door. But let's go backwards.

The only new expressions emitting IR are @"var" and assignment. var is cast to a null-terminated pointer, but that is it: it is already backed by the interner from the semantic analysis phase. assignment puts the items in a weird bizarro order, where dst is the first item.

.@"var" => |v| return .{ .variable = @ptrCast(v) },
.assignment => |b| {
    // bizarro order
    const dst = try expr_emit_ir(bp, b.@"0"); // guaranteed to be a var
    const src = try expr_emit_ir(bp, b.@"1");
    try bp.append(.{ .copy = .init(src, dst) });
    return dst;
},
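
So for a statement like a = b + 1;, the emitted sequence comes out roughly like this (hedging on the exact names and formatting):

tmp.2 <- (+ b.0 1)
a.1 <- tmp.2

And since the assignment returns dst, chains like a = b = 1 work for free.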

The new statements just pass through:

fn stmt_emit_ir(
    bp: Boilerplate,
    stmt: *const ast.Stmt,
) Error!void {
    switch (stmt.*) {
        .null => {},
        .@"return" => |e| try bp.append(.{
            .ret = try expr_emit_ir(bp, e),
        }),
        .expr => |e| _ = try expr_emit_ir(bp, e),
    }
}

The interesting new part is func_def_emit_ir. Since body is now a list of stuff rather than just one statement, I need to iterate over it and generate IR from each item. But there is no decl_emit_ir yet, so I will do that first. If there is no initialization expression, do nothing. If there is, evaluate the expression and copy the result to the variable's name.

fn decl_emit_ir(
    bp: Boilerplate,
    decl: *const ast.Decl,
) Error!void {
    if (decl.init) |e| {
        const src = try expr_emit_ir(bp, e);
        try bp.append(.{ .copy = .init(src, .{ .variable = @ptrCast(decl.name) }) });
    }
}

Now func_def_emit_ir has to construct the IR instructions in a body. So a loop over the block items, processing them one at a time, is the simplest way to do it. The Book recommends adding a simple return 0; instruction at the end, as an edge case for functions written without a return value. This is what the C standard mandates for main, and since omitting a return value is undefined behaviour in other functions, I can do whatever, so adding a return 0 is entirely within the spec. It will never be reached if the code author did not forget their return.

fn func_def_emit_ir(
    alloc: std.mem.Allocator,
    strings: *utils.StringInterner,
    func_def: *const ast.FuncDef,
) Error!ir.FuncDef {
    const name = try strings.get_or_put(alloc, func_def.name);
    var instrs: std.ArrayListUnmanaged(ir.Instr) = .empty;

    const bp: Boilerplate = .{
        .alloc = alloc,
        .strings = strings,
        .instrs = &instrs,
    };

    var iter = func_def.body.constIterator(0);
    while (iter.next()) |item| switch (item.*) {
        .S => |s| try stmt_emit_ir(bp, s),
        .D => |d| try decl_emit_ir(bp, d),
    };

    try instrs.append(alloc, .{ .ret = .{ .constant = 0 } });

    return .{ .name = name.string, .instrs = instrs };
}

And that's it. There is no new assembly generation because there are no new IR instructions. So, time to test the whole shebang, and it all passes. Hurray.


Lessons Learned, er, Rant About Zig

How much I hate Zig's unergonomic optionals, for one, and how unergonomic integer literals are, for two.

For all the good Zig has, some things are just way, way harder than they have any right to be. Take for example parse_expr.

while (current.tag.binop_precedence()) |r| {
    const prec, const left = r;
    if (prec < min_prec) break;

    const rhs = try parse_expr(arena, tokens, prec + left); // <- look here
    // snip --

The reason I made binop_precedence return a u8 for left-associativity is how absurdly difficult and ugly it is to work with otherwise.

The natural way of modeling this is to use either a bool or an enum. But if left was a bool, this very normal looking, perfectly readable code would not compile:

const rhs = try parse_expr(arena, tokens, prec + if (left) 1 else 0);

The reason for that is that integer literals (the 1 and 0 in this snippet) are of type comptime_int, and the type inference is not smart enough to get there.

Which is like, fine, whatever. That would not be an issue if it were not for the fact that you literally cannot write typed literals. 1u8, like Rust or Odin would have it, is not the Zig way. It is not explicit enough. The way to do it is @as(u8, 1). No joke.

const rhs = try parse_expr(arena, tokens, prec + if (left) @as(u8, 1) else 0);

At least the type inference is smart enough to infer the 0, thank god.

Wait, you say, you can cast. So why not cast the bool to a u8?

Haha, you fool. Casting a bool to an integer makes no sense! There is a special builtin for that, called @intFromBool. Why not use that, you ask? Because this builtin returns a u1. Which is really a u8 in memory, but the type system will not automatically cast it to the wider integer because ... fuck if I know.5 So you end up with this beauty: @as(u8, @intFromBool(left)).

Absolutely inane design decisions whose only effect is to annoy and frustrate the user. So I just return a u8 from binop_precedence. Incorrect, but less fuss.

I will leave the optionals rant for later. Meanwhile, enjoy this contraption that should have been in the core language. (Yes it does work.)

pub fn compare_optional(
    opt: anytype,
    value: @typeInfo(
        @typeInfo(@TypeOf(opt)).optional.child,
    ).@"union".tag_type.?,
) bool {
    return ((opt orelse return false) == value);
}
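
Usage looks like this (hypothetical union, just for illustration):

test "compare_optional" {
    const t = std.testing;
    const U = union(enum) { a: u8, b: void };

    const some: ?U = .{ .a = 42 };
    const none: ?U = null;

    try t.expect(compare_optional(some, .a));
    try t.expect(!compare_optional(some, .b));
    try t.expect(!compare_optional(none, .a));
}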