
Writing a C Compiler, Chapter 9, in Zig

After eight grueling (not really) chapters of Writing a C Compiler, time to implement more assembly instructions. Functions! Linkage! Commas!


Lexer, AST, and Parser

The lexer just has a comma token now. I thought about adding the comma operator, but that did not seem worth the trouble.

The AST has two new additions. Function call expressions and function declarations (which are rebranded and improved function definitions)! Other changes include how the structures themselves are defined. A program is now a list of function declarations, instead of just one. How about that?

These are the new AST nodes. I am not sure yet whether these compile, which I will find out when I am done with the parser. The use of SegmentedList was discussed a couple of chapters ago as a more arena-friendly collection type.

pub const Prgm = struct {
    funcs: std.SegmentedList(FuncDecl, 0),
};

pub const Block = struct {
    body: std.SegmentedList(BlockItem, 0),
};

pub const BlockItem = union(enum) {
    D: Decl,
    S: Stmt,
};

pub const Decl = union(enum) {
    F: FuncDecl,
    V: VarDecl,
};

pub const FuncDecl = struct {
    name: []const u8,
    params: std.SegmentedList(Identifier, 0),
    block: ?Block,
};

pub const VarDecl = struct {
    name: Identifier,
    init: ?*Expr,
};

pub const Expr = union(enum) {
    // snip --
    func_call: struct { Identifier, std.SegmentedList(Expr, 0) },
};

// This was implemented last chapter fixing the Segmentation Fault!
pub const Identifier = union(enum) {
    name: []const u8,
    idx: utils.StringInterner.Idx,
};

It is going to be annoying fixing all the type errors throughout. Nonetheless, the parsing grammar for these new node types is going to change significantly. This is the new, tentative parse_prgm; I am not sure it is entirely correct yet.

pub fn parse_prgm(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!ast.Prgm {
    var funcs: std.SegmentedList(ast.FuncDecl, 0) = .{};

    while (tokens.next()) |next_token| {
        tokens.put_back(next_token);
        const func_decl = try parse_func_decl(arena, tokens);
        try funcs.append(arena, func_decl);
    }

    return .{ .funcs = funcs };
}

parse_func_decl is the same as the old parse_func_def, but with optional parameters and an optional body. Ok, maybe not the same: it is a behemoth. And all this is going to get significantly more complex when adding types other than int.

fn parse_func_decl(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!ast.FuncDecl {
    // same old
    try expect(.type_int, tokens);
    const name = try expect(.identifier, tokens);

    // new stuff !!
    var params: std.SegmentedList(ast.Identifier, 0) = .{};
    { // params
        try expect(.l_paren, tokens);
        const next_token = tokens.next() orelse
            return error.SyntaxError;
        // labelled switch to loop over multiple parameters.
        params: switch (next_token.tag) {
            // old behaviour is this:
            .keyword_void => try expect(.r_paren, tokens),

            // new optional parameters
            .type_int => {
                const ident = try expect(.identifier, tokens);
                try params.append(arena, .{ .name = ident });
                const next_next = tokens.next() orelse
                    return error.SyntaxError;

                // loops back here. this would be significantly more annoying
                // to write without labeled switch
                switch (next_next.tag) {
                    .comma => {
                        try expect(.type_int, tokens);
                        continue :params .type_int;
                    },
                    .r_paren => {},

                    // could just return here but where is the fun then?
                    else => continue :params .invalid,
                }
            },
            else => return error.SyntaxError,
        }
    }
    const block = block: {
        const peeked = tokens.next() orelse
            return error.SyntaxError;

        switch (peeked.tag) {
            .semicolon => break :block null,
            .l_brace => {
                tokens.put_back(peeked);
                break :block try parse_block(arena, tokens);
            },
            else => return error.SyntaxError,
        }
    };

    return .{ .name = name, .params = params, .block = block };
}

There is one big item left: BlockItem, which can now be a function declaration as well as a variable declaration, and it is no longer possible to tell which is which from the type_int token alone.

The most straightforward way to implement it in the current code base is as follows: look for a type_int token, verify there is an identifier token afterwards without recording it, then see what the third token is. If it is a semicolon or an equals, it is a variable; if it is a parenthesis, it is a function; otherwise it is illegal. Then the parser is rewound to the int and the correct function is called. In later chapters, with global variables, it will become apparent that the only place that accepts just one kind of declaration is the for loop statement, but I will cross that bridge when I get to it.

fn parse_decl(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!ast.Decl {
    const int_token = tokens.next() orelse
        return error.NotEnoughJunk;
    if (int_token.tag != .type_int) return error.SyntaxError;
    _ = try expect(.identifier, tokens);

    const new_token = tokens.next() orelse
        return error.NotEnoughJunk;

    tokens.put_back(int_token);
    switch (new_token.tag) {
        .semicolon, .equals => return .{ .V = try parse_var_decl(arena, tokens) },
        .l_paren => return .{ .F = try parse_func_decl(arena, tokens) },
        else => return error.SyntaxError,
    }
}

Parsing function calls is a new challenge. Previously, any identifier was immediately assumed to be a variable. But now, if the identifier is followed by a parenthesis, it could be a function call with an arbitrary number of arguments. The new identifier case exhibits a serious case of rightward drift.

.identifier => {
    const next_token = tokens.next() orelse
        return error.NotEnoughJunk;
    const name = tokens.buffer[current.loc.start..current.loc.end];

    switch (next_token.tag) {
        .l_paren => return .{ .func_call = .{
            .{ .name = name },
            try parse_args(arena, tokens),
        } },
        else => {
            tokens.put_back(next_token);
            return .{ .@"var" = .{ .name = name } };
        },
    }
},

// elsewhere:
fn parse_args(
    arena: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) Error!std.SegmentedList(ast.Expr, 0) {
    // assumes l_paren already consumed
    var ret: std.SegmentedList(ast.Expr, 0) = .{};

    const current = tokens.next() orelse
        return error.NotEnoughJunk;

    args: switch (current.tag) {
        .r_paren => return ret,
        .comma => {
            const expr = try parse_expr(arena, tokens, 0);
            try ret.append(arena, expr);

            const n_token = tokens.next() orelse
                return error.NotEnoughJunk;
            continue :args n_token.tag;
        },
        else => { // only actually relevant for the first argument.
            tokens.put_back(current);
            continue :args .comma;
        },
    }
}

Semantic Analysis

This chapter's semantic analysis is a bit more involved than usual. In addition to variable resolution, the compiler needs to do type checking! Do all declarations of a function (which can repeat) have the same number of parameters?

Starting with the easier stuff, the identifier resolution pass should handle the new function syntax. Start with function calls. Here is the added part in resolve_expr.

.func_call => |*f| {
    if (bp.variable_map.get(f.@"0".name)) |entry| {
        f.@"0" = .{ .idx = entry.name };
        var iter = f.@"1".iterator(0);
        while (iter.next()) |item|
            try resolve_expr(bp, item);
    } else return error.UndeclaredFunction;
},

Resolving function declarations had me move the creation of the main variable map back into resolve_prgm instead, and create a new inner map for every function. Also, I need to update the Entry type of the variable map. This is the new Entry.

const Entry = struct {
    name: utils.StringInterner.Idx,
    scope: enum { local, parent } = .local,
    linkage: enum { none, external } = .none, // <-- new
};

Then this is resolve_func_decl, with the inner parameter resolution, which is really just a cheap copy of resolve_var_decl (itself the old resolve_decl, rebranded).

fn resolve_func_decl(
    bp: Boilerplate,
    func_decl: *ast.FuncDecl,
) Error!void {
    if (bp.variable_map.get(func_decl.name)) |prev|
        if (prev.scope == .local and prev.linkage != .external)
            return error.DuplicateFunctionDecl;

    try bp.variable_map.put(bp.gpa, func_decl.name, .{
        .name = try bp.strings.get_or_put(bp.gpa, func_decl.name),
        .scope = .local,
        .linkage = .external,
    });

    var variable_map = try bp.variable_map.clone(bp.gpa);
    defer variable_map.deinit(bp.gpa);

    var iter = variable_map.valueIterator();
    while (iter.next()) |value|
        value.* = .{
            .name = value.name,
            .scope = .parent,
            .linkage = value.linkage,
        };

    const inner_bp = bp.into_inner(&variable_map);

    var params = func_decl.params.iterator(0);
    while (params.next()) |param| {
        // a cheap imitation of `resolve_var_decl`
        // should probably be in its own function.
        if (inner_bp.variable_map.get(param.name)) |entry|
            if (entry.scope == .local)
                return error.DuplicateVariableDecl;

        const unique_name = try inner_bp.make_temporary(param.name);
        try inner_bp.variable_map.put(inner_bp.gpa, param.name, .{ .name = unique_name });

        param.* = .{ .idx = unique_name };
    }

    if (func_decl.block) |*block|
        try resolve_block(inner_bp, null, block);
}

One more thing: I need to check that block-level function declarations have no body. I am going to do that in resolve_block.

fn resolve_block(
    bp: Boilerplate,
    current_label: ?utils.StringInterner.Idx,
    block: *ast.Block,
) Error!void {
    var iter = block.body.iterator(0);
    while (iter.next()) |item| switch (item.*) {
        .S => |*s| try resolve_stmt(bp, current_label, s),
        .D => |*d| switch (d.*) {
            .F => |*f| if (f.block) |_|
                return error.IllegalFuncDefinition
            else
                try resolve_func_decl(bp, f),
            .V => |*v| try resolve_var_decl(bp, v),
        },
    };
}

Before delving into type checking, the book suggests running the test suite, expecting a number of failures. Thankfully, I can check the individual folders separately using the eye test, without running the test suite in the official manner. From that, I can tell the compiler is passing all the files it should at this stage, even the ones in invalid_types.

Type checking

The Book has the type checking done in its own pass. At first, I tried just stuffing the type checking logic right into the same identifier resolution pass. But the actual problem was that I needed a separate data structure anyway, since function declarations have to match even when they are in different scopes. So I am still doing it in the same run, but with a separate data structure.

The type checking consists mostly of the following: not using the same identifier for both ints and functions, making sure functions always have the same number of parameters in all declarations, and making sure a function is not defined (with a body) twice.

A new data structure is needed to hold this info, global throughout the whole file.1 A hashmap taking the unique identifiers as keys and their types as values. The type is either an integer or a function with a defined arity (number of parameters). I also need to track whether a function has been defined or not.

const TypeMap = std.AutoHashMapUnmanaged(
    u32,
    Type,
);

const Type = union(enum) {
    int,
    func: struct {
        arity: usize,
        defined: bool,
    },
};

Then adding a pointer to it to Boilerplate, and adjusting resolve_prgm as follows.

pub fn resolve_prgm(
    gpa: std.mem.Allocator,
    strings: *utils.StringInterner,
    prgm: *ast.Prgm,
) Error!void {
    var variable_map: VariableMap = .empty;
    defer variable_map.deinit(gpa);

    var type_map: TypeMap = .empty;
    defer type_map.deinit(gpa);

    const bp: Boilerplate = .{
        .gpa = gpa,
        .strings = strings,
        .variable_map = &variable_map,
        .type_map = &type_map,
    };

    var iter = prgm.funcs.iterator(0);
    while (iter.next()) |item|
        try resolve_func_decl(bp, item);
}

What follows next is lots of annoying boilerplate. I must make sure that every time I add something to variable_map, I am adding its new name (if any) to type_map. The logic is hairier than usual, so I should try to straighten it out before stuffing it into a Boilerplate method.

For example, in resolve_func_decl, adding a name to variable_map and type_map is done as follows:

{
    const nname = try bp.strings.get_or_put(bp.gpa, func_decl.name);
    try bp.variable_map.put(bp.gpa, func_decl.name, .{
        .name = nname,
        .scope = .local,
        .linkage = .external,
    });
    const gop = try bp.type_map.getOrPut(bp.gpa, nname.real_idx);
    if (gop.found_existing) {
        if (gop.value_ptr.* != .func or
            gop.value_ptr.func.arity != func_decl.params.count())
        {
            return error.TypeError;
        } else if (gop.value_ptr.func.defined and
            func_decl.block != null)
        {
            return error.DuplicateFunctionDef;
        }
    } else gop.value_ptr.* = .{ .func = .{
        .arity = func_decl.params.count(),
        .defined = func_decl.block != null,
    } };
}

A similar thing needs to happen in resolve_var_decl, and inside the small tidbit in resolve_func_decl that resolves parameters. This is how it looks.

{
    const gop = try bp.type_map.getOrPut(bp.gpa, unique_name.real_idx);
    if (gop.found_existing) {
        if (gop.value_ptr.* != .int) {
            return error.TypeError;
        }
    } else gop.value_ptr.* = .int;
}

There does not seem to be enough shared logic right now to try and DRY these up. Maybe later. What is left is checking these things in function calls and variable declarations. They are both very funny looking.

.@"var" => |name| {
    if (bp.variable_map.get(name.name)) |un| {
        if (bp.type_map.get(un.name.real_idx).? == .int) // unwrap optional
            expr.* = .{ .@"var" = .{ .idx = un.name } }
        else
            return error.TypeError;
    } else return error.UndeclaredVariable;
},
.func_call => |*f| {
    if (bp.variable_map.get(f.@"0".name)) |entry| {
        const t = bp.type_map.get(entry.name.real_idx).?;
        if (t == .func and
            t.func.arity == f.@"1".count())
        {
            f.@"0" = .{ .idx = entry.name };
            var iter = f.@"1".iterator(0);
            while (iter.next()) |item|
                try resolve_expr(bp, item);
        } else return error.TypeError;
    } else return error.UndeclaredFunction;
},

I think this should be it. I checked for functions being defined twice. I checked that functions are used as functions and variables as variables. I checked that all functions with the same name have the same arity. zig build returns no errors. What is left?

The proof of the pudding is in the test suite. Time to run it. (The suite now applies --latest-only by default, so as not to take too long.)

% paella ❱ zig build submit -- --chapter 9 --stage validate
----------------------------------------------------------------------
Ran 61 tests in 157.089s

OK

Phew. Mind you this does not mean the logic is correct. It just means it is failing the ones it should fail and passing the ones it should pass. The full test suite passes as well, which is cool.

I took the time to refactor out the common logic between function parameters and variable declarations. Zig's comptime allows some tricks that Rust would torture me for,2 forcing me to use a trait or some weird amalgamation of const generics. It turned out most of the logic is for identifiers, except for initializers in variable declarations. The new function follows.

fn resolve_var_decl(
    comptime T: enum { param, @"var" }, // anonymous types yay
    bp: Boilerplate,
    item: switch (T) { // can simply be `if` but this is more readable i think
        .@"var" => *ast.VarDecl,
        .param => *ast.Identifier,
    },
) Error!void {
    const identifier = switch (T) { // pulling out the common logic
        .@"var" => &item.name,
        .param => item,
    };
    if (bp.variable_map.get(identifier.name)) |entry| if (entry.scope == .local)
        return error.DuplicateDecl;

    const unique_name = try bp.make_temporary(identifier.name);
    try bp.variable_map.put(bp.gpa, identifier.name, .{ .name = unique_name });

    { // TYPE CHECKING
        const gop = try bp.type_map.getOrPut(bp.gpa, unique_name.real_idx);
        if (gop.found_existing) {
            if (gop.value_ptr.* != .int)
                return error.TypeError;
        } else gop.value_ptr.* = .int;
    }

    identifier.* = .{ .idx = unique_name };

    if (T == .@"var") // logic unique to declarations
        if (item.init) |expr|
            try resolve_expr(bp, expr);
}

// called like this, inside `resolve_func_decl`
while (params.next()) |param|
    try resolve_var_decl(.param, inner_bp, param);

I ran the test suite again (and the eye tests) after this and everything seems to work out.


Internal Representation

After a long drought, it is finally time to update the IR syntax tree and generation. Which means new assembly stuff later. Couldn't the Book have stuck to adding more control flow constructs forever? Maybe function calls can be implemented by just jumping around. Is that a thing?

The syntax tree updates mirror the AST's.

const Identifier = utils.StringInterner.Idx; // type alias

pub const Prgm = struct {
    funcs: std.ArrayListUnmanaged(FuncDef), // <--
};

pub const FuncDef = struct {
    name: Identifier,
    params: std.ArrayListUnmanaged(Identifier), // <--
    instrs: std.ArrayListUnmanaged(Instr),
};

pub const Instr = union(enum) {
    func_call: struct {
        name: Identifier,
        args: std.ArrayListUnmanaged(Value),
        dst: Value,
    },

    // rest of the owl
};

And the rest is pretty much the same, sans a few updates to the pretty printers and deinitializers. Note that there are no function declarations, as they are discarded in this stage after being done with the type checking.

prgm_emit_ir is simply a small update from the one-function version to the many-functions version, skipping over declarations without bodies.

pub fn prgm_emit_ir(
    alloc: std.mem.Allocator,
    strings: *utils.StringInterner,
    prgm: *const ast.Prgm,
) Error!ir.Prgm {
    var funcs: std.ArrayListUnmanaged(ir.FuncDef) = try .initCapacity(
        alloc,
        prgm.funcs.len,
    );

    var iter = prgm.funcs.constIterator(0);
    while (iter.next()) |f| if (f.block) |_| { // skip empty declarations
        const fir = try func_def_emit_ir(alloc, strings, f);
        try funcs.append(alloc, fir);
    };

    return .{ .funcs = funcs };
}

func_def_emit_ir is also almost exactly the same, except that the identifiers are moved over.

fn func_def_emit_ir(
    alloc: std.mem.Allocator,
    strings: *utils.StringInterner,
    func_def: *const ast.FuncDecl,
) Error!ir.FuncDef {
    const name = try strings.get_or_put(alloc, func_def.name);

    var params: std.ArrayListUnmanaged(utils.StringInterner.Idx) = try .initCapacity(
        alloc,
        func_def.params.count(),
    );
    var iter = func_def.params.constIterator(0);

    while (iter.next()) |param|
        try params.append(alloc, param.idx);

    var instrs: std.ArrayListUnmanaged(ir.Instr) = .empty;
    const bp: Boilerplate = .{
        .alloc = alloc,
        .strings = strings,
        .instrs = &instrs,
    };

    try block_emit_ir(bp, &func_def.block.?);
    try instrs.append(alloc, .{ .ret = .{ .constant = 0 } });

    return .{ .name = name, .params = params, .instrs = instrs };
}

Function call expressions are new, even though they follow the same pattern as all other expressions.

.func_call => |f| {
    var args: std.ArrayListUnmanaged(ir.Value) = try .initCapacity(
        bp.alloc,
        f.@"1".count(),
    );

    const dst: ir.Value = .{ .variable = try bp.make_temporary("fn") };

    var iter = f.@"1".constIterator(0);
    while (iter.next()) |e| {
        const v = try expr_emit_ir(bp, e);
        try args.append(bp.alloc, v);
    }

    try bp.append(.{ .func_call = .{
        .name = f.@"0".idx,
        .args = args,
        .dst = dst,
    } });

    return dst;
},

The last thing left to do here is to make sure that function declarations as block items are skipped as well.

fn block_emit_ir(
    bp: Boilerplate,
    block: *const ast.Block,
) Error!void {
    var iter = block.body.constIterator(0);
    while (iter.next()) |item| switch (item.*) {
        .S => |*s| try stmt_emit_ir(bp, s),
        .D => |d| if (d == .V) try var_decl_emit_ir(bp, &d.V), // <-
    };
}

Small Detour Back to Type Checking

I am honestly not quite sure about some decisions up to now. For example, if function declarations are discarded, why are their parameters given unique identities at all? Maybe I should go back and simply check the count (and ideally the types, but there are no types here yet). Since the count is checked earlier, this would change the end of resolve_func_decl to the following, instead of the if statement encasing only the last line.

if (func_decl.block) |*block| {
    var params = func_decl.params.iterator(0);
    while (params.next()) |param|
        try resolve_var_decl(.param, inner_bp, param);

    try resolve_block(inner_bp, null, block);
}

This passes all validations tests, except one.

/* Duplicate parameter names are illegal in function declarations
   as well as definitions */
int foo(int a, int a);

int main(void) {
    return foo(1, 2);
}

int foo(int a, int b) {
    return a + b;
}

Ah well. Amusingly, at least according to the LLM I asked, int foo(int, int) is a legal declaration. Anyway, this was a failed detour.


Assembly Generation

This is the first serious brush in the Book with the System V ABI, the most common binary interface for Unixes. As there are currently no types more complex than int, all the compiler needs to worry about is stuffing arguments into the right registers. (And, for function bodies, retrieving them from the right registers.) The IR has not concerned itself with this, relying on the surrounding passes to make sense out of it.

In my Rust implementation, I used clever iterator combinators to write the logic for this section. I am kind of curious to see how it will turn out in Zig.

The first step, as usual, is to update the assembly syntax tree. A program needs to be redefined as a list of functions, and there are new instructions and new registers. Some necessary refactorings should be timed now as well. This is assembly.Prgm.

pub const Prgm = struct {
    funcs: std.ArrayListUnmanaged(FuncDef),

    pub fn fixup(
        self: *@This(),
        alloc: std.mem.Allocator,
    ) !void {
        for (self.funcs.items) |*func| {
            // functions called here changed from taking *Prgm to *FuncDef
            // ideally this should live under FuncDef, but this is fine for now
            const depth = try pass_pseudo.replace_pseudos(alloc, func);
            try func.instrs.insert(alloc, 0, .{
                .allocate_stack = @abs(depth),
            });
            try pass_fixup.fixup_instrs(alloc, func);
        }
    }
};

The new instructions are simple additions. However, I am using two type aliases here: Depth for an unsigned version of stack depth, and Identifier for utils.StringInterner.Idx.

allocate_stack: Depth, // old
dealloc_stack: Depth,

push: Operand,
call: Identifier,

And aside from the new registers, which I will not bore you with, this is pretty much it.3 Implementing the codegen is the next step. Parameters in function definitions need to be taken from their correct register and stack positions; and function calls need to stuff them there. prgm_to_asm, the main entry point, is straightforward.

pub fn prgm_to_asm(
    alloc: std.mem.Allocator,
    prgm: ir.Prgm,
) !assembly.Prgm {
    var funcs: std.ArrayListUnmanaged(assembly.FuncDef) = try .initCapacity(
        alloc,
        prgm.funcs.items.len,
    );

    for (prgm.funcs.items) |func|
        try funcs.append(
            alloc,
            try func_def_to_asm(alloc, func),
        );

    return .{ .funcs = funcs };
}

For functions, it is slightly different. The first six arguments are passed in these registers, in this specific order: DI, SI, DX, CX, R8, and R9. Remaining arguments are pushed onto the stack in reverse order. So naturally, I will just define a constant for those registers.

const REGISTERS: [6]assembly.Operand.Register =
    .{ .DI, .SI, .DX, .CX, .R8, .R9 };

In func_def_to_asm itself, a zipped for loop is perhaps the most straightforward way of doing this. The first section is simple enough. And since the remaining arguments are pushed onto the stack at the call site in reverse order, they are already in the correct order here. Starting from a stack offset of 16, they are moved one by one. Luckily, the same loop can be used!

for (func_def.params.items, 0..) |param, idx|
    if (idx < REGISTERS.len)
        try instrs.append(alloc, .{ .mov = .init(
            .{ .reg = REGISTERS[idx] },
            .{ .pseudo = param },
        ) })
    else {
        const offset = (idx - REGISTERS.len + 2) * 8; // 16, 24, etc
        try instrs.append(alloc, .{ .mov = .init(
            .{ .stack = @intCast(offset) },
            .{ .pseudo = param },
        ) });
    };

Implementing function calls is a lot more involved. First, the stack needs to be properly aligned to 16 bytes, as per the System V ABI. Copying the passed-in expressions to the input registers is easy, but reversing the rest of them requires some shenanigans. The first step is to preallocate the instruction list with the maximum possible size, because it is unknown at the outset how large the returned slice will be.

// the returned list is of unknown size;
// the maximum possible size is argument count * 2 + 4
var ret: std.ArrayListUnmanaged(assembly.Instr) =
    try .initCapacity(alloc, c.args.items.len * 2 + 4);

Then calculate the number of arguments that go on the stack and the padding needed to stay 16-byte aligned. Zig has a saturating subtraction operator -| that is perfect for this.

const depth = c.args.items.len -| REGISTERS.len;
const padding: assembly.Instr.Depth =
    if (depth % 2 != 0) 8 else 0; // an odd number of stack args breaks 16-byte alignment
if (padding > 0)
    try ret.append(alloc, .{ .allocate_stack = padding }); // 1

The numbering in the comments is me keeping track of instructions that do not depend on the argument count. After that comes putting the values in their respective registers, and on the stack.4

for (c.args.items, 0..) |arg, idx| {
    if (idx >= REGISTERS.len) break;

    try ret.append(alloc, .{ .mov = .init(
        .{ .reg = REGISTERS[idx] },
        value_to_asm(arg),
    ) });
}
for (0..depth) |idx| {
    const v_ir = c.args.items[c.args.items.len - 1 - idx];
    const v_asm = value_to_asm(v_ir);
    switch (v_asm) {
        .imm, .reg => try ret.append(alloc, .{ .push = v_asm }),
        else => try ret.appendSlice(alloc, &.{
            .{ .mov = .init(v_asm, .{ .reg = .AX }) },
            .{ .push = .{ .reg = .AX } },
        }),
    }
}

I think this should work. If depth is zero, 0..depth is empty and the loop will not run. If it is larger, the loop starts from the last item (at index len - 1) and subtracts the current loop index. And finally, dealing with the return value, and returning the slice.

// emit call instruction
try ret.append(alloc, .{ .call = c.name }); // 2

const bytes_to_remove = 8 * depth + padding;
if (bytes_to_remove != 0)
    try ret.append(alloc, .{ .dealloc_stack = bytes_to_remove }); // 3

try ret.append(alloc, .{ .mov = .init(
    .{ .reg = .AX },
    value_to_asm(c.dst),
) }); // 4

return ret.items;

Doing the eye test for codegen, I quickly noticed something stupid. I am moving the register's value into an immediate. Oops.

int add(int x, int y);

int main(void) {
    return add(1, 2);
}
// PROGRAM
// 	FUNCTION main
// 		allocate	4
// 		allocate	8
// 		mov	DI -> imm 1
// 		mov	SI -> imm 2
// 		call	.Ladd
// 		deallocate	8
// 		mov	AX -> stack -4
// 		mov	stack -4 -> AX
// 		ret
// 		mov	imm 0 -> AX
// 		ret

Fixing that by switching two lines, the eye test passes fine. The test suite does not hit any errors (and, as a reminder, does not test beyond success and failure).

After that comes the annoying fixup pass adjustment. replace_pseudos now saves the depth of each function in a field on the function definition, and, when allocating, rounds that up to the nearest multiple of 16.

There is a neat trick to round a number n up to the nearest multiple of 16: basically (n + 15) & ~15. Unfortunately, Zig makes this a pain to write, because the integer type inference is not clever enough to understand ~15.

The standard library has a helpful function for exactly that math: std.mem.alignForward, kindly pointed out to me on the Zig Discord. So I added this bit at the end of replace_pseudos.

func_def.depth = @intCast(pseudo_map.count() * 4);

const aligned = std.mem.alignForward(assembly.Instr.Depth, func_def.depth, 16);
try func_def.instrs.insert(alloc, 0, .{ .allocate_stack = aligned });

So, time to actually emit the assembly.

Code Emission

The changes here are pretty small. The new instructions are one thing, but there are new registers and a new register size for push, which takes 8-byte-wide registers. The logic for printing registers got pretty large, so I put it in its own function.

fn emit_register(
    reg: Operand.Register,
    width: usize,
    writer: anytype,
) !void {
    p: switch (width) {
        1 => switch (reg) {
            .AX => try writer.print("%al", .{}),
            .DX => try writer.print("%dl", .{}),
            .CX => try writer.print("%cl", .{}),
            .DI => try writer.print("%dil", .{}),
            .SI => try writer.print("%sil", .{}),
            .R8 => try writer.print("%r8b", .{}),
            .R9 => try writer.print("%r9b", .{}),
            .R10 => try writer.print("%r10b", .{}),
            .R11 => try writer.print("%r11b", .{}),
        },
        4 => switch (reg) {
            .AX => try writer.print("%eax", .{}),
            .DX => try writer.print("%edx", .{}),
            .CX => try writer.print("%ecx", .{}),
            .DI => try writer.print("%edi", .{}),
            .SI => try writer.print("%esi", .{}),
            .R8 => try writer.print("%r8d", .{}),
            .R9 => try writer.print("%r9d", .{}),
            .R10 => try writer.print("%r10d", .{}),
            .R11 => try writer.print("%r11d", .{}),
        },
        8 => switch (reg) {
            .AX => try writer.print("%rax", .{}),
            .DX => try writer.print("%rdx", .{}),
            .CX => try writer.print("%rcx", .{}),
            .DI => try writer.print("%rdi", .{}),
            .SI => try writer.print("%rsi", .{}),
            .R8 => try writer.print("%r8", .{}),
            .R9 => try writer.print("%r9", .{}),
            .R10 => try writer.print("%r10", .{}),
            .R11 => try writer.print("%r11", .{}),
        },
        else => continue :p 4, // default case
    }
}

And then in the format function just do the following. There is a bit of redundancy regarding the 4 as a default case. You are not paranoid if they are out to get you!

.reg => |r| try emit_register(r, options.width orelse 4, writer),

Running the test suite, I get 16 failures. Oh come on.

Most of them seem to be UnrecognizedFlag, which is the error I have in the compiler driver for flags that are unrecognized. That is what I get for not reading the text and jumping straight to the tables. There is a tidbit there about recognizing a -c flag, which is the gcc flag for creating object files instead of executables.

This is a small change to the argument parser. Here is the new one:


pub fn parse_args() !Args {
    var args = std.process.args();
    _ = args.skip();

    var path: ?[:0]const u8 = null;
    var mode: Mode = .compile;
    var c_flag = false; // new

    while (args.next()) |arg| {
        if (arg[0] == '-') {
            if (arg[1] == 'c') // new
                c_flag = true
            else
                mode = std.meta.stringToEnum(Mode, arg[2..]) orelse
                    return error.UnrecognizedFlag;
        } else if (path == null)
            path = arg
        else
            return error.PathDuplicated;
    }

    return .{
        .path = path orelse return error.PathNotFound,
        .mode = mode,
        .c_flag = c_flag, // new
    };
}

And much below that, when calling the assembler, the -c flag is passed if c_flag is true, and the output extension changes to .o. So it becomes the following (changing the extension itself is done elsewhere in a bit of rather annoying code). This is probably the nicest way of doing it.

{ // assembler
    var child = std.process.Child.init(
        if (args.c_flag)
            &.{ "gcc", "-c", asm_out, "-o", obj }
        else
            &.{ "gcc", asm_out, "-o", exe },
        gpa,
    );

    const term = try child.spawnAndWait();
    if (!std.meta.eql(term, .{ .Exited = 0 }))
        return error.AssemblerFail;

    try std.fs.cwd().deleteFile(asm_out); // cleanup
}

Now running the test suite for real.


Debugging

On the plus side, all the programs are compiling and running. On the negative side, I got a lot of bad return codes. There are logic mistakes. Also 3 errors relating to unicode and timeouts? I do not get it. The test suite's output is generally jumbled up and I find it hard to process. Having said that, the easiest way to attack the problem is to look at the individual files and compile them myself.

UnicodeDecodeError

This is one of the weird errors, Nora Sandler's comments and all. The program should call the system's putchar, which prints a character to standard out.

#ifdef SUPPRESS_WARNINGS
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif
int putchar(int c);

/* Make sure we can correctly manage calling conventions from the callee side
 * (by accessing parameters, including parameters on the stack) and the caller side
 * (by calling a standard library function) in the same function
 */
int foo(int a, int b, int c, int d, int e, int f, int g, int h) {
    putchar(h);
    return a + g;
}

int main(void) {
    return foo(1, 2, 3, 4, 5, 6, 7, 65);
}

This is the output for each stage.

PARSING
	FUNCTION putchar c
	FUNCTION foo a b c d e f g h
		(putchar h);
		RETURN (+ a g)
	FUNCTION main
		RETURN (foo 1 2 3 4 5 6 7 65)
================================
VALIDATION
	FUNCTION putchar c.0
	FUNCTION foo a.1 b.2 c.3 d.4 e.5 f.6 g.7 h.8
		(putchar h.8);
		RETURN (+ a.1 g.7)
	FUNCTION main
		RETURN (foo 1 2 3 4 5 6 7 65)
================================
IR
	FUNCTION foo
		fn.9 <- putchar(h.8)
		add.10 <- a.1 + g.7
		ret add.10
		ret 0
	FUNCTION main
		fn.11 <- foo(1, 2, 3, 4, 5, 6, 7, 65)
		ret fn.11
		ret 0
================================
CODEGEN
	FUNCTION foo
		allocate	48
		mov	DI -> stack -4
		mov	SI -> stack -8
		mov	DX -> stack -12
		mov	CX -> stack -16
		mov	R8 -> stack -20
		mov	R9 -> stack -24
		mov	stack 16 -> R10
		mov	R10 -> stack -28
		mov	stack 24 -> R10
		mov	R10 -> stack -32
		allocate	8
		mov	stack -32 -> DI
		call	.Lputchar
		deallocate	8
		mov	AX -> stack -36
		mov	stack -4 -> R10
		mov	R10 -> stack -40
		mov	stack -28 -> R10
		add	R10 -> stack -40
		mov	stack -40 -> AX
		ret
		mov	imm 0 -> AX
		ret
	FUNCTION main
		allocate	16
		allocate	8
		mov	imm 1 -> DI
		mov	imm 2 -> SI
		mov	imm 3 -> DX
		mov	imm 4 -> CX
		mov	imm 5 -> R8
		mov	imm 6 -> R9
		push	imm 65
		push	imm 7
		call	.Lfoo
		deallocate	24
		mov	AX -> stack -4
		mov	stack -4 -> AX
		ret
		mov	imm 0 -> AX
		ret

Hold on, before going into assembly, I spot a weird thing here. Why is the call to foo preceded by .L? This is probably a formatting error which does not matter. But more importantly, the numbers for the deallocations seem weird. Why is it deallocating 24? Let's review the logic in instr_to_asm in .func_call.

const bytes_to_remove = 8 * depth + padding;
if (bytes_to_remove != 0)
    try ret.append(alloc, .{ .dealloc_stack = bytes_to_remove }); // 3

Eh, this makes sense. Counting the arguments passed on the stack there, there are 2. So why is there padding?

const padding: assembly.Instr.Depth =
    if (depth % 2 == 0) 8 else 0; // <--- here
if (padding != 0)
    try ret.append(alloc, .{ .allocate_stack = padding }); // 1

Oh I am assigning the padding in reverse. If the number of arguments is even, there should be no padding. Silly me.
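
For the record, the rule being implemented: the System V ABI wants %rsp 16-byte aligned at the call, and each push is 8 bytes, so an odd number of stack arguments needs 8 bytes of padding and an even number needs none. A quick sketch of the corrected arithmetic (bytes_to_remove is the same name as in the code above):

```shell
# corrected: pad only when the number of stack arguments is odd
bytes_to_remove() {
    local depth=$1
    local padding=$(( depth % 2 == 0 ? 0 : 8 ))
    echo $(( 8 * depth + padding ))
}

bytes_to_remove 0   # putchar: no stack args -> 0
bytes_to_remove 2   # foo: g and h on the stack -> 16
bytes_to_remove 9   # 9 stack args plus 8 bytes of padding -> 80
```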

This does not actually deal with the error I got for this file, which is this:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 0: invalid start byte

This might be the value printed by putchar failing to decode. However, the only way to know is to run the test suite again.

% paella ❱ zig build submit --release=safe  -- --chapter 9

# many moths later
----------------------------------------------------------------------
Ran 61 tests in 60.822s

FAILED (failures=12)

Hey, at least no weird UTF-8 errors. That's 7 fewer failures than last time (which was 16 failures and 3 errors).

AssertionError

Note: Please do not read this section. It is embarrassing.

This is a more readable error than the last one.

AssertionError: Incorrect behavior in chapter_9/valid/stack_arguments/test_for_memory_leaks
* Bad return code: expected 1 and got 0

I can work with this. This is the C file in question. That is a lot of arguments.

/* Make sure stack arguments are deallocated correctly after returning from a function call; also test passing variables as stack arguments */

#ifdef SUPPRESS_WARNINGS
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif

int lots_of_args(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j, int k, int l, int m, int n, int o) {
    return l + o;
}

int main(void) {
    int ret = 0;
    for (int i = 0; i < 10000000; i = i + 1) {
        ret = lots_of_args(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ret, 13, 14, 15);
    }
    return ret == 150000000;
}

I will skip the whole parsing-to-IR thing and simply give you the final codegen here, with added comments. It is huge.

PROGRAM
	FUNCTION lots_of_args
		allocate	64
		mov	DI -> stack -4		; a
		mov	SI -> stack -8		; b
		mov	DX -> stack -12		; c
		mov	CX -> stack -16		; d
		mov	R8 -> stack -20		; e
		mov	R9 -> stack -24		; f
		mov	stack 16 -> R10
		mov	R10 -> stack -28	; g
		mov	stack 24 -> R10
		mov	R10 -> stack -32	; h
		mov	stack 32 -> R10
		mov	R10 -> stack -36	; i
		mov	stack 40 -> R10
		mov	R10 -> stack -40	; j
		mov	stack 48 -> R10
		mov	R10 -> stack -44	; k
		mov	stack 56 -> R10
		mov	R10 -> stack -48	; l
		mov	stack 64 -> R10
		mov	R10 -> stack -52	; m
		mov	stack 72 -> R10
		mov	R10 -> stack -56	; n
		mov	stack 80 -> R10
		mov	R10 -> stack -60	; o
		mov	stack -48 -> R10
		mov	R10 -> stack -64	; l
		mov	stack -60 -> R10	; o
		add	R10 -> stack -64	; l + o
		mov	stack -64 -> AX
		ret
		mov	imm 0 -> AX
		ret
	FUNCTION main
		allocate	32
		mov	imm 0 -> stack -4	; ret
		mov	imm 0 -> stack -8	; i
		=> .Lst_for.16
		cmp	imm 10000000 -> stack -8
		mov	imm 0 -> stack -12
		setl	stack -12
		cmp	imm 0 -> stack -12
		jmpe	.Lbr_for.16
		allocate	8	; odd number of arguments
		; register arguments
		mov	imm 1 -> DI	; a
		mov	imm 2 -> SI	; b
		mov	imm 3 -> DX	; c
		mov	imm 4 -> CX	; d
		mov	imm 5 -> R8	; e
		mov	imm 6 -> R9	; f
		; stack arguments
		push	imm 15		; o
		push	imm 14		; n
		push	imm 13		; m
		mov	stack -4 -> AX 	; ret into l
		push	AX		; l
		push	imm 11		; k
		push	imm 10		; j
		push	imm 9		; i
		push	imm 8		; h
		push	imm 7		; g
		call	lots_of_args
		deallocate	80	; why is this 80? 9 pushes + padding
		mov	AX -> stack -16	; l + o here
		mov	stack -16 -> R10
		mov	R10 -> stack -4	; assign to ret
		=> .Lcn_for.16
		mov	stack -8 -> R10
		mov	R10 -> stack -20
		add	imm 1 -> stack -20
		mov	stack -20 -> R10
		mov	R10 -> stack -8
		jmp	.Lst_for.16
		=> .Lbr_for.16
		cmp	imm 150000000 -> stack -4
		mov	imm 0 -> stack -24
		sete	stack -24
		mov	stack -24 -> AX
		ret
		mov	imm 0 -> AX
		ret

I am actually at a loss. All this seems to make sense. The generated assembly is more of the same. Let us see another failure:

AssertionError: Incorrect behavior in chapter_9/valid/arguments_in_registers/fibonacci
* Bad return code: expected 8 and got -11

How would you even get -11? (A negative return code means the process was killed by a signal, and signal 11 is SIGSEGV.)

int fib(int n) {
    if (n == 0 || n == 1) {
        return n;
    } else {
        return fib(n - 1) + fib(n - 2);
    }
}

int main(void) {
    int n = 6;
    return fib(n);
}

This is the assembly this time.

	.globl _fib
_fib:
	pushq   %rbp		;; function prologue
	movq    %rsp, %rbp
	subq    $48, %rsp	;; allocate 48
	movl    %edi, -4(%rsp)	;; n
	cmpl    $0, -4(%rsp)	;; does n equal 0?
	movl    $0, -8(%rsp)
	sete      -8(%rsp)
	cmpl    $0, -8(%rsp)	;; is n == 0 true ?
	jne     .Ltrue_or.2
	cmpl    $1, -4(%rsp)	;; does n equal 1?
	movl    $0, -12(%rsp)
	sete      -12(%rsp)
	cmpl    $0, -12(%rsp)	;; is n == 1 true?
	jne     .Ltrue_or.2
	movl    $0, -16(%rsp)
	jmp    .Lend_or.3
.Ltrue_or.2:
	movl    $1, -16(%rsp)
.Lend_or.3:
	cmpl    $0, -16(%rsp)	;; is the previous expression false?
	je      .Lelse.7
	movl    -4(%rsp), %eax	;; if so just return n
	movq    %rbp, %rsp
	popq    %rbp
	ret
	jmp    .Lend.8
.Lelse.7:
	movl    -4(%rsp), %r10d		;; if not ..
	movl    %r10d, -20(%rsp)	;; move n here
	subl    $1, -20(%rsp)		;; n - 1
	movl    -20(%rsp), %edi		;; move n-1 to edi
	call    _fib			;; call fib
	movl    %eax, -24(%rsp)		;; stash result of fib(n-1)
	movl    -4(%rsp), %r10d
	movl    %r10d, -28(%rsp)	;; move n here
	subl    $2, -28(%rsp)		;; n - 2
	movl    -28(%rsp), %edi		;; move n-1 to edi
	call    _fib			;; call fib
	movl    %eax, -32(%rsp)		;; stash result of fib(n-2)
	movl    -24(%rsp), %r10d	;; stashed result of fib(n-1)
	movl    %r10d, -36(%rsp)
	movl    -32(%rsp), %r10d	;; stashed result of fib(n-2)
	addl    %r10d, -36(%rsp)	;; add two results together
	movl    -36(%rsp), %eax		;; move result to proper place
	movq    %rbp, %rsp
	popq    %rbp
	ret				;; the end
.Lend.8:
	movl    $0, %eax	;; this bit here is the useless return 0
	movq    %rbp, %rsp	;; added at the end of every function
	popq    %rbp
	ret
	.globl _main
_main:
	pushq   %rbp
	movq    %rsp, %rbp
	subq    $16, %rsp
	movl    $6, -4(%rsp)
	movl    -4(%rsp), %edi
	call    _fib
	movl    %eax, -8(%rsp)
	movl    -8(%rsp), %eax
	movq    %rbp, %rsp
	popq    %rbp
	ret
	movl    $0, %eax
	movq    %rbp, %rsp
	popq    %rbp
	ret

The logic seems fine. I am not debugging the logic for ifs and fors as those test cases pass fine already. This sample here has no stack arguments yet it is still failing. This definitely means there is a memory corruption somewhere, but where? I am sure the smart ones of you spotted it. I am not that smart.

I figured maybe the mistake is from using the subshell. So I tried doing the test suite the manual way I did it previously. But nope, same result. The good news is that my subshell spell works perfectly.

On a whim, which was most likely wrong, I thought of changing the sign of the stack offsets for function bodies (so they start from -16 downwards rather than 16 upwards). That would not affect the failure in the fibonacci example anyway. Surprisingly, it affected absolutely nothing: same number of failing cases. I am starting to think this is a parking_lot bug.

I moved on to comparing the codegen stage with my previous Rust program, and it is identical. Then I compared the generated assembly, even using diff with colors to put the two side by side.

Then there I saw it. The logic is fine, but there is a typo: something I wrote many chapters ago.

Behold the stack offsets: -8(%rsp). They should be relative to %rbp. With the offsets taken from %rsp, the locals live below the stack pointer, where every call (and any signal) is free to clobber them.

Now all the tests pass. Wonder how any of them did in the first place.


Lessons Learned

  1. Spell checking is important.
  2. diff --color is a useful tool.
  3. Unrelated to paella itself, but I made some improvements to the build script, and learned more about jj and git. I think I will add git tags to the commits that finish each chapter, which would make browsing them easier.
  4. This week I installed llm, and set it up to talk to the free version of GitHub Copilot. Really convenient and useful, and I used it to spell check the articles. I will try some local models next, but I do not expect they will do much with my MacBook Air M2.
  5. Can I have typed integer literals or slightly greedier type inference in Zig please? It is absurd that ~15 just does not work.

One chapter left for Part 1. Unsure what to do next.