Abdul Rahman Sibahi | Writing a C Compiler, Chapter 3, in Zig

Two chapters of Writing a C Compiler done, many more to go, and it is time to implement binary expressions. But before I get into that, let me rant a bit about Zig.

The War Against Tabs

Zig's grammar does not tolerate the tab character anywhere in the file. This is unfortunate, as I like tabs. With source code, it is whatever, as zig fmt takes care of it, but can I please type my tabs in multiline string literals please?

I like to indent the output, whether it is the parser's debugging output or the final assembly output, with tabs. It just makes sense. It is simpler conceptually to type one tab twice for the second level of indentation than to type a space 4 or 8 or 16 times. Put a tab, and let the user choose the preferred tab width.

But Zig's grammar will have none of it. There can be no tab characters in the file (albeit it is tolerated a bit at indentation before zig fmt gets rid of it all.) This puts me in a dilemma. Consider this piece of code for emitting the ret instruction.

.ret => try writer.writeAll(
    "\tmovq    %rbp, %rsp\n" ++
    "\tpopq    %rbp\n" ++
    "\tret"
),

Reasonable, right? But zig fmt takes a machete to it, and they no longer nicely align.

.ret => try writer.writeAll(
    "\tmovq    %rbp, %rsp\n" ++
        "\tpopq    %rbp\n" ++
        "\tret",
),

The next reasonable solution is Zig's rather clever and very nice multiline string literals. It is literally my favorite piece of syntax in Zig. So much more ergonomic and grokkable and understandable than Swift's """ or Rust's .. nothing. So naturally, I'd type it like this:

.ret => try writer.writeAll(
    \\	movq    %rbp, %rsp
    \\	popq    %rbp
    \\	ret
),
//    ^^ tab character here.

But this does not compile. Just does not. No tab characters allowed. One could argue that this decision reduces confusion over a variable width character like tab and hides the true intentions of whatever. I do not care. I want my tabs, goddammit. I would be ok with escaping tabs '\t', but escapes do not work in multiline string literals. I could use std.fmt.comptimePrint, but that adds too much obfuscation and makes using the literals as the fmt arguments in print statements a lot more complicated. It is just annoying all around.

Anyway, with some magic of comptime, and trial and error of Zig's comptime rules, I came up with this ~~macro~~ function:

inline fn indent(
    comptime text: []const u8,
) []const u8 {
    comptime {
        var iter = std.mem.splitScalar(u8, text, '\n');
        var res: []const u8 = "";

        while (iter.next()) |line|
            res = if (line.len > 0 and line[0] != '_')
                res ++ "\t" ++ line ++ "\n"
            else
                res ++ line ++ "\n";

        return res[0 .. res.len - 1];
    }
}

inline fn and the comptime block conspire together to make sure the body of this function always ever runs at comptime and never at run time, with no ceremony at the calling site.

The code itself is fairly straightforward: it separates the input by lines; adds a new line to all lines; and indents (with tabs!) any line that is not empty and does not start with _, for label and function names. This is specific to my use case, but it is fine. It is used like this.

.ret => try writer.writeAll(indent(
    \\movq    %rbp, %rsp
    \\popq    %rbp
    \\ret
)),

Almost invisible, What's better is that, since it returns a comptime known string, it can be used as the fmt argument in print functions, like the following snippet for the function's prelude (which also showcases the _ and the empty line rules).

try writer.print(indent(
    \\.globl _{0s}
    \\_{0s}:
    \\pushq   %rbp
    \\movq    %rsp, %rbp
    \\
), .{self.name});

This prints, for name "main", as intended, as follows:

	.globl _main
_main:
	pushq   %rbp
	movq    %rsp, %rbp

So much pain would have been avoided if I could just type the damn byte. Now, back to business.

Lexer

Updating the lexer requires adding four new tokens, +, *, /, and %. Where is subtraction, you say? We already lex it, dummy. A failure mode for ++, like done in Chapter 2, is not necessary, because 1 ++ 2 would be rejected by the parser later anyway. And there is no ambiguity about prefix operators, as the + unary prefix operator is not implemented either. This makes things so considerably simple that I will not bother writing the update down. On to parsing, which is way more interesting.

Parser

As I am not using a parser combinator library this time, I have to actually get down and implement the legendary precedence claimbing/pratt parsing algorithm. So, sleeves rolled. First to update the Expr type, splatting operations as before. BinOp is just a helper tuple alias.

One note: due to this helper type, the formatting for these is a bit more interesting than usual. Instead of creating an anonymous struct literal as second argument for print (as usual), I can just pass the BinOp payload, and it works.

pub const Expr = union(enum) {
    constant: u64,
    unop_negate: *Expr,
    unop_complement: *Expr,
    binop_add: BinOp,
    binop_sub: BinOp,
    binop_mul: BinOp,
    binop_div: BinOp,
    binop_rem: BinOp,

    pub const BinOp = struct { *Expr, *Expr };

    pub fn format(
        self: @This(),
        comptime _: []const u8,
        _: std.fmt.FormatOptions,
        writer: anytype,
    ) !void {
        switch (self) {
            .constant => |c| try writer.print("{d}", .{c}),
            // RPN
            .unop_negate => |e| try writer.print("{} --", .{e}),
            .unop_complement => |e| try writer.print("{} ~", .{e}),

            .binop_add => |b| try writer.print("{} {} +", b), // <-- NOT `.{b}`
            .binop_sub => |b| try writer.print("{} {} -", b),
            .binop_mul => |b| try writer.print("{} {} *", b),
            .binop_div => |b| try writer.print("{} {} /", b),
            .binop_rem => |b| try writer.print("{} {} %", b),
        }
    }
};

This is an article about me writing Zig, so I will not bore with yet another explanation of precedence climbing. Plenty of those online, and they are described in literally every compiler book.¹ I am trying to write code here! Most of the work will be in the function parse_expr. Here is the current version.

fn parse_expr(
    alloc: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) !*ast.Expr {
    const current = tokens.next() orelse
        return error.SyntaxError;

    switch (current.tag) {
        // literal
        .number_literal => {
            const lit = tokens.buffer[current.loc.start..current.loc.end];
            const res = try std.fmt.parseInt(u64, lit, 10);

            return try create(ast.Expr, alloc, .{ .constant = res });
        },

        // unary operations
        .hyphen => {
            const inner_exp = try parse_expr(alloc, tokens);
            return try create(ast.Expr, alloc, .{ .unop_negate = inner_exp });
        },
        .tilde => {
            const inner_exp = try parse_expr(alloc, tokens);
            return try create(ast.Expr, alloc, .{ .unop_complement = inner_exp });
        },

        // groups
        .l_paren => {
            const inner_exp = try parse_expr(alloc, tokens);
            try expect(.r_paren, tokens);

            return inner_exp;
        },
        else => return error.ExpectExpr,
    }
}

Well the first step is to rename this to parse_factor (as the book calls it, or atom is fine.). This is possible because unary operators have, almost, the highest precedence in C, and can almost be assumed to be part of the literal. In fact the only operators with higher precedence are postfix, so the grammar works out naturally. Updating the recursive calls it in order to, except that whatever is between the parenthesis is a full expression, and not just a factor, so it becomes this.

fn parse_factor(
    alloc: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) !*ast.Expr {
    const current = tokens.next() orelse
        return error.SyntaxError;

    switch (current.tag) {
        .number_literal => {
            // snip --
        },
        .hyphen => {
            const inner_exp = try parse_factor(alloc, tokens);
            return try create(ast.Expr, alloc, .{ .unop_negate = inner_exp });
        },
        .tilde => {
            const inner_exp = try parse_factor(alloc, tokens);
            return try create(ast.Expr, alloc, .{ .unop_complement = inner_exp });
        },
        .l_paren => {
            const inner_exp = try parse_expr(alloc, tokens);
            try expect(.r_paren, tokens);

            return inner_exp;
        },
        else => return error.ExpectFactor,
    }
}

The new parse_expr is where the magic happens. This is a first draft with no precedence for addition and subtraction:

fn parse_expr(
    alloc: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
) !*ast.Expr {
    var lhs = try parse_factor(alloc, tokens);

    var next_token = tokens.next() orelse
        return error.SyntaxError;

    while (next_token.tag == .plus or next_token.tag == .hyphen) {
        const rhs = try parse_factor(alloc, tokens);
        const new_lhs: ast.Expr = switch (next_token.tag) {
            .plus => .{ .binop_add = .{ lhs, rhs } },
            .hyphen => .{ .binop_sub = .{ lhs, rhs } },
            else => unreachable,
        };
        lhs = utils.create(ast.Expr, alloc, new_lhs);
        next_token = tokens.next() orelse
            return error.SyntaxError; // unwrapping is probably correct.
    }

    return lhs;
}

I was about to go on, but I realized there is a bug in this implementation: It always consumes one more token than needed. The book uses a helper function called peek that does not consume the token stream. This is fine as long as it is parsed in the same function, but it is not fine when it is used as a signal to stop parsing, and the next parser is expected to see it. Some sort of cache is required.² Either in the tokenizer itself or external to it.

The tokenizer has two fields: buffer, which is the source code; and index which is a cursor over the source code. The simplest way to do a cache is to add a third field: peeked which would hold an optional token. But there is a funnier way: a put_back method.

See, every token has its span in the source code attached. So all I need to do, in theory, is just reset the index to that span's start. This is more work for the tokenizer as it has to run the tokenizing state machine over it again, but it is funny enough and simple enough and does-not-change-how-the-rest-of-the-code-works enough that I am going to try it. It probably doesn't even need to be a method, so inline.

pub inline fn put_back(self: *Tokenizer, token: Token) void {
    self.index = token.loc.start;
}

Done. I will put that in parse_expr. It probably works.

    // snip --
    tokens.put_back(next_token);
    return lhs;
}

Precedence

The previous function parses only left-associative expressions with a single level of precedence. To parse proper PEMDAS, the precedence climbing algorithm should only parse expressions that are higher than a given, starting precedence, that is passed in as a parameter. Also, a precedence table is needed, which could be just baked into the binary.

This requires a small helper method on the Tag type to determine the precedence of each tag (and whether it is a binary operator to begin with). However, the changes to parse_expr are minimal, aside from adding a new parameter.

fn parse_expr(
    alloc: std.mem.Allocator,
    tokens: *lexer.Tokenizer,
    min_prec: u8, // <-- new parameter
) !*ast.Expr {
    var lhs = try parse_factor(alloc, tokens);

    var next_token = tokens.next() orelse
        return error.SyntaxError;

    while (next_token.tag.binop_precedence()) |prec| { // <- helper function
        if (prec < min_prec) break;

        const rhs = try parse_expr(alloc, tokens, prec + 1);
        const bin_op: ast.Expr.BinOp = .{ lhs, rhs };
        const new_lhs: ast.Expr = switch (next_token.tag) {
            .plus => .{ .binop_add = bin_op},
            .hyphen => .{ .binop_sub = bin_op},
            .asterisk => .{ .binop_mul = bin_op },
            .f_slash => .{ .binop_div = bin_op },
            .percent => .{ .binop_rem = bin_op },
            else => unreachable,
        };
        lhs = utils.create(ast.Expr, alloc, new_lhs);
        next_token = tokens.next() orelse
            return error.SyntaxError;
    }

    tokens.put_back(next_token);
    return lhs;
}

And since everything is left associative, that's it. It is done. Aside from Zig being unable to infer the error union type, which means annotating an actual error type.

This is the C file I am using as a scratch board this chapter:

int main(void) {
	return (3 + 4 * 5 + -4);
}

And this is parsing output. I think I have come up with a better C syntax, don't you think? RPN is great.

PROGRAM
	FUNCTION main
		RETURN 3 4 5 * + 4 -- +

Playing around a bit, let's see how it looks as S-expressions. Much fun can be had.

PROGRAM
	FUNCTION main
		RETURN (+ (+ 3 (* 4 5)) (- 4))

Anyway, all chapter 3 parsing tests pass. So the cute put_back function works!

Having Fun with Closures

Zig does not have closures or anonymous functions. If you want to build something of the sort, well, don't. The language will fight you. But if you insist, it is a bit of fun.

Consider these couple of lines from parse_expr:

while (next_token.tag.binop_precedence()) |prec| {
    if (prec < min_prec) break;
    // snip --

If you have ever done functional programming, or just used Rust, you can immediately tell this is an Option::filter! (Maybe the Haskellers have another name for it.) So I figured I will try doing something similar here in Zig. I did not commit this code, as it replaces these two lines by two function definitions and one struct definition. It is a fun attempt and it worked and passed the tests as well. Might even compile to the same code, too.

So we need to define a filter over optionals. It basically checks if the payload satisfies a predicate. The most straightforward way to do it that's generic enough is this:

fn filter(T: type, opt: ?T, closure: anytype) ?T {
    const inner = opt orelse return null;
    if (closure.filter(inner)) return opt else return null;
}

closure here is an anytype that has a filter method (which takes a T and returns bool). I do not like anytype but I am not committed enough to this bit to find a better abstraction. Now inside our function, or file, we define a Closure struct (or any other name, w/e) like this:

const Closure = struct {
    min_prec: u8,
    fn filter(self: @This(), item: u8) bool {
        return item >= self.min_prec;
    }
};

Then replace our while loop invocation with this beauty:

//                      here is the part that makes it a closure vv
while (filter(u8, next_token.tag.binop_precedence(), Closure{ .min_prec = min_prec })) |prec| {
    // snip --

And it works! This is almost how closures work in languages that have them. Now back to boring imperative code.

Intermediate Representation

The IR pass here is fairly straightforward, and is a direct 1 to 1 translation from the AST. The code is short and very similar to the previousc chapter's implementation of unary instructions that it is entirely uninteresting to write about again. Seriously, much of the same. The only difference is that the container type, Binary, has two src fields. Here it is next to chapter 2's champion Unary:

pub const Unary = struct {
    src: Value,
    dst: Value,

    pub fn init(src: Value, dst: Value) @This() {
        return .{ .src = src, .dst = dst };
    }
};

pub const Binary = struct {
    src1: Value,
    src2: Value,
    dst: Value,

    pub fn init(src1: Value, src2: Value, dst: Value) @This() {
        return .{ .src1 = src1, .src2 = src2, .dst = dst };
    }
};

Another thing I did to reduce the boilerplate in ir_gen.zig, which was to create a Boilerplate type, that holds all the pointers used in every function. This is the full type definition as of now.

const Boilerplate = struct {
    alloc: std.mem.Allocator,
    strings: *utils.StringInterner,
    instrs: *std.ArrayListUnmanaged(ir.Instr),

    // helper for everywhere
    fn append(
        self: @This(),
        instr: ir.Instr,
    ) Error!void {
        try self.instrs.append(self.alloc, instr);
    }

    // helpers for `expr_emit_ir`
    fn unary(
        self: @This(),
        e: *ast.Expr,
        comptime prefix: []const u8,
    ) Error!ir.Instr.Unary {
        const src = try expr_emit_ir(self, e);
        const dst_name = try make_temporary(self.alloc, self.strings, prefix);
        const dst: ir.Value = .{ .variable = dst_name };

        return .init(src, dst);
    }

    fn binary(
        self: @This(),
        b: ast.Expr.BinOp,
        comptime prefix: []const u8,
    ) Error!ir.Instr.Binary {
        const src1 = try expr_emit_ir(self, b.@"0");
        const src2 = try expr_emit_ir(self, b.@"1");
        const dst_name = try make_temporary(self.alloc, self.strings, prefix);
        const dst: ir.Value = .{ .variable = dst_name };

        return .init(src1, src2, dst);
    }
};

And here is an example of it being used in the tiny stmt_emit_ir, which saved on so much parameter space!

fn stmt_emit_ir(
    bp: Boilerplate,
    stmt: *ast.Stmt,
) Error!void {
    switch (stmt.*) {
        .@"return" => |e| try bp.append(.{
            .ret = try expr_emit_ir(bp, e),
        }),
    }
}

As a bonus, here is the IR generated for the sample C program above.

PROGRAM
	FUNCTION main
		mul.0 <- 4 + 5
		add.1 <- 3 + mul.0
		neg.2 <- - 4
		add.3 <- add.1 + neg.2
		ret add.3

Assembly

Addition, subtraction, and multiplication are straightforward implementations, as they have their corresponding assembly instructions. Integer division .. is not. The first task, as usual, is to update the assembly syntax tree to include all the new instructions, and two new registers.

pub const Instr = union(enum) {
    mov: Mov,
    ret: void,

    // unary operations
    neg: Operand,
    not: Operand,

    // binary operations // NEW
    add: Mov,
    sub: Mov,
    mul: Mov,
    idiv: Operand,

    cdq: void, // sign extension, don't ask.

    allocate_stack: u64,

    const Mov = struct { // just a pair of operands
        src: Operand,
        dst: Operand,

        pub fn init(src: Operand, dst: Operand) @This() {
            return .{ .src = src, .dst = dst };
        }
    };
};


pub const Operand = union(enum) {
    imm: u64,
    reg: Register,
    pseudo: [:0]const u8,
    stack: i64,

    pub const Register = enum { AX, DX, R10, R11 };
};

As it turned out, I did the basic work correctly last chapter. So the rest of this code turned out to be mechanical as well. These are the new cases in instr_to_asm. Since these commands have similar structure, they are implemented the same, with another internal switch for where they differ. It makes separating them easier later on.

.binop_add, .binop_sub, .binop_mul => |b| {
    const src1 = value_to_asm(b.src1);
    const src2 = value_to_asm(b.src2);
    const dst = value_to_asm(b.dst);

    return try alloc.dupe(assembly.Instr, &.{
        .{ .mov = .init(src1, dst) },
        switch (instr) {
            .binop_add => .{ .add = .init(src2, dst) },
            .binop_sub => .{ .sub = .init(src2, dst) },
            .binop_mul => .{ .mul = .init(src2, dst) },
            else => unreachable,
        },
    });
},
.binop_div, .binop_rem => |b| {
    const src1 = value_to_asm(b.src1);
    const src2 = value_to_asm(b.src2);
    const dst = value_to_asm(b.dst);

    const dst_reg: assembly.Operand.Register =
        if (instr == .binop_div) .AX else .DX;

    return try alloc.dupe(assembly.Instr, &.{
        .{ .mov = .init(src1, .{ .reg = .AX }) },
        .cdq,
        .{ .idiv = src2 },
        .{ .mov = .init(.{ .reg = dst_reg }, dst) },
    });
},

Replacing pseudo registers turned out to be straightforward too. I just had to add the .add, .sub. .mul to the .mov case (because they similarly take two operands); and .idiv to the unaries. This is the full function now.

pub fn replace_pseudos(
    alloc: std.mem.Allocator,
    prgm: *assembly.Prgm,
) !assembly.Instr.Depth {
    var pseudo_map: PseudoMap = .init(alloc);
    defer pseudo_map.deinit();

    for (prgm.func_def.instrs.items) |*instr| {
        switch (instr.*) {
            .mov, .add, .sub,.mul => |*m| m.* = .init(
                try pseudo_to_stack(m.src, &pseudo_map),
                try pseudo_to_stack(m.dst, &pseudo_map),
            ),
            .neg, .not , .idiv=> |*v| v.* = try pseudo_to_stack(v.*, &pseudo_map),
            .ret, .cdq, .allocate_stack => {},

        }
    }

    return @intCast(pseudo_map.count() * 4);
}

Fixing illegal instructions is, perhaps, the most annoying part of the book. idiv cannot take a constand and the binaries cannot between two stac locations or whatever. It is a terribly unintresting part of the process and a hell to debug. Since I modeled it as a state machine, it might prove to be a bit more interesting than my Rust implementation. Here are the new fixups:

idiv cannot tak a constant. Change it to mov to r10d then idiv r10d
add and sub cannot be between two stack places. Change to mov then `add.
imul cannot have its destination be a stack place. Change to mov to r11d, then imul, then mov back from r11d.

Do you see how mind numbing this is? Thankfully the state machine idea made copying and pasting previous code slightly trivial. I can perhaps even refactor it later, or not.

    const State = enum {
        start,
        mov_stack_stack,
        legal,

        // new states
        add_stack_stack,
        sub_stack_stack,
        mul_to_stack,
        idiv_const,
    };

    for (prgm.func_def.instrs.items) |instr| {
        state: switch (State.start) {
            .start => switch (instr) {
                .mov => |m| if (m.src == .stack and m.dst == .stack)
                    continue :state .mov_stack_stack
                else
                    continue :state .legal,

                // new code starts here
                .add => |m| if (m.src == .stack and m.dst == .stack)
                    continue :state .add_stack_stack
                else
                    continue :state .legal,
                .sub => |m| if (m.src == .stack and m.dst == .stack)
                    continue :state .sub_stack_stack
                else
                    continue :state .legal,
                .mul => |m| if (m.dst == .stack)
                    continue :state .mul_to_stack
                else
                    continue :state .legal,
                .idiv => |o| if (o == .imm)
                    continue :state .idiv_const
                else
                    continue :state .legal,

                else => continue :state .legal,
            },

            .mov_stack_stack => try out.appendSlice(alloc, &.{
                .{ .mov = .init(instr.mov.src, .{ .reg = .R10 }) },
                .{ .mov = .init(.{ .reg = .R10 }, instr.mov.dst) },
            }),

            // new code starts Here
            .add_stack_stack => try out.appendSlice(alloc, &.{
                .{ .mov = .init(instr.add.src, .{ .reg = .R10 }) },
                .{ .add = .init(.{ .reg = .R10 }, instr.add.dst) },
            }),
            .sub_stack_stack => try out.appendSlice(alloc, &.{
                .{ .mov = .init(instr.sub.src, .{ .reg = .R10 }) },
                .{ .sub = .init(.{ .reg = .R10 }, instr.sub.dst) },
            }),
            .mul_to_stack => try out.appendSlice(alloc, &.{
                .{ .mov = .init(instr.mul.dst, .{ .reg = .R11 }) },
                .{ .mul = .init(instr.mul.src, .{ .reg = .R11 }) },
                .{ .mov = .init(.{ .reg = .R11 }, instr.mul.dst) },
            }),
            .idiv_const => try out.appendSlice(alloc, &.{
                .{ .mov = .init(instr.idiv, .{ .reg = .R11 }) },
                .{ .idiv = .{ .reg = .R11 } },
            }),

            .legal => try out.append(alloc, instr),
        }
    }
}

Lots of copying and pasting. Maybe when I refactor this I can make the State enum carry payloads. The quicker I am out of assembly instructions fixup, however, the better. Assemblers should just be smarter.

All codegen tests. Pass, which here simply means it is failing when it should and is not hitting any panics in the passed codepaths. Always a plus. Filling up code emission would prove the pudding.

Thanks to the already implemented boilerplate last chapter, this is also a nice and straightforward task. And, silly typos aside, everything works smoothly and all tests pass. Here is the sample C program in "codegen" form, and final assembly form.³

PROGRAM
	FUNCTION main
		allocate	16
		mov	imm 4 -> stack -4
		mov	stack -4 -> R11
		mul	imm 5 -> R11
		mov	R11 -> stack -4
		mov	imm 3 -> stack -8
		mov	stack -4 -> R10
		add	R10 -> stack -8
		mov	imm 4 -> stack -12
		neg	stack -12
		mov	stack -8 -> R10
		mov	R10 -> stack -16
		mov	stack -12 -> R10
		add	R10 -> stack -16
		mov	stack -16 -> AX
		ret

	.globl _main
_main:
	pushq   %rbp
	movq    %rsp, %rbp
	subq    $16, %rsp
	movl    $4, -4(%rsp)
	movl    -4(%rsp), %r11d
	imull   $5, %r11d
	movl    %r11d, -4(%rsp)
	movl    $3, -8(%rsp)
	movl    -4(%rsp), %r10d
	addl    %r10d, -8(%rsp)
	movl    $4, -12(%rsp)
	negl    -12(%rsp)
	movl    -8(%rsp), %r10d
	movl    %r10d, -16(%rsp)
	movl    -12(%rsp), %r10d
	addl    %r10d, -16(%rsp)
	movl    -16(%rsp), %eax
	movq    %rbp, %rsp
	popq    %rbp
	ret

Extra Credit

The Book has an extra credit section for doing bitwise operations. I am skipping extra credit this run. They are useful and interesting but also more churn in the following chapters. This is the last time you will hear me talk of it.

Lessons Learned

Implemented a ~~macro~~ comptime function.
Closures at home.
Positive reinforcement of previous design decisions.