Two chapters of Writing a C Compiler done, many more to go, and it is time to implement binary expressions. But before I get into that, let me rant a bit about Zig.
The War Against Tabs
Zig's grammar does not tolerate the tab character anywhere in the file. This is unfortunate, as I like tabs. With source code, it is whatever, as zig fmt
takes care of it, but can I please type my tabs in multiline string literals please?
I like to indent the output, whether it is the parser's debugging output or the final assembly output, with tabs. It just makes sense. It is simpler conceptually to type one tab twice for the second level of indentation than to type a space 4 or 8 or 16 times. Put a tab, and let the user choose the preferred tab width.
But Zig's grammar will have none of it. There can be no tab characters in the file (albeit it is tolerated a bit at indentation before zig fmt
gets rid of it all.) This puts me in a dilemma. Consider this piece of code for emitting the ret
instruction.
.ret => try writer.writeAll(
"\tmovq %rbp, %rsp\n" ++
"\tpopq %rbp\n" ++
"\tret"
),
Reasonable, right? But zig fmt
takes a machete to it, and they no longer nicely align.
.ret => try writer.writeAll(
"\tmovq %rbp, %rsp\n" ++
"\tpopq %rbp\n" ++
"\tret",
),
The next reasonable solution is Zig's rather clever and very nice multiline string literals. It is literally my favorite piece of syntax in Zig. So much more ergonomic and grokkable and understandable than Swift's """
or Rust's .. nothing. So naturally, I'd type it like this:
.ret => try writer.writeAll(
\\ movq %rbp, %rsp
\\ popq %rbp
\\ ret
),
// ^^ tab character here.
But this does not compile. Just does not. No tab characters allowed. One could argue that this decision reduces confusion over a variable width character like tab and hides the true intentions of whatever. I do not care. I want my tabs, goddammit. I would be ok with escaping tabs '\t'
, but escapes do not work in multiline string literals. I could use std.fmt.comptimePrint
, but that adds too much obfuscation and makes using the literals as the fmt
arguments in print
statements a lot more complicated. It is just annoying all around.
Anyway, with some magic of comptime
, and trial and error of Zig's comptime rules, I came up with this macro function:
inline fn indent(
comptime text: []const u8,
) []const u8 {
comptime {
var iter = std.mem.splitScalar(u8, text, '\n');
var res: []const u8 = "";
while (iter.next()) |line|
res = if (line.len > 0 and line[0] != '_')
res ++ "\t" ++ line ++ "\n"
else
res ++ line ++ "\n";
return res[0 .. res.len - 1];
}
}
inline fn
and the comptime
block conspire together to make sure the body of this function always ever runs at comptime
and never at run time, with no ceremony at the calling site.
The code itself is fairly straightforward: it separates the input by lines; adds a new line to all lines; and indents (with tabs!) any line that is not empty and does not start with _
, for label and function names. This is specific to my use case, but it is fine. It is used like this.
.ret => try writer.writeAll(indent(
\\movq %rbp, %rsp
\\popq %rbp
\\ret
)),
Almost invisible, What's better is that, since it returns a comptime
known string, it can be used as the fmt
argument in print functions, like the following snippet for the function's prelude (which also showcases the _
and the empty line rules).
try writer.print(indent(
\\.globl _{0s}
\\_{0s}:
\\pushq %rbp
\\movq %rsp, %rbp
\\
), .{self.name});
This prints, for name "main"
, as intended, as follows:
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
So much pain would have been avoided if I could just type the damn byte. Now, back to business.
Lexer
Updating the lexer requires adding four new tokens, +
, *
, /
, and %
. Where is subtraction, you say? We already lex it, dummy. A failure mode for ++
, like done in Chapter 2, is not necessary, because 1 ++ 2
would be rejected by the parser later anyway. And there is no ambiguity about prefix operators, as the +
unary prefix operator is not implemented either. This makes things so considerably simple that I will not bother writing the update down. On to parsing, which is way more interesting.
Parser
As I am not using a parser combinator library this time, I have to actually get down and implement the legendary precedence claimbing/pratt parsing algorithm. So, sleeves rolled. First to update the Expr
type, splatting operations as before. BinOp
is just a helper tuple alias.
One note: due to this helper type, the formatting for these is a bit more interesting than usual. Instead of creating an anonymous struct literal as second argument for print
(as usual), I can just pass the BinOp
payload, and it works.
pub const Expr = union(enum) {
constant: u64,
unop_negate: *Expr,
unop_complement: *Expr,
binop_add: BinOp,
binop_sub: BinOp,
binop_mul: BinOp,
binop_div: BinOp,
binop_rem: BinOp,
pub const BinOp = struct { *Expr, *Expr };
pub fn format(
self: @This(),
comptime _: []const u8,
_: std.fmt.FormatOptions,
writer: anytype,
) !void {
switch (self) {
.constant => |c| try writer.print("{d}", .{c}),
// RPN
.unop_negate => |e| try writer.print("{} --", .{e}),
.unop_complement => |e| try writer.print("{} ~", .{e}),
.binop_add => |b| try writer.print("{} {} +", b), // <-- NOT `.{b}`
.binop_sub => |b| try writer.print("{} {} -", b),
.binop_mul => |b| try writer.print("{} {} *", b),
.binop_div => |b| try writer.print("{} {} /", b),
.binop_rem => |b| try writer.print("{} {} %", b),
}
}
};
This is an article about me writing Zig, so I will not bore with yet another explanation of precedence climbing. Plenty of those online, and they are described in literally every compiler book.1 I am trying to write code here! Most of the work will be in the function parse_expr
. Here is the current version.
fn parse_expr(
alloc: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) !*ast.Expr {
const current = tokens.next() orelse
return error.SyntaxError;
switch (current.tag) {
// literal
.number_literal => {
const lit = tokens.buffer[current.loc.start..current.loc.end];
const res = try std.fmt.parseInt(u64, lit, 10);
return try create(ast.Expr, alloc, .{ .constant = res });
},
// unary operations
.hyphen => {
const inner_exp = try parse_expr(alloc, tokens);
return try create(ast.Expr, alloc, .{ .unop_negate = inner_exp });
},
.tilde => {
const inner_exp = try parse_expr(alloc, tokens);
return try create(ast.Expr, alloc, .{ .unop_complement = inner_exp });
},
// groups
.l_paren => {
const inner_exp = try parse_expr(alloc, tokens);
try expect(.r_paren, tokens);
return inner_exp;
},
else => return error.ExpectExpr,
}
}
Well the first step is to rename this to parse_factor
(as the book calls it, or atom
is fine.). This is possible because unary operators have, almost, the highest precedence in C, and can almost be assumed to be part of the literal. In fact the only operators with higher precedence are postfix, so the grammar works out naturally. Updating the recursive calls it in order to, except that whatever is between the parenthesis is a full expression, and not just a factor, so it becomes this.
fn parse_factor(
alloc: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) !*ast.Expr {
const current = tokens.next() orelse
return error.SyntaxError;
switch (current.tag) {
.number_literal => {
// snip --
},
.hyphen => {
const inner_exp = try parse_factor(alloc, tokens);
return try create(ast.Expr, alloc, .{ .unop_negate = inner_exp });
},
.tilde => {
const inner_exp = try parse_factor(alloc, tokens);
return try create(ast.Expr, alloc, .{ .unop_complement = inner_exp });
},
.l_paren => {
const inner_exp = try parse_expr(alloc, tokens);
try expect(.r_paren, tokens);
return inner_exp;
},
else => return error.ExpectFactor,
}
}
The new parse_expr
is where the magic happens. This is a first draft with no precedence for addition and subtraction:
fn parse_expr(
alloc: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) !*ast.Expr {
var lhs = try parse_factor(alloc, tokens);
var next_token = tokens.next() orelse
return error.SyntaxError;
while (next_token.tag == .plus or next_token.tag == .hyphen) {
const rhs = try parse_factor(alloc, tokens);
const new_lhs: ast.Expr = switch (next_token.tag) {
.plus => .{ .binop_add = .{ lhs, rhs } },
.hyphen => .{ .binop_sub = .{ lhs, rhs } },
else => unreachable,
};
lhs = utils.create(ast.Expr, alloc, new_lhs);
next_token = tokens.next() orelse
return error.SyntaxError; // unwrapping is probably correct.
}
return lhs;
}
I was about to go on, but I realized there is a bug in this implementation: It always consumes one more token than needed. The book uses a helper function called peek
that does not consume the token stream. This is fine as long as it is parsed in the same function, but it is not fine when it is used as a signal to stop parsing, and the next parser is expected to see it. Some sort of cache is required.2 Either in the tokenizer itself or external to it.
The tokenizer has two fields: buffer
, which is the source code; and index
which is a cursor over the source code. The simplest way to do a cache is to add a third field: peeked
which would hold an optional token. But there is a funnier way: a put_back
method.
See, every token has its span in the source code attached. So all I need to do, in theory, is just reset the index to that span's start. This is more work for the tokenizer as it has to run the tokenizing state machine over it again, but it is funny enough and simple enough and does-not-change-how-the-rest-of-the-code-works enough that I am going to try it. It probably doesn't even need to be a method, so inline
.
pub inline fn put_back(self: *Tokenizer, token: Token) void {
self.index = token.loc.start;
}
Done. I will put that in parse_expr
. It probably works.
// snip --
tokens.put_back(next_token);
return lhs;
}
Precedence
The previous function parses only left-associative expressions with a single level of precedence. To parse proper PEMDAS, the precedence climbing algorithm should only parse expressions that are higher than a given, starting precedence, that is passed in as a parameter. Also, a precedence table is needed, which could be just baked into the binary.
This requires a small helper method on the Tag
type to determine the precedence of each tag (and whether it is a binary operator to begin with). However, the changes to parse_expr
are minimal, aside from adding a new parameter.
fn parse_expr(
alloc: std.mem.Allocator,
tokens: *lexer.Tokenizer,
min_prec: u8, // <-- new parameter
) !*ast.Expr {
var lhs = try parse_factor(alloc, tokens);
var next_token = tokens.next() orelse
return error.SyntaxError;
while (next_token.tag.binop_precedence()) |prec| { // <- helper function
if (prec < min_prec) break;
const rhs = try parse_expr(alloc, tokens, prec + 1);
const bin_op: ast.Expr.BinOp = .{ lhs, rhs };
const new_lhs: ast.Expr = switch (next_token.tag) {
.plus => .{ .binop_add = bin_op},
.hyphen => .{ .binop_sub = bin_op},
.asterisk => .{ .binop_mul = bin_op },
.f_slash => .{ .binop_div = bin_op },
.percent => .{ .binop_rem = bin_op },
else => unreachable,
};
lhs = utils.create(ast.Expr, alloc, new_lhs);
next_token = tokens.next() orelse
return error.SyntaxError;
}
tokens.put_back(next_token);
return lhs;
}
And since everything is left associative, that's it. It is done. Aside from Zig being unable to infer the error union type, which means annotating an actual error type.
This is the C file I am using as a scratch board this chapter:
int main(void) {
return (3 + 4 * 5 + -4);
}
And this is parsing output. I think I have come up with a better C syntax, don't you think? RPN is great.
PROGRAM
FUNCTION main
RETURN 3 4 5 * + 4 -- +
Playing around a bit, let's see how it looks as S-expressions. Much fun can be had.
PROGRAM
FUNCTION main
RETURN (+ (+ 3 (* 4 5)) (- 4))
Anyway, all chapter 3 parsing tests pass. So the cute put_back
function works!
Having Fun with Closures
Zig does not have closures or anonymous functions. If you want to build something of the sort, well, don't. The language will fight you. But if you insist, it is a bit of fun.
Consider these couple of lines from parse_expr
:
while (next_token.tag.binop_precedence()) |prec| {
if (prec < min_prec) break;
// snip --
If you have ever done functional programming, or just used Rust, you can immediately tell this is an Option::filter
! (Maybe the Haskellers have another name for it.) So I figured I will try doing something similar here in Zig. I did not commit this code, as it replaces these two lines by two function definitions and one struct definition. It is a fun attempt and it worked and passed the tests as well. Might even compile to the same code, too.
So we need to define a filter over optionals. It basically checks if the payload satisfies a predicate. The most straightforward way to do it that's generic enough is this:
fn filter(T: type, opt: ?T, closure: anytype) ?T {
const inner = opt orelse return null;
if (closure.filter(inner)) return opt else return null;
}
closure
here is an anytype
that has a filter
method (which takes a T
and returns bool
). I do not like anytype
but I am not committed enough to this bit to find a better abstraction. Now inside our function, or file, we define a Closure
struct (or any other name, w/e) like this:
const Closure = struct {
min_prec: u8,
fn filter(self: @This(), item: u8) bool {
return item >= self.min_prec;
}
};
Then replace our while
loop invocation with this beauty:
// here is the part that makes it a closure vv
while (filter(u8, next_token.tag.binop_precedence(), Closure{ .min_prec = min_prec })) |prec| {
// snip --
And it works! This is almost how closures work in languages that have them. Now back to boring imperative code.
Intermediate Representation
The IR pass here is fairly straightforward, and is a direct 1 to 1 translation from the AST. The code is short and very similar to the previousc chapter's implementation of unary instructions that it is entirely uninteresting to write about again. Seriously, much of the same. The only difference is that the container type, Binary
, has two src
fields. Here it is next to chapter 2's champion Unary
:
pub const Unary = struct {
src: Value,
dst: Value,
pub fn init(src: Value, dst: Value) @This() {
return .{ .src = src, .dst = dst };
}
};
pub const Binary = struct {
src1: Value,
src2: Value,
dst: Value,
pub fn init(src1: Value, src2: Value, dst: Value) @This() {
return .{ .src1 = src1, .src2 = src2, .dst = dst };
}
};
Another thing I did to reduce the boilerplate in ir_gen.zig
, which was to create a Boilerplate
type, that holds all the pointers used in every function. This is the full type definition as of now.
const Boilerplate = struct {
alloc: std.mem.Allocator,
strings: *utils.StringInterner,
instrs: *std.ArrayListUnmanaged(ir.Instr),
// helper for everywhere
fn append(
self: @This(),
instr: ir.Instr,
) Error!void {
try self.instrs.append(self.alloc, instr);
}
// helpers for `expr_emit_ir`
fn unary(
self: @This(),
e: *ast.Expr,
comptime prefix: []const u8,
) Error!ir.Instr.Unary {
const src = try expr_emit_ir(self, e);
const dst_name = try make_temporary(self.alloc, self.strings, prefix);
const dst: ir.Value = .{ .variable = dst_name };
return .init(src, dst);
}
fn binary(
self: @This(),
b: ast.Expr.BinOp,
comptime prefix: []const u8,
) Error!ir.Instr.Binary {
const src1 = try expr_emit_ir(self, b.@"0");
const src2 = try expr_emit_ir(self, b.@"1");
const dst_name = try make_temporary(self.alloc, self.strings, prefix);
const dst: ir.Value = .{ .variable = dst_name };
return .init(src1, src2, dst);
}
};
And here is an example of it being used in the tiny stmt_emit_ir
, which saved on so much parameter space!
fn stmt_emit_ir(
bp: Boilerplate,
stmt: *ast.Stmt,
) Error!void {
switch (stmt.*) {
.@"return" => |e| try bp.append(.{
.ret = try expr_emit_ir(bp, e),
}),
}
}
As a bonus, here is the IR generated for the sample C program above.
PROGRAM
FUNCTION main
mul.0 <- 4 + 5
add.1 <- 3 + mul.0
neg.2 <- - 4
add.3 <- add.1 + neg.2
ret add.3
Assembly
Addition, subtraction, and multiplication are straightforward implementations, as they have their corresponding assembly instructions. Integer division .. is not. The first task, as usual, is to update the assembly syntax tree to include all the new instructions, and two new registers.
pub const Instr = union(enum) {
mov: Mov,
ret: void,
// unary operations
neg: Operand,
not: Operand,
// binary operations // NEW
add: Mov,
sub: Mov,
mul: Mov,
idiv: Operand,
cdq: void, // sign extension, don't ask.
allocate_stack: u64,
const Mov = struct { // just a pair of operands
src: Operand,
dst: Operand,
pub fn init(src: Operand, dst: Operand) @This() {
return .{ .src = src, .dst = dst };
}
};
};
pub const Operand = union(enum) {
imm: u64,
reg: Register,
pseudo: [:0]const u8,
stack: i64,
pub const Register = enum { AX, DX, R10, R11 };
};
As it turned out, I did the basic work correctly last chapter. So the rest of this code turned out to be mechanical as well. These are the new cases in instr_to_asm
. Since these commands have similar structure, they are implemented the same, with another internal switch
for where they differ. It makes separating them easier later on.
.binop_add, .binop_sub, .binop_mul => |b| {
const src1 = value_to_asm(b.src1);
const src2 = value_to_asm(b.src2);
const dst = value_to_asm(b.dst);
return try alloc.dupe(assembly.Instr, &.{
.{ .mov = .init(src1, dst) },
switch (instr) {
.binop_add => .{ .add = .init(src2, dst) },
.binop_sub => .{ .sub = .init(src2, dst) },
.binop_mul => .{ .mul = .init(src2, dst) },
else => unreachable,
},
});
},
.binop_div, .binop_rem => |b| {
const src1 = value_to_asm(b.src1);
const src2 = value_to_asm(b.src2);
const dst = value_to_asm(b.dst);
const dst_reg: assembly.Operand.Register =
if (instr == .binop_div) .AX else .DX;
return try alloc.dupe(assembly.Instr, &.{
.{ .mov = .init(src1, .{ .reg = .AX }) },
.cdq,
.{ .idiv = src2 },
.{ .mov = .init(.{ .reg = dst_reg }, dst) },
});
},
Replacing pseudo registers turned out to be straightforward too. I just had to add the .add, .sub. .mul
to the .mov
case (because they similarly take two operands); and .idiv
to the unaries. This is the full function now.
pub fn replace_pseudos(
alloc: std.mem.Allocator,
prgm: *assembly.Prgm,
) !assembly.Instr.Depth {
var pseudo_map: PseudoMap = .init(alloc);
defer pseudo_map.deinit();
for (prgm.func_def.instrs.items) |*instr| {
switch (instr.*) {
.mov, .add, .sub,.mul => |*m| m.* = .init(
try pseudo_to_stack(m.src, &pseudo_map),
try pseudo_to_stack(m.dst, &pseudo_map),
),
.neg, .not , .idiv=> |*v| v.* = try pseudo_to_stack(v.*, &pseudo_map),
.ret, .cdq, .allocate_stack => {},
}
}
return @intCast(pseudo_map.count() * 4);
}
Fixing illegal instructions is, perhaps, the most annoying part of the book. idiv
cannot take a constand and the binaries cannot between two stac locations or whatever. It is a terribly unintresting part of the process and a hell to debug. Since I modeled it as a state machine, it might prove to be a bit more interesting than my Rust implementation. Here are the new fixups:
idiv
cannot tak a constant. Change it tomov
tor10d
thenidiv r10d
add
andsub
cannot be between two stack places. Change tomov
then `add.imul
cannot have its destination be a stack place. Change tomov
tor11d
, thenimul
, thenmov
back fromr11d
.
Do you see how mind numbing this is? Thankfully the state machine idea made copying and pasting previous code slightly trivial. I can perhaps even refactor it later, or not.
const State = enum {
start,
mov_stack_stack,
legal,
// new states
add_stack_stack,
sub_stack_stack,
mul_to_stack,
idiv_const,
};
for (prgm.func_def.instrs.items) |instr| {
state: switch (State.start) {
.start => switch (instr) {
.mov => |m| if (m.src == .stack and m.dst == .stack)
continue :state .mov_stack_stack
else
continue :state .legal,
// new code starts here
.add => |m| if (m.src == .stack and m.dst == .stack)
continue :state .add_stack_stack
else
continue :state .legal,
.sub => |m| if (m.src == .stack and m.dst == .stack)
continue :state .sub_stack_stack
else
continue :state .legal,
.mul => |m| if (m.dst == .stack)
continue :state .mul_to_stack
else
continue :state .legal,
.idiv => |o| if (o == .imm)
continue :state .idiv_const
else
continue :state .legal,
else => continue :state .legal,
},
.mov_stack_stack => try out.appendSlice(alloc, &.{
.{ .mov = .init(instr.mov.src, .{ .reg = .R10 }) },
.{ .mov = .init(.{ .reg = .R10 }, instr.mov.dst) },
}),
// new code starts Here
.add_stack_stack => try out.appendSlice(alloc, &.{
.{ .mov = .init(instr.add.src, .{ .reg = .R10 }) },
.{ .add = .init(.{ .reg = .R10 }, instr.add.dst) },
}),
.sub_stack_stack => try out.appendSlice(alloc, &.{
.{ .mov = .init(instr.sub.src, .{ .reg = .R10 }) },
.{ .sub = .init(.{ .reg = .R10 }, instr.sub.dst) },
}),
.mul_to_stack => try out.appendSlice(alloc, &.{
.{ .mov = .init(instr.mul.dst, .{ .reg = .R11 }) },
.{ .mul = .init(instr.mul.src, .{ .reg = .R11 }) },
.{ .mov = .init(.{ .reg = .R11 }, instr.mul.dst) },
}),
.idiv_const => try out.appendSlice(alloc, &.{
.{ .mov = .init(instr.idiv, .{ .reg = .R11 }) },
.{ .idiv = .{ .reg = .R11 } },
}),
.legal => try out.append(alloc, instr),
}
}
}
Lots of copying and pasting. Maybe when I refactor this I can make the State
enum carry payloads. The quicker I am out of assembly instructions fixup, however, the better. Assemblers should just be smarter.
All codegen
tests. Pass, which here simply means it is failing when it should and is not hitting any panics in the passed codepaths. Always a plus. Filling up code emission would prove the pudding.
Thanks to the already implemented boilerplate last chapter, this is also a nice and straightforward task. And, silly typos aside, everything works smoothly and all tests pass. Here is the sample C program in "codegen
" form, and final assembly form.3
PROGRAM
FUNCTION main
allocate 16
mov imm 4 -> stack -4
mov stack -4 -> R11
mul imm 5 -> R11
mov R11 -> stack -4
mov imm 3 -> stack -8
mov stack -4 -> R10
add R10 -> stack -8
mov imm 4 -> stack -12
neg stack -12
mov stack -8 -> R10
mov R10 -> stack -16
mov stack -12 -> R10
add R10 -> stack -16
mov stack -16 -> AX
ret
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $4, -4(%rsp)
movl -4(%rsp), %r11d
imull $5, %r11d
movl %r11d, -4(%rsp)
movl $3, -8(%rsp)
movl -4(%rsp), %r10d
addl %r10d, -8(%rsp)
movl $4, -12(%rsp)
negl -12(%rsp)
movl -8(%rsp), %r10d
movl %r10d, -16(%rsp)
movl -12(%rsp), %r10d
addl %r10d, -16(%rsp)
movl -16(%rsp), %eax
movq %rbp, %rsp
popq %rbp
ret
Extra Credit
The Book has an extra credit section for doing bitwise operations. I am skipping extra credit this run. They are useful and interesting but also more churn in the following chapters. This is the last time you will hear me talk of it.
Lessons Learned
- Implemented a
macrocomptime
function. - Closures at home.
- Positive reinforcement of previous design decisions.