After eight grueling (not really) chapters of Writing a C Compiler, it is time to implement more assembly instructions. Functions! Linkage! Commas!
Lexer, AST, and Parser
The lexer just gained a comma token. I thought about adding the comma operator, but that did not seem worth the trouble.
The AST has two new additions: function call expressions and function declarations (which are rebranded and improved function definitions)! The structures themselves change too: a program is now a list of function declarations, instead of just one function definition. How about that?
These are the new AST nodes. I am not sure whether they compile yet; I will find out when I am done with the parser. SegmentedList was discussed a couple of chapters ago as a more arena-friendly collection type.
pub const Prgm = struct {
funcs: std.SegmentedList(FuncDecl, 0),
};
pub const Block = struct {
body: std.SegmentedList(BlockItem, 0),
};
pub const BlockItem = union(enum) {
D: Decl,
S: Stmt,
};
pub const Decl = union(enum) {
F: FuncDecl,
V: VarDecl,
};
pub const FuncDecl = struct {
name: []const u8,
params: std.SegmentedList(Identifier, 0),
block: ?Block,
};
pub const VarDecl = struct {
name: Identifier,
init: ?*Expr,
};
pub const Expr = union(enum) {
// snip --
func_call: struct { Identifier, std.SegmentedList(Expr, 0) },
};
// This was implemented last chapter fixing the Segmentation Fault!
pub const Identifier = union(enum) {
name: []const u8,
idx: utils.StringInterner.Idx,
};
Fixing all the type errors throughout is going to be annoying. Nonetheless, the parsing grammar for these new node types is going to change significantly. This is the new, tentative parse_prgm. I am not sure it is entirely correct yet.
pub fn parse_prgm(
arena: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) Error!ast.Prgm {
var funcs: std.SegmentedList(ast.FuncDecl, 0) = .{};
while (tokens.next()) |next_token| {
tokens.put_back(next_token);
const func_decl = try parse_func_decl(arena, tokens);
try funcs.append(arena, func_decl);
}
return .{ .funcs = funcs };
}
parse_func_decl is the same as the old parse_func_def, but with optional parameters and an optional body. Ok, maybe not the same: it is a behemoth. And all of this is going to get significantly more complex once there are types other than int.
fn parse_func_decl(
arena: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) Error!ast.FuncDecl {
// same old
try expect(.type_int, tokens);
const name = try expect(.identifier, tokens);
// new stuff !!
var params: std.SegmentedList(ast.Identifier, 0) = .{};
{ // params
try expect(.l_paren, tokens);
const next_token = tokens.next() orelse
return error.SyntaxError;
// labelled switch to loop over multiple parameters.
params: switch (next_token.tag) {
// old behaviour is this:
.keyword_void => try expect(.r_paren, tokens),
// new optional parameters
.type_int => {
const ident = try expect(.identifier, tokens);
try params.append(arena, .{ .name = ident });
const next_next = tokens.next() orelse
return error.SyntaxError;
// loops back here. this would be significantly more annoying
// to write without labeled switch
switch (next_next.tag) {
.comma => {
try expect(.type_int, tokens);
continue :params .type_int;
},
.r_paren => {},
// could just return here but where is the fun then?
else => continue :params .invalid,
}
},
else => return error.SyntaxError,
}
}
const block = block: {
const peeked = tokens.next() orelse
return error.SyntaxError;
switch (peeked.tag) {
.semicolon => break :block null,
.l_brace => {
tokens.put_back(peeked);
break :block try parse_block(arena, tokens);
},
else => return error.SyntaxError,
}
};
return .{ .name = name, .params = params, .block = block };
}
There is one big item left, which is BlockItem: it can hold a function declaration as well as a variable declaration, and it is not possible to know which is which from the type_int token alone, as was done before.
The most straightforward way to implement this in the current code base is as follows: look for a type_int token, verify there is an identifier token afterwards (without recording it), then see what the third token is. If it is a semicolon or an equals, it is a variable; if it is a parenthesis, it is a function; otherwise it is illegal. Then the parser is rewound to the int and the correct function is called. In later chapters, with global variables, it will become apparent that the only place which accepts just one kind of declaration is the for loop statement, but I will cross that bridge when I get to it.
fn parse_decl(
arena: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) Error!ast.Decl {
const int_token = tokens.next() orelse
return error.NotEnoughJunk;
if (int_token.tag != .type_int) return error.SyntaxError;
_ = try expect(.identifier, tokens);
const new_token = tokens.next() orelse
return error.NotEnoughJunk;
tokens.put_back(int_token);
switch (new_token.tag) {
.semicolon, .equals => return .{ .V = try parse_var_decl(arena, tokens) },
.l_paren => return .{ .F = try parse_func_decl(arena, tokens) },
else => return error.SyntaxError,
}
}
Parsing function calls is a new challenge. Previously, any identifier was immediately assumed to be a variable. But now, if the identifier is followed by a parenthesis, it could be a function call with an arbitrary number of arguments. The new identifier case exhibits a serious case of rightward drift.
.identifier => {
const next_token = tokens.next() orelse
return error.NotEnoughJunk;
const name = tokens.buffer[current.loc.start..current.loc.end];
switch (next_token.tag) {
.l_paren => return .{ .func_call = .{
.{ .name = name },
try parse_args(arena, tokens),
} },
else => {
tokens.put_back(next_token);
return .{ .@"var" = .{ .name = name } };
},
}
},
// elsewhere:
fn parse_args(
arena: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) Error!std.SegmentedList(ast.Expr, 0) {
// assumes l_paren already consumed
var ret: std.SegmentedList(ast.Expr, 0) = .{};
const current = tokens.next() orelse
return error.NotEnoughJunk;
args: switch (current.tag) {
.r_paren => return ret,
.comma => {
const expr = try parse_expr(arena, tokens, 0);
try ret.append(arena, expr);
const n_token = tokens.next() orelse
return error.NotEnoughJunk;
continue :args n_token.tag;
},
else => { // only actually relevant for the first argument.
tokens.put_back(current);
continue :args .comma;
},
}
}
Semantic Analysis
Semantic analysis is a bit more involved than usual this chapter. In addition to variable resolution, the compiler needs to do type checking! Do all declarations of a function (which can repeat) have the same number of parameters?
Starting with the easier stuff, the identifier resolution pass should handle the new function syntax, beginning with function calls. Here is the added part in resolve_expr.
.func_call => |*f| {
if (bp.variable_map.get(f.@"0".name)) |entry| {
f.@"0" = .{ .idx = entry.name };
var iter = f.@"1".iterator(0);
while (iter.next()) |item|
try resolve_expr(bp, item);
} else return error.UndeclaredFunction;
},
Resolving function declarations had me move the creation of the main variable map back into resolve_prgm, and create a new inner map for every function. I also need to update the Entry type of the variable map. This is the new Entry.
const Entry = struct {
name: utils.StringInterner.Idx,
scope: enum { local, parent } = .local,
linkage: enum { none, external } = .none, // <-- new
};
Then this is resolve_func_decl, along with the inner parameter resolution, which is really just a cheap copy of resolve_var_decl (itself rebranded resolve_decl).
fn resolve_func_decl(
bp: Boilerplate,
func_decl: *ast.FuncDecl,
) Error!void {
if (bp.variable_map.get(func_decl.name)) |prev|
if (prev.scope == .local and prev.linkage != .external)
return error.DuplicateFunctionDecl;
try bp.variable_map.put(bp.gpa, func_decl.name, .{
.name = try bp.strings.get_or_put(bp.gpa, func_decl.name),
.scope = .local,
.linkage = .external,
});
var variable_map = try bp.variable_map.clone(bp.gpa);
defer variable_map.deinit(bp.gpa);
var values = variable_map.valueIterator();
while (values.next()) |value|
value.* = .{
.name = value.name,
.scope = .parent,
.linkage = value.linkage,
};
const inner_bp = bp.into_inner(&variable_map);
var iter = func_decl.params.iterator(0);
while (iter.next()) |param| {
// a cheap imitation of `resolve_var_decl`
// should probably be in its own function.
if (inner_bp.variable_map.get(param.name)) |entry|
if (entry.scope == .local)
return error.DuplicateVariableDecl;
const unique_name = try inner_bp.make_temporary(param.name);
try inner_bp.variable_map.put(inner_bp.gpa, param.name, .{ .name = unique_name });
param.* = .{ .idx = unique_name };
}
if (func_decl.block) |*block|
try resolve_block(inner_bp, null, block);
}
One more thing: there is a need to check that block-level function declarations have no body. I am going to do that in resolve_block.
fn resolve_block(
bp: Boilerplate,
current_label: ?utils.StringInterner.Idx,
block: *ast.Block,
) Error!void {
var iter = block.body.iterator(0);
while (iter.next()) |item| switch (item.*) {
.S => |*s| try resolve_stmt(bp, current_label, s),
.D => |*d| switch (d.*) {
.F => |*f| if (f.block) |_|
return error.IllegalFuncDefinition
else
try resolve_func_decl(bp, f),
.V => |*v| try resolve_var_decl(bp, v),
},
};
}
Before delving into type checking, the book suggests running the test suite, expecting a number of failures. Thankfully, I can check the individual folders separately using the eye test. Without running the test suite in the official manner, I can tell that the compiler is passing all the right files at this stage, even the ones in invalid_types.
Type checking
The Book has the type checking done in its own pass. At first, I tried stuffing the type checking logic right into the identifier resolution pass, but the actual problem was that I needed a separate data structure anyway, since function declarations have to match even when they are in different scopes. So I am still doing it in the same run, just with a separate data structure.
The type checking consists mostly of the following: not using the same identifier for both ints and functions, making sure functions always have the same number of parameters in all declarations, and making sure a function is not defined (with a body) twice.
A new data structure is needed to hold this info, global throughout the whole file.1 It is a hashmap taking the unique identifiers as keys and their types as values. A type is either an integer or a function with a defined arity (number of parameters). I also need to track whether a function has been defined or not.
const TypeMap = std.AutoHashMapUnmanaged(
u32,
Type,
);
const Type = union(enum) {
int,
func: struct {
arity: usize,
defined: bool,
},
};
Then I add a pointer to it to Boilerplate, and adjust resolve_prgm as follows.
pub fn resolve_prgm(
gpa: std.mem.Allocator,
strings: *utils.StringInterner,
prgm: *ast.Prgm,
) Error!void {
var variable_map: VariableMap = .empty;
defer variable_map.deinit(gpa);
var type_map: TypeMap = .empty;
defer type_map.deinit(gpa);
const bp: Boilerplate = .{
.gpa = gpa,
.strings = strings,
.variable_map = &variable_map,
.type_map = &type_map,
};
var iter = prgm.funcs.iterator(0);
while (iter.next()) |item|
try resolve_func_decl(bp, item);
}
What follows next is lots of annoying boilerplate. I must make sure that every time I add something to variable_map, I also add its new name (if any) to type_map. The logic is hairier than usual, but I should try to straighten it out before stuffing it into a Boilerplate method.
For example, in resolve_func_decl, adding a name to variable_map and type_map is done as follows:
{
const nname = try bp.strings.get_or_put(bp.gpa, func_decl.name);
try bp.variable_map.put(bp.gpa, func_decl.name, .{
.name = nname,
.scope = .local,
.linkage = .external,
});
const gop = try bp.type_map.getOrPut(bp.gpa, nname.real_idx);
if (gop.found_existing) {
if (gop.value_ptr.* != .func or
gop.value_ptr.func.arity != func_decl.params.count())
{
return error.TypeError;
} else if (gop.value_ptr.func.defined and
func_decl.block != null)
{
return error.DuplicateFunctionDef;
}
} else gop.value_ptr.* = .{ .func = .{
.arity = func_decl.params.count(),
.defined = func_decl.block != null,
} };
}
A similar thing needs doing in resolve_var_decl, and inside the small tidbit in resolve_func_decl that resolves parameters. This is what it looks like.
{
const gop = try bp.type_map.getOrPut(bp.gpa, unique_name.real_idx);
if (gop.found_existing) {
if (gop.value_ptr.* != .int) {
return error.TypeError;
}
} else gop.value_ptr.* = .int;
}
There does not seem to be enough shared logic right now to try and DRY these up. Maybe later. What is left is checking these things in function calls and variable declarations. They are both very funny looking.
.@"var" => |name| {
if (bp.variable_map.get(name.name)) |un| {
if (bp.type_map.get(un.name.real_idx).? == .int) // unwrap optional
expr.* = .{ .@"var" = .{ .idx = un.name } }
else
return error.TypeError;
} else return error.UndeclaredVariable;
},
.func_call => |*f| {
if (bp.variable_map.get(f.@"0".name)) |entry| {
const t = bp.type_map.get(entry.name.real_idx).?;
if (t == .func and
t.func.arity == f.@"1".count())
{
f.@"0" = .{ .idx = entry.name };
var iter = f.@"1".iterator(0);
while (iter.next()) |item|
try resolve_expr(bp, item);
} else return error.TypeError;
} else return error.UndeclaredFunction;
},
I think this should be it. I checked for functions being defined twice. I checked that functions are used as functions and variables as variables. I checked that all functions with the same name have the same arity. zig build returns no errors. What is left?
The proof of the pudding is in the test suite. Time to run it. (It now applies --latest-only by default, so as not to take too long.)
% paella ❱ zig build submit -- --chapter 9 --stage validate
----------------------------------------------------------------------
Ran 61 tests in 157.089s
OK
Phew. Mind you, this does not mean the logic is correct. It just means the compiler is failing the ones it should fail and passing the ones it should pass. The full test suite passes as well, which is cool.
I took the time to refactor out the common logic between function parameters and variable declarations. Zig's comptime allows some tricks that Rust would torture me for,2 forcing me to use a trait or some weird amalgamation of const generics. It turned out most of the logic is about identifiers, except for the initializers in variable declarations. The new function follows.
fn resolve_var_decl(
comptime T: enum { param, @"var" }, // anonymous types yay
bp: Boilerplate,
item: switch (T) { // can simply be `if` but this is more readable i think
.@"var" => *ast.VarDecl,
.param => *ast.Identifier,
},
) Error!void {
const identifier = switch (T) { // pulling out the common logic
.@"var" => &item.name,
.param => item,
};
if (bp.variable_map.get(identifier.name)) |entry| if (entry.scope == .local)
return error.DuplicateDecl;
const unique_name = try bp.make_temporary(identifier.name);
try bp.variable_map.put(bp.gpa, identifier.name, .{ .name = unique_name });
{ // TYPE CHECKING
const gop = try bp.type_map.getOrPut(bp.gpa, unique_name.real_idx);
if (gop.found_existing) {
if (gop.value_ptr.* != .int)
return error.TypeError;
} else gop.value_ptr.* = .int;
}
identifier.* = .{ .idx = unique_name };
if (T == .@"var") // logic unique to declarations
if (item.init) |expr|
try resolve_expr(bp, expr);
}
// called like this, inside `resolve_func_decl`
while (params.next()) |param|
try resolve_var_decl(.param, inner_bp, param);
I ran the test suite again (and the eye tests) after this and everything seems to work out.
Internal Representation
After a long drought, it is finally time to update the IR syntax tree and generation, which means new assembly stuff later. Couldn't the Book have stuck to adding more control flow constructs forever? Maybe function calls can be implemented by just jumping around. Is that a thing?
The syntax tree updates mirror the AST's.
const Identifier = utils.StringInterner.Idx; // type alias
pub const Prgm = struct {
funcs: std.ArrayListUnmanaged(FuncDef), // <--
};
pub const FuncDef = struct {
name: Identifier,
params: std.ArrayListUnmanaged(Identifier), // <--
instrs: std.ArrayListUnmanaged(Instr),
};
pub const Instr = union(enum) {
func_call: struct {
name: Identifier,
args: std.ArrayListUnmanaged(Value),
dst: Value,
},
// rest of the owl
};
And the rest is pretty much the same, sans a few updates to the pretty printers and deinitializers. Note that there are no function declarations: they are discarded at this stage, once type checking is done with them.
prgm_emit_ir is simply a small update from the one-function version to the many-functions version, skipping over declarations without bodies.
pub fn prgm_emit_ir(
alloc: std.mem.Allocator,
strings: *utils.StringInterner,
prgm: *const ast.Prgm,
) Error!ir.Prgm {
var funcs: std.ArrayListUnmanaged(ir.FuncDef) = try .initCapacity(
alloc,
prgm.funcs.len,
);
var iter = prgm.funcs.constIterator(0);
while (iter.next()) |f| if (f.block) |_| { // skip empty declarations
const fir = try func_def_emit_ir(alloc, strings, f);
try funcs.append(alloc, fir);
};
return .{ .funcs = funcs };
}
func_def_emit_ir is also almost exactly the same, except that the identifiers are moved over.
fn func_def_emit_ir(
alloc: std.mem.Allocator,
strings: *utils.StringInterner,
func_def: *const ast.FuncDecl,
) Error!ir.FuncDef {
const name = try strings.get_or_put(alloc, func_def.name);
var params: std.ArrayListUnmanaged(utils.StringInterner.Idx) = try .initCapacity(
alloc,
func_def.params.count(),
);
var iter = func_def.params.constIterator(0);
while (iter.next()) |param|
try params.append(alloc, param.idx);
var instrs: std.ArrayListUnmanaged(ir.Instr) = .empty;
const bp: Boilerplate = .{
.alloc = alloc,
.strings = strings,
.instrs = &instrs,
};
try block_emit_ir(bp, &func_def.block.?);
try instrs.append(alloc, .{ .ret = .{ .constant = 0 } });
return .{ .name = name, .params = params, .instrs = instrs };
}
Function call expressions are new, even though they follow the same pattern as all other expressions.
.func_call => |f| {
var args: std.ArrayListUnmanaged(ir.Value) = try .initCapacity(
bp.alloc,
f.@"1".count(),
);
const dst = .{ .variable = try bp.make_temporary("fn") };
var iter = f.@"1".constIterator(0);
while (iter.next()) |e| {
const v = try expr_emit_ir(bp, e);
try args.append(bp.alloc, v);
}
try bp.append(.{ .func_call = .{
.name = f.@"0".idx,
.args = args,
.dst = dst,
} });
return dst;
},
The last thing left to do here is to make sure that function declarations as block items are skipped as well.
fn block_emit_ir(
bp: Boilerplate,
block: *const ast.Block,
) Error!void {
var iter = block.body.constIterator(0);
while (iter.next()) |item| switch (item.*) {
.S => |*s| try stmt_emit_ir(bp, s),
.D => |d| if (d == .V) try var_decl_emit_ir(bp, &d.V), // <-
};
}
Small Detour Back to Type Checking
I am honestly not quite sure about some decisions up to now. For example, if function declarations are discarded, why are their parameters given unique identities at all? Maybe I should go back and simply check the count (and ideally the types, but there are no types here yet). This would change the end of resolve_func_decl to the following, instead of the if statement encasing only the last line, since the count is checked earlier.
if (func_decl.block) |*block| {
var params = func_decl.params.iterator(0);
while (params.next()) |param|
try resolve_var_decl(.param, inner_bp, param);
try resolve_block(inner_bp, null, block);
}
This passes all validation tests, except one.
/* Duplicate parameter names are illegal in function declarations
as well as definitions */
int foo(int a, int a);
int main(void) {
return foo(1, 2);
}
int foo(int a, int b) {
return a + b;
}
Ah well. Amusingly, at least according to the LLM I asked, int foo(int, int) is a legal declaration. Anyway, this was a failed detour.
Assembly Generation
This is the first serious brush in the Book with the System V ABI, the most common binary interface for Unixes. As there are currently no types more complex than int, all the compiler needs to worry about is stuffing parameters into the right registers (and, for function bodies, retrieving them from the right registers). The IR has not concerned itself with any of this, relying on the surrounding passes to make sense of it.
In my Rust implementation, I used clever iterator combinators to write the logic for this section. I am kind of curious to see how it will turn out in Zig.
The first step, as usual, is to update the assembly syntax tree. A program needs to be redefined as a list of functions, and there are new instructions and new registers. Some necessary refactorings should be timed now as well. This is assembly.Prgm
.
pub const Prgm = struct {
funcs: std.ArrayListUnmanaged(FuncDef),
pub fn fixup(
self: *@This(),
alloc: std.mem.Allocator,
) !void {
for (self.funcs.items) |*func| {
// functions called here changed from taking *Prgm to *FuncDef
// ideally this should live under FuncDef, but this is fine for now
const depth = try pass_pseudo.replace_pseudos(alloc, func);
try func.instrs.insert(alloc, 0, .{
.allocate_stack = @abs(depth),
});
try pass_fixup.fixup_instrs(alloc, func);
}
}
The new instructions are simple additions. However, I am using two type aliases here: Depth
for an unsigned version of stack depth, and Identifier
for utils.StringInterner.Idx
.
allocate_stack: Depth, // old
dealloc_stack: Depth,
push: Operand,
call: Identifier,
And aside from the new registers, which I will not bore you with, this is pretty much it.3 Implementing the codegen is the next step. Parameters in function definitions need to be taken from their correct register and stack positions, and function calls need to stuff them there. prgm_to_asm, the main entry point, is straightforward.
pub fn prgm_to_asm(
alloc: std.mem.Allocator,
prgm: ir.Prgm,
) !assembly.Prgm {
var funcs: std.ArrayListUnmanaged(assembly.FuncDef) = try .initCapacity(
alloc,
prgm.funcs.items.len,
);
for (prgm.funcs.items) |func|
try funcs.append(
alloc,
try func_def_to_asm(alloc, func),
);
return .{ .funcs = funcs };
}
For functions, it is slightly different. The first six parameters are passed in these registers, in this specific order: DI, SI, DX, CX, R8, and R9. Remaining arguments are pushed onto the stack in reverse order. So naturally, I just assign a constant for those registers.
const REGISTERS: [6]assembly.Operand.Register =
.{ .DI, .SI, .DX, .CX, .R8, .R9 };
In func_def_to_asm itself, a zipped for loop is perhaps the most straightforward way of doing this. The first section is simple enough. And since the remaining parameters are pushed onto the stack in reverse order at the call site, they are already in the correct order here. Starting from a stack offset of 16, they are moved one by one. Luckily, the same loop can be used!
for (func_def.params.items, 0..) |param, idx|
if (idx < REGISTERS.len)
try instrs.append(alloc, .{ .mov = .init(
.{ .reg = REGISTERS[idx] },
.{ .pseudo = param },
) })
else {
const offset = (idx - REGISTERS.len + 2) * 8; // 16, 24, etc
try instrs.append(alloc, .{ .mov = .init(
.{ .stack = @intCast(offset) },
.{ .pseudo = param },
) });
};
Implementing function calls is a lot more involved. First, the stack needs to be aligned to 16 bytes, as per the System V ABI. Copying the passed-in expressions into the input registers is easy, but reversing the rest of them requires some shenanigans. The first step is to preallocate a list with the maximum possible size, because it is unknown at the outset how many instructions will be returned.
// the return is of unknown size.
// maximum possible size is parameter count * 2 + 4
var ret: std.ArrayListUnmanaged(assembly.Instr) =
try .initCapacity(alloc, c.args.items.len * 2 + 4);
Then I calculate the number of arguments passed on the stack and the padding needed to stay aligned to 16. Zig has a saturating subtraction operator -| that is perfect for this.
const depth = c.args.items.len -| REGISTERS.len;
const padding: assembly.Instr.Depth =
if (depth % 2 == 0) 8 else 0;
if (padding > 0)
try ret.append(alloc, .{ .allocate_stack = padding }); // 1
The numbering in the comments is me keeping track of the instructions that do not depend on the argument count. After that comes putting the values into their respective registers, and onto the stack.4
for (c.args.items, 0..) |arg, idx| {
if (idx >= REGISTERS.len) break;
try ret.append(alloc, .{ .mov = .init(
.{ .reg = REGISTERS[idx] },
value_to_asm(arg),
) });
}
for (0..depth) |idx| {
const v_ir = c.args.items[c.args.items.len - 1 - idx];
const v_asm = value_to_asm(v_ir);
switch (v_asm) {
.imm, .reg => try ret.append(alloc, .{ .push = v_asm }),
else => try ret.appendSlice(alloc, &.{
.{ .mov = .init(v_asm, .{ .reg = .AX }) },
.{ .push = .{ .reg = .AX } },
}),
}
}
I think this should work. If there are no stack arguments, 0..depth will be empty and the loop will not run. If depth is larger, the loop starts from the last item (at index len - 1), then subtracts the current index. And finally, dealing with the return value, and returning the slice.
// emit call instruction
try ret.append(alloc, .{ .call = c.name }); // 2
const bytes_to_remove = 8 * depth + padding;
if (bytes_to_remove != 0)
try ret.append(alloc, .{ .dealloc_stack = bytes_to_remove }); // 3
try ret.append(alloc, .{ .mov = .init(
.{ .reg = .AX },
value_to_asm(c.dst),
) }); // 4
return ret.items;
While doing the eye test for codegen, I quickly noticed something stupid: I am moving the register's value into an immediate. Oops.
int add(int x, int y);
int main(void) {
return add(1, 2);
}
// PROGRAM
// FUNCTION main
// allocate 4
// allocate 8
// mov DI -> imm 1
// mov SI -> imm 2
// call .Ladd
// deallocate 8
// mov AX -> stack -4
// mov stack -4 -> AX
// ret
// mov imm 0 -> AX
// ret
Fixing that by switching two lines, the eye test passes fine. The test suite does not hit any errors (and, as a reminder, it does not test beyond success and failure).
After that come the annoying fixup pass adjustments. replace_pseudos is adjusted to save the depth of each function in a field of the function definition, and when allocating, that depth is rounded up to the nearest multiple of 16.
There is a neat trick to round up a number, n, to the nearest multiple of 16: basically (n + 15) & ~15. Unfortunately, Zig makes this a pain to write because the integer type inference is not clever enough to understand ~15.
The standard library has a helpful function that does exactly this math: std.mem.alignForward, kindly pointed out to me on the Zig Discord. So I added this bit at the end of replace_pseudos.
func_def.depth = @intCast(pseudo_map.count() * 4);
const aligned = std.mem.alignForward(assembly.Instr.Depth, func_def.depth, 16);
try func_def.instrs.insert(alloc, 0, .{ .allocate_stack = aligned });
So, time to actually emit the assembly.
Code Emission
The changes here are pretty small. The new instructions are one thing, but there are also new registers and a new register size for push, which takes 8-byte-wide registers. The logic for printing registers got pretty large, so I put it in its own function.
fn emit_register(
reg: Operand.Register,
width: usize,
writer: anytype,
) !void {
p: switch (width) {
1 => switch (reg) {
.AX => try writer.print("%al", .{}),
.DX => try writer.print("%dl", .{}),
.CX => try writer.print("%cl", .{}),
.DI => try writer.print("%dil", .{}),
.SI => try writer.print("%sil", .{}),
.R8 => try writer.print("%r8b", .{}),
.R9 => try writer.print("%r9b", .{}),
.R10 => try writer.print("%r10b", .{}),
.R11 => try writer.print("%r11b", .{}),
},
4 => switch (reg) {
.AX => try writer.print("%eax", .{}),
.DX => try writer.print("%edx", .{}),
.CX => try writer.print("%ecx", .{}),
.DI => try writer.print("%edi", .{}),
.SI => try writer.print("%esi", .{}),
.R8 => try writer.print("%r8d", .{}),
.R9 => try writer.print("%r9d", .{}),
.R10 => try writer.print("%r10d", .{}),
.R11 => try writer.print("%r11d", .{}),
},
8 => switch (reg) {
.AX => try writer.print("%rax", .{}),
.DX => try writer.print("%rdx", .{}),
.CX => try writer.print("%rcx", .{}),
.DI => try writer.print("%rdi", .{}),
.SI => try writer.print("%rsi", .{}),
.R8 => try writer.print("%r8", .{}),
.R9 => try writer.print("%r9", .{}),
.R10 => try writer.print("%r10", .{}),
.R11 => try writer.print("%r11", .{}),
},
else => continue :p 4, // default case
}
}
And then, in the format function, I just do the following. There is a bit of redundancy regarding the 4 as a default case, but you are not paranoid if they really are out to get you!
.reg => |r| try emit_register(r, options.width orelse 4, writer),
Running the test suite, I run into 16 failures. Oh come on.
Most of them seem to be UnrecognizedFlag, which is the error the compiler driver emits for flags it does not recognize. That is what I get for not reading the text and jumping straight to the tables. There is a tidbit there about recognizing a -c flag, which is the gcc flag for creating object files instead of executables.
This is a small change to the argument parser. Here is the new one:
pub fn parse_args() !Args {
var args = std.process.args();
_ = args.skip();
var path: ?[:0]const u8 = null;
var mode: Mode = .compile;
var c_flag = false; // new
while (args.next()) |arg| {
if (arg[0] == '-') {
if (arg[1] == 'c') // new
c_flag = true
else
mode = std.meta.stringToEnum(Mode, arg[2..]) orelse
return error.UnrecognizedFlag;
} else if (path == null)
path = arg
else
return error.PathDuplicated;
}
return .{
.path = path orelse return error.PathNotFound,
.mode = mode,
.c_flag = c_flag, // new
};
}
And much below that, when calling the assembler, the -c flag is passed along if c_flag is true, and the extension is changed to .o. So it becomes the following (changing the extension itself is done elsewhere, in a bit of rather annoying code). This is probably the nicest way of doing it.
{ // assembler
var child = std.process.Child.init(
if (args.c_flag)
&.{ "gcc", "-c", asm_out, "-o", obj }
else
&.{ "gcc", asm_out, "-o", exe },
gpa,
);
const term = try child.spawnAndWait();
if (!std.meta.eql(term, .{ .Exited = 0 }))
return error.AssemblerFail;
try std.fs.cwd().deleteFile(asm_out); // cleanup
}
Now running the test suite for real.
Debugging
On the plus side, all the programs are compiling and running. On the negative side, I got a lot of bad return codes; there are logic mistakes. Also, 3 errors relating to unicode and timeouts? I do not get it. The test suite's output is generally jumbled up, and I find it hard to process. Having said that, the easiest way to attack the problem is to look at the individual files and compile them myself.
UnicodeDecodeError
This is one of the weird errors, Nora Sandler's comments and all. The program should call the system's putchar, which prints a character to standard out.
#ifdef SUPPRESS_WARNINGS
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif
int putchar(int c);
/* Make sure we can correctly manage calling conventions from the callee side
* (by accessing parameters, including parameters on the stack) and the caller side
* (by calling a standard library function) in the same function
*/
int foo(int a, int b, int c, int d, int e, int f, int g, int h) {
putchar(h);
return a + g;
}
int main(void) {
return foo(1, 2, 3, 4, 5, 6, 7, 65);
}
This is the output for each stage.
PARSING
FUNCTION putchar c
FUNCTION foo a b c d e f g h
(putchar h);
RETURN (+ a g)
FUNCTION main
RETURN (foo 1 2 3 4 5 6 7 65)
================================
VALIDATION
FUNCTION putchar c.0
FUNCTION foo a.1 b.2 c.3 d.4 e.5 f.6 g.7 h.8
(putchar h.8);
RETURN (+ a.1 g.7)
FUNCTION main
RETURN (foo 1 2 3 4 5 6 7 65)
================================
IR
FUNCTION foo
fn.9 <- putchar(h.8)
add.10 <- a.1 + g.7
ret add.10
ret 0
FUNCTION main
fn.11 <- foo(1, 2, 3, 4, 5, 6, 7, 65)
ret fn.11
ret 0
================================
CODEGEN
FUNCTION foo
allocate 48
mov DI -> stack -4
mov SI -> stack -8
mov DX -> stack -12
mov CX -> stack -16
mov R8 -> stack -20
mov R9 -> stack -24
mov stack 16 -> R10
mov R10 -> stack -28
mov stack 24 -> R10
mov R10 -> stack -32
allocate 8
mov stack -32 -> DI
call .Lputchar
deallocate 8
mov AX -> stack -36
mov stack -4 -> R10
mov R10 -> stack -40
mov stack -28 -> R10
add R10 -> stack -40
mov stack -40 -> AX
ret
mov imm 0 -> AX
ret
FUNCTION main
allocate 16
allocate 8
mov imm 1 -> DI
mov imm 2 -> SI
mov imm 3 -> DX
mov imm 4 -> CX
mov imm 5 -> R8
mov imm 6 -> R9
push imm 65
push imm 7
call .Lfoo
deallocate 24
mov AX -> stack -4
mov stack -4 -> AX
ret
mov imm 0 -> AX
ret
Hold on, before going into the assembly, I spot a weird thing here. Why is the call to foo
preceded by .L
? This is probably a formatting error which does not matter. But more importantly, the numbers for the deallocations seem weird. Why is it deallocating 24? Let's review the logic in instr_to_asm
in .func_call
.
const bytes_to_remove = 8 * depth + padding;
if (bytes_to_remove != 0)
try ret.append(alloc, .{ .dealloc_stack = bytes_to_remove }); // 3
Eh, this makes sense. Counting the arguments passed on the stack here, there are 2. So why is there padding?
const padding: assembly.Instr.Depth =
if (depth % 2 == 0) 8 else 0; // <--- here
if (padding != 0)
try ret.append(alloc, .{ .allocate_stack = padding }); // 1
Oh, I am assigning the padding in reverse. If the number of stack arguments is even, there should be no padding. Silly me.
This does not actually deal with the error I got for this file, which is this:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 0: invalid start byte
This might be the value printed by putchar
failing to decode. However, the only way to know is to run the test suite again.
% paella ❱ zig build submit --release=safe -- --chapter 9
# many moons later
----------------------------------------------------------------------
Ran 61 tests in 60.822s
FAILED (failures=12)
Hey, at least no weird UTF-8 errors. That's 7 fewer failures than last time (which was 16 failures and 3 errors).
AssertionError
Note: Please do not read this section. It is embarrassing.
This is a more readable error than the last one.
AssertionError: Incorrect behavior in chapter_9/valid/stack_arguments/test_for_memory_leaks
* Bad return code: expected 1 and got 0
I can work with this. This is the C file in question. Those are a lot of arguments.
/* Make sure stack arguments are deallocated correctly after returning from a function call; also test passing variables as stack arguments */
#ifdef SUPPRESS_WARNINGS
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif
int lots_of_args(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j, int k, int l, int m, int n, int o) {
return l + o;
}
int main(void) {
int ret = 0;
for (int i = 0; i < 10000000; i = i + 1) {
ret = lots_of_args(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ret, 13, 14, 15);
}
return ret == 150000000;
}
I will skip the whole parsing-to-IR stage and simply show you the final codegen here, with added comments. It is huge.
PROGRAM
FUNCTION lots_of_args
allocate 64
mov DI -> stack -4 ; a
mov SI -> stack -8 ; b
mov DX -> stack -12 ; c
mov CX -> stack -16 ; d
mov R8 -> stack -20 ; e
mov R9 -> stack -24 ; f
mov stack 16 -> R10
mov R10 -> stack -28 ; g
mov stack 24 -> R10
mov R10 -> stack -32 ; h
mov stack 32 -> R10
mov R10 -> stack -36 ; i
mov stack 40 -> R10
mov R10 -> stack -40 ; j
mov stack 48 -> R10
mov R10 -> stack -44 ; k
mov stack 56 -> R10
mov R10 -> stack -48 ; l
mov stack 64 -> R10
mov R10 -> stack -52 ; m
mov stack 72 -> R10
mov R10 -> stack -56 ; n
mov stack 80 -> R10
mov R10 -> stack -60 ; o
mov stack -48 -> R10
mov R10 -> stack -64 ; l
mov stack -60 -> R10 ; o
add R10 -> stack -64 ; l + o
mov stack -64 -> AX
ret
mov imm 0 -> AX
ret
FUNCTION main
allocate 32
mov imm 0 -> stack -4 ; ret
mov imm 0 -> stack -8 ; i
=> .Lst_for.16
cmp imm 10000000 -> stack -8
mov imm 0 -> stack -12
setl stack -12
cmp imm 0 -> stack -12
jmpe .Lbr_for.16
allocate 8 ; odd number of arguments
; register arguments
mov imm 1 -> DI ; a
mov imm 2 -> SI ; b
mov imm 3 -> DX ; c
mov imm 4 -> CX ; d
mov imm 5 -> R8 ; e
mov imm 6 -> R9 ; f
; stack arguments
push imm 15 ; o
push imm 14 ; n
push imm 13 ; m
mov stack -4 -> AX ; ret into l
push AX ; l
push imm 11 ; k
push imm 10 ; j
push imm 9 ; i
push imm 8 ; h
push imm 7 ; g
call lots_of_args
deallocate 80 ; why is this 80? 9 pushes + padding
mov AX -> stack -16 ; l + o here
mov stack -16 -> R10
mov R10 -> stack -4 ; assign to ret
=> .Lcn_for.16
mov stack -8 -> R10
mov R10 -> stack -20
add imm 1 -> stack -20
mov stack -20 -> R10
mov R10 -> stack -8
jmp .Lst_for.16
=> .Lbr_for.16
cmp imm 150000000 -> stack -4
mov imm 0 -> stack -24
sete stack -24
mov stack -24 -> AX
ret
mov imm 0 -> AX
ret
I am actually at a loss. All this seems to make sense. The generated assembly is more of the same. Let us look at another failure:
AssertionError: Incorrect behavior in chapter_9/valid/arguments_in_registers/fibonacci
* Bad return code: expected 8 and got -11
How would you even get -11? (In Python's subprocess, a negative return code -N means the process was killed by signal N, and signal 11 is SIGSEGV: a segmentation fault.)
int fib(int n) {
if (n == 0 || n == 1) {
return n;
} else {
return fib(n - 1) + fib(n - 2);
}
}
int main(void) {
int n = 6;
return fib(n);
}
This is the assembly this time.
.globl _fib
_fib:
pushq %rbp ;; function prologue
movq %rsp, %rbp
subq $48, %rsp ;; allocate 48
movl %edi, -4(%rsp) ;; n
cmpl $0, -4(%rsp) ;; does n equal 0?
movl $0, -8(%rsp)
sete -8(%rsp)
cmpl $0, -8(%rsp) ;; is n == 0 true ?
jne .Ltrue_or.2
cmpl $1, -4(%rsp) ;; does n equal 1?
movl $0, -12(%rsp)
sete -12(%rsp)
cmpl $0, -12(%rsp) ;; is n == 1 true?
jne .Ltrue_or.2
movl $0, -16(%rsp)
jmp .Lend_or.3
.Ltrue_or.2:
movl $1, -16(%rsp)
.Lend_or.3:
cmpl $0, -16(%rsp) ;; is the previous expression false?
je .Lelse.7
movl -4(%rsp), %eax ;; if so just return n
movq %rbp, %rsp
popq %rbp
ret
jmp .Lend.8
.Lelse.7:
movl -4(%rsp), %r10d ;; if not ..
movl %r10d, -20(%rsp) ;; move n here
subl $1, -20(%rsp) ;; n - 1
movl -20(%rsp), %edi ;; move n-1 to edi
call _fib ;; call fib
movl %eax, -24(%rsp) ;; stash result of fib(n-1)
movl -4(%rsp), %r10d
movl %r10d, -28(%rsp) ;; move n here
subl $2, -28(%rsp) ;; n - 2
movl -28(%rsp), %edi ;; move n-2 to edi
call _fib ;; call fib
movl %eax, -32(%rsp) ;; stash result of fib(n-2)
movl -24(%rsp), %r10d ;; stashed result of fib(n-1)
movl %r10d, -36(%rsp)
movl -32(%rsp), %r10d ;; stashed result of fib(n-2)
addl %r10d, -36(%rsp) ;; add two results together
movl -36(%rsp), %eax ;; move result to proper place
movq %rbp, %rsp
popq %rbp
ret ;; the end
.Lend.8:
movl $0, %eax ;; this bit here is the useless return 0
movq %rbp, %rsp ;; added at the end of every function
popq %rbp
ret
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $6, -4(%rsp)
movl -4(%rsp), %edi
call _fib
movl %eax, -8(%rsp)
movl -8(%rsp), %eax
movq %rbp, %rsp
popq %rbp
ret
movl $0, %eax
movq %rbp, %rsp
popq %rbp
ret
The logic seems fine. I am not debugging the logic for if
s and for
s, as those test cases already pass. This sample has no stack arguments, yet it is still failing. This definitely means there is memory corruption somewhere, but where? I am sure the smart ones among you have spotted it already. I am not that smart.
I figured maybe the mistake came from using the subshell. So I ran the test suite the manual way I did previously. But nope: same result. The good news is that my subshell spell works perfectly.
On a whim, most likely a wrong one, I thought of flipping the sign of the stack offsets for function bodies (so they start from -16 downwards rather than 16 upwards), even though that would not explain the failure in the fibonacci
example anyway. Surprisingly, it affected absolutely nothing: same number of failing cases. I am starting to think this is a parking_lot
bug.
I moved to compare the codegen stage with my previous rust program, and it is identical. I moved on to comparing the assembly generated. Even using diff
with colors to compare the two assemblies.
Then there I saw it. The logic is fine, but there is a typo: something I wrote many chapters ago.
Behold the stack offsets: -8(%rsp)
. They should be relative to %rbp
. The prologue pins %rbp as a stable frame base, but %rsp has already been moved down by the subq, so these negative %rsp offsets point below the allocated frame, and every call pushes its return address right on top of them, clobbering the locals. That is exactly why only tests containing function calls broke.
Now all the tests pass. I wonder how any of them passed in the first place.
Lessons Learned
- Spell checking is important.
- diff --color
is a useful tool.
- Unrelated to paella
itself, but I made some improvements to the build script, and learned more about jj
and git
. I think I will add git
tags to commits that finish each chapter, which would make browsing them easier.
- This week I installed llm
, and had it set up to talk to the free version of GitHub Copilot. Really convenient and useful, and I used it to spell check the articles. I will try some local models next, but I do not expect they will do much on my MacBook Air M2.
- Can I have typed integer literals or slightly greedier type inference in Zig please? It is absurd that ~15
just does not work.
One chapter left for Part 1. Unsure what to do next.