So, with the first chapter of Writing a C Compiler behind us, it is time to start on the second chapter: Unary Operators. But first, a word from build.zig land.
Running the tests with zig build test
The tests for this Book live in their own repository, which I have dutifully cloned and run tests in.
The process I followed to run the tests is fairly manual:
- navigate to the test directory.
- run the command arch -x86_64 zsh (which I have cleverly aliased to x86). This runs an x86 version of zsh.
- run the tests.
When I was doing my Rust implementation, I kept a terminal accessible with a global hotkey and I kept the inner shell running at all times.
With build.zig, I looked for a way to do that automatically. The Build object, which is central to build.zig, has a nice addSystemCommand method, which I could use to run my stuff in. That left the problem of how to run the tests in the x86 shell.
My first thought was to have the command be arch -x86_64 test_compiler ..etc (test_compiler being the test script provided). While it worked, as in the tests for chapter 1 all ran fine, upon further reading I realized that arch actually chooses which binary to run from a macOS Universal Binary. It has no effect on Python scripts, and the simple tests in chapter 1 worked fine either way. So I asked around, and some kind soul pointed me to the right thing to search for: subshells.
See, this might seem obvious if you've lived in Unix land all your life, but shells are applications too. You can pass command line arguments to them!! So bash -c "echo foo" runs bash, runs the command echo foo in bash, then exits. Perfect.
Putting these two together, this was the added bit to build.zig
:
const test_step = b.step("test", "Run tests");
// subshells. how do they work.
const inner_command = try std.mem.join(b.allocator, " ", &.{
"../writing-a-c-compiler-tests/test_compiler",
b.pathJoin(&.{ b.exe_dir, "paella" }),
try std.mem.join(
b.allocator,
" ",
b.args orelse &.{""},
),
});
// does this work like i think it does?
const test_command = b.addSystemCommand(
&.{ "arch", "-x86_64", "zsh", "-c", inner_command },
);
test_command.step.dependOn(b.getInstallStep());
test_step.dependOn(&test_command.step);
I am unreasonably happy with this. Using regular Zig standard library tools like std.mem.join allowed me to join together the commands I am passing to the zsh subshell, and voila!!
Now the tests (say for chapter 1) can be run "simply" with zig build test -- --chapter 1. Great success!
Lexer
There are three tokens to add in this chapter: ~, -, and --. The decrement operator is only being tokenized here to reject illegal C syntax like return --2, but otherwise will not be implemented.
So our Tag enum grows1 to have two new tokens: tilde and hyphen. As I am not planning to implement the book's extra credit this time around, I will just have the double hyphen lex into the invalid token. This would reject lexing valid code like x--, but I do not think this is going to be present in the test cases. We will see.
The State enum is a bit more interesting. It cannot accept two consecutive hyphens, so seeing a hyphen would put it into the hyphen state, where if it sees another hyphen it puts out Tag.invalid, or Tag.hyphen otherwise.
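The enum itself ends up looking something like this sketch (the states carried over from chapter 1 are elided behind the comment):
const State = enum {
    start,
    // ... chapter 1 states elided ...
    hyphen,
};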
So, if the state is at .start, and the lexer encounters a '-' character, it just switches gears to the .hyphen state. Labelled switch in action. Then in there, triage happens. The code is, honestly, again copied from the Zig compiler, with adjustments.
state: switch (State.start) {
.start => switch (self.buffer[self.index]) {
'~' => {
result.tag = .tilde;
self.index += 1;
},
'-' => continue :state .hyphen,
// etc
},
.hyphen => {
self.index += 1;
switch (self.buffer[self.index]) {
'-' => result.tag = .invalid,
else => result.tag = .hyphen,
}
},
// etc
}
This should be it. Time to put the new testing command into action.
Ran 43 tests in 5.432s
OK
Both of them work. Time for AST and parsing.
Parsing
Changes to the AST are fairly minimal. There are two unary operations: negation, and complement. Instead of adding a separate UnaryOp
enum as the Book suggests, I am going to embed them right into Expr
as separate tags, updating the pretty printer while I am at it.2
pub const Expr = union(enum) {
constant: u64,
unop_negate: *Expr,
unop_complement: *Expr,
pub fn format(
self: @This(),
comptime _: []const u8,
_: std.fmt.FormatOptions,
writer: anytype,
) !void {
switch (self) {
.constant => |c| try writer.print("{d}", .{c}),
.unop_negate => |e| try writer.print(" -{}", .{e}),
.unop_complement => |e| try writer.print(" ~{}", .{e}),
}
}
};
Updating the parsing functions is a bit more involved, however. Now is the start of PEMDAS, and time to start our recursive descent into madness.3 This is pretty much the same function provided in the book, except that the operation is chosen immediately.
fn parse_expr(
alloc: std.mem.Allocator,
tokens: *lexer.Tokenizer,
) !*ast.Expr {
const current = tokens.next() orelse
return error.ExpectExpr;
switch (current.tag) {
.number_literal => {
const lit = tokens.buffer[current.loc.start..current.loc.end];
const res = try std.fmt.parseInt(u64, lit, 10);
// vvv this helper function just allocates and sets
return try create(ast.Expr, alloc, .{ .constant = res });
},
.hyphen => {
const inner_exp = try parse_expr(alloc, tokens);
return try create(ast.Expr, alloc, .{ .unop_negate = inner_exp });
},
.tilde => {
const inner_exp = try parse_expr(alloc, tokens);
return try create(ast.Expr, alloc, .{ .unop_complement = inner_exp });
},
.l_paren => {
const inner_exp = try parse_expr(alloc, tokens);
try expect(.r_paren, tokens);
return inner_exp;
},
else => return error.ExpectExpr,
}
}
Now the parsing is done, but to run the tests a small annoyance needs to be fixed. In asm_gen.zig, there is a switch over expressions, and it needs to handle the new modes, even though it is not being called right now. This tedium, and the need to repeat it for every step, is partially why I stopped the Rust implementation. Adding an else => @panic("unimplemented"), to that switch smooths things over for now.
Ran 43 tests in 6.363s
OK
Yuppee.
Last but not least, even though it is not needed for the first chapters, as the only identifier is main
, I figured I'd try some comptime
magic to improve my expectation experience. I changed expect
a little bit so it returns a string when I am asking for an identifier, but void
otherwise.4
fn expect(
comptime expected: lexer.Token.Tag,
tokens: *lexer.Tokenizer,
) !if (expected == .identifier)
[]const u8
else
void {
if (tokens.next()) |actual| {
if (actual.tag != expected)
return error.SyntaxError;
if (expected == .identifier)
return tokens.buffer[actual.loc.start..actual.loc.end];
} else return error.SyntaxError;
}
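As a quick, hypothetical illustration of the call sites, the identifier case hands back the name while everything else stays void:
// hypothetical call sites inside the function-definition parser
const name = try expect(.identifier, tokens); // comptime-chosen: returns []const u8
try expect(.l_paren, tokens); // comptime-chosen: returns void, nothing to bind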
Intermediate Representation
The Book uses its own version of Intermediate Representation, called TACKY. It is a variation of a popular IR strategy called three-address code, hence the name. TACKY uses its own AST, and it slots between the program's AST and the assembly's AST. This is the first pass. So many strings.
const std = @import("std");
pub const Prgm = struct {
func_def: *FuncDef,
};
pub const FuncDef = struct {
name: []const u8,
// for similar reasons to its use in assembly AST
instrs: std.ArrayListUnmanaged(Instr),
};
pub const Instr = union(enum) {
ret: Value,
// decided to splat the operator here, like discussed in chapter 1.
unop_negate: Unary,
unop_complement: Unary,
// to avoid excessive typing
pub const Unary = struct {
src: Value,
dst: Value,
pub fn init(src: Value, dst: Value) @This() {
return .{ .src = src, .dst = dst };
}
};
};
pub const Value = union(enum) {
constant: u64,
variable: []const u8,
};
The goal of this IR is to disentangle nested expressions by separating them into what are, at least in IR form, separate operations. This means that an expression like 1 + 2 * 3 is transformed5 into the following representation:
tmp0 = 2 * 3
tmp1 = 1 + tmp0
return tmp1
The Book-provided expr_emit_ir, with my splatted data structures, looks like the following:
fn expr_emit_ir(
alloc: std.mem.Allocator,
expr: *ast.Expr,
instrs: *std.ArrayListUnmanaged(ir.Instr),
) !ir.Value {
switch (expr.*) {
.constant => |c| return .{ .constant = c },
.unop_negate => |e| {
const src = try expr_emit_ir(alloc, e, instrs);
const dst_name = try make_temporary(alloc, "neg");
const dst: ir.Value = .{ .variable = dst_name };
try instrs.append(alloc, .{ .unop_negate = .init(src, dst) });
return dst;
},
.unop_complement => |e| {
const src = try expr_emit_ir(alloc, e, instrs);
const dst_name = try make_temporary(alloc, "cml");
const dst: ir.Value = .{ .variable = dst_name };
try instrs.append(alloc, .{ .unop_complement = .init(src, dst) });
return dst;
},
}
}
A bit repetitive, isn't it? So I put together the repeated logic into its own function, unary_helper:
fn unary_helper(
alloc: std.mem.Allocator,
expr: *ast.Expr,
instrs: *std.ArrayListUnmanaged(ir.Instr),
comptime prefix: []const u8,
) !struct { ir.Value, ir.Value } {
const src = try expr_emit_ir(alloc, expr, instrs);
const dst_name = try make_temporary(alloc, prefix);
const dst: ir.Value = .{ .variable = dst_name };
return .{ src, dst };
}
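With that helper, the negate branch of expr_emit_ir collapses to roughly the following sketch (the complement branch is identical save for the tag and the prefix). This is also the mutual recursion that is about to bite:
// sketch of the slimmed-down branch; mirrors the code above
.unop_negate => |e| {
    const src, const dst = try unary_helper(alloc, e, instrs, "neg");
    try instrs.append(alloc, .{ .unop_negate = .init(src, dst) });
    return dst;
},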
Nothing fancy. make_temporary over here is just a function with an ever-increasing static counter for creating always-different variable names. However, here I hit the weirdest Zig compile error I have seen yet: error: unable to resolve inferred error set. What gives?
Turns out that when two recursive functions call each other, both with an inferred error type, Zig throws in the towel. Swift would never have given up, and would have compiled it, eventually. Apparently I had to specify the error type myself.
const helper_error = error{
MakeTemporary, // return error value of `make_temporary`
UnaryHelper,
};
... and changed unary_helper
to this:
fn unary_helper(
alloc: std.mem.Allocator,
expr: *ast.Expr,
instrs: *std.ArrayListUnmanaged(ir.Instr),
comptime prefix: []const u8,
) helper_error!struct { ir.Value, ir.Value } {
const src = expr_emit_ir(alloc, expr, instrs) catch
return error.UnaryHelper;
const dst_name = try make_temporary(alloc, prefix);
const dst: ir.Value = .{ .variable = dst_name };
return .{ src, dst };
}
And now it compiles. The rest of ir_gen.zig is a rather straightforward one-to-one mapping between the two trees, and rather boring to implement. Instead, I will show you make_temporary. Zig has a weird way to define local static variables.
fn make_temporary(
alloc: std.mem.Allocator,
comptime prefix: []const u8,
) helper_error![]const u8 {
// zig static variables
const coun = struct {
var ter: usize = 0;
};
const name = std.fmt.allocPrint(
alloc,
if (prefix.len == 0) "temp" else prefix ++ ".{}",
.{coun.ter},
) catch return error.MakeTemporary;
coun.ter += 1;
return name;
}
First of all, non-atomic is fine. All this is single-threaded anyway. Second of all, any variable directly declared in a struct scope is static. So if you need a local static variable, you define a struct with a .. variable. These are exactly the same semantics as declaring a variable at global (or file) scope, except the scope here is a lot smaller.6 A bit mind-bending, but it makes sense.
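Stripped of the compiler context, the idiom looks like this minimal, self-contained sketch:
const std = @import("std");

fn next_id() usize {
    // any declaration in a struct's scope is static; the struct exists
    // only to host the `var`, just like a file-scope variable would
    const static = struct {
        var counter: usize = 0;
    };
    static.counter += 1;
    return static.counter;
}

test "static counter keeps counting" {
    try std.testing.expectEqual(@as(usize, 1), next_id());
    try std.testing.expectEqual(@as(usize, 2), next_id());
}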
To save on pretty printing this time, I figured I would try std.json.stringify to get the IR output. This is the C file I am working on. So small and non-threatening.
int main(void) {
return ~-3;
}
And this is the IR output:
{
"func_def": {
"name": "main",
"instrs": {
"items": [
{
"unop_negate": {
"src": {
"constant": 3
},
"dst": {
"variable": "neg.0"
}
}
},
{
"unop_complement": {
"src": {
"variable": "neg.0"
},
"dst": {
"variable": "cml.1"
}
}
},
{
"ret": {
"variable": "cml.1"
}
}
],
"capacity": 5
}
}
}
Very long-winded, with some useless info. Why do I need to know the ArrayListUnmanaged capacity? Time to work on pretty printing and allocation bookkeeping. Not very interesting, to be honest, so I will skip writing about it.
Except for string interning. That is interesting.
String Interning
String interning is having a single storage for the strings that are used throughout the program. Keeping the source around is not enough, because of the new strings for the IR's temporary variables. Giving ownership of the strings to ir.Value is fine, except it is really easy to double-free the strings when cleaning up. In fact, I did exactly that when I was trying not to implement the string interner, and Zig's DebugAllocator dutifully caught it.
The basic idea behind the interner is to have all strings owned by one repository that you can free at once at the end. One can use an arena, but the interner's structure is more efficient because it also avoids duplication of strings. It can be backed by a hashmap, so it can also attach information to each string. All in all it is a useful memory and resource management technique.
The friendly people at the Zig Discord, especially InKryption, not only helped me understand the idea, but actually provided me with a complete implementation.
It makes use of Zig's (honestly weird) sentinel-terminated arrays: a generalization of C's null-terminated strings.
The idea is as follows: a regular slice (say the []const u8 we use and love) stores its length within it. It knows what length it is at all times. So if you read from it or iterate on it, it stops dutifully at the length it knows. A null-terminated string, however, would be [:0]const u8. When you read from it or iterate on it, it keeps going until it hits the 0. (Obviously you can use any sentinel value, but 0 is very convenient for strings.)
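A tiny illustration of the difference, as a toy example of my own and not part of the interner:
const std = @import("std");

test "sentinel-terminated slices" {
    const s: [:0]const u8 = "paella"; // string literals carry a 0 sentinel
    const plain: []const u8 = s; // coerces fine: the length is still known
    try std.testing.expectEqual(s.len, plain.len);
    // a many-item sentinel pointer has no length; std.mem.span walks until the 0
    const ptr: [*:0]const u8 = s.ptr;
    try std.testing.expectEqualStrings("paella", std.mem.span(ptr));
}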
So the string interner keeps, as its backing data, a regular ArrayListUnmanaged(u8) like the ones I have been using before. (Say it is called bytes.) For any new string in the application, it appends it to the ArrayList, then adds a 0 byte at the end, and I keep its starting index around. When I want to use the string, I take a slice like so bytes[starting_idx..:0] and voila .. I have a slice of my string.
The remaining pieces of the puzzle are two items in the Zig standard library: StringIndexContext and StringIndexAdapter. They're designed for this specific use case. Who knew?
A Context in Zig's hash maps is a simple enough idea. It just defines the eql and hash functions said hash map will use, with an optional backing data structure. An Adapter is used to override these two functions for specific operations. The StringIndex duo have slightly different implementations of the eql function, and if you're curious you can check them out in the standard library's code.7
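To make the Context idea concrete, here is a toy one of my own, not the StringIndex pair: it hashes plain u32 keys directly, whereas StringIndexContext hashes the bytes that the u32 index points at, which is what lets equal strings land on the same index.
const std = @import("std");

// toy Context: keys are u32 values, hashed as-is
const IdContext = struct {
    pub fn hash(_: IdContext, key: u32) u64 {
        return std.hash.Wyhash.hash(0, std.mem.asBytes(&key));
    }
    pub fn eql(_: IdContext, a: u32, b: u32) bool {
        return a == b;
    }
};

const ToyMap = std.HashMapUnmanaged(
    u32,
    void,
    IdContext,
    std.hash_map.default_max_load_percentage,
);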
Without further ado, here is the full StringInterner, thankfully provided by InKryption. I honestly had to read it multiple times to wrap my head around what's going on, so I commented some parts of it.
const std = @import("std");
const StringInterner = struct {
// state
bytes: std.ArrayListUnmanaged(u8),
map: std.HashMapUnmanaged(
Id,
void,
std.hash_map.StringIndexContext,
std.hash_map.default_max_load_percentage,
),
// management
pub const init: StringInterner = .{
.bytes = .empty,
.map = .empty,
};
pub fn deinit(self: *StringInterner, allocator: std.mem.Allocator) void {
self.bytes.deinit(allocator);
self.map.deinit(allocator);
}
// type alias. store this in stuff.
pub const Idx = u32;
// checks if a string is in or not. returns ID
pub fn getIdx(
self: *const StringInterner,
string: []const u8,
) ?Idx {
return self.map.getKeyAdapted(string, self.hmAdapter());
}
// get string from id
pub fn getString(
self: *const StringInterner,
idx: Idx,
) ?[:0]const u8 {
if (!self.map.containsContext(idx, self.hmCtx())) return null;
// cast is necessary for type inference reasons.
const slice_sentinel: [:0]const u8 = @ptrCast(self.bytes.items[idx..]);
return std.mem.sliceTo(slice_sentinel, 0);
}
// the insert function. returns either an existing idx or a new idx.
pub fn getOrPut(
self: *StringInterner,
allocator: std.mem.Allocator,
string: []const u8,
) std.mem.Allocator.Error!Idx {
// reserves capacity in both the backing array and the map
try self.bytes.ensureUnusedCapacity(allocator, string.len + 1);
try self.map.ensureUnusedCapacityContext(allocator, 1, self.hmCtx());
// getOrPut returns a reference to the key and the value. If it existed, they're
// valid pointers. If not, their contents are undefined waiting for you to fill them.
const gop = self.map.getOrPutAssumeCapacityAdapted(string, self.hmAdapter());
gop.value_ptr.* = {}; // this is void, but it can be anything!! note void is zero-width
if (gop.found_existing) return gop.key_ptr.*;
// unlikely to hit that. the source files are tiny.
if (self.bytes.items.len > std.math.maxInt(Idx)) return error.OutOfMemory;
const new_idx: Idx = @intCast(self.bytes.items.len);
// insertion happens here
self.bytes.appendSliceAssumeCapacity(string);
// don't forget to append the null byte !!
self.bytes.appendAssumeCapacity(0);
// update the map through the pointer.
gop.key_ptr.* = new_idx;
return new_idx;
}
// helper functions to avoid a self referential struct
fn hmCtx(self: *const StringInterner) std.hash_map.StringIndexContext {
return .{ .bytes = &self.bytes };
}
fn hmAdapter(self: *const StringInterner) std.hash_map.StringIndexAdapter {
return .{ .bytes = &self.bytes };
}
};
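Before touching anything, a quick usage sketch I wrote to convince myself of how it behaves (alloc here stands for whatever allocator the driver hands out):
var interner: StringInterner = .init;
defer interner.deinit(alloc);

const first = try interner.getOrPut(alloc, "neg.0");
const second = try interner.getOrPut(alloc, "neg.0");
std.debug.assert(first == second); // the same string interns to the same Idx
const text = interner.getString(first).?; // a sentinel-terminated view into `bytes`
std.debug.assert(std.mem.eql(u8, text, "neg.0"));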
I will currently use this structure as is, only changing the names to snake_case as I cannot read camelCase. Later on I can use it to attach data to the strings. Fairly useful.
My first go at this was to replace every string with StringInterner.Idx
. Just make sure to pass a pointer to the interner when constructing these structures.
pub const Prgm = struct {
func_def: *FuncDef,
};
pub const FuncDef = struct {
name: utils.StringInterner.Idx,
instrs: std.ArrayListUnmanaged(Instr),
};
// and so on.
While this works and compiles fine, it turned out to be problematic when pretty printing. Idx is just a u32 and stores no reference to the interner. And the format functions do not have an allocator or a user-space reference to use. So instead, I am changing the type to [:0]const u8, then storing slices from the interner rather than indices.
Then I adjust our ir_gen functions to pass a pointer to the interner through all of them. That, or create a makeshift Ctx type. Time will tell. For now, as an example, here are the unary_helper mentioned earlier and the new make_temporary, with the new structure threaded in.
fn unary_helper(
alloc: std.mem.Allocator,
interner: *utils.StringInterner,
expr: *ast.Expr,
instrs: *std.ArrayListUnmanaged(ir.Instr),
comptime prefix: []const u8,
) helper_error!struct { ir.Value, ir.Value } {
const src = expr_emit_ir(alloc, interner, expr, instrs) catch
return error.UnaryHelper;
const dst_name = try make_temporary(alloc, interner, prefix);
const dst: ir.Value = .{ .variable = dst_name };
return .{ src, dst };
}
fn make_temporary(
alloc: std.mem.Allocator,
interner: *utils.StringInterner,
comptime prefix: []const u8,
) helper_error![:0]const u8 {
const static = struct {
var counter: usize = 0;
};
const name = std.fmt.allocPrint(
alloc,
if (prefix.len == 0) "temp" else prefix ++ ".{}",
.{static.counter},
) catch return error.MakeTemporary;
defer alloc.free(name);
const name_idx = interner.get_or_put(alloc, name) catch
return error.MakeTemporary;
const name_res = interner.get_string(name_idx).?; // <- unwrap optional
static.counter += 1;
return name_res;
}
So this all compiles and runs and passes tests just fine. All that is left is implementing the pretty printing myself, as the json module seemed to break for reasons I do not understand (related to the interner).8 The technique to do so was discussed in the Chapter 1 article, so no point in repeating it here. It is time to move on to the next section.
Codegen
The first task here is almost invariably to update the assembly AST to include more instructions. Again deviating from the book, I will splat the unary operator into the instructions. This will totally bite me later. The updated parts of the new AST look as follows:
pub const Instr = union(enum) {
mov: Mov,
ret: void,
// unary operations
neg: Operand,
not: Operand,
allocate_stack: i64, // same type as `.stack` below.
const Mov = struct {
src: Operand,
dst: Operand,
pub fn init(src: Operand, dst: Operand) @This() {
return .{ .src = src, .dst = dst };
}
};
};
pub const Operand = union(enum) {
imm: u64,
reg: Register,
stack: i64,
pseudo: [:0]const u8, // these get replaced
pub const Register = enum { AX, R10 };
};
Oh there is a string there. Our string interner will come in handy.
Another note: I am not perfectly happy with the way the book represents registers, because later in the book, when long and pointers are implemented, different names have to be chosen for each register, and I had to encode that info in the Register type itself, which led to a weird amount of code duplication. I have not looked at how the reference OCaml implementation does it. But I will cross that bridge when I come to it. So far all registers are 32 bits wide.
Now asm_gen.zig is reworked to take the IR AST as input instead. This is almost a complete rewrite, except for the first, trivial, Prgm transformation. Complicating the matter a bit is that IR instructions and Assembly instructions do not map 1 to 1; each IR instruction can become one or more Assembly instructions. So I have to return a slice. Manual memory management strikes again.
It is easier to start from the bottom up. This is IR values to Assembly operands. No allocator needed! Just copy the string's reference as is.
fn value_to_asm(
value: ir.Value,
) assembly.Operand {
switch (value) {
.constant => |c| return .{ .imm = c },
.variable => |v| return .{ .pseudo = v },
}
}
And this is the more involved Instr porter:
fn instr_to_asm(
alloc: std.mem.Allocator,
instr: ir.Instr,
) ![]assembly.Instr {
switch (instr) {
.ret => |v| {
const src = value_to_asm(v);
const ret = try alloc.dupe(assembly.Instr, &.{
.{ .mov = .init(src, .{ .reg = .AX }) },
.ret,
});
return ret;
},
.unop_complement => |u| {
const src = value_to_asm(u.src);
const dst = value_to_asm(u.dst);
const ret = try alloc.dupe(assembly.Instr, &.{
.{ .mov = .init(src, dst) },
.{ .not = dst },
});
return ret;
},
.unop_negate => |u| {
const src = value_to_asm(u.src);
const dst = value_to_asm(u.dst);
const ret = try alloc.dupe(assembly.Instr, &.{
.{ .mov = .init(src, dst) },
.{ .neg = dst },
});
return ret;
},
}
}
Not very DRY, and a lot is going on. Certainly a lot of (what is going to be very temporary) allocations. Returning pointers to stack space is a rookie mistake. (Thanks Rust!)
The classic solution to a lot of temporary allocations in similar programs is to use an arena allocator, and then free it all at once when the function is done. This would be set up from func_def_to_asm. So here is the first attempt at it:
fn func_def_to_asm(
alloc: std.mem.Allocator,
func_def: ir.FuncDef,
) !assembly.FuncDef {
var arena_allocator = std.heap.ArenaAllocator.init(alloc);
defer arena_allocator.deinit();
const arena = arena_allocator.allocator();
var instrs = std.ArrayListUnmanaged(assembly.Instr).empty;
for (func_def.instrs.items) |instr| {
// note the different allocators for each function.
const ret = try instr_to_asm(arena, instr);
try instrs.appendSlice(alloc, ret);
}
return .{
.name = func_def.name,
.instrs = instrs,
};
}
And finally Prgm
is trivial.
pub fn prgm_to_asm(
alloc: std.mem.Allocator,
prgm: ir.Prgm,
) !assembly.Prgm {
const func_def = try utils.create(
assembly.FuncDef,
alloc,
try func_def_to_asm(alloc, prgm.func_def.*),
);
return .{ .func_def = func_def };
}
And the proof of the pudding is in the eating. But not before filling in all the @panic("todo")
s and cleaning up the allocated memory. The code so far works and produces the expected result on the previously mentioned tiny C program, but codegen is not over yet.
Fixing Up Instructions
There are two final tasks before actually generating the assembly. The first is to replace all them .pseudo operands with .stack operands. The Book used the pseudo operands as a placeholder, as the compiler does not have the stack info at hand yet. It needs to count the number of intermediate, temporary variables created during IR generation.
The second is to rewrite illegal assembly instructions, such as moving from a stack to a stack, into legal ones.
These are essentially additional compiler passes, and I am loath to do the same thing I did in my Rust implementation and stuff everything into one file. So a new directory is born, titled asm_passes, with two new Zig files in tow. If you are keeping track, this is my current folder structure.
./src
├── asm_gen.zig
├── asm_passes
│ ├── fixup.zig
│ └── pseudos.zig
├── assembly.zig
├── ast.zig
├── ir_gen.zig
├── ir.zig
├── lexer.zig
├── main.zig
├── parser.zig
└── utils.zig
Most of them are self-descriptive. utils is where the string interner definition lives, along with (currently) one create function that allocates a value on the heap.
Pseudoregisters
This simply goes over the instrs array stored in FuncDef to change up the operands. The interesting challenge here is maintaining a map from identifiers to stack offsets. Storing the strings as map keys is horribly wasteful, so the string interner's Idx type is used instead, and the interner is passed in.
I probably made this a lot harder for myself. I tried to implement my own hash map adapter type, then had a struggle with Zig's syntax for references. Eventually, Good prevailed and the code compiled.
The building block is this "simple" function, with a minimal number of parameters. getOrPut is a standard library function that almost does what it says on the tin, but gives you pointers to place the key and value yourself as you please. I opted to use a managed (not Unmanaged) hash map here to spare myself an allocator parameter. I could shave off strings too if I manage to figure out how Context works.9
fn pseudo_to_stack(
op: assembly.Operand,
strings: *utils.StringInterner,
map: *std.AutoHashMap(
utils.StringInterner.Idx,
assembly.Operand.StackDepth,
),
) !assembly.Operand {
const static = struct {
var counter: assembly.Operand.StackDepth = 0;
};
switch (op) {
.reg, .imm, .stack => return op,
.pseudo => |name| {
const idx = strings.get_idx(name).?;
const gop = try map.getOrPut(idx);
if (gop.found_existing) {
return .{ .stack = gop.value_ptr.* };
} else {
static.counter -= 4;
gop.key_ptr.* = idx;
gop.value_ptr.* = static.counter;
return .{ .stack = gop.value_ptr.* };
}
},
}
}
The public-facing function that runs on assembly.Prgm comes next. The & and * and .* sprinkled over the loop's variables took a lot of trial and error and Zig Discord help to get right. But the shared payload between .neg and .not does not need to be duplicated.
pub fn replace_pseudos(
alloc: std.mem.Allocator,
strings: *utils.StringInterner,
prgm: *assembly.Prgm,
) !void {
var pseudo_map: std.AutoHashMap(
utils.StringInterner.Idx,
assembly.Operand.StackDepth,
) = .init(alloc);
defer pseudo_map.deinit();
for (prgm.func_def.instrs.items) |*instr| {
// ^ no address operator as it is already a slice
switch (instr.*) {
.mov => |*m| {
// ^ this was the tricky one to find
const src = try pseudo_to_stack(m.src, strings, &pseudo_map);
const dst = try pseudo_to_stack(m.dst, strings, &pseudo_map);
m.* = .init(src, dst);
},
.neg, .not => |*v| v.* = try pseudo_to_stack(v.*, strings, &pseudo_map),
.ret, .allocate_stack => {},
}
}
}
It works. Thankfully. This little bit made me tear my hair out.10
As a small comparison, this is the C file in question, followed by the pretty printed codegen before and after replacing pseudoregisters.
// to save you a scroll
int main(void) {
return ~-3;
}
#before
PROGRAM
FUNCTION main
mov imm 3 -> pseudo neg.0
neg pseudo neg.0
mov pseudo neg.0 -> pseudo cml.1
not pseudo cml.1
mov pseudo cml.1 -> AX
ret
# after
PROGRAM
FUNCTION main
mov imm 3 -> stack -4
neg stack -4
mov stack -4 -> stack -8
not stack -8
mov stack -8 -> AX
ret
This is a horribly inefficient compiler, but it clearly works as intended. Optimization comes later in Part 3 of the book.
Fixing Instructions
The first fixup is to insert the .allocate_stack instruction with the depth of the stack. But the depth is not being returned anywhere! So a slight rejiggering of pseudo_to_stack to use a passed-in counter instead of a local static one, and having replace_pseudos return the stack depth, would solve the problem. A simple, if annoying, code change.
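Something like this sketch is what I mean, assuming StackDepth on Operand is the i64 offset type:
fn pseudo_to_stack(
    op: assembly.Operand,
    depth: *assembly.Operand.StackDepth, // was a local static counter before
    strings: *utils.StringInterner,
    map: *std.AutoHashMap(
        utils.StringInterner.Idx,
        assembly.Operand.StackDepth,
    ),
) !assembly.Operand {
    switch (op) {
        .reg, .imm, .stack => return op,
        .pseudo => |name| {
            const idx = strings.get_idx(name).?;
            const gop = try map.getOrPut(idx);
            if (!gop.found_existing) {
                depth.* -= 4; // each temporary gets 4 bytes for now
                gop.value_ptr.* = depth.*;
            }
            return .{ .stack = gop.value_ptr.* };
        },
    }
}
replace_pseudos then owns the running depth and returns it after its loop.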
Then adding the first instruction this way becomes fairly simple. Bad performance for this one step but I am not paying too much attention to that.
const depth = try pass_pseudo.replace_pseudos(alloc, strings, prgm_asm);
try prgm_asm.func_def.instrs.insert(
alloc,
0,
.{ .allocate_stack = @intCast(@abs(depth)) },
// ^ this returns an unsigned int.
);
// next step gets called here
The other, more involved step is fixing up illegal instructions. If you know anything about x86 assembly, you would know that you cannot mov between two stack locations. There are reasons for that that I do not particularly care about. Why can't the Book compile to a saner assembly language like WASM? I am not even on an x86 machine! Thankfully, this is the only thing to deal with in this chapter. Here is my first pass at the code.
pub fn fixup_instrs(
alloc: std.mem.Allocator,
prgm: *assembly.Prgm,
) !void {
var out: std.ArrayListUnmanaged(assembly.Instr) = try .initCapacity(
alloc,
prgm.func_def.instrs.capacity,
);
defer {
std.mem.swap(
std.ArrayListUnmanaged(assembly.Instr),
&out,
&prgm.func_def.instrs,
);
out.deinit(alloc);
}
for (prgm.func_def.instrs.items) |instr| {
switch (instr) {
.mov => |m| switch (m.src) {
.stack => switch (m.dst) {
.stack => try out.appendSlice(alloc, &.{
.{ .mov = .init(m.src, .{ .reg = .R10 }) },
.{ .mov = .init(.{ .reg = .R10 }, m.dst) },
}),
else => try out.append(alloc, instr),
},
else => try out.append(alloc, instr),
},
else => try out.append(alloc, instr),
}
}
}
So much rightwards drift. It would perhaps be more readable to express it as a state machine, the way the tokenizer works. I will make sure this works as intended first, then try to streamline it, as this is untenable. Time to run the codegen tests.
Ran 43 tests in 5.153s
OK
Code Emission
Before we get into the state machine, it is better to be done with code emission. It is a fairly simple and mechanical exercise, as it just requires updating the printing code, but it can expose pretty much any bug in the program.
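For a flavor of what that means, the new arms in my emitter look roughly like this sketch; the exact format strings lean on the Operand and Register printing helpers, so treat them as approximate:
// sketch: new cases in the instruction-emitting switch, not the verbatim code
.neg => |operand| try writer.print("\tnegl\t{}\n", .{operand}),
.not => |operand| try writer.print("\tnotl\t{}\n", .{operand}),
.allocate_stack => |depth| try writer.print("\tsubq\t${d}, %rsp\n", .{depth}),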
and ...
Ran 43 tests in 5.147s
FAILED (errors=19)
Oh come on. This turns out to be a combination of some commented-out code from when I was updating the compiler driver to include the IR, and dumb spelling mistakes I made in this fairly simple and mechanical task, such as writing & instead of %. Simple mistakes, really.
All tests pass now:
Ran 43 tests in 14.731s
OK
State Machine
Following from the tokenizer's code, I defined a simple State
enum:
const State = enum {
start,
mov_stack_stack,
legal,
};
And the main switch inside the loop turns into a labelled switch over State, in which each state has its own logic, but any legal instruction just funnels to .legal. To be honest, writing this was easier than I expected it to be. It even worked on the first iteration, after I fixed the variable names. This is the second iteration.
The jury's still out on whether the state machine is more readable than the nested switches, but I think I will stick with it for the time being. It passes all tests.
for (prgm.func_def.instrs.items) |instr| {
state: switch (State.start) {
.start => switch (instr) {
.mov => |m| if (m.src == .stack and m.dst == .stack)
continue :state .mov_stack_stack
else
continue :state .legal,
else => continue :state .legal,
},
.mov_stack_stack => try out.appendSlice(alloc, &.{
.{ .mov = .init(instr.mov.src, .{ .reg = .R10 }) },
.{ .mov = .init(.{ .reg = .R10 }, instr.mov.dst) },
}),
.legal => try out.append(alloc, instr),
}
}
Just for comparison's sake, this is the equivalent code in my Rust implementation.
for instr in std::mem::take(&mut self.instrs) {
match instr {
Instr::Mov {
src: src @ Operand::Stack(_),
dst: dst @ Operand::Stack(_),
} => out.extend([
Instr::Mov {
src,
dst: Operand::Reg(Register::R10),
},
Instr::Mov {
src: Operand::Reg(Register::R10),
dst,
},
]),
other => out.push(other),
}
}
Pattern matching is the best. And this concludes Chapter 2.
Lessons Learned
- Building a cool state machine from zero.
- More manual memory management magic tricks. (Couldn't push the alliteration further.)
- Subshells, and a more automated testing process.
- First taste of comptime. ULTIMATE POWER.
Maybe next time I will compile the tests to always be in ReleaseSafe instead of Debug mode. Would certainly make things go more smoothly. Need to figure that one out.