Lab 4 - IR generation
In this lab we will write a code generator for our Tiger compiler targeting LLVM IR.
The IR code generator is responsible for the following tasks:
- Creating a IR module in which functions will be defined.
- Creating independent top-level IR functions for every Tiger function in the source code, including the nested ones.
- Building stack frames to hold escaping variables, and giving pointer to the outer stack frame to every function called.
- Building basic blocks to represent the flow of the Tiger function.
A completed code generator should never raise an error related to the source code. All errors should have been caught in earlier phases already. For example, an unknown identifier would have been caught and reported by the binder. A typing inconsistency would have been caught and reported by the typer.
Setup
Retrieve the lab code and commit it to your git with the following commands,
$ mkdir lab4
$ cd lab4
$ wget -qO- www.sifflez.org/lectures/compil/lab4/dragon-tiger.tar.gz | tar zxv
$ git add -f dragon-tiger/
$ git commit -m "Import dragon-tiger for lab4"
This lab is the first one which requires the headers and libraries for LLVM. You
can usually install them from a llvm-dev
package. You must use at least version
3.9 and at most version 9.0 of LLVM. Let us build the project:
$ cd dragon-tiger
$ ./configure
$ make
If it doesn’t find the LLVM header and libraries, ensure that you have installed them.
If needed, you can use the --with-llvm=
argument to configure to point it to the
right location.
If everything worked as expected, you should be able to run
the compiler driver dtiger
as follows,
$ src/driver/dtiger --help
Options:
-h [ --help ] describe arguments
--dump-ast dump the parsed AST
--dump-ir dump the generated IR
-b [ --bind ] run the binder on the parsed AST
-t [ --type ] run the type checker on the parsed AST
-i [ --irgen ] run the LLVM IR code generator
--trace-parser enable parser traces
--trace-lexer enable lexer traces
-v [ --verbose ] be verbose
--input-file arg input Tiger file
-o [ --object ] arg generate object code file
$ echo "print_int(42)" > test.tig
$ src/driver/dtiger -i --dump-ir test.tig
; ModuleID = 'tiger'
source_filename = "tiger"
define i32 @main() {
entry:
br label %body
body: ; preds = %entry
call void @__print_int(i32 42)
ret i32 0
}
declare void @__print_int(i32)
Ensure that you test thoroughly and commit each feature. Follow precisely the instructions, this lab is graded and machine corrected!
Important note
Some of you may not have completed the previous assignment. To this effect, the current archive contains a pre-built parser library (so as not to give you the full solution) and binder and type checker phase and this lab may be done separately.
However, when you have completed the previous assignment, feel free to
replace the src/parser/
directory by a copy of the one from your
previous assignment. This will be most satisfactory, as your code from
the first step will be used also in the second step. Do not forget to
add parser
to the SUBDIRS
variable in src/Makefile.am
(it must
be the first subdirectory there because it auto-generates files
used in other subdirectories) and in
src/parser/Makefile
in configure.ac
. However, you must
keep nodes.hh
as distributed with this lab, because it contains extra
visitors for the AST.
You might also want to propagate your evaluator code although this will not be needed.
Similarly, you can get the src/ast/
directory from the second lab
as long as you have implemented the binder, type checker and escaper.
Assembly production and IR verification
Piping your IR output to the llc
program will tell you
if there is a typing output in our IR (for example if you try to use a pointer instead
of an integer), and will output Intel assembly code corresponding to the computer
is it currently executing onto.
$ src/driver/dtiger -i --dump-ir test.tig | opt -mem2reg | llc
[…intel assembly output…]
If you are more familiar with ARM Thumb-2 instruction set, you can obtain it by using
llc -march=arm -mcpu=cortex-m4
for example.
You can combine this with opt -mem2reg
to see the effect of removing the unneeded
alloca
IR instructions:
$ src/driver/dtiger -i --dump-ir test.tig | opt -mem2reg | llc -march=arm -mcpu=cortex-m4
[…thumb2 assembly output…]
Code organization
The IR code generator is called with the -i
option. It is interesting to
also add the --dump-ir
option which provides a text dump of the generated IR.
The declarations of the IR code generator can be found in src/irgen/irgen.hh
.
The implementation is split in two parts:
src/irgen/irgen.cc
contains the code to generate new functions, stack frames, variable declarations, etc.src/irgen/irgen-visitor.cc
contains the code to transform the AST into IR.
For this lab, you are provided with an incomplete src/irgen/irgen-visitor.cc
that you will have to fill according to the instructions. Each visit
method
must return a llvm::Value *
which represents the result of the translated
expression. If the expression (or statement) has no useful return value,
visit
must return nullptr
.
LLVM Builder
A LLVM IRBuilder object helps you insert code into a basic block. It
remembers the block and the place within the block where it is currently
inserting code (called the insertion point). Our IRGenerator
class
contains a Builder
field for this purpose.
The IRBuilder contains a lot of methods designed to make code generation easier. For example, it can move the insertion point to the end of another basic block, it can generate a branch instruction to another block, or generate a conditional branch instruction to another block, etc.
▶ Read carefully the provided code and make sure you understand
how the visitor for the Sequence
AST node works, both for a non-empty and
an empty sequence.
Variables
▶ Using utility methods declared in irgen.hh
, implement the visitor
for VarDecl
nodes. Do not forget to initialize the variable content
with its initial value if it has one. Look at the IRBuilder provided
utility to store the initial value at the right address. Also, the
llvm::Value *
representing the variable address should be registered
into the allocations
map which maps variable declarations to their
addresses.
To validate your implementation, you can check that the following Tiger code
let var a := 1 in end
generates something like
define i32 @main() {
entry:
%a = alloca i32
br label %body
body: ; preds = %entry
store i32 1, i32* %a
ret i32 0
}
One can clearly identify the allocation of a i32
reserved space
onto the stack (whose address is stored into %a
) in the entry
block of the main function. In the body, the initial value 1
is stored at address %a
.
▶ Implement the visitor for Identifier
nodes.
The following code
let var a := 1 in print_int(a) end
should now also contain something like:
%0 = load i32, i32* %a
call void @__print_int(i32 %0)
showing that the value stored at %a
has been read into %0
and then passed to function print_int
.
▶ Implement the visitor for Assign
nodes.
The following code
let var a := 1 in a := 3; print_int(a) end
should now also contain something like:
store i32 1, i32* %a
store i32 3, i32* %a
%0 = load i32, i32* %a
call void @__print_int(i32 %0)
representing the successive assignments of variable a
.
Do not worry with the useless initial assignment; once
we will go through the LLVM optimizer, the useless
assignment to 1
will be removed from the final code
automatically and look like:
define i32 @main() local_unnamed_addr {
entry:
tail call void @__print_int(i32 3)
ret i32 0
}
You can see it by yourself by piping the result of the
dtiger
compiler through opt -O3 | llvm-dis
:
$ src/driver/dtiger -i --dump-ir test.tig | opt -O3 | llvm-dis
Note: you might need to add some LLVM-specific directory to your PATH
variable
in order to find the opt
and llvm-dis
tools.
Tests and branches
Translating a IfThenElse
AST node into IR is simple but
requires several steps. In Tiger, a if … then … else …
expression might return a value, as in
let var a := if 2 > 3 then 10 else 20 in … end
So the various steps needed to translate such a node are:
- Allocate stack space (in the entry block) to represent
the result (let’s call it
%result
). - Build three new basic blocks: the
if_then
block to represent thethen
alternative, theif_else
block to represent theelse
alternative, and theif_end
block as the join point where the execution will resume after theif … then … else …
. - Evaluate the test condition and either branch to the
if_then
block or to theif_else
block depending on the test evaluation (not 0 / 0). Do not forget that the conditional branch needs ai1
(boolean) as a parameter, not ai32
. You might want to look for theBuilder.CreateICmpNE()
helper function to compare your value to the constant 0. - In the
if_then
block, evaluate the body, assign its value to the%result
variable, and branch to theif_end
block. - Do the same thing for the
if_else
branch. - Set the current insertion point to be the
if_end
block so that the code generation continues from there. - Load and return the result stored in
%result
.
There is one pitfall however: if the if …
expression is of type void
,
then the %result
variable must not be
written to or read from (and nullptr
must be returned by visit
).
▶ Implement the visitor for IfThenElse
AST nodes.
Do not forget to test your implementation using tests with
and without else
, and tests returning either a value or
nothing.
Loops
▶ Implement the visitor for WhileLoop
without taking care
of Tiger break
statements at this time.
▶ Implement the visitor for Break
. You need to modify
the WhileLoop
and ForLoop
visitors in order to
associate the loop AST nodes with their exit blocks
using the loop_exit_bbs
map.