|
Course
Info
CS 179E: Project in Computer Science (Compilers)
Professor: Mohsen Lesani
Professor office hours: Email for appointment, online or in person
Discussions: Piazza website, Please check ilearn for the link
Prerequisites: CS 100 and CS 152 with grades of C or better; ENGR
180W; 8 additional upperĀdivision units in Computer Science
Units: 4
Suggested book: Andrew Appel with Jens Palsberg, Modern Compiler
Implementation in Java, 2nd edition. Cambridge University Press, 2002.
In this course, the students form groups and can choose to develop
their own compiler projects in the areas of distributed systems, blockchains and
cryptocurrencies, automatic program synthesis and programming languages. See the instructor homepage
for example projects. The students can choose the following predefined
and well-developed project. The project covers all the phases of the
design and implementation of a modern compiler for object-oriented
languages. The project is divided into four separate phases that can be
done independently. The phases can be combined into a Java-to-MIPS
compiler. All the required material including the test-case programs
that are used for grading, ample description, implementation hints and
documentation are available in this page.
The
goal of the course is to design and implement the main phases of a
modern compiler. The project consists of four phases. In these
phases, you will be converting a subset of the Java language called
MiniJava to simpler languages and eventually to complete MIPS machine
code. The phases can be
combined into a MiniJava-to-MIPS compiler.
This project was first designed and
implemented by Jens
Palsberg's group. It is adopted and extended by Mohsen Lesani.
Evaluation:
Submissions:
Phase 1: 25%
Phase 2: 25%
Phase 3: 25%
Phase 4: 25%
Note that the submission of each phase of the project should include a
document with the following secitons: (1) requirements and specifications,
(2) design, (3) testing and verification
Academic Integrity:
You may talk about the problem with fellow students, the TA, and the
instructor, consult books, papers, published material or the Web. However,
you must reference all the sources that helped you. You should write
solutions in your own words and coding.
Phases
Phase 1: Type-checking
Phase 2: Intermediate Code Generation for Object-oriented Languages
Phase 3: Register Allocation
Phase 4: Activation Records and Instruction Selection
Phase
1: Type-checking
Introduction
MiniJava
is a
subset of java that includes the bare minimum of Java: integers,
integer arrays, classes,
subclasses, and printing integers to standard out. It does not permit
any float types,
strings, overloading and overriding methods, or any interfaces. It has
a few other restrictions, but those
are minor. Thus, the MiniJava statement System.out.println(...); can
only print integers.
The MiniJava expression e.length
only applies to expressions of type int[].
A sample
MiniJava program is Factorial.java. Download
class Factorial{ public static void main(String[] a){ System.out.println(new Fac().ComputeFac(10)); } }
class Fac { public int ComputeFac(int num){ int num_aux; if (num < 1) num_aux = 1; else num_aux = num * (this.ComputeFac(num-1)); return num_aux; } }
Dependencies
We
need to have the following:
- JavaCC (Java Compiler Compiler) parser generator. Install javacc:
$ sudo apt-get
install javacc
Given a context free grammer, JavaCC generates a parser.
- JTB (Java Tree Builder) jar file jtb.jar: Download
JTB is a syntax tree builder to be used with JavaCC. Given
a JavaCC grammar file, it automatically generates syntax tree and
visitor classes and an annotated JavaCC grammer to build syntax trees.
- MiniJava Grammer for JavaCC minijava.jj:
Download
The context free grammer of MiniJava that is input to JTB and
JavaCC.
Here is a more readable representation of the MiniJava grammar.
- Test cases Phase1Tests.tar.gz:
Download
The test cases that we test our type checker with.
Generating AST
We
must first create a parser for the MiniJava language and generate a set
of syntax tree classes and visitor classes. For an overview of JTB, JavaCC and the Visitor pattern, study the following slides. Download
Complete the
following steps.
$ java -jar
/path/to/jtb.jar /path/to/minijava.jj
$ javacc
jtb.out.jj
Once this is done, you will have a complete parser for MiniJava and a
set of classes used for traversing the syntax tree. You will also have
two different default visitors: DepthFirstVisitor and GJDepthFirst. You
should extend these two visitors to type check a MiniJava program.
Type-checking
The
goal of phase 1 is to write a type checker for MiniJava. Given a
program, the type checker checks at compile time that type mismatch
does not happen at run time. It either approves or rejects the program.
The set of rules that the type checker checks are represented as a type
system. You can consult the book chapter 5 on semantic analysis and the
following MiniJava type system. Download
Submission
Your
main file should be called Typecheck.java,
and if P.java contains
a program to be
type checked, then:
java Typecheck < P.java
should
print either Program
type checked successfully or Type error.
Untar
the following tarball and
read the instructions in the ReadMe.txt
file.
Your
submission must work with the testing
script when run on the department servers. Make
sure you test your final tarball before submitting.
Phase1Tester.tar.gz
Download
Phase 2: Intermediate Code Generation for Object-oriented Languages
Introduction
In this phase, we translate programs in the MiniJava language
to
programs in the Vapor language. Vapor is a low level assembly-like
language. Vapor programs are described as a set of functions and data
segments. In contrast to architecutal assembly languages that support a
finite number of registers, Vapor functions can use an unbounded number
of variables. The main challenge in the translation from MiniJava to
Vapor is mapping objects to memory and object-oriented method calls to
function calls. Consult the book chapter 7 on Translation to
Intermediate Code and chapter 14 on Object-Oriented Languages.
Dependencies
We
need to have the following items. We had the first three in phase 1.
-
JavaCC (Java Compiler Compiler) parser generator. Install javacc:
$ sudo apt-get
install javacc
Given a context free grammer, JavaCC
generates a parser.
- JTB (Java Tree Builder) jar file jtb.jar: Download
JTB is a syntax tree builder to be used with JavaCC. Given
a JavaCC grammar file, it automatically generates syntax tree and
visitor classes and an annotated JavaCC grammer to build syntax trees.
-
MiniJava Grammer for JavaCC minijava.jj:
Download
The context free grammer of MiniJava that is input to JTB and
JavaCC.
Here is a more readable representation of the MiniJava
grammar.
-
Vapor Interpretor vapor.jar:
Download
- Test cases
Sample input programs Phase2Tests.tar.gz:
Download
Sample output programs Phase3Tests.tar.gz:
Download
Vapor Language Specification
Vapor program is a
list of
functions and data segments. For example:
const Table
2
3
5
7
func LoadTableData(offset)
addr = Add(:Table offset)
ret addr
func DoSomething(a b)
PrintInt(a)
addr = call :LoadTableData(b)
val = [addr+4]
ret val
Identifiers.
Identifiers are used for two things: variables and labels. Identifiers
consist of a sequence of letters, digits, dots (.) , and
underscores (_) , but the first character
cannot be a digit or a dot.
Variables are always local to a function; variable names must be unique
within a function.
There are three types of labels: data labels, function labels and code
labels. All label names must be unique across the entire program.
Data Segments.
Vapor has two types of global data segments. A const segment is for read-only
data (like virtual function tables). A var segment is for global
mutable data.
Each section starts with a data label and is followed by static data
values. For example:
const MinutesPerHour
60
var MyClass.FunctionTable
:MyClass.Start
:MyClass.Finish
-1
Each entry in a data segment is four bytes long. The entire first
segment is four bytes long and contains the 2's complement
representation of the number 60. It's a constant data segment and so
memory write operations will fail at runtime.
The second segment is twelve byte long and consists of the address of
the MyClass.Start
function, followed by the address of the MyClass.Finish
function, followed by the 2's complement representation of the number
-1. This is a variable data segment and can be written to at runtime.
The two data labels, MinutesPerHour
and MyClass.FunctionTable,
can be used in other places in the program.
All values (integers and addresses) are four bytes long.
Functions.
The syntax for a function definition is:
func FunctionLabel(params...)
body...
Each line of the body of a function is one of:
- code label: Label:
- assignment: Location = Value
- branch: if Value goto CodeAddress
- goto: goto CodeAddress
- function call: call FunctionAddress (Args...)
- function return: ret Value
- call to built-in operation: OpName
(Args...)
Assignment.
There are three distinct types of assignment. The first is variable
assignment:
Var = Value
Here, Value is either an
integer literal, a string literal, a variable name, or a label
reference :Label.
For example:
a =
12
Store the integer literal 12 into variable a.
a =
"abc" Store
the string abc into
variable a.
a =
sum
Copy the value in variable sum into variable a.
a =
:Factorial Store the address of
the label Factorial into
variable a.
For your homework, strings are only allowed as arguments to the Error built-in that we consider
below.
The next two types of assignment are memory load and memory store.
Memory operations always operate on 4-byte quantities and memory
addresses must be 4-byte aligned.
Var = MemoryReference
MemoryReference =
Value
A memory reference consists of a base address, which is either a label
reference or a register, followed by an integer offset (either positive
or negative).
[:MyArray+4] refers
to the address 4 bytes past the MyArray
label.
[x-4] refers to the address 4 bytes before the address stored in
variable x.
Some memory load and store examples:
x =
[:FunctionTable+8]
[:GlobalCounter] = 15
[array-4] = length
Branch
There are two variants of the branch instruction:
if Value goto :CodeLabel
if0 Value goto :CodeLabel
The if jumps to CodeLabel if Value is non-zero and falls
through to the next instruction otherwise. The if0 does the opposite, jumping
to the specified label if Value
is zero.
Goto
The goto
instruction is an unconditional jump to the specified target.
goto :CodeLabel
In addition to jumping to fixed labels, the goto instruction can also jump
to a computed address read in from a variable.
goto Var
For
this phase, goto can only refer to code
labels.
Function Call
Var = call :FunctionLabel
(Args...)
The assignment Var = is
optional. The arguements list Args...
is a whitespace-separated list of value entries (either integer
literals, variables, or label references). The return value of the
function is stored in the Var
variable.
Like goto,
call
can also use a function address loaded from a variable:
Var = call Var (Args...)
Function Return
The ret instruction
returns from a function. The return value is optional.
ret Value
ret
Built-In Operations
In addition to the core language, the Vapor interpreter also supports a
set of built-in operations for arithmetic, memory
allocation, and displaying output.
Basic arithmetic: Add, Sub, Mul, Div,
Rem, MulS, DivS, RemS, ShiftL,
ShiftR, ShiftLA. The "-S" variants operate on signed integers.
a = Add(a b)
Comparison: Eq,
Ne, Lt, Le, LtS, LeS. The "-S" variants operate on signed
integers.
Bitwise boolean operators: And, Or, Not.
HeapAlloc
and HeapAllocZ
take an integer - the number of bytes of
memory to allocate - and returns the address of newly-allocated memory.
The "-Z" variant also initalizes the memory to all zero.
a = HeapAlloc(20)
Output: PrintInt and PrintIntS print out unsigned
and signed integers,
respectively. PrintString prints
out strings. These do not return a
value.
PrintInt(13)
s = "Hello"
PrintString(s)
Error is for abnormal program termination (for errors like null
pointer
deferences, etc). It takes a string message to display to the user.
DebugPrint is only
for debugging. It accepts any number of values and
prints out the interpreter's internal representation of the value. This
can be useful for getting information about pointers.
Though
the full specification defines many built-in operations, for your
homework you are only allowed to use the following:
- Basic Arithmetic: Add, Sub, MulS
- Comparison: Eq, Lt, LtS
- Displaying Output: PrintIntS
- Memory Allocation: HeapAllocZ
- Error: Error
Translator
Given a MiniJava program as input, the translator should output an
equivalent Vapor program.
Consider
the factorial MiniJava program. Factorial.java
Download
class Factorial{ public static void main(String[] a){ System.out.println(new Fac().ComputeFac(10)); } }
class Fac { public int ComputeFac(int num){ int num_aux; if (num < 1) num_aux = 1; else num_aux = num * (this.ComputeFac(num-1)); return num_aux; } }
The translator should traslate it to a Vapor program like the
following. Factorial.vapor
Dowload
const vmt_Fac
:Fac.ComputeFac
func Main()
t.0 = HeapAllocZ(4)
[t.0] = :vmt_Fac
if t.0 goto :null1
Error("null pointer")
null1:
t.1 = [t.0]
t.1 = [t.1+0]
t.2 = call t.1(t.0 10)
PrintIntS(t.2)
ret
func Fac.ComputeFac(this num)
t.0 = LtS(num 1)
if0 t.0 goto :if1_else
num_aux
= 1
goto :if1_end
if1_else:
t.1 =
[this]
t.1 =
[t.1+0]
t.2 =
Sub(num 1)
t.3 = call t.1(this t.2)
num_aux
= MulS(num t.3)
if1_end:
ret num_aux
A Vapor program p.vapor
can be run using the Vapor interpreter vapor.jar as follows:
$ java -jar
vapor.jar run p.vapor
Hints
During the translation, for each class, we construct the two objects
class record and class v-table that keep information about the location
of the fields and methods. These object are incrementally built at the
beginning and used during the translation.
A class record maps field names to their offsets. Fields are assigned
offsets in the order that they are visited. The class record of a
subclass has a reference to the class record of the superclass and the
offsets of the former start from the size of the latter. When a field
access is translated to a memory operation, the class record is used to
determine the offset of the object memory that the memory operation
should access.
The class v-table maps method names of the class to their assigned
offsets. The v-table of a subclass is a copy of the v-table of
superclass that is extended for new methods of the subclass. It is also
good to keep a list of method names in the order of their offsets that
is used when the v-table is written.
From type-checking, we get the classes in an order where every subclass
comes after its superclass. The list is traversed in order and the
class record and v-table for each class is created as extensions of
those of its superclass.
Translation is done recursively using visitors. To generate the
translation, a printer class is used where indentation can be increased
and decreased on the entry and exit from scopes.
Identifiers can be declared with different types in different scopes.
Thus, a symbol table that maps identifiers to types is kept during the
translation. The type of an identifier is either class, method, field
or local variable. The symbol table keeps a stack of scopes. For
example at the beginning of a class, a new scope is pushed and the
implicit field "this", the fields and the methods of the class are
added to the scope. The main class is handled separately to generate
the "Main()" function.
In the translation of each class, its v-table is translated to a const
data segment.
In the translation of each (non-static) method, the implicit this
parameter is explicitly added to the parameters of the translated
function.
In the translation of statements, we create unique temporal variables
using a sequence number. Similarly, we can generate unique labels.
In the translation of an identifier either in an expression or as the
left hand side of an assignment, its type is retrieved from the symbol
table. If it is a field, the class type for "this" can be retrieved
from the symbol table, the class record can be retrieved from the class
type and the index of the field can be retrieved from the class
record. If the retrieved index of the field is i, the offset for the
memory access is statically computed as i * 4 + 4. This is because, a
word is four bytes and the first location of an object stores a
reference to the v-table.
In the translation of an array access or assignment, the index is
multiplied (using the MulS instruction) by 4 and then it is added
(using the Add instruction) to the base address. This is because words
are 4 bytes. Then, another 4 is added to it because the first word
stores the size of the array. Note that before these instructions, the
out-of-bounds condition should be checked. The size of the array is read
from the base address of the array and compared to the index. The array
length instruction is simply translated to an access to the array base
address.
s = [b] where b is the base address
ok = LtS(i, s) where i is the index if ok goto :l'
Error("Array index out of bounds")
l': ok = LtS(-1, i)
if ok goto :l
Error("Array index out of bounds")
l: o = MulS(i
4)
d = Add(b, o)
r = [d+4]
In the translation of the if statement, we label the beginning of the
else statement and the end and use the instruction if0 to goto to the
else label if the condition fails. We goto to the end label at the end
of the then statement. The while statement can be similarly translated
by labeling the start and the end.
In the translation of a method call, if the value of the receiver is
zero, a null pointer error is reported. The class type and then the v-table of
the receiver object can be retrieved from the symbol table. The offset of the method can be retrieved from the v-table.
vt =
[rc] where rc is the receiver
f = [vt +
d] where d = o * 4,
o is the offset of the method retrieved from the v-table
r = call f(rc
arg1 arg2)
In the translation of an object allocation, the class type and then
the class record can be retrieved from the symbol table. The allocation
size is 4 times the record size plus 4. This is because a word is four
bytes and the first word is used to store a reference to the v-table of
the object.
r =
HeapAllocZ(s) where s = rs * 4 + 4 and rs is
the record size
[r] =
:ClassVTableName
In the translation of array allocation, the size is multiplied (using
the MulS instruction) by 4 and added (using the Add instruction) by 4.
This is because a word is 4 bytes and the first word stores the array
size. The resulting size is passed to the HeapAllocZ instruction.
The and expression can be translated to two if0 instructions that check
if each of the operands is zero, assign zero to the result and goto the
end label. Before the end label, one is assigned to the result. The not
expression can be similarly translated.
Less than, addition, subtraction and multiplication operations are
directly mapped to LtS, Add, Sub, and MulS instructions.
True and false literals are translated to 1 and 0 and integer literals
are directly translated.
Submission
Your
main file should be called J2V.java, and if P.java contains a syntactically
correct MiniJava program, then
java J2V < P.java > P.vapor
creates
a Vapor program P.vapor with
the same behavior as P.java.
Untar
the following tarball
and
read the instructions in the ReadMe.txt
file. Your
submission must work with the testing
script when run on the department servers.
Make
sure you test your final tarball before submitting.
Phase2Tester.tar.gz
Download
Phase
3: Register Allocation
Introduction
In
this phase, we translate the Vapor language to the Vapor-M language. In
contrast to Vapor that provides local variables, Vapor-M provides
registers and stacks. The local variables should be mapped to registers
and run-time stack elements. Consult the book chapter 6 on Activation
Records, chapter 10 on Liveness Analysis and the following paper on
register allocation.
Massimiliano Poletto and Vivek Sarkar, Linear Scan Register
Allocation, ACM TOPLAS 21(5):895-913, 1999.
Dependencies
We
need to have the following items. We had the first item in phase 2.
-
Vapor Interpretor vapor.jar:
Download
-
Vapor Parser vapor-parser.jar:
Download
The source code vapor-parser-source.tar.gz
Download and
the java doc vapor-parser-javadoc.tar.gz
Download for
this parser are also available.
-
Test cases
Sample input programs Phase3Tests.tar.gz:
Download
Sample output programs Phase4Tests.tar.gz:
Download
Vapor-M Language Specification
A Vapor-M program is largely the
same as a Vapor program except that instead of local variables,
registers and stack memory are used.
Instead of local variables we have 23 registers: $s0..$s7, $t0..$t8, $a0..$a3, $v0, $v1.
Registers are global to all functions (whereas local variables were
local to a function activation).
To follow MIPS calling conventions, we use the registers as follows:
$s0..$s7: general
use callee-saved
$t0..$t8: general
use caller-saved
$a0..$a3: reserved
for argument passing
$v0: returning a
result from a call
$v0, $v1: can also
be used as temporary registers for loading values from the stack
The register $t9 is
reserved for the next assignment.
Each function has three stack arrays called in, out, and locals. The in and out arrays
are for passing arguments between functions. The in array
refers to the out
array of the caller. The local
array is for function-local storage that can be used for spilled
registers. The sizes of these arrays are declared at the top of every
function (instead of a parameter list).
Each element of each array is a 4-byte word. The indexes into the array
is the word-offset (not the byte offset). Array references can be used
wherever memory references can be used. So in[1]
refers to the second element of the in stack
array.
func Run [in 2, out 0, local 4]
$t1 = in[1]
local[3] = $t1
PrintString($t1)
$v0 = 1
ret
Translation
Consider
the factorial Vapor program Factorial.vapor.
Download
const vmt_Fac
:Fac.ComputeFac
func Main()
t.0 = HeapAllocZ(4)
[t.0] = :vmt_Fac
if t.0 goto :null1
Error("null pointer")
null1:
t.1 = [t.0]
t.1 = [t.1+0]
t.2 = call t.1(t.0 10)
PrintIntS(t.2)
ret
func Fac.ComputeFac(this num)
t.0 = LtS(num 1)
if0 t.0 goto :if1_else
num_aux
= 1
goto :if1_end
if1_else:
t.1 =
[this]
t.1 =
[t.1+0]
t.2 =
Sub(num 1)
t.3 = call t.1(this t.2)
num_aux
= MulS(num t.3)
if1_end:
ret num_aux
The translator should traslate it to a Vapor-M program like the
following. Factorial.vaporm.
Download
const vmt_Fac
:Fac.ComputeFac
func Main [in 0, out 0, local 0]
$t0 = HeapAllocZ(4)
[$t0] = :vmt_Fac
if $t0 goto :null1
Error("null pointer")
null1:
$t1 = [$t0]
$t1 = [$t1]
$a0 = $t0
$a1 = 10
call $t1
$t1 = $v0
PrintIntS($t1)
ret
func Fac.ComputeFac [in 0, out 0, local 1]
local[0] = $s0
$t0 = $a0
$s0 = $a1
$t1 = LtS($s0 1)
if0 $t1 goto :if1_else
$t1 = 1
goto :if1_end
if1_else:
$t2 = [$t0]
$t2 = [$t2]
$t3 = Sub($s0 1)
$a0 = $t0
$a1 = $t3
call $t2
$t3 = $v0
$t1 = MulS($s0 $t3)
if1_end:
$v0 = $t1
$s0 = local[0]
ret
The jar file vapor-parser.jar
includes both the parser and syntax tree classes for the Vapor
language. Compile your program against it using the -classpath option. A Vapor
program can be parsed to Vapor syntax tree using the Vapor parser vapor-parser.jar as follows. A
Vapor program will never contain the following AST nodes:
VVarRef.Register, VMemRef.Stack.
import cs132.util.ProblemException;
import
cs132.vapor.parser.VaporParser;
import cs132.vapor.ast.VaporProgram;
import cs132.vapor.ast.VBuiltIn.Op;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;
public static VaporProgram
parseVapor(InputStream in, PrintStream err) throws IOException {
Op[] ops = {
Op.Add,
Op.Sub, Op.MulS, Op.Eq, Op.Lt, Op.LtS,
Op.PrintIntS, Op.HeapAllocZ, Op.Error,
};
boolean allowLocals = true;
String[] registers =
null;
boolean allowStack = false;
VaporProgram tree;
try {
tree =
VaporParser.run(new InputStreamReader(in), 1, 1,
java.util.Arrays.asList(ops),
allowLocals, registers, allowStack);
}
catch (ProblemException ex) {
err.println(ex.getMessage());
return
null;
}
return tree;
}
A Vapor-M program p.vaporm
can be run using the Vapor interpreter vapor.jar as follows:
$ java -jar
vapor.jar run -mips p.vaporm
Hints
The data segments are simply translated to const segments.
Register allocation is applied to each function separately.
$a0..$a3: reserved
for argument passing
$s0..$s7: general
use callee-saved
$t0..$t8: general
use caller-saved
$v0: returning a
result from a call
The $t variables are temporary caller saved registers, while the $s
registers are callee saved. Each function saves the $s registers that
it will use at the beginning of the function and saves the $t registers
that it will need before calling another function.
Each frame for a function has three parts
In
The arguments of the call that are
spilled to the stack.
This section is sometimes considered to
be part of the caller stack.
Local
The local variables (that are spilled to
the stack)
The backup for the caller saved registers used
by the function Out
The backup for arguments passed by
registers
Arguments that are spilled on the stack
The size of these sections is the
maximum number of arguments of called
functions.
For each function, the data flow analysis results in the def, use, in,
and out sets of variables for each instruction. The set of active
variables at each instruction is the union of the def and in sets for
the instruction. The live intervals (used by the linear scan algorithm)
can be calculated using the active sets by a top-down scan of the
instructions.
The first parameters are mapped to the argument registers. The
parameters that do not fit in the argument registers are spilled to the
(in) stack. The argument registers that are not used for arguments are
added to the pool of registers used for register allocation. The pool always
includes $s and $t registers.
We use the linear scan algorithm for register allocation. See the
algorithm in the paper. The algorithm returns a mapping from variables
to locations and the number of variables spilled to the stack. A location
is either a register, a (local) stack offset or an (in) stack offset.
Note that for each parameter that is mapped to an argument register, an
interval starts from the first instruction. These intervals are marked
as fixed so that they are not spilled by the algorithm to the stack.
Once these intervals reach their end points, their argument registers
can be put back to the pool to be reused.
Using the allocation map resulted from the register allocation
algorithm and the out sets resulted from the liveness analysis, the
sets of registers that are used after an instruction can be calculated.
This set is later used to determine the registers that should be saved
before a function call.
On entry to a function
Saving the used saved registers $s
On exit from a function
Assigning the return value to $v0
Restoring the saved registers $s
Before a function call
Saving temporal registers $t that will be used after the
call
Saving the argument registers $a
Copying the arguments to the argument registers and the
stack
Note that if arguments of the caller should be
passed to callee, they should
be read from the stack as otherwise $a
registers would be both read and
written.
After a function call
Restoring temporal registers
Restoring argument registers
Some instructions accept only registers as operands. The registers $v0
and $v1 can also be used as temporary registers for loading values from
the stack.
An implementation-specific note is that during the parsing, unique
integer values are assigned to variables that we use to refer to variables
instead of names.
Submission
Your
main file should be called V2VM.java, and if P.vapor contains a syntactically
correct Vapor program, then
java V2VM < P.vapor > P.vaporm
creates
a Vapor-M program P.vaporm with
the same behavior as P.vapor
Untar
the following tarball
and
read the instructions in the ReadMe.txt
file. Your
submission must work with the testing
script when run on the department servers.
Make
sure you test your final tarball before submitting.
Phase3Tester.tar.gz
Download
Phase
4: Activation Records and Instruction Selection
Introduction
In this phase, the Vapor-M registers and stacks should be mapped to
MIPS registers and runtime stack. In addition, the Vapor-M instuctions
should be mapped to MIPS instructions.
Dependencies
We
need to have the following items. We had the first item in phase 3.
-
Vapor Parser vapor-parser.jar:
Download
The source code vapor-parser-source.tar.gz
Download and
the java doc vapor-parser-javadoc.tar.gz
Download for
this parser are also available.
-
Test cases
Sample input programs Phase4Tests.tar.gz:
Download
Sample output programs Phase5Tests.tar.gz:
Download
- MIPS Interpreter mars.jar
Download from the MARS project
Translation
The translator should traslate a Vapor-M program like Factorial.vaporm (Download) to a MIPS program like Factorial.s (Download)
The jar file vapor-parser.jar
includes both the parser and syntax tree classes for the Vapor-M
language. Compile your program against it using the -classpath option. A Vapor-M
program can be parsed to Vapor-M syntax tree using the Vapor parser vapor-parser.jar as follows. A
Vapor-M program will never contain the following AST node:
VVarRef.Local.
import cs132.util.ProblemException;
import
cs132.vapor.parser.VaporParser;
import cs132.vapor.ast.VaporProgram;
import cs132.vapor.ast.VBuiltIn.Op;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;
public static VaporProgram
parseVapor(InputStream in, PrintStream err) throws IOException {
Op[] ops = {
Op.Add,
Op.Sub, Op.MulS, Op.Eq, Op.Lt, Op.LtS,
Op.PrintIntS, Op.HeapAllocZ, Op.Error,
};
boolean allowLocals = false;
String[] registers = {
"v0", "v1",
"a0", "a1", "a2", "a3",
"t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
"s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
"t8",
};
boolean allowStack = true;
VaporProgram tree;
try {
tree =
VaporParser.run(new InputStreamReader(in), 1, 1,
java.util.Arrays.asList(ops),
allowLocals, registers, allowStack);
}
catch (ProblemException ex) {
err.println(ex.getMessage());
return
null;
}
return tree;
}
A MIPS program P.s can be
run as
$ java -jar mars.jar nc
P.s
Running without any arguments will bring up the simulator GUI, which
will let you step forward and backward through your program.
Hints
MIPS snippet
.data
var1: .word 5
table:
Label1
Label2
.text
jal Main
li $v0 10 # syscall: exit
syscall
_print:
li $v0 1 # syscall: print integer
syscall
la $a0 _newline # address of string in memory
li $v0 4 # syscall: print string
syscall
jr $ra
_error:
li $v0 4 # syscall: print string
syscall
li $v0 10 # syscall: exit
syscall
_heapAlloc:
li $v0 9 # syscall: sbrk
syscall # address in $v0
jr $ra
.data
.align 0
_newline: .asciiz "\n"
_str0: .asciiz "null pointer\n"
Common instructions
li reg, value # load immediate
load immediate value into destination
register
la reg, label # load address
load address of the label to the register
lw $t2, ($t0) # load word
lw $t2, 4($t0)
load word at $t0 to $t2
load word at (St0 + 4) to $t2
sw reg, label # store word
store word reg to location label
move reg1, reg2
copy the value of reg1 to reg2
jal label # jump and link
copy program counter (return address) to
register $ra
jump to program statement at label
jalr reg # jump to reg and link
copy program counter (return address) to
register $ra
jump to program statement at the address
in reg
j label # jump
jump to label
jr reg # jump register
jump to return address in reg
addi $t2, $t3, 5 # add immediate
$t2 = $t3 + 5
add $t0, $t1, $t2 # add
$t0 = $t1 + $t2 add as signed (2's
complement) integers
sub $t2, $t3, $t4 # subtract
$t2 = $t3 - $t4
addiu $t, $s, imm
Adds $s and a sign-extended immediate
value and stores the result in $t
addu $t1, $t6, $t7 # add as unsigned integers
$t1 = $t6 + $t7
subu reg1, reg2, reg3 # subtract as unsigned
integers
reg1 = reg2 - reg3
mult reg1, reg2 # multiply
multiply 32-bit quantities in reg1 and
reg2, and store 64-bit
result in special registers Lo and Hi:
(Hi, Lo) = reg1 * reg2
slt $d, $s, $t # set on less than
If $s is less than $t, $d is set to one.
It gets zero otherwise.
beqz reg, label # branch if equal to zero
branch to label if the value reg is zero
bnez reg, label # branch if not equal to zero
branch to label if the value reg is not
zero
The stack grows from higher addresses down to lower
addresses.
Higher address
----------------------
|
----------------------
|
$fp -> | In / Caller Out
----------------------
| ra
----------------------
| fp
----------------------
|
| Local
----------------------
|
$sp -> | Out
----------------------
| ra
----------------------
| fp
----------------------
Callee frame
Lower address
Note that each word is 4 bytes and literals below should be multiplied
by 4.
On entry to a function
Save $fp to location $sp - 2
Move $fp to $sp
Pushing the frame
Decrease $sp by size = Local + Out + 2
1 for Return address
1 for frame pointer
Saving the return register at $fp - 1 (if the function
calls any other function)
Note that $fp now holds the old $sp
On exit from a function
Restoring the return register $ra (if the function calls
any other function)
from $fp - 1
Restore $fp from $fp - 2
Popping the frame
Decreasing $sp by size = Local + Out + 2
Jumping to the return register
The register $t9 can be used to load values for instructions
that
accept only registers.
Submission
Your
main file should be called VM2M.java, and if P.vaporm contains a syntactically
correct Vapor-M program, then
java VM2M < P.vaporm > P.s
creates
a MIPS program P.s with
the same behavior as P.vaporm.
Untar
the following tarball
and
read the instructions in the ReadMe.txt
file. Your
submission must work with the testing
script when run on the department servers.
Make
sure you test your final tarball before submitting.
Phase4Tester.tar.gz
Download
|
|