Summer 2022: In this course, the students form groups and can develop their own compiler projects in the areas of distributed systems, blockchains and cryptocurrencies, automatic program synthesis and programming languages. One choice is the following project. It is a well-designed project with template code, hints and test cases. The students can also define and do other projects in this area. For project ideas you can see the instructor's page.

Compiler from Java to MIPS

in Four Nifty Steps

	Course Info CS 179E: Project in Computer Science (Compilers) Professor: Mohsen Lesani Professor office hours: Email for appointment, online or in person Discussions: Piazza website, Please check ilearn for the link Prerequisites: CS 100 and CS 152 with grades of C or better; ENGR 180W; 8 additional upperdivision units in Computer Science Units: 4 Suggested book: Andrew Appel with Jens Palsberg, Modern Compiler Implementation in Java, 2nd edition. Cambridge University Press, 2002. In this course, the students form groups and can choose to develop their own compiler projects in the areas of distributed systems, blockchains and cryptocurrencies, automatic program synthesis and programming languages. See the instructor homepage for example projects. The students can choose the following predefined and well-developed project. The project covers all the phases of the design and implementation of a modern compiler for object-oriented languages. The project is divided into four separate phases that can be done independently. The phases can be combined into a Java-to-MIPS compiler. All the required material including the test-case programs that are used for grading, ample description, implementation hints and documentation are available in this page. The goal of the course is to design and implement the main phases of a modern compiler. The project consists of four phases. In these phases, you will be converting a subset of the Java language called MiniJava to simpler languages and eventually to complete MIPS machine code. The phases can be combined into a MiniJava-to-MIPS compiler. This project was first designed and implemented by Jens Palsberg's group. It is adopted and extended by Mohsen Lesani. Evaluation: Submissions: Phase 1: 25% Phase 2: 25% Phase 3: 25% Phase 4: 25% Note that the submission of each phase of the project should include a document with the following secitons: (1) requirements and specifications, (2) design, (3) testing and verification Academic Integrity: You may talk about the problem with fellow students, the TA, and the instructor, consult books, papers, published material or the Web. However, you must reference all the sources that helped you. You should write solutions in your own words and coding. Phases Phase 1: Type-checking Phase 2: Intermediate Code Generation for Object-oriented Languages Phase 3: Register Allocation Phase 4: Activation Records and Instruction Selection Phase 1: Type-checking Introduction MiniJava is a subset of java that includes the bare minimum of Java: integers, integer arrays, classes, subclasses, and printing integers to standard out. It does not permit any float types, strings, overloading and overriding methods, or any interfaces. It has a few other restrictions, but those are minor. Thus, the MiniJava statement System.out.println(...); can only print integers. The MiniJava expression e.length only applies to expressions of type int[]. A sample MiniJava program is Factorial.java. Download class Factorial{ public static void main(String[] a){ System.out.println(new Fac().ComputeFac(10)); } } class Fac { public int ComputeFac(int num){ int num_aux; if (num < 1) num_aux = 1; else num_aux = num * (this.ComputeFac(num-1)); return num_aux; } } Dependencies We need to have the following: - JavaCC (Java Compiler Compiler) parser generator. Install javacc: $ sudo apt-get install javacc Given a context free grammer, JavaCC generates a parser. - JTB (Java Tree Builder) jar file jtb.jar: Download JTB is a syntax tree builder to be used with JavaCC. Given a JavaCC grammar file, it automatically generates syntax tree and visitor classes and an annotated JavaCC grammer to build syntax trees. - MiniJava Grammer for JavaCC minijava.jj: Download The context free grammer of MiniJava that is input to JTB and JavaCC. Here is a more readable representation of the MiniJava grammar. - Test cases Phase1Tests.tar.gz: Download The test cases that we test our type checker with. Generating AST We must first create a parser for the MiniJava language and generate a set of syntax tree classes and visitor classes. For an overview of JTB, JavaCC and the Visitor pattern, study the following slides. Download Complete the following steps. $ java -jar /path/to/jtb.jar /path/to/minijava.jj $ javacc jtb.out.jj Once this is done, you will have a complete parser for MiniJava and a set of classes used for traversing the syntax tree. You will also have two different default visitors: DepthFirstVisitor and GJDepthFirst. You should extend these two visitors to type check a MiniJava program. Type-checking The goal of phase 1 is to write a type checker for MiniJava. Given a program, the type checker checks at compile time that type mismatch does not happen at run time. It either approves or rejects the program. The set of rules that the type checker checks are represented as a type system. You can consult the book chapter 5 on semantic analysis and the following MiniJava type system. Download Submission Your main file should be called Typecheck.java, and if `P.java` contains a program to be type checked, then: `java Typecheck < P.java` should print either Program type checked successfully or Type error. Untar the following tarball and read the instructions in the ReadMe.txt file. Your submission must work with the testing script when run on the department servers. Make sure you test your final tarball before submitting. Phase1Tester.tar.gz Download Phase 2: Intermediate Code Generation for Object-oriented Languages Introduction In this phase, we translate programs in the MiniJava language to programs in the Vapor language. Vapor is a low level assembly-like language. Vapor programs are described as a set of functions and data segments. In contrast to architecutal assembly languages that support a finite number of registers, Vapor functions can use an unbounded number of variables. The main challenge in the translation from MiniJava to Vapor is mapping objects to memory and object-oriented method calls to function calls. Consult the book chapter 7 on Translation to Intermediate Code and chapter 14 on Object-Oriented Languages. Dependencies We need to have the following items. We had the first three in phase 1. - JavaCC (Java Compiler Compiler) parser generator. Install javacc: $ sudo apt-get install javacc Given a context free grammer, JavaCC generates a parser. - JTB (Java Tree Builder) jar file jtb.jar: Download JTB is a syntax tree builder to be used with JavaCC. Given a JavaCC grammar file, it automatically generates syntax tree and visitor classes and an annotated JavaCC grammer to build syntax trees. - MiniJava Grammer for JavaCC minijava.jj: Download The context free grammer of MiniJava that is input to JTB and JavaCC. Here is a more readable representation of the MiniJava grammar. - Vapor Interpretor vapor.jar: Download - Test cases Sample input programs Phase2Tests.tar.gz: Download Sample output programs Phase3Tests.tar.gz: Download Vapor Language Specification Vapor program is a list of functions and data segments. For example: const Table 2 3 5 7 func LoadTableData(offset) addr = Add(:Table offset) ret addr func DoSomething(a b) PrintInt(a) addr = call :LoadTableData(b) val = [addr+4] ret val Identifiers. Identifiers are used for two things: variables and labels. Identifiers consist of a sequence of letters, digits, dots (.) , and underscores (_) , but the first character cannot be a digit or a dot. Variables are always local to a function; variable names must be unique within a function. There are three types of labels: data labels, function labels and code labels. All label names must be unique across the entire program. Data Segments. Vapor has two types of global data segments. A const segment is for read-only data (like virtual function tables). A var segment is for global mutable data. Each section starts with a data label and is followed by static data values. For example: const MinutesPerHour 60 var MyClass.FunctionTable :MyClass.Start :MyClass.Finish -1 Each entry in a data segment is four bytes long. The entire first segment is four bytes long and contains the 2's complement representation of the number 60. It's a constant data segment and so memory write operations will fail at runtime. The second segment is twelve byte long and consists of the address of the MyClass.Start function, followed by the address of the MyClass.Finish function, followed by the 2's complement representation of the number -1. This is a variable data segment and can be written to at runtime. The two data labels, MinutesPerHour and MyClass.FunctionTable, can be used in other places in the program. All values (integers and addresses) are four bytes long. Functions. The syntax for a function definition is: func FunctionLabel(params...) body... Each line of the body of a function is one of: - code label: Label: - assignment: Location = Value - branch: if Value goto CodeAddress - goto: goto CodeAddress - function call: call FunctionAddress (Args...) - function return: ret Value - call to built-in operation: OpName (Args...) Assignment. There are three distinct types of assignment. The first is variable assignment: Var = Value Here, Value is either an integer literal, a string literal, a variable name, or a label reference :Label. For example: a = 12 Store the integer literal 12 into variable a. a = "abc" Store the string abc into variable a. a = sum Copy the value in variable sum into variable a. a = :Factorial Store the address of the label Factorial into variable a. For your homework, strings are only allowed as arguments to the `Error` built-in that we consider below. The next two types of assignment are memory load and memory store. Memory operations always operate on 4-byte quantities and memory addresses must be 4-byte aligned. Var = MemoryReference MemoryReference = Value A memory reference consists of a base address, which is either a label reference or a register, followed by an integer offset (either positive or negative). [:MyArray+4] refers to the address 4 bytes past the MyArray label. [x-4] refers to the address 4 bytes before the address stored in variable x. Some memory load and store examples: x = [:FunctionTable+8] [:GlobalCounter] = 15 [array-4] = length Branch There are two variants of the branch instruction: if Value goto :CodeLabel if0 Value goto :CodeLabel The if jumps to CodeLabel if Value is non-zero and falls through to the next instruction otherwise. The if0 does the opposite, jumping to the specified label if Value is zero. Goto The goto instruction is an unconditional jump to the specified target. goto :CodeLabel In addition to jumping to fixed labels, the goto instruction can also jump to a computed address read in from a variable. goto Var For this phase, `goto` can only refer to code labels. Function Call Var = call :FunctionLabel (Args...) The assignment Var = is optional. The arguements list Args... is a whitespace-separated list of value entries (either integer literals, variables, or label references). The return value of the function is stored in the Var variable. Like goto, call can also use a function address loaded from a variable: Var = call Var (Args...) Function Return The ret instruction returns from a function. The return value is optional. ret Value ret Built-In Operations In addition to the core language, the Vapor interpreter also supports a set of built-in operations for arithmetic, memory allocation, and displaying output. Basic arithmetic: Add, Sub, Mul, Div, Rem, MulS, DivS, RemS, ShiftL, ShiftR, ShiftLA. The "-S" variants operate on signed integers. a = Add(a b) Comparison: Eq, Ne, Lt, Le, LtS, LeS. The "-S" variants operate on signed integers. Bitwise boolean operators: And, Or, Not. HeapAlloc and HeapAllocZ take an integer - the number of bytes of memory to allocate - and returns the address of newly-allocated memory. The "-Z" variant also initalizes the memory to all zero. a = HeapAlloc(20) Output: PrintInt and PrintIntS print out unsigned and signed integers, respectively. PrintString prints out strings. These do not return a value. PrintInt(13) s = "Hello" PrintString(s) Error is for abnormal program termination (for errors like null pointer deferences, etc). It takes a string message to display to the user. DebugPrint is only for debugging. It accepts any number of values and prints out the interpreter's internal representation of the value. This can be useful for getting information about pointers. Though the full specification defines many built-in operations, for your homework you are only allowed to use the following: Basic Arithmetic: `Add`, `Sub`, `MulS` Comparison: `Eq`, `Lt`, `LtS` Displaying Output: `PrintIntS` Memory Allocation: `HeapAllocZ` Error: `Error` Translator Given a MiniJava program as input, the translator should output an equivalent Vapor program. Consider the factorial MiniJava program. Factorial.java Download class Factorial{ public static void main(String[] a){ System.out.println(new Fac().ComputeFac(10)); } } class Fac { public int ComputeFac(int num){ int num_aux; if (num < 1) num_aux = 1; else num_aux = num * (this.ComputeFac(num-1)); return num_aux; } } The translator should traslate it to a Vapor program like the following. Factorial.vapor Dowload const vmt_Fac :Fac.ComputeFac func Main() t.0 = HeapAllocZ(4) [t.0] = :vmt_Fac if t.0 goto :null1 Error("null pointer") null1: t.1 = [t.0] t.1 = [t.1+0] t.2 = call t.1(t.0 10) PrintIntS(t.2) ret func Fac.ComputeFac(this num) t.0 = LtS(num 1) if0 t.0 goto :if1_else num_aux = 1 goto :if1_end if1_else: t.1 = [this] t.1 = [t.1+0] t.2 = Sub(num 1) t.3 = call t.1(this t.2) num_aux = MulS(num t.3) if1_end: ret num_aux A Vapor program p.vapor can be run using the Vapor interpreter vapor.jar as follows: $ java -jar vapor.jar run p.vapor Hints During the translation, for each class, we construct the two objects class record and class v-table that keep information about the location of the fields and methods. These object are incrementally built at the beginning and used during the translation. A class record maps field names to their offsets. Fields are assigned offsets in the order that they are visited. The class record of a subclass has a reference to the class record of the superclass and the offsets of the former start from the size of the latter. When a field access is translated to a memory operation, the class record is used to determine the offset of the object memory that the memory operation should access. The class v-table maps method names of the class to their assigned offsets. The v-table of a subclass is a copy of the v-table of superclass that is extended for new methods of the subclass. It is also good to keep a list of method names in the order of their offsets that is used when the v-table is written. From type-checking, we get the classes in an order where every subclass comes after its superclass. The list is traversed in order and the class record and v-table for each class is created as extensions of those of its superclass. Translation is done recursively using visitors. To generate the translation, a printer class is used where indentation can be increased and decreased on the entry and exit from scopes. Identifiers can be declared with different types in different scopes. Thus, a symbol table that maps identifiers to types is kept during the translation. The type of an identifier is either class, method, field or local variable. The symbol table keeps a stack of scopes. For example at the beginning of a class, a new scope is pushed and the implicit field "this", the fields and the methods of the class are added to the scope. The main class is handled separately to generate the "Main()" function. In the translation of each class, its v-table is translated to a const data segment. In the translation of each (non-static) method, the implicit this parameter is explicitly added to the parameters of the translated function. In the translation of statements, we create unique temporal variables using a sequence number. Similarly, we can generate unique labels. In the translation of an identifier either in an expression or as the left hand side of an assignment, its type is retrieved from the symbol table. If it is a field, the class type for "this" can be retrieved from the symbol table, the class record can be retrieved from the class type and the index of the field can be retrieved from the class record. If the retrieved index of the field is i, the offset for the memory access is statically computed as i * 4 + 4. This is because, a word is four bytes and the first location of an object stores a reference to the v-table. In the translation of an array access or assignment, the index is multiplied (using the MulS instruction) by 4 and then it is added (using the Add instruction) to the base address. This is because words are 4 bytes. Then, another 4 is added to it because the first word stores the size of the array. Note that before these instructions, the out-of-bounds condition should be checked. The size of the array is read from the base address of the array and compared to the index. The array length instruction is simply translated to an access to the array base address. s = [b] where b is the base address ok = LtS(i, s) where i is the index if ok goto :l' Error("Array index out of bounds") l': ok = LtS(-1, i) if ok goto :l Error("Array index out of bounds") l: o = MulS(i 4) d = Add(b, o) r = [d+4] In the translation of the if statement, we label the beginning of the else statement and the end and use the instruction if0 to goto to the else label if the condition fails. We goto to the end label at the end of the then statement. The while statement can be similarly translated by labeling the start and the end. In the translation of a method call, if the value of the receiver is zero, a null pointer error is reported. The class type and then the v-table of the receiver object can be retrieved from the symbol table. The offset of the method can be retrieved from the v-table. vt = [rc] where rc is the receiver f = [vt + d] where d = o * 4, o is the offset of the method retrieved from the v-table r = call f(rc arg1 arg2) In the translation of an object allocation, the class type and then the class record can be retrieved from the symbol table. The allocation size is 4 times the record size plus 4. This is because a word is four bytes and the first word is used to store a reference to the v-table of the object. r = HeapAllocZ(s) where s = rs * 4 + 4 and rs is the record size [r] = :ClassVTableName In the translation of array allocation, the size is multiplied (using the MulS instruction) by 4 and added (using the Add instruction) by 4. This is because a word is 4 bytes and the first word stores the array size. The resulting size is passed to the HeapAllocZ instruction. The and expression can be translated to two if0 instructions that check if each of the operands is zero, assign zero to the result and goto the end label. Before the end label, one is assigned to the result. The not expression can be similarly translated. Less than, addition, subtraction and multiplication operations are directly mapped to LtS, Add, Sub, and MulS instructions. True and false literals are translated to 1 and 0 and integer literals are directly translated. Submission Your main file should be called `J2V.java`, and if `P.java` contains a syntactically correct MiniJava program, then `java J2V < P.java > P.vapor` creates a Vapor program `P.vapor` with the same behavior as `P.java`. Untar the following tarball and read the instructions in the ReadMe.txt file. Your submission must work with the testing script when run on the department servers. Make sure you test your final tarball before submitting. Phase2Tester.tar.gz Download Phase 3: Register Allocation Introduction In this phase, we translate the Vapor language to the Vapor-M language. In contrast to Vapor that provides local variables, Vapor-M provides registers and stacks. The local variables should be mapped to registers and run-time stack elements. Consult the book chapter 6 on Activation Records, chapter 10 on Liveness Analysis and the following paper on register allocation. Massimiliano Poletto and Vivek Sarkar, Linear Scan Register Allocation, ACM TOPLAS 21(5):895-913, 1999. Dependencies We need to have the following items. We had the first item in phase 2. - Vapor Interpretor vapor.jar: Download - Vapor Parser vapor-parser.jar: Download The source code vapor-parser-source.tar.gz Download and the java doc vapor-parser-javadoc.tar.gz Download for this parser are also available. - Test cases Sample input programs Phase3Tests.tar.gz: Download Sample output programs Phase4Tests.tar.gz: Download Vapor-M Language Specification A Vapor-M program is largely the same as a Vapor program except that instead of local variables, registers and stack memory are used. Instead of local variables we have 23 registers: $s0..$s7, $t0..$t8, $a0..$a3, $v0, $v1. Registers are global to all functions (whereas local variables were local to a function activation). To follow MIPS calling conventions, we use the registers as follows: $s0..$s7: general use callee-saved $t0..$t8: general use caller-saved $a0..$a3: reserved for argument passing $v0: returning a result from a call $v0, $v1: can also be used as temporary registers for loading values from the stack The register $t9 is reserved for the next assignment. Each function has three stack arrays called in, out, and locals. The in and out arrays are for passing arguments between functions. The in array refers to the out array of the caller. The local array is for function-local storage that can be used for spilled registers. The sizes of these arrays are declared at the top of every function (instead of a parameter list). Each element of each array is a 4-byte word. The indexes into the array is the word-offset (not the byte offset). Array references can be used wherever memory references can be used. So in[1] refers to the second element of the in stack array. func Run [in 2, out 0, local 4] $t1 = in[1] local[3] = $t1 PrintString($t1) $v0 = 1 ret Translation Consider the factorial Vapor program Factorial.vapor. Download const vmt_Fac :Fac.ComputeFac func Main() t.0 = HeapAllocZ(4) [t.0] = :vmt_Fac if t.0 goto :null1 Error("null pointer") null1: t.1 = [t.0] t.1 = [t.1+0] t.2 = call t.1(t.0 10) PrintIntS(t.2) ret func Fac.ComputeFac(this num) t.0 = LtS(num 1) if0 t.0 goto :if1_else num_aux = 1 goto :if1_end if1_else: t.1 = [this] t.1 = [t.1+0] t.2 = Sub(num 1) t.3 = call t.1(this t.2) num_aux = MulS(num t.3) if1_end: ret num_aux The translator should traslate it to a Vapor-M program like the following. Factorial.vaporm. Download const vmt_Fac :Fac.ComputeFac func Main [in 0, out 0, local 0] $t0 = HeapAllocZ(4) [$t0] = :vmt_Fac if $t0 goto :null1 Error("null pointer") null1: $t1 = [$t0] $t1 = [$t1] $a0 = $t0 $a1 = 10 call $t1 $t1 = $v0 PrintIntS($t1) ret func Fac.ComputeFac [in 0, out 0, local 1] local[0] = $s0 $t0 = $a0 $s0 = $a1 $t1 = LtS($s0 1) if0 $t1 goto :if1_else $t1 = 1 goto :if1_end if1_else: $t2 = [$t0] $t2 = [$t2] $t3 = Sub($s0 1) $a0 = $t0 $a1 = $t3 call $t2 $t3 = $v0 $t1 = MulS($s0 $t3) if1_end: $v0 = $t1 $s0 = local[0] ret The jar file vapor-parser.jar includes both the parser and syntax tree classes for the Vapor language. Compile your program against it using the -classpath option. A Vapor program can be parsed to Vapor syntax tree using the Vapor parser vapor-parser.jar as follows. A Vapor program will never contain the following AST nodes: VVarRef.Register, VMemRef.Stack. import cs132.util.ProblemException; import cs132.vapor.parser.VaporParser; import cs132.vapor.ast.VaporProgram; import cs132.vapor.ast.VBuiltIn.Op; import java.io.InputStreamReader; import java.io.IOException; import java.io.PrintStream; public static VaporProgram parseVapor(InputStream in, PrintStream err) throws IOException { Op[] ops = { Op.Add, Op.Sub, Op.MulS, Op.Eq, Op.Lt, Op.LtS, Op.PrintIntS, Op.HeapAllocZ, Op.Error, }; boolean allowLocals = true; String[] registers = null; boolean allowStack = false; VaporProgram tree; try { tree = VaporParser.run(new InputStreamReader(in), 1, 1, java.util.Arrays.asList(ops), allowLocals, registers, allowStack); } catch (ProblemException ex) { err.println(ex.getMessage()); return null; } return tree; } A Vapor-M program p.vaporm can be run using the Vapor interpreter vapor.jar as follows: $ java -jar vapor.jar run -mips p.vaporm Hints The data segments are simply translated to const segments. Register allocation is applied to each function separately. $a0..$a3: reserved for argument passing $s0..$s7: general use callee-saved $t0..$t8: general use caller-saved $v0: returning a result from a call The $t variables are temporary caller saved registers, while the $s registers are callee saved. Each function saves the $s registers that it will use at the beginning of the function and saves the $t registers that it will need before calling another function. Each frame for a function has three parts In The arguments of the call that are spilled to the stack. This section is sometimes considered to be part of the caller stack. Local The local variables (that are spilled to the stack) The backup for the caller saved registers used by the function Out The backup for arguments passed by registers Arguments that are spilled on the stack The size of these sections is the maximum number of arguments of called functions. For each function, the data flow analysis results in the def, use, in, and out sets of variables for each instruction. The set of active variables at each instruction is the union of the def and in sets for the instruction. The live intervals (used by the linear scan algorithm) can be calculated using the active sets by a top-down scan of the instructions. The first parameters are mapped to the argument registers. The parameters that do not fit in the argument registers are spilled to the (in) stack. The argument registers that are not used for arguments are added to the pool of registers used for register allocation. The pool always includes $s and $t registers. We use the linear scan algorithm for register allocation. See the algorithm in the paper. The algorithm returns a mapping from variables to locations and the number of variables spilled to the stack. A location is either a register, a (local) stack offset or an (in) stack offset. Note that for each parameter that is mapped to an argument register, an interval starts from the first instruction. These intervals are marked as fixed so that they are not spilled by the algorithm to the stack. Once these intervals reach their end points, their argument registers can be put back to the pool to be reused. Using the allocation map resulted from the register allocation algorithm and the out sets resulted from the liveness analysis, the sets of registers that are used after an instruction can be calculated. This set is later used to determine the registers that should be saved before a function call. On entry to a function Saving the used saved registers $s On exit from a function Assigning the return value to $v0 Restoring the saved registers $s Before a function call Saving temporal registers $t that will be used after the call Saving the argument registers $a Copying the arguments to the argument registers and the stack Note that if arguments of the caller should be passed to callee, they should be read from the stack as otherwise $a registers would be both read and written. After a function call Restoring temporal registers Restoring argument registers Some instructions accept only registers as operands. The registers $v0 and $v1 can also be used as temporary registers for loading values from the stack. An implementation-specific note is that during the parsing, unique integer values are assigned to variables that we use to refer to variables instead of names. Submission Your main file should be called `V2VM.java`, and if `P.vapor` contains a syntactically correct Vapor program, then `java V2VM < P.vapor > P.vaporm` creates a Vapor-M program `P.vaporm` with the same behavior as `P.vapor` Untar the following tarball and read the instructions in the ReadMe.txt file. Your submission must work with the testing script when run on the department servers. Make sure you test your final tarball before submitting. Phase3Tester.tar.gz Download Phase 4: Activation Records and Instruction Selection Introduction In this phase, the Vapor-M registers and stacks should be mapped to MIPS registers and runtime stack. In addition, the Vapor-M instuctions should be mapped to MIPS instructions. Dependencies We need to have the following items. We had the first item in phase 3. - Vapor Parser vapor-parser.jar: Download The source code vapor-parser-source.tar.gz Download and the java doc vapor-parser-javadoc.tar.gz Download for this parser are also available. - Test cases Sample input programs Phase4Tests.tar.gz: Download Sample output programs Phase5Tests.tar.gz: Download - MIPS Interpreter mars.jar Download from the MARS project Translation The translator should traslate a Vapor-M program like Factorial.vaporm (Download) to a MIPS program like Factorial.s (Download) The jar file vapor-parser.jar includes both the parser and syntax tree classes for the Vapor-M language. Compile your program against it using the -classpath option. A Vapor-M program can be parsed to Vapor-M syntax tree using the Vapor parser vapor-parser.jar as follows. A Vapor-M program will never contain the following AST node: VVarRef.Local. import cs132.util.ProblemException; import cs132.vapor.parser.VaporParser; import cs132.vapor.ast.VaporProgram; import cs132.vapor.ast.VBuiltIn.Op; import java.io.InputStreamReader; import java.io.IOException; import java.io.PrintStream; public static VaporProgram parseVapor(InputStream in, PrintStream err) throws IOException { Op[] ops = { Op.Add, Op.Sub, Op.MulS, Op.Eq, Op.Lt, Op.LtS, Op.PrintIntS, Op.HeapAllocZ, Op.Error, }; boolean allowLocals = false; String[] registers = { "v0", "v1", "a0", "a1", "a2", "a3", "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", "t8", }; boolean allowStack = true; VaporProgram tree; try { tree = VaporParser.run(new InputStreamReader(in), 1, 1, java.util.Arrays.asList(ops), allowLocals, registers, allowStack); } catch (ProblemException ex) { err.println(ex.getMessage()); return null; } return tree; } A MIPS program P.s can be run as $ java -jar mars.jar nc P.s Running without any arguments will bring up the simulator GUI, which will let you step forward and backward through your program. Hints MIPS snippet .data var1: .word 5 table: Label1 Label2 .text jal Main li $v0 10 # syscall: exit syscall _print: li $v0 1 # syscall: print integer syscall la $a0 _newline # address of string in memory li $v0 4 # syscall: print string syscall jr $ra _error: li $v0 4 # syscall: print string syscall li $v0 10 # syscall: exit syscall _heapAlloc: li $v0 9 # syscall: sbrk syscall # address in $v0 jr $ra .data .align 0 _newline: .asciiz "\n" _str0: .asciiz "null pointer\n" Common instructions li reg, value # load immediate load immediate value into destination register la reg, label # load address load address of the label to the register lw $t2, ($t0) # load word lw $t2, 4($t0) load word at $t0 to $t2 load word at (St0 + 4) to $t2 sw reg, label # store word store word reg to location label move reg1, reg2 copy the value of reg1 to reg2 jal label # jump and link copy program counter (return address) to register $ra jump to program statement at label jalr reg # jump to reg and link copy program counter (return address) to register $ra jump to program statement at the address in reg j label # jump jump to label jr reg # jump register jump to return address in reg addi $t2, $t3, 5 # add immediate $t2 = $t3 + 5 add $t0, $t1, $t2 # add $t0 = $t1 + $t2 add as signed (2's complement) integers sub $t2, $t3, $t4 # subtract $t2 = $t3 - $t4 addiu $t, $s, imm Adds $s and a sign-extended immediate value and stores the result in $t addu $t1, $t6, $t7 # add as unsigned integers $t1 = $t6 + $t7 subu reg1, reg2, reg3 # subtract as unsigned integers reg1 = reg2 - reg3 mult reg1, reg2 # multiply multiply 32-bit quantities in reg1 and reg2, and store 64-bit result in special registers Lo and Hi: (Hi, Lo) = reg1 * reg2 slt $d, $s, $t # set on less than If $s is less than $t, $d is set to one. It gets zero otherwise. beqz reg, label # branch if equal to zero branch to label if the value reg is zero bnez reg, label # branch if not equal to zero branch to label if the value reg is not zero The stack grows from higher addresses down to lower addresses. Higher address ---------------------- \| ---------------------- \| $fp -> \| In / Caller Out ---------------------- \| ra ---------------------- \| fp ---------------------- \| \| Local ---------------------- \| $sp -> \| Out ---------------------- \| ra ---------------------- \| fp ---------------------- Callee frame Lower address Note that each word is 4 bytes and literals below should be multiplied by 4. On entry to a function Save $fp to location $sp - 2 Move $fp to $sp Pushing the frame Decrease $sp by size = Local + Out + 2 1 for Return address 1 for frame pointer Saving the return register at $fp - 1 (if the function calls any other function) Note that $fp now holds the old $sp On exit from a function Restoring the return register $ra (if the function calls any other function) from $fp - 1 Restore $fp from $fp - 2 Popping the frame Decreasing $sp by size = Local + Out + 2 Jumping to the return register The register $t9 can be used to load values for instructions that accept only registers. Submission Your main file should be called `VM2M.java`, and if `P.vaporm` contains a syntactically correct Vapor-M program, then `java VM2M < P.vaporm > P.s` creates a MIPS program `P.s` with the same behavior as `P.vaporm`. Untar the following tarball and read the instructions in the ReadMe.txt file. Your submission must work with the testing script when run on the department servers. Make sure you test your final tarball before submitting. Phase4Tester.tar.gz Download

Course Info

CS 179E: Project in Computer Science (Compilers)
Professor: Mohsen Lesani
Professor office hours: Email for appointment, online or in person
Discussions: Piazza website, Please check ilearn for the link
Prerequisites: CS 100 and CS 152 with grades of C or better; ENGR 180W; 8 additional upperdivision units in Computer Science
Units: 4
Suggested book: Andrew Appel with Jens Palsberg, Modern Compiler Implementation in Java, 2nd edition. Cambridge University Press, 2002.

In this course, the students form groups and can choose to develop their own compiler projects in the areas of distributed systems, blockchains and cryptocurrencies, automatic program synthesis and programming languages. See the instructor homepage for example projects. The students can choose the following predefined and well-developed project. The project covers all the phases of the design and implementation of a modern compiler for object-oriented languages. The project is divided into four separate phases that can be done independently. The phases can be combined into a Java-to-MIPS compiler. All the required material including the test-case programs that are used for grading, ample description, implementation hints and documentation are available in this page.

The goal of the course is to design and implement the main phases of a modern compiler. The project consists of four phases. In these phases, you will be converting a subset of the Java language called MiniJava to simpler languages and eventually to complete MIPS machine code. The phases can be combined into a MiniJava-to-MIPS compiler.

This project was first designed and implemented by Jens Palsberg's group. It is adopted and extended by Mohsen Lesani.

Evaluation:
    Submissions:
        Phase 1: 25%
        Phase 2: 25%
        Phase 3: 25%
        Phase 4: 25%

         Note that the submission of each phase of the project should include a
         document with the following secitons: (1) requirements and specifications,
         (2) design, (3) testing and verification

    Academic Integrity:
         You may talk about the problem with fellow students, the TA, and the
         instructor, consult books, papers, published material or the Web. However,
         you must reference all the sources that helped you. You should write
         solutions in your own words and coding.

Phases

   Phase 1: Type-checking
   Phase 2: Intermediate Code Generation for Object-oriented Languages
   Phase 3: Register Allocation
   Phase 4: Activation Records and Instruction Selection

Phase 1: Type-checking

Introduction

MiniJava is a subset of java that includes the bare minimum of Java: integers, integer arrays, classes, subclasses, and printing integers to standard out. It does not permit any float types, strings, overloading and overriding methods, or any interfaces. It has a few other restrictions, but those are minor. Thus, the MiniJava statement System.out.println(...); can only print integers. The MiniJava expression e.length only applies to expressions of type int[].

A sample MiniJava program is Factorial.java. Download

class Factorial{
   public static void main(String[] a){
      System.out.println(new Fac().ComputeFac(10));
   }
}

class Fac {
   public int ComputeFac(int num){
      int num_aux;
      if (num < 1)
         num_aux = 1;
      else
         num_aux = num * (this.ComputeFac(num-1));
      return num_aux;
   }
}

Dependencies

We need to have the following:
- JavaCC (Java Compiler Compiler) parser generator. Install javacc:
$ sudo apt-get install javacc
Given a context free grammer, JavaCC generates a parser.

- JTB (Java Tree Builder) jar file jtb.jar: Download
JTB is a syntax tree builder to be used with JavaCC. Given a JavaCC grammar file, it automatically generates syntax tree and visitor classes and an annotated JavaCC grammer to build syntax trees.

- MiniJava Grammer for JavaCC minijava.jj: Download
The context free grammer of MiniJava that is input to JTB and JavaCC.
Here is a more readable representation of the MiniJava grammar.

- Test cases Phase1Tests.tar.gz: Download
The test cases that we test our type checker with.

Generating AST

We must first create a parser for the MiniJava language and generate a set of syntax tree classes and visitor classes. For an overview of JTB, JavaCC and the Visitor pattern, study the following slides. Download

Complete the following steps.

$ java -jar /path/to/jtb.jar /path/to/minijava.jj

$ javacc jtb.out.jj

Once this is done, you will have a complete parser for MiniJava and a set of classes used for traversing the syntax tree. You will also have two different default visitors: DepthFirstVisitor and GJDepthFirst. You should extend these two visitors to type check a MiniJava program.

Type-checking

The goal of phase 1 is to write a type checker for MiniJava. Given a program, the type checker checks at compile time that type mismatch does not happen at run time. It either approves or rejects the program. The set of rules that the type checker checks are represented as a type system. You can consult the book chapter 5 on semantic analysis and the following MiniJava type system. Download

Submission

Your main file should be called Typecheck.java, and if P.java contains a program to be type checked, then:

java Typecheck < P.java

should print either Program type checked successfully or Type error.

Untar the following tarball and read the instructions in the ReadMe.txt file.
Your submission must work with the testing script when run on the department servers. Make sure you test your final tarball before submitting.

Phase1Tester.tar.gz Download

Phase 2: Intermediate Code Generation for Object-oriented Languages

Introduction

In this phase, we translate programs in the MiniJava language to programs in the Vapor language. Vapor is a low level assembly-like language. Vapor programs are described as a set of functions and data segments. In contrast to architecutal assembly languages that support a finite number of registers, Vapor functions can use an unbounded number of variables. The main challenge in the translation from MiniJava to Vapor is mapping objects to memory and object-oriented method calls to function calls. Consult the book chapter 7 on Translation to Intermediate Code and chapter 14 on Object-Oriented Languages.

Dependencies

We need to have the following items. We had the first three in phase 1.

- JavaCC (Java Compiler Compiler) parser generator. Install javacc:
$ sudo apt-get install javacc
Given a context free grammer, JavaCC generates a parser.

- JTB (Java Tree Builder) jar file jtb.jar: Download
JTB is a syntax tree builder to be used with JavaCC. Given a JavaCC grammar file, it automatically generates syntax tree and visitor classes and an annotated JavaCC grammer to build syntax trees.

- MiniJava Grammer for JavaCC minijava.jj: Download
The context free grammer of MiniJava that is input to JTB and JavaCC.
Here is a more readable representation of the MiniJava grammar.

- Vapor Interpretor vapor.jar: Download

- Test cases
Sample input programs Phase2Tests.tar.gz: Download
Sample output programs Phase3Tests.tar.gz: Download

Vapor Language Specification

Vapor program is a list of functions and data segments. For example:

const Table
2
3
5
7

   func LoadTableData(offset)
addr = Add(:Table offset)
    ret addr

func DoSomething(a b)
    PrintInt(a)
addr = call :LoadTableData(b)
val = [addr+4]
    ret val

Identifiers.
Identifiers are used for two things: variables and labels. Identifiers consist of a sequence of letters, digits, dots (.) , and underscores (_) , but the first character cannot be a digit or a dot.

Variables are always local to a function; variable names must be unique within a function.

There are three types of labels: data labels, function labels and code labels. All label names must be unique across the entire program.

Data Segments.
Vapor has two types of global data segments. A const segment is for read-only data (like virtual function tables). A var segment is for global mutable data.

Each section starts with a data label and is followed by static data values. For example:

const MinutesPerHour
60

var MyClass.FunctionTable
:MyClass.Start
:MyClass.Finish
-1

Each entry in a data segment is four bytes long. The entire first segment is four bytes long and contains the 2's complement representation of the number 60. It's a constant data segment and so memory write operations will fail at runtime.

The second segment is twelve byte long and consists of the address of the MyClass.Start function, followed by the address of the MyClass.Finish function, followed by the 2's complement representation of the number -1. This is a variable data segment and can be written to at runtime.

The two data labels, MinutesPerHour and MyClass.FunctionTable, can be used in other places in the program.

All values (integers and addresses) are four bytes long.

Functions.
The syntax for a function definition is:

func FunctionLabel(params...)
body...

Each line of the body of a function is one of:
- code label: Label:
- assignment: Location = Value
- branch: if Value goto CodeAddress
- goto: goto CodeAddress
- function call: call FunctionAddress (Args...)
- function return: ret Value
- call to built-in operation: OpName (Args...)

Assignment.
There are three distinct types of assignment. The first is variable assignment:

Var = Value
Here, Value is either an integer literal, a string literal, a variable name, or a label reference :Label.

For example:
a = 12              Store the integer literal 12 into variable a.
a = "abc"           Store the string abc into variable a.
a = sum             Copy the value in variable sum into variable a.
a = :Factorial      Store the address of the label Factorial into variable a.

For your homework, strings are only allowed as arguments to the Error built-in that we consider below.

The next two types of assignment are memory load and memory store. Memory operations always operate on 4-byte quantities and memory addresses must be 4-byte aligned.

Var = MemoryReference
MemoryReference = Value

A memory reference consists of a base address, which is either a label reference or a register, followed by an integer offset (either positive or negative).
[:MyArray+4] refers to the address 4 bytes past the MyArray label.
[x-4] refers to the address 4 bytes before the address stored in variable x.

Some memory load and store examples:
x = [:FunctionTable+8]
[:GlobalCounter] = 15
[array-4] = length

Branch
There are two variants of the branch instruction:

if Value goto :CodeLabel
if0 Value goto :CodeLabel

The if jumps to CodeLabel if Value is non-zero and falls through to the next instruction otherwise. The if0 does the opposite, jumping to the specified label if Value is zero.

Goto
The goto instruction is an unconditional jump to the specified target.

goto :CodeLabel

In addition to jumping to fixed labels, the goto instruction can also jump to a computed address read in from a variable.

goto Var

For this phase, goto can only refer to code labels.

Function Call
Var = call :FunctionLabel (Args...)

The assignment Var = is optional. The arguements list Args... is a whitespace-separated list of value entries (either integer literals, variables, or label references). The return value of the function is stored in the Var variable.

Like goto, call can also use a function address loaded from a variable:

Var = call Var (Args...)

Function Return
The ret instruction returns from a function. The return value is optional.

ret Value
ret

Built-In Operations
In addition to the core language, the Vapor interpreter also supports a set of built-in operations for arithmetic, memory allocation, and displaying output.

Basic arithmetic: Add, Sub, Mul, Div, Rem, MulS, DivS, RemS, ShiftL, ShiftR, ShiftLA. The "-S" variants operate on signed integers.

a = Add(a b)

Comparison: Eq, Ne, Lt, Le, LtS, LeS. The "-S" variants operate on signed integers.

Bitwise boolean operators: And, Or, Not.

HeapAlloc and HeapAllocZ take an integer - the number of bytes of memory to allocate - and returns the address of newly-allocated memory. The "-Z" variant also initalizes the memory to all zero.

a = HeapAlloc(20)

Output: PrintInt and PrintIntS print out unsigned and signed integers, respectively. PrintString prints out strings. These do not return a value.

PrintInt(13)
s = "Hello"
PrintString(s)

Error is for abnormal program termination (for errors like null pointer deferences, etc). It takes a string message to display to the user.

DebugPrint is only for debugging. It accepts any number of values and prints out the interpreter's internal representation of the value. This can be useful for getting information about pointers.

Though the full specification defines many built-in operations, for your homework you are only allowed to use the following:

Basic Arithmetic: Add, Sub, MulS
Comparison: Eq, Lt, LtS
Displaying Output: PrintIntS
Memory Allocation: HeapAllocZ
Error: Error

Translator

Given a MiniJava program as input, the translator should output an equivalent Vapor program.

Consider the factorial MiniJava program. Factorial.java Download

class Factorial{
   public static void main(String[] a){
      System.out.println(new Fac().ComputeFac(10));
   }
}

class Fac {
   public int ComputeFac(int num){
      int num_aux;
      if (num < 1)
         num_aux = 1;
      else
         num_aux = num * (this.ComputeFac(num-1));
      return num_aux;
   }
}

The translator should traslate it to a Vapor program like the following. Factorial.vapor Dowload

const vmt_Fac
:Fac.ComputeFac

func Main()
t.0 = HeapAllocZ(4)
[t.0] = :vmt_Fac
if t.0 goto :null1
Error("null pointer")
null1:
t.1 = [t.0]
t.1 = [t.1+0]
t.2 = call t.1(t.0 10)
PrintIntS(t.2)
ret

func Fac.ComputeFac(this num)
t.0 = LtS(num 1)
if0 t.0 goto :if1_else
    num_aux = 1
    goto :if1_end
if1_else:
    t.1 = [this]
    t.1 = [t.1+0]
    t.2 = Sub(num 1)
    t.3 = call t.1(this t.2)
    num_aux = MulS(num t.3)
if1_end:
ret num_aux

A Vapor program p.vapor can be run using the Vapor interpreter vapor.jar as follows:

$ java -jar vapor.jar run p.vapor

Hints

During the translation, for each class, we construct the two objects class record and class v-table that keep information about the location of the fields and methods. These object are incrementally built at the beginning and used during the translation.

A class record maps field names to their offsets. Fields are assigned offsets in the order that they are visited. The class record of a subclass has a reference to the class record of the superclass and the offsets of the former start from the size of the latter. When a field access is translated to a memory operation, the class record is used to determine the offset of the object memory that the memory operation should access.

The class v-table maps method names of the class to their assigned offsets. The v-table of a subclass is a copy of the v-table of superclass that is extended for new methods of the subclass. It is also good to keep a list of method names in the order of their offsets that is used when the v-table is written.

From type-checking, we get the classes in an order where every subclass comes after its superclass. The list is traversed in order and the class record and v-table for each class is created as extensions of those of its superclass.

Translation is done recursively using visitors. To generate the translation, a printer class is used where indentation can be increased and decreased on the entry and exit from scopes.

Identifiers can be declared with different types in different scopes. Thus, a symbol table that maps identifiers to types is kept during the translation. The type of an identifier is either class, method, field or local variable. The symbol table keeps a stack of scopes. For example at the beginning of a class, a new scope is pushed and the implicit field "this", the fields and the methods of the class are added to the scope. The main class is handled separately to generate the "Main()" function.

In the translation of each class, its v-table is translated to a const data segment.

In the translation of each (non-static) method, the implicit this parameter is explicitly added to the parameters of the translated function.

In the translation of statements, we create unique temporal variables using a sequence number. Similarly, we can generate unique labels.

In the translation of an identifier either in an expression or as the left hand side of an assignment, its type is retrieved from the symbol table. If it is a field, the class type for "this" can be retrieved from the symbol table, the class record can be retrieved from the class type and the index of the field can be retrieved from the class record. If the retrieved index of the field is i, the offset for the memory access is statically computed as i * 4 + 4. This is because, a word is four bytes and the first location of an object stores a reference to the v-table.

In the translation of an array access or assignment, the index is multiplied (using the MulS instruction) by 4 and then it is added (using the Add instruction) to the base address. This is because words are 4 bytes. Then, another 4 is added to it because the first word stores the size of the array. Note that before these instructions, the out-of-bounds condition should be checked. The size of the array is read from the base address of the array and compared to the index. The array length instruction is simply translated to an access to the array base address.

      s = [b]          where b is the base address
      ok = LtS(i, s)   where i is the index
      if ok goto :l'
      Error("Array index out of bounds")
l': ok = LtS(-1, i)
      if ok goto :l
      Error("Array index out of bounds")
   l: o = MulS(i 4)
      d = Add(b, o)
      r = [d+4]

In the translation of the if statement, we label the beginning of the else statement and the end and use the instruction if0 to goto to the else label if the condition fails. We goto to the end label at the end of the then statement. The while statement can be similarly translated by labeling the start and the end.

In the translation of a method call, if the value of the receiver is zero, a null pointer error is reported. The class type and then the v-table of the receiver object can be retrieved from the symbol table. The offset of the method can be retrieved from the v-table.

   vt = [rc]      where rc is the receiver
   f = [vt + d]   where d = o * 4,
                        o is the offset of the method retrieved from the v-table
   r = call f(rc arg1 arg2)

In the translation of an object allocation, the class type and then the class record can be retrieved from the symbol table. The allocation size is 4 times the record size plus 4. This is because a word is four bytes and the first word is used to store a reference to the v-table of the object.

   r = HeapAllocZ(s)   where s = rs * 4 + 4 and rs is the record size
   [r] = :ClassVTableName

In the translation of array allocation, the size is multiplied (using the MulS instruction) by 4 and added (using the Add instruction) by 4. This is because a word is 4 bytes and the first word stores the array size. The resulting size is passed to the HeapAllocZ instruction.

The and expression can be translated to two if0 instructions that check if each of the operands is zero, assign zero to the result and goto the end label. Before the end label, one is assigned to the result. The not expression can be similarly translated.

Less than, addition, subtraction and multiplication operations are directly mapped to LtS, Add, Sub, and MulS instructions.

True and false literals are translated to 1 and 0 and integer literals are directly translated.

Submission

Your main file should be called J2V.java, and if P.java contains a syntactically correct MiniJava program, then

java J2V < P.java > P.vapor

creates a Vapor program P.vapor with the same behavior as P.java.

Untar the following tarball and read the instructions in the ReadMe.txt file. Your submission must work with the testing script when run on the department servers. Make sure you test your final tarball before submitting.

Phase2Tester.tar.gz Download

Phase 3: Register Allocation

Introduction

In this phase, we translate the Vapor language to the Vapor-M language. In contrast to Vapor that provides local variables, Vapor-M provides registers and stacks. The local variables should be mapped to registers and run-time stack elements. Consult the book chapter 6 on Activation Records, chapter 10 on Liveness Analysis and the following paper on register allocation.

Massimiliano Poletto and Vivek Sarkar, Linear Scan Register Allocation, ACM TOPLAS 21(5):895-913, 1999.

Dependencies

We need to have the following items. We had the first item in phase 2.

- Vapor Interpretor vapor.jar: Download

- Vapor Parser vapor-parser.jar: Download
The source code vapor-parser-source.tar.gz Download and
the java doc vapor-parser-javadoc.tar.gz Download for this parser are also available.

- Test cases
Sample input programs Phase3Tests.tar.gz: Download
Sample output programs Phase4Tests.tar.gz: Download

Vapor-M Language Specification

A Vapor-M program is largely the same as a Vapor program except that instead of local variables, registers and stack memory are used.

Instead of local variables we have 23 registers: $s0..$s7, $t0..$t8, $a0..$a3, $v0, $v1.
Registers are global to all functions (whereas local variables were local to a function activation).
To follow MIPS calling conventions, we use the registers as follows:
$s0..$s7: general use callee-saved
$t0..$t8: general use caller-saved
$a0..$a3: reserved for argument passing
$v0: returning a result from a call
$v0, $v1: can also be used as temporary registers for loading values from the stack

The register $t9 is reserved for the next assignment.

Each function has three stack arrays called in, out, and locals. The in and out arrays are for passing arguments between functions. The in array refers to the out array of the caller. The local array is for function-local storage that can be used for spilled registers. The sizes of these arrays are declared at the top of every function (instead of a parameter list).

Each element of each array is a 4-byte word. The indexes into the array is the word-offset (not the byte offset). Array references can be used wherever memory references can be used. So in[1] refers to the second element of the in stack array.

func Run [in 2, out 0, local 4]
$t1 = in[1]
local[3] = $t1
PrintString($t1)
$v0 = 1
ret

Translation

Consider the factorial Vapor program Factorial.vapor. Download

const vmt_Fac
:Fac.ComputeFac

func Main()
t.0 = HeapAllocZ(4)
[t.0] = :vmt_Fac
if t.0 goto :null1
Error("null pointer")
null1:
t.1 = [t.0]
t.1 = [t.1+0]
t.2 = call t.1(t.0 10)
PrintIntS(t.2)
ret

func Fac.ComputeFac(this num)
t.0 = LtS(num 1)
if0 t.0 goto :if1_else
    num_aux = 1
    goto :if1_end
if1_else:
    t.1 = [this]
    t.1 = [t.1+0]
    t.2 = Sub(num 1)
    t.3 = call t.1(this t.2)
    num_aux = MulS(num t.3)
if1_end:
ret num_aux

The translator should traslate it to a Vapor-M program like the following. Factorial.vaporm. Download

const vmt_Fac
:Fac.ComputeFac

func Main [in 0, out 0, local 0]
$t0 = HeapAllocZ(4)
[$t0] = :vmt_Fac
if $t0 goto :null1
Error("null pointer")
null1:
$t1 = [$t0]
$t1 = [$t1]
$a0 = $t0
$a1 = 10
call $t1
$t1 = $v0
PrintIntS($t1)
ret

func Fac.ComputeFac [in 0, out 0, local 1]
local[0] = $s0
$t0 = $a0
$s0 = $a1
$t1 = LtS($s0 1)
if0 $t1 goto :if1_else
$t1 = 1
goto :if1_end
if1_else:
$t2 = [$t0]
$t2 = [$t2]
$t3 = Sub($s0 1)
$a0 = $t0
$a1 = $t3
call $t2
$t3 = $v0
$t1 = MulS($s0 $t3)
if1_end:
$v0 = $t1
$s0 = local[0]
ret

The jar file vapor-parser.jar includes both the parser and syntax tree classes for the Vapor language. Compile your program against it using the -classpath option. A Vapor program can be parsed to Vapor syntax tree using the Vapor parser vapor-parser.jar as follows. A Vapor program will never contain the following AST nodes: VVarRef.Register, VMemRef.Stack.

import cs132.util.ProblemException;
import cs132.vapor.parser.VaporParser;
import cs132.vapor.ast.VaporProgram;
import cs132.vapor.ast.VBuiltIn.Op;

import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;

public static VaporProgram parseVapor(InputStream in, PrintStream err) throws IOException {
Op[] ops = {
    Op.Add, Op.Sub, Op.MulS, Op.Eq, Op.Lt, Op.LtS,
    Op.PrintIntS, Op.HeapAllocZ, Op.Error,
};
boolean allowLocals = true;
String[] registers = null;
boolean allowStack = false;

VaporProgram tree;
try {
    tree = VaporParser.run(new InputStreamReader(in), 1, 1,
                           java.util.Arrays.asList(ops),
                         allowLocals, registers, allowStack);
}
catch (ProblemException ex) {
    err.println(ex.getMessage());
    return null;
}

return tree;
}

A Vapor-M program p.vaporm can be run using the Vapor interpreter vapor.jar as follows:

$ java -jar vapor.jar run -mips p.vaporm

Hints

The data segments are simply translated to const segments.

Register allocation is applied to each function separately.

$a0..$a3: reserved for argument passing
$s0..$s7: general use callee-saved
$t0..$t8: general use caller-saved
$v0: returning a result from a call

The $t variables are temporary caller saved registers, while the $s registers are callee saved. Each function saves the $s registers that it will use at the beginning of the function and saves the $t registers that it will need before calling another function.

Each frame for a function has three parts
   In
      The arguments of the call that are spilled to the stack.
      This section is sometimes considered to be part of the caller stack.
   Local
      The local variables (that are spilled to the stack)
      The backup for the caller saved registers used by the function
   Out
      The backup for arguments passed by registers
      Arguments that are spilled on the stack
      The size of these sections is the maximum number of arguments of called
      functions.

For each function, the data flow analysis results in the def, use, in, and out sets of variables for each instruction. The set of active variables at each instruction is the union of the def and in sets for the instruction. The live intervals (used by the linear scan algorithm) can be calculated using the active sets by a top-down scan of the instructions.

The first parameters are mapped to the argument registers. The parameters that do not fit in the argument registers are spilled to the (in) stack. The argument registers that are not used for arguments are added to the pool of registers used for register allocation. The pool always includes $s and $t registers.

We use the linear scan algorithm for register allocation. See the algorithm in the paper. The algorithm returns a mapping from variables to locations and the number of variables spilled to the stack. A location is either a register, a (local) stack offset or an (in) stack offset.
Note that for each parameter that is mapped to an argument register, an interval starts from the first instruction. These intervals are marked as fixed so that they are not spilled by the algorithm to the stack. Once these intervals reach their end points, their argument registers can be put back to the pool to be reused.

Using the allocation map resulted from the register allocation algorithm and the out sets resulted from the liveness analysis, the sets of registers that are used after an instruction can be calculated. This set is later used to determine the registers that should be saved before a function call.

On entry to a function
   Saving the used saved registers $s

On exit from a function
   Assigning the return value to $v0
   Restoring the saved registers $s

Before a function call
   Saving temporal registers $t that will be used after the call
   Saving the argument registers $a
   Copying the arguments to the argument registers and the stack
      Note that if arguments of the caller should be passed to callee, they should
      be read from the stack as otherwise $a registers would be both read and
      written.

After a function call
   Restoring temporal registers
   Restoring argument registers

Some instructions accept only registers as operands. The registers $v0 and $v1 can also be used as temporary registers for loading values from the stack.

An implementation-specific note is that during the parsing, unique integer values are assigned to variables that we use to refer to variables instead of names.

Submission

Your main file should be called V2VM.java, and if P.vapor contains a syntactically correct Vapor program, then

java V2VM < P.vapor > P.vaporm

creates a Vapor-M program P.vaporm with the same behavior as P.vapor

Phase3Tester.tar.gz Download

Phase 4: Activation Records and Instruction Selection

Introduction

In this phase, the Vapor-M registers and stacks should be mapped to MIPS registers and runtime stack. In addition, the Vapor-M instuctions should be mapped to MIPS instructions.

Dependencies

We need to have the following items. We had the first item in phase 3.

- Vapor Parser vapor-parser.jar: Download
The source code vapor-parser-source.tar.gz Download and
the java doc vapor-parser-javadoc.tar.gz Download for this parser are also available.

- Test cases
Sample input programs Phase4Tests.tar.gz: Download
Sample output programs Phase5Tests.tar.gz: Download

- MIPS Interpreter mars.jar Download from the MARS project

Translation

The translator should traslate a Vapor-M program like Factorial.vaporm (Download) to a MIPS program like Factorial.s (Download)

The jar file vapor-parser.jar includes both the parser and syntax tree classes for the Vapor-M language. Compile your program against it using the -classpath option. A Vapor-M program can be parsed to Vapor-M syntax tree using the Vapor parser vapor-parser.jar as follows. A Vapor-M program will never contain the following AST node: VVarRef.Local.

import cs132.util.ProblemException;
import cs132.vapor.parser.VaporParser;
import cs132.vapor.ast.VaporProgram;
import cs132.vapor.ast.VBuiltIn.Op;

import java.io.InputStreamReader;
import java.io.IOException;
import java.io.PrintStream;

public static VaporProgram parseVapor(InputStream in, PrintStream err) throws IOException {
Op[] ops = {
    Op.Add, Op.Sub, Op.MulS, Op.Eq, Op.Lt, Op.LtS,
    Op.PrintIntS, Op.HeapAllocZ, Op.Error,
};

boolean allowLocals = false;
String[] registers = {
    "v0", "v1",
    "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",
};
boolean allowStack = true;

VaporProgram tree;
try {
    tree = VaporParser.run(new InputStreamReader(in), 1, 1,
                           java.util.Arrays.asList(ops),
                         allowLocals, registers, allowStack);
}
catch (ProblemException ex) {
    err.println(ex.getMessage());
    return null;
}

return tree;
}

A MIPS program P.s can be run as

$ java -jar mars.jar nc P.s

Running without any arguments will bring up the simulator GUI, which will let you step forward and backward through your program.

Hints

MIPS snippet
   .data
      var1: .word 5
      table:
         Label1
         Label2

   .text
      jal Main
      li $v0 10   # syscall: exit
      syscall

      _print:
         li $v0 1   # syscall: print integer
         syscall
         la $a0 _newline   # address of string in memory
         li $v0 4   # syscall: print string
         syscall
         jr $ra
      _error:
         li $v0 4   # syscall: print string
         syscall
         li $v0 10 # syscall: exit
         syscall
      _heapAlloc:
         li $v0 9   # syscall: sbrk
         syscall    # address in $v0
         jr $ra
      .data
      .align 0
         _newline: .asciiz "\n"
         _str0: .asciiz "null pointer\n"

Common instructions
   li reg, value   # load immediate
      load immediate value into destination register
   la reg, label   # load address
      load address of the label to the register
   lw $t2, ($t0)   # load word
   lw $t2, 4($t0)
      load word at $t0 to $t2
      load word at (St0 + 4) to $t2
   sw reg, label   # store word
      store word reg to location label
   move reg1, reg2
      copy the value of reg1 to reg2
   jal label   # jump and link
      copy program counter (return address) to register $ra
      jump to program statement at label
   jalr reg   # jump to reg and link
      copy program counter (return address) to register $ra
      jump to program statement at the address in reg
   j label   # jump
      jump to label
   jr reg   # jump register
      jump to return address in reg
   addi $t2, $t3, 5   # add immediate
      $t2 = $t3 + 5
   add $t0, $t1, $t2   # add
      $t0 = $t1 + $t2 add as signed (2's complement) integers
   sub $t2, $t3, $t4   # subtract
      $t2 = $t3 - $t4
   addiu $t, $s, imm
      Adds $s and a sign-extended immediate value and stores the result in $t
   addu $t1, $t6, $t7   # add as unsigned integers
      $t1 = $t6 + $t7
   subu reg1, reg2, reg3   # subtract as unsigned integers
      reg1 = reg2 - reg3
   mult reg1, reg2   # multiply
      multiply 32-bit quantities in reg1 and reg2, and store 64-bit
      result in special registers Lo and Hi: (Hi, Lo) = reg1 * reg2
   slt $d, $s, $t   # set on less than
      If $s is less than $t, $d is set to one. It gets zero otherwise.
   beqz reg, label   # branch if equal to zero
      branch to label if the value reg is zero
   bnez reg, label   # branch if not equal to zero
      branch to label if the value reg is not zero

The stack grows from higher addresses down to lower addresses.

Higher address
         ----------------------
         |
         ----------------------
         |
$fp ->   | In / Caller Out
         ----------------------
         | ra
         ----------------------
         | fp
         ----------------------
         |
         | Local
         ----------------------
         |
$sp ->   | Out
         ----------------------
         | ra
         ----------------------
         | fp
         ----------------------

         Callee frame

Lower address

Note that each word is 4 bytes and literals below should be multiplied by 4.

On entry to a function
   Save $fp to location $sp - 2
   Move $fp to $sp
   Pushing the frame
      Decrease $sp by size = Local + Out + 2
         1 for Return address
         1 for frame pointer
   Saving the return register at $fp - 1 (if the function calls any other function)
      Note that $fp now holds the old $sp

On exit from a function
   Restoring the return register $ra (if the function calls any other function)
      from $fp - 1
   Restore $fp from $fp - 2
   Popping the frame
      Decreasing $sp by size = Local + Out + 2
   Jumping to the return register

The register $t9 can be used to load values for instructions that accept only registers.

Submission

Your main file should be called VM2M.java, and if P.vaporm contains a syntactically correct Vapor-M program, then

java VM2M < P.vaporm > P.s

creates a MIPS program P.s with the same behavior as P.vaporm.

Phase4Tester.tar.gz Download