This module covers C. C is a relatively low-level language that is compiled to assembly and then to machine code. It is an imperative language, which means that statements, especially assignments, explicitly change values in memory. It also has very little built-in support for OOP patterns and no automatic pass-by-reference: arguments are passed by value, and pointers must be used explicitly. The opposite would be a declarative language, which describes what result a program should produce rather than a sequence of steps that change memory values to get there.
C is also a procedural language: its code is organized into functions, which keeps it modular. These functions can be bundled into libraries, which makes code portable and easy to reuse. Function calls are implemented using a stack, called the call stack. C is also file-oriented, which means its standard library treats all I/O, including the keyboard and screen, as file streams.
Here are some rules for programming in C:
- The program begins at the int main() function.
  - Return 0 if successful.
  - Return anything else if unsuccessful.
- Variables must always be declared at the top of the block/function. (This is the C89 rule; C99 and later allow declarations anywhere in a block.)
Compiling
- Function names become labels
  int main() --> MAIN
- Variable declarations at the top of the code block become CONST instructions
  int a = 20; --> CONST R0, #20
- Simple statements such as arithmetic become their LC4 counterparts
  c = a * b; --> MUL R2, R0, R1
- A return statement stores the return value in a register and becomes RET
  return 0; --> CONST R3, #0, followed by RET
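Clearly, this naive translation runs into a number of issues. For example, a function may need more than 7 variables, since the LC4 has only eight registers (R0-R7). Recursive calls are another problem, because each call needs its own copy of its variables. Therefore, instead of keeping data in the register file, we use a stack in data memory. This stack holds local variables, return values, and arguments.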
The C compiler is built around this stack in data memory. Rather than translating code line by line, it splits each function into three parts: prologue, body, and epilogue.
- Prologue
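  - The first thing a function does when it is called is save R7 on the stack. JSR leaves the return address in R7, so saving it means the return address can still be found later, even after the function makes calls of its own. It also saves the caller's frame pointer (R5) and points R5 at the new frame. After that, it stores its variables on the stack using CONST and STR instructions.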
- Body
  - The body loads the variables it needs from the stack into registers, performs the work of the function, and then stores the results back into data memory.
- Epilogue
  - The epilogue handles the return value by storing it in the frame's return-value slot on the stack, where the caller can read it. It then restores the caller's frame pointer and return address and executes RET.
As the compiler runs, it keeps track of where each variable is stored in data memory using a symbol table:
Variable Name | Type | Location (offset from FP) | Scope |
---|---|---|---|
a | int | -1 | main |
b | int | -2 | main |
c | int | -3 | main |
To group the stack data that belongs to a single function call, we use a frame: a window of the stack that holds the data for that one call. Frames avoid memory conflicts when one function calls another or when a function calls itself. The slots in a frame follow a fixed order so that variable lookups are simple and exact. The exact layout depends on the compiler; for the LC4, a frame has the following structure, listed from the top of the stack (lowest address) down:
- Temporaries, arguments to callees
  - Temporary data in case we run out of registers during the function
  - Arguments to other functions we may call within our function
- Local variables
- Caller's frame pointer (FP)
  - Pointer to the frame of the function that called this function
  - Necessary for restoring the caller's frame when our function completes
- Return address (RA)
  - Where to return after our function completes
- Return value (RV)
  - Value to return
- Arguments
  - Input arguments
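In the LC4's design, R6 is dedicated as the stack pointer and points to the first slot (temporaries, arguments to callees), which is the top of the stack. R5 is dedicated as the frame pointer and points to the third slot (the caller's FP). With these two pointers, every item in the frame can be found at a fixed offset from R5, which is what the offsets in the symbol table refer to.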
Multiple Functions
Say we had the following C code:
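/* Returns a raised to the power b (for b >= 0).
   Note: pow() is also declared in <math.h>, so some compilers warn. */
int pow(int a, int b) {
    int c;
    for (c = 1; b > 0; b--) {
        c = c * a;
    }
    return c;
}

int main() {
    int a = 2;
    int b = 3;
    int c = 0;
    c = pow(a, b);   /* pushes pow's frame; c becomes 8 */
    return 0;
}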
What would the stack look like?
First, since the program begins at main, main's frame is pushed onto the stack. Then, when main calls pow, pow's frame is pushed on top of it. Whenever a function is called, its frame is pushed onto the stack with accurate state data, and R5 and R6 are updated to point into that new frame, the frame of the function that is currently running. When the function returns, its frame is popped from the stack, and R5 and R6 are restored to point into the caller's frame.