COMS W4115
Programming Languages and Translators
Lecture 20: Run-time Memory Management
April 8, 2015

Lecture Outline

Run-time memory and its management
Procedure activations and activation trees
Activation records
Calling sequences
Allocating space for arrays
Heap memory management

1. Run-time Memory and its Management

The execution of the target program is initially under the control of the operating system.
When a program is invoked, the OS allocates space to the program, loads the target code into the code section, and transfers control to the entry point ("main") of the target program.
The compiler is responsible for generating the target code and orchestrating the use of the run-time memory.
Structure of run-time memory:


	Code

	Static data

	Heap

	   free memory

	Stack

The size of the generated target code is usually known at compile time and can be placed into the statically determined area Code.
Globals and other static items like constants are stored in the static data area.
The heap is used to store objects whose lifetimes exceed the lifetimes of the procedures that created them.
The run-time stack is used to manage the data associated with procedure activations. A pointer called the stack pointer (sp) points to the top of the run-time stack.

2. Procedure Activations and Activation Trees

The compiler should generate target code that is correct and efficient.
We assume that programs do not have concurrency and that when a procedure gets called, control returns to the program point immediately after the call. We also assume that when procedure p calls procedure q, then q returns before p returns.
These assumptions imply that the lifetimes of procedure activations are nested, and that the order of procedure activations can be drawn as a tree, called an activation tree.
Example 1: Consider the following C-like program:


    int f(void) {
      return;
    }

    int g(void) {
      f();
    }


    int main(void) {
      f();
      g();
    }

The activation tree for this program is

Example 2: Consider the following C-like program:


    int f(int x) {
      if (x == 0)
        return g();
      else
        return f(x-1);
    }

    int g(void) {
      return 0;
    }


    int main(void) {
      f(3);
    }

The activation tree for this program is

Note that an activation tree depends on the run-time behavior and may have different shapes depending on the program input.
A stack can be used to keep track of those procedures that are active at a given time during the execution of the program.

3. Activation Records

An activation record (AR), sometimes called a frame, is a data structure that contains information needed to manage a procedure activation.
If procedure p calls q, then q's activation record contains a mix of information about p and q. The execution of p is suspended until q completes at which time p resumes running from the program point in p immediately after the call to q. The activation record for q has information to complete the execution of q and where to resume the execution of p.
The structure of an activation record is dependent on the operating system, the source language, and the execution environment. Let's assume procedure p calls procedure q. Here is a simple example of the fields that might be found in q's activation record:


	result
	argument
	control link
	return address

The result field is used to store the value computed by this activation of q.
The argument field is used to store the value of the actual parameter passed to q.
The control link is a pointer to the AR of the calling procedure p.
The return address is the location in the code section of the run-time memory where execution should resume in p after the call to q.

We will assume there is a pointer called the frame pointer (fp) that points to a known location in an activation record.

4. Calling Sequences

Procedure calls are implemented by calling sequences, code that allocates an activation record on the run-time stack and enters information into its fields.
A return sequence is code invoked after the call to restore the state of the machine so the calling procedure can continue its execution after the call.
The code in a calling sequence is usually divided between the calling procedure (the "caller") and the procedure it calls (the "callee").
When a procedure p calls a procedure q, we might do something like the following:

Evaluate the parameters of q and store them in a new AR for q.
Store the frame pointer (fp) for p as the control link in the AR for q.
Update fp to point to the AR of q.
Store the return address in the AR for q.
Jump to the code for q.
Have q allocate and initialize its local data and temporaries on the stack.

When q exits:

Copy the fp of the AR for q into sp (the top of stack pointer).
Load the control link of the AR for q into the fp.
Jump to the return address.
Change the stack pointer to pop the parameters of q off the run-time stack.

5. Allocating Space for Arrays

The compiler may need to allocate space for array parameters on the run-time stack. Depending on the situation at hand, the size of the array may be fixed or variable.
Fixed-size arrays


   void f(int x) {
     int a;
     int b[4];
     int c;
     ...
   }

Space for the array b can be allocated on the runtime stack between the space for a and c.

Variable-size arrays


   void f(int x) {
     int a;
     int b[n];
     int c;
     ...
   }

A pointer to the space for b can be allocated on the runtime stack between the space for a and c so variables remain a constant offset from the frame pointer. The actual space for b can be allocated after c.

6. Heap Memory Management

The runtime heap is used for data objects whose lifetimes may exist long after the procedure that created them.
The heap memory manager is the subsystem that allocates and deallocates space within the heap.
Garbage collection is the process of finding memory within the heap that is no longer used by the program and can therefore be reallocated to house other data objects. In some programming languages like C allocation and deallocation needs to be done manually using library functions like malloc and free. In other languages like Java it is the built-in garbage collector that deallocates memory.
There are many different garbage collection algorithms that vary in performance metrics such as execution time, space usage, pause time, and program locality.
Mark-and-sweep garbage collection

A basic mark-and-sweep (M&S) garbage collection algorithm is straightforward to implement and works well for languages like C because it does not move data objects during garbage collection.
A M&S collector firsts visits and marks all reachable data objects in the heap, setting the reached-bit of each object to 1.
It then sweeps the entire heap, freeing up unreachable objects.
See Algorithm 7.12 (ALSU, p. 471) for details.

7. Practice Problem

Consider the following program written in a hypothetical statically scoped language that allows nested functions. In this program, main calls f which calls g2 which calls h which calls g1.


     function main() {
       int i;

       function f() {
         int a, b, c;

         function g1() {
           int a, d;

           a = b + c;               // point 1

         }; // end of g1

         function g2(int x) {
           int b, e;

           function h() {
             int c, e;

             g1();
             e = b + a;             // point 2

           }; // end of h

           h();
           a = d + e;               // point 3

         }; // end of g2

         g2(1);

       }; //end of f

       // execution of main begins here

       f();

     }; // end of main

Suppose we have activation records with the following fields:

Parameters

Control Link

Access Link

Return Address

Local data

If function p is nested immediately within function q, then the access link in any AR for p points to the most recent AR for q.

Show the activation records on the run-time stack when execution first arrives at point 1 in the program above.

To which declarations are the references to variables a, b, c at position 1?
To which declarations are the references to variables a, b, e at position 2?
To which declarations are the references to variables a, d, e at position 3?

8. Reading

ALSU, Sections 7.1, 7.2, 7.6.1

aho@cs.columbia.edu

COMS W4115 Programming Languages and Translators Lecture 20: Run-time Memory Management April 8, 2015