COMS W4115
Programming Languages and Translators
Lecture 16: Translating Statements into Three-Address Code
March 25, 2015
Lecture Outline
- Types
- Three-address code
- Translation of assignments
- Arrays
- Boolean expressions
- If-statements
- While-statements
1. Types
- Type inference rules
- Type conversions
- For details see notes for Lecture 15, 3/23/2015.
2. Three-Address Code
- Three-address code is a common intermediate representation
generated by the front end of a compiler.
It consists of instructions with a variety of simple forms:
- Assignment instructions of the form x = y op z, x = op y, or x = y,
where x, y, and z are names or compiler-generated temporaries.
y and z can also be constants.
- Jump instructions of the form goto L, if x goto L, ifFalse x goto L,
or if x relop y goto L, where L is the label of some three-address
instruction and relop is a relational operator such as <, <=, ==, and so on.
- Parameter-passing, procedure-call, and procedure-return instructions
of the form param x; call p,n; and return y.
Here p is the name of the procedure and n is the number of
actual parameters passed in the call.
- Indexed copy instructions of the form x = y[i] and x[i] = y.
- Address and pointer assignments of the form x = &y, x = *y, and *x = y.
- Static single-assignment (SSA) form is a variant of three-address code
in which every assignment is to a variable with a distinct name, so each
name is defined only once. It also uses a special function, called a φ-function,
to combine two definitions of the same variable arising from two different
control-flow paths. See ALSU, Section 6.2.4, pp. 369-370.
- In this lecture, we will show how to translate common programming-language
statements into three-address code using syntax-directed translation.
In practice, such translations are often produced by traversing the AST.
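As a rough illustration of how a compiler might hold these instructions in memory, the sketch below stores each one as a record with an operator and up to three addresses, and prints it back in the textual forms listed above. The class name Instr and the operator spellings such as "copy", "=[]", and "[]=" are hypothetical choices made for this sketch, and address and pointer assignments are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical operator spellings chosen for this sketch.
BINOPS = {"+", "-", "*", "/"}
RELOPS = {"<", "<=", "==", "!=", ">", ">="}


@dataclass
class Instr:
    """A three-address instruction: an operator plus at most three addresses."""
    op: str
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    result: Optional[str] = None  # destination name, or target label of a jump

    def __str__(self) -> str:
        if self.op in BINOPS:                    # x = y op z
            return f"{self.result} = {self.arg1} {self.op} {self.arg2}"
        if self.op == "uminus":                  # x = op y
            return f"{self.result} = uminus {self.arg1}"
        if self.op == "copy":                    # x = y
            return f"{self.result} = {self.arg1}"
        if self.op == "goto":                    # goto L
            return f"goto {self.result}"
        if self.op in ("if", "ifFalse"):         # if x goto L, ifFalse x goto L
            return f"{self.op} {self.arg1} goto {self.result}"
        if self.op in RELOPS:                    # if x relop y goto L
            return f"if {self.arg1} {self.op} {self.arg2} goto {self.result}"
        if self.op == "param":                   # param x
            return f"param {self.arg1}"
        if self.op == "call":                    # call p,n
            return f"call {self.arg1},{self.arg2}"
        if self.op == "return":                  # return y
            return f"return {self.arg1}"
        if self.op == "=[]":                     # x = y[i]
            return f"{self.result} = {self.arg1}[{self.arg2}]"
        if self.op == "[]=":                     # x[i] = y
            return f"{self.result}[{self.arg1}] = {self.arg2}"
        raise ValueError(f"unknown operator {self.op!r}")


program = [
    Instr("=[]", "y", "i", "t1"),
    Instr("<", "t1", "100", "L2"),
    Instr("param", "t1"),
    Instr("call", "p", "1"),
    Instr("goto", result="L2"),
]
for ins in program:
    print(ins)
```

Keeping the operator separate from the addresses makes it easy for later passes to inspect or rewrite instructions without parsing text.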
3. Translation of Assignments
- The assignment statement a = b + -c; might be translated into the
following sequence of three-address instructions:
t1 = uminus c
t2 = b + t1
a = t2
Here is an SDTS that generates this kind of three-address code on the fly for
assignments:
S → id = E ; { gen(top.get(id.lexeme) '=' E.addr); }
E → E1 + E2 { E.addr = new Temp();
gen(E.addr '=' E1.addr '+' E2.addr); }
| - E1 { E.addr = new Temp();
gen(E.addr '=' 'uminus' E1.addr); }
| ( E1 ) { E.addr = E1.addr; }
| id { E.addr = top.get(id.lexeme); }
The semantic actions use the attribute E.addr for the address of the location
where the value of E is stored, and the function top.get(id.lexeme) to retrieve
the location for id.lexeme from the symbol table in its current scope.
The function gen generates and outputs a three-address instruction.
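The SDTS can be mimicked by a recursive walk over an AST. In this minimal sketch the node classes Id, Neg, and Add, the Translator class, and the dictionary standing in for top.get are all hypothetical; the rule for ( E1 ) needs no node of its own because it simply passes E1.addr up. As in the SDTS, where each action sits at the end of its production, a node's temporary is created only after its children have been translated.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Id:             # E -> id
    lexeme: str


@dataclass
class Neg:            # E -> - E1
    operand: "Expr"


@dataclass
class Add:            # E -> E1 + E2
    left: "Expr"
    right: "Expr"


Expr = Union[Id, Neg, Add]


class Translator:
    def __init__(self, symtab: Dict[str, str]):
        self.top = symtab                 # stands in for top.get(id.lexeme)
        self.code: List[str] = []         # instructions output by gen, in order
        self._temps = itertools.count(1)

    def new_temp(self) -> str:
        return f"t{next(self._temps)}"

    def gen(self, instr: str) -> None:
        self.code.append(instr)

    def expr(self, e: Expr) -> str:
        """Translate e and return E.addr; children are translated first."""
        if isinstance(e, Id):
            return self.top[e.lexeme]
        if isinstance(e, Neg):
            a1 = self.expr(e.operand)
            addr = self.new_temp()
            self.gen(f"{addr} = uminus {a1}")
            return addr
        if isinstance(e, Add):
            a1 = self.expr(e.left)
            a2 = self.expr(e.right)
            addr = self.new_temp()
            self.gen(f"{addr} = {a1} + {a2}")
            return addr
        raise TypeError(f"unexpected node {e!r}")

    def assign(self, lexeme: str, e: Expr) -> None:
        """S -> id = E ;"""
        addr = self.expr(e)
        self.gen(f"{self.top[lexeme]} = {addr}")


# a = b + -c;  (each name is assumed to be its own location)
t = Translator({"a": "a", "b": "b", "c": "c"})
t.assign("a", Add(Id("b"), Neg(Id("c"))))
print("\n".join(t.code))
```

Run on a = b + -c;, this emits the same three instructions shown at the start of this section.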
4. Translation of Arrays
- Referencing a one-dimensional array
- In C and Java, array elements are numbered 0, 1, ..., n-1
for an array A with n elements.
- Element A[i] begins in location base + i × w, where base is the
relative address of the storage allocated for A and w is the width
of each element.
- Common layouts for multidimensional arrays
- Row-major order
- Column-major order
- See Fig. 6.22 (p. 383) for an SDD generating three-address code for
assignments with array references.
- Example: three-address code for the expression c + a[i][j],
assuming a is a 2 × 3 array of integers stored in row-major order
and the width of an integer is 4, so each row of a occupies 12 bytes:
t1 = i * 12
t2 = j * 4
t3 = t1 + t2
t4 = a[t3]
t5 = c + t4
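The address arithmetic can be checked with a small sketch. The helpers below are hypothetical: row_major_offset and column_major_offset compute the byte offset of an element with 0-based indices, and gen_array_ref emits row-major three-address code that multiplies each index by the width of the subarray it selects and sums the products, in the style of the example above. It assumes, as the example does, a 2 × 3 array of 4-byte integers, and it is not a transcription of Fig. 6.22.

```python
import itertools


def row_major_offset(indices, dims, width):
    """Byte offset of an element when the last index varies fastest."""
    offset = 0
    for i, n in zip(indices, dims):
        offset = offset * n + i          # ((i1 * n2) + i2) * n3 + i3 ...
    return offset * width


def column_major_offset(indices, dims, width):
    """Byte offset of an element when the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(indices), reversed(dims)):
        offset = offset * n + i
    return offset * width


def gen_array_ref(name, index_names, dims, width, new_temp, code):
    """Emit row-major three-address code for name[i1][i2]...; return the result temp."""
    # strides[k] is the width of the subarray selected by the k-th index.
    strides, s = [], width
    for n in reversed(dims):
        strides.append(s)
        s *= n
    strides.reverse()                    # dims (2, 3), width 4 -> [12, 4]
    addr = None
    for idx, stride in zip(index_names, strides):
        t = new_temp()
        code.append(f"{t} = {idx} * {stride}")
        if addr is None:
            addr = t
        else:
            total = new_temp()
            code.append(f"{total} = {addr} + {t}")
            addr = total
    result = new_temp()
    code.append(f"{result} = {name}[{addr}]")
    return result


counter = itertools.count(1)


def new_temp():
    return f"t{next(counter)}"


# c + a[i][j] for a 2 x 3 array of 4-byte integers.
code = []
elem = gen_array_ref("a", ["i", "j"], (2, 3), 4, new_temp, code)
code.append(f"{new_temp()} = c + {elem}")
print("\n".join(code))
print(row_major_offset([1, 0], (2, 3), 4), column_major_offset([1, 0], (2, 3), 4))  # 12 4
```

The two offset functions show why the layout matters: the same element a[1][0] sits 12 bytes from the base in row-major order but only 4 bytes in column-major order.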
5. Translation of Boolean Expressions
- Boolean expressions are composed of boolean operators (&&, ||, !)
applied to boolean variables, relational expressions, and other
boolean expressions.
- Short-circuit evaluation: some languages, such as C and Java, evaluate the
right operand of && or || only when it is needed, so an entire boolean
expression need not be evaluated.
- Given x && y, if x is false, then we can conclude the entire expression
is false without evaluating y.
- Given x || y, if x is true, then we can conclude the entire expression
is true without evaluating y.
- Numerical encoding
- In C, the numerical value 0 represents false; a nonzero value represents true.
- Positional encoding
- The value of a boolean expression can be represented by a position in three-address
code, and the boolean operators can be translated into jumps.
- The statement
if (x < 100 || x > 200 && x != y)
x = 0;
can be translated into the following three-address instructions:
if x < 100 goto L2
ifFalse x > 200 goto L1
ifFalse x != y goto L1
L2: x = 0
L1:
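One way to obtain this compact code is to let a jump target be a special "fall-through" marker meaning "continue with the next instruction", so that the generator emits only the jumps it actually needs. The sketch below is a hypothetical implementation of that idea and is not one of the lecture's SDDs: FALL, the node classes Rel, Or, and And, and the Emitter class are illustrative names, relational operands are assumed to be addresses already, and the if-body is a single instruction. Run on the statement above, it reproduces the five lines shown.

```python
import itertools
from dataclasses import dataclass
from typing import Union

FALL = None  # jump target meaning "fall through to the next instruction"


@dataclass
class Rel:            # B -> E1 relop E2 (operands are already addresses)
    left: str
    op: str
    right: str

@dataclass
class Or:             # B -> B1 || B2
    left: "Bool"
    right: "Bool"

@dataclass
class And:            # B -> B1 && B2
    left: "Bool"
    right: "Bool"

Bool = Union[Rel, Or, And]


class Emitter:
    """Collects three-address lines, attaching a pending label to the next instruction."""

    def __init__(self):
        self.lines = []
        self.pending = None
        self._labels = itertools.count(1)

    def newlabel(self):
        return f"L{next(self._labels)}"

    def label(self, lab):
        if self.pending is not None:       # two labels in a row: give the first its own line
            self.lines.append(f"{self.pending}:")
        self.pending = lab

    def gen(self, instr):
        prefix = f"{self.pending}: " if self.pending is not None else ""
        self.lines.append(prefix + instr)
        self.pending = None

    def finish(self):
        if self.pending is not None:
            self.lines.append(f"{self.pending}:")
            self.pending = None
        return self.lines

    def cond(self, b: Bool, true, false):
        """Jump to `true` if b holds and to `false` otherwise; FALL means no jump."""
        if isinstance(b, Rel):
            test = f"{b.left} {b.op} {b.right}"
            if true is not FALL and false is not FALL:
                self.gen(f"if {test} goto {true}")
                self.gen(f"goto {false}")
            elif true is not FALL:
                self.gen(f"if {test} goto {true}")
            elif false is not FALL:
                self.gen(f"ifFalse {test} goto {false}")
        elif isinstance(b, Or):
            # B1 being true decides B, so B1 needs a real target for "true".
            b1_true = true if true is not FALL else self.newlabel()
            self.cond(b.left, b1_true, FALL)
            self.cond(b.right, true, false)
            if true is FALL:
                self.label(b1_true)        # lands just after B2's code
        elif isinstance(b, And):
            # B1 being false decides B, so B1 needs a real target for "false".
            b1_false = false if false is not FALL else self.newlabel()
            self.cond(b.left, FALL, b1_false)
            self.cond(b.right, true, false)
            if false is FALL:
                self.label(b1_false)

    def if_stmt(self, b: Bool, body: str, s_next: str):
        """S -> if ( B ) S1: fall into S1 when B is true, jump to S.next otherwise."""
        self.cond(b, FALL, s_next)
        self.gen(body)


e = Emitter()
s_next = e.newlabel()  # L1
# && binds tighter than ||, so the condition is x < 100 || (x > 200 && x != y).
b = Or(Rel("x", "<", "100"), And(Rel("x", ">", "200"), Rel("x", "!=", "y")))
e.if_stmt(b, "x = 0", s_next)
e.label(s_next)
print("\n".join(e.finish()))
```

Note how && and || become nothing but jumps: no instruction ever computes a boolean value into a temporary.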
6. Translation of If-statements
- Boolean expressions often appear in the context of flow-of-control statements
such as:
- If statements
- If-else statements
- See Figs. 6.36 (p. 402) and 6.37 for SDDs translating these statements
with booleans into three-address code.
- For the statement
if (x < 100 || x > 200 && x != y)
x = 0;
these SDDs produce the following three-address instructions:
if x < 100 goto L2
goto L3
L3: if x > 200 goto L4
goto L1
L4: if x != y goto L2
goto L1
L2: x = 0
L1:
This code can be transformed into the code in Section 5 by eliminating the redundant
goto L3 and reversing the second and third tests; for example, the pair of instructions
if x > 200 goto L4 and goto L1 becomes the single instruction ifFalse x > 200 goto L1.
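The rules of Figs. 6.36 and 6.37 can be sketched as a recursive generator in which every condition receives explicit true and false labels. The class and method names here (Rel, Or, And, Emitter, cond, if_stmt) are hypothetical, relational operands are assumed to be addresses already, and the if-body is a single instruction. Labels are allocated in the same order as the SDDs create them, starting with S.next, so the output matches the listing above.

```python
import itertools
from dataclasses import dataclass
from typing import Union


@dataclass
class Rel:            # B -> E1 rel E2 (operands are already addresses)
    left: str
    op: str
    right: str

@dataclass
class Or:             # B -> B1 || B2
    left: "Bool"
    right: "Bool"

@dataclass
class And:            # B -> B1 && B2
    left: "Bool"
    right: "Bool"

Bool = Union[Rel, Or, And]


class Emitter:
    """Collects three-address lines, attaching a pending label to the next instruction."""

    def __init__(self):
        self.lines = []
        self.pending = None
        self._labels = itertools.count(1)

    def newlabel(self):
        return f"L{next(self._labels)}"

    def label(self, lab):
        if self.pending is not None:       # two labels in a row: give the first its own line
            self.lines.append(f"{self.pending}:")
        self.pending = lab

    def gen(self, instr):
        prefix = f"{self.pending}: " if self.pending is not None else ""
        self.lines.append(prefix + instr)
        self.pending = None

    def finish(self):
        if self.pending is not None:
            self.lines.append(f"{self.pending}:")
            self.pending = None
        return self.lines

    def cond(self, b: Bool, true: str, false: str):
        """B.code for inherited labels B.true and B.false."""
        if isinstance(b, Rel):
            self.gen(f"if {b.left} {b.op} {b.right} goto {true}")
            self.gen(f"goto {false}")
        elif isinstance(b, Or):
            b1_false = self.newlabel()         # B1.true = B.true, B1.false = newlabel()
            self.cond(b.left, true, b1_false)
            self.label(b1_false)
            self.cond(b.right, true, false)    # B2 inherits B.true and B.false
        elif isinstance(b, And):
            b1_true = self.newlabel()          # B1.true = newlabel(), B1.false = B.false
            self.cond(b.left, b1_true, false)
            self.label(b1_true)
            self.cond(b.right, true, false)

    def if_stmt(self, b: Bool, body: str, s_next: str):
        """S -> if ( B ) S1: B.true = newlabel(), B.false = S1.next = S.next."""
        b_true = self.newlabel()
        self.cond(b, b_true, s_next)
        self.label(b_true)
        self.gen(body)


e = Emitter()
s_next = e.newlabel()  # L1, the label for S.next
b = Or(Rel("x", "<", "100"), And(Rel("x", ">", "200"), Rel("x", "!=", "y")))
e.if_stmt(b, "x = 0", s_next)
e.label(s_next)
print("\n".join(e.finish()))
```

Because every relational test gets both an if and a goto, the output is simple to generate but contains the redundant jumps that a peephole pass or a fall-through scheme would remove.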
7. Translation of While-statements
- Consider the production S → while ( B ) S1 for while-statements.
The shape of the code for implementing this production can take the form:
begin: // beginning of code for S
code to evaluate B
if B is true goto B.true
if B is false goto B.false
B.true:
code to evaluate S1
goto begin
B.false: // this is where control flow will go after executing S
Here is an SDD for this translation (from Fig. 6.36, p. 402); here || denotes concatenation of code:
S → while ( B ) S1 {
begin = newlabel()
B.true = newlabel()
B.false = S.next
S1.next = begin
S.code = label(begin) || B.code ||
label(B.true) || S1.code ||
gen('goto' begin)
}
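The while rule can be exercised with a similar hedged sketch. It uses a small, hypothetical label-allocating Emitter class, handles only a single relational test and a single-instruction body, and assumes the label for S.next is allocated by an enclosing rule before the loop is translated, so that begin and B.true are numbered after it. The loop while (i < n) i = i + 1 is an invented example.

```python
import itertools


class Emitter:
    """Collects three-address lines, attaching a pending label to the next instruction."""

    def __init__(self):
        self.lines = []
        self.pending = None
        self._labels = itertools.count(1)

    def newlabel(self):
        return f"L{next(self._labels)}"

    def label(self, lab):
        if self.pending is not None:     # two labels in a row: give the first its own line
            self.lines.append(f"{self.pending}:")
        self.pending = lab

    def gen(self, instr):
        prefix = f"{self.pending}: " if self.pending is not None else ""
        self.lines.append(prefix + instr)
        self.pending = None

    def finish(self):
        if self.pending is not None:
            self.lines.append(f"{self.pending}:")
            self.pending = None
        return self.lines

    def while_stmt(self, test, body, s_next):
        """S -> while ( B ) S1, with B a relational test such as 'i < n'."""
        begin = self.newlabel()
        b_true = self.newlabel()
        self.label(begin)
        self.gen(f"if {test} goto {b_true}")     # B.code jumps to B.true ...
        self.gen(f"goto {s_next}")               # ... or to B.false = S.next
        self.label(b_true)
        self.gen(body)                           # S1.code, with S1.next = begin
        self.gen(f"goto {begin}")


e = Emitter()
s_next = e.newlabel()          # L1, allocated for S.next before the loop
e.while_stmt("i < n", "i = i + 1", s_next)
e.label(s_next)
print("\n".join(e.finish()))
```

The output follows the shape given above: the test at begin, the body at B.true, a jump back to begin, and S.next as the exit.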
8. Practice Problems
- Use the SDD of Fig. 6.22 (ALSU, p. 383) to translate the assignment
x = a[i][j] + b[i][j].
- Add rules to the SDD in Fig. 6.36 (ALSU, p. 402) to translate
do-while statements of the form:
- S → do S while B
- Show the code your SDD would generate for the program
do
  do
    assign1
  while a < b
while c < d
9. Reading
aho@cs.columbia.edu