Lab 2 is due 11:30pm ET 10/3.
Lab 2: Bounds Checker
This lab will introduce you to static instrumentation and runtime checking, in the context of defending against buffer overflows. We've learned from lab 1 that buffer overflows can allow attackers to gain control of a program. In lab 2, we'll build a bounds checker to prevent such attacks. Similar to the baggy bounds checking system, our system consists of a runtime component that tracks and checks bounds and a static component that instruments a program to call into our runtime.
We've set up the working environment for you on workbench.cs.columbia.edu. To ssh into workbench, use the username and password sent to you via email. You must change your password the first time you log in. (You may also do this lab on a machine of your choice, but the teaching staff won't have the resources to help you set it up.)
You also need to tell git your email and name (the examples below assume your username is jy2324):
jy2324@workbench:~$ git config --global user.email "your-email@example.com"
jy2324@workbench:~$ git config --global user.name "Your Name"
Check out the lab 2 source code as follows:
jy2324@workbench:~$ git clone http://debug.cs.columbia.edu/e6121/2012/lab2.git
Initialized empty Git repository in /home/jy2324/lab2/.git/
...
jy2324@workbench:~$ cd lab2
jy2324@workbench:~/lab2$
We'll build our bounds checker within the LLVM compiler framework. Download and build LLVM as follows:
jy2324@workbench:~/lab2$ wget http://llvm.org/releases/3.1/llvm-3.1.src.tar.gz
...
2012-09-28 16:39:24 (5.33 MB/s) - `llvm-3.1.src.tar.gz' saved [11077429/11077429]
jy2324@workbench:~/lab2$ tar xzvf llvm-3.1.src.tar.gz
llvm-3.1.src/
...
llvm-3.1.src/autoconf/m4/visibility_inlines_hidden.m4
jy2324@workbench:~/lab2$ cd llvm-3.1.src
jy2324@workbench:~/lab2/llvm-3.1.src$ mkdir build
jy2324@workbench:~/lab2/llvm-3.1.src$ cd build
jy2324@workbench:~/lab2/llvm-3.1.src/build$ ../configure --target=x86_64   # build only for x86_64 architecture
checking for clang... clang
...
config.status: executing tools/sample/Makefile commands
jy2324@workbench:~/lab2/llvm-3.1.src/build$ make ENABLE_OPTIMIZED=0 -j   # build LLVM with debug information; -j means parallel build
llvm[0]: Constructing LLVMBuild project information.
...
llvm[0]: ***** Completed Debug+Asserts Build
llvm[0]: ***** Note: Debug build can be 10 times slower than an
llvm[0]: ***** optimized build. Use make ENABLE_OPTIMIZED=1 to
llvm[0]: ***** make an optimized build. Alternatively you can
llvm[0]: ***** configure with --enable-optimized.
The last command may take roughly 10 minutes. Now build the lab 2 source code as follows:
jy2324@workbench:~/lab2/llvm-3.1.src/build$ cd ../../bounds
jy2324@workbench:~/lab2/bounds$ mkdir build
jy2324@workbench:~/lab2/bounds$ cd build
jy2324@workbench:~/lab2/bounds/build$ ../configure --with-llvmsrc=$PWD/../../llvm-3.1.src --with-llvmobj=$PWD/../../llvm-3.1.src/build
../configure: line 1654: cd: /home/junfeng/work/e6121/lab2-ta/llvm-3.1.src: Not a directory
...
config.status: executing instr/Makefile commands
jy2324@workbench:~/lab2/bounds/build$ make ENABLE_OPTIMIZED=0
make[1]: Entering directory `/home/jy2324/lab2/bounds/build/runtime'
llvm[1]: Compiling check.cpp for Debug+Asserts build
llvm[1]: Building Debug+Asserts Archive Library librt.a
clang++ -emit-llvm -o /home/jy2324/lab2/bounds/build/Debug+Asserts/lib/rt.bc -c /home/jy2324/lab2/bounds/runtime/check.cpp
make[1]: Leaving directory `/home/jy2324/lab2/bounds/build/runtime'
make[1]: Entering directory `/home/jy2324/lab2/bounds/build/instr'
llvm[1]: Compiling instr.cpp for Debug+Asserts build
llvm[1]: Compiling link.cpp for Debug+Asserts build
llvm[1]: Compiling main.cpp for Debug+Asserts build
llvm[1]: Linking Debug+Asserts executable bounds
llvm[1]: ======= Finished Linking Debug+Asserts Executable bounds
make[1]: Leaving directory `/home/jy2324/lab2/bounds/build/instr'
To rebuild the bounds checker after you make changes to the source code in the bounds directory, rerun the last command (make ENABLE_OPTIMIZED=0). To clean up the build, run make ENABLE_OPTIMIZED=0 clean.
Now let's get familiar with LLVM and the bounds checker workflow. We'll use the Clang compiler frontend to compile a simple C file into LLVM's intermediate representation, called bitcode, as follows:
jy2324@workbench:~/lab2/bounds/build$ clang -emit-llvm -c ../test/t0.c -o t0.bc
The output file t0.bc is a binary bitcode file. You can disassemble it into a human readable representation as follows:
jy2324@workbench:~/lab2/bounds/build$ llvm-dis t0.bc
jy2324@workbench:~/lab2/bounds/build$ cat -n t0.ll
     1  ; ModuleID = '../test/t0.c'
     2  target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
     3  target triple = "x86_64-unknown-linux-gnu"
     4
     5  define i32 @main() nounwind uwtable {
     6  entry:
     7    %retval = alloca i32, align 4
     8    %x = alloca i32, align 4
     9    store i32 0, i32* %retval
    10    store i32 10, i32* %x, align 4
    11    %0 = load i32* %x, align 4
    12    ret i32 %0
    13  }
You can also compile C directly to human-readable bitcode (a .ll file) using the -S switch:
jy2324@workbench:~/lab2/bounds/build$ clang -emit-llvm -c ../test/t0.c -S -o t0.ll
We briefly explain the lines in t0.ll here; refer to the LLVM Language Reference Manual for the detailed semantics of LLVM instructions. Lines starting with ";", such as line 1, are comments. Lines 2 and 3 provide architecture and OS information for later executable code generation; they are not relevant to this assignment. Lines 5 to 13 define a function @main, which is compiled from t0.c's main function. In LLVM, functions and global variables are both global values, and the names of all global values start with @. The i32 on line 5 specifies that the return type of @main is the 32-bit integer type. Line 7 allocates stack space to hold the return value of @main, and line 8 allocates stack space for the local variable x; %x names the pointer to that space. The names of all local values start with %.
The alloca instruction allocates space for a stack variable of the current function call and returns a pointer to the space. The space will be automatically reclaimed when the call returns via a ret instruction. In this example, t0.c's main() function defines a stack variable int x, so clang emits %x = alloca i32, align 4 where i32 represents the 32-bit integer type. Note that %x is a pointer to a 32-bit integer and its type is i32*.
LLVM bitcode instructions are in Static Single Assignment (SSA) form, meaning that each value is defined exactly once. For example, line 8 defines %x, and no other line defines %x.
To process t0.ll with our bounds checker and generate a hardened version of the file called t0-hardened.ll, run
jy2324@workbench:~/lab2/bounds/build$ ./Debug+Asserts/bin/bounds t0.ll -S
Since we haven't added the code to track and check bounds, right now t0-hardened.ll is identical to t0.ll except that t0-hardened.ll links in our bounds checker's runtime methods such as AllocVar and FreeVar. In the next part of this lab, you'll add the missing code so the hardened versions of the programs will actually track and check bounds.
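The hand-written C approximation below is one way to picture what a hardened t0.c could compute once instrumentation is in place. It assumes the pass registers each stack variable right after its alloca and unregisters it right before the ret; the hooks hook_alloc_var and hook_free_var, and their signatures, are hypothetical stand-ins for the runtime's AllocVar and FreeVar, whose real declarations live in bounds/runtime/check.h. The function is renamed t0_main_hardened so it does not clash with a real main.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks standing in for AllocVar/FreeVar. Here they only
 * count live registrations; a real runtime would also record
 * (base, size) so that later checks can look the bounds up. */
static int live_vars = 0;

void hook_alloc_var(void *base, size_t size) {
  (void)base;
  (void)size;
  live_vars++;
}

void hook_free_var(void *base) {
  (void)base;
  live_vars--;
}

/* Roughly what t0.c's main computes, with the calls an instrumentation
 * pass might insert written out by hand. */
int t0_main_hardened(void) {
  int x;                        /* %x = alloca i32, align 4 */
  hook_alloc_var(&x, sizeof x); /* inserted right after the alloca */
  x = 10;                       /* store i32 10, i32* %x */
  int result = x;               /* %0 = load i32* %x */
  hook_free_var(&x);            /* inserted right before the ret */
  return result;                /* ret i32 %0 */
}
```

Calling t0_main_hardened() returns 10 and leaves live_vars at 0, because the only return path pairs each registration with an unregistration. A function with several ret instructions would need the unregistration before each of them.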
Part 1: Building a Bounds Checker
Our bounds checker will record bounds information for global variables, stack variables, and heap-allocated buffers. It will perform bounds checks on pointer dereferences and pointer arithmetic. Whenever an error is detected, it will terminate the program. For simplicity, our bounds checker will not add padding or change the alignment of the original program. In addition, it will eagerly flag any out-of-bounds pointer produced by pointer arithmetic, even if the pointer may later be brought back in bounds or is never dereferenced. For instance, our bounds checker will flag an error for the code below:
int a[N] = {...};
int *p = a + N;  // ERROR! p is out of bounds!
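Both kinds of checks reduce to interval tests once the runtime knows the base address and size in bytes of the object a pointer was derived from. The sketch below assumes that information is already available; the function names arith_in_bounds and deref_in_bounds are hypothetical, not part of the skeleton. Under the eager policy, a pointer produced by arithmetic must lie in [base, base + size), so a + N is rejected, while a dereference of width bytes at p must fit entirely inside the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Eager arithmetic check: the result must point inside [base, base + size).
 * One past the end (base + size) is rejected, matching the policy above. */
int arith_in_bounds(const void *base, size_t size, const void *result) {
  uintptr_t b = (uintptr_t)base;
  uintptr_t r = (uintptr_t)result;
  return r >= b && r - b < size;
}

/* Dereference check: an access of `width` bytes starting at p must fit
 * entirely inside [base, base + size). Written to avoid overflow. */
int deref_in_bounds(const void *base, size_t size, const void *p,
                    size_t width) {
  uintptr_t b = (uintptr_t)base;
  uintptr_t q = (uintptr_t)p;
  return q >= b && q - b <= size && width <= size - (q - b);
}
```

For example, with int a[4], a + 3 passes both checks, a + 4 fails the arithmetic check, and a 4-byte read starting 13 bytes into a fails the dereference check because it would run past the 16-byte object.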
The source code of our bounds checker is split into two parts. The first part, in directory bounds/instr, operates during compilation. Specifically, file instr.cpp implements an LLVM FunctionPass that will be invoked on each Function. This pass instruments the relevant instructions to call into our runtime methods. It is incomplete; we've marked the places where your code is needed using Lab 2 TODO. File link.cpp implements a simple ModulePass that links a bitcode program with our bounds checker runtime. File main.cpp parses a bitcode program, invokes these passes on it, and writes out the hardened bitcode program. You don't need to change link.cpp and main.cpp.
The second part, in directory bounds/runtime, operates at runtime. File check.h declares all methods in our runtime, and check.cpp implements these methods. It is also incomplete, and you'll need to fill in the missing code.
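One simple way for the runtime to track bounds is a table of (base, size) records searched on each check. The sketch below is a minimal, hypothetical design: the names bt_register, bt_unregister, and bt_lookup, the fixed capacity, and the linear search are all assumptions chosen for clarity rather than speed. Treat it as one possible shape for the state behind AllocVar/FreeVar-style calls, not as the interface check.h declares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BT_CAPACITY 1024 /* hypothetical fixed capacity */

typedef struct {
  uintptr_t base;
  size_t size;
} bt_entry;

static bt_entry bt_entries[BT_CAPACITY];
static size_t bt_count = 0;

/* Record a live object; returns 0 if the table is full. */
int bt_register(const void *base, size_t size) {
  if (bt_count == BT_CAPACITY)
    return 0;
  bt_entries[bt_count].base = (uintptr_t)base;
  bt_entries[bt_count].size = size;
  bt_count++;
  return 1;
}

/* Forget the object starting at base (e.g., when its stack frame is
 * popped or a heap buffer is freed); moves the last entry into the hole. */
void bt_unregister(const void *base) {
  for (size_t i = 0; i < bt_count; i++) {
    if (bt_entries[i].base == (uintptr_t)base) {
      bt_entries[i] = bt_entries[--bt_count];
      return;
    }
  }
}

/* Find the registered object containing p, or NULL if there is none. */
const bt_entry *bt_lookup(const void *p) {
  uintptr_t q = (uintptr_t)p;
  for (size_t i = 0; i < bt_count; i++) {
    if (q >= bt_entries[i].base && q - bt_entries[i].base < bt_entries[i].size)
      return &bt_entries[i];
  }
  return NULL;
}
```

A lookup by address alone cannot tell which object an out-of-bounds pointer came from. That fits the eager policy: bounds can be looked up from the in-bounds source pointer of each arithmetic operation, and the result is checked against them.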
Thanks to LLVM's clean design, we can instrument a program for bounds checking with a couple of passes totaling a few hundred lines of code. However, this is likely your first time hacking on a production-quality compiler, so be prepared to read a lot of code and documentation. A few tips:
- To understand how LLVM invokes the methods provided by an LLVM pass, read this tutorial.
- The value returned by an instruction is represented by the instruction itself. For instance, the pointer returned by an AllocaInst is represented by the AllocaInst itself. In other words, an LLVM instruction object is also a Value object, implemented via C++ inheritance. If you need to use the value returned by an instruction X in another instruction, simply use the instruction object X.
- A GlobalVariable represents a pointer to the global data. This is similar to what alloca returns.
- The LLVM IRBuilder and TypeBuilder can be quite handy for building instructions. See the tutorials (1, 2). These tutorials are slightly outdated, and we've noted the changes in the lab skeleton code.
- Pointer arithmetic is implemented using the GetElementPtrInst instruction.
- LLVM uses doxygen to generate documentation for code. You can browse the documentation here
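Because pointer arithmetic shows up as getelementptr, it helps to see the byte offset a GEP computes. For an instruction such as `%p = getelementptr [10 x i32]* %a, i64 0, i64 3` (LLVM 3.1 syntax, with %a produced by `alloca [10 x i32]`), the first index steps over whole 40-byte arrays and the second over 4-byte elements, so %p = %a + 0*40 + 3*4 = %a + 12. The hypothetical helper below mirrors that calculation for array indices; the strides are passed in explicitly as an assumption, whereas a pass would derive them from the types and the target's data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of a GEP whose i-th index steps over objects of
 * strides[i] bytes: offset = sum of indices[i] * strides[i].
 * Struct field indices, which add a fixed field offset instead,
 * are not handled in this sketch. */
int64_t gep_offset(const int64_t *indices, const int64_t *strides, size_t n) {
  int64_t offset = 0;
  for (size_t i = 0; i < n; i++)
    offset += indices[i] * strides[i];
  return offset;
}
```

The checker then only has to compare base + offset against the bounds of the object the GEP's base pointer belongs to. With the same strides, indices (0, 10) give offset 40, exactly one past the end of the array, which the eager policy rejects.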
Exercise 1. Read the LLVM tutorials. Study the bounds checker skeleton code. Complete the bounds checker by filling in the missing code in bounds/instr/instr.cpp and bounds/runtime/check.cpp.
Create 10 testcases in directory bounds/test. Your testcases should be designed to test various aspects of your bounds checker. Run your bounds checker over the testcases and report the results in answers.txt. Commit your changes using git commit.
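As a starting point, here are two hypothetical testcase bodies that probe the eager-arithmetic rule from opposite sides. Both are well-defined C, so they run normally without instrumentation; the comments state the behavior one would expect from the hardened build under the policy described in Part 1. The function names are illustrative; in bounds/test each would typically become its own file with a main.

```c
#include <assert.h>

/* Expected under the hardened build: no error. Every pointer produced
 * by arithmetic stays inside a[0..3]. */
int in_bounds_sum(void) {
  int a[4] = {1, 2, 3, 4};
  int sum = 0;
  for (int i = 0; i < 4; i++)
    sum += *(a + i);
  return sum;
}

/* Expected under the hardened build: an error at the arithmetic, since
 * a + 4 is one past the end, even though p is never dereferenced.
 * Uninstrumented, this is legal C and returns 4. */
int one_past_end_not_dereferenced(void) {
  int a[4] = {1, 2, 3, 4};
  int *p = a + 4;
  return (int)(p - a);
}
```

Pairing a "must pass" case with a "must fail" case for each feature (globals, stack variables, heap buffers, arithmetic, dereference) helps distinguish false positives from false negatives in your results.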
Part 2: Preventing Buffer Overflow Attacks for a Web Server
Next we'll apply the bounds checker you built to defend against attacks on the zookws web server. For simplicity, we'll run the attacks against zookws on workbench, instead of the VM you downloaded in lab 1.
To get started, download, patch, and build zookws as follows:
jy2324@workbench:~/lab2$ git clone git://g.csail.mit.edu/6.858-lab-2012 zookws
Initialized empty Git repository in /home/jy2324/lab2/zookws/.git/
...
jy2324@workbench:~/lab2$ cd zookws
jy2324@workbench:~/lab2/zookws$ wget http://www.cs.columbia.edu/~junfeng/12fa-e6121/hw/Makefile.patch -O - | patch -p1
...
patching file Makefile
jy2324@workbench:~/lab2/zookws$ make
clang -emit-llvm zookld.c -c -o zookld.bc -g -O0 -std=c99 -Wall -Werror -D_GNU_SOURCE -emit-llvm -fno-stack-protector
...
Now you can start the bounds-checking version of zookws as follows:
jy2324@workbench:~/lab2/zookws$ ./clean-env.sh ./zookld zook-exstack.conf
You may need to change the port number in zook-exstack.conf in case port 8080 is already taken by your fellow students.
Exercise 2. Run zookws and send a few legitimate requests to the server. Check whether your bounds checker has false positives by verifying that the server sends back correct replies. Document the false positives you encounter in answers.txt and fix them.
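One pattern worth looking for is the end-pointer loop idiom, which is legal C but conflicts with the eager policy from Part 1. When buf points to an object of exactly n bytes, computing buf + n for the loop bound, or incrementing p on the final iteration, yields a one-past-the-end pointer that the checker flags even though it is never dereferenced. The functions below are illustrative examples of the idiom and an index-based variant, not code taken from zookws.

```c
#include <assert.h>
#include <stddef.h>

/* Legal C. If buf's object is exactly n bytes long, both `buf + n` in the
 * loop bound and the final `p++` form a one-past-the-end pointer. */
int count_spaces_end_ptr(const char *buf, size_t n) {
  int count = 0;
  for (const char *p = buf; p < buf + n; p++)
    if (*p == ' ')
      count++;
  return count;
}

/* Same computation with an index; only buf + i with i < n is formed. */
int count_spaces_index(const char *buf, size_t n) {
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (buf[i] == ' ')
      count++;
  return count;
}
```

Both functions return 2 for the 14-character string "GET / HTTP/1.0" when run uninstrumented; the difference only shows up under eager checking.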
Run zookws again with the two exploits you created in lab 1. Check whether your bounds checker has false negatives by verifying that it successfully prevents these exploits. Document the false negatives you encounter in answers.txt and fix them.
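When analyzing false negatives, keep in mind what the checker can see. Code that is not compiled through the pass, such as library routines linked in natively, is likely unchecked. Bounds are also recorded per global, stack, or heap object, so an overflow that stays inside a single object, such as an array field running into the next field of the same struct, never leaves the recorded bounds. The sketch below uses offsetof to show the second case without performing the overflow; the struct and the helper offset_in_object are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a buffer followed by a sensitive field in the
 * same object. */
struct session {
  char name[8];
  int is_admin;
};

/* Object-granularity test: is byte `offset` (from the start of the
 * object) inside an object of `size` bytes? */
int offset_in_object(size_t offset, size_t size) {
  return offset < size;
}
```

On x86_64, where int is 4 bytes, name + 8 is the first byte of is_admin, yet offset 8 lies inside the 12-byte struct, so a per-object bounds check passes a write there.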
You are done! Commit all your changes to lab 2 and generate a patch as follows:
jy2324@workbench:~/lab2$ git diff origin > submit.patch
Upload submit.patch to the submission folder of lab 2 in Courseworks.