Lectures
Lectures will be held MW 2:40-3:55pm in 233 MUDD.
Course Staff
- Instructors
Martha A. Kim (martha@cs.columbia.edu)
Vijay A. Saraswat (vijay@saraswat.org)
- Teaching Assistants
Andrea Lottarini (lottarini@cs.columbia.edu)
Joaquin Ruales (jar2262+4130@columbia.edu)
Office Hours
Consult the calendar (all office hours are held in either CSB 468 or 469).
Course Overview
Learning how to program parallel computers (multi-core,
clusters) productively and efficiently is a critical skill in
this era of concurrency. The course will provide an introduction
to modern parallel systems and their performance
characteristics. It will cover the fundamentals of
data-structure design, analysis and implementation for efficient
parallel execution; programming abstractions for concurrency;
and techniques for reasoning about the behavior and performance
of parallel programs. Particular topics to be covered include:
data parallelism, fine-grained concurrency, locality,
load-balancing, overlapping computation with communication,
reasoning about deadlock-freedom, determinacy, safe
parallelization, implementing frameworks for concurrency (such
as Hadoop Map/Reduce), debugging for correctness and
performance. Students will study many parallel programs drawn
from a variety of application domains (including
high-performance computing, large-scale graph analyses, machine
learning, game playing). Students will be expected to complete a
series of parallel programming projects with good performance on
a cluster of multi-cores, using a modern parallel language, X10.
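To give a flavor of the programming style, below is a minimal X10 sketch of a data-parallel sum using the finish/async constructs. This is illustrative only; the class name ParSum and the chunking scheme are our own, not a course assignment.

// Illustrative sketch: sum a Rail in parallel with finish/async.
public class ParSum {
    public static def main(args:Rail[String]) {
        val N = 1000000;                           // problem size
        val data = new Rail[Long](N, (i:Long) => i);
        val nChunks = 4;                           // one async per chunk
        val partial = new Rail[Long](nChunks);     // per-chunk results
        finish for (c in 0..(nChunks-1)) async {   // fork; join at finish
            var s:Long = 0;
            for (i in (c*N/nChunks)..((c+1)*N/nChunks - 1))
                s += data(i);
            partial(c) = s;                        // each async writes its own slot
        }
        var total:Long = 0;
        for (c in 0..(nChunks-1)) total += partial(c);
        Console.OUT.println("Sum = " + total);
    }
}

Because each async writes a distinct slot of partial, the program is determinate; the enclosing finish guarantees all asyncs complete before the partial sums are combined.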
Prerequisites
Experience in Java, basic understanding of analysis of
algorithms. COMS W1004 and COMS W3137 (or equivalent).
Resources
There is no required textbook, though several optional
recommendations will be provided. The only requirement for this
class is a Columbia CS account, which can be set up here.
Attendance
Students are required to attend all classes, and attendance will
be taken. If you need to miss a class, you must email Martha
Kim at least 48 hours in advance of lecture.
Academic Honesty
We take academic honesty extremely seriously, and expect the
same of you. The mini-projects are governed by the
collaboration policy described below, with no collaboration
allowed on the in-class quizzes. Outside of these two policies,
the Computer Science Department's policies on academic honesty
are in effect, and any violations will
be reported to the Dean's office.
Grading Formula
Mini-Projects: 80%
Quizzes: 10%
Participation: 10%
Individual grades will be posted to the CourseWorks gradebook.
Mini-Projects
Throughout the semester you will complete four mini-projects.
For each one you will work in pairs to implement a performant
parallel computation. You will be expected to demonstrate good
parallel speedups, as well as a rationale for your design
decisions and an analysis of your program's performance. Three
of the projects will be pre-set by course staff, with the fourth
designated "students' choice".
Discussion Classes
At the completion of each project, we will have a discussion
class, where approximately five randomly chosen groups will be called to give
"chalk talks" providing an overview of their design, a
description of what brought them to that design, an analysis of
what aspects were/were not successful, and a description of
their speedups.
Turnin
Projects will be structured, with course staff providing a test
harness, Makefile, and, if appropriate, a reference serial
implementation. All submissions are due, via CourseWorks,
by 11:55pm two nights prior to the discussion class. You
have the option of submitting up until 11:55pm the night before
the discussion class for a 20% deduction in your score. After
that point, we will no longer accept submissions.
Collaboration Policy
Groups may freely exchange ideas and approaches to the challenge
problem. However, each group must implement
and understand its own design, and be ready to present it during
the discussion class.
Forming Pairs
You may work with the same or a different partner for each
project. You may declare your partnership or request that a
partner be assigned using this declaration form.
Sample X10 Programs
The programs discussed in class are available via SVN from the X10 repository on SourceForge; see here.
Use an SVN client to check out the code, e.g.:
svn co svn://svn.code.sf.net/p/x10/code/courses/pppp2013 x10-code
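Once checked out, a sample can be compiled and run with the tools shipped in the X10 release (see the next section for where to run). A hedged sketch, assuming the native (C++ backend) compiler x10c++ is on your PATH and using a hypothetical sample file Hello.x10:

x10c++ -o Hello Hello.x10    # compile with the native (C++) backend
X10_NPLACES=2 ./Hello        # run the binary with two X10 places

The managed (Java) backend, via the x10c compiler, works analogously.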
Running X10 Programs
For this course you have the following three options for running X10 programs.
- On your own installation: Use the X10 2.4 release available here. (Do not use the old download link.)
- Columbia CS's shared CLIC Lab
Using your CUCS account, you may log in to clic-lab.cs.columbia.edu. This cluster has 44 nodes, each of which is a:
- Dell Precision T5500 Workstation (Dual Quad Core Processor X5550 @ 2.66GHz + 8M cache)
- 24GB DDR3 ECC SDRAM Memory, 1333MHz, 6 x 4GB
- 1TB SATA 3.0Gb/s, 7200 RPM Hard Drive
Note: This is a shared cluster across the department. While it is quite large, machines will be running other loads. It is therefore best for development and rough timing measurements.
Also note: If you are at home and encounter problems ssh'ing into clic-lab, it is likely due to a TCP/UDP mismatch between CLIC and your ISP (usually Time Warner). There are two workarounds:
- Use Google's DNS servers: Instructions
- Connect directly to one of the clic-lab nodes (e.g., {london,moscow,bern,cairo}.clic.cs.columbia.edu)
- Private spicerack cluster
Unlike clic-lab, spicerack is a dedicated mini-cluster for this course. It consists of:
To run a job on spicerack, you must use the Condor job queuing utility. More info to come on this point, but queuing through Condor ensures your job runs in isolation and thus yields clean timing measurements.
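Until those instructions arrive, the general shape of a Condor submission is sketched below. This is an assumption-laden sketch (a vanilla universe and a pre-built binary named Hello), not the official spicerack recipe:

# job.sub -- illustrative Condor submit description, not the official spicerack setup
universe   = vanilla
executable = Hello          # hypothetical pre-built binary
output     = hello.out      # stdout of the job
error      = hello.err      # stderr of the job
log        = hello.log      # Condor's event log
queue                       # submit one instance

A file like this is submitted with condor_submit job.sub and monitored with condor_q.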