Course Description
Over the past decades, the field of High-Performance Computing (HPC) has been about building supercomputers to solve some of the biggest challenges in science. HPC is where cutting-edge technology (GPUs, low-latency interconnects, etc.) is applied to solve scientific and data-driven problems.
One of the key ingredients to the current success of AI is the ability to perform computations on vast amounts of training data. Today, applying HPC techniques to AI algorithms is a fundamental driver for the progress of Artificial Intelligence.
In this course, you will learn the HPC techniques traditionally applied to supercomputing software and how they are used to obtain maximum performance from AI algorithms.
You will also learn techniques for building efficient AI systems. This is becoming especially critical in the era of large foundation models such as GPT and LLaMA, which require massive amounts of computational power and energy.
This course will introduce efficient AI computing techniques for both training and inference. Topics include model compression, pruning, quantization, knowledge distillation, neural architecture search, data/model parallelism, and distributed training.
The course is based on PyTorch and CUDA programming.
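As a taste of what efficient AI computing looks like in practice, here is a minimal sketch (illustrative, not course material) that applies dynamic int8 quantization, one of the compression techniques listed above, using PyTorch's built-in quantization tooling; the toy model and tensor shapes are assumptions made for the example:

```python
import torch
import torch.nn as nn

# A toy model standing in for whatever network you train in the course.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Replace the weights of every nn.Linear with int8 representations;
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)  # same interface; smaller, faster linear layers
```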
Objectives
- Use HPC techniques to find and solve performance bottlenecks
- Perform performance measurements and profiling of ML software (a short profiling sketch follows this list)
- Evaluate the performance of different ML software stacks and hardware systems
- Develop high-performance distributed AI algorithms for efficient training
- Use fast math libraries, CUDA, and C++ to accelerate high-performance ML algorithms
- Apply model compression techniques such as quantization, pruning, and knowledge distillation
- Apply essential HPC techniques to handle large foundation models such as Large Language Models (LLMs)
- Build efficient LLM inference and finetuning systems and algorithms: vLLM, FlashAttention, speculative decoding, LoRA/QLoRA, and prompt tuning
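As a concrete illustration of the profiling objective above, the following minimal sketch (not course-provided code; the toy model is an assumption) uses torch.profiler to measure where time is spent in one forward pass:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)
).to(device)
x = torch.randn(64, 1024, device=device)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

# Profile a single forward pass and record operator input shapes.
with profile(activities=activities, record_shapes=True) as prof:
    model(x)

# Sort operators by total time on the chosen device to spot bottlenecks.
sort_key = "cuda_time_total" if device == "cuda" else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```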
For details see the Syllabus.
Prerequisites
- General knowledge of computer architecture and operating systems
- C/C++: intermediate programming skills
- Python: intermediate programming skills
- Good understanding of neural network algorithms:
The course focuses on model performance rather than on the algorithms themselves, and a high-level review of the algorithms will be included. However, it is strongly recommended that you come to the course with a good understanding of the following: logistic regression, feed-forward (basic) neural networks, convolutional neural networks, recurrent neural networks, and the transformer architecture.
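To calibrate the expected level of PyTorch fluency, here is a minimal sketch of a feed-forward network trained on random data; all names and shapes are illustrative, and incoming students should be comfortable writing code like this:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 20)         # toy inputs
y = torch.randint(0, 2, (128,))  # toy labels

for _ in range(10):              # a few gradient-descent steps
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```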
Course Information
| | |
|---|---|
| Instructors | Dr. Kaoutar El Maghraoui, Adjunct Professor of Computer Science and Principal Research Scientist, IBM T.J. Watson Research Center, NY |
| TAs | Arnold Caleb Asiimwe, William Das, and Wookje Han |
| Office Hours for Project Proposals and Discussions | Wednesday – Prof. Kaoutar El Maghraoui |
| TA Office Hours | Tuesday – Wookje Han; Friday – William Das; Saturday – Arnold Caleb Asiimwe |
Links to online office hours are available on the course Google calendar.
Course materials
The course does not follow a specific textbook; however, the books below can serve as supporting material. Pointers to literature and web links will be provided in class.
- Introduction to High-Performance Computing for Scientists and Engineers. Georg Hager and Gerhard Wellein. CRC Press. ISBN: 9781439811924.
- Introduction to High-Performance Scientific Computing (online). Victor Eijkhout with Edmond Chow and Robert van de Geijn.
- Computer Architecture: A Quantitative Approach, 5th Edition. John Hennessy and David Patterson. Morgan Kaufmann. ISBN: 9780123838728.
- Efficient Processing of Deep Neural Networks. Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel Emer. Morgan & Claypool Publishers. ISBN-13: 978-1681738352.
Homework
There will be five to six homework assignments, consisting mostly of programming and experiments with GPUs. Assignments will be based on C/C++, Python, and PyTorch.
| | |
|---|---|
| Grading | Homework (50%) + Final Project (30%) + Quizzes (15%) + Attendance & Participation (5%) |
| Late Homework Policy | Quizzes and project submissions must be submitted on time. Zero credit will be given for late submissions. |
Throughout the semester, each student has an allocation of 6 'late days.' These can be used only for homework submissions, allowing flexibility without penalty. Once the 6-late-day allowance is depleted, the following late-submission penalties apply:
- Original Due Time: Assignments must be submitted on time for full credit.
- Counting Late Days: Late days are counted per calendar day, with a new late day commencing at 11:59 pm ET.
- Penalty After the Late-Day Allowance: Once the 6 late days are exhausted, 20% of the total marks will be deducted per additional late day, up to 5 days (for example, a submission two days past the allowance loses 40% of the marks). Beyond that, the assignment receives zero credit.
Course Project
Project proposals are due by the midterm. Final presentations of all projects take place toward the end of the course.
Syllabus
| Week | Topic | Assignments & Quizzes |
|---|---|---|
| Week 1 | Introduction to HPC and ML | |
| Week 2 | AI Performance Optimization | Assignment 1 out; Quiz 1 out |
| Week 3 | Gradient Descent Optimization Algorithms and PyTorch | Quiz 1 due |
| Week 4 | PyTorch Performance | Homework 1 due; Homework 2 out; Quiz 2 out |
| Week 5 | CUDA Basics | Quiz 2 due |
| Week 6 | CUDA Advanced Topics | Homework 2 due; Homework 3 out (CUDA) |
| Week 7 | Efficient Training: Distributed Deep Learning Algorithms and PyTorch | Quiz 3 |
| Week 8 | Efficient Inference: Sparsity, Model Pruning/Compression | Homework 3 due (CUDA); Homework 4 out (DDL) |
| Week 9 | Efficient Inference: Reduced Precision and Quantization | Quiz 4 |
| Week 10 | Efficient Inference: Knowledge Distillation | Homework 4 due; Homework 5 out (quantization) |
| Week 11 | Efficient Transformers and LLMs | Quiz 5 |
| Week 12 | Efficient LLM Deployment Systems: Efficient Inference Algorithms and Systems for LLMs | Homework 5 due |
| Week 13 | Thanksgiving holiday (no class) | |
| Week 14 | Neural Architecture Search: Designing Efficient DNNs | Quiz 6 |
| Weeks 15–16 | Final Project Presentations | Final project presentations due |