Getting into NLP
NLP has a HW0, which is given out the first week of class. Students must do well on HW0 to get into the class. While CS majors are given preference, non-majors can get in if they do well on HW0, although this will depend on the number of CS majors who get in. Watch for the posting of HW0 on this website. Everyone on the waitlist is welcome to do HW0, and it must be turned in by the required date. I will take students off the waitlist after we have graded HW0. In the past, I've had juniors and seniors majoring in CS, MS students and PhD students. Most students come from CS, but students from other departments also get in.
For prerequisites, you should have taken at least one of AI, ML or a class that uses deep learning (e.g., Applied Deep Learning or one of the vision classes). If you have taken more programming (e.g., advanced programming, software engineering), that is helpful. Programming Languages and Translators or a class in Linguistics can be helpful, but is not required.
Course Information
Time | MW 4:10-5:25pm |
Place | 451 Computer Science Building |
Professor | Kathleen McKeown |
Office Hours | M 1:00-2:00, W 5:30-6:30, online |
Email | kathy@cs.columbia.edu |
Phone | 212-939-7114 |
Weekly TA hours (EST) are listed below. All TA hours will be held virtually.
Monday | Antonio Camara | ac4443@columbia.edu | 6:00pm-8:00pm |
Tuesday | Faisal Ladhak | faisal@cs.columbia.edu | 12:00pm-2:00pm |
Tuesday | Yanda Chen | yc3384@columbia.edu | 11:00pm-1:00am |
Wednesday | Hariharan Jayakumar | hj2559@columbia.edu | 12:00pm-2:00pm |
Wednesday | Jenny Chen | jc4686@columbia.edu | 7:30pm-9:30pm |
Thursday | Emily Allaway (Head TA) | eallaway@cs.columbia.edu | 1:00pm-3:00pm |
Friday | Payal Chandak | pc2800@columbia.edu | 5:00pm-7:00pm |
Here is where to find these rooms on the 7th floor of CEPSR.
Course Description
This course provides an introduction to the field of natural language processing (NLP). We will learn how to create systems that can analyze, understand and produce language. We will begin by discussing machine learning methods for NLP as well as core NLP tasks such as language modeling, part-of-speech tagging and parsing. We will also discuss applications such as information extraction, machine translation, text generation and automatic summarization. The course will primarily cover statistical and machine learning based approaches to language processing, but it will also introduce the linguistic concepts that play a role. We will study machine learning methods currently used in NLP, including supervised machine learning, hidden Markov models, and neural networks. Homework assignments will include both written components and programming assignments.
Requirements
Four homework assignments, a midterm and a final exam. Each student in the course is allowed a total of 4 late days on homeworks with no questions asked; after that, 10% per late day will be deducted from the homework grade, unless you have a note from your doctor. Do not use these up early! Save them for real emergencies.
We will use Google Cloud for the course. Instructions for setting up the cloud can be found here.
Textbook
Main textbook: Speech and Language Processing (SLP), 3rd Edition, by Jurafsky and Martin.
Recommended: Neural Network Methods for Natural Language Processing (NNNLP) by Yoav Goldberg. It is available online through Columbia's library but you can also purchase a hard copy from the publisher.
Recommended: Deep Learning (DL) by Goodfellow, Bengio and Courville.
Syllabus
This syllabus is still subject to change. Readings may change. But it will give you a good idea of what we will cover.
Week | Class | Topic | Reading | Assignments |
---|---|---|---|---|
1 | Jan 11 | Introduction and Course Overview | HW0: Provided code | |
1 | Jan 13 | Language modeling | C. 3 (through 3.6), SLP | |
2 | Jan 20 | Supervised machine learning, text classification | C. 5, SLP | HW1 |
3 | Jan 25 | Supervised machine learning, Scikit Learn Tutorial | C. 4, SLP | |
3 | Jan 27 | Sentiment and transition to NN | C. 4.4, SLP | |
4 | Feb 1 | Neural Nets | C. 3 and 4, NNNLP; also see Michael Collins' Notes | |
4 | Feb 3 | Distributional Hypothesis and Word Embeddings | C. 8 (through 8.5), C. 10 (through 10.5.3), NNNLP | HW1 due |
5 | Feb 8 | RNNs / POS tagging | C. 15, 16.1, NNNLP; C. 8-8.2, 8.4, SLP | HW2 |
5 | Feb 10 | Syntax | C. 12-12.5, SLP | |
6 | Feb 15 | Dependency Parsing | C. 14-14.4, SLP | |
6 | Feb 17 | Introduction to Semantics | C. 15-15.1, SLP | HW2 due |
7 | Feb 22 | Semantics and Midterm Review | Sample Midterm Questions; Sample Midterm Questions and Answers | |
7 | Feb 24 | Midterm | | |
8 | Mar 1 | Spring break | | |
8 | Mar 3 | Spring break | | |
9 | Mar 8 | Semantics / Intro to Machine Translation | C. 11.1-11.2, 11.8, SLP | HW3 |
9 | Mar 10 | Neural MT | C. 11.3-11.7, SLP | Guest speaker: Kapil Thadani |
10 | Mar 15 | Advanced Word embeddings and semantics | BERT paper | Reference papers |
10 | Mar 17 | Word Sense Disambiguation | C. 18, SLP; SenseBERT | |
11 | Mar 22 | Summarization | Extractive Neural Net Approach 1; Extractive Neural Net Approach 2 | |
11 | Mar 24 | Summarization | Abstractive Neural Net Approach 1; Abstractive Neural Net Approach 2 | HW3 due |
12 | Mar 29 | Language Generation | Seq2seq language generation; A Good Sample is Hard to Find | HW4 |
12 | Mar 31 | Information Extraction | C. 17 SLP1; IE paper 1: wikification; IE paper 2: relation extraction | |
13 | Apr 5 | Dialog | Dialog paper | Guest speaker: Or Biran |
13 | Apr 7 | Bias | Research paper 1; Research paper 2; Research paper 3 | |
14 | Apr 12 | Bias and Looking to the Future | | |
14 | Apr 14 | Research and Review | Sample Final Questions | HW4 due |
Exam Period | Final Exam | | | |
Announcements
Check EdStem for announcements, and check CourseWorks for your grades (only you will see them) and discussion. All questions should be posted on Piazza rather than emailed to Professor McKeown or the TAs, who monitor the discussion lists to answer questions.
Academic Integrity
Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.