Course Information
Time | MW 1:10-2:25pm |
Place | 451 Computer Science Building |
Professor | Kathleen McKeown |
Office Hours | M 2:30-3:30, W 5-6, 722 CEPSR |
Email | kathy@cs.columbia.edu |
Phone | 212-939-7114 |
Weekly TA hours are listed below. All TA hours will be held in the NLP lab (7LW1 CEPSR).
Day | TA | Email | Hours |
---|---|---|---|
Monday | Alyssa Hwang | ahh2143@columbia.edu | 3:30-5:30pm |
Tuesday | Elsbeth Turcan (Head TA) | eturcan@cs.columbia.edu | 11am-1pm |
Wednesday | Katy Gero | katy@cs.columbia.edu | 11am-1pm |
Thursday | Emily Allaway | eallaway@cs.columbia.edu | 2-4pm |
Friday | Fei-Tzin Lee | feitzin@cs.columbia.edu | 4:30-6:30pm |
Here is where to find these rooms on the 7th floor of CEPSR.
Course Description
This course provides an introduction to the field of natural language processing (NLP). We will learn how to create systems that can analyze, understand, and produce language. We will begin by discussing machine learning methods for NLP as well as core NLP tasks, such as language modeling, part-of-speech tagging, and parsing. We will also discuss applications such as information extraction, machine translation, text generation, and automatic summarization. The course will primarily cover statistical and machine learning based approaches to language processing, but it will also introduce the linguistic concepts that play a role in these approaches. We will study machine learning methods currently used in NLP, including supervised machine learning, hidden Markov models, and neural networks. Homework assignments will include both written and programming components.
Requirements
There will be four homework assignments, a midterm, and a final exam. Each student is allowed a total of 4 late days on homework assignments, no questions asked; after that, 10% per late day will be deducted from the homework grade unless you have a note from your doctor. Do not use these up early! Save them for real emergencies.
We will use Google Cloud for the course. Instructions for setting up Google Cloud can be found here.
Textbook
Required: Natural Language Processing, by Jacob Eisenstein. It is available on GitHub, but you can also purchase the final hard copy version from MIT Press. It should also be available on Amazon.
Recommended: Speech and Language Processing, 3rd Edition, by Jurafsky and Martin.
Recommended: Neural Network Methods for Natural Language Processing, by Yoav Goldberg. It is available online, but you can also purchase a hard copy from the publisher.
Syllabus
This syllabus is still subject to change, and readings may change, but it will give you a good idea of what we will cover.
Week | Class | Topic | Reading | Assignments |
---|---|---|---|---|
1 | Sep 4 | Introduction and Course Overview | Ch 1, NLP | HW0 (due Sep 7, 10am); Provided code |
2 | Sep 9 | Language modeling | Ch 6 through 6.2, NLP | |
2 | Sep 11 | Supervised machine learning, text classification | Ch 2.1-2.6, 2.8, NLP | HW1: Stance Detection; HW1 Data |
3 | Sep 16 | Supervised machine learning, Scikit Learn Tutorial | Ch 4.4, 4.5, NLP | |
3 | Sep 18 | Neural Nets | Ch 3, NLP | |
4 | Sep 23 | Distributional Hypothesis and Word Embeddings | Ch 14.2, 14.3, NLP | |
4 | Sep 25 | Neural nets and sentiment analysis | Ch 4.2, NLP | HW1 due |
5 | Sep 30 | POS tagging | Ch 8.1-8.3, Speech and Language; 8.1, NLP | HW2: Emotion Detection; Provided code |
5 | Oct 2 | Methods: Hidden Markov Models | Ch 7 (through 7.5), NLP; Ch 8.4, Speech and Language | |
6 | Oct 7 | Syntax | Ch 11-11.1, Speech and Language; 10.2, 11-11.1, NLP | |
6 | Oct 9 | Dependency Parsing | Ch 11.2-11.4, NLP | |
7 | Oct 14 | Introduction to Semantics | Ch 19-19.3, Speech and Language | HW2 due |
7 | Oct 16 | Semantics and Midterm Review | Ch 4.2, NLP; Sample Midterm Questions; Sample Midterm Questions and Answers | |
8 | Oct 21 | Midterm | | |
8 | Oct 23 | Advanced Word Embeddings and Semantics | Reference papers | HW3: Word Embeddings and Semantics; Provided code |
9 | Oct 28 | Word Sense Disambiguation | Ch 13, NLP | |
9 | Oct 30 | Machine Translation | Ch 18.1-18.2, NLP | |
10 | Nov 4 | Academic Holiday | No classes | |
10 | Nov 6 | Neural MT | Ch 18.3-18.5, NLP | |
11 | Nov 11 | Summarization | Extractive Neural Net Approach | |
11 | Nov 13 | Summarization | Abstractive Neural Net Approach | HW3 due; HW4: Sequence to Sequence Modeling; Provided code |
12 | Nov 18 | Language Generation | Seq2seq language generation | |
12 | Nov 20 | Dialog (guest speaker: Or Biran) | Dialog | |
13 | Nov 25 | Bias | Bias in word embeddings | |
13 | Nov 27 | Academic Holiday | No classes | |
14 | Dec 2 | Information Extraction | Neural Nets; Ch 17.4, 17.5.1 | |
14 | Dec 4 | Research and Review | Sample Final Questions | HW4 due |
15 | Dec 9 | In-class Final Exam | | |
Announcements
Check Piazza for announcements and discussion, and check CourseWorks for your grades (only you will see them). Post all questions on Piazza instead of emailing Professor McKeown or the TAs; they will monitor the discussion there to answer questions.
Academic Integrity
Copying or paraphrasing someone else's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you will have trouble completing an assignment, please talk to the instructor or a TA before the due date.