CS 4705: Introduction to Natural
Language Processing, Fall 2009
Time: TTh 2:40-3:55
Place: 1024 Mudd
Professor: McKeown
Office Hours: Tu 4-5, We 4-5, 722 CEPSR
Email:
Phone: 212-939-7114
Teaching Assistants:
Sara Rosenthal, Office Hours: M 5:30-6:30, 726 CEPSR; Th 1:30-2:30
Kaushal Lahankar, Office Hours: Th 4-5, TA Room, Mudd 122A; M 2-3
Email: ss3067@columbia.edu
Phone: 212-939-7122
Announcements || Academic Integrity || Submission Directions || Description || Links to Resources || Requirements || Syllabus || Text
1. Check Columbia Courseworks for announcements, your grades (only you will see them), and discussion. Professor McKeown and your TA will monitor the discussion lists to answer questions.
2. If you are interested in doing NLP research projects for credit, please let Professor McKeown know. The NLP group often has research opportunities available.
This course provides an introduction to the field of computational linguistics, also known as
natural language processing (NLP). We will learn how to create systems that can
understand and produce language, for applications such as information
extraction, machine translation, automatic summarization, question-answering,
and interactive dialogue systems. The course will cover linguistic
(knowledge-based) and statistical approaches to language processing in the
three major subfields of NLP: syntax (language structures), semantics (language
meaning), and pragmatics/discourse (the interpretation of language in context).
Homework assignments will reflect research problems computational linguists
currently work on, including analyzing and extracting information from large
online corpora.
Speech and Language Processing, 2nd Edition, by Jurafsky and Martin.
It will be available from the University Bookstore, as well as from
Amazon and other online providers. It should also be on reserve in the
Engineering Library.
Four homework assignments, a midterm, and a final exam.
Each student in the course is allowed a total of 4 late days on homework assignments, no questions asked; after that, 10% per late
day will be deducted from the homework grade, unless you have a note from your
doctor. Do not use these up early! Save them for real emergencies.
All students are required to have a Computer Science Account for this
class. To sign up for one, go to the CRF website and then click on
"Apply for an Account".
Copying or paraphrasing someone's work (code
included), or permitting your own work to be copied or paraphrased, even if
only in part, is not allowed, and will result in an automatic grade of 0 for
the entire assignment or exam in which the copying or paraphrasing was done.
Your grade should reflect your own work. If you believe you are going to have
trouble completing an assignment, please talk to the instructor or TA in
advance of the due date.
Week | Class | Topic | Reading | Assignments
1 | Sep 8 | | Ch 1 |
| Sep 10 | Natural Language and Formal Language: Regular Expressions and Finite State Automata | Ch 2, 3 |
2 | Sep 15 | | Ch 4 |
| Sep 17 | | Ch 5 |
3 | Sep 22 | | Ch 6 |
| Sep 24 | | Ch 12.1-12.14 |
4 | Sep 29 | Parsing with Context Free Grammars and Evaluation of POS taggers | Ch 13 | HW 1 due
| Oct 1 | | Ch 14.1-14.7 |
5 | Oct 6 | | Ch 17-17.2, 17.4 |
| Oct 8 | | Ch 18.1-18.4 |
6 | Oct 13 | | Ch 19 |
| Oct 15 | | Ch 20.1-20.5 |
7 | Oct 20 | | | HW 2 due at midnight (err.. 11:59PM) on 10/21
| Oct 22 | Semantic Application and | |
| Oct 27 | Midterm (midterm solutions) | |
| Oct 29 | | Guest Speaker: Robert Coyne |
8 | Nov 3 | Holiday | Holiday | Holiday
| Nov 5 | | Ch 23.3-23.7 | HW 3 assigned, HW3 FAQ
10 | Nov 10 | | |
| Nov 12 | | Ch 22.1-22.4 |
| Nov 17 | | Ch 23.1-23.2 |
| Nov 19 | | Ch 21.3-21.5 | HW 3 due, 11:58PM, Sunday Nov. 22nd. EXTENSION: HW3 due 11:58 Nov. 25th
| Nov 24 | | Ch. 25 | HW 4
| Dec 1 | | Papers |
13 | Dec 3 | And | Ch 21.1-21.2 |
| Dec 8 | | Papers |
14 | Dec 10 | | | HW 4 due, 11:58 pm, Dec. 13th
| Dec. 15-16 | | | Study Days
| Dec. 17-23 | | | Final Exams
1. Karen Chung
Language and Linguistics links
2. CatSpeak
Places to look up definitions and descriptions of terminology:
1. Oxford Dictionary of Linguistics
2. Interesting Language Factoids and Non
Try out one of the many versions of Eliza on the web.
AT&T Labs -
1. Appelt and Israel's information extraction tutorial (IJCAI-99).
2. Framenet.
1. Ask Jeeves -- a search engine that answers questions in plain English.
2. Answer Bus -- another Q/A system.
3. Columbia's NewsBlaster summarizer
4. IBM summarizer demo (canned)
5. Systran machine translation (also in use at Babelfish)
6. AT&T Labs -
7. Michael Collins' Parser
8. On-line dictionaries in many languages.
9. WordNet
10. Framenet
11. CoBuildDirect Corpus
12. AT&T's SCANMail voic
13. DiaLeague 2001 -- includes a link to an online dialogue system demo.
14. James Allen's Dialogue Modeling for Spoken Language Systems ACL 1997 Tutorial
15. Festival speech synthesizer demo and links to other TTS systems
16. Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial