CS 4705: Introduction to Natural Language Processing, Fall 2007
Time: Tues/Thurs 2:40-3:55
Place: 327 Mudd
Professor Julia Hirschberg (Office Hours Tues 4:00-5:00, Wed 10:00-11:00)
julia@cs.columbia.edu, 212-939-7114
Teaching Assistant Frank Enos (Office Hours Tues/Thurs 1:00-2:00)
frank@cs.columbia.edu, 212-939-7193
Announcements:
- Check Columbia Courseworks for announcements, your grades (only you will see them), and discussion. Professor Hirschberg and your TA will monitor the discussion lists to answer questions.
- If you are interested in doing NLP research projects for credit, please let Professor Hirschberg know. The NLP group often has research opportunities available. Other postings may be found at this location.
Description:
This course provides an introduction to the field of computational linguistics, aka natural language processing (NLP). We will learn how to create systems that can understand and produce language, for applications such as information extraction, machine translation, automatic summarization, question-answering, and interactive dialogue systems. The course will cover linguistic (knowledge-based) and statistical approaches to language processing in the three major subfields of NLP: syntax (language structures), semantics (language meaning), and pragmatics/discourse (the interpretation of language in context). Homework assignments will reflect research problems computational linguists currently work on, including analyzing and extracting information from large online corpora.
Textbook:
Speech and Language Processing by Jurafsky and Martin. It will be available from the University Bookstore, as well as from Amazon and other online providers. It should also be on reserve in the Engineering Library. Please check the online errata for the text for each chapter as you read it.
Requirements:
Four homework assignments, a midterm and a final exam. Each student in the course is allowed a total of 4 late days on homeworks with no questions asked; after that, 10% per late day will be deducted from the homework grade, unless you have a note from your doctor. Do not use these up early! Save them for real emergencies.
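As a rough illustration of how the late-day policy above plays out over a semester, here is a minimal Python sketch. It is not the official grading procedure. It assumes that the 4 free late days are used up in submission order, that "10% per late day" means 10% of the homework's raw score for each late day beyond the free ones, and that a score cannot drop below zero. The function name `apply_late_policy` and the sample scores are hypothetical, and doctor's notes are not modeled.

```python
FREE_LATE_DAYS = 4          # total across all homeworks, per the policy above
PENALTY_PERCENT_PER_DAY = 10  # assumed: percent of the raw score per penalized day


def apply_late_policy(submissions):
    """Return ({homework: adjusted score}, free late days left).

    submissions: list of (name, raw_score, days_late) in submission order.
    Assumption: free late days are spent on the earliest late homeworks first.
    """
    remaining = FREE_LATE_DAYS
    adjusted = {}
    for name, raw_score, days_late in submissions:
        # Spend free late days first, then penalize whatever is left over.
        free_used = min(days_late, remaining)
        remaining -= free_used
        penalized_days = days_late - free_used
        penalty_percent = min(100, PENALTY_PERCENT_PER_DAY * penalized_days)
        adjusted[name] = raw_score * (100 - penalty_percent) / 100
    return adjusted, remaining


if __name__ == "__main__":
    # Hypothetical semester: HW1 one day late, HW2 three days late,
    # HW3 two days late (no free days left), HW4 on time.
    scores, left = apply_late_policy(
        [("HW1", 90, 1), ("HW2", 80, 3), ("HW3", 100, 2), ("HW4", 70, 0)]
    )
    print(scores, left)
```

In this hypothetical run the first two homeworks consume all four free days, so HW3 loses 20% of its raw score, which is why the policy advises saving late days for real emergencies.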
All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account".
Homework submission procedure.
Academic Integrity:
Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.
Syllabus
Links to Resources:
See also the resources available from the text homepage.
General
Places to look up definitions and descriptions of terminology:
Chapters 1 and 2
Try out one of the many versions of Eliza on the web.
Chapter 3
AT&T Labs - Research Finite State Machine Library
Later Chapters
Chapter 19
- Ask Jeeves -- a search engine that answers questions in plain English.
- Answer Bus -- another Q/A system.
- Columbia's NewsBlaster summarizer
- IBM summarizer demo (canned)
- Systran machine translation (also in use at Babelfish)
- AT&T Labs - Research Finite State Machine Library
- Michael Collins' Parser
- On-line dictionaries in many languages.
- WordNet
- FrameNet
- CoBuildDirect Corpus
- AT&T's SCANMail voicemail browsing/search system
- DiaLeague 2001 -- includes a link to an online dialogue system demo.
- James Allen's Dialogue Modeling for Spoken Language Systems ACL 1997 Tutorial
- Festival speech synthesizer demo and links to other TTS systems
- Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial
Julia Hirschberg
Professor, Computer Science
Columbia University
Department of Computer Science
1214 Amsterdam Avenue
M/C 0401
450 CS Building
New York, NY 10027
email: julia@cs.columbia.edu
phone: (212) 939-7114