Lecture Schedule and Readings
This schedule is subject to change as the course progresses.
Lecture slides are available in the "Files"
section of CourseWorks.
- Jan 24: Course Overview. Information Retrieval I.
- Jan 31: Information Retrieval II.
- Feb 7, 14: Information Retrieval III.
- Christopher D. Manning, Prabhakar Raghavan and Hinrich
Schuetze, Introduction to Information Retrieval, 2008.
Chapter 11 ("Probabilistic Information Retrieval") and
Chapter 7 ("Computing Scores in a Complete Search System")
- Feb 21: Web Search I.
- Feb 28: Information Extraction.
- Eugene Agichtein and Luis Gravano, Snowball: Extracting
Relations from Large Plain-Text Collections, in ACM DL 2000
- Alpa Jain, Panagiotis Ipeirotis, and Luis Gravano, Building
Query Optimizers for Information Extraction: The SQoUT
Project, in SIGMOD Record, Special Issue on "Managing
Information Extraction," vol. 37, no. 4, December 2008
- [OPTIONAL] Simran Arora, Brandon Yang, Sabri Eyuboglu,
Avanika Narayan, Andrew Hojel, Immanuel Trummer, and
Christopher Ré, Language Models Enable Simple Systems for
Generating Structured Views of Heterogeneous Data Lakes, in
Proc. of the VLDB Endowment, 2023
- Mar 7: Web Search II.
- Bhaskar Mitra and Nick Craswell, An Introduction to Neural
Information Retrieval, Foundations and Trends in Information
Retrieval, vol. 13,
no. 1, December 2018. (Click on "PDF" from within Columbia's
network to access the full text of the article;
alternatively, find the article in the "Files" section of
CourseWorks, under "Readings.") Section 2.6 ("Neural
Approaches to IR"), Section 3 ("Unsupervised Learning of
Term Representations"), and Section 4 ("Term Embeddings for
IR"), focusing on the topics covered in class and at the
level of detail in the lecture.
- Kaz Sato and Guangsha Shi, Your RAGs Powered by Google
Search Technology, Part 1 and Part 2, Google AI & Machine
Learning Blog, February 2024
- [OPTIONAL] Pandu Nayak,
Understanding Searches Better than Ever Before, Google
Search Official Blog, October 2019
- [OPTIONAL] Jay Alammar, The Illustrated BERT, ELMo, and co.
- [OPTIONAL] Jacob Devlin et al., BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding,
NAACL-HLT 2019
- [OPTIONAL] Tom Brown et al., Language Models are Few-Shot
Learners, July 2020
- [OPTIONAL] Jason Wei et al., Finetuned Language Models Are
Zero-Shot Learners, February 2022
- Additional optional materials listed in the lecture notes.
- Mar 14: Midterm exam (closed book, closed notes,
covering the first half of the course).
- Mar 21: No lecture (Spring Recess).
- Mar 28: Mining Time-Series Data (lecture by Ioannis
Paparrizos, Ohio State U.). On Zoom only.
- Chotirat A. Ratanamahatana, Jessica Lin, Dimitrios
Gunopulos, Eamonn Keogh, Michail Vlachos, Gautam Das:
Mining Time Series Data. In Data Mining and Knowledge
Discovery Handbook, pages 1049-1077, 2010 (focus on
Sections 1, 2.1, 2.2, 3.1, 3.3, 4, 4.5, and 4.7)
- Apr 4, 11: Data Mining.
- Apr 18, 25: Data Warehousing, OLAP, Decision Support.
- May 2: Spatial Data Management.
- May 9, 9:30-11:00 a.m. ET: Final exam
(closed book, closed notes, covering the second half of the course).