Introduction to Data Science

New York University - Spring 2019

Class is held in 60FA 110, Wed 3:00-4:40pm

Office hours: CDS 618

Friday 10-11am: Lecturer, Iddo Drori

Wednesday 11am-12pm: Section Leader, Andrea Wang

Monday 1-2pm: Grader, Sai Anirudh Kondaveeti

Thursday 11am-12pm: Grader, Divyanch Khanna

Spring 2019 classes begin (Monday, January 28)

Lecture 1 (Wednesday, January 30): Introduction
Data collection, cleaning, storage, retrieval, learning, visualization.
Readings: CASI Epilogue, DSB Ch 1-3.
Science and data science, David M. Blei and Padhraic Smyth, PNAS 2017.
Lab 1: Tableau
Homework 1: in Tableau, due February 11.

Lecture 2 (Wednesday, February 6): Supervised learning, fitting
Readings: DSB Ch 4-5.
Optional: CASI Ch 8.
Lab 2: Python, NumPy, Sklearn

Lecture 3 (Wednesday, February 13): Unsupervised learning, clustering, dimensionality reduction
Readings: DSB Ch 6.
Lab 3: In-class Kaggle competition
Homework 2: due March 6.

Presidents' Day, no classes scheduled (Monday, February 18)

Lecture 4 (Wednesday, February 20): Performance measures, ROC curve
Readings: DSB Ch 7-9.
Lab 4: Sklearn confusion matrix, ROC

Lecture 5 (Wednesday, February 27): SGD, feature extraction and selection, random forests, gradient boosting
Readings: CASI Ch 16.
Lab 5: Sklearn SGD, feature extraction and selection

Lecture 6 (Wednesday, March 6): Neural networks, text mining, sequence models
Readings: DSB Ch 10.
Optional: CASI Ch 18.
Lab 6: Spelling correction, NLTK

Lecture 7 (Wednesday, March 13): Features, combining models, example midterm
Readings: DSB Ch 11-12.
Lab 7: Sklearn random forest, boosting

Spring recess (Monday, March 18 to Sunday, March 24)

Lecture 8 (Wednesday, March 27): Midterm in class

Lecture 9 (Wednesday, April 3): Data visualization
Readings: Visualizations that really work, Scott Berinato, HBR, 2016.
Optional: The visual display of quantitative information, Edward Tufte, 2001.
Lab 9: Tableau with Python

Lecture 10 (Wednesday, April 10): Automatic data science
Lab 10: auto-sklearn, TPOT

Lecture 11 (Wednesday, April 17): Bayesian and variational inference
Readings: CASI Ch 3.
Optional: Variational inference: a review for statisticians, Blei et al., 2018.
Lab 11

Lecture 12 (Wednesday, April 24): Bias and fairness

Lecture 13 (Wednesday, May 1): Recommender systems

Lecture 14 (Wednesday, May 8): Quantum information science

Last day of Spring 2019 classes (Monday, May 13)

Final exam (Friday, May 17): in class 2:00-3:50pm