Introduction to Data Science
New York University - Spring 2018
Class is held in 60FA 110, Wed 2:00-3:40pm
Office hours: CDS 620
Wednesday 12-2pm: Lecturer, Iddo Drori
Tuesday 11am-1pm: Section Leader, Datta Sainath Dwarampudi
Friday 2-4pm: Grader, Samhita Damotharan
Thursday 2-4pm: Grader, Sai Anirudh Kondaveeti
Spring 2018 classes begin (Monday, January 22)
Lecture 1 (Wednesday, January 24): Introduction
Data collection, cleaning, storage, retrieval, learning, visualization.
Readings: CASI Epilogue, DSB Ch 1-3.
Science and data science, David M. Blei and Padhraic Smyth, PNAS 2017.
Optional: Large-scale physical activity data reveal worldwide activity inequality, Tim Althoff, Rok Sosič, Jennifer L. Hicks, Abby C. King, Scott L. Delp, and Jure Leskovec, Nature 2017.
Lab 1: Tableau
Homework 1: in Tableau, due February 5.
Lecture 2 (Wednesday, January 31): Supervised learning, fitting
Readings: DSB Ch 4-5.
Optional: CASI Ch 8.
Using deep learning and Google Street View to estimate the demographic makeup of neighborhoods across the United States, Timnit Gebru, Jonathan Krause, Yilun Wang, Duyun Chen, Jia Deng, Erez Lieberman Aiden, and Li Fei-Fei, PNAS 2017.
Lab 2: Python, NumPy, sklearn
Lecture 3 (Wednesday, February 7): Unsupervised learning, clustering, dimensionality reduction
Readings: DSB Ch 6.
Optional: Robust continuous clustering, Sohil Atul Shah and Vladlen Koltun, PNAS 2017.
Lab 3: Kaggle competition
Homework 2: Kaggle class churn competition, due February 19.
Lecture 4 (Wednesday, February 14): Performance measures
Readings: DSB Ch 7-9.
Optional: CASI Ch 12.
Computer-based personality judgments are more accurate than those made by humans, Wu Youyou, Michal Kosinski, and David Stillwell, PNAS 2015.
Lab 4: sklearn clustering
Lecture 5 (Wednesday, February 21): Text mining
Readings: DSB Ch 10.
Lab 5: Spelling correction
Homework 3: due March 5.
Lecture 6 (Wednesday, February 28): Feature extraction and selection, review for midterm
Readings: CASI Ch 16, DSB Ch 11-12.
Lab 6: sklearn feature extraction
Lecture 7 (Wednesday, March 7): Midterm in class
Spring Recess (Monday-Sunday, March 12-18)
Snow day (Wednesday, March 21): NYU closed, classes cancelled
Lecture 8 (Wednesday, March 28): Differentiable programming, neural networks
Readings: CASI Ch 18.
Term project: due April 25.
Lab: Kaggle
Lecture 9 (Wednesday, April 4): Deep neural networks, non-linear PCA
Readings: CASI Ch 19.
Lab: sklearn PCA
Lecture 10 (Wednesday, April 11): Bayesian and variational inference
Readings: CASI Ch 3.
Optional: Variational inference: a review for statisticians, Blei et al., 2018.
Lab
Lecture 11 (Wednesday, April 18): Gradient boosting
Readings: CASI Ch 17.
Lab: XGBoost
Lecture 12 (Wednesday, April 25): Meta data and learning
Lab
Lecture 13 (Wednesday, May 2): Data visualization
Readings: Visualizations that really work, Scott Berinato, HBR, 2016.
Optional: The visual display of quantitative information, Edward Tufte, 2001.
Lab: Python visualization packages, time series
Last day of Spring 2018 classes (Monday, May 7)
Final exam (Wednesday, May 9): 2:00-3:50pm, room 110.