CS57300: Spring 2018 — Time: Tue & Thu 9:00–10:15am — Location: Wetherill Lab of Chemistry 320
Bruno Ribeiro
LWSN 2142C (use Piazza to communicate with me)
Regular office hours: Thursdays 10:30–11:30am
Office Hour Anomalies: TBD
All communication will be on Piazza (emailing will be SLOW at best and unresponsive at worst).
Linjie Li. Office hours: HASS G50, Thursdays 1:30–2:30 PM.
Anoop Santhosh. Office hours: HASS G50, Mondays 9:00–10:00 AM.
Akhil Israni. Office hours: HASS G50, Fridays 3:00–4:00 PM.
This is the website for CS57300 (graduate) Data Mining.
The final exam is scheduled for TBD.
STAT516 or an equivalent introductory statistics course, CS 381 or an equivalent course that covers basic programming skills (e.g., STAT 598G), or permission of instructor.
This course has a heavy programming assignment load. All assignments are in Python. This course assumes students are proficient in Python or can learn it quickly (say, in less than a week).
There will be no waiving of prerequisites. Waivers are granted only in truly exceptional cases, where the student can PROVE that she/he has mastered Linear Algebra, Statistics, and C++/Python programming (and can learn Python programming quickly on her/his own, without help).
A skill test will be administered at a later date to determine whether or not a student qualifies for a prerequisite waiver.
The texts below are recommended but not required. Reading materials will be distributed as necessary. Reading assignments will be posted on the schedule; please check regularly.
There will be 5 to 6 homework assignments. The lowest homework grade will be discarded. Homework assignments should be submitted on Blackboard; details will be provided in the assignments. Programming assignments should be written in Python 3, unless otherwise noted.
All homework will be due on the designated deadlines. There are no homework extensions. Late homework assignments will not be accepted.
Questions about the details of homework assignments should be posted on Piazza, and will be answered by the TAs or instructor.
There will be one individual course project.
There will be one comprehensive (in-class) final exam. The final will be closed book and closed notes.
Date  Topic  Notes  Reading  Slides 
01/09  Course Overview and Review (Course Objectives, Population Sampling, Random Variables and Distributions)  Principles of Data Mining, Chapter 1  Intro (Lecture 1)
01/11  Review (Linear Algebra Review, Statistical Estimation, Maximum Likelihood Estimation (MLE). Using the Scholar cluster. Python 3 overview)  Python Resources & Cluster Use  Principles of Data Mining, Chapters 2, 4  Lecture 2 (LinAlg, MLE, cluster use)
01/16  Review (Regression, Posteriors, & Working with Data)  Notes on Working with Data  Principles of Data Mining, Chapters 3, 4  Lecture 3
01/18  Principles of Website Functionality & Advertisement  Lecture 4  
01/23  Classification and Regression Tasks (Discriminative vs. Predictive). Assessing Accuracy  Lecture 5 (Intro, Model Score & Search, Accuracy Measures, Linear Regression, Logistic Regression)
01/25  Improving Model Score & Search: Logistic Regression & SVM Classifiers  Lecture 6 (SVM, Perceptron)  
01/30  Controlling Model Space: Priors and Regularization  Chapter 3.4, Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning. Get it online here  Lecture 7  
02/01  Exploratory Data Analysis / Feature Construction  Lecture 8  
02/06  No Class  Homework Time  
02/08  Decision Trees  Lecture 9 (Decision Trees)
02/13  Probabilistic Models: The Naive Bayes Model / K-nearest Neighbors  Lecture 10
02/15  Ensemble Methods (improving weak models with boosting, and unstable models with bagging)  Lecture 11 (Bagging and Boosting)
02/20  Gradient Boosted Decision Trees  Lecture 12 (Gradient Boosted Decision Trees)
02/22  Introduction to Neural Networks  A Course in Machine Learning (Chapter 4, Perceptron), Hal Daume III  Lecture 13
02/27  Deep Neural Networks for Classification Tasks (Feedforward Networks & Backpropagation)  Pattern Recognition and Machine Learning (Chapter 5.1), Bishop  Lecture 14 [code] [pdf]
03/01  Deep Neural Networks: Training (Backpropagation (cont.) + Stochastic Gradients)  Pattern Recognition and Machine Learning (Chapters 5.2–5.3), Bishop  Lecture 15 [slides] [src]
03/06  Hands-on: Training Deep Neural Networks (Theory + Empirical Tricks)  Pattern Recognition and Machine Learning (Chapters 5.3, 5.5), Bishop  Lecture 16 [code]
03/08  Model Evaluation: Finding Good Hypotheses (Measuring Prediction Error: AIC, BIC, Cross Validation)  Lecture 17  
03/13  Spring Break  
03/15  Spring Break  
03/20  Hypothesis Testing and Bayesian Hypothesis Testing (A/B Testing)  Lecture 18
03/22  Model Assessment: Testing Multiple Hypotheses  Lecture 19
03/27  Decision Under Uncertainty: Multi-armed Bandits (Thompson Sampling)  Lecture 20
03/29  Deep Neural Networks: Classification Tasks with Structured Data (Convolutional Networks)  Lecture 21
04/03  Collaborative Filtering Tasks and Classical Solutions  Lecture 22  
04/05  Deep Neural Networks: Classification Tasks with Structured Data (Graph Data)  Lecture 23
04/10  Deep Neural Networks: Prediction Tasks with Time Series Data (Latent Markov Embeddings, Intro to NCE, word2vec)  Lecture 24  
04/12  Hands-on: Training Deep Learning Models  See Piazza post about GPU resources  Lecture 25
04/17  Dimensionality Reduction (Classical PCA)  Lecture 26
04/19  Representation Learning (Advanced Dimensionality Reduction)  Lecture 27
04/24  Descriptive modeling: Clustering from a representation learning perspective  Principles of Data Mining, Chapter 6.4, 9  Lecture 28  
04/26  Final review 