CS 57800: Statistical Machine Learning

Semester:	Fall 2020, also offered in Spring 2020, Spring 2018, Fall 2017 and Fall 2016
Time and place:	Tuesday and Thursday, 3.00pm-4.15pm EST
Instructor:	Jean Honorio (Please send an e-mail for appointments)
TAs:	Chuyang Ke, e-mail: cke at purdue.edu, Office hours: Monday 10am-noon EST
	Kevin Bello, e-mail: kbellome at purdue.edu, Office hours: Friday 2pm-4pm EST

Machine learning offers a new paradigm of computing: computer systems that can learn to perform tasks by finding patterns in data, rather than by running code specifically written by a human programmer to accomplish the task. The most common machine-learning scenario requires a human teacher to annotate data (identify the relevant phenomena that occur in the data) and then uses a machine-learning algorithm to generalize from these examples. Generalization is at the heart of machine learning: how can the machine go beyond the provided set of examples and make predictions about new data? In this class we will look into different machine learning scenarios, study several algorithms, analyze their performance and learn the theory behind them.

A tentative list of topics in supervised learning includes: linear and non-linear classifiers, kernels, rating, ranking, collaborative filtering, model selection, complexity, generalization, structured prediction. A tentative list of topics in unsupervised learning and modeling includes: mixture models, Bayesian networks, Markov random fields, factor graphs.

Learning Objectives

During the course, students will:

learn about different supervised and unsupervised problems, and their related algorithms.
implement some of those algorithms.
learn the theory behind some algorithms, e.g., geometrical aspects and generalization.
learn algorithm-independent principles, e.g., cross-validation, bias-variance tradeoff.

Prerequisites

This class requires some mathematical background. It is not a math class; however, you should be comfortable with linear algebra, calculus, statistics and probability. Programming knowledge is also required.

Textbooks

There is no official textbook for this class. I will post slides and pointers to reading materials. Recommended books for further reading include (* freely available online):

* The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman.
* Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David.
* A Course in Machine Learning by Hal Daumé III.
Pattern Classification, 2nd Edition by Richard O. Duda, Peter E. Hart, David G. Stork.
Pattern Recognition and Machine Learning by Christopher M. Bishop.
Machine Learning by Tom Mitchell.
Probabilistic Graphical Models by Daphne Koller and Nir Friedman.

Assignments

There will be up to five homeworks, one midterm exam, one final exam and one project (dates posted on the schedule). The homeworks are to be done individually and in MATLAB. The project is to be done in groups of 3 students.

For the project, you will write a half-page project plan (around 1-2 weeks before the midterm), a 2-4 page preliminary results report (around 1-2 weeks after the midterm) and a 4-8 page final results report (around 1-2 weeks before the final exam). The project should include:

a definition of the problem, possibly relevant to your interests.
a description of the dataset (or datasets) to be used. Datasets should be already publicly available (you should provide a URL), since there is not enough time for you to collect data. Possible datasets include: ADHD 200 (Whole Brain Data), Brain & Nouns, Connectomics, Higgs Boson, Labeled Faces in the Wild, Loan Default Prediction, Movielens, T-Drive, Yahoo Bidding (A1), Yahoo Ranking (C14).
a description of the experimental setup, e.g., cross-validation, parameter tuning, etc.
experimental results, showing not only when the algorithm succeeds but also when the algorithm fails. This might include: plots of number of samples versus accuracy (you can use different subsets of the same dataset), regularization parameter versus accuracy, ROC curves, plots of different datasets, etc.
you are allowed to either implement learning algorithms from scratch or use third-party code (e.g., liblinear). However, everything else, such as cross-validation, parameter tuning, computing the values for the ROC curve, etc., must be written by yourself.
you can use MATLAB, C++, Java or Python.
do not spend too much time on things such as "understanding the data", "memory problems because your data is too big", etc. If you are already familiar with computer vision, brain data, natural language processing, big data, parallelism, etc., you can make use of those things, but doing so will not by itself earn you a higher grade. In general, I recommend using easy-to-understand datasets and smaller subsets of the data.
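
As a rough illustration of the kind of evaluation code the project expects you to write yourself, here is a minimal Python sketch of k-fold index splitting and of computing the points of an ROC curve. It is only one possible way to do this, not a required template. The function names (k_fold_indices, roc_points, auc) are hypothetical, labels are assumed to be 0 or 1 with both classes present, and higher scores are assumed to mean "more likely positive".

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split indices 0..n_samples-1 into k shuffled, nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    Assumes labels are 0 or 1 and that both classes are present.
    The threshold sweeps from the highest score down; examples with
    tied scores are processed together and produce a single point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a group of ties.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    folds = k_fold_indices(10, 3)
    for i, test_idx in enumerate(folds):
        # All folds except fold i form the training set for this round.
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        print("fold", i, "train size", len(train_idx), "test size", len(test_idx))

    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])
    print(pts, auc(pts))
```

In each round, the learning algorithm (your own or third-party) would be trained on train_idx and scored on test_idx; the scores on the held-out fold are what you would pass to roc_points.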

Neither I nor the TAs will provide any help regarding programming-related issues.

Grading

Homeworks: 25%
Midterm exam: 25%
Final exam: 25%
Project: 25%

Late policy

Assignments are to be submitted by the due date listed. Each person will be allowed seven extension days, which can be applied to any combination of homeworks during the semester. Use of a partial day will be counted as a full day. Extensions cannot be used after the final day of classes. Please use the extension days wisely!

Assignments will NOT be accepted if they are more than five days late.

Academic Honesty

Please read the departmental academic integrity policy here. This will be followed unless we provide written documentation of exceptions. We encourage you to interact amongst yourselves: you may discuss and obtain help with basic concepts covered in lectures and with the homework specifications (but not the solutions). However, unless otherwise noted, work turned in should reflect your own efforts and knowledge. Sharing or copying solutions is unacceptable and could result in failure. You are expected to take reasonable precautions to prevent others from using your work.

Additional course policies

Please read the general course policies here.

Schedule

Date	Topic (Tentative)	Notes
Tue, Aug 25	Lecture 1: perceptron (introduction)	Homework 0: due on Aug 27, 11.59pm EST - NO EXTENSION DAYS ALLOWED
Thu, Aug 27	Lecture 2: perceptron (convergence), max-margin classifiers, support vector machines (introduction)	Homework 0 due - NO EXTENSION DAYS ALLOWED
Tue, Sep 1	Lecture 3: nonlinear feature mappings, kernels (introduction), kernel perceptron	Homework 0 solution
Thu, Sep 3	Lecture 4: SVM with kernels, dual solution Refs: [1] [2] (not mandatory to be read)	Homework 1: due on Sep 10, 11.59pm EST
Tue, Sep 8	Lecture 5: one-class problems (anomaly detection), one-class SVM, multi-way classification, direct multi-class SVM Refs: [1] [2] [3] [4] (not mandatory to be read)
Thu, Sep 10	Lecture 6: rating (ordinal regression), PRank, ranking, rank SVM Refs: [1] [2] (not mandatory to be read)	Homework 1 due
Tue, Sep 15	Lecture 7: linear and kernel regression, feature selection (information ranking, regularization, subset selection)	Homework 2: due on Sep 22, 11.59pm EST
Thu, Sep 17	Lecture 8: ensembles and boosting
Tue, Sep 22	Lecture 9: performance measures, cross-validation, bias-variance tradeoff, statistical hypothesis testing	Homework 2 due
Thu, Sep 24	Lecture 10: model selection (VC dimension, generalization, structural risk minimization)	Homework 3: due on Oct 1, 11.59pm EST
Tue, Sep 29	Lecture 11: probability review (joint, marginal and conditional probability), independence, maximum likelihood estimation
Thu, Oct 1	Lecture 12: generative probabilistic modeling, maximum likelihood estimation, decision boundary	Homework 3 due
Tue, Oct 6	Lecture 13: mixture models, EM algorithm, convergence, model selection
Thu, Oct 8	MIDTERM (lectures 1 to 12)	Start: Thursday October 8, 3.00pm EST End: Friday October 9, 3.00pm EST
Tue, Oct 13	(midterm solution)
Thu, Oct 15	—	Project plan due (see Assignments for details) [Word] or [Latex] format
Tue, Oct 20	Lecture 14: active learning, kernel regression, Gaussian processes Refs: [1] (not mandatory to be read)
Thu, Oct 22	Lecture 15: dimensionality reduction, principal component analysis (PCA), kernel PCA	Homework 4: due on Oct 29, 11.59pm EST
Tue, Oct 27	Lecture 16: collaborative filtering (matrix factorization), structured prediction (max-margin approach) Refs: [1] (not mandatory to be read)
Thu, Oct 29	Lecture 17: Bayesian networks (motivation, examples, graph, independence) Refs: [1] [2] (not mandatory to be read)	Homework 4 due
Tue, Nov 3	Lecture 18: Bayesian networks (independence, equivalence, learning) Refs: [1] [2] [3, chapters 16-20] (not mandatory to be read)
Thu, Nov 5	Lecture 19: Bayesian networks (introduction to inference), Markov random fields, factor graphs Refs: [1] [2] (not mandatory to be read)	Preliminary project report due (see Assignments for details) - NO EXTENSION DAYS ALLOWED
Tue, Nov 10	—
Thu, Nov 12	Lecture 20: Markov random fields (inference, learning) Refs: [1] [2] [3, chapters 16-20] (not mandatory to be read)
Tue, Nov 17	(lecture 20 continues)
Thu, Nov 19	Lecture 21: Markov random fields (inference in general graphs, junction trees)	Final project report due (see Assignments for details) - NO EXTENSION DAYS ALLOWED
Mon, Nov 23	FINAL EXAM (lectures 13 to 21)	Start: Monday November 23, 4.15pm EST End: Tuesday November 24, 4.15pm EST
Thu, Nov 26	THANKSGIVING VACATION
Tue, Dec 1	(final exam solution)