CS 578: Statistical Machine Learning

Course Information

When: Tu/Th 12:00 pm -- 1:15 pm.

Where: Lawson Computer Science Bldg B155.

Instructor: Yexiang Xue, yexiang AT purdue.edu.

Teaching Assistant: Amy Rechkemmer (arechke AT purdue.edu), Wei-Jen Lee (lee2913 AT purdue.edu).

Office Hour: Yexiang Xue, Tuesdays 11:00 am -- 12:00 pm (by appointment, notified at least by 5 am the previous Monday).
Amy Rechkemmer. Wednesdays 11:00 am -- 12:00 pm. Location: HAAS G40.
Wei-Jen Lee. Mondays 3:30 pm -- 4:30 pm. Location: HAAS G40.

Course website: https://www.cs.purdue.edu/homes/yexiang/courses/19fall-cs578/index.html.

Notifications and homework submission will be through Blackboard (mycourses.purdue.edu).

Online discussion is available at Piazza (piazza.com/purdue/fall2019/cs578).

Course project submission at CMT (https://cmt3.research.microsoft.com/SMLPurdue2019).

Course Description

Machine learning offers a new paradigm of computing – computer systems that can learn to perform tasks by finding patterns in data, rather than by running code specifically written to accomplish the task by a human programmer. The most common machine-learning scenario requires a human teacher to annotate data (identify relevant phenomenon that occurs in the data), and use a machine-learning algorithm to generalize from these examples. Generalization is at the heart of machine learning – how can the machine go beyond the provided set of examples and make predictions about new data. In this class we will look into different machine learning scenarios (supervised and unsupervised), look into several algorithms, analyze their performance and learn the theory behind them.

Prerequisites

(1) Undergraduate level training or coursework in linear algebra, calculus and multivariate calculus, basic probability and statistics; (2) Programming skills: at least master one programming language. Python is highly recommended (self-studying scikit-learn and related packages is expected); (3) Basic skills in using git for maintaining code development. In addition, an undergraduate level course in Artificial Intelligence may be helpful but is not required.

Prerequisite Check: There will be a prerequisite check in the second class of this semester to help both the instructors and yourself determine if your background is ready for this class. The prerequisite check is open-book. You are allowed to bring a one-page letter sized paper as the cheat sheet. However, discussing with other students is not allowed. The score of the prerequisite check DOES NOT contribute to your final scores. However, we will communicate your score and the class's score distribution to you to help you reach a decision.

Text books and Reading Materials

There is no official text book for this class. I will post students' notes along the progress of this course (see note-taking section). Recommended books for further reading include:

Tom Mitchell, Machine Learning, [url]
Christopher M. Bishop, Pattern Recognition and Machine Learning,[url]
Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, [url] (online access available at Purdue Library)
Daphne Koller and Nir Friedman, Probabilistic Graphical Models: Principles and Techniques, [url]

A few useful resources:

ML materials:

A First Encounter with Machine Learning by Max Welling
Introduction to Machine Learning by Alex Smola and S.V.N. Vishwanathan
A Course in Machine Learning by Hal Daume III
Bayesian reasoning and machine learning by David Barber
A tutorial by Andrew Moore

Math references:

The Matrix Cookbook by Kaare Brandt Petersen and Michael Syskind Pedersen
Calculus by Gilbert Strang
Linear Algebra by Gilbert Strang
Introduction to Probability and Statistics by Jeremy Orloff and Jonathan Bloom

Learning Python

For those who are unfamiliar with Python, I strongly encourage you to spend one night learning it by following the official tutorial (see below). I did not know Python until my graduate school. It took me one night to learn it, so can you!

Course Activities and Evaluation

	Percentage final score	Due (exam) date (tentative)
Attendance:	5%
Note taking:	10%	One week after the lecture
Mid-term exam (close book):	20%	Thu 10/17 08:00 pm - 09:30 pm ARMS 1010.
Final exam (close book):	25%	Tue 12/10 08:00 am - 10:00 am WTHR 104.
Course project proposal:	5%	Fri 9/20 05:00 pm Eastern Time.
Course project reviews:	5%	Mon 9/30 05:00 am Eastern Time.
Course project mid-term progress report:	5%	Mon 11/4 05:00 am Eastern Time.
Course project final report:	15%	Mon 11/18 05:00 am Eastern Time.
Course project final presentation:	10%	Mon 11/18 05:00 am Eastern Time.

Attendance: since many lectures will be blackboard only, attendance is highly encouraged. Otherwise, students should be prepared with no slides posted online as review materials.

No Cellphone / Laptop Policy during Class: it is distracting to the instructor, your fellow classmates and you (!) to use cellphones or laptops during the class. To respect everybody, we will enforce an official policy that no cellphones or laptops (except for the instructor) are allowed when the class is in progress. Violations will be punished with deductions in the final grade.

Note taking: this course will involve heavy blackboard demonstrations (actual blackboard, not the website). Therefore, note-taking is absolute necessary. Every student is expected to submit the pdf version of the notes for three lectures (assigned by TA). The TA will select the best two notes for each lecture and distribute them as handouts for everybody (posted on blackboard). The students with at least one of their notes selected by TA will receive full credits in the note-taking category. Others will receive partial credits (grading rubrics coming up soon). The notes are due one week after the lecture.

Selective Homeworks: there will be two homeworks, one released tentatively on 10/1, the other released on 11/18. Both homeworks are selective, and will NOT be graded. However, the homeworks contain questions that are very similar to those in the mid-term and final exams, so it is highly encouraged that students complete both homeworks prior to the review sessions. The mid-term and final review sessions will be conducted by TAs, and will focus on difficult problems in the homeworks.

Course project: MOST important part of this course. Machine learning is a practical field, so it cannot be emphasized more the importance of completing a machine learning project yourself! In addition, because this is a graduate-level course, one important aspect is basic scientific training, including asking the right questions, commenting others' work, literature review, experimental design, coding, scientific writing, and presentation skills. This can ONLY be trained with a real project. Teamwork: students are encouraged to work as teams of two or three.

We provide a few research thrusts and datasets for your reference (see below). You are encouraged to choose a specific project within the overarching theme of one research thrust in the list, although you are free to choose any project at your will as long as it relates to machine learning. The goal is to nurture FANTASTIC course projects, which have potentials to be developed into innovative research papers in the future. Course projects outside of the suggested thrusts will receive less mentoring from the instructors and the TAs, and therefore are less preferred. We encourage you to combine your domain of expertise with machine learning. To guide you through the project, we split the entire process into five parts: proposal, peer review, mid-term report, final report and presentation.

Course project proposal: the proposal will be evaluated by intellectural merit and broader impact (same creteria for NSF proposals). The instructor DO respect that it is a course project, so the bar is much lower. However, the following two aspects are emphasized equally: (i) intellectural merit: how does the project advance machine learning (or your understanding on machine learning, or bringing machine learning into a practical field)? (ii) tractability: is this proposal tractable (as a one-semester course project)? [grading rubrics coming up soon.]

Course project reivews: Each student is asked to review at least three proposals of others. The student is asked to review proposals based on intellectural merit, broader impact, and tractability. Peel reviews are safety belts for other students. Unrealistic proposals should be flagged out. Gaming does not work: the grading of the original propsal will NOT be affected by how other students review your proposal. [grading rubrics coming up soon.]

Course project mid-term progress report: Each group is expected to submit a progress report by the deadline. This is to ensure that all projects are progressing on the right track. [grading rubrics coming up soon.]

Course project final report / presentation: The final report and presentation will be graded in a similar way as conference papers (presentations) by the two TAs and the instructor jointly (although the bar is much lower). [grading rubrics coming up soon.]

Syllabus (Tentative)

Time	Topic	Notes
8/20 Tues.	Introduction, machine learning overview, core machine learning concepts.
8/22 Thurs.	Prerequisite check.
8/27 Tues.	k-nearest neighbors.
8/29 Thurs.	Linear regression; regression with nonlinear basis; regularized regression.
9/3 Tues.	Holiday. No class.
9/5 Thurs.	(Continue the last lecture)
9/10 Tues.	Linear discriminant analysis; Perceptron; logistic regression.
9/12 Thurs.	(Continue the last lecture)
9/17 Tues.	Lagrangian duality; support vector machines.
9/19 Thurs.	(Continue the last lecture)
9/24 Tues.	Convolutional neural nets; kernel methods
9/26 Thurs.	(Continue the last lecture)
10/1 Tues.	Multiclass classification; neural networks.
10/3 Thurs.	(Continue the last lecture)
10/8 Tues.	October break. No class.
10/10 Thurs.	Mid-term review. office hours.
10/15 Tues.	Decision trees; boosting
10/17 Thurs.	(Continue the last lecture)
10/22 Tues.	Clustering; Gaussian mixture models
10/24 Thurs.	(Continue the last lecture)
10/29 Tues.	Probabilistic graphical models; naive Bayes; Markov random fields.
10/31 Thurs.	(Continue the last lecture)
11/5 Tues.	Probabilistic inference. Variable elimination. Sampling. Variational inference.
11/7 Thurs.	(Continue the last lecture)
11/12 Tues.	Counting by XOR streamlining; stochastic optimization with provable guarantees.
11/14 Thurs.	(Continue the last lecture)
11/19 Tues.	Project presentation (1).
11/21 Thurs.	Project presentation (2).
11/26 Tues.	Project presentation (3).
11/28 Thurs.	Thanksgiving.
12/3 Tues.	Project presentation (4).
12/5 Thurs.	Final exam review. Office hours.

Course Project Thrusts

Thrust 1: stochastic optimization: encoding machine learning for decision-making

In data-driven decision-making, we have to reason about the optimal policy of a system given a stochastic model learned from data. For example, one can use a machine learning model to capture the traffic dynamics of a road network. The decision-making problem is: given the traffic dynamics learned from data, what is the most efficient way to travel between a pair of locations? Notice that the solution can change dynamically, depending on the shift in traffic dynamics. As another example in Physics, machine learning models have been used to predict the band-gap of many metal alloy materials. The decision-making problem is: given the machine learning model, what is the best alloy, which is both cheap to synthesize and has a good band-gap property?

The afromentioned examples are stochastic optimization problems, which make robust interventions that maximize the ``expectation'' of stochastic functions learned from data. It arises naturally in many applications ranging from economics, operational research, and artificial intelligence. Stochastic optimization combines two intractable problems, one of which is the inner probablistic inference problem to compute the expectation across exponentially many probabilistic outcomes, and the other of which is the outer optimization problem to search for the optimal policy.

Research questions: (i) if the inner machine learning model is a decision tree, can you compute the optimal policy in polynomial time? How? (ii) What if the inner machine learning model is a logistic regression, a linear SVM, a kernerized SVM, a random forest, or a probabilistic graphical model? (iii) What if the machine learning model is temporal, such as a recurrent neural netowrk or a LSTM? (iv) In case the inner probabilistic inference problem is intractable, existing approaches to solve stochastic optimization problems approximate the intractable probabilistic inference sub-problems either in variational forms, or via the empirical mean of pre-computed, fixed samples. There is also a recent approach which approximates the intractable sub-problems with optimization queries, subject to randomized constraints (see following papers). Question: how does various approximation schemes of the inner machine learning models affect the overall solution quality of the stochastic optimization problem? (v) Suppose we are solving one stochastic optimization problem for a specific application, can we adapt existing approximation schemes in any way to fit the problem instance for better results?

Papers:

Yexiang Xue, Zhiyuan Li, Stefano Ermon, Carla P. Gomes, Bart Selman.
Solving Marginal MAP Problems with NP Oracles and Parity Constraints
In the Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS), 2016. [pdf] [spotlight video]

Anton J. Kleywegt, Alexander Shapiro, and Tito Homem-de Mello.
The sample average approximation method for stochastic discrete optimization.
SIAM Journal on Optimization, 2002. [pdf]

Miguel Á. Carreira-Perpiñán and Geoffrey E. Hinton.
On contrastive divergence learning.
AISTATS, 2005. [pdf]

Martin Dyer and Leen Stougie.
Computational complexity of stochastic programming problems. Mathematical Programming, 2006. [springer]

John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira.
Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML, 2001. [pdf]

Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman.
Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization
In Proc. 30th International Conference on Machine Learning (ICML) 2013. [pdf]

Carla P. Gomes, Ashish Sabharwal, Bart Selman.
Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints.
NIPS 2006. [pdf]

Carla P. Gomes, Willem Jan van Hoeve, Ashish Sabharwal, Bart Selman.
Counting CSP Solutions Using Generalized XOR Constraints.
AAAI 2007. [pdf]

Yexiang Xue*, Xiaojian Wu*, Bart Selman, and Carla P. Gomes.
XOR-Sampling for Network Design with Correlated Stochastic Events.
In Proc. 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [pdf]
* indicates equal contribution.

Yexiang Xue, Xiaojian Wu, Dana Morin, Bistra Dilkina, Angela Fuller, J. Andrew Royle, and Carla Gomes.
Dynamic Optimization of Landscape Connectivity Embedding Spatial-Capture-Recapture Information.
In Proc. 31th AAAI Conference on Artificial Intelligence (AAAI), 2017. [pdf] [supplementary materials]

Thrust 2: embedding physical constraints into deep neural networks

The emergence of large-scale data-driven machine learning and optimization methodology has led to successful applications in areas as diverse as finance, marketing, retail, and health care. Yet, many application domains remain out of reach for these technologies, when applied in isolation. In the area of medical robotics, for example, it is crucial to develop systems that can recognize, guide, support, or correct surgical procedures. This is particularly important for next-generation trauma care systems that allow life-saving surgery to be performed remotely in presence of unreliable bandwidth communications. For such systems, machine learning models have been developed that can recognize certain commands and procedures, but they are unable to learn complex physical or operational constraints. Constraint-based optimization methods, on the other hand, would be able to generate feasible surgical plans, but currently, have no mechanism to represent and evaluate such complex environments. To leverage the required capabilities of both technologies, we have to find an integrated method that embeds constraint reasoning in machine learning.

In a seminal paper, the authors proposed an approach, which provides a scalable method for machine learning over structured domains. The core idea is to augment machine learning algorithms with a constraint reasoning module that represents physical or operational requirements. Specifically, the authors propose to embed decision diagrams, a popular constraint reasoning tool, as a fully-differentiable layer in deep neural networks. By enforcing the constraints, the output of generative models can now provide assurances of safety, correctness, and/or fairness. Moreover, this approach enjoys a smaller modeling space than traditional machine learning approaches, allowing machine learning algorithms to learn faster and generalize better.

Research questions: (i) are there any other ways to enforce physical constraints other than using a decision diagram in the seminal work? (ii) What if the constraints are too complicated which cannot be fully captured by a decision diagram? (iii) In a specific applicational domain, is there a better way to encode constraints? (iv) Does enforcing physical constraints make machine learning easier or more difficult? Can you quantify the difference? (v) Can we apply this idea in natural language processing, computer vision, reinforcement learning, etc? (vi) Ethics and fairness in machine learning are being discussed in our community. Can we use this technique to guarantee the ethics and/or the fairness of a machine learning model?

Papers:

Yexiang Xue, Willem-Jan van Hoeve.
Embedding Decision Diagrams into Generative Adversarial Networks.
In Proc. of the Sixteenth International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), 2019. [springer]

Md Masudur Rahman, Natalia Sanchez-Tamayo, Glebys Gonzalez, Mridul Agarwal, Vaneet Aggarwal, Richard M. Voyles, Yexiang Xue, and Juan Wachs.
Transferring Dexterous Surgical Skill Knowledge between Robots for Semi-autonomous Teleoperation.
In ROMAN, 2019. [pdf]

Naveen Madapana, Md Masudur Rahman, Natalia Sanchez-Tamayo, Mythra V. Balakuntala, Glebys Gonzalez, Jyothsna Padmakumar Bindu, L. N. Vishnunandan Venkatesh, Xingguang Zhang, Juan Barragan Noguera, Thomas Low, Richard M. Voyles, Yexiang Xue, and Juan Wachs
DESK: A Robotic Activity Dataset for Dexterous Surgical Skills Transfer to Medical Robots.
In IROS, 2019. [pdf]

Matt J. Kusner, Brooks Paige, José Miguel Hernández-Lobato.
Grammar Variational Autoencoder.
In Proceedings of the 34th International Conference on Machine Learning, ICML, 2017. [pdf]

Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, Rishabh SinghRobust
Text-to-SQL Generation with Execution-Guided Decoding
[pdf]

Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner
Grammar-based Neural Text-to-SQL Generation
[ArXiv]

Thrust 3: machine learning for scientific discovery and/or social good

Machine learning models have defeated the brightest mind in this world (see the story of AlphaGo). Now, instead of using this technology for game playing, can we harness the tremendous progress in AI and machine learning to make our world a better place? In particular, I am curious at problems that have attracted the smartest minds of man kind historically -- the discovery of new science. Besides scientific discovery, can we use machine learning to create positive social impact?

If you think about it: in AlphaGo, machine learning is used to find a strategy in a highly complex space (all possible moves of Go), which beats all opponent's strategies. The problem is similar for scientific discovery, except that we are now playing Go with nature. For example, in materials discovery, we would like to find the best material in a highly complex space (all possible compositions) which enjoys the best properties. Should the strategy which was proven successful for Go work for scientific discovery (and/or AI for social good)?

I am listing a few example papers below in which machine learning are used successfully for scientific discovery and for social good. I hope this can motivate you to discover a good applicational area of machine learning. The key to the success is to combine your domain of expertise with machine learning.

Papers:

Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert B. van Dover, John Gregoire, and Carla Gomes.
Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery.
In Proc. 29th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), 2017. [pdf][video 1][video 2][video 3]

Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan LeBras, Brendan H Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes, and John M. Gregoire.
Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System.
In American Chemical Society Combinatorial Science, Dec, 2016. [DOI][pdf][video 1][video 2][video 3]

Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes.
Phase Mapper: Accelerating Materials Discovery with AI.
In AI Magazine, Vol. 39, No 1. 2018. [paper]

Yexiang Xue, Ian Davies, Daniel Fink, Christopher Wood, Carla P. Gomes.
Avicaching: A Two Stage Game for Bias Reduction in Citizen Science
In the Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2016. [pdf][supplementary materials][video]

Giuseppe Carleo and Matthias Troyer
Solving the quantum many-body problem with artificial neural networks.
In Science, 355, 2017. [website]

Ganesh Hegde and R. Chris Bowen
Machine-learned approximation to Density Functional Theory Hamiltons.
In Scientific Reports, 7, 2016. [ArXiv]

Graham Roberts, Simon Y. Haile, Rajat Sainju, Danny J. Edwards, Brian Hutchinson and Yuanyuan Zhu
Deep Learning for Semantic Segmentation of Defects in Advanced STEM Images of Steels.
Scientific Reports, volume 9, 2019. [website]

Academic Policies

Late policy

All homework, note-taking will be due on the designated deadlines. There are no extensions. Late homework, note-taking assignments will not be accepted.

Academic honesty

Please read the departmental academic integrity policy. This will be followed unless we provide written documentation of exceptions.

Unless stated otherwise, each student should write up their own solutions independently. You need to indicate the names of the people you discussed a problem with; ideally you should discuss with no more than two other people.
NO PART OF THE STUDENT'S ASSIGMENT (PROJECT, NOTES, ETC) SHOULD BE COPIED FROM ANOTHER STUDENT OR FROM OTHER RESEARCHERS OR FROM THE WEB (Plagiarism). We encourage you to interact amongst yourselves: you may discuss and obtain help with basic concepts covered in lectures or the textbook, homework specification (but not solution), and general ideas of program implementation (but not the code). However, unless otherwise noted, work turned in should reflect your own efforts and knowledge. Sharing or copying solutions is unacceptable and could result in failure of this course. We use copy detection software, so do not copy code and make changes (either from the Web or from other students). You are expected to take reasonable precautions to prevent others from using your work.
Any student not following these guidelines are subject to an automatic F (final grade).

Other general course policies can be found here.

Resource

Datasets

eBird citizen scince dataset.

Synthetic and real datasets for materials discovery.

Dataset for the corridor-design problem and landscape optimization problem.

Remote sensing images (a code repository which contains code to download from Google Earth engine).

JIGSAWS dataset for robot visual perception, gesture and skill assessment.

UCI Machine Learning Dataset.

Kaggle.