CS 578: Statistical Machine Learning

Course Information

When: Mon/Wed 4:30 pm -- 5:45 pm.

Where: HAAS G066.

Instructor: Yexiang Xue, yexiang@purdue.edu.

Teaching Assistants: Shamik Roy (roy98@purdue.edu), Pradeep Kumar Srinivasan (sriniv68@purdue.edu)

Office Hours: Yexiang Xue, Mon 4:00 -- 4:30 pm (by appointment, with notice by 5 am on that Monday at the latest).
                      Shamik Roy, Tuesdays 1:15 -- 2:15 pm. Location: HAAS G50.
                      Pradeep Kumar Srinivasan, Thursdays 1:30 -- 2:30 pm. Location: HAAS G50.

Online discussion is available on Blackboard (mycourses.purdue.edu) and Piazza (https://piazza.com/purdue/spring2019/cs578/home). Course projects are submitted through CMT.

 

Course Description

Machine learning offers a new paradigm of computing – computer systems that learn to perform tasks by finding patterns in data, rather than by running code that a human programmer wrote specifically for the task. In the most common machine-learning scenario, a human teacher annotates data (identifies the relevant phenomena that occur in the data), and a machine-learning algorithm then generalizes from these examples. Generalization is at the heart of machine learning – how can the machine go beyond the provided set of examples and make predictions about new data? In this class we will examine different machine learning scenarios (supervised and unsupervised), study several algorithms, analyze their performance, and learn the theory behind them.

Prerequisites

(1) Undergraduate-level training or coursework in linear algebra, calculus and multivariate calculus, and basic probability and statistics; (2) Programming skills: mastery of at least one programming language. Python is highly recommended (you are expected to self-study scikit-learn and related packages); (3) Basic skills in using git to manage code development. In addition, an undergraduate-level course in Artificial Intelligence may be helpful but is not required.

Prerequisite Check: There will be a prerequisite check in the second class of the semester to help both the instructors and you determine whether your background is sufficient for this class. The prerequisite check is open-book. You are allowed to bring one letter-sized page as a cheat sheet. However, discussion with other students is not allowed. The score of the prerequisite check DOES NOT contribute to your final score. However, we will share your score and the class's score distribution with you to help you reach a decision.

Textbooks and Reading Materials

There is no official textbook for this class. I will post students' notes as the course progresses (see the note-taking section). Recommended books for further reading include:

A few useful resources:

ML materials:

Math references:

Learning Python

For those who are unfamiliar with Python, I strongly encourage you to spend one night learning it by following the official tutorial (see below). I did not know Python until graduate school. It took me one night to learn it, and you can do the same!

Course Activities and Evaluation

Activity                                   Percentage of final score   Due (exam) date
Attendance                                 5%
Note taking                                10%                         One week after the lecture
Mid-term exam (closed book)                20%                         2/28/2019, 8:00 -- 9:30 pm, BRNG 2280
Final exam (closed book)                   25%                         TBD
Course project proposal                    5%                          1/27/2019, 11:59 pm E.T.
Course project reviews                     5%                          2/3/2019, 11:59 pm E.T.
Course project mid-term progress report    5%                          3/17/2019, 11:59 pm E.T.
Course project final report                15%                         4/10/2019, 11:59 pm E.T.
Course project final presentation          10%                         4/10/2019, 11:59 pm E.T.
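As a rough illustration of how the percentages above combine, the following sketch computes a weighted final score. It is not the official grading procedure: the dictionary keys, the function name `final_score`, and the convention that each component is scored as a fraction between 0 and 1 are assumptions made for this example, and it does not model policy-based deductions or any curve.

```python
# Weights (in percent) copied from the evaluation table above.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "attendance": 5,
    "note_taking": 10,
    "midterm_exam": 20,
    "final_exam": 25,
    "project_proposal": 5,
    "project_reviews": 5,
    "project_midterm_report": 5,
    "project_final_report": 15,
    "project_final_presentation": 10,
}


def final_score(component_scores):
    """Combine per-component scores (fractions in [0, 1]) into a 0-100 score."""
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name, value in component_scores.items():
        if name in WEIGHTS and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Weighted sum: each component contributes weight * fraction earned.
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


# Sanity check: the listed percentages account for the whole grade.
assert sum(WEIGHTS.values()) == 100
```

For example, a student with full marks everywhere except half credit on the mid-term exam would lose 10 of the 100 points under this scheme.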

Attendance: since many lectures will be blackboard only, attendance is highly encouraged. Students who miss a lecture should be prepared to catch up without slides, because none will be posted online as review materials.

No Cellphone / Laptop Policy during Class: it is distracting to the instructor, your fellow classmates and you (!) to use cellphones or laptops during the class. To respect everybody, we will enforce an official policy that no cellphones or laptops (except for the instructor) are allowed when the class is in progress. Violations will be punished with deductions in the final grade.

Note taking: this course will involve heavy blackboard demonstrations (actual blackboard, not the website). Therefore, note-taking is absolutely necessary. Every student is expected to submit a PDF version of their notes for three lectures (assigned by the TAs). The TAs will select the best two sets of notes for each lecture and distribute them as handouts for everybody (posted on Blackboard). Students with at least one set of notes selected by the TAs will receive full credit in the note-taking category. Others will receive partial credit (grading rubrics coming up soon). The notes are due one week after the lecture.

Selective Homeworks: there will be two homeworks, one released on 2/11 and the other on 3/25. Both homeworks are selective and will NOT be graded. However, the homeworks contain questions that are very similar to those in the mid-term and final exams, so students are strongly encouraged to complete both homeworks before the review sessions. The mid-term and final review sessions will be conducted by the TAs and will focus on difficult problems in the homeworks.

Course project: the MOST important part of this course. Machine learning is a practical field, so the importance of completing a machine learning project yourself cannot be overemphasized! In addition, because this is a graduate-level course, one important aspect is basic scientific training, including asking the right questions, commenting on others' work, literature review, experimental design, coding, scientific writing, and presentation skills. These skills can only be developed through a real project. Teamwork: students are encouraged to work in teams of two or three. Projects by one individual are discouraged. To guide you through the project, we split the entire process into five parts: proposal, peer review, mid-term report, final report, and presentation. We provide datasets (see below) for your reference. However, you are free to choose any project you like as long as it relates to machine learning. We encourage you to combine your domain of expertise with machine learning.

Course project proposal: the proposal will be evaluated on intellectual merit and broader impact (the same criteria used for NSF proposals). The instructor DOES recognize that it is a course project, so the bar is much lower. However, the following two aspects are emphasized equally: (i) intellectual merit: how does the project advance machine learning (or your understanding of machine learning, or bring machine learning into a practical field)? (ii) tractability: is this proposal tractable (as a one-semester course project)? [grading rubrics coming up soon.]

Course project reviews: Each student is asked to review at least three proposals by other students. Proposals should be reviewed on intellectual merit, broader impact, and tractability. Peer reviews are safety belts for other students. Unrealistic proposals should be flagged. Gaming does not work: the grade of your own proposal will NOT be affected by how other students review it. [grading rubrics coming up soon.]

Course project mid-term progress report: Each group is expected to submit a progress report by the deadline. This is to ensure that all projects are progressing on the right track. [grading rubrics coming up soon.]

Course project final report / presentation: The final report and presentation will be graded jointly by the two TAs and the instructor, much as conference papers and presentations are evaluated (although the bar is much lower). [grading rubrics coming up soon.]

Syllabus (Tentative)

Time Topic Notes
1/7 Mon. Introduction, machine learning overview, core machine learning concepts.
1/9 Wed. Prerequisite check.
1/14 Mon. k-nearest neighbors.
1/16 Wed. Linear regression; regression with nonlinear basis; regularized regression.
1/21 Mon. Holiday. No class.
1/23 Wed. (Continue the last lecture)
1/28 Mon. Linear discriminant analysis; Perceptron; logistic regression.
1/30 Wed. (Continue the last lecture)
2/4 Mon. Multiclass classification; neural networks.
2/6 Wed. (Continue the last lecture)
2/11 Mon. Convolutional neural nets; kernel methods.
2/13 Wed. (Continue the last lecture)
2/18 Mon. Lagrangian duality; support vector machines.
2/20 Wed. (Continue the last lecture)
2/25 Mon. Mid-term review. Office hours.
2/27 Wed. Decision trees; boosting.
3/4 Mon. (Continue the last lecture)
3/6 Wed. TA session: Machine learning software tutorial.
3/11 Mon. Holiday. No class.
3/13 Wed. Holiday. No class.
3/18 Mon. Clustering; Gaussian mixture models.
3/20 Wed. (Continue the last lecture)
3/25 Mon. Probabilistic graphical models; naive Bayes; Markov random fields.
3/27 Wed. (Continue the last lecture)
4/1 Mon. Probabilistic inference. Variable elimination. Sampling. Variational inference.
4/3 Wed. (Continue the last lecture)
4/8 Mon. Reinforcement learning.
4/10 Wed. Final exam review. Office hours.
4/15 Mon. Project presentation (1).
4/17 Wed. Project presentation (2).
4/22 Mon. Project presentation (3).
4/24 Wed. Project presentation (4).

 

Academic Policies

Late policy

All homework and note-taking assignments are due on the designated deadlines. There are no extensions. Late homework and note-taking assignments will not be accepted.

Academic honesty

Please read the departmental academic integrity policy. This will be followed unless we provide written documentation of exceptions.

Other general course policies can be found here.

Resources

Datasets

eBird citizen science dataset.

Synthetic and real datasets for materials discovery.

Dataset for the corridor-design problem and landscape optimization problem.

Remote sensing images (a code repository which contains code to download from Google Earth engine).

JIGSAWS dataset for robot visual perception, gesture and skill assessment.

UCI Machine Learning Repository.

Kaggle.