CS 69000-SML: Statistical Machine Learning II

Semester:	Spring 2017, also offered on Fall 2021 and Spring 2019
Time and place:	Tuesday and Thursday, 9am-10.15am, Felix Haas Hall G066
Instructor:	Jean Honorio
Office hours:	Friday, 1.45pm-2.45pm, Lawson Building 2142-J For appointments outside office hours, please send an e-mail.

Optimization and learning theory are at the heart of every machine learning algorithm. This course is an introduction to aspects of optimization and statistics, relevant to machine learning problems. From the optimization side, we will answer questions such as "where does the dual form of support vector machines come from and why does it work?". From the learning theory side, we will analyze how much training data is needed in order to have good generalization. For instance, we will see how optimization concepts such as the Karush-Kuhn-Tucker conditions allow analyzing the number of measurements needed to efficiently solve a sparse linear system (or a linear regression problem). That is, for n unknowns, only O(log n) measurements are required instead of n, as in the case of nonsparse systems.

Learning Objectives

During the course, students will:

learn how to pose machine learning optimization problems.
learn how to transform those problems into hopefully simpler-to-solve problems.
learn how to solve problems by using different algorithms.
learn how to find upper and lower bounds on the amount of data that a machine learning algorithm requires.

Prerequisites

Basic knowledge from calculus and linear algebra is required. Some knowledge from machine learning, data mining or artificial intelligence is also required (CS57300, CS57800 or CS5900-AI0). Necessary concepts from statistics and optimization will be provided during the course.

Textbooks

For the optimization part, we will use Convex Optimization by Boyd and Vandenberghe
For the learning theory part, we will use lecture notes.

Assignments

There will be up to five homeworks, one midterm exam, one final exam and one project (dates posted on the schedule). The homeworks are to be done individually. The project can be done either individually or in groups of two students.

For the project, you will write a half-page project plan (around 1-2 weeks before the midterm), a 2-4 page preliminary results report (around 1-2 weeks after the midterm) and a 4-8 page final results report (around 1-2 weeks before the final exam). The project should include:

a definition of a machine learning problem, possibly relevant to your interests.
a possible transformation of the original problem into a hopefully simpler-to-solve problem (e.g., by duality, minimization with respect to a subset of variables, etc.)
a full implementation of an algorithm (e.g., gradient descent, Newton, interior-point, etc.)
experimental results (e.g., showing the objective function on the Y axis and the number of iterations on the X axis; using different step sizes; comparing different algorithms; showing the risk on the training set versus the risk on the test set, etc.)

Note that you will have to implement ALL the algorithm from scratch. Using an optimization package is not allowed. I suggest you use Matlab, since you might not have enough time to debug your code if you use other languages such as C/C++. I will not provide any help regarding programming-related issues.

Grading

Homeworks: 25%
Midterm exam: 25%
Final exam: 25%
Project: 25%

Late policy

Assignments are to be submitted by the due date listed. Each person will be allowed seven days of extensions which can be applied to any combination of assignments during the semester. Use of a partial day will be counted as a full day. Extensions cannot be used after the final day of classes. Please, use the extension days wisely!

Assignments will NOT BE accepted if they are more than five days late.

Academic Honesty

Please read the departmental academic integrity policy here. This will be followed unless we provide written documentation of exceptions. We encourage you to interact amongst yourselves: you may discuss and obtain help with basic concepts covered in lectures and homework specification (but not solution). However, unless otherwise noted, work turned in should reflect your own efforts and knowledge. Sharing or copying solutions is unacceptable and could result in failure. You are expected to take reasonable precautions to prevent others from using your work.

Additional course policies

Please read the general course policies here.

Schedule

We will cover chapters 1-5,9-11 in Convex Optimization by Boyd and Vandenberghe (B&V).

Date	Topic (Tentative)	Notes
Tue, Jan 10	B&V Chapter 1: introduction	Identities on matrix calculus Notes about the Cauchy-Schwarz inequality [1] [2]
Thu, Jan 12	B&V Chapter 2: convex sets
Tue, Jan 17	B&V Chapter 3: convex functions Examples: equation 1 in [1], section 3 in [2] (not mandatory to be read)	Homework 1: due on Jan 24
Thu, Jan 19	(lecture continues)
Tue, Jan 24	B&V Chapter 4: convex optimization problems	Homework 1 due
Thu, Jan 26	(lecture continues)	Homework 1 solution Homework 2: due on Feb 7
Tue, Jan 31	—
Thu, Feb 2	—
Tue, Feb 7	(lecture continues)	Homework 2 due
Thu, Feb 9	B&V Chapter 5: duality
Tue, Feb 14	(lecture continues)	Homework 2 solution
Thu, Feb 16	(lecture continues)
Tue, Feb 21	B&V Chapter 9: unconstrained optimization	Homework 3: due on Feb 28
Thu, Feb 23	B&V Chapter 10: equality constrained optimization
Tue, Feb 28	B&V Chapter 11: interior-point methods	Homework 3 due
Thu, Mar 2	—	Homework 3 solution Project plan due (see Assignments for details)
Tue, Mar 7	MIDTERM	9am-10.15am at Felix Haas Hall G066
Thu, Mar 9	(midterm solution)	Homework 4: due on Mar 21
Tue, Mar 14	SPRING VACATION
Thu, Mar 16	SPRING VACATION
Tue, Mar 21	Learning Theory 1: Concentration inequalities: Markov's and Chebyshev's	Homework 4 due
Thu, Mar 23	Learning Theory 2: Hoeffding's inequality, empirical risk minimization with a finite hypothesis class
Tue, Mar 28	Learning Theory 3: Fano's inequality, empirical risk minimization with a finite hypothesis class
Thu, Mar 30	(lecture continues)
Tue, Apr 4	Learning Theory 4: Subgradient methods, deterministic and stochastic optimization, convergence rates	Project preliminary report due (see Assignments for details)
Thu, Apr 6	Learning Theory 5: Rademacher complexity
Tue, Apr 11	(lecture continues)	Homework 5: due on Apr 18
Thu, Apr 13	Learning Theory 6: Primal-dual witness method, support recovery
Tue, Apr 18	(lecture continues)	Homework 5 due
Thu, Apr 20	(lecture continues)
Tue, Apr 25	Learning Theory 7: Growth function, Vapnik-Chervonenkis (VC) dimension, Sauer-Shelah lemma, Massart lemma	Homework 5 solution
Thu, Apr 27	(lecture continues)	Project final report due (see Assignments for details)
Mon, May 1	FINAL EXAM	1.00pm-3.00pm at Felix Haas Hall G066