# CS 59200-MML: Information Theoretic Methods for Machine Learning and Statistics

## Course Information

• Semester: Fall 2021
• Time: Mondays and Wednesdays, 4:30-5:45pm
• Location: REC 103
• Instructor: Anuran Makur, Email: amakur[at]purdue[dot]edu
• Office Hours: By appointment (please email instructor)

NOTE: All further correspondence will take place through Brightspace.

## Course Description

This is an advanced, 3-credit, graduate-level seminar course that delves into information theoretic (and related mathematical) techniques that have become prevalent in the analysis of machine learning and statistics problems. For instance, information measures such as mutual information emerge naturally in several machine learning contexts, e.g., in learning the structure of graphical models. In high-dimensional statistics, the most general approaches to developing sample complexity lower bounds for minimax estimation problems are information theoretic. Moreover, two of the most popular methods for establishing concentration of measure inequalities, which are the workhorses of modern non-asymptotic analysis in machine learning and statistics, are also information theoretic in nature. We will cover the mathematical foundations of these techniques (and more) in this course.
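To give a concrete flavor of the information measures mentioned above, the following self-contained sketch (an illustration only, not course material; the toy joint distribution is made up) computes the KL divergence between discrete distributions and the mutual information of a joint pmf, using the identity $$I(X;Y) = D(P_{XY} \| P_X P_Y)$$:

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for discrete pmfs.

    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0
    contribute 0 by the usual convention.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def mutual_information(joint):
    """I(X;Y) = D(P_XY || P_X P_Y) for a joint pmf given as a 2D array."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    return kl_divergence(joint.ravel(), (px * py).ravel())

# Toy example: a noisy binary channel with uniform input.
joint = np.array([[0.45, 0.05],
                  [0.05, 0.45]])
print(mutual_information(joint))  # positive, since X and Y are dependent
```

As a sanity check, the mutual information of a product distribution is zero, and it is strictly positive exactly when the two variables are dependent.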

Since this is a seminar course, with the exception of a few introductory lectures from the instructor, the course will consist of student-led lectures on a list of pertinent topics chosen by the instructor. Students will base their presentations on a combination of carefully selected research papers, monographs, and graduate level textbooks. Note that this course will count towards both MS and PhD plans of study in CS.

## Prerequisites

To appreciate the fast-paced arguments in this course, students require a certain level of mathematical maturity. Specifically, they should be proficient in probability theory, should have had exposure to basic analysis proofs, and should have completed standard courses in calculus, linear algebra, and discrete mathematics. Some knowledge of information theory and high-dimensional statistics is helpful, but not required; we will try to introduce the relevant concepts from scratch.

## Syllabus

This course will cover the following five major topics:

• Information measures: $$f$$-divergences, information inequalities, strong data processing inequalities, and applications
• Concentration of measure: large deviations theory, Poincaré and logarithmic Sobolev inequalities, entropy and transportation methods, hypercontractivity
• Minimax estimation theory: Cramér-Rao bound, van Trees inequality, Le Cam's two point method, Fano's method
• Metric entropy methods: covering and packing, Gaussian comparison lemmata, Dudley's entropy integral, generic chaining
• Generalization bounds: Vapnik-Chervonenkis dimension, Rademacher complexity, information theoretic approaches
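As a small taste of the concentration of measure topic above, the following sketch (an illustration only, not course code; the sample size and threshold are arbitrary choices) empirically compares the deviation probability of an empirical mean of bounded random variables against Hoeffding's bound $$\mathbb{P}(|\bar{X}_n - \mu| \geq t) \leq 2\exp(-2nt^2)$$ for variables in $$[0,1]$$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 200, 0.1, 20000

# X_i i.i.d. uniform on [0, 1], so mu = 1/2 and each X_i is bounded in [0, 1].
samples = rng.random((trials, n))
deviations = np.abs(samples.mean(axis=1) - 0.5)

empirical = np.mean(deviations >= t)        # Monte Carlo deviation frequency
hoeffding = 2 * np.exp(-2 * n * t**2)       # Hoeffding upper bound

print(empirical, hoeffding)  # the empirical frequency stays below the bound
```

Hoeffding's inequality is distribution-free over all $$[0,1]$$-valued variables, so for any particular distribution (like the uniform here) the empirical frequency is typically far below the bound.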

Students will be evaluated on their attendance and participation during lectures/presentations, approximately two class presentations per student (depending on enrollment), and the lecture notes they prepare on their designated presentation topics. The grading breakdown is as follows:

• Participation: 20%
• Presentation: 40%
• Lecture Notes: 40%

Participation: Students are required to attend all lectures/presentations (subject to Protect Purdue guidelines). Furthermore, students are expected to do the required reading before lectures, and engage with presenters during classes.

Presentation: Each student will prepare approximately two presentations (depending on enrollment) on the topics given in the schedule below, and present them to their peers during classes. Each presentation will be around 65 minutes long, with the remaining time left for questions. Students must select two presentation topics from two different parts of the schedule for instructor approval by Friday, August 27, 2021, using the Google Sheets link found in the welcome email; please fill in your name and email next to your chosen topics. The instructor will post relevant references at least 1 week before each lecture. Moreover, for involved topics, the instructor will provide specific advice to presenters on what to include in their presentations. The final presentation schedule will be posted in the "Presentation Schedule" section on Brightspace.

Lecture Notes: Each student will prepare lecture notes corresponding to their presentations. You must write these lecture notes yourself, follow the notational conventions and storyline of the class, and carefully cite all references, i.e., any references proposed by the instructor as well as any additional references you use. The lecture notes for each presentation should be around 5 pages long (excluding references) and typeset in $$\LaTeX$$. Please use the "template" provided in the "Lecture Notes" section on Brightspace, and be sure to submit the PDFs and all source files in a zipped folder to the "Lecture Notes 1" and "Lecture Notes 2" assignments. (See the instructor's lecture notes for examples.) The lecture notes for each presentation will be due at 4:30pm exactly 1 week after the presentation. For example, if you give two presentations on August 25 and September 8, respectively, then your first set of lecture notes will be due at 4:30pm on September 1, and your second set will be due at 4:30pm on September 15. The only exceptions are presentations on December 6 and 8; lecture notes corresponding to these presentations will be due at 11:59pm on December 10. All lecture notes will be posted in the "Lecture Notes" section on Brightspace. Please do not distribute these notes; they are for your personal educational use only.

## Schedule

The schedule below is subject to small changes over the course of the semester.

### Part 1: Information Measures

| Date | Lecture |
| --- | --- |
| Mon, 08/23 | Introduction and course overview |
| Wed, 08/25 | 1. $$f$$-divergences I |
| Mon, 08/30 | 1. $$f$$-divergences II |
| Wed, 09/01 | 2. Basic information inequalities |
| Mon, 09/06 | NO CLASS (Labor Day) |
| Wed, 09/08 | 3. Strong data processing inequalities I |
| Mon, 09/13 | 3. Strong data processing inequalities II |

### Part 2: Concentration of Measure

| Date | Lecture |
| --- | --- |
| Wed, 09/15 | 1. Basic tail bounds |
| Mon, 09/20 | 2. Concentration bounds |
| Wed, 09/22 | 3. Large deviations theory |
| Mon, 09/27 | 4. Functional inequalities |
| Wed, 09/29 | 5. Entropy method |
| Mon, 10/04 | 6. Transportation method |

### Part 3: Minimax Estimation Theory

| Date | Lecture |
| --- | --- |
| Wed, 10/06 | 1. Binary hypothesis testing |
| Mon, 10/11 | NO CLASS (October Break) |
| Wed, 10/13 | 1. Hypothesis testing asymptotics |
| Mon, 10/18 | 2. Cramér-Rao bounds I |
| Wed, 10/20 | 2. Cramér-Rao bounds II |
| Mon, 10/25 | 3. Le Cam's two point method |
| Wed, 10/27 | 4. Fano's method |

### Part 4: Metric Entropy Methods

| Date | Lecture |
| --- | --- |
| Wed, 11/03 | 1. Covering and packing |
| Mon, 11/08 | 2. Gaussian comparison lemmata |
| Wed, 11/10 | 3. Sudakov minoration |
| Mon, 11/15 | 4. Chaining |
| Wed, 11/17 | 5. KL divergence covering |

### Part 5: Generalization Bounds

| Date | Lecture |
| --- | --- |
| Mon, 11/22 | 1. Vapnik-Chervonenkis theory |
| Wed, 11/24 | NO CLASS (Thanksgiving Vacation) |