Tuesday and Thursday, 1.30pm-2.45pm, Mathematical Sciences Building 175
Jean Honorio, Lawson Building 2142-J
(Please send an e-mail for appointments)
Machine learning offers a new paradigm of computing: computer systems that learn to perform tasks by finding patterns in data, rather than by running code that a human programmer wrote specifically for the task. The most common machine-learning scenario requires a human teacher to annotate data (identify the relevant phenomena that occur in the data) and then uses a machine-learning algorithm to generalize from these examples. Generalization is at the heart of machine learning: how can the machine go beyond the provided set of examples and make predictions about new data? In this class we will look into different machine learning scenarios, study several algorithms, analyze their performance, and learn the theory behind them.
A tentative list of topics in supervised learning includes: linear and non-linear classifiers, kernels, rating, ranking, collaborative filtering, model selection, complexity, generalization, structured prediction. A tentative list of topics in unsupervised learning and modeling includes: mixture models, Bayesian networks, Markov random fields, factor graphs.
During the course, students will:
learn about different supervised and unsupervised problems, and their related algorithms.
implement some of those algorithms.
learn the theory behind some algorithms, e.g., geometrical aspects and generalization.
There will be up to eight homeworks, one midterm exam, one final exam and one project (dates posted on the schedule). The homeworks are to be done individually and in Python. The project is to be done in groups of 4 students.
For the project, you will write a half-page project plan (around 1-2 weeks before the midterm), a 2-4 page preliminary results report (around 1-2 weeks after the midterm) and a 4-8 page final results report (around 1-2 weeks before the final exam).
The project should include:
a definition of the problem, possibly relevant to your interests.
a description of the experimental setup, e.g., cross-validation, parameter tuning, etc.
experimental results, showing not only when the algorithm succeeds but also when the algorithm fails. This might include: plots of number of samples versus accuracy (you can use different subsets of the same dataset), regularization parameter versus accuracy, ROC curves, plots of different datasets, etc.
you are allowed to either implement learning algorithms from scratch or use third-party code (e.g., liblinear). But ANYTHING else, such as cross-validation, parameter tuning, computing the values for the ROC curve, etc., must be written by yourself.
you can use Python, C++, MATLAB or Java.
do not spend too much time on things such as "understanding the data", "memory problems because your data is too big", etc. If you are already familiar with computer vision, brain data, natural language processing, big data, parallelism, etc., you can make use of those things, but that alone will not earn you a higher grade. In general, I recommend using easy-to-understand datasets and smaller subsets of the data.
Neither I nor the TAs will provide any help regarding programming-related issues.
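As an illustration of the kind of evaluation code the project expects you to write yourself, here is a minimal sketch of k-fold cross-validation and ROC curve computation using only the Python standard library. It is not an official course template. The function names and the train_and_score callback are hypothetical, labels are assumed to be 0 or 1, and higher scores are assumed to mean "more likely positive".

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k disjoint folds after shuffling."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(train_and_score, n_samples, k=5, seed=0):
    """Average a validation score over k folds.

    train_and_score(train_idx, val_idx) is a hypothetical callback: it should
    train on the samples in train_idx and return a score (e.g. accuracy)
    measured on the samples in val_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the one held out for validation.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return sum(scores) / k


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs for an ROC curve.

    Assumes labels are 0/1 and that both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Lowering the threshold corresponds to walking down the ranked list.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points
```

For parameter tuning, one option is to call cross_validate once per candidate value (for instance, each regularization parameter) and keep the value with the best average score, using data that is kept separate from the final test set.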
Midterm exam: 20%
Final exam: 20%
Assignments are to be submitted by the due date listed. Assignments will not be accepted if they are even one minute late. Extensions will be granted only due to serious and documented medical or family emergencies but never after the homework solution is released.
Please read the departmental academic integrity policy here. This will be followed unless we provide written documentation of exceptions. We encourage you to interact amongst yourselves: you may discuss and obtain help with basic concepts covered in lectures and with the homework specifications (but not the solutions). However, unless otherwise noted, work turned in should reflect your own efforts and knowledge. Sharing or copying solutions is unacceptable and could result in failure. You are expected to take reasonable precautions to prevent others from using your work.