CS 592: AI for Scientific Discovery

Course Information

When: Mon/Wed 4:30 pm -- 5:45 pm.

Where: LWSN B134.

Instructor: Yexiang Xue, yexiang@purdue.edu.

Office Hours: Mon. 3:30 pm -- 4:30 pm. Where: LWSN 2142V (or virtual meeting). Appointments must be made at least 24 hours in advance.

Course website: https://www.cs.purdue.edu/homes/yexiang/courses/21fall-cs592/index.html.

Notifications and slides will be posted on Brightspace (https://purdue.brightspace.com/).

Discussions will take place via email, office hours, and (virtual) meetings.

Course projects are submitted via CMT (https://cmt3.research.microsoft.com/).


Course Description

Despite recent progress in Artificial Intelligence (AI), fundamental knowledge gaps need to be addressed before AI can prove useful in accelerating scientific discovery and design. AI-driven scientific discovery aims at discovering new scientific knowledge automatically from experimental data, while science-based design searches for better designs guided by scientific knowledge. The main difficulty in applying AI to both fields lies in the disconnect between the knowledge learned by neural networks in the form of parameter weights and human knowledge such as physical laws and constraints. In scientific discovery, this disconnect results in black-box models, where one can hardly verify whether any new knowledge has been learned or how the model extrapolates to unseen environments. In science-based design, it leads purely data-driven approaches to produce useless designs that violate physical rules.

This class explores ways to advance scientific discovery and science-based design via novel AI technologies. The goal of this course is twofold. First, this course intends to expose students to fundamental computational tools for addressing challenges in scientific discovery and design, such as mathematical programming, planning, constraint satisfaction, multi-agent modeling, and statistical machine learning. Second, this course intends to motivate students with successful applications of artificial intelligence to real-world problems in scientific discovery and design. We intend to cover successful applications of artificial intelligence in discovering new materials with human computation, ecological monitoring through citizen science programs, etc.

Classes will consist of instructor presentations, student presentations, and group discussions. The first few lectures introduce basic computational tools, such as constraint programming, probabilistic inference, supervised and unsupervised learning, etc. The course then moves on to discussing successful applications of AI to scientific discovery and design. Students are expected to (1) read, discuss, and present research papers, (2) complete a semester-long class project in groups, (3) review and comment on one class project proposal from other groups.



Prerequisites

Basic knowledge of linear algebra and calculus, a basic course in probability and statistics (e.g., STAT301/350/416, CS373), and basic programming skills (e.g., CS381) are required. Students without this background should talk with the instructor before registering for the course.


Target Students

Graduate or senior undergraduate students with interest in machine learning, data mining and artificial intelligence in general. This course also welcomes students from related fields, such as agriculture, economics, applied math, physics, chemistry, and engineering, who are interested in using computational tools to solve real-world problems in their domain of interest.


Course Activities and Evaluation

Activity                                   % final score   Due (exam) date (tentative)
Attendance                                 10%
Course presentations                       25%             Topic, papers to read, and slides to use due 2 weeks prior to presentation.
Course project proposal                    20%             5 pm, ET, Sep 17.
Course project reviews                     10%             5 pm, ET, Oct 1.
Course project mid-term progress report    10%             5 pm, ET, Nov 5.
Course project final report (and slides)   15%             5 pm, ET, Dec 3.
Course project final presentation          10%

Attendance: Attendance is highly encouraged. The instructor will give short, randomly scheduled in-class quizzes to account for attendance.

Course presentations: Each student needs to present at least one topic in class. Students are encouraged to form teams for the presentation. The available topics are listed in the syllabus section.

Course project: The MOST important part of this course. AI is a practical field, so the importance of completing a project yourself cannot be overemphasized! In addition, because this is a graduate-level course, one important aspect is basic scientific training, including asking the right questions, commenting on others' work, literature review, experimental design, coding, scientific writing, and presentation skills. These skills can ONLY be developed through a real project. Each student is encouraged to take the lead on one research topic.

We provide a few research thrusts and datasets for your reference (see below). You are encouraged to choose a specific project within the overarching theme of one research thrust in the list, although you are free to choose any project you like as long as it relates to AI for scientific discovery and design. The goal is to nurture GROUND-BREAKING course projects, which have the potential to be developed into innovative research papers in the future. Course projects outside of the suggested thrusts will receive less mentoring from the instructors and the TAs, and are therefore less preferred. We encourage you to combine your domain of expertise with AI. To guide you through the project, we split the entire process into five parts: proposal, peer review, mid-term report, final report, and presentation.

Course project proposal: The proposal will be evaluated on intellectual merit, broader impact, and tractability (the same criteria used for NSF proposals). The instructor DOES recognize that it is a course project, so the bar is much lower. However, the following three aspects are emphasized equally: (i) intellectual merit: how does the project advance machine learning (or your understanding of machine learning)? (ii) broader impact: how does the course project bring impact to a practical field via machine learning? (iii) tractability: is this proposal tractable (as a one-semester course project)? [grading rubrics will be posted on Brightspace.]

Course project reviews: Each student is asked to review at least three proposals from other students, based on intellectual merit, broader impact, and tractability. Peer reviews are safety belts for other students. Unrealistic proposals should be flagged. We will also host review panels in class, where reviewers discuss project proposals and give panel summary reviews. Gaming does not work: the grading of the original proposal will NOT be affected by how other students review your proposal. [grading rubrics will be posted on Brightspace.]

Course project mid-term progress report: Each group is expected to submit a progress report by the deadline. This is to ensure that all projects are progressing on the right track. [grading rubrics will be posted on Brightspace.]

Course project final report / presentation: The final report and presentation will be graded jointly by the TAs and the instructor, in a way similar to conference papers (presentations), although the bar is much lower. [grading rubrics will be posted on Brightspace.]

Grading Scale

The exact grading scale will be determined at the end of the semester, but use the following as a guideline for the course. The instructor promises that any alterations will be in favor of more generosity, not less.
Grade   Score        Grade   Score        Grade   Score
A+      100-96.0     A       95.9-93.0    A-      92.9-90.0
B+      89.9-86.0    B       85.9-83.0    B-      82.9-80.0
C+      79.9-76.0    C       75.9-73.0    C-      72.9-70.0
D+      69.9-66.0    D       65.9-63.0    D-      62.9-60.0
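
As an illustration of how the activity weights and the guideline scale combine, the following Python sketch computes a weighted final score and looks up the guideline letter grade. The component names, the 0-100 scale for each component, the function names, and the treatment of scores below 60 as F are assumptions made for this example. The exact scale is determined at the end of the semester, so this is not the official grading procedure.

```python
# Weights from the "Course Activities and Evaluation" table,
# written as fractions of the final score.
WEIGHTS = {
    "attendance": 0.10,
    "presentations": 0.25,
    "proposal": 0.20,
    "reviews": 0.10,
    "midterm_report": 0.10,
    "final_report": 0.15,
    "final_presentation": 0.10,
}

# Lower bounds of the guideline grading scale, highest first.
GUIDELINE = [
    (96.0, "A+"), (93.0, "A"), (90.0, "A-"),
    (86.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (76.0, "C+"), (73.0, "C"), (70.0, "C-"),
    (66.0, "D+"), (63.0, "D"), (60.0, "D-"),
]


def final_score(component_scores):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


def guideline_grade(score):
    """Letter grade under the guideline scale; below 60 is treated as F (assumption)."""
    for lower_bound, letter in GUIDELINE:
        if score >= lower_bound:
            return letter
    return "F"
```

For example, `guideline_grade(final_score(scores))` gives the guideline letter for a dictionary `scores` that holds one 0-100 value per component.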

Syllabus (Tentative)

Time Topic Notes
8/23 Mon. Introduction and logistics: Slides and announcements will be posted on Brightspace (https://purdue.brightspace.com/). Please let the instructor know if you cannot log into the course page on Brightspace.
8/25 Wed. Course overview: AI-driven scientific discovery and design.
8/30 Mon. The Emergence of Non-human Intelligence for Scientific Discovery I
9/1 Wed. The Emergence of Non-human Intelligence for Scientific Discovery II
9/6 Mon. Labor Day, no class.
9/8 Wed. Statistical Learning and Optimization: Bayesian networks; Markov logic network; conditional random field.
     History progressed from logic to probability.
9/13 Mon. Statistical Learning and Optimization
9/15 Wed. Probabilistic Inference: variational approach; sampling.
     How to make inference in a statistical relational model?
9/20 Mon. Probabilistic Inference: XOR streamlining.
     Why is it not always a good idea to do MCMC?
9/22 Wed. Structural Knowledge Learning and Structure Generation: contrastive divergence.
     Why is probabilistic inference a central piece in unsupervised learning?
9/27 Mon. Structural Knowledge Learning and Structure Generation: deep probabilistic embedding; generative adversarial networks.
     How is what we have talked about connected to recent progress in deep unsupervised learning?
9/29 Wed. Crowdsourcing Scientific Discovery
10/4 Mon. Review Panel I for Course Proposals
10/6 Wed. Review Panel II for Course Proposals
10/11 Mon. No class (October Break)
10/13 Wed. Fei Fang
10/18 Mon. Physics-based Animation using Reinforcement Learning
     Paper list: [here]
10/20 Wed. Lin Yang
10/25 Mon. Curiosity: Building Artificial Scientists
     Paper list: [IEEE] [ArXiv] [ArXiv]
10/27 Wed. Noise2Noise: Image Restoration without clean data
     Paper list: [ArXiv] [ArXiv]
11/1 Mon. Neural Ordinary Differential Equations: Theory and Applications
     Ideas: [paper] Theory: [paper] Application: [paper]
11/3 Wed. Deep Generative Model in Reinforcement Learning
     VAE: [paper] GAN: [paper] CycleGAN: [paper] StarGAN: [paper] Dreamer: [paper]
11/8 Mon. Applying Reinforcement Learning on Energy Management Strategy Design for Hybrid Electric Vehicle [paper]
11/10 Wed. Learning Optimization Processes Through Neural Network Architectures [paper]
11/15 Mon. Predicting Human Decision Under Uncertainty [paper]
11/17 Wed. Accelerate Scientific Discovery and Design via Hashing and Randomization
11/22 Mon. Accelerate Scientific Discovery and Design via Human Computation
11/24 Wed. Thanksgiving.
11/29 Mon. AI-driven Active Scientific Discovery I.
12/1 Wed. AI-driven Active Scientific Discovery II.
12/6 Mon. Final project presentation I.
12/8 Wed. Final project presentation II.


Course Project Thrusts

Thrust 1: stochastic optimization: encoding machine learning for decision-making

In data-driven decision-making, we have to reason about the optimal policy of a system given a stochastic model learned from data. For example, one can use a machine learning model to capture the traffic dynamics of a road network. The decision-making problem is: given the traffic dynamics learned from data, what is the most efficient way to travel between a pair of locations? Notice that the solution can change dynamically as the traffic dynamics shift. As another example, in physics, machine learning models have been used to predict the band gap of many metal alloy materials. The decision-making problem is: given the machine learning model, what is the best alloy, one that is both cheap to synthesize and has a good band-gap property?

The aforementioned examples are stochastic optimization problems, which seek robust interventions that maximize the "expectation" of stochastic functions learned from data. Such problems arise naturally in many applications in economics, operations research, and artificial intelligence. Stochastic optimization combines two intractable problems: the inner probabilistic inference problem, which computes the expectation across exponentially many probabilistic outcomes, and the outer optimization problem, which searches for the optimal policy.

Research questions: (i) If the inner machine learning model is a decision tree, can you compute the optimal policy in polynomial time? How? (ii) What if the inner machine learning model is a logistic regression, a linear SVM, a kernelized SVM, a random forest, or a probabilistic graphical model? (iii) What if the machine learning model is temporal, such as a recurrent neural network or an LSTM? (iv) When the inner probabilistic inference problem is intractable, existing approaches to stochastic optimization approximate the intractable probabilistic inference sub-problems either in variational forms or via the empirical mean of pre-computed, fixed samples. There is also a recent approach that approximates the intractable sub-problems with optimization queries subject to randomized constraints (see the following papers). Question: how do the various approximation schemes for the inner machine learning models affect the overall solution quality of the stochastic optimization problem? (v) Suppose we are solving one stochastic optimization problem for a specific application. Can we adapt existing approximation schemes in any way to fit the problem instance for better results?
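
To make the "empirical mean of pre-computed, fixed samples" idea in (iv) concrete, here is a minimal Python sketch of sample average approximation on a toy route-choice problem. The route names, cost parameters, and the helper `saa_best_decision` are hypothetical and chosen only for illustration. The decision space here is small enough to enumerate, which is not the case for the hard instances this thrust targets.

```python
import random


def saa_best_decision(decisions, objective, samples):
    """Return the decision maximizing the empirical mean of objective(x, xi)
    over a fixed set of pre-drawn samples, together with that mean."""
    def empirical_mean(x):
        return sum(objective(x, xi) for xi in samples) / len(samples)

    best = max(decisions, key=empirical_mean)
    return best, empirical_mean(best)


# Hypothetical routes: (base travel time, sensitivity to congestion), in minutes.
ROUTES = {"highway": (10.0, 30.0), "arterial": (20.0, 5.0), "local": (28.0, 1.0)}


def negative_travel_time(route, congestion):
    """Objective to maximize: minus the travel time under one congestion scenario."""
    base, sensitivity = ROUTES[route]
    return -(base + sensitivity * congestion)


# Draw the scenarios once and keep them fixed; here each scenario is a
# congestion level drawn uniformly from [0, 1] (an assumption of this toy model).
rng = random.Random(0)
samples = [rng.random() for _ in range(1000)]

best_route, best_value = saa_best_decision(list(ROUTES), negative_travel_time, samples)
```

Because the samples are fixed before the search starts, every candidate decision is scored against the same scenarios, and the quality of the answer depends on how well those scenarios represent the true distribution.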


Yexiang Xue, Zhiyuan Li, Stefano Ermon, Carla P. Gomes, Bart Selman.
Solving Marginal MAP Problems with NP Oracles and Parity Constraints
In the Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS), 2016. [pdf] [spotlight video]

Anton J. Kleywegt, Alexander Shapiro, and Tito Homem-de Mello.
The sample average approximation method for stochastic discrete optimization.
SIAM Journal on Optimization, 2002. [pdf]

Miguel Á. Carreira-Perpiñán and Geoffrey E. Hinton.
On contrastive divergence learning.
AISTATS, 2005. [pdf]

Martin Dyer and Leen Stougie.
Computational complexity of stochastic programming problems. Mathematical Programming, 2006. [springer]

John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira.
Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML, 2001. [pdf]

Stefano Ermon, Carla Gomes, Ashish Sabharwal, and Bart Selman.
Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization
In Proc. 30th International Conference on Machine Learning (ICML) 2013. [pdf]

Carla P. Gomes, Ashish Sabharwal, Bart Selman.
Near-Uniform Sampling of Combinatorial Spaces Using XOR Constraints.
NIPS 2006. [pdf]

Carla P. Gomes, Willem Jan van Hoeve, Ashish Sabharwal, Bart Selman.
Counting CSP Solutions Using Generalized XOR Constraints.
AAAI 2007. [pdf]

Yexiang Xue*, Xiaojian Wu*, Bart Selman, and Carla P. Gomes.
XOR-Sampling for Network Design with Correlated Stochastic Events.
In Proc. 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [pdf]
* indicates equal contribution.

Yexiang Xue, Xiaojian Wu, Dana Morin, Bistra Dilkina, Angela Fuller, J. Andrew Royle, and Carla Gomes.
Dynamic Optimization of Landscape Connectivity Embedding Spatial-Capture-Recapture Information.
In Proc. 31st AAAI Conference on Artificial Intelligence (AAAI), 2017. [pdf] [supplementary materials]

Thrust 2: embedding physical constraints into deep neural networks

The emergence of large-scale data-driven machine learning and optimization methodology has led to successful applications in areas as diverse as finance, marketing, retail, and health care. Yet many application domains remain out of reach for these technologies when applied in isolation. In the area of medical robotics, for example, it is crucial to develop systems that can recognize, guide, support, or correct surgical procedures. This is particularly important for next-generation trauma care systems that allow life-saving surgery to be performed remotely in the presence of unreliable bandwidth communications. For such systems, machine learning models have been developed that can recognize certain commands and procedures, but they are unable to learn complex physical or operational constraints. Constraint-based optimization methods, on the other hand, would be able to generate feasible surgical plans, but currently have no mechanism to represent and evaluate such complex environments. To leverage the required capabilities of both technologies, we have to find an integrated method that embeds constraint reasoning in machine learning.

In a seminal paper, the authors proposed a scalable method for machine learning over structured domains. The core idea is to augment machine learning algorithms with a constraint reasoning module that represents physical or operational requirements. Specifically, the authors propose to embed decision diagrams, a popular constraint reasoning tool, as a fully-differentiable layer in deep neural networks. By enforcing the constraints, the output of generative models can now provide assurances of safety, correctness, and/or fairness. Moreover, this approach enjoys a smaller modeling space than traditional machine learning approaches, allowing machine learning algorithms to learn faster and generalize better.
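
As a rough illustration of what "enforcing constraints on the output" can mean, the Python sketch below masks infeasible choices before normalizing a model's scores into probabilities. This is a simplified stand-in, not the decision-diagram layer from the paper: the function name `masked_softmax` and the feasibility flags are assumptions for this example, and in a real system the flags would have to come from some constraint reasoning module.

```python
import math


def masked_softmax(logits, feasible):
    """Softmax restricted to feasible entries; infeasible entries get probability 0."""
    if len(logits) != len(feasible):
        raise ValueError("logits and feasible must have the same length")
    if not any(feasible):
        raise ValueError("at least one option must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    shift = max(l for l, ok in zip(logits, feasible) if ok)
    weights = [math.exp(l - shift) if ok else 0.0 for l, ok in zip(logits, feasible)]
    total = sum(weights)
    return [w / total for w in weights]


# The third option has the highest score but violates a (hypothetical) constraint,
# so all probability mass is redistributed over the first two options.
probs = masked_softmax([2.0, 1.0, 3.0], [True, True, False])
```

The point of the sketch is only that constraint violations become impossible by construction rather than merely unlikely. How the cited work computes and differentiates through its constraint layer is not reproduced here.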

Research questions: (i) Are there ways to enforce physical constraints other than the decision diagrams used in the seminal work? (ii) What if the constraints are too complicated to be fully captured by a decision diagram? (iii) In a specific application domain, is there a better way to encode constraints? (iv) Does enforcing physical constraints make machine learning easier or more difficult? Can you quantify the difference? (v) Can we apply this idea in natural language processing, computer vision, reinforcement learning, etc.? (vi) Ethics and fairness in machine learning are being discussed in our community. Can we use this technique to guarantee the ethics and/or the fairness of a machine learning model?


Yexiang Xue, Willem-Jan van Hoeve.
Embedding Decision Diagrams into Generative Adversarial Networks.
In Proc. of the Sixteenth International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research (CPAIOR), 2019. [springer]

Md Masudur Rahman, Natalia Sanchez-Tamayo, Glebys Gonzalez, Mridul Agarwal, Vaneet Aggarwal, Richard M. Voyles, Yexiang Xue, and Juan Wachs.
Transferring Dexterous Surgical Skill Knowledge between Robots for Semi-autonomous Teleoperation.
In ROMAN, 2019. [pdf]

Naveen Madapana, Md Masudur Rahman, Natalia Sanchez-Tamayo, Mythra V. Balakuntala, Glebys Gonzalez, Jyothsna Padmakumar Bindu, L. N. Vishnunandan Venkatesh, Xingguang Zhang, Juan Barragan Noguera, Thomas Low, Richard M. Voyles, Yexiang Xue, and Juan Wachs
DESK: A Robotic Activity Dataset for Dexterous Surgical Skills Transfer to Medical Robots.
In IROS, 2019. [pdf]

Matt J. Kusner, Brooks Paige, José Miguel Hernández-Lobato.
Grammar Variational Autoencoder.
In Proceedings of the 34th International Conference on Machine Learning, ICML, 2017. [pdf]

Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, Rishabh Singh.
Robust Text-to-SQL Generation with Execution-Guided Decoding

Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner
Grammar-based Neural Text-to-SQL Generation

Thrust 3: machine learning for scientific discovery and/or social good

Machine learning models have defeated the brightest minds in the world (see the story of AlphaGo). Now, instead of using this technology for game playing, can we harness the tremendous progress in AI and machine learning to make our world a better place? In particular, I am curious about problems that have historically attracted the smartest minds of humankind -- the discovery of new science. Besides scientific discovery, can we use machine learning to create positive social impact?

If you think about it: in AlphaGo, machine learning is used to find a strategy in a highly complex space (all possible moves of Go) that beats all opponents' strategies. The problem is similar for scientific discovery, except that we are now playing Go with nature. For example, in materials discovery, we would like to find the material with the best properties in a highly complex space (all possible compositions). Should the strategy that proved successful for Go work for scientific discovery (and/or AI for social good)?

I am listing a few example papers below in which machine learning is used successfully for scientific discovery and for social good. I hope this can motivate you to discover a good application area for machine learning. The key to success is to combine your domain of expertise with machine learning.


Yexiang Xue, Junwen Bai, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Johan Bjorck, Liane Longpre, Santosh K. Suram, Robert B. van Dover, John Gregoire, and Carla Gomes.
Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery.
In Proc. 29th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), 2017. [pdf][video 1][video 2][video 3]

Santosh K. Suram, Yexiang Xue, Junwen Bai, Ronan LeBras, Brendan H Rappazzo, Richard Bernstein, Johan Bjorck, Lan Zhou, R. Bruce van Dover, Carla P. Gomes, and John M. Gregoire.
Automated Phase Mapping with AgileFD and its Application to Light Absorber Discovery in the V-Mn-Nb Oxide System.
In American Chemical Society Combinatorial Science, Dec, 2016. [DOI][pdf][video 1][video 2][video 3]

Junwen Bai, Yexiang Xue, Johan Bjorck, Ronan Le Bras, Brendan Rappazzo, Richard Bernstein, Santosh K. Suram, Robert Bruce van Dover, John M. Gregoire, Carla P. Gomes.
Phase Mapper: Accelerating Materials Discovery with AI.
In AI Magazine, Vol. 39, No 1. 2018. [paper]

Yexiang Xue, Ian Davies, Daniel Fink, Christopher Wood, Carla P. Gomes.
Avicaching: A Two Stage Game for Bias Reduction in Citizen Science
In the Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2016. [pdf][supplementary materials][video]

Giuseppe Carleo and Matthias Troyer
Solving the quantum many-body problem with artificial neural networks.
In Science, 355, 2017. [website]

Ganesh Hegde and R. Chris Bowen
Machine-learned approximations to Density Functional Theory Hamiltonians.
In Scientific Reports, 7, 2016. [ArXiv]

Graham Roberts, Simon Y. Haile, Rajat Sainju, Danny J. Edwards, Brian Hutchinson and Yuanyuan Zhu
Deep Learning for Semantic Segmentation of Defects in Advanced STEM Images of Steels.
Scientific Reports, volume 9, 2019. [website]

Academic Policies

Late policy

Assignments are to be submitted by the due date listed. Each person is allowed two extension days, which can be applied to any combination of assignments (homework/projects only; exams excluded) during the semester without penalty. After that, a late penalty of 15% per day will be assessed. A partial day counts as a full day. Use of extension days must be stated explicitly at the time of the late submission (by an accompanying email to ALL TAs and the instructor); otherwise, late penalties will apply. Extensions cannot be used after the final day of classes (i.e., December 8). Extension days cannot be rearranged after they are applied to a submission. Additional no-penalty late days may be introduced in the later part of the semester, conditioned on the completion of the course evaluations (details to be finalized). Assignments, project reports, etc., will NOT be accepted if they are more than five days late (and will receive zero points). Additional extensions will be granted only for serious and documented medical or family emergencies. Use the late days wisely!
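
The arithmetic of the late policy can be sketched in Python as follows. This is one reading of the policy, not an official calculator. It assumes that the 15% penalty is taken as a fraction of the raw score per penalized day, that the five-day cutoff counts all days late including extension days, and that the caller passes how many of the two extension days are applied to this submission. The function name and parameters are hypothetical.

```python
import math


def late_adjusted_score(raw_score, hours_late, extension_days_applied=0):
    """Score after the late policy, under the assumptions stated above."""
    if not 0 <= extension_days_applied <= 2:
        raise ValueError("each person has two extension days in total")
    if hours_late <= 0:
        return raw_score  # submitted on time
    # A partial day counts as a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > 5:
        return 0.0  # more than five days late: not accepted
    # Extension days remove the penalty for the days they cover.
    penalized_days = max(0, days_late - extension_days_applied)
    penalty_fraction = min(1.0, 0.15 * penalized_days)
    return raw_score * (1.0 - penalty_fraction)
```

For instance, under these assumptions a submission that is 25 hours late counts as two days late, so with no extension days applied it would keep 70% of its raw score.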

Attendance Policy during COVID-19

Students are expected to attend all classes remotely unless they are ill or otherwise unable to attend class. If they feel ill, have any symptoms associated with COVID-19, or suspect they have been exposed to the virus, students should stay home and contact the Protect Purdue Health Center (496-INFO).

In the current context of COVID-19, in-person attendance cannot be a factor in the final grades. However, timely completion of alternative assessments can certainly be part of the final grade. Students need to inform the instructor of any conflict that can be anticipated and will affect the timely submission of an assignment or the ability to take an exam.

Classroom engagement is extremely important and associated with your overall success in the course. The importance and value of course engagement and ways in which you can engage with the course content even if you are in quarantine or isolation, will be discussed at the beginning of the semester. Student survey data from Fall 2020 emphasized students’ views of in-person course opportunities as critical to their learning, engagement with faculty/TAs, and ability to interact with peers.

Only the instructor can excuse a student from a course requirement or responsibility. When conflicts can be anticipated, such as for many University-sponsored activities and religious observances, the student should inform the instructor of the situation as far in advance as possible. For unanticipated or emergency conflicts, when advance notification to an instructor is not possible, the student should contact the instructor/instructional team as soon as possible by email, through Brightspace, or by phone. In cases of bereavement, quarantine, or isolation, the student or the student’s representative should contact the Office of the Dean of Students via email or phone at 765-494-1747. Our course Brightspace includes a link to the Dean of Students under ‘Campus Resources.’

Academic Guidance in the Event a Student is Quarantined/Isolated

If you must quarantine or isolate at any point in time during the semester, please reach out to me via email so that we can communicate about how you can continue to learn remotely. Work with the Protect Purdue Health Center (PPHC) to get documentation and support, including access to an Academic Case Manager who can provide you with general guidelines/resources around communicating with your instructors, be available for academic support, and offer suggestions for how to be successful when learning remotely. Your Academic Case Manager can be reached at acmg@purdue.edu. Importantly, if you find yourself too sick to progress in the course, notify your academic case manager and notify me via email or Brightspace. We will make arrangements based on your particular situation.

Academic honesty

Please read the departmental academic integrity policy. This will be followed unless we provide written documentation of exceptions.

Classroom Guidance Regarding Protect Purdue (in case students use common spaces for studying)

The Protect Purdue Plan, which includes the Protect Purdue Pledge, is campus policy and as such all members of the Purdue community must comply with the required health and safety guidelines. Required behaviors in this class include: staying home and contacting the Protect Purdue Health Center (496-INFO) if you feel ill or know you have been exposed to the virus, properly wearing a mask in classrooms and campus buildings at all times (e.g., mask covers nose and mouth, no eating/drinking in the classroom), disinfecting desk/workspace before and after use, maintaining appropriate social distancing with peers and instructors (including when entering/exiting classrooms), refraining from moving furniture, avoiding shared use of personal items, maintaining robust hygiene (e.g., handwashing, disposal of tissues) prior to, during, and after class, and following all safety directions from the instructor.

Students who are not engaging in these behaviors (e.g., wearing a mask) will be offered the opportunity to comply. If non-compliance continues, possible results include instructors asking the student to leave class and instructors dismissing the whole class. Students who do not comply with the required health behaviors are violating the University Code of Conduct and will be reported to the Dean of Students Office with sanctions ranging from educational requirements to dismissal from the university.

Any student who has substantial reason to believe that another person in a campus room (e.g., classroom) is threatening the safety of others by not complying (e.g., not properly wearing a mask) may leave the room without consequence. The student is encouraged to report the behavior to and discuss the next steps with their instructor. Students also have the option of reporting the behavior to the Office of Student Rights and Responsibilities. See also Purdue University Bill of Student Rights.

Nondiscrimination Statement

Purdue University is committed to maintaining a community which recognizes and values the inherent worth and dignity of every person; fosters tolerance, sensitivity, understanding, and mutual respect among its members; and encourages each individual to strive to reach his or her potential. In pursuit of its goal of academic excellence, the University seeks to develop and nurture diversity. The University believes that diversity among its many members strengthens the institution, stimulates creativity, promotes the exchange of ideas, and enriches campus life. A hyperlink to Purdue’s full Nondiscrimination Policy Statement is included here.


Purdue University strives to make learning experiences as accessible as possible. If you anticipate or experience physical or academic barriers based on disability, you are welcome to let me know so that we can discuss options. You are also encouraged to contact the Disability Resource Center at: drc@purdue.edu or by phone: 765-494-1247.

Mental Health/Wellness Statement

If you find yourself beginning to feel some stress, anxiety and/or feeling slightly overwhelmed, try WellTrack. Sign in and find information and tools at your fingertips, available to you at any time.

If you need support and information about options and resources, please contact or see the Office of the Dean of Students. Call 765-494-1747. Hours of operation are M-F, 8 am -- 5 pm.

If you find yourself struggling to find a healthy balance between academics, social life, stress, etc. sign up for free one-on-one virtual or in-person sessions with a Purdue Wellness Coach at RecWell. Student coaches can help you navigate through barriers and challenges toward your goals throughout the semester. Sign up is completely free and can be done on BoilerConnect. If you have any questions, please contact Purdue Wellness at evans240@purdue.edu.

If you’re struggling and need mental health services: Purdue University is committed to advancing the mental health and well-being of its students. If you or someone you know is feeling overwhelmed, depressed, and/or in need of mental health support, services are available. For help, such individuals should contact Counseling and Psychological Services (CAPS) at 765-494-6995 during and after hours, on weekends and holidays, or by going to the CAPS office on the second floor of the Purdue University Student Health Center (PUSH) during business hours.

Emergency Preparation

In the event of a major campus emergency, course requirements, deadlines and grading percentages are subject to changes that may be necessitated by a revised semester calendar or other circumstances beyond the instructor’s control. Relevant changes to this course will be posted onto the course website or can be obtained by contacting the instructors or TAs via email or phone. You are expected to read your @purdue.edu email on a frequent basis.

Other general course policies can be found here.



eBird citizen science dataset.

Synthetic and real datasets for materials discovery.

Dataset for the corridor-design problem and landscape optimization problem.

Remote sensing images (a code repository which contains code to download from Google Earth engine).

UCI Machine Learning Dataset.


Online Resources

Machine learning:

Math references:

Learning Python

For those who are unfamiliar with Python, I strongly encourage you to spend one night learning it by following the official tutorial (see below). I did not know Python until graduate school. It took me one night to learn it, and you can do the same!