Math 490, Mathematics of Machine Learning, Fall 2019

Professor: Dr. Kay Kirkpatrick
Contact: 231 Illini Hall, kkirkpat(at)
TA: Sophie Phuong Le, phuong2(at)
Course site:
Lectures: MWF 2:00-2:50pm in 245 Altgeld Hall through August 30; then 341 Altgeld Hall starting on September 4.
Office hours: Mondays and Fridays, 3:00-3:50, or by appointment. I would be happy to answer your questions in my office anytime as long as I'm not otherwise engaged, and before and after class are good times to catch me either in my office or in the classroom.
Textbook: The main text will be Understanding Machine Learning: From Theory to Algorithms, 1st Edition, by Shai Shalev-Shwartz and Shai Ben-David.
Grading policy:
Homework: 40% of the course grade.
Two Midterms: 15% each, one on October 11, and one on December 11. Please let me know as soon as possible if you need any accommodation.
Final Project: 30%, on a topic of your choice related to the course. See HW#3 for options. Our assigned final exam time is 8:00-11:00am, Monday, December 16. Attendance/participation for everyone (not just speakers) will be graded during the final exam period, estimated to run from 10am to 11am on Dec. 16, in the usual classroom. Exceptions are allowed for medical and other serious reasons.
Inclusion and Justice: I am committed to affirming the identities, realities and voices of all students, especially students from historically marginalized or under-represented backgrounds. I value the use of self-specified gender pronouns, and I require respect for everybody. Please contact me to receive disability accommodations. You should also know that I'm a mandatory reporter. My pronouns are she/her/hers.

Homework (due Fridays in class or to my email address or my mailbox in AH 250 by the end of class): You are encouraged to work together on the homework, but I ask that you write up your own solutions and turn them in separately.

Late homework will not be graded, so I will drop your two lowest homework scores.

HW #1 is due the second Friday by 5pm: please send me an email introducing yourself: for instance, what name you prefer to be called, your major and hobbies, why you're interested in probability, or anything else you want to share or have questions about. It would also be helpful to attach a photo of yourself to help me connect your face with your name. Please also answer the questions on the info sheet here.

HW #2 is due Fri 9/13 in class: available here, but excluding the book problem 2.1.
Solutions here.

HW #3 is due Fri 9/20:
Part 1) Textbook problems: p. 20 #2.1, p. 30 #3.6, p. 35 #4.2, p. 41 #5.1, due by email (with copy to the TA) or to my mailbox in AH 250 by 2:50pm, because class on Friday 9/20 is cancelled.
Part 2) Email me by 5pm (with copy to the TA) about your final project, indicating (a) whether you'll pick a talk, paper, poster or webpage, (b) which topics you are considering, (c) what is your motivation, and (d) 2 or 3 references you will work from--books, articles, and technical websites are all fine. Talks will be 15 minutes each, subject to time constraints. Papers will be 3-10 pages. Webpage will be 2-5 screens long, possibly with an interactive simulation that you program.
Solutions here.

ANNOUNCEMENT: I will be giving grace period(s) for emailed homework parts related to the final project (but not problem-set HWs): each of you has 48 hours total of automatic grace period(s), which can be used as two 24-hour grace periods or one 48-hour one, etc. In order to use some grace hours, please email me by the time that the HW is due, letting me know how many hours of grace period you are taking, i.e., what time I can expect your completed HW to be sent.

HW #4 is due Fri 9/27:
Part 1) due by the end of class: available here.
Part 2) due by 5pm: read "How to give a good colloquium" by John McCarthy, and email both prof and TA a few sentences describing a) a piece of anyone's speaking advice that you have used in the past and how well it worked, and b) a piece of McCarthy's advice that you will try in the future. Solutions here.

HW #5 is due Fri 10/4 in class:
Part 1) Before you do the other HW items, read John Lee's essay "Some Remarks on Writing Mathematical Proofs" and
Part 2) Put his advice into practice for the rest of the HW in at least 2 specific ways that you flag for the grader with citations. The remaining HW problems will be from the textbook: Ch. 6: #6.1, #6.4, #6.8.

The first exam will be October 11, covering all of the course material (lectures and HWs) up to and including the Friday before the exam, Oct 4.

HW #6 due Fri 10/18 by the end of class, textbook problems: #9.1, #9.6, #10.2. Solutions here.

HW #7 due Fri 10/25 by the end of class:
Part 1) textbook problems: #10.3, #12.1, and #12.3. Solutions here.
Part 2) Read the writing blog post "Five common writing mistakes", and email Kay and Sophie with a two-sentence summary of what you learned from it.

HW #8 due Fri 11/1 to Kay (or her email with subject line "[Math 490] HW#8 ..." or her mailbox in AH 250) by 5pm: Proposal for final project.
1. Identify your topic and your thesis statement (see link).
2. Think about your audience: your classmates, not just me; people in STEM, not just in your field. Answer two or more of the following questions: Why should your audience care? What do you want them to take away from your project? How can you clarify the benefits of your project to your audience?
3. Specifics: What kinds of audiovisual aids will you be choosing to use? Your project should have at least one item of visual interest (picture, simulation, etc.), and at least one item of technical interest (theorem, algorithm, etc.). Which two or three definitions or key ideas will you introduce to your audience? What is a good example (think n=2) that illustrates the main point of your project? Can you find a story that's related to your topic?
4. Describe the main messages of at least 3 references that you are using, i.e., what are their thesis statements?

HW #9 due Fri 11/8 by email to Kay and Phuong by 5pm:
(1) Read this review of Safiya Umoja Noble's Algorithms of Oppression, and write a two-sentence summary.
(2) Read one of the following on technical communication that is relevant to your final project, and write a two-sentence summary with subject line: [Math 490] HW#9 readings ...
"How to Talk Mathematics" by Paul Halmos
Doumont's downloadable booklet on slide design for scientific talks
"Slides are not all evil" by Jean-luc Doumont.
"How to give a good 20-minute math talk" by William Ross
"The Science of Scientific Writing" by Gopen and Swan
"How to Write Mathematics" by Halmos.

HW #10 due Fri 11/15 by the end of class: textbook problems #12.4, #13.1, #13.3, #14.1, #14.2. Also, please email me about whether or not you're giving a talk--I'd like a preliminary head-count for planning our final exam meeting time, and the final exam is replaced by talks, as you know. Solutions here.

HW#11 due Nov 22 by 3pm: part A) textbook problem #15.1. Solutions here.
Plus this revising exercise B) trimming words. Option 1): Take an old email of yours to someone important that was too long (more than 2-3 paragraphs of 2-3 lines each), and trim it down without losing key information. You may fictionalize/redact names, etc., for privacy. Include word counts before and after: the after count should be no more than 85% of the before count. Option 2): Revise two slides of a talk, maybe yours or someone else's, according to the principles that you've learned. This may include finding or drawing a picture to illustrate the main point of the slide, or making the wording more efficient to reduce word-wrap. The result should look something like this example, with four slides: two originals and two improved versions (hand-drawn is fine).

HW#12, due Dec 6:
Part 1) book problems 15.3 and 16.3 due by the end of class Friday. Solutions here.
Part 2) First draft of your final project due by the end of the week. For a paper, you should have at least 1.5 pages of text (12 pt font; 1 to 1.5 spacing), about half a page outlining the remainder, a figure, and citations. For a talk, you should have at least 8 finished slides, plus the remainder of the talk outlined (e.g., headlines only). If you turn your first draft in as hard-copy, I can mark it up with specific suggestions; if you turn it in electronically, I will reply by email with more general comments/suggestions. If you'd like a particular kind of feedback (e.g., if you hand in hard-copy but you only want general suggestions), please let me know.

Other resources, especially for editing help:
UIUC writing center:
Purdue Online Writing Lab:

NO HW due the week of December 9. Instead there is Exam 2 in class on Dec 11. Here's a practice exam.

Extra credit opportunity due by December 16: a book review, based on a book that will help you with your final project, which you can borrow from my office or get on your own. For this extra credit, you should read (at least) part of the book, summarize its main points, find a nice quotation from it, and provide a recommendation of who should read it. Please also return the book by the final :)

Revising advice from Stephen B. Heard, The Scientist's Guide to Writing, p. 198ff.
1. Read for self-revision at the time of day that you think least clearly.
2. Change your font to something unfamiliar or strange-looking.
3. Read your draft out loud.
4. Remind yourself to read like a reader, not like the writer.
5. Check for unclear pronoun antecedents: this, that, these, and those.
6. Does your topic sentence of each paragraph cohere with the rest of the paragraph?
7. Do you have transitions between paragraphs and sections? How are the different topics related?

All lecture videos are available here.

Week 1: Introduction: probability background, ML set-up, defining error/loss/risk, Empirical Risk Minimization algorithm, sections 2.2-2.3 (including Appendix B as needed)
Monday: lecture notes and example of how to use the CLT
Wednesday slides
Friday lecture notes
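
As an illustration of the Week 1 topics, here is a minimal sketch of Empirical Risk Minimization over a small finite hypothesis class. The threshold classifiers, the toy sample, and the function names are all assumptions made up for this example; they are not taken from the lecture notes or the textbook.

```python
# Minimal ERM sketch (illustrative assumptions throughout):
# hypothesis class = threshold classifiers h_t(x) = 1 if x >= t else 0,
# loss = 0-1 loss, empirical risk = average loss on the training sample.

def predict(t, x):
    """Threshold classifier h_t: label 1 when x is at least t."""
    return 1 if x >= t else 0

def empirical_risk(t, sample):
    """Fraction of labelled points (x, y) in the sample that h_t misclassifies."""
    mistakes = sum(predict(t, x) != y for x, y in sample)
    return mistakes / len(sample)

def erm_threshold(thresholds, sample):
    """Return a threshold from the finite class with the smallest empirical risk."""
    return min(thresholds, key=lambda t: empirical_risk(t, sample))

# Hypothetical finite hypothesis class and toy training sample.
thresholds = [i / 10 for i in range(11)]
sample = [(0.05, 0), (0.2, 0), (0.45, 0), (0.55, 1), (0.8, 1), (0.95, 1)]

best = erm_threshold(thresholds, sample)
print("ERM threshold:", best, "empirical risk:", empirical_risk(best, sample))
```

Note that ERM only controls the error on the sample; how well the returned hypothesis does on new data is the learnability question taken up in the following weeks.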

Week 2: Defining learnability, Sections 2.3-3.2
Wednesday lecture notes
Friday lecture notes

Week 3: An example not in the book; Sections 3.2-5.1, exercise 4.1, and the beginning of the proof of the No Free Lunch Theorem.
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 4: Sections 5.1-6.3
Monday lecture notes, which have the probability lemma and proof that we used on 9/13.
Wednesday lecture notes
Friday there was no class.

Week 5: Sections 6.3-6.5.2
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 6: Sections 6.5.2-9.2
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 7: Sections 9.2-10.1
Monday lecture notes
Wednesday lecture notes
Friday [EXAM1] 2018 midterm 1

Week 8: Chapter 10
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 9: Chapter 12
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 10: Chapter 13
Monday lecture notes
Wednesday lecture notes (has a typo fix of Monday's page 3)
Friday lecture notes

Week 11: Chapter 14
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 12: Chapter 15
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 13: Chapter 16
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

Week 14: Chapter 20
Monday lecture notes
Wednesday lecture notes
Friday lecture notes

All final papers, webpages, etc. are due by email by the end of the morning, Monday, Dec. 16, and speakers should email me final slides before the final exam period (and use their own machines to project unless otherwise agreed upon). Attendance/participation for everyone will be graded during the final exam period, estimated to run from 10am to 11am on Dec. 16, in the usual classroom. Exceptions are allowed for medical and other serious reasons.

Departures from "traditional" machine learning will be encouraged for the final project, and you have a choice of paper (3-10 pages), talk (15 minutes), poster (regular size), or webpage (3-6 screens). Here's an example of a project webpage:

Final project ideas: summary/analysis of anything by Alan Turing, Sarah M. Brown, Timnit Gebru, Sanmi Koyejo, Giuseppe Longo.
Bengio's The Consciousness Prior.
Neuronal Synchrony in Complex-Valued Deep Networks.
Quantum computing and quantum information: Peter Shor's famous paper, Deutsch and Marletto.
Biological phenomena that don't fit the DNA=code metaphor, e.g., Denis Noble's work.
Ethics and algorithms: Cathy O'Neil's book Weapons of Math Destruction; Safiya Umoja Noble's book Algorithms of Oppression.
Reproducibility crises in psychology and ML/AI.
Spurious correlations in Big Data: Calude and Longo, 2016.
Backpropagation vs. neural back-feed of information to pre-synaptic neuron.
Causality and causal inference: Judea Pearl's work.


This is an advanced course on the mathematics of machine learning, and some probability/statistics and programming are prerequisites for this course. Machine learning is a growing field at the intersection of probability, statistics, optimization, and computer science, which aims to develop algorithms for making predictions based on data. This course will cover foundational models and mathematics for machine learning, including statistical learning theory and neural networks, with a project component.

Topics: We will be covering most of the topics in Chapters 2, 3, 4, 5, 6, 9, 10, 12, 13, 14, 15, 16, 19, and 20. We will skip most of Chapters 7, 8, 11, 17, and 18. At the end of the semester, time permitting, we will cover some of my recent research. The final exam period will probably be spent on talks by your classmates, so please plan to be here then.

Why work on your communication skills?

"It usually takes me more than three weeks to prepare a good impromptu speech." --Mark Twain

I think that success in your career (any career) depends in part on how well you communicate your ideas and persuade other people, so I am giving you a chance to learn and practice good writing or presenting skills. Some of the homework assignments will lead up to the final project, for which you will have a choice of topic (related to the course) and of communication format (paper or talk). The homework will be graded partly on clarity, brevity, and coherence. This is a great opportunity to improve your writing or presenting skills, in order to make your ideas more clear and persuasive--and to succeed.

"I am sorry I have had to write you such a long letter, but I did not have time to write you a short one." --Blaise Pascal

Emergency information link and the new one.

Some more resources for writing and speaking:

Su: Good Math Writing
Halmos: How to Write Mathematics
Gopen and Swan: The Science of Scientific Writing
Williams: Style: The Basics of Clarity and Grace (book, any edition), Longman.

Bruce Reznick's list of resources
Gallian: How to Give a Good Talk
Shaw: Making Good Talks Into Great Ones
Gallo: Public-Speaking Lessons from TED Talks
Lerman: Math job talk advice
Steele: Speaking tips organized in categories, including this great but little-known tip about graphs on slides

FAQ here: Course info:

FAQ0 Announcement: the first few lectures (Monday, August 26th, and Wednesday, the 28th, and possibly more) will be held in a larger room, AH 245, instead of the previously scheduled room. Everyone interested in the course is welcome to attend, because of the abundance of seats. Once the number of attendees gets small enough, we will switch from AH 245 to the previously scheduled room, AH 341. The first day of class will have some lecturing on the general set-up for machine learning, in addition to an in-class demonstration centered on the question: How can one learn whether coins are fair or not, and how long does it take to learn that? The second day of class will be an introductory-level talk about my research based on my Beckman talk slides titled BIO-LOGIC, about biological computation. The third class meeting will be a traditional lecture introducing some more fundamental concepts about the machine learning framework and empirical risk minimization.
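
One hedged way to make the coin question concrete is Hoeffding's inequality: after n independent flips, the observed frequency of heads is within eps of the true bias with probability at least 1 - delta once n >= ln(2/delta) / (2 eps^2). The sketch below computes that number of flips and runs a small simulation. It is not the in-class demonstration itself, and the function names, the tolerance eps, and the confidence level delta are assumptions chosen for illustration.

```python
import math
import random

def flips_needed(eps, delta):
    """Flips sufficient (by Hoeffding's inequality) to estimate the bias
    within eps, except with probability at most delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate_bias(p, n, rng):
    """Simulate n flips of a coin with heads-probability p; return the heads frequency."""
    heads = sum(rng.random() < p for _ in range(n))
    return heads / n

def looks_fair(p_hat, eps):
    """Illustrative decision rule: call the coin fair if the estimate is within eps of 1/2."""
    return abs(p_hat - 0.5) < eps

n = flips_needed(eps=0.1, delta=0.05)
rng = random.Random(0)  # fixed seed only so that reruns give the same flips
p_hat = estimate_bias(0.5, n, rng)
print("flips:", n, "estimate:", p_hat, "looks fair:", looks_fair(p_hat, 0.1))
```

Under this rule, a truly fair coin is wrongly rejected with probability at most delta, and a coin whose bias differs from 1/2 by at least 2 eps is wrongly accepted with probability at most delta. Halving eps roughly quadruples the number of flips, which is one answer to "how long does it take to learn that?"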

FAQ1: How much time and work does this course require?
A1: That depends on your preparation as well as how much you want to get out of the course (more work usually correlates with more benefit, at least before the point of diminishing returns). An average of 5 hours of work per week outside the classroom is expected, and the range of actual work might vary between, say, 1 hour and 10 or more. If you believe that the course is taking you significantly more time than it should, please let me know with data.

FAQ2: I would like to register for your course but it is full. Will I be able to get in?
A2: It is very common for people to drop the course during the first few weeks. I would advise checking seating availability every day and being ready to pounce once a seat opens up. In the meantime, you are allowed to attend lecture, but you should make sure that every student who is registered gets to sit down in a seat. So you might need to stand or to sit on the floor until you are registered. I may request a larger room for the first lecture or two, and I may have the lectures video-recorded. Please watch the course website for such updates.

FAQ3: Will Math 490 be offered again after Fall 2019?
A3: UPDATED 12/13/2019: Math 595 ML will be offered in Spring 2020, and unfortunately no ML courses will be offered by me in the academic year 2020-2021, because I will not be teaching at all.

FAQ4: I have taken some probability/statistics but not Math 461 or Stat 410. Will I be prepared for Math 490?
A4: If your previous probability class included a lot of advanced concepts, such as laws of large numbers, Markov and Chebyshev inequalities, central limit theorem, etc., then you may be sufficiently prepared for Math 490.

FAQ5: I have not taken any probability/statistics courses. Can I still take Math 490?
A5: One possibility for highly motivated students (who are ready for a lot of extra studying) is to take a probability class (M461 or S410) at the same time as M490. But I would recommend taking these courses in the prerequisite order that I have determined, because probability is fundamental and crucial for ML theory.