Syllabus

CSCI 0451: Machine Learning

This is an advanced elective on the topic of algorithms that learn patterns from data. Artificial intelligence, predictive analytics, computational science, pattern recognition, signal processing, and data science are all disciplines that draw heavily on techniques from machine learning.

Learning Objectives

A learning objective is a primary goal for your learning by the end of the course. You’re successful in CSCI 0451 when you achieve excellence against these learning objectives. We have six learning objectives in CSCI 0451.

Theory: You will describe the broad mathematical structure of modern machine learning algorithms, and the details of several simple examples.
Implementation: You will implement classification, regression, and clustering algorithms in efficient, usable Python programs.
Navigate: You will navigate the package ecosystem for machine learning in Python.
Experiment: You will experiment with machine learning models, audit their performance, and communicate about your findings.
Social Responsibility: You will interrogate sources of bias and harm in machine learning models, especially with regard to gender, race, and class.
Project: You will complete a long-term project that involves significant implementation or experimentation with machine learning tools.

Readings

It is not necessary for you to purchase any books for this course. We will have regular readings from a few different texts, with the most frequent one being:

The following resources cover similar topics to our core readings at a similar level. If something doesn’t click for you in our core reading, you may want to check the treatment in one of these resources.

I will also sometimes draw readings from the following sources:

The following books are relatively advanced. I’ll often suggest additional readings from these books for students who are interested in diving deeper, especially into theoretical considerations.

In case you’re wondering, I primarily learned out of Bishop and Hastie et al.

Social Annotation

An extremely useful way for you to engage with the readings is to make comments, ask questions, and answer questions about them as you are reading. For this reason, I’ll be providing most links to readings through Hypothes.is. You’ll be able to make marginal comments and view the marginal comments of others. I’ll also regularly check your annotations to see what questions have come up in the readings.

Logistics and Key Policies

Lecture: Mondays and Wednesdays
75 Shannon Street, Room 203
Section A: 12:45pm-2:00pm
Section B: 2:15pm-3:30pm
Instructor: Dr. Phil Chodrow
75 Shannon Street, Room 218
pchodrow@middlebury.edu, though please see email policy below.
Student Hours:
  • Mondays, 3:30pm-4:30pm
  • Thursdays, 2:00pm-4:00pm
  • Fridays, 2:00pm-3:00pm
Important Policies: I encourage you to call me Phil or Prof. Phil. “Professor Chodrow” is fine if that’s what’s most comfortable for you.

You need a laptop and an internet connection for this course, but you don’t need to buy any books or other supplies.

Generally speaking, you should only email me if you need to talk about something personal or sensitive. We’ll use Slack for all standard course communications.

Student Hours are your time to come chat with me about course content. I want to see you in Student Hours.

I expect you to prepare before you ask me or your fellow students for help on course content.

Masks are required during class time and Student Hours.

Rough Schedule Of Topics

See the complete schedule for more details! I am still working on populating this schedule, and will try to have it set up at least two weeks in advance at all times.

  • Week 1: Welcome! Classification, auditing. Data science techniques in Python. Introduction to algorithmic bias.
  • Week 2: Models, model families, and algorithms. More on classification. Logistic regression by hand. Optimization.
  • Week 3: More on optimization. Overfitting, feature selection, model choice. Decision regions.
  • Week 4: Regression. Linear least-squares regression by hand, several ways. Feature engineering. Bias in a medical recommender system.
  • Week 5: Flex, review of fundamentals.
  • Week 6: Spring Break!
  • Week 7: Formal characterizations of bias and fairness in machine learning; limitations.
  • Week 8: Unsupervised problems: clustering and dimensionality reduction.
  • Week 9: Introduction to deep learning. Backpropagation. Classification of tabular data.
  • Week 10: Deep architectures. Image classification. Text classification and word embeddings.
  • Week 11: Deep generative models. Interpretability and explainability.
  • Week 12: Flex, topics by request.
  • Week 13: Project presentations.

What Will Class Time Look Like?

My plan is for most class periods to look like a “lecture sandwich:”

  • 10 to 15 minutes of a warmup activity that addresses the recent lectures and readings.
  • 40-50 minutes of lecture, punctuated by short activities and breaks.
  • 10-15 minutes of a closing activity that helps us get solid with the day’s content.

The Warmup Activity

On most days, we’ll have a warmup activity. The warmup activity will usually ask you to engage with the readings and complete a small amount of work ahead of class time. This could be a short piece of writing, a math problem, or an implementation of a Python function.

Each day, a few students will be randomly selected to present their work to a small group of peers. It’s ok to ask for help or even pass if you’re not feeling confident in your solution, but you should plan to at least make a good attempt at the warmup before every class period. Your participation in the warmup activity is an important aspect of presence in the course, and I’ll ask you to reflect on it when proposing your course grade.

Demonstrating Your Learning

Collaborative Grading

This course is collaboratively graded. You may have also heard the term ungrading used to refer to a similar approach. In a nutshell, collaborative grading means:

  • There are no points or scores attached to any assignment. When you turn in assignments, you’ll get feedback on how to revise/resubmit, improve or otherwise proceed in the course, but you won’t get “graded.”
  • There also aren’t any firm due dates, although I will give you suggestions on how to maintain a good pace.
  • Periodically throughout the semester, you will complete reflection activities to help you take stock of your learning and achievement in the course. In your final activity at the end of the semester, you’ll make a proposal for your letter grade in the course, and support it with evidence of your learning. You and I will then meet to discuss how the course went for you, using your reflection activity and proposal as a starting point. In this conversation, you and I will agree on your final letter grade for the course, which I will then submit to the registrar.
All work you wish to be considered toward your achievement in the course needs to be submitted by the end of Finals Week.

Reflection activities:

Why Collaborative Grading?

Because grading is broken! Traditional points-based grading is ineffective at both (a) accurately measuring student learning and (b) motivating students to learn. I broadly agree with Jesse Stommel when he writes:

Agency, dialogue, self-actualization, and social justice are not possible in a hierarchical system that pits teachers against students and encourages competition by ranking students against one another. Grades (and institutional rankings) are currency for a capitalist system that reduces teaching and learning to a mere transaction. Grading is a massive co-ordinated effort to take humans out of the educational process.

I’d prefer to just not give you grades at all. But, Middlebury says I have to, and so my aim is to instead put the process of grading under your control to the greatest extent that I reasonably can.

Assignments

There are three kinds of assessed assignments in this course, plus a mysterious “Other” category.

Blog Posts: Blog posts are the primary way in which you will demonstrate your understanding of course content. Blog posts usually involve: written explanation of some relevant theory; implementation of one or more algorithms according to written specifications; performing experiments to test the performance of the implementations; and communicating findings in a professional way. Some blog posts will be more like short essays than problem sets or programming assignments. Your blog posts will be hosted on your own public website (which you will create). This website will serve as your portfolio for the course.
Project: Your project is a large-scale undertaking that you will design and complete, usually in a group of 2 or 3, over the course of the semester. Your project should usually involve some combination of data collection, implementation, research of related work, experimentation, deployment, or theory work (but not necessarily all components). Projects are expected to demonstrate deep engagement with both the course content and the problem selected.
Process Reflections: At the beginning of the course, you’ll write a process reflection describing your aspirations for the course: what you want to learn and achieve, and how you’d like to be assessed against your goals. We’ll have a second process reflection mid-way through the course that will allow you to reflect on your progress toward your objectives and consider changing direction if needed. At the end of the course, you’ll write a summary reflection on your learning, accomplishment, and engagement with the class. This is also the place where you’ll propose your final letter grade.

I’ll usually give you written feedback on your process reflections. We’ll also meet at the end of the course to discuss your final reflection and agree on your letter grade for the course.
Other…? You may have some topic or idea that especially interests you and which you want to explore. If you’d like to work on this topic and use it to demonstrate your learning in the course, you can propose it to me. I may have suggestions or requested modifications before I agree to count the work in your course portfolio.

Best-By Dates

While we don’t have formal due dates, there is a benefit to keeping yourself on a schedule. It’s best to complete assignments close to the time when we covered the corresponding content in class, and it’s important for your wellbeing not to let work pile up. I’ll provide “best-by” dates for all assignments. These are my recommendations for when you should submit the first versions of these assignments to me for feedback.

An image of four of the Pokemon Squirtle welcoming a fifth Squirtle. The original four Squirtles are labeled “unfinished assignment #1” through #4. The fifth Squirtle is labeled “new assignment.” Image credit: Dr. Spencer Bagley

Feedback

I won’t “grade” your individual assignments, but our course team and I will offer you feedback about what we thought was successful and where you can improve. My general expectation is that you will often (though not always) revise your work in response to feedback and resubmit it. Revising in response to feedback is one of the single most effective ways for you to deepen your learning.

I’ll usually describe the importance of revisions on your assignment using one of the following categories:

  • No revisions suggested: you’ve done great work and should focus on the next thing.
  • Revisions useful: you have opportunities for improvement on this assignment, but focusing on the next topic or assignment may be a better use of your time—use your judgment.
  • Revisions encouraged: the best use of your time is to respond to feedback and resubmit, rather than moving on to the next assignment.
  • Incomplete: the assignment isn’t sufficiently complete for it to be used as evidence of your learning.

What Work Do You Need To Do?

At the beginning of the semester, you’ll write a process letter that will outline what you’d like to learn and achieve in the course. It’s ok if you don’t meet all your aspirations by the end of the course. To help guide you in your goal-setting and work-planning, I do have some general expectations.

I am likely to consider your time in my course to be highly successful if you do at least one of the following things:

  • You complete almost all assignments with a high degree of quality, including revising in response to my feedback.
  • You spend on average 10 productive hours of work time on this course outside of class. Time spent being stuck doesn’t count as “productive hours”, so get help if you need it!
  • You complete many assignments that I give you, and also propose and complete alternative work that demonstrates your learning and achievement.

Am I Ready for CSCI 0451?

As you enter the course, I’m assuming that you are ready to reflect thoughtfully on guiding your own learning, that you have some achievement in developing programs, and that you have a strong math foundation.

Directing Your Learning

This course asks you to set your own goals and motivate yourself to achieve them. Neither of these tasks is easy. It’s ok to mess up every now and then; we all do! The real question is whether you’re going to look at mistakes and make time to reflect on what to do next time.

Programming

  • You can write moderately complex, object-oriented software.
  • You are comfortable reading software documentation and researching how to perform a task that you haven’t seen before.
  • You know what a terminal is and how to perform simple operations at the command line.
  • You have experience debugging your code and you are ready to do it a lot more.

Math

I am assuming that you remember most of MATH 0200 and CSCI 0200. It’s ok if you haven’t memorized every single fact. What I need is for you to be ready to rapidly look up what you need so that you won’t be slowed down by math along the way.

  • Matrix multiplication and inner products
  • Everything about \(\mathbf{A}\mathbf{x} = \mathbf{b}\).
  • Visualizing linear spaces.
  • Eigenvalues, eigenvectors, positive-definite matrices.
  • Derivatives, critical points of functions.
  • Sample spaces, probability distribution functions.
  • Random variables, mean and variance.
  • Conditional probability and expectations.

Reviews/Diagnostics

  • This resource from Stanford’s CS246 contains most of the linear algebra that you’ll need for the course. The only big topic that’s missing is treatment of the existence of solutions of the linear system \(\mathbf{A}\mathbf{x} = \mathbf{b}\) in terms of the rank of \(\mathbf{A}\). You don’t need to have memorized everything here, but most of it should look familiar.
  • Probability is not a formal requirement for CSCI 0451, but some probability can certainly be useful. To brush up on some basics, I suggest Chapter 2 of Introduction to Probability for Data Science by Stanley Chan. This treatment may be a little more advanced than what you learned in CSCI 0200, but you should recognize many of the main ideas.
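
As a quick self-check on the rank topic mentioned above, here is a minimal sketch of the standard rank criterion: the system \(\mathbf{A}\mathbf{x} = \mathbf{b}\) has at least one solution exactly when appending \(\mathbf{b}\) to \(\mathbf{A}\) as an extra column does not increase the rank. The function name `has_solution` and the example matrices are illustrative assumptions, not course materials, and the sketch assumes NumPy is installed.

```python
import numpy as np

def has_solution(A, b):
    """Return True when A x = b has at least one solution.

    Rank criterion: the system is consistent exactly when the augmented
    matrix [A | b] has the same rank as A.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)  # treat b as a column
    augmented = np.hstack([A, b])
    return bool(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented))

# Illustrative example: A has rank 1 because its second row is twice its first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

consistent = has_solution(A, [3.0, 6.0])    # b lies in the column space of A
inconsistent = has_solution(A, [3.0, 7.0])  # b does not
```

If you can predict the two results before running the code, and explain why in terms of the column space of \(\mathbf{A}\), you are probably in good shape on this topic.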

Course Policies

Laptops

Please bring a laptop, and make sure that it has at least 75 minutes of charge.

If you ever find yourself temporarily in need of a laptop, the Computer Science department has 10 rotating Dell laptops available to our students. These come pre-installed with software for most of the courses in the major. They are available to be loaned out short-term or long-term based on your need (as determined by you). To request a laptop for short-term use (like a single class period), email me ahead of time.

On Long-Term Use: College policy has changed recently to include the expectation that every student have a laptop available. The college provides laptops to those who need them where “need” is based on Student Financial Services calculations. If you anticipate needing a laptop for the whole term, we encourage you to inquire with Student Financial Services and the library first due to our smaller pool of equipment. However, our department commits to meeting the needs of every student, so do not be afraid to reach out if you believe you need one of our laptops for any length of time.

COVID-19 Considerations

Masks Are Required in CSCI 0451

The Computer Science Department policy states that:

We in the Computer Science department value a safe learning and working environment for all. While we can’t eliminate the risks associated with COVID-19, evidence suggests that widespread masking can significantly reduce the transmission and severity of disease. In order to protect the health of our community, the CS department recommends that students and faculty wear masks in CS learning spaces, including classrooms, office hours, and public areas. We acknowledge the College policy gives instructors the final say over classroom masking requirements, and expect all students to respect instructors’ stated policies in each course.

In alignment with this policy, I require you to wear masks in class and office hours. I encourage you to wear masks during help sessions and at all other times when you are inside 75 Shannon Street.

If you arrive in class without a mask, I will offer you one. I will expect you to either wear it or excuse yourself from class that day.

Academic Integrity and Collaboration

Academic Integrity

Briefly, academic integrity means that you assume responsibility for ensuring that the work you submit demonstrates your learning and understanding.

To be frank, it’s pretty easy to act without integrity (i.e. cheat) in this course. First, there’s a lot of solution code for machine learning tasks in Python online. Second, I’m literally asking you all to post your assignments publicly online. So, there are lots of opportunities to turn in assignments without actually doing the learning that those assignments are designed to offer you.

I assume that both of us want you to learn some cool stuff. Cheating stops you from doing that, and ultimately wastes both your time and mine. I won’t be vigorously hunting for academic integrity violations, but I may ask you to discuss code or theory with me in class or in our meetings. If I notice you struggling to explain code that you submitted for feedback, I may have questions.

Trust me. Neither of us wants this.

Collaboration

I love it! Please collaborate in ways that allow you and your collaboration partners to fully learn from and engage with the content. Sharing small snippets of code or math is often helpful to get someone unstuck, but sharing complete function implementations or mathematical arguments is usually counterproductive.

Here are some general guidelines for how I think about collaboration.

Course Environment

You deserve to be welcomed and celebrated by our community. We embrace diversity of age, background, beliefs, ethnicity, gender, gender identity, gender expression, national origin, religious affiliation, sexual orientation, and other visible and non-visible categories. Discrimination is not tolerated in my classroom.

You deserve a learning environment free from gender-based discrimination, sexual harassment, sexual assault, domestic violence, dating violence, and stalking. If you experience these behaviors or otherwise know of a Title IX violation, you have many options for support and/or reporting. Middlebury’s Civil Rights and Title IX Office (CRTIX) can help you navigate your options. Please be aware that I am a Responsible Employee, which means that I am required by the College to report incidents of sexual harassment or sexual violence to CRTIX. There are resources for emotional and mental health care, advocacy, and academic support listed here, some of which are confidential.

You deserve to fully and equitably participate in our learning environment. I am actively putting effort into ensuring that course materials are screen-reader accessible, and welcome feedback on where I can do better. Middlebury’s Disability Resource Center can help you remove barriers to learning in this and other courses.

You deserve to be addressed in the manner that reflects who you are. I welcome you to tell me your pronouns and/or preferred name at any time, either in person or via email. Conversely, please address your classmates according to their expressed preferences.

Beyond This Course

General Advice

I am always happy to talk with you about your future plans, including internships, research opportunities, and graduate school applications. Because I am a creature of the academy, I am less knowledgeable about industry jobs, although you are welcome to ask about those too. You can drop in during Student Hours or email me to make an appointment.

Letters of Recommendation

Writing letters of recommendation for students is a fundamental part of my job and something that I am usually very happy to do. Here’s how to ask me for a letter.



© Phil Chodrow, 2023