
CSCI-UA.0473-001 Introduction to Machine Learning


This course introduces undergraduate computer science students to the field of machine learning. It assumes no prior knowledge of machine learning and focuses on two major paradigms: supervised and unsupervised learning. For supervised learning, we study various methods for classification and regression. For unsupervised learning, we cover dimensionality reduction and clustering. If time permits, we will also learn how to extend these methods into deep models.

Target Audience

This course is aimed at 3rd- or 4th-year undergraduate students in computer science.

For non-CS students

Please contact one of the student advisors at the CS Department directly: Romeo Kumar <kumar _at_> or Leeann Longi <longi _at_>.

General Information

  • Lecture: 9.30-10.45am on Mondays and Wednesdays at 60 5th Ave (C12)

    • These are chalkboard lectures, and there will be no slides.

  • Office Hours

    • Instructor: 11.00-12.00 on Mondays at 60 5th Ave (616)

    • TA: 2-3pm on Fridays at 60 5th Ave (660)

  • Grading: Homework (40%) + Final Exam (40%) + Review Paper (20%)

  • Final Exam: 8.00am-9.50am on May 10 (Wed)

  • Lecture Note

    • The lecture note will be updated before each lecture at



  • MATH-UA 121 Calculus I

  • CSCI-UA 310 Basic Algorithms

  • MATH-UA 140 Linear Algebra (may be taken together)


  • MATH-UA 235 Probability and Statistics

  • MATH-UA 234 Mathematical Statistics

  • DS-GA 1001 Introduction to Data Science

  • DS-GA 1002 Statistical and Mathematical Methods


Note that the schedule below is only a guideline. The content of each lecture will be decided as the course progresses.





Müller & Guido



Course Introduction

Classification I: Problem setup, Logistic regression

4.3.2, 4.3.4


Ch 2 (56-62)


Classification II: Overfitting, Validation, Regularization


1.4.7-1.4.8, 6.5 (6.5.1-6.5.3)

Ch 2 (26-29), Ch 5 (252-275)

Classification III: Stochastic gradient descent algorithm




Classification IV: Support vector machine and loss functions

7.1 (7.1.1-7.1.2), 6.1-6.2


Ch 2 (92-104)

Classification V: Nonlinear classification and kernel method


Classification VI: Other classifiers

4.1.4, 4.1.6,

3.5, 1.4.2

Ch 2 (68-70, 70-83)

Classification VII: Ensemble methods

14.2, 14.3

16.2.5, 16.4.3

Ch 2 (83-92)


Regression I: Linear regression, regularization

3.1 (3.1.1-3.1.5)

7.2-7.3, 7.5 (7.5.1, 7.5.4)

Ch 2 (45-55)

Regression II: Regularization and prior distribution


5.1-5.3, 6.5.1, 7.5.1


Regression III: Gaussian process regression


15.1, 15.2

Dimensionality Reduction I: Problem setup


Dimensionality Reduction II: Principal component analysis

12.1 (12.1.1-12.1.2)

12.2 (12.2.1, 12.2.3), 12.3.2

Ch 3 (140-155)

Dimensionality Reduction III: Probabilistic principal component analysis, EM algorithm

12.2 (12.2.1-12.2.2), 9.3

12.2.4-12.2.5, 11.4 (11.4.1)


Dimensionality Reduction III: Gaussian process latent variable model


Dimensionality Reduction IV: Matrix factorization, collaborative filtering

12.2.3, Ilin and Raiko (2010)

Ch 3 (156-163)


Clustering I: Problem setup and evaluation

25 (25.1)

Ch 3 (191-207)

Clustering II: k-means clustering


Ch 3 (176-181)


Clustering III: Mixture of Gaussians

9.2, 9.3.2

11.2.1, 11.4.2 (

Clustering IV: Other clustering methods

25.4, 25.5

Ch 3 (182-187)


Time Series I: MoG to HMM


Optional topics

Time Series II: PCA to Kalman Filter

18.1, 18.3


There will be biweekly homework assignments, starting from the second week of the semester. Each assignment will be announced at the beginning of the Wednesday lecture every other week. Answers must be submitted by email to the grader within two weeks of the announcement, and no extensions will be granted. All answers must be typeset using either LaTeX or Microsoft Word and submitted as a PDF file. Handwritten answers will not be accepted. Each assignment may include one or more programming tasks.

Review Paper

As part of the course, each student is expected to read at least five research papers on one of the following topics and summarize them in a single review paper.

  • From the perceptron and neocognitron to modern convolutional networks

  • Matrix Factorization for Collaborative Filtering: from SVD, non-negative matrix factorization, probabilistic PCA to Bayesian matrix factorization

  • Gaussian Process Latent Variable Models: from PCA to Gaussian process latent variable models and deep Gaussian process

  • Unsupervised representation learning: independent component analysis, sparse coding, restricted Boltzmann machines and denoising autoencoders

  • From the k-means algorithm and Gaussian mixture models to the infinite Gaussian mixture model

Each student must compile, by March 1, a list of at least ten papers to read and email the list to the instructor for feedback. Based on the feedback, the student should choose at least four papers from the list and write a review paper. The review paper must place the different models under a single general framework and describe each model as a special case of it. In doing so, the similarities among the models will emerge naturally; the differences must be discussed separately and in detail. The final review paper must be emailed to the instructor by May 5.


Students in this course are expected to act professionally. Please also follow the GSAS regulations on academic integrity.