This year I am trying Piazza. Please go to the following page to check the up-to-date syllabus: https://piazza.com/nyu/fall2016/dsga3001/home You will be automatically enrolled if you have officially enrolled in the course.

## Overview

How should natural languages be understood and analyzed? In this course, we will examine some of the modern computational approaches, mainly using deep learning, to understanding, processing and using natural languages. Unlike conventional approaches to language understanding, we will focus on how to represent and manipulate linguistic symbols in a continuous space.

## Target Audience

The course is mainly intended for master's- and doctorate-level students in computer science and data science. The number of seats is limited, and priority is given to students enrolled in the master's programme at the Center for Data Science and those in the Ph.D. programme of the Department of Computer Science, Courant Institute of Mathematical Sciences.

## General Information
- Lecture: 7.10pm - 9.00pm on Tuesdays at Silver Center, Room 405
- Laboratory: 8.00pm - 9.00pm on Wednesdays at TBA
- Instructor: Kyunghyun Cho
- Teaching Assistants: Tian Wang, Meihao Chen
- Office Hours
  - Instructor: 6.00pm - 7.00pm (location: Office 1001, 715 Broadway)
  - TA: TBA (location: TBA)
- Grading: Prerequisite Exam (10%) + Lab Assignments (25%) + Final Project (25%) + Final Exam (40%)
- Course Site: https://piazza.com/nyu/fall2016/dsga3001/home
  - Distribution of lecture notes and slides
  - Lab assignments
  - Final project
Book: There will be no single textbook. A reading list for each lecture will be provided separately, and students are expected to read it before the lecture. However, the following books are highly recommended during the course:

- Goodfellow, Bengio and Courville. Deep Learning. 2016. http://www.iro.umontreal.ca/~bengioy/dlbook/
- Manning and Schütze. Foundations of Statistical Natural Language Processing. 1999. http://nlp.stanford.edu/fsnlp/
- Jurafsky and Martin. Speech and Language Processing, 2nd edition. 2009. http://dl.acm.org/citation.cfm?id=1214993
- Cho. Foundations and Advances in Deep Learning. Ph.D. Thesis. 2014. https://aaltodoc.aalto.fi/handle/123456789/12729
Lecture Note: https://github.com/nyu-dl/NLP_DL_Lecture_Note/blob/master/lecture_note.pdf (updated frequently as the course progresses)
## Prerequisites

A student is expected to be familiar with the following topics:

- Undergraduate-level Probability and Statistics
- Undergraduate-level Linear Algebra
- Undergraduate-level Calculus
- Machine Learning: DS-GA-1003 or CSCI-UA.0480-007
A student is encouraged to try the following languages/frameworks in advance:

- Python: NumPy, TensorFlow or Theano
- Lua: Torch
A student is expected to have taken the following courses before taking this course:

- DS-GA-1002: Statistical and Mathematical Methods
- DS-GA-1003: Machine Learning and Computational Statistics
This course is complementary to:

- LING-GA 3340: Seminar in Semantics: Artificial Neural Networks
- CSCI-GA.2590: Natural Language Processing
- CSCI-GA.3033-001: Statistical Natural Language Processing
- CSCI-UA.0480-006: Special Topics: Natural Language Processing
- CSCI-GA.2585-001: Speech Recognition
- DS-GA-1008: Deep Learning
## Schedule (Draft)
## Lab Assignments

First of all, it is mandatory to attend the first ten lab sessions; missing any of these sessions will result in a lower grade/score. There will be four lab assignments during these ten lab sessions:

1. Convolutional neural network for document classification
   - TA in charge: Tian Wang
   - Deadline: September 28
2. Bag-of-n-grams and fast document classification
   - TA in charge: Meihao Chen
   - Deadline: October 12
3. Feedforward language modelling
   - TA in charge: Tian Wang
   - Deadline: October 26
4. Character-level recurrent language modelling
   - TA in charge: Meihao Chen
   - Deadline: November 16
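As a rough illustration of the kind of representation the bag-of-n-grams assignment is built around, the sketch below counts word n-grams in a document using only the standard library. This is a minimal example for orientation, not the assignment's required interface; the function names and the whitespace tokenisation are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count all 1..max_n grams in a whitespace-tokenised document.

    The resulting sparse count vector is the kind of fixed-size document
    representation a fast linear classifier can be trained on.
    """
    tokens = text.lower().split()  # naive tokenisation, for illustration only
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts
```

For example, `bag_of_ngrams("the cat sat on the mat")` counts the unigram `("the",)` twice and the bigram `("the", "cat")` once; in practice one would map these counts into a sparse feature vector indexed by a vocabulary.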
For each lab assignment, a student is expected to hand in a short report (up to 3 pages) outlining the model, its implementation and experimental results. Note that the lecturer's office hours are not meant for assisting students with these assignments.

## Final Project

In this course, a student is expected to conduct a research project related to the topics presented during the lectures. The topic of each research project is to be agreed upon with the lecturer and teaching assistants, based on a topic proposal submitted by the student.

- The deadline for the topic proposal is October 16. The proposal should consist of up to 4 pages describing the topic, method and experimental procedures. Once the proposal has been submitted, the student will receive confirmation and feedback by email from the lecturer and/or teaching assistants within two weeks. The proposal must be submitted by email to TA Meihao Chen.
- The final report is due on December 19. It should describe the task, models, experiments and conclusions, and be up to 6 pages long, excluding unlimited pages reserved for references (more specific instructions on the format will be announced later). The final report must be submitted by email to TA Tian Wang.
- The deadlines for both the proposal and the final report will not be extended.

Students are encouraged, but not required, to form a team of up to two members. Each team must be formed by September 21, and the list of members must be submitted to the TAs according to the instructions given during the first three lab sessions. Any submission, including the topic proposal as well as the final report, must clearly state the contribution of each member; failure to include this will result in a lower grade.

## Topics

Students are encouraged to choose a topic from the following candidates. If there is a clear and compelling reason, students may work on another topic upon the approval of the lecturer.
Students are encouraged to find recent literature on one of these topics and to prepare to discuss it with the lecturer and/or teaching assistants, in order to narrow down a specific topic. Students are encouraged and expected to use the lab sessions and office hours to ask questions on practical issues in implementing these models and running experiments.

## Generic Topics

A student or a team of two students may choose any of the following topics for their final project.

- Machine translation [See Ch. 6 in the lecture note]
  - Goal: Comparison of different paradigms of machine translation
  - Models: phrase-based translation system (Moses), neural machine translation system (dl4mt or nematus)
  - Data: more than two language pairs from TED, or one language pair from WMT’16
- Machine comprehension [Hermann et al., 2015]
  - Goal: Implementing a question-answering system with neural networks
  - Models: implement two different approaches
  - Data: CNN Dataset from Google DeepMind (http://cs.nyu.edu/~kcho/DMQA/) or TTIC Who did What [Onishi et al., 2016]
- Visual question answering [Antol et al., 2015; Zhou et al., 2016]
  - Goal: Implementing a visual question-answering system with neural networks
  - Models: implement two different approaches
  - Data: VQA from Microsoft (http://www.visualqa.org/)
## Special Topics

Each of the following topics may only be taken by one student or one team of two students. Any student who wants to work on one of the following topics should come talk to me as soon as possible.

- Learning the Natural Language of Black Holes
  - Mentors: Dr. Daniela Huppenkothen (NYU) and Dr. Victoria Grinberg (MIT)
  - Description: Click here
- Multi-turn Dialogue-based Q&A Data Collection Framework
  - Mentor: Prof. Kyunghyun Cho
  - Description: Click here
## Remarks

A student in this course is expected to act professionally. Please also follow the GSAS regulations on academic integrity, found here: http://gsas.nyu.edu/page/academic.integrity