Practical Natural Language Processing
Instructor: Rasika Bhalerao
rasikabh@nyu.edu
Computer Science, New York University Tandon School of Engineering
Fall 2020
Syllabus
Lectures: Mondays 2:00 - 4:30 in JABS 474 and online
Office hours: Thursdays 1 - 2pm and by appointment
TA: Zhihao Zhang, zz2432@nyu.edu, Office hours Fridays 2-4pm
Zoom links and assignment submission are on NYU Classes. To join a lecture on Zoom, find it in the “Zoom” tab in the NYU Classes site. The password is nlp2020.
This course will provide an introduction to various techniques in natural language processing with a focus on practical use. Topics will include bag-of-words, English syntactic structures, part-of-speech tagging, parsing algorithms, anaphora/coreference resolution, word representations, deep learning, and a brief introduction to current research.
The course will cover basic implementations in Python as well as APIs and tools for advanced text processing. There will be brief weekly assignments, a midterm exam, and a final project including a programming component, a written report, and a short presentation.
Date | Topic | Slides | Assignment Due |
---|---|---|---|
Sep 9 | Introduction, Review of Statistics and Machine Learning (Wednesday following Monday schedule) | Lecture 1 slides | |
Sep 14 | Bag of Words, TF-IDF, Naive Bayes | Lecture 2 slides | Assignment 1 |
Sep 21 | English Syntactic Structures and Part-of-Speech Tagging | Lecture 3 slides | Assignment 2, Statement of project interest |
Sep 28 | POS Tagging and Parsing Algorithms | Lecture 4 slides | Assignment 3 |
Oct 5 | Constituency and Dependency Parsing | Lecture 5 slides | Assignment 4 |
Oct 12 | Coreference Resolution | Lecture 6 slides | Assignment 5 |
Oct 19 | Unsupervised Learning, Discuss practice midterm exam | Lecture 7 slides | Project proposal, Practice midterm (recommended) |
Oct 26 | Midterm Exam | The midterm exam (instructions on NYU Classes) | |
Nov 2 | Deep Learning and Word Representations | Lecture 8 slides | Assignment 6 |
Nov 9 | Deep Learning and Language Models | Lecture 9 slides | |
Nov 16 | Commonly Used Python tools and APIs | Lecture 10 slides | |
Nov 23 | Selected Topics voted on by students: knowledge graphs, web search engines (+ TextRank), transfer learning, chat / dialogue | Lecture 11 slides | Assignment 7 |
Nov 30 | Recent Advances and Current Research in NLP | Lecture 12 slides | Paper draft due |
Dec 7 | Final Project Presentations | | |
Dec 14 | No class | | Final paper due |
Learning Objectives:
- A solid understanding of the basics of natural language processing (NLP)
- Hands-on implementation of basic algorithms in NLP
- Familiarity with the challenges of NLP
- Broadened ideas of how NLP affects the world
- Exposure to current research
Prerequisites:
- Experience with Python, or willingness to learn it quickly
- A course in algorithms and data structures
- Interest in practical natural language processing
Resources
- “Speech and Language Processing” by Daniel Jurafsky and James H. Martin
- Blog posts and websites posted in class
- Lecture slides on course website
Grades
- 20% Midterm exam
- 30% Weekly assignments
- 50% Final project
  - 2% Statement of interest
  - 5% Proposal
  - 10% Paper draft
  - 15% Presentation
  - 18% Final paper
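As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes that the five project items (2% + 5% + 10% + 15% + 18%) are the breakdown of the 50% final project rather than additional weight, and that each component is scored from 0 to 100. The names `WEIGHTS` and `course_grade` and the sample scores are hypothetical; this is not an official grading tool.

```python
# Weights as percentages of the course grade, taken from the list above.
# Assumption: the five project items together make up the 50% final project.
WEIGHTS = {
    "midterm": 20,
    "weekly_assignments": 30,
    "statement_of_interest": 2,
    "proposal": 5,
    "paper_draft": 10,
    "presentation": 15,
    "final_paper": 18,
}


def course_grade(scores):
    """Combine per-component scores (each 0-100) into a course percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weighted sum, divided by 100 because the weights are percentages.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "midterm": 85,
    "weekly_assignments": 90,
    "statement_of_interest": 100,
    "proposal": 95,
    "paper_draft": 80,
    "presentation": 88,
    "final_paper": 92,
}
print(course_grade(example_scores))
```

Under this reading the weights add up to 100, so a perfect score on every component gives a course grade of 100.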
Policies
I will conduct lectures over Zoom from the classroom. Students have the option to come in person to lectures, but I will not take attendance. All other components of the course are online. Zoom links can be found on the NYU Classes site.
The weekly assignments will include written portions and Python programming portions.
Homework assignments are due on the NYU Classes site by the indicated due date. Late assignments will not be accepted. If you have extenuating circumstances, please speak to me before the assignment is due. Assignments without submissions will receive a grade of zero. Weekly assignments should take on average one hour per week.
You are allowed (and encouraged) to use textbooks, online references, your class notes, and any other references for the assignments. You can also discuss the assignments with other students and the TA. However, the final submission must be your own work. Please cite any sources and collaborators in your submission.
Final Project
The final project should address a problem by applying existing NLP techniques. Examples include predicting a cultural trend using sentiment analysis on Twitter, or figuring out the meanings of emojis in Venmo payments.
The final project will be done in groups of 3 or 4 students. In the case of extreme imbalance in work distribution, grades for each student in the group will be adjusted based on participation. The final project grade will come from five components: statement of interest, proposal, paper draft, presentation, and final paper.
Each student will submit a statement of interest: up to one paragraph describing a topic on which you would like to do a project. It is okay at this stage to be unsure or have half-formed ideas. I will then assign students to groups based on topics. Students who don’t submit this statement will be randomly assigned.
Each group will submit a project proposal. The goal for the project proposal is to get feedback on feasibility and check for any anticipated missing pieces. It should be a few paragraphs (up to a page) describing:
- Your idea
- The problem you are solving
- Where you will get data
- What NLP you will do
- How you will evaluate your methods
- Anything else needed to understand your project
Each group will submit a paper draft, which is essentially the first half of the final paper. There is no length guideline, and you will not be penalized for lack of results at this stage. It should:
- introduce and motivate your problem and solution
- explain how others have solved similar problems (if applicable)
- describe your dataset
- explain any results
Each group will give a 9-minute presentation on Zoom on December 7. It is recommended to have one person share a screen with slides. The presentation should have:
- introduction
- motivation for the problem
- description of the dataset
- explanation of the NLP and other methods used
- preliminary results
- conclusion
Each group will submit a final paper. The paper should include:
- Introduction / motivation / description of the problem
- Brief description of how others have solved similar problems
- Description of the dataset and where you got it
- Explanation of NLP and other methods used
- Results / discussion of results (Did you solve the problem?)
- Conclusion
- Works cited
Regret Clause
We will follow NYU’s academic integrity policy for students, which states the repercussions of academic dishonesty. However, exceptions will be made for students who demonstrate regret within 72 hours.
“If you commit some act that is not reasonable but bring it to the attention of the course’s heads within 72 hours, the course may impose local sanctions that may include an unsatisfactory or failing grade for work submitted, but the course will not refer the matter for further disciplinary action except in cases of repeated acts.” [1]
[1] David J. Malan, Brian Yu, and Doug Lloyd. 2020. Teaching Academic Honesty in CS50. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ’20). Association for Computing Machinery, New York, NY, USA, 282–288. DOI:https://doi.org/10.1145/3328778.3366940
Moses Center Statement of Disability
If you are a student with a disability who is requesting accommodations, please contact New York University’s Moses Center for Students with Disabilities at 212-998-4980 or mosescsd@nyu.edu. You must be registered with CSD to receive accommodations. Information about the Moses Center can be found at https://www.nyu.edu/students/communities-and-groups/students-with-disabilities.html. The Moses Center is located at 726 Broadway on the 2nd floor.