Practical Natural Language Processing
Northeastern University Khoury College of Computer Sciences
Spring 2023
Instructor: Rasika Bhalerao
r.bhalerao@northeastern.edu
Lectures:
Thursdays 9am - 12:20pm in room 1010
Professor Rasika’s Office Hours:
Tuesdays 10am - 12pm outside room 1010 and on MS Teams
TAs:
- Sai Swetha Pasam (pasam.s@northeastern.edu)
  - Office Hours: Tuesdays 2-4pm and Wednesdays 10-11am on MS Teams
- Mihir Milind Kapile (kapile.m@northeastern.edu)
  - Office Hours: Mondays 2-4pm on MS Teams
Schedule:
Date | Topic | Slides | Assignment Due |
---|---|---|---|
Jan 12 | Introduction, Review Python and Machine Learning | Lecture 1 Slides | |
Jan 19 | Bag of Words, Tfidf, Naive Bayes | Lecture 2 Slides | Assignment 1 (setup) |
Jan 26 | Classification continued, feature engineering | Lecture 3 Slides | |
Feb 2 | English Syntactic Structures and Part-of-Speech Tagging | Lecture 4 Slides | Assignment 2 (classifying text), Statement of project interest |
Feb 9 | Constituency and Dependency Parsing | Lecture 5 Slides | Assignment 3 (constituency grammars) |
Feb 16 | Coreference Resolution, Unsupervised Learning | Lecture 6 Slides | Assignment 4 (HMM and POS tagging) |
Feb 23 | Word Representations, Neural Networks, Discuss practice midterm exam | Lecture 7 Slides | Project proposal, Practice midterm (recommended) |
Mar 2 | Deep Learning and Language Models | Lecture 8 Slides | Assignment 5 (Parsing) |
Mar 16 | Midterm Exam | The midterm exam (instructions on Canvas) | |
Mar 23 | Transformers, Pretrained Language Models, Finetuning, Commonly used Python tools | Lecture 9 Slides | Assignment 6 (Unsupervised learning) |
Mar 30 | GPT | Lecture 10 Slides | Paper draft due |
Apr 6 | Reinforcement learning, ChatGPT, ethical data science | Lecture 11 Slides | Assignment 7 (BERT) |
Apr 13 | Selected Topics chosen by students: speech recognition, knowledge graphs, web search engines (+ TextRank), chat / dialogue | Lecture 12 Slides | Assignment 8 (GPT) |
Apr 20 | Final Project Presentations | | Final paper due |
This course provides an introduction to techniques in natural language processing with a focus on practical use. Topics include bag-of-words, English syntactic structures, part-of-speech tagging, parsing algorithms, anaphora/coreference resolution, word representations, deep learning, language models, and a brief introduction to current models.
The course will cover basic implementations in Python as well as APIs and tools for advanced text processing. There will be brief weekly assignments, a midterm exam, and a final project including a programming component, a written report, and a short presentation.
Learning Objectives:
- A solid understanding of the basics of natural language processing (NLP)
- Hands-on implementation of basic algorithms in NLP
- Familiarity with the challenges of NLP
- Broadened ideas of how NLP affects the world
- Exposure to current research
Resources
- Textbook: “Speech and Language Processing” by Daniel Jurafsky and James H. Martin
- Lecture slides (and assignments) will be linked in the schedule above
- Microsoft Teams will be used for virtual TA office hours, one-on-one meetings, and quick messaging
- Canvas will be used to collect grades and send announcements
- Gradescope will be used for assignment submission
Grades
- 20% Midterm exam
- 30% Weekly assignments
- 50% Final project
  - 2% Statement of interest
  - 5% Proposal
  - 10% Paper draft
  - 15% Presentation
  - 18% Final paper
Final grades will be assigned based on the overall percentage calculated using the weightings listed above (no curving). There is no absolute direct mapping to letter grades, but the minimum overall percentage required to obtain each letter grade will be no higher than the following: A (94%), A- (90%), B+ (87%), B (83%), B- (80%), C- (65%).
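As a rough illustration of the arithmetic, the overall percentage could be computed as in the sketch below. The component names, the sample scores, and both helper functions are hypothetical. The sketch assumes the five project items are the breakdown of the 50% final project (they sum to 50), and because the listed cutoffs are only upper bounds, the actual letter grade may be higher than the one returned here.

```python
# Weights (percent of the final grade) taken from the Grades list.
# Assumption: the five project items are the breakdown of the 50% final project.
WEIGHTS = {
    "midterm": 20,
    "assignments": 30,
    "statement_of_interest": 2,
    "proposal": 5,
    "paper_draft": 10,
    "presentation": 15,
    "final_paper": 18,
}

# Cutoffs listed in the syllabus; the actual cutoffs will be no higher than these.
CUTOFFS = [(94, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"), (65, "C-")]


def overall_percentage(scores):
    """Weighted average of per-component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def guaranteed_letter(percentage):
    """Lowest letter grade guaranteed by the listed cutoffs, or None if below all."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return None


# Made-up scores for a hypothetical student.
example_scores = {
    "midterm": 85,
    "assignments": 95,
    "statement_of_interest": 100,
    "proposal": 90,
    "paper_draft": 80,
    "presentation": 90,
    "final_paper": 88,
}

total = overall_percentage(example_scores)
print(total)                     # 89.34
print(guaranteed_letter(total))  # B+
```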
Assignment submission and late days
Assignments will be submitted on Canvas/Gradescope.
Each student gets eight free, no-questions-asked late days for the term. We will keep track of your late days, so you do not need to email us to use them. Each late day provides an additional, penalty-free 24 hours to submit the assignment.
Only three late days may be used on any given assignment. Late days cannot be divided fractionally. Late days can only be used on individual homework assignments (not group projects, presentations, or exams).
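The late-day rules above can be sketched as a small bookkeeping function. This only illustrates the arithmetic and is not the course staff's actual tracking. The function names, the use of hours late as input, and the rounding of any partial day up to a full late day (one reading of "cannot be divided fractionally") are assumptions.

```python
import math

TOTAL_LATE_DAYS = 8      # free late days per student per term
MAX_PER_ASSIGNMENT = 3   # cap on late days for a single assignment


def late_days_needed(hours_late):
    """Whole late days consumed; any partial day counts as a full day (assumption)."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def use_late_days(remaining, hours_late):
    """Return the balance left after a submission, or None if it is not covered.

    None means the submission is later than late days can cover, either
    because of the per-assignment cap or because the balance is too low.
    """
    needed = late_days_needed(hours_late)
    if needed > MAX_PER_ASSIGNMENT or needed > remaining:
        return None
    return remaining - needed


print(use_late_days(TOTAL_LATE_DAYS, 30))  # 6: 30 hours late uses two late days
print(use_late_days(6, 73))                # None: would need four days on one assignment
```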
Respect for diversity
Classrooms full of computing students from diverse backgrounds and perspectives are crucial for us to make progress in our field.
It is my intent that diverse students will be successful in this course, that each student’s learning needs are addressed both in and out of class, and that the diversity that each student brings to this class is viewed as a resource, strength, and benefit. I expect you to feel challenged and sometimes outside of your comfort zone in this course, but it is my intent to present materials and activities that are inclusive and respectful of all persons, no matter their gender, sexual orientation, disability, age, socioeconomic status, ethnicity, race, culture, perspective, and other background characteristics. We should all strive for these principles both inside and outside of the classroom.
The course meetings are on Thursdays. If a class meeting conflicts with your religious observances, please let me know in the first two weeks of the class so that we can make other arrangements. Northeastern University respects the religious practices of its students, faculty, and staff and is committed to ensuring that all students are able to observe their religious beliefs without academic penalty.
Class rosters are provided to each instructor with each student’s legal name. I will gladly honor your request to address you by an alternate name and/or pronoun. Please advise me of this early in the semester so that I may make appropriate changes to my records.
Global Learner Support
Northeastern University’s Global Learner Support (GLS) offers “language, cultural, and academic support while promoting the development of intercultural competence and global understanding.” They offer tutoring, workshops, and much more. Visit https://gls.northeastern.edu/ to learn more.
Academic accommodations
If you have a documented need for an academic accommodation, please contact the professor within the first two weeks so we can have a conversation about how best to make appropriate arrangements.
If you require support during the course due to a disability, please make sure that you are registered with the Disability Resource Center, and contact your course instructors to coordinate any support you need.
Mental health issues are real and can prevent you from doing your best work. Your Khoury advisor is your primary contact for accessing University resources. You can also directly access University Health and Counseling Services. Do not hesitate to make use of them as needed. Please do not wait until it has seriously impacted your work.
Collaboration with humans
Computer science, both academically and professionally, is a collaborative discipline. In any collaboration, however, all parties are expected to make their own contributions and to generously credit the contributions of others. In our class, therefore, collaboration on homework and programming assignments is encouraged, but you as an individual are responsible for understanding all the material in the assignment and doing your own work. Always strive to do your best, give generous credit to others, start early, and seek help early from both your professors and classmates.
The following rules are intended to help you get the most out of your education and to clarify the line between honest and dishonest work. The professor reserves the right to ask you to verbally explain the reasoning behind any answer or code that you turn in and to modify your project grade based on your answers. It is vitally important that you turn in work that is your own. Follow the guidelines for academic honesty.
If you have had a substantive discussion of any homework or programming solution with a classmate, then be sure to cite them in your report. If you are unsure of what constitutes “substantive”, then ask us or err on the side of caution. You will not be penalized for working together. You must not copy answers or code from another student either by hand or electronically. Another way to think about it is that you should be talking English with one another, not Java.
The following rules apply to anything you hand in for a grade.
- You may not copy anyone else’s code under any circumstances. This includes online sources.
- You may not permit any other student to see any part of your program.
- You may not permit yourself to see any part of another student’s program.
- You may not publicly post your homework project code in a chat or discussion where another student can see or copy it.
- You may consult online resources as part of your course work, but you may not copy code from online sources. If you get an idea of how to solve a problem from an online source, include a short citation at the top of your file.
The university’s academic integrity policy discusses actions regarded as violations and consequences for students: Office of Student Conduct and Conflict Resolution - Academic Integrity Policy
Collaboration with language models
Pair programming with a language model has arrived, and it is available to all for free. Collaborating with a language model is going to become as common as Google searching bugs or discussing algorithms with a classmate. I want to prepare you for the future of this field while also ensuring everyone has a solid grasp of the fundamentals.
This semester, we will allow limited collaboration with a large language model such as ChatGPT (https://chat.openai.com/chat) or GitHub Copilot (https://github.com/features/copilot). For each homework assignment, I will list the things that you can use a model for. You may not use the model for any tasks other than those I list in the assignment. For example, I might say that you can ask the model to write code that concatenates a list of strings. This is intended to save time spent looking up the syntax for simple or mundane tasks. It is in your best interest (to learn what you need for interviews) to use language models only for the small tasks that I list, and not to have a model solve the main part of the assignment for you. In general, we don’t want AI writing all our code; we want to use it as a tool.
At the top of each assignment, please credit:
- Any humans with whom you discussed the assignment
- Any language models with which you collaborated, and a brief explanation of how (example: “I used GitHub Copilot to create the list on line 9”)
- Any online sources that you used (like documentation or Stack Overflow links)
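As an illustration only, the credits at the top of a Python submission might look like the sketch below. The classmate name, the helper `join_words`, and the cited source are made-up examples, and the exact format is an assumption rather than a required template. The helper is the kind of small task mentioned above: concatenating a list of strings.

```python
# Credits (everything listed here is a made-up example)
# Humans: discussed the tokenization approach with a classmate, Jane Doe
# Language models: used GitHub Copilot to write join_words below
# Online sources: Python documentation for str.join


def join_words(words):
    """Concatenate a list of strings into one string."""
    return "".join(words)


print(join_words(["natural", "language"]))  # naturallanguage
```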
Title IX
Title IX of the Education Amendments of 1972 protects individuals from sex or gender-based discrimination, including discrimination based on gender-identity, in educational programs and activities that receive federal financial assistance.
Northeastern’s Title IX Policy prohibits Prohibited Offenses, which are defined as sexual harassment, sexual assault, relationship or domestic violence, and stalking. The Title IX Policy applies to the entire community, including students of all genders, faculty, and staff.
If you or someone you know has been a survivor of a Prohibited Offense, confidential support and guidance can be found through University Health and Counseling Services staff and the Center for Spirituality, Dialogue, and Service clergy members. By law, those employees are not required to report allegations of sex or gender-based discrimination to the University.
Alleged violations can be reported non-confidentially to the Title IX Coordinator within The Office for University Equity and Compliance at: titleix@northeastern.edu and/or through NUPD (Emergency 617.373.3333; Non-Emergency 617.373.2121). Reporting Prohibited Offenses to NUPD does NOT commit the victim/affected party to future legal action.
Faculty members are considered “responsible employees” at Northeastern University, meaning they are required to report all allegations of sex or gender-based discrimination to the Title IX Coordinator.
In case of an emergency, please call campus police.
Please visit the Office for University Equity and Compliance for a complete list of reporting options and resources both on- and off-campus.