Practical Natural Language Processing

Northeastern University Khoury College of Computer Sciences

Spring 2023

Instructor: Rasika Bhalerao

r.bhalerao@northeastern.edu

Lectures:

Thursdays 9am - 12:20pm in room 1010

Professor Rasika’s Office Hours:

Tuesdays 10am - 12pm outside room 1010 and on MS Teams

TAs:

Schedule:


Date | Topic | Slides | Assignment Due
Jan 12 | Introduction, Review Python and Machine Learning | Lecture 1 Slides |
Jan 19 | Bag of Words, Tfidf, Naive Bayes | Lecture 2 Slides | Assignment 1 (setup)
Jan 26 | Classification continued, feature engineering | Lecture 3 Slides |
Feb 2 | English Syntactic Structures and Part-of-Speech Tagging | Lecture 4 Slides | Assignment 2 (classifying text), Statement of project interest
Feb 9 | Constituency and Dependency Parsing | Lecture 5 Slides | Assignment 3 (constituency grammars)
Feb 16 | Coreference Resolution, Unsupervised Learning | Lecture 6 Slides | Assignment 4 (HMM and POS tagging)
Feb 23 | Word Representations, Neural Networks, Discuss practice midterm exam | Lecture 7 Slides | Project proposal, Practice midterm (recommended)
Mar 2 | Deep Learning and Language Models | Lecture 8 Slides | Assignment 5 (Parsing)
Mar 16 | Midterm Exam | | The midterm exam (instructions on Canvas)
Mar 23 | Transformers, Pretrained Language Models, Finetuning, Commonly used Python tools | Lecture 9 Slides | Assignment 6 (Unsupervised learning)
Mar 30 | GPT | Lecture 10 Slides | Paper draft due
Apr 6 | Reinforcement learning, ChatGPT, ethical data science | Lecture 11 Slides | Assignment 7 (BERT)
Apr 13 | Selected Topics chosen by students: speech recognition, knowledge graphs, web search engines (+ TextRank), chat / dialogue | Lecture 12 Slides | Assignment 8 (GPT)
Apr 20 | Final Project Presentations | | Final paper due


This course will provide an introduction to various techniques in natural language processing with a focus on practical use. Topics will include bag-of-words, English syntactic structures, part-of-speech tagging, parsing algorithms, anaphora/coreference resolution, word representations, deep learning, language models, and a brief introduction to current models.

The course will cover basic implementations in Python as well as APIs and tools for advanced text processing. There will be brief weekly assignments, a midterm exam, and a final project including a programming component, a written report, and a short presentation.


Learning Objectives:

Resources

Grades

Final grades will be assigned based on the overall percentage calculated using the weightings listed above (no curving). There is no absolute direct mapping to letter grades, but the minimum overall percentage required to obtain each letter grade will be no higher than the following: A (94%), A- (90%), B+ (87%), B (83%), B- (80%), C- (65%).
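As a rough illustration of how the cutoffs above can be read, here is a minimal Python sketch. It assumes each listed percentage guarantees at least the paired letter grade, and it uses only the thresholds listed in this section; the function and constant names are hypothetical, and actual cutoffs may end up lower.

```python
from typing import Optional

# Thresholds copied from the paragraph above, highest first.
# Grades not listed in the syllabus are deliberately not filled in.
GUARANTEED_THRESHOLDS = [
    (94.0, "A"),
    (90.0, "A-"),
    (87.0, "B+"),
    (83.0, "B"),
    (80.0, "B-"),
    (65.0, "C-"),
]


def guaranteed_grade(overall_pct: float) -> Optional[str]:
    """Return the letter grade guaranteed by an overall percentage.

    Hypothetical helper: the real cutoff for a grade may be lower than
    the listed value, so the final grade can be higher than this result.
    Returns None when the percentage is below every listed threshold.
    """
    for cutoff, letter in GUARANTEED_THRESHOLDS:
        if overall_pct >= cutoff:
            return letter
    return None
```

For example, under these assumptions an overall percentage of 85 is guaranteed at least a B.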

Assignment submission and late days

Assignments will be submitted on Canvas/Gradescope.

Each student gets eight free, no-questions-asked late days for the term. We will keep track of your late days - you do not need to email us to use them. Each late day gives you an additional, penalty-free 24 hours to submit the assignment.

Only three late days may be used on any given assignment. Late days cannot be divided fractionally. Late days can only be used on individual homework assignments (not group projects, presentations, or exams).
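The late-day arithmetic above can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's actual bookkeeping: it assumes any part of a 24-hour period uses a whole late day (since late days cannot be divided), and it only reports whether a submission is covered, because the syllabus does not say what happens otherwise. All names are hypothetical.

```python
import math
from typing import Tuple

TERM_LATE_DAYS = 8       # free late days per student for the term
MAX_PER_ASSIGNMENT = 3   # most late days usable on one assignment


def late_days_needed(hours_late: float) -> int:
    """Whole late days used by a submission that is hours_late past due.

    Assumption: any part of a 24-hour period counts as a full late day.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_days(remaining: int, hours_late: float) -> Tuple[bool, int]:
    """Return (covered, remaining_after) for one individual assignment.

    A submission is covered only if it needs no more than
    MAX_PER_ASSIGNMENT late days and the student has enough left.
    """
    needed = late_days_needed(hours_late)
    if needed > MAX_PER_ASSIGNMENT or needed > remaining:
        return False, remaining
    return True, remaining - needed
```

Under these assumptions, a student with all eight late days who submits 30 hours late uses two of them and has six left.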

Respect for diversity

Classrooms full of computing students from diverse backgrounds and perspectives are crucial for us to make progress in our field.

It is my intent that diverse students will be successful in this course, that each student’s learning needs are addressed both in and out of class, and that the diversity that each student brings to this class is viewed as a resource, strength, and benefit. I expect you to feel challenged and sometimes outside of your comfort zone in this course, but it is my intent to present materials and activities that are inclusive and respectful of all persons, no matter their gender, sexual orientation, disability, age, socioeconomic status, ethnicity, race, culture, perspective, and other background characteristics. We should all strive for these principles both inside and outside of the classroom.

The course meetings are on Thursdays. If a class meeting conflicts with your religious observances, please let me know in the first two weeks of the class so that we can make other arrangements. Northeastern University respects the religious practices of its students, faculty, and staff and is committed to ensuring that all students are able to observe their religious beliefs without academic penalty.

Class rosters are provided to each instructor with each student’s legal name. I will gladly honor your request to address you by an alternate name and/or pronoun. Please advise me of this early in the semester so that I may make appropriate changes to my records.

Global Learner Support

Northeastern University’s Global Learner Support (GLS) offers “language, cultural, and academic support while promoting the development of intercultural competence and global understanding.” They offer tutoring, workshops, and much more. Visit https://gls.northeastern.edu/ to learn more.

Academic accommodations

If you have a documented need for an academic accommodation, please contact the professor within the first two weeks so we can have a conversation about how best to make appropriate arrangements.

If you require support during the course due to a disability, please ensure that you are already registered with the Disability Resource Center, and contact your course instructors to coordinate any support needed during the course.

Mental health issues are real and can prevent you from doing your best work. Your Khoury advisor is your primary contact for accessing University resources. You can also directly access University Health and Counseling Services. Do not hesitate to make use of them as needed. Please do not wait until a problem has seriously affected your work.

Collaboration with humans

Computer science, both academically and professionally, is a collaborative discipline. In any collaboration, however, all parties are expected to make their own contributions and to generously credit the contributions of others. In our class, therefore, collaboration on homework and programming assignments is encouraged, but you as an individual are responsible for understanding all the material in the assignment and doing your own work. Always strive to do your best, give generous credit to others, start early, and seek help early from both your professors and classmates.

The following rules are intended to help you get the most out of your education and to clarify the line between honest and dishonest work. The professor reserves the right to ask you to verbally explain the reasoning behind any answer or code that you turn in and to modify your project grade based on your answers. It is vitally important that you turn in work that is your own. Follow the guidelines for academic honesty.

If you have had a substantive discussion of any homework or programming solution with a classmate, then be sure to cite them in your report. If you are unsure of what constitutes “substantive”, then ask us or err on the side of caution. You will not be penalized for working together. You must not copy answers or code from another student either by hand or electronically. Another way to think about it is that you should be talking English with one another, not Java.

The following rules apply to anything you hand in for a grade.

The university’s academic integrity policy discusses actions regarded as violations and consequences for students: Office of Student Conduct and Conflict Resolution - Academic Integrity Policy

Collaboration with language models

Pair programming with a language model has arrived, and it is available to all for free. Collaborating with a language model is going to become as common as Google searching bugs or discussing algorithms with a classmate. I want to prepare you for the future of this field while also ensuring everyone has a solid grasp of the fundamentals.

This semester, we will allow limited collaboration with a large language model such as ChatGPT (https://chat.openai.com/chat) or GitHub Copilot (https://github.com/features/copilot). For each homework assignment, I will list the things that you can use a model for. You may not use the model for any tasks other than what I list in the assignments. For example, I might say that you can ask the model to write code to get the concatenation of a list of strings. This is intended to save time spent getting the syntax for simple or mundane tasks. It is in your best interest (to learn what you need for interviews) to use language models only for the small tasks that I list, and not to have a model solve the main part of the assignment for you. We also generally don’t want AI to be writing all our code, but we want to use it as a tool.
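To give a sense of scale, the string concatenation example mentioned above comes down to a one-line Python idiom. The sketch below is only an illustration; the variable names are made up and do not come from any assignment.

```python
# A small, syntax-level task of the kind described above:
# concatenating a list of strings with str.join.
words = ["natural", "language", "processing"]

joined = "".join(words)    # no separator between items
spaced = " ".join(words)   # single space between items

print(joined)
print(spaced)
```

Tasks of this size are about recalling syntax, not about solving the main part of an assignment.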

At the top of each assignment, please credit:

Title IX

Title IX of the Education Amendments of 1972 protects individuals from sex or gender-based discrimination, including discrimination based on gender-identity, in educational programs and activities that receive federal financial assistance.

Northeastern’s Title IX Policy prohibits Prohibited Offenses, which are defined as sexual harassment, sexual assault, relationship or domestic violence, and stalking. The Title IX Policy applies to the entire community, including students of all genders, faculty, and staff.

If you or someone you know has been a survivor of a Prohibited Offense, confidential support and guidance can be found through University Health and Counseling Services staff and the Center for Spirituality, Dialogue, and Service clergy members. By law, those employees are not required to report allegations of sex or gender-based discrimination to the University.

Alleged violations can be reported non-confidentially to the Title IX Coordinator within The Office for University Equity and Compliance at: titleix@northeastern.edu and/or through NUPD (Emergency 617.373.3333; Non-Emergency 617.373.2121). Reporting Prohibited Offenses to NUPD does NOT commit the victim/affected party to future legal action.

Faculty members are considered “responsible employees” at Northeastern University, meaning they are required to report all allegations of sex or gender-based discrimination to the Title IX Coordinator.

In case of an emergency, please call campus police.

Please visit the Office for University Equity and Compliance for a complete list of reporting options and resources both on- and off-campus.