Syllabus

Your single source of course information

General Information

 
Instructors: Prof. Slava Jankin, PhD and Dr. Hannah Béchara
Office: 3.15
Office Hours: By arrangement. Email the instructors directly.
Class Times: Tuesdays 12:00-16:00

Instructor Information

Slava Jankin is Professor of Data Science and Public Policy at the Hertie School of Governance. He is the Director of the Hertie School Data Science Lab. His research and teaching are primarily in the field of natural language processing and machine learning. Before joining the Hertie School faculty, he was a Professor of Public Policy and Data Science at the University of Essex, holding a joint appointment in the Institute for Analytics and Data Science and the Department of Government. At Essex, Slava served as a Chief Scientific Adviser to Essex County Council, focusing on artificial intelligence and data science in public services. He previously worked at University College London and the London School of Economics. Slava holds a PhD in Political Science from Trinity College Dublin.

Hannah Béchara is an NLP post-doc who inadvertently found herself hired by Hertie’s Data Science lab. In between training neural networks and support vector machines, Hannah occasionally teaches programming classes in Python, the programming language for winners. For reasons yet unclear, the University of Wolverhampton decided to award Hannah a PhD in Computer Science.

Course Contents and Learning Objectives

Course Contents

Natural Language Processing (NLP) is a key technology of the information age. Automatically processing natural language is a key component of artificial intelligence. Applications of NLP are everywhere because people and institutions largely communicate in language. Recently, statistical techniques based on neural networks have achieved a number of remarkable successes in natural language processing, leading to a great deal of commercial and academic interest in the field. This course provides an overview of approaches ranging from modern data-driven models to richer structural representations of how words interact to create meaning. We will discuss salient linguistic phenomena and successful computational models. We will also cover machine learning techniques relevant to natural language processing.

Main Learning Objectives

In this course, students will gain a thorough introduction to cutting-edge research in Deep Learning for NLP. Through lectures, assignments and a final project, students will learn the necessary skills to design, implement, and understand their own neural network models. This year, the course will be taught for the first time using PyTorch, allowing students to learn one of the most widely used Python frameworks for machine learning.

Target Group

Anyone who is interested in being prepared for the brave new world when AI takes over and SkyNet rules the world. You won’t be able to stop it, but you’ll be able to understand it!

Grading and Assignments

The main requirement of the final project is to produce a substantial new research paper using machine learning in support of an empirical research question. The page limit will be short, but the expectations for clarity and thoroughness will be high.

Composition of Final Grade

 
Assignment 1: Project Proposal and Literature Review. Deadline: Session 3. Weight: 20%
Assignment 2: Midterm Report. Deadline: Session 4. Weight: 20%
Assignment 3: Final Report. Deadline: Session 6. Weight: 40%
Assignment 4: Presentation. Project Presentations: Session 6. Weight: 10%

The assessment for the course consists of a research project, presentation and participation. The research project must be done in teams of 2-4 (individual submissions will not be accepted for the project).

The aim of the assessments is three-fold. First, they give you the opportunity to apply the concepts learned in this class creatively, which helps you understand the material more deeply. Second, designing and working on a unique project in a team is something that you will encounter, if you haven’t already, in the workplace, and the project helps you prepare for that. Third, along with the opportunity to practice and the satisfaction of working creatively, you can use this project to enhance your portfolio or resume.

Note about grading. There is no “perfect project.” While you are encouraged to be ambitious, the most important aspect of this research project is your learning experience. Hence, you don’t want to pick something that is too easy for you, but similarly, you don’t want to choose a project that may turn out to be beyond the scope of this class. The project is graded not on how exciting it is but on whether you meet the objectives of the project proposal, project presentation, and project report. For instance, if your project ends up being unsuccessful (for example, if you choose to design a classifier and it doesn’t achieve the desired accuracy), it will not negatively affect your grade as long as you are honest, describe the potential issues well, and suggest improvements or further experiments. Again, the objective of this project is to provide you with hands-on practice and an opportunity to learn.

Assignment Details

Assignment 1: Project proposal and literature review (20%) – 3 pages and 5 references

Assignment 2: Midterm report (20%) – 4 pages and 10 references

Assignment 3: Final report (40%) – 8 pages and unlimited references

Assignment 4: Presentation (10%)

Participation grade (10%)
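
As a rough illustration of how the five weights above combine, the following minimal sketch computes a weighted course grade. It assumes every component is scored on a 0-100 scale, which this syllabus does not state; the names WEIGHTS and final_grade are hypothetical and not part of any official grading tool.

```python
# Weights as listed in the assignment details above (they sum to 100%).
WEIGHTS = {
    "proposal": 0.20,        # Assignment 1
    "midterm_report": 0.20,  # Assignment 2
    "final_report": 0.40,    # Assignment 3
    "presentation": 0.10,    # Assignment 4
    "participation": 0.10,   # Participation grade
}


def final_grade(scores):
    """Return the weighted sum of component scores.

    Assumption: each score is on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, a team that scores 80 on every component would receive a course grade of about 80 under these assumptions, before any late-submission or contribution-statement deductions.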

Marking guide and formatting

We will provide detailed instructions and the marking guide for each component of the research project. You must use the supplied LaTeX template for writing the proposal, midterm and final reports.

Contribution Statements

The final report must state (in one or two sentences) the contributions of each team member. This doesn’t count toward the page limit. Team projects which fail to include this will receive a 1% grade deduction (out of the total course grade).

Projects Submitted to Multiple Classes

You may submit a project to both this course and the Python Programming for Data Scientists class (we encourage it!), but if you do this, your project must be ambitious and thorough enough to justify the amount of credit you’ll be getting for it. If your course project overlaps with a project that will be submitted to another class, you must inform both instructors by the midterm report deadline, and get confirmation from the instructors of both classes that the combined project is substantial enough.

Optional: Sharing your Project

You are encouraged to share your final project report online after you have completed the course, for example via GitHub or on a personal website.

Late submission of assignments

For each day the assignment is turned in late, the grade will be reduced by 10% (e.g. a submission two days after the deadline would result in a 20% grade deduction).
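
The late-submission rule can be sketched as simple arithmetic. This is one possible reading, not the official calculation: it assumes grades are on a 0-100 scale, that the 10% per day is deducted as percentage points of the assignment grade, and that the result cannot drop below zero. The function name late_adjusted_grade is hypothetical.

```python
def late_adjusted_grade(grade, days_late, rate_per_day=10.0):
    """Apply a linear late penalty to an assignment grade.

    Assumptions (not stated in the syllabus): grade is on a 0-100 scale,
    the penalty is in percentage points, and the result is floored at 0.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return max(0.0, grade - rate_per_day * days_late)


# Two days late on an assignment graded 85: 85 - 2 * 10 = 65
print(late_adjusted_grade(85, 2))  # 65.0
```

If in doubt about how the deduction is applied in practice, ask the instructors.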

Attendance

Students are expected to be present and prepared for every class session. Active participation during lectures and seminar discussions is important. If unavoidable circumstances arise which prevent attendance or preparation, the instructor should be advised by email with as much advance notice as possible. Please note that students cannot miss more than two out of 12 course sessions. For further information please consult the Examination Rules §10.

Academic Integrity

The Hertie School is committed to the standards of good academic and ethical conduct. Any violation of these standards shall be subject to disciplinary action. Plagiarism, deceitful actions as well as free-riding in group work are not tolerated. See Examination Rules §16.

Compensation for Disadvantages

If a student furnishes evidence that he or she is not able to take an examination as required in whole or in part due to disability or permanent illness, the Examination Committee may upon written request approve learning accommodation(s). In this respect, the submission of adequate certificates may be required. See Examination Rules §14.

General Readings

Required:

Delip Rao and Brian McMahan. “Natural Language Processing with PyTorch”

Recommended:

Dan Jurafsky and James H. Martin. “Speech and Language Processing (3rd ed. draft)”

Jacob Eisenstein. “Natural Language Processing”

Steven Bird, Ewan Klein and Edward Loper. “Natural Language Processing with Python”

Optional:

Marc Peter Deisenroth, A. Aldo Faisal and Cheng Soon Ong. “Mathematics for Machine Learning”

Session Overview

Session Date Title
1 11.02.2020 Introduction: A Quick Maths Overview + Traditional NLP
2 25.02.2020 Introduction to PyTorch + Foundations of Neural Networks
3 10.03.2020 Feed Forward Neural Networks
4 31.03.2020 Embeddings
5 14.04.2020 Sequence Modelling
6 28.04.2020 Project Presentations