What distinguishes speech and language processing from other kinds of data processing is its use of knowledge of language. In this course, we will study how to describe, process, and compute with different levels of linguistic knowledge, including phonetics and phonology, morphology, syntax, and semantics, and how this knowledge is used in speech and language applications such as named entity recognition, information extraction, question answering, speech recognition, and speech synthesis.

Teaching team


Instructor: Zhizheng Wu
TA: Xi Chen

Poster Session


A final project poster session is planned for the end of the course (tentatively May 21, 2023). It gives students the opportunity to connect with the speech and language research and industry community.

Anyone from CUHK-Shenzhen or the speech and language technology community is welcome to join. More details will be provided closer to the event. Feel free to reach out!



Here is the review of the poster session event, with invited talks from industry.

Logistics


Course Information


This course is designed as a first course for students who are interested in speech and language technology. The first half of the course focuses on the fundamentals and introduces tools for students to use; the second half emphasizes applications, giving students the opportunity to see how speech and language technology can affect human life. In particular, the topics include:

  • Understanding human speech: spectrogram, fundamental frequency, formant, etc.
  • Human sounds and their organization
  • Words and their relationship to other words
  • Syntax: Structure of sentences
  • Text processing and regular expressions
  • Language models
  • Embedding: Representations of the meaning of words
  • Word classification and named entity recognition
  • Applications: speech recognition, speech synthesis, machine translation, chatbot, etc.

Prerequisites

Textbooks

Recommended Books:

Grading Policy (CSC3160/MDS6002)

Assignments (30%)

Midterm exam (25%)

The midterm exam will be held on March 9, 2023. It covers Lecture 1 through Lecture 12.

Final project (40%)

For the final project, you need to write a project proposal (2 pages) and a project report (max 6 pages). Here is the report template. You are also expected to report project milestones and give a project poster presentation. After the final project deadline, feel free to make your project open source.

Participation (5%)

Here are some ways to earn the participation credit, which is capped at 5%.

Late Policy

The penalty is 0.5% off the final course grade for each late day.
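
As a rough illustration of how the component weights and the late penalty combine, here is a minimal Python sketch. It is not an official grade calculator. It assumes that each component is scored as a fraction between 0 and 1, that "0.5% off" means 0.5 percentage points of the final course grade, that late days are simply summed across deliverables, and that the grade cannot go below zero. The function name `final_grade` and the example scores are hypothetical.

```python
# Component weights (in percentage points) taken from the grading policy above.
WEIGHTS = {"assignments": 30, "midterm": 25, "project": 40, "participation": 5}

# Late penalty: 0.5 off the final course grade per late day
# (assumed here to mean percentage points).
LATE_PENALTY_PER_DAY = 0.5


def final_grade(scores, late_days=0):
    """Return the course grade out of 100.

    scores: dict mapping each component name to the fraction earned (0.0 to 1.0).
    late_days: total number of late days (assumed to be summed over deliverables).
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Clamp each fraction to [0, 1]; this also keeps participation capped at 5%.
        fraction = min(max(scores[name], 0.0), 1.0)
        total += weight * fraction
    # Assumption: the grade is floored at zero after the late penalty.
    return max(total - LATE_PENALTY_PER_DAY * late_days, 0.0)


# Hypothetical example: a student with two late days in total.
example_scores = {
    "assignments": 0.9,
    "midterm": 0.8,
    "project": 0.85,
    "participation": 1.0,
}
print(final_grade(example_scores, late_days=2))
```

Under these assumptions, two late days cost one point of the final course grade regardless of which deliverable was late.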

Schedule


Date | Lecture Description | Readings | Lecture Note | Events/Deadlines
Jan 4 | Tutorial 0: GitHub, LaTeX, and Colab | Learn LaTeX in 30 minutes; Colab official tutorial; Official tutorials of GitHub | [Slides] | Self-study
Jan 5 | Lecture 1: Introduction and course overview | - | [Slides] [Video] | -
Jan 10 | Lecture 2: Machine learning in a nutshell | Deep Learning in a Nutshell: Core Concepts; Machine learning, explained | [Slides] [Video] | -
Jan 11 | Tutorial 1: PyTorch | PyTorch Quickstarts; PyTorch Installation | [Slides] [Video] [Colab] | -
Jan 12 | Lecture 3: Understanding sound and acoustics | Pitch, loudness and timbre; What is a Sound Spectrum? | [Slides] [HTML] [Video] [Code] | Assignment 1 out
Jan 15 | Tutorial 2: TorchAudio (by Torchaudio team) | TorchAudio Documentation | [Slides] [Video] | 10:00am via zoom
Jan 17 | Lecture 4: Understanding human speech | Voice Acoustics: an introduction; Introduction to Speech Processing | [Slides] [Video] [Code] | -
Feb 9 | Lecture 5: Human sounds and their organization | Chapter 25: Phonetics | [Slides] [Video] | -
Feb 14 | Lecture 6: Text processing and regular expressions | Chapter 2: Regular Expressions, Text Normalization, Edit Distance | [Slides] [Video] | Assignment 2 out; Assignment 1 due (11:59pm)
Feb 15 | Tutorial 3: Text processing | Python Regular Expression Documentation; NLTK Tokenize Documentation | [Slides] [Video] [Colab] | -
Feb 15 | Project Release | Singing Voice Conversion Project; Detecting Generated Abstract Project; Voice Spoofing Detection Project | [Slides] [Video] | -
Feb 16 | Lecture 7: Words and their relationship to other words | Chapter 8: Sequence Labeling for Parts of Speech and Named Entities | [Slides] [Video] | -
Feb 21 | Lecture 8: Syntax: Structure of sentences | - | [Slides] [Video] | -
Feb 23 | Lecture 9: Language models | Chapter 3: N-gram Language Models; Chapter 7: Neural Networks and Neural Language Models | [Slides] [Video] | Assignment 2 due (11:59pm)
Feb 28 | Lecture 10: Language models | Chapter 3: N-gram Language Models; Chapter 7: Neural Networks and Neural Language Models; The Illustrated Transformer | [Slides] [Video] | -
Mar 2 | Lecture 11: Embedding: Representations of the meaning of words | Chapter 6: Vector Semantics and Embeddings | [Slides] [Video] | Project proposal due (11:59pm)
Mar 7 | Lecture 12: Embedding: Representations of the meaning of words | Chapter 6: Vector Semantics and Embeddings | [Slides] [Video] | -
Mar 8 | Tutorial 4: Word embedding | - | [Slides] [Video] [Colab] | -
Mar 9 | Midterm exam | - | - | Assignment 3 out
Mar 14 | Word embedding | - | [Slides] [Video] | -
Mar 15 | Tutorial 5: Visualization and plotting | Case Study - Zoom Out and Observe: News Environment Perception for Fake News Detection | [Slides] [Colab] [Video] | -
Mar 16 | Lecture 13: SLP Application - Sentiment analysis | - | [Slides] [Video] | -
Mar 21 | Lecture 14: SLP Application - Text summarization | - | [Slides] [Video] | Assignment 3 due (11:59pm)
Mar 22 | Lecture 15: Summarizing Conversations: From Meetings to Social Media (by Nancy Chen) | - | - | Invited talk. Location: DY103, Time: 12-13
Mar 28 | Lecture 16: SLP Application - Fundamentals of speech recognition (by Xiong Xiao) | - | [Slides] | Invited guest lecture
Mar 30 | Mid-term break | - | - | Project milestone 1 due (11:59pm)
Apr 6 | Lecture 17: SLP Application - Voice conversion (by Shuai Wang) | - | [Slides] | Invited guest lecture
Apr 11 | Final project development | - | - | In-class office hour
Apr 13 | Final project development | - | - | In-class office hour
Apr 18 | Lecture 18: SLP Application - Text-to-speech synthesis | - | [Slides] [Video] | Project milestone 2 due (11:59pm)
Apr 20 | Lecture 19: SLP Application - Machine translation | Chapter 13: Machine Translation | [Slides] [Video] | -
Apr 25 | Lecture 20: SLP Application - Question answering | Chapter 14: Question Answering | [Slides] [Video] | -
Apr 27 | Lecture 21: SLP Application - Chatbot | Chapter 15: Chatbots and Dialogue Systems | [Slides] [Video] | -
May 4 | No class | How to write the final report? | - | Final project report early submission due (11:59pm)
May 9 | No class | - | - | -
May 11 | - | - | - | Final project report due (11:59pm)
May 21 | Final project poster session | - | - | This session is open to the CUHK-Shenzhen community and invited guests. Details will be available soon. Time: 9am - 12:00pm for poster session, 1:30pm - 5:30pm for talks from external experts. There will be companies offering full-time job and internship opportunities.