A final project poster session is planned for the end of the course (tentatively May 21, 2023). It gives students the opportunity to connect with the speech and language research and industry community.
Anyone from CUHK-Shenzhen or the speech and language technology community is welcome to join. More details will be provided closer to the event. Feel free to reach out!
Here is a review of the poster session event, which included invited talks from industry.
This course is designed as a first course for students interested in speech and language technology. The first half of the course focuses on the fundamentals and introduces tools for students to use. The second half emphasises applications, giving students the opportunity to see how speech and language technology can impact human life. In particular, the topics include:
Recommended Books:
We will have a mid-term exam on March 9, 2023. It covers Lectures 1 to 12.
For the final project, you need to write a project proposal (2 pages) and a project report (at most 6 pages). Here is the report template. You are also expected to report project milestones and give a poster presentation. After the final project deadline, feel free to make your project open source.
Here are some ways to earn the participation credit, which is capped at 5%.
The late submission penalty is 0.5% off the final course grade for each late day.
Date | Lecture Description | Readings | Lecture Note | Events/Deadlines |
---|---|---|---|---|
Jan 4 | Tutorial 0: GitHub, LaTeX, and Colab | Learn LaTeX in 30 minutes; Colab official tutorial; Official tutorials of GitHub | [Slides] | Self-study |
Jan 5 | Lecture 1: Introduction and course overview | | [Slides] [Video] | |
Jan 10 | Lecture 2: Machine learning in a nutshell | Deep Learning in a Nutshell: Core Concepts; Machine learning, explained | [Slides] [Video] | |
Jan 11 | Tutorial 1: PyTorch | PyTorch Quickstarts; PyTorch Installation | [Slides] [Video] [Colab] | |
Jan 12 | Lecture 3: Understanding sound and acoustics | Pitch, loudness and timbre; What is a Sound Spectrum? | [Slides] [HTML] [Video] [Code] | Assignment 1 out |
Jan 15 | Tutorial 2: TorchAudio (by the TorchAudio team) | TorchAudio Documentation | [Slides] [Video] | 10:00am via Zoom |
Jan 17 | Lecture 4: Understanding human speech | Voice Acoustics: an introduction; Introduction to Speech Processing | [Slides] [Video] [Code] | |
Feb 9 | Lecture 5: Human sounds and their organization | Chapter 25: Phonetics | [Slides] [Video] | |
Feb 14 | Lecture 6: Text processing and regular expressions | Chapter 2: Regular Expressions, Text Normalization, Edit Distance | [Slides] [Video] | Assignment 2 out; Assignment 1 due (11:59pm) |
Feb 15 | Tutorial 3: Text processing | Python Regular Expression Documentation; NLTK Tokenize Documentation | [Slides] [Video] [Colab] | |
Feb 15 | Project Release | Singing Voice Conversion Project; Detecting Generated Abstract Project; Voice Spoofing Detection Project | [Slides] [Video] | |
Feb 16 | Lecture 7: Words and their relationship to other words | Chapter 8: Sequence Labeling for Parts of Speech and Named Entities | [Slides] [Video] | |
Feb 21 | Lecture 8: Syntax: Structure of sentences | | [Slides] [Video] | |
Feb 23 | Lecture 9: Language models | Chapter 3: N-gram Language Models; Chapter 7: Neural Networks and Neural Language Models | [Slides] [Video] | Assignment 2 due (11:59pm) |
Feb 28 | Lecture 10: Language models | Chapter 3: N-gram Language Models; Chapter 7: Neural Networks and Neural Language Models; The Illustrated Transformer | [Slides] [Video] | |
Mar 2 | Lecture 11: Embedding: Representations of the meaning of words | Chapter 6: Vector Semantics and Embeddings | [Slides] [Video] | Project proposal due (11:59pm) |
Mar 7 | Lecture 12: Embedding: Representations of the meaning of words | Chapter 6: Vector Semantics and Embeddings | [Slides] [Video] | |
Mar 8 | Tutorial 4: Word embedding | | [Slides] [Video] [Colab] | |
Mar 9 | Mid-term exam | | | Assignment 3 out |
Mar 14 | Word embedding | | [Slides] [Video] | |
Mar 15 | Tutorial 5: Visualization and plotting | Case Study - Zoom Out and Observe: News Environment Perception for Fake News Detection | [Slides] [Colab] [Video] | |
Mar 16 | Lecture 13: SLP Application - Sentiment analysis | | [Slides] [Video] | |
Mar 21 | Lecture 14: SLP Application - Text summarization | | [Slides] [Video] | Assignment 3 due (11:59pm) |
Mar 22 | Lecture 15: Summarizing Conversations: From Meetings to Social Media (by Nancy Chen) | | | Invited talk. Location: DY103, Time: 12:00pm - 1:00pm |
Mar 28 | Lecture 16: SLP Application - Fundamentals of speech recognition (by Xiong Xiao) | | [Slides] | Invited guest lecture |
Mar 30 | Mid-term break | | | Project milestone 1 due (11:59pm) |
Apr 6 | Lecture 17: SLP Application - Voice conversion (by Shuai Wang) | | [Slides] | Invited guest lecture |
Apr 11 | Final project development | | | In-class office hour |
Apr 13 | Final project development | | | In-class office hour |
Apr 18 | Lecture 18: SLP Application - Text-to-speech synthesis | | [Slides] [Video] | Project milestone 2 due (11:59pm) |
Apr 20 | Lecture 19: SLP Application - Machine translation | Chapter 13: Machine Translation | [Slides] [Video] | |
Apr 25 | Lecture 20: SLP Application - Question answering | Chapter 14: Question Answering | [Slides] [Video] | |
Apr 27 | Lecture 21: SLP Application - Chatbot | Chapter 15: Chatbots and Dialogue Systems | [Slides] [Video] | |
May 4 | No class | How to write the final report? | | Final project report early submission due (11:59pm) |
May 9 | No class | | | |
May 11 | | | | Final project report due (11:59pm) |
May 21 | Final project poster session | | | This session is open to the CUHK-Shenzhen community and invited guests. Details will be available soon. Time: 9:00am - 12:00pm for the poster session, 1:30pm - 5:30pm for talks from external experts. Companies will be offering full-time job and internship opportunities. |