COMS 6998: Advanced Topics in Spoken Language Processing
Instructor: Julia Hirschberg
Time: M 4:10-6:00 (Spring 2022)
Location: Mudd 1127
Prerequisite: COMS 4705 or another speech or NLP class
Description: This class will introduce students to spoken language processing: basic concepts, analysis approaches, and applications.
Required readings:
Jurafsky & Martin 2021 (3rd edition) chapters
These and other readings are linked from this syllabus for each class.
Suggested:
Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley. 2011.
Resources:
A list of resources can be found here.
Office Hours
Julia Hirschberg: Tu 4:30-5:30pm
Lin Ai: W 12-1pm
Run Chen: Th 2-3pm
Zhecan Wang: F 5-6pm
Grade Breakdown
20% weekly posts
20% HW1
30% HW2
30% HW3
Also please note our late policies:
For weekly posts:
Sunday deadline 11:59pm; 1 late day allowed, but 1 point lost
For homeworks: 3 late days allowed, but 5 points lost for each late day
Academic Integrity
The SEAS academic integrity policy is found here.
The CS academic integrity policy is found here.
Syllabus
Note: Schedule and readings are subject to change. Readings labeled with * are optional.
Date | Topic | Readings | Assignments |
Week 1: 1/24 | Introduction to Speech Processing | | |
Week 2: 1/31 | From Sounds to Language | Jurafsky & Martin Chapter 25 (sections 1-3) | |
Week 3: 2/7 | Acoustics of Speech | Jurafsky & Martin Chapter 25 (sections 4-6) | |
Week 4: 2/14 | Tools for Speech Analysis | Praat Tutorial (Chapter 11 - scripting - is optional); Download Praat | HW1: Praat Recording and Analysis (assigned) |
Week 5: 2/21 | Analyzing Speech Prosody | ToBI Conventions; Guidelines for ToBI Labeling; Prosody and Meaning | |
Week 6: 2/28 | Text-to-Speech Synthesis (Rose Sloan) | Jurafsky & Martin Chapter 26 (Introduction, sections 6, 8); *Prosody Prediction from Syntactic, Lexical, and Word Embedding Features; *Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech; Tacotron: Towards end-to-end speech synthesis (skim); Where do the improvements come from in sequence-to-sequence neural tts? (* not required) | HW1 due |
Week 7: 3/7 | Spoken Dialogue Systems | Jurafsky & Martin Chapters 22, 23, and 24 | HW2 assigned |
Week 8: 3/14 | Spring Break: No classes. | | |
Week 9: 3/21 | Speech Recognition (Bhuvana Ramabhadran) | Jurafsky & Martin Chapter 26 (Introduction, sections 1-5, 8); An Overview of End-to-End Automatic Speech Recognition; Conformer Parrotron: a Faster and Stronger End-to-end Speech Conversion and Recognition Model for Atypical Speech | |
Week 10: 3/28 | Speech Analysis: Entrainment in Spoken Language and Empathy | Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions; Nora the Empathetic Psychologist; 11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team | HW2 due |
Week 11: 4/4 | Speech Analysis: Personality (Michelle Levine) and Mental State | Detecting late-life depression in Alzheimer's disease through analysis of speech and language; A Cross-modal Review of Indicators for Depression Detection Systems; Automatic Recognition of Personality in Conversation | |
Week 12: 4/11 | Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang) | Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks; The 6 Types of Basic Emotions and Their Effect on Human Behavior | HW3 assigned |
Week 13: 4/18 | Speech Analysis: Sarcasm (Smaranda Muresan) and Humor and CHoRaL (Lin Ai, Shayan Hooshmand) | “Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space; "Sure, I did the right thing": A system for sarcasm detection in speech; "Yeah, right": Sarcasm recognition for spoken dialogue systems; Why can’t robots understand sarcasm?; Multimodal Indicators of Humor in Video | |
Week 14: 4/25 | Speech Analysis: Deception and Trust | Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies; Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features | |
Week 15: 5/2 | Speech Analysis: Charisma, Likability and Style | What Makes a Speaker Charismatic? Producing and Perceiving Charismatic Speech; "Would You Buy A Car From Me?" -- On the Likability of Telephone Voices; Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation | HW3 due |