COMS 6998: Advanced Topics in Spoken Language Processing
Instructor: Julia Hirschberg
Time: Tu 4:10-6:00 (Spring 2024)
Location: Mudd 1127
Prerequisite: COMS 4705 or another speech or NLP class, and experience in Machine Learning
Description: This class will introduce students to spoken language
processing: basic concepts, analysis approaches, and
applications. Applications include Text-to-Speech Synthesis,
dialogue systems, and analysis of entrainment, empathy, personality, emotion,
humor and sarcasm, deception and trust, radicalization and charisma, all using
text and speech information and some visual features as well.
Required readings:
Jurafsky & Martin 2023 (3rd edition draft) chapters
These and other readings are linked from this syllabus for each class.
Suggested:
Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley. 2011.
Resources:
A list of resources can be found here.
Office Hours
Julia Hirschberg: TBD
Debasmita Bhattacharya: Th 2-4pm
Yu-Wen Chen: F 2-4pm
Ziwei (Sara) Gong: M 2-4pm
Ananya Sahu: W 12-2pm
Grade Breakdown
5% attendance
20% weekly posts
20% HW1
25% HW2
30% HW3
Also please note our late policies:
For weekly posts: due Monday at 11:59pm; 1 late day is allowed, with 1 point deducted
For homework: 3 late days are allowed, with 5 points deducted for each late day
Academic Integrity
The SEAS academic integrity policy is found here.
The CS academic integrity policy is found here.
Syllabus
Note: Schedule and readings are subject to change. Readings labeled with * are optional.
Date | Topic | Readings | Assignments |
Week 1: 1/16 | Introduction to Speech Processing | | |
Week 2: 1/23 | From Sounds to Language | Jurafsky & Martin Chapter 28 (sections 1-3) | |
Week 3: 1/30 | Acoustics of Speech | Jurafsky & Martin Chapter 28 (sections 4-6) | |
Week 4: 2/6 | Tools for Speech Analysis | *Praat Tutorial (just use for reference); Watch all these Praat video tutorials here (1-7); *Also some video tutorials on acoustics of speech here; Download the latest version of Praat; Record your own voice saying these sentences; Bring your laptop and headphones to class | HW1: Praat Recording and Analysis (assigned) |
Week 5: 2/13 | Analyzing Speech Prosody | ToBI Conventions; AuToBI; Prosody and Meaning; *Guidelines for ToBI Labeling | |
Week 6: 2/20 | Text-to-Speech Synthesis (Rose Sloan, Bard College); Producing Trustworthy Voices (Sarah Ita Levitan, Hunter College) | Jurafsky & Martin Chapter 16 (Introduction, sections 6, 8); *Prosody Prediction from Syntactic, Lexical, and Word Embedding Features; *Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech; *Where do the improvements come from in sequence-to-sequence neural tts?; The sound of trustworthiness: Acoustic-based modulation of perceived voice personality | HW1 due |
Week 7: 2/27 | Speech Recognition (Bhuvana Ramabhadran, Google) | Jurafsky & Martin Chapter 16 (Introduction, sections 1-5, 7-8); Twenty-Five Years of Evolution in Speech and Language Processing; *Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages; *Robust Speech Recognition via Large-Scale Weak Supervision; *SLM: Bridge the Thin Gap Between Speech and Text Foundation Models | |
Week 8: 3/5 | Spoken Dialogue Systems (Eleanor Lin, Yu-Wen Chen, Siyan Lin); Empathy (JH, Tony Chen) in Spoken Language | Jurafsky & Martin Chapters 14, 15, 27; RASwDA: Re-Aligned Switchboard Dialog Act Corpus for Dialog Act Prediction in Conversations; *Generative Agents: Interactive Simulacra of Human Behavior; Nora the Empathetic Psychologist; *11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team | HW2 assigned |
Week 9: 3/12 | Spring Break: No classes | | |
Week 10: 3/19 | Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang, Apple); Emotion Elicitation (Sara Gong) (remote class on Zoom, to be saved to the Video Library afterward) | Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks; Emotions and Types of Emotional Responses; Eliciting Rich Positive Emotions in Dialogue Generation | |
Week 11: 3/26 | Speech Analysis: Entrainment; Code-Switching (Debasmita Bhattacharya) | Identifying Entrainment in Task-oriented Conversations; What Code-Switching Strategies are Effective in Dialog Systems? | HW2 due |
Week 12: 4/2 | Speech Analysis: Personality (Michelle Levine) and Mental State | Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis; Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples; Speech Processing Approach for Diagnosing Dementia in an Early Stage | |
Week 13: 4/9 | Speech Analysis: Sarcasm, Simile and Metaphor (Tuhin Chakrabarty); WordsEye (Bob Coyne, WordsEye) | "Laughing at you or with you": The Role of Sarcasm in Shaping the Disagreement Space; "Sure, I did the right thing": A system for sarcasm detection in speech; "Yeah, right": Sarcasm recognition for spoken dialogue systems; *Why can’t robots understand sarcasm? | HW3 assigned |
Week 14: 4/16 | Speech Analysis: Charisma; Humor | What Makes a Speaker Charismatic? Producing and Perceiving Charismatic Speech; "Would You Buy A Car From Me?" -- On the Likability of Telephone Voices; Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation; Multimodal Indicators of Humor in Video; CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users | |
Week 15: 4/23 | Speech Analysis: Deception and Trust; Intent Detection and Radicalization (Lin Ai, Columbia) | Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies; Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features; Identifying the Popularity and Persuasiveness of Right- and Left-leaning Group Videos on Social Media; Unveiling the Influencers of Radical Content: A Multimodal Analysis of QAnon Videos | HW3 due |