COMS 6998:
Advanced Topics in Spoken Language Processing
Instructor: Julia Hirschberg
Time: Tu 4:10-6:00 (Fall 2022)
Location: Mudd 1127
Prerequisite: COMS 4705 or another speech or NLP class
Description: This class will introduce students to spoken language processing: basic concepts, analysis approaches, and applications. Applications include Text-to-Speech Synthesis, dialogue systems, and analysis of entrainment, personality, emotion, humor and sarcasm, deception, and charisma.
Required readings:
Jurafsky & Martin 2021 (3rd edition) chapters
These and other readings are linked from this syllabus for each class.
Suggested:
Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley. 2011.
Resources:
A list of resources can be found here.
Office Hours
Julia Hirschberg: Th 4-5pm (in CEPSR 705)
Lin Ai: Tu 2-3pm (zoom)
Run Chen: W 4:30-5:30 (zoom)
Arushi Sahai: F 11:30-2:301-2pm (in CEPSR 7LW3) and on zoom
Grade Breakdown
20% weekly posts
20% HW1
30% HW2
30% HW3
Also please note our late policies:
For weekly posts:
Monday deadline 11:59pm; 1 late day allowed but 1 point lost
For homeworks: 3 late days allowed but 5 points lost for each late day
Academic Integrity
The SEAS academic integrity policy is found here.
The CS academic integrity policy is found here.
Syllabus
Note: Schedule and readings are subject to change. Readings labeled with * are optional.
Date | Topic | Readings | Assignments
Week 1: 9/6 | Introduction to Speech Processing | |
Week 2: 9/13 | From Sounds to Language | Jurafsky & Martin Chapter 25 (sections 1-3) |
Week 3: 9/20 | Acoustics of Speech | Jurafsky & Martin Chapter 25 (sections 4-6) |
Week 4: 9/27 | Tools for Speech Analysis | Praat Tutorial; Some video tutorials: here and a larger set here; Download the latest version of Praat | HW1: Praat Recording and Analysis (assigned)
Week 5: 10/4 | Analyzing Speech Prosody | ToBI Conventions; AuToBI; Prosody and Meaning; *Guidelines for ToBI Labeling |
Week 6: 10/11 | Text-to-Speech Synthesis (Rose Sloan). This class will be remote, on zoom. Please find the zoom link in Courseworks Zoom Class Sessions. | Jurafsky & Martin Chapter 26 (Introduction, sections 6, 8); *Prosody Prediction from Syntactic, Lexical, and Word Embedding Features; *Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech; *Where do the improvements come from in sequence-to-sequence neural tts? | HW1 due
Week 7: 10/18 | Spoken Dialogue Systems | Jurafsky & Martin Chapters 22, 23, and 24 | HW2 assigned
Week 8: 10/25 | Speech Analysis: Charisma, Likability and Style (Andrew Rosenberg) | What Makes a Speaker Charismatic? Producing and Perceiving Charismatic Speech; "Would You Buy A Car From Me?" -- On the Likability of Telephone Voices; Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation |
Week 9: 11/1 | Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang) and Emotion Elicitation (Sara (Ziwei) Gong). This class will be remote, on zoom. Please find the zoom link in Courseworks Zoom Class Sessions. | Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks; The 6 Types of Basic Emotions and Their Effect on Human Behavior | HW2 due
11/8 | No class: Election Day | |
Week 10: 11/15 | Speech Analysis: Personality (Michelle Levine) and Mental State | Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis; Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples; Speech Processing Approach for Diagnosing Dementia in an Early Stage |
Week 11: 11/22 | Speech Analysis: Deception and Trust and Radicalization | Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies; Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features; Identifying the Popularity and Persuasiveness of Right- and Left-leaning Group Videos on Social Media | HW3 assigned
Week 12: 11/29 | Speech Analysis: Entrainment in Spoken Language and Empathetic Conversations | Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions; Nora the Empathetic Psychologist; 11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team |
Week 13: 12/6 | Speech Analysis: Sarcasm (Smaranda Muresan) and Humor in CHoRaL (Shayan Hooshmand) and MoreHumor (Lin Ai) | “Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space; "Sure, I did the right thing": A system for sarcasm detection in speech; "Yeah, right": Sarcasm recognition for spoken dialogue systems; *Why can’t robots understand sarcasm?; Multimodal Indicators of Humor in Video; CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users | HW3 due