COMS 6998: Advanced Topics in Spoken Language Processing

Instructor: Julia Hirschberg

Time: Tu 4:10-6:00pm (Spring 2024)

Location: Mudd 1127

Prerequisites: COMS 4705 or another speech or NLP class, and experience in machine learning

Description: This class introduces students to spoken language processing: basic concepts, analysis approaches, and applications. Applications include text-to-speech synthesis, dialogue systems, and the analysis of entrainment, empathy, personality, emotion, humor and sarcasm, deception and trust, and radicalization and charisma, all drawing on text and speech information as well as some visual features.

Required readings:

Chapters from Jurafsky & Martin, Speech and Language Processing (3rd edition draft, 2023)

These and other readings are linked from this syllabus for each class.

Suggested:

Keith Johnson. Acoustic & Auditory Phonetics (3rd edition). Wiley, 2011.

Resources:

A list of resources can be found here.

Office Hours

Julia Hirschberg: TBD

Debasmita Bhattacharya: Th 2-4pm

Yu-Wen Chen: F 2-4pm

Ziwei (Sara) Gong: M 2-4pm

Ananya Sahu: W 12-2pm

Grade Breakdown

5% attendance

20% weekly posts

20% HW1

25% HW2

30% HW3

Please note our late policies:

Weekly posts: due Mondays at 11:59pm; 1 late day allowed, with a 1-point penalty

Homework: up to 3 late days allowed, with 5 points deducted for each late day

Academic Integrity

The SEAS academic integrity policy is found here.

The CS academic integrity policy is found here.

Syllabus

Note: Schedule and readings are subject to change. Readings labeled with * are optional.

Date

Topic

Readings

Assignments

Week 1: 1/16

Introduction to Speech Processing

Week 2: 1/23

From Sounds to Language

Jurafsky & Martin Chapter 28 (sections 1-3)

Week 3: 1/30

Acoustics of Speech

Jurafsky & Martin Chapter 28 (sections 4-6)

Week 4: 2/6

Tools for Speech Analysis

*Praat Tutorial (for reference only)

Watch Praat video tutorials 1-7 here

*Also some video tutorials on acoustics of speech here

Download the latest version of Praat

Record your own voice saying these sentences

Bring your laptop and headphones to class

HW1: Praat Recording and Analysis (assigned)

Week 5: 2/13

Analyzing Speech Prosody

ToBI Conventions

AuToBI

Prosody and Meaning

*Guidelines for ToBI Labeling

Week 6: 2/20

Text-to-Speech Synthesis (Rose Sloan, Bard College); Producing Trustworthy Voices (Sarah Ita Levitan, Hunter College)

Jurafsky & Martin Chapter 16 (Introduction, sections 6, 8)

*Prosody Prediction from Syntactic, Lexical, and Word Embedding Features

*Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech

*Where do the improvements come from in sequence-to-sequence neural TTS?

The sound of trustworthiness: Acoustic-based modulation of perceived voice personality

HW1 due

Week 7: 2/27

Speech Recognition (Bhuvana Ramabhadran, Google)

Jurafsky & Martin Chapter 16 (Introduction, sections 1-5, 7-8)

Twenty-Five Years of Evolution in Speech and Language Processing

*Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

*Robust Speech Recognition via Large-Scale Weak Supervision

*SLM: Bridge the Thin Gap Between Speech and Text Foundation Models

Week 8: 3/5

Spoken Dialogue Systems (Eleanor Lin, Yu-Wen Chen, Siyan Lin); Empathy in Spoken Language (JH, Tony Chen)

Jurafsky & Martin Chapters 14, 15, 27

RASwDA: Re-Aligned Switchboard Dialog Act Corpus for Dialog Act Prediction in Conversations

*Generative Agents: Interactive Simulacra of Human Behavior

Nora the Empathetic Psychologist

*11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team

HW2 assigned

Week 9: 3/12

Spring Break: No classes

Week 10: 3/19

Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang, Apple); Emotion Elicitation (Sara Gong) (remote on Zoom; the class will be saved to the Video Library afterward)

Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks

Emotions and Types of Emotional Responses

Eliciting Rich Positive Emotions in Dialogue Generation

Week 11: 3/26

Speech Analysis: Entrainment; Code-Switching (Debasmita Bhattacharya) 

Identifying Entrainment in Task-oriented Conversations

What Code-Switching Strategies are Effective in Dialog Systems?

HW2 due

Week 12: 4/2

Speech Analysis: Personality (Michelle Levine) and Mental State

Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis

Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples

Speech Processing Approach for Diagnosing Dementia in an Early Stage

Week 13: 4/9

Speech Analysis: Sarcasm, Simile and Metaphor (Tuhin Chakrabarty); WordsEye (Bob Coyne, WordsEye)

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space

"Sure, I did the right thing": A system for sarcasm detection in speech

"Yeah, right": Sarcasm recognition for spoken dialogue systems

*Why can’t robots understand sarcasm?

HW3 assigned

Week 14: 4/16

Speech Analysis: Charisma; Humor

What Makes a Speaker Charismatic? Producing and Perceiving Charismatic Speech

"Would You Buy A Car From Me?"-- On the Likability of Telephone Voices

Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation

Multimodal Indicators of Humor in Video

CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users

Week 15: 4/23

Speech Analysis: Deception and Trust; Intent Detection and Radicalization (Lin Ai, Columbia)

Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies

Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features

Identifying the Popularity and Persuasiveness of Right- and Left-leaning Group Videos on Social Media

Unveiling the Influencers of Radical Content: A Multimodal Analysis of QAnon Videos

HW3 due