COMS 6998: Advanced Topics in Spoken Language Processing

Instructor: Julia Hirschberg

Time: Tu 4:10-6:00pm (Spring 2023)

Location: Mudd 633

 

 

Prerequisite: COMS 4705 or another speech or NLP class

Description: This class will introduce students to spoken language processing: basic concepts, analysis approaches, and applications. Applications include text-to-speech synthesis, dialogue systems, and the analysis of entrainment, empathy, personality, emotion, humor and sarcasm, deception and trust, radicalization, and charisma, using text and speech information as well as some visual features.

 

Required readings:

Chapters from Jurafsky & Martin, Speech and Language Processing (3rd edition draft, 2023)

These and other readings are linked from this syllabus for each class.

Suggested:

Keith Johnson. Acoustic and Auditory Phonetics (3rd edition). Wiley. 2011.

 

Resources:

A list of resources can be found here.

 

Office Hours

Julia Hirschberg: Th 1-2:30pm

Debasmita Bhattacharya: W 3-5pm

Yu-Wen Chen: F 2-4pm

Ziwei (Sara) Gong: Tu 1-3pm

Grade Breakdown

20% weekly posts

20% HW1

30% HW2

30% HW3

 

Also please note our late policies:

Weekly posts: due Monday at 11:59pm; 1 late day allowed, with a 1-point penalty

Homeworks: 3 late days allowed, with a 5-point penalty for each late day

 

Academic Integrity

The SEAS academic integrity policy is found here.

The CS academic integrity policy is found here.

Syllabus

Note: Schedule and readings are subject to change. Readings labeled with * are optional.

 

Date

Topic

Readings

Assignments

Week 1: 1/17

Introduction to Speech Processing

 

Week 2: 1/24

From Sounds to Language

Jurafsky & Martin Chapter 28 (sections 1-3)

Week 3: 1/31

Acoustics of Speech

Jurafsky & Martin Chapter 28 (sections 4-6)

Week 4: 2/7

Tools for Speech Analysis

*Praat Tutorial (for reference only)

Watch Praat video tutorials 1-7 here

*Video tutorials on the acoustics of speech are also available here

Download the latest version of Praat

Record your own voice saying these sentences

 

HW1: Praat Recording and Analysis (assigned)

Week 5: 2/14

Analyzing Speech Prosody

ToBI Conventions

AuToBI

Prosody and Meaning

*Guidelines for ToBI Labeling

Week 6: 2/21

Text-to-Speech Synthesis (Rose Sloan, Bard College): This class will be on Zoom.

Jurafsky & Martin Chapter 16 (Introduction, sections 6, 8)

*Prosody Prediction from Syntactic, Lexical, and Word Embedding Features

*Comparing acoustic and textual representations of previous linguistic context for improving Text-to-Speech

*Where do the improvements come from in sequence-to-sequence neural TTS?

HW1 due

Week 7: 2/28

Spoken Dialogue Systems

Jurafsky & Martin Chapters 14, 15, 27

 

Week 8: 3/7

Speech Analysis: Entrainment and Empathy in Spoken Language

Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions

Nora the Empathetic Psychologist

11 Nonverbal Ways to Express Empathy And Camaraderie With Your Team

HW2 assigned

Week 9: 3/14

Spring Break: No classes

 

 

Week 10: 3/21

Speech Analysis: Emotion and Sentiment Detection (Zixiaofan Yang, Apple) and Emotion Elicitation (Ziwei (Sara) Gong, Columbia): This class will be on Zoom.

Predicting Arousal and Valence from Waveforms and Spectrograms using Deep Neural Networks

Emotions and Types of Emotional Responses

 

Week 11: 3/28

Speech Recognition: Speech model personalization and its application to dysarthric speech: the journey from research to production (Fadi Biadsy, Google)

Jurafsky & Martin Chapter 16 (Introduction, sections 1-5, 7-8)

Listen, Attend and Spell

Attention Is All You Need

*Conformer: Convolution-augmented Transformer for Speech Recognition

*Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

*A Scalable Model Specialization Framework for Training and Inference using Submodels and its Application to Speech Model Personalization

HW2 due

Week 12: 4/4

Speech Analysis: Personality (Michelle Levine, Columbia) and Mental State

Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis

Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples

Speech Processing Approach for Diagnosing Dementia in an Early Stage

Week 13: 4/11

Speech Analysis: Sarcasm (Smaranda Muresan, Columbia) and Humor (Lin Ai, Columbia)

“Laughing at you or with you”: The Role of Sarcasm in Shaping the Disagreement Space

"Sure, I did the right thing": A system for sarcasm detection in speech

"Yeah, right": Sarcasm recognition for spoken dialogue systems

*Why can’t robots understand sarcasm?

Multimodal Indicators of Humor in Video

CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users

HW3 assigned

Week 14: 4/18

Speech Analysis: Charisma, Likability and Style (Andrew Rosenberg, Google)

What Makes a Speaker Charismatic? Producing and Perceiving Charismatic Speech

"Would You Buy A Car From Me?" - On the Likability of Telephone Voices

Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation

 

Week 15: 4/25

Speech Analysis: Deception and Trust; Radicalization (Lin Ai, Columbia)

Acoustic-Prosodic and Lexical Cues to Deception and Trust: Deciphering How People Detect Lies

Multimodal Deception Detection using Automatically Extracted Acoustic, Visual and Lexical Features

Identifying the Popularity and Persuasiveness of Right- and Left-leaning Group Videos on Social Media

HW3 due