Julia Hirschberg

LSA 7800-074: Text-to-Speech Synthesis, Summer 2011

Time: Tues/Fri 10:30-12:15
Place: ECCE 152

Professor Julia Hirschberg (Office Hours TBD)
julia@cs.columbia.edu

Description

Text-to-Speech synthesis (TTS) is the technology behind the speech generation found in most Spoken Dialogue Systems. The goal of TTS research is to produce speech that sounds as natural as speech a human would produce -- using only text as input. In this class, we will explore the different components of current TTS systems, including text analysis, pronunciation assignment, intonation assignment, and speech realization, and how many of these might be improved using more linguistic knowledge. We will examine existing commercial systems and develop evaluation procedures for them. Students will work in pairs to build simple TTS systems of their own from Festival TTS components.

Requirements

Students should have a basic knowledge of one scripting language (e.g. Perl, Python) or find another class member with such knowledge to partner with.

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work.

Required texts:

Daniel Jurafsky and James H. Martin, Speech and Language Processing (second edition). Pearson: Prentice Hall, 2009. See the errata before you do each reading assignment in case there are updates. Chapter 8. Online version is here.

Other required readings are available online via links from this syllabus.

Grading:

Course Project and class participation.

Syllabus

Date	Topic	Reading Assignments	HW Due Dates and Other Assignments
July 8	Introduction and Speech Generation Overview	J&M 8 (pp. 249-50, 281-84); TTS-history; Historical examples J&M 8 pp. 281-284
July 12	Text Normalization	J&M 8.1; Sproatetal01; Building a TTS System; Black-Festival-Notes	Project assignment: Option 1, Option 2, Exercise-for-all
July 15	Modeling Pronunciation	J&M 8.2; Ghoshaletal09
July 19	Prosody Modeling	Hirschberg03; J&M 8.3.0-8.3.6; ToBI labeling conventions
July 22	Predicting Prosody from Text	J&M 8.3.7
July 26	Information Status: Focus and Given/New	GBrown83, Prince92, Terken&Hirschberg93
July 29	Backend Synthesis	J&M 8.4-5, 8.6; Tokudaetal02
Aug 2	TTS Evaluation	J&M 8.6	Projects Due

Links to Resources

See also the resources available from the textbook homepage.

Places to look up definitions and descriptions of terminology:

Other resources

Karen Chung's Language and Linguistics links
CatSpeak
On-line dictionaries in many languages.
Festival speech synthesizer demo and links to other TTS systems
Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial


Julia Hirschberg
Professor, Computer Science

Columbia University
Department of Computer Science
1214 Amsterdam Avenue
M/C 0401
450 CS Building
New York, NY 10027

email: julia@cs.columbia.edu
phone: (212) 939-7114
