LSA 7800-074: Text-to-Speech Synthesis, Summer 2011
Time: Tues/Fri 10:30-12:15
Place: ECCE 152
Professor: Julia Hirschberg (Office Hours TBD)
julia@cs.columbia.edu
Description
Text-to-Speech synthesis (TTS) is the technology behind the speech generation found in most Spoken Dialogue Systems. The goal of TTS research is to produce speech that sounds as natural as speech a human would produce -- using only text as input. In this class, we will explore the different components of current TTS systems, including text analysis, pronunciation assignment, intonation assignment, and speech realization, and how many of these might be improved using more linguistic knowledge. We will examine existing commercial systems and develop evaluation procedures for them. Students will work in pairs to build simple TTS systems of their own from Festival TTS components.
Requirements
Students should have a basic knowledge of one scripting language (e.g. Perl, Python) or find another class member with such knowledge to partner with.
Academic Integrity
Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work.
Required texts:
Daniel Jurafsky and James H. Martin, Speech and Language Processing (second edition). Pearson: Prentice Hall, 2009. Check the errata before each reading assignment in case there are updates. Chapter 8; an online version is available.
Other required readings are available online via links from this syllabus.
Grading:
Course Project and class participation.
Syllabus
Date | Topic | Reading Assignments | HW Due Dates and Other Assignments |
July 8 | | J&M 8 (pp. 249-50, 281-84); TTS-history; Historical examples J&M 8 pp. 281-284 | |
July 12 | | J&M 8.1; Sproatetal01 | |
July 15 | | J&M 8.2; Ghoshaletal09 | |
 | | Hirschberg03; J&M 8.3.0-8.3.6; ToBI labeling conventions | |
July 22 | Predicting Prosody from Text | J&M 8.3.7 | |
July 26 | | | |
July 29 | | J&M 8.4-5, 8.6; Tokudaetal02 | |
Aug 2 | | J&M 8.6 | Projects Due |
Links to Resources
cf. also resources available from the text homepage
Places to look up definitions and descriptions of terminology:
Other resources
- Karen Chung Language and Linguistics links
- CatSpeak
- On-line dictionaries in many languages.
- Festival speech synthesizer demo and links to other TTS systems
- Julia Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial
Julia Hirschberg
Professor, Computer Science
Columbia University
Department of Computer Science
1214 Amsterdam Avenue
M/C 0401
450 CS Building
New York, NY 10027
email: julia@cs.columbia.edu
phone: (212) 939-7114