The submission procedure is explained below.
In this assignment, you are going to build your own limited-domain text-to-speech system. This is a complex task, so it is divided into three simpler subtasks.
IMPORTANT: This part of the assignment cannot be done remotely. You have to do it on one of the Linux computers in the Speech Lab, for which you need to sign up first.
Note: It might also be possible to do this locally on one of the Linux computers in the CLIC Lab. However, we have not tested the software there, and we cannot guarantee that it will work. If you do this, you will be on your own.
Add these lines to your ~/.bashrc file:
export PATH=/proj/speech/tools/festival/festival/bin:$PATH
export PATH=/proj/speech/tools/festival/speech_tools/bin:$PATH
export FESTVOXDIR=/proj/speech/tools/festival/festvox
export ESTDIR=/proj/speech/tools/festival/speech_tools
Remember that you have to log out and back in for the changes to the ~/.bashrc file to take effect. You can check whether the changes are in effect by running echo $FESTVOXDIR; the value specified above should be displayed.
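If you prefer a single check over echoing variables by hand, the POSIX shell function below is a minimal sketch; the name check_festival_env is hypothetical and not part of Festival or the course tools. It reports whether FESTVOXDIR and ESTDIR are set and whether festival can be found on your PATH in the current shell.

```sh
# Hypothetical helper (not part of Festival or the course tools).
# Returns 0 if the environment from ~/.bashrc is active in this shell;
# otherwise prints what is missing to stderr and returns 1.
check_festival_env() {
  status=0
  if [ -z "${FESTVOXDIR:-}" ]; then
    echo "FESTVOXDIR is not set" >&2
    status=1
  fi
  if [ -z "${ESTDIR:-}" ]; then
    echo "ESTDIR is not set" >&2
    status=1
  fi
  # command -v succeeds only if an executable named festival is on PATH.
  if ! command -v festival >/dev/null 2>&1; then
    echo "festival is not on your PATH" >&2
    status=1
  fi
  return "$status"
}
```

After logging back in, running check_festival_env && echo OK should print OK if everything is in place.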
NB: Every team member must do Part A. Teams will turn in all of their members' talking clocks (two or three) with the team submission.
Log on locally to a Linux computer in the Speech Lab. Open a Terminal window (Applications » Accessories » Terminal) and run the following commands, replacing USERNAME with the user name of your CS account (e.g. fb2175).
A detailed explanation of each step can be found at http://festvox.org/bsv/x1003.html.
Step | Commands | Comments |
---|---|---|
1 | mkdir /proj/speech/users/cs4706/USERNAME; cd /proj/speech/users/cs4706/USERNAME; mkdir time; cd time | Create a directory and cd into it. |
2 | $FESTVOXDIR/src/ldom/setup_ldom SLP time xyz | Set up the directory. At this point, take a look at these two files: • etc/time.data, which contains a set of utterances that should cover all the possible variations in the domain; • festvox/SLP_time_xyz.scm, which defines several functions (in Scheme) to convert a time like "07:57" into an utterance like "The time is now, a little after five to eight, in the morning". To build a new limited domain, you need to rewrite these files. For this part of the homework you do not need to edit them, but you will in Parts B and C. |
3 | festival -b festvox/build_ldom.scm '(build_prompts "etc/time.data")' | Generate prompts |
4 | bin/prompt_them etc/time.data | Record prompts. You need a microphone for this step. You will be asked to read out 24 prompts. Read the recording tips before starting. |
5 | bin/make_labs prompt-wav/*.wav | Autolabel prompts |
6 | festival -b festvox/build_ldom.scm '(build_utts "etc/time.data")' | Build utterances |
7 | cp etc/time.data etc/txt.done.data | |
8 | bin/make_pm_wave wav/*.wav; bin/make_pm_fix pm/*.pm | Extract pitchmarks & fix them |
9 | bin/simple_powernormalize wav/*.wav | Power normalization |
10 | bin/make_mcep wav/*.wav | MCEP vectors |
11 | festival -b festvox/build_ldom.scm '(build_clunits "etc/time.data")' | Build LDOM Synthesizer |
12 | festival festvox/SLP_time_xyz_ldom.scm '(voice_SLP_time_xyz_ldom)' | Run your synthesizer |
13 | (saytime) (saythistime "07:57") (saythistime "14:22") | Once in Festival, use these commands to make your synthesizer say the time. Use CTRL+D to exit Festival. |
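Next, save three samples of your synthesizer's output as wav files (time1.wav, time2.wav, and time3.wav), which you will turn in. Run the following commands:
cd /proj/speech/users/cs4706/USERNAME/time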
festival
(load "festvox/SLP_time_xyz_ldom.scm")
(voice_SLP_time_xyz_ldom)
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Required_Rate 16000)
(Parameter.set 'Audio_Required_Format 'wav)
(Parameter.set 'Audio_Command "cp $FILE time1.wav")
(saytime)
(Parameter.set 'Audio_Command "cp $FILE time2.wav")
(saythistime "07:57")
(Parameter.set 'Audio_Command "cp $FILE time3.wav")
(saythistime "14:22")
IMPORTANT: All steps in Part B can be done remotely.
In Part A, you built a simple talking-clock TTS system. Now, you will build a TTS system for the Spoken Dialogue System application you have chosen. Also, instead of recording it in a neutral voice, you may want to choose a particular style or personality that you think is most appropriate for your domain and application.
Define as formally as possible what the input and output of your TTS system are going to look like. For example, the output sentences of the talking clock from Part A follow this template:
The time is now, EXACTNESS MINUTE INFO(, in the DAYPART).
where:
EXACTNESS = {exactly, just after, a little after, almost}
MINUTE = {-, five past, ten past, quarter past, twenty past, twenty-five
past, half past, twenty-five to, twenty to, quarter to, ten to, five to}
INFO = {one, two, three, four, five, six, seven, eight, nine, ten, eleven,
twelve, midnight}
DAYPART = {morning, afternoon, evening}
Every time the talking clock receives an input like 07:57, it translates it into a sentence like "The time is now, a little after five to eight, in the morning" and then synthesizes it.
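This TTS system has four degrees of freedom (EXACTNESS, MINUTE, INFO, DAYPART). The number of possible sentences is approximately 4 x 12 x 12 x 2 = 1152: each of the 12 hours occurs twice a day, and each occurrence falls in a single DAYPART, so INFO and DAYPART together contribute about 12 x 2 combinations rather than the full 13 x 3.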
Your limited domain must have at least five degrees of freedom, and you have to provide an estimate of the number of possible sentences that could be generated.
Log on to your CS account (NB: /proj/speech/ is only available from the following machines: chat, felix, fluffy, veu, voce, voix, and lincoln) and run the following commands, where USERNAME is the user name of your CS account (e.g. fb2175), and TOPIC is a string such as 'number', 'weather', 'street', etc.:
Step | Commands | Comments |
---|---|---|
1 | mkdir /proj/speech/users/cs4706/USERNAME/partc | Create a directory for Part C. |
2 | cd /proj/speech/users/cs4706/USERNAME/partc; $FESTVOXDIR/src/ldom/setup_ldom SLP TOPIC xyz | Set up the directory for Part C. |
Next, you need to design the prompts for your TTS system. As you saw in Part A, the talking clock uses the prompts in time/etc/time.data, which contains lines like this one:
( time0001 "The time is now, exactly five past one, in the morning." )
Now, you have to create a similar file for your domain and save it as partc/etc/TOPIC.data. NOTE: The spaces after '(' and before ')' in each line are critical.
For an explanation of how to design the prompts, see http://www.festvox.org/bsv/c941.html#AEN952.
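Because a single missing space breaks the file, it can help to generate and check TOPIC.data with a script rather than typing every line by hand. The two POSIX shell functions below are a hypothetical sketch (add_prompt and check_prompts are not festvox tools): add_prompt appends one line in the format shown above, and check_prompts prints, with line numbers, every line that does not look like ( ID "TEXT" ). The accepted ID characters (letters, digits, underscore) are an assumption.

```sh
# Hypothetical helpers, not part of festvox.

# add_prompt ID TEXT FILE: append one prompt line in the ( ID "TEXT" ) format.
add_prompt() {
  printf '( %s "%s" )\n' "$1" "$2" >> "$3"
}

# check_prompts FILE: print malformed lines (with line numbers) and return 1
# if there are any; return 0 if every line matches ( ID "TEXT" ).
# The allowed ID characters are an assumption.
check_prompts() {
  grep -n -v -E '^\( [A-Za-z0-9_]+ ".*" \)$' "$1"
  case $? in
    0) return 1 ;;  # grep printed at least one malformed line
    1) return 0 ;;  # no malformed lines
    *) return 2 ;;  # grep failed, e.g. the file does not exist
  esac
}
```

For example, check_prompts etc/TOPIC.data lists any lines you need to fix before running build_prompts.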
IMPORTANT: Section C.2 of this part of the work cannot be done remotely. You have to do it on one of the Linux computers in the Speech Lab, for which you need to sign up first.
In Part B, you started building your limited-domain TTS system. You defined its input and output and designed the set of prompts you will record.
Now, in order to complete your TTS system you need to:
a) record the set of prompts (section C.2);
b) write a script that transforms an input string into an English sentence, and
sends it to Festival to synthesize it (section C.3).
Log on locally to a Linux computer in the Speech Lab. Open a Terminal window (Applications » Accessories » Terminal) and run the following commands, replacing USERNAME with the user name of your CS account (e.g. fb2175), and TOPIC with a string such as 'number', 'weather', 'street', etc.
Before starting, read the tips for part C, which you may find useful.
Step | Commands | Comments |
---|---|---|
1 | cd /proj/speech/users/cs4706/USERNAME/partc | |
2 | The file etc/TOPIC.data should contain the prompts you designed. Make sure that its syntax is correct. | |
3 | festival -b festvox/build_ldom.scm '(build_prompts "etc/TOPIC.data")' | Generate prompts |
4 | bin/prompt_them etc/TOPIC.data | Record prompts. You need a microphone for this step. You will be asked to read out, one by one, the prompts you designed. Before starting, review the recording tips. |
5 | bin/make_labs prompt-wav/*.wav | Autolabel prompts |
6 | festival -b festvox/build_ldom.scm '(build_utts "etc/TOPIC.data")' | Build utterances |
7 | cp etc/TOPIC.data etc/txt.done.data | |
8 | bin/make_pm_wave wav/*.wav; bin/make_pm_fix pm/*.pm | Extract pitchmarks & fix them |
9 | bin/simple_powernormalize wav/*.wav | Power normalization |
10 | bin/make_mcep wav/*.wav | MCEP vectors |
11 | festival -b festvox/build_ldom.scm '(build_clunits "etc/TOPIC.data")' | Build LDOM Synthesizer |
12 | festival festvox/SLP_TOPIC_xyz_ldom.scm '(voice_SLP_TOPIC_xyz_ldom)' | Run Festival (use CTRL+D to exit). |
13 | (SayText "The time is now, a little after twenty past two, in the afternoon.") | Now you can make your synthesizer say sentences in your domain by hand; the example command is for the time domain. Warning: If you use words that do not belong to your domain, the synthesizer will default to a standard Festival voice. |
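If you re-record some prompts, you will need to repeat steps 5 to 11. The shell function below is a hypothetical convenience wrapper (build_ldom_voice is not a festvox tool) that runs exactly those commands from the table for a given TOPIC and stops at the first command that fails. It assumes you run it from your partc directory after step 4.

```sh
# Hypothetical wrapper, not part of festvox. Usage: build_ldom_voice TOPIC
# Runs steps 5-11 of the table in order; '&&' stops at the first failure,
# and the function returns that command's exit status.
build_ldom_voice() {
  topic=$1
  bin/make_labs prompt-wav/*.wav &&
    festival -b festvox/build_ldom.scm "(build_utts \"etc/$topic.data\")" &&
    cp "etc/$topic.data" etc/txt.done.data &&
    bin/make_pm_wave wav/*.wav &&
    bin/make_pm_fix pm/*.pm &&
    bin/simple_powernormalize wav/*.wav &&
    bin/make_mcep wav/*.wav &&
    festival -b festvox/build_ldom.scm "(build_clunits \"etc/$topic.data\")"
}
```

Chaining with && instead of relying on set -e keeps the early exit working even when the function is called inside an if or || expression.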
Now, you need to write a script that receives an input string as you defined
in Part B, and transforms it into an English sentence that your TTS system can
synthesize. For example, in the time domain the script would transform a
string like "14:22" into a sentence like "The time is now, a little after twenty
past two, in the afternoon." A simple (and perhaps long) case
or switch statement should be enough to achieve this.
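As an illustration for the time domain, here is a minimal POSIX shell sketch of such a case-statement transformation (your own script will be in Perl and for your own domain). It follows the template from Part B, but its rules are assumptions, not the ones in festvox/SLP_time_xyz.scm: minutes are rounded down to a five-minute mark, 1 and 2 minutes past a mark give "just after" and "a little after", 3 and 4 minutes give "almost" the next mark, and the day parts are split at 12:00 and 18:00. The function name time_to_sentence is hypothetical.

```sh
# Hypothetical sketch for the time domain; the rounding rules and day-part
# boundaries are assumptions. Usage: time_to_sentence HH:MM  (24-hour clock)
time_to_sentence() {
  hh=${1%%:*}
  mm=${1#*:}
  # Drop one leading zero so that "08" is not parsed as an invalid octal number.
  hh=${hh#0}
  mm=${mm#0}
  : "${hh:=0}" "${mm:=0}"

  # EXACTNESS: position relative to the last five-minute mark.
  mark=$(( mm / 5 * 5 ))
  case $(( mm - mark )) in
    0) exact="exactly" ;;
    1) exact="just after" ;;
    2) exact="a little after" ;;
    *) exact="almost"; mark=$(( mark + 5 )) ;;
  esac

  # "... to" phrases (and "almost" on the hour) refer to the next hour.
  if [ "$mark" -ge 35 ]; then
    hh=$(( hh + 1 ))
  fi
  if [ "$mark" -eq 60 ]; then
    mark=0
  fi
  hh=$(( hh % 24 ))

  # MINUTE (the trailing space keeps the sentence well formed when empty).
  case $mark in
    0) minute="" ;;
    5) minute="five past " ;;
    10) minute="ten past " ;;
    15) minute="quarter past " ;;
    20) minute="twenty past " ;;
    25) minute="twenty-five past " ;;
    30) minute="half past " ;;
    35) minute="twenty-five to " ;;
    40) minute="twenty to " ;;
    45) minute="quarter to " ;;
    50) minute="ten to " ;;
    55) minute="five to " ;;
  esac

  # INFO and DAYPART; midnight gets no day part.
  if [ "$hh" -eq 0 ]; then
    printf 'The time is now, %s %smidnight.\n' "$exact" "$minute"
    return 0
  fi
  case $(( hh % 12 )) in
    0) hour="twelve" ;;
    1) hour="one" ;;
    2) hour="two" ;;
    3) hour="three" ;;
    4) hour="four" ;;
    5) hour="five" ;;
    6) hour="six" ;;
    7) hour="seven" ;;
    8) hour="eight" ;;
    9) hour="nine" ;;
    10) hour="ten" ;;
    11) hour="eleven" ;;
  esac
  if [ "$hh" -lt 12 ]; then
    part="morning"
  elif [ "$hh" -lt 18 ]; then
    part="afternoon"
  else
    part="evening"
  fi
  printf 'The time is now, %s %s%s, in the %s.\n' "$exact" "$minute" "$hour" "$part"
}
```

With these rules, time_to_sentence 14:22 produces the example sentence quoted above, and time_to_sentence 07:57 produces the Part B example. Only words from the grammar are emitted, so every output stays inside the recorded domain.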
You must test your script thoroughly before submission (on one of the Speech Lab
machines). If you submit a script that does not work, you will lose a lot of points,
and we will not debug your code.
Use this Perl script as a basis. Update the variables $USERNAME and $TOPIC, complete the code where marked, and define the function generate_sentence, which does the input-output transformation. Please comment your code thoroughly.
The rest of the code in this script creates a temporary Festival script and runs it. That temporary Festival script loads your limited domain and creates a wav file with the resulting synthesis (you did exactly the same thing by hand in Part A for the time domain). You should not need to modify any of this.
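To show what that temporary script amounts to, here is a hypothetical POSIX shell sketch of the same idea (make_festival_script and synthesize are illustrative names; the Perl script you are given may differ in detail). It writes the Festival commands you typed by hand in Part A, with SayText in place of saytime, to a temporary file and runs it with festival -b. It assumes it is run from your partc directory, so that the relative festvox/ path resolves, and that the Audio_Command setting behaves in batch mode as it did interactively.

```sh
# Hypothetical sketch, not the provided Perl script.
# make_festival_script TOPIC SENTENCE OUTFILE: print a Festival (Scheme) script
# that loads the limited-domain voice and, via Audio_Command, copies the
# synthesized audio of SENTENCE to OUTFILE.
make_festival_script() {
  topic=$1
  # Escape backslashes and double quotes so the sentence is a valid Scheme string.
  sentence=$(printf '%s' "$2" | sed 's/[\\"]/\\&/g')
  out=$3
  cat <<EOF
(load "festvox/SLP_${topic}_xyz_ldom.scm")
(voice_SLP_${topic}_xyz_ldom)
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Required_Rate 16000)
(Parameter.set 'Audio_Required_Format 'wav)
(Parameter.set 'Audio_Command "cp \$FILE $out")
(SayText "$sentence")
EOF
}

# synthesize TOPIC SENTENCE OUTFILE: write the script to a temporary file,
# run it in batch mode, and remove it again.
synthesize() {
  tmp=$(mktemp "${TMPDIR:-/tmp}/festXXXXXX") || return 1
  make_festival_script "$1" "$2" "$3" > "$tmp"
  festival -b "$tmp"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Combined with a sentence generator, a call such as synthesize time "The time is now, exactly eight, in the morning." out.wav would produce one wav file per input.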
Note: It is also possible to write the transformation part of the script (input string to English sentence) in the Scheme programming language and then load your script into Festival. If you want to do it that way, please check with the TA first.
1. For Part A, save each team member's three wave files, time1.wav, time2.wav, and time3.wav (see Part A), in a subfolder named <YourUni-parta>. Put all two or three of these subfolders into a folder named parta. You should then have a folder named parta that contains one subfolder for each team member's own clock.
2. For Part B, save the following files in the partb subfolder:
Now, compress (in zip format only) the main folder to YourUni1-YourUni2-YourUni3-PROJ1.zip (e.g., fb2175-myy435-uma2129-PROJ1.zip). I.e., the file name should include the unis of all team members in alphabetical order. Submit this zip file in Courseworks.
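Getting the alphabetical order right by hand is error-prone, so a small helper can build the file name for you. This POSIX shell function is a hypothetical sketch (submission_name is not a course tool); it sorts the unis in the C locale, which gives plain alphabetical order for lowercase unis like those in the example.

```sh
# Hypothetical helper: submission_name UNI... prints the zip file name
# with the unis sorted alphabetically and joined by '-'.
submission_name() {
  printf '%s\n' "$@" | LC_ALL=C sort | paste -s -d '-' - | sed 's/$/-PROJ1.zip/'
}
```

For example, zip -r "$(submission_name uma2129 fb2175 myy435)" MAIN_FOLDER would create fb2175-myy435-uma2129-PROJ1.zip, where MAIN_FOLDER is a placeholder for your main folder.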