Due: Monday, May 5, 2008, by 2:40pm.
In this homework, you are going to design and build your own speech understanding system for reserving train tickets given a grammatical and formal English utterance (one sentence only) that indicates the departure and destination cities, and the departure day and time.
The system will consist of two main components:
a) An Automatic Speech Recognizer (ASR): we provide you with a script that builds the ASR using HTK (the HMM Toolkit). The ASR acoustic models will be trained on the TIMIT, BDC, and Columbia Games corpora. The input to this component is a wav file (audio format: mono, sampling rate: 8 kHz), and the output is the automatic transcript in mlf file format (see an example below).
b) An Understanding Component: The input to this component is the ASR transcript in (a), and the output will be a table that contains the following concepts which will be extracted automatically from the ASR transcript.
1. Departure city:
2. Destination city: (same as above)
3. Departure day: Sunday, Monday, …, Saturday
4. Departure Time: Morning, Noon, Afternoon, Evening, Night, Anytime
Here are two examples. Given the following utterances:
1) I would like a ticket from
The output of your system should be:
Departure city |
Destination |
Day | Friday
Time | Morning
2) I need to go to
The output of your system should be:
Departure city |
Destination |
Day | Monday
Time | Evening
You are required to create a grammar that covers as many phrasings as you can think of, so that your system is flexible. However, your grammar must remain limited; otherwise the perplexity of the ASR will be very high, which results in a high word error rate. Part of the homework is to decide what to cover and how much (Precision vs. Recall).
Here is an example of a grammar that covers the above two examples.
I. ASR component:
You should run the following commands to build your speech recognizer.
1. cd /proj/speech/users/cs4706/asrhw
2. mkdir USERNAME (e.g., fb2175)
3. cd USERNAME
# The following command will take a long time (around 2 hours). While it is running, please read chapters 1, 2, and 3 of the HTK toolkit book to get a general idea of what the script is going to do. You can find the details of the steps in chapter 3.
4. /proj/speech/users/cs4706/tools/htk/htk/asr/train-asr.sh USERNAME
# At this point your speech recognizer is ready. The acoustic models (monophones and triphones) are trained on the TIMIT, BDC, and Games corpora.
Test your ASR:
1. mkdir /proj/speech/users/cs4706/asrhw/USERNAME/test/
2. Record two wave files (8 kHz, mono) in Praat that contain the two utterances above. For best performance, leave ~1 second of silence at the beginning and ~1 second at the end of each file while recording. Name your files test1.wav and test2.wav, and save them into /proj/speech/users/cs4706/asrhw/USERNAME/test/
Save the grammar from here to gram (not gram.txt) in /proj/speech/users/cs4706/asrhw/USERNAME
3. cd /proj/speech/users/cs4706/asrhw/USERNAME
4. /proj/speech/users/cs4706/tools/htk/htk/asr/recognizePath.sh USERNAME ./test
# The script in step 4 takes a path as its argument and runs the recognizer on all the wave files in that path. Feel free to copy and change it so that it takes a filename instead (recommended).
5. more out.mlf
# to see the output of the recognizer
Now, you have a speech recognizer that takes a speech wave file (or a folder that contains your speech files) and generates the transcript in an mlf file format (example)
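To make the mlf format concrete, here is a minimal Java sketch of reading one. It assumes the usual HTK master label file layout: a `#!MLF!#` header, a quoted label-file name opening each transcript, one label per line (optionally with start and end times before the word and a score after it), and a lone period closing the transcript. The class name `MlfSketch` and the sample contents are hypothetical, and the exact fields written by recognizePath.sh may differ, so check your own out.mlf before relying on this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the provided scripts.
final class MlfSketch {

    /** Maps each label-file name found in the MLF text to its word sequence. */
    static Map<String, List<String>> parse(String mlfText) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : mlfText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.equals("#!MLF!#")) {
                continue;                              // skip blank lines and the header
            }
            if (line.startsWith("\"")) {               // a quoted name opens a transcript
                current = new ArrayList<>();
                result.put(line.replace("\"", ""), current);
            } else if (line.equals(".")) {             // a lone period closes it
                current = null;
            } else if (current != null) {
                for (String token : line.split("\\s+")) {
                    // Assumption: times and scores are plain numbers, words are not.
                    if (!token.matches("-?\\d+(\\.\\d+)?")) {
                        current.add(token);
                        break;                         // first non-numeric token is the word
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed layout, not real recognizer output.
        String sample = "#!MLF!#\n\"*/test1.rec\"\n"
                + "0 3000000 I -1234.5\n3000000 5000000 WOULD -99.0\nLIKE\n.\n";
        System.out.println(parse(sample));
    }
}
```

The sketch keeps every non-numeric label, so any sentence-boundary or silence symbols that your recognizer writes would need to be filtered separately.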
II. Write a program (Java is preferable) that takes an mlf file and generates the concept table (see examples 1 and 2 above). Put a tab between the field name and its value, and a new line after each pair.
Example:
java -jar extractConcepts.jar out.mlf
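One possible shape for the extraction step is sketched below in Java. It is only an illustration under stated assumptions: the city list is hypothetical (the assignment's own list is not reproduced here), cities are treated as single words, a city is assigned by the preposition directly before it (FROM or TO), and Time falls back to Anytime when no time word is found. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the city list are illustrative only.
final class ConceptSketch {
    static final List<String> CITIES = Arrays.asList("BOSTON", "CHICAGO"); // assumed
    static final List<String> DAYS = Arrays.asList("SUNDAY", "MONDAY", "TUESDAY",
            "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY");
    static final List<String> TIMES = Arrays.asList("MORNING", "NOON", "AFTERNOON",
            "EVENING", "NIGHT");

    /** Fills the four concept slots from a recognized word sequence. */
    static Map<String, String> extract(List<String> words) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("Departure city", "");
        table.put("Destination", "");
        table.put("Day", "");
        table.put("Time", "Anytime");          // assumed default when no time is spoken
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i).toUpperCase();
            String prev = i > 0 ? words.get(i - 1).toUpperCase() : "";
            if (CITIES.contains(w) && prev.equals("FROM")) {
                table.put("Departure city", capitalize(w));
            } else if (CITIES.contains(w) && prev.equals("TO")) {
                table.put("Destination", capitalize(w));
            } else if (DAYS.contains(w)) {
                table.put("Day", capitalize(w));
            } else if (TIMES.contains(w)) {
                table.put("Time", capitalize(w));
            }
        }
        return table;
    }

    /** Turns an upper-case word such as FRIDAY into Friday. */
    static String capitalize(String upper) {
        return upper.charAt(0) + upper.substring(1).toLowerCase();
    }

    /** One "name<TAB>value" pair per line, as part II requires. */
    static String format(Map<String, String> table) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : table.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("I", "WOULD", "LIKE", "A", "TICKET",
                "FROM", "BOSTON", "TO", "CHICAGO", "ON", "FRIDAY", "MORNING");
        System.out.print(format(extract(words)));
    }
}
```

Checking the word before a city, instead of taking whatever follows TO, keeps phrases such as "I need to go to" from filling the destination slot with GO. A real solution would also need to handle multi-word city names and whatever other phrasings your grammar allows.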
III. Write a shell script that takes a wav file as input and generates the concept table:
Example:
recognizeConcepts.sh ~/test/test2.wav
Output example:
Departure city: |
Destination: |
Day: | Friday
Time: | Morning
You will be graded based on concept error rate + task accomplishment
(similar to
Submission: (50%)
1) Your program (in II) extractConcepts.jar (extractConcepts.java, or extractConcepts.pl, ...) (15 points)
2) Your shell script (recognizeConcepts.sh) (15 points)
3) make.sh (that compiles your code) (10 points)
4) readme.txt: (10 points)
i. how to run your programs
ii. two examples to run part II and III
*** Upload these files in one zip file, USERNAME.zip (e.g., fb2175.zip), to courseworks
(50%) quality of your system