Intelligibility Testing
Semantically Unpredictable Sentences
We use "syntactically correct but semantically unpredictable"
sentences (SUS) for intelligibility testing. This ensures that
listeners cannot use context to correctly guess an unintelligible
word.
Speech lab students: most of the time we use a set of SUS that has
already been generated, but if you need to generate new ones for
English, use /proj/tts/examples/susgen.py.
Everyone else: you can generate English SUS of the standard form
used by NITECH for the Blizzard Challenge evaluation
(http://research.nii.ac.jp/src/en/NITECH-EN.html) using:
https://github.com/ecooper7/SUSgen
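
To illustrate the underlying idea, here is a minimal sketch of
template-based SUS generation: a fixed syntactic frame is filled with
words drawn at random from part-of-speech word lists, so each sentence
parses but carries no predictable meaning. The frame and word lists
below are invented for illustration; susgen.py and SUSgen use their
own templates and inventories.

    import random

    # Invented word lists, for illustration only; the real generators
    # use their own, much larger inventories.
    NOUNS = ["table", "river", "cloud", "pencil", "engine"]
    VERBS = ["eats", "paints", "follows", "lifts", "hears"]
    ADJS = ["green", "sudden", "hollow", "bright", "quiet"]

    def generate_sus(n=12, seed=None):
        """Fill a fixed syntactic frame with randomly chosen words so
        each sentence parses but carries no predictable meaning."""
        rng = random.Random(seed)
        return ["The {} {} {} the {} {}.".format(
                    rng.choice(ADJS), rng.choice(NOUNS), rng.choice(VERBS),
                    rng.choice(ADJS), rng.choice(NOUNS))
                for _ in range(n)]

    for s in generate_sus(12, seed=0):
        print(s)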
Mechanical Turk Intelligibility HITs
On cheshire, under /var/www/amt/, create one new directory per voice,
named with the next unused numbers, and put each voice's 12 SUS .wav
files into its directory. I typically also put a README file in each
directory saying which voice the audio files came from, just to keep
track.
Then, open write_csv.py and edit the folders variable to contain the
folder numbers of the new voices you want to evaluate. Run the
script, and upload the resulting .csv file to our "Transcription Data
Sel New" task.
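
For reference, here is a minimal sketch of the kind of CSV such a
script produces; the folder numbers, URL prefix, file naming, and
column header below are assumptions for illustration, not the actual
contents of write_csv.py.

    import csv

    # Assumed values -- edit to match the actual setup on cheshire.
    folders = [41, 42, 43]  # numbered per-voice directories under /var/www/amt/
    base_url = "https://cheshire.cs.columbia.edu/amt"

    with open("hit_input.csv", "w", newline="") as f:
        writer = csv.writer(f)
        # MTurk exposes each CSV column header as an input variable
        # of the HIT template, so the header must match the template.
        writer.writerow(["audio_url"])
        for folder in folders:
            for i in range(1, 13):  # 12 SUS .wav files per voice
                writer.writerow(["{}/{}/{}.wav".format(base_url, folder, i)])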
Latin Square Intelligibility HIT
Since listener bias is known to influence results, we want to
distribute each listener's bias over every voice. Thus, we have
switched from having each listener evaluate sentences all spoken by
one voice to a Latin-square setup.
To use Kai-Zhan's scripts for generating Latin-square CSV files for
MTurk, please see:
cheshire.cs.columbia.edu:/var/www/macrophone/README.md
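
The core idea: with a Latin square over voices and sentence sets,
each listener hears every voice, but on a different set of sentences,
so each listener's bias is spread evenly across all voices. Below is
a minimal sketch of such an assignment (not Kai-Zhan's actual
scripts; the voice names are placeholders).

    def latin_square(n):
        """Row i is a cyclic shift of 0..n-1, so every value appears
        exactly once in each row and each column."""
        return [[(i + j) % n for j in range(n)] for i in range(n)]

    voices = ["voice_a", "voice_b", "voice_c"]  # placeholder names
    # Listener i hears sentence set j spoken by voices[square[i][j]].
    for i, row in enumerate(latin_square(len(voices))):
        for j, v in enumerate(row):
            print("listener {}, sentence set {} -> {}".format(i, j, voices[v]))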
Automatic Intelligibility Evaluation using ASR APIs
- IBM Watson: This API costs 2 cents per minute of audio, so please
use it carefully and sparingly. Our credentials are in
/proj/tts/results/watson/watson_credentials.txt.
Documentation on the speech-to-text API is at
https://www.ibm.com/watson/developercloud/doc/speech-to-text/getting-started.html.
We generally call the API through curl; a generic sketch of this kind
of call appears after this list. Note that our payment for Watson has
expired, so if we want to use it again, we will have to set up
payment again.
- Google: This API costs money after the first hour of audio each
month, so please use it carefully and sparingly. Pricing information
is on Google's Cloud Speech-to-Text pricing page.
An example of how to call the API, including our API key, can be found
under /proj/tts/results/google/do_google_asr.py. [Note
that APIs are constantly changing and this script is likely out
of date.]
We have also observed that, for less intelligible speech in languages
other than English, this API sometimes returns English
transcriptions; this is because of code-switched language models.
Note that payment for our Google API account has expired. We have not
renewed it because we no longer really use the Google API. If we do
want to use it again in the future, we will have to set up a payment
method again.
- Wit.ai: the scripts under /proj/tts/results/wit.ai/ send audio to
the API. wit_asr.sh sends all the .wav files in a specified directory
to the API; run_multi.py runs wit_asr.sh on multiple voices (multiple
directories) at once; write_wit_output_csv.py formats the API results
so we can compute WER on them using our scripts. Wit.ai is free to
use.
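
As a generic illustration of calling an HTTP speech-to-text endpoint
(referenced from the Watson item above): the endpoint URL, token,
headers, and response key below are placeholders, not any service's
actual API. Each service's request and response formats differ and
change over time, so check the current documentation before use.

    import requests

    # Placeholders -- substitute the target service's real endpoint
    # and our stored credentials.
    ENDPOINT = "https://api.example.com/v1/recognize"
    TOKEN = "YOUR_API_TOKEN"

    def transcribe(wav_path):
        """POST one .wav file to an HTTP speech-to-text endpoint and
        return the transcript field of the JSON response."""
        with open(wav_path, "rb") as f:
            resp = requests.post(
                ENDPOINT,
                headers={"Authorization": "Bearer " + TOKEN,
                         "Content-Type": "audio/wav"},
                data=f)
        resp.raise_for_status()
        return resp.json().get("text", "")  # "text" is a placeholder key

    print(transcribe("1.wav"))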
Computing Word Error Rate
For evaluating Latin square HIT results from MTurk, please see:
/proj/tts/examples/wer/process_latinsquare.py
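
For reference, WER is the word-level edit (Levenshtein) distance
between the reference sentence and the transcription, divided by the
number of reference words. Below is a minimal sketch of the
computation (not the implementation in process_latinsquare.py, which
also handles the HIT formatting).

    def wer(ref, hyp):
        """Word error rate: (substitutions + deletions + insertions)
        divided by the number of reference words, computed by dynamic
        programming over the two word sequences."""
        r, h = ref.split(), hyp.split()
        # d[i][j] = minimum edit distance between r[:i] and h[:j]
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),
                              d[i - 1][j] + 1,   # deletion
                              d[i][j - 1] + 1)   # insertion
        return d[len(r)][len(h)] / len(r)

    # Two substitutions out of seven reference words -> WER = 2/7
    print(wer("the green table eats the quiet river",
              "the green cable eats a quiet river"))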