Intelligibility Testing
Semantically Unpredictable Sentences
We use "syntactically correct but semantically unpredictable"
sentences (SUS) for intelligibility testing. This ensures that
listeners cannot use context to correctly guess an unintelligible
word.
Speech lab students: most of the time we use a set of SUS that has
already been generated, but if you need to generate new ones for
English, use /proj/tts/examples/susgen.py.
Everyone else: you can generate English SUS of the standard form
used by NITECH for the Blizzard Challenge evaluation
(http://research.nii.ac.jp/src/en/NITECH-EN.html) using:
https://github.com/ecooper7/SUSgen
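
To illustrate the underlying idea, here is a minimal sketch of
template-based SUS generation: a fixed syntactic frame is filled with
words drawn at random from part-of-speech word lists, so each sentence
parses but carries no predictable meaning. The frame and word lists
below are invented for illustration; susgen.py and SUSgen use their
own templates and inventories.

    import random

    # Invented word lists, for illustration only; the real generators
    # use their own, much larger inventories.
    NOUNS = ["table", "river", "cloud", "pencil", "engine"]
    VERBS = ["eats", "paints", "follows", "lifts", "hears"]
    ADJS = ["green", "sudden", "hollow", "bright", "quiet"]

    def generate_sus(n=12, seed=None):
        """Fill a fixed syntactic frame with randomly chosen words so
        each sentence parses but carries no predictable meaning."""
        rng = random.Random(seed)
        return ["The {} {} {} the {} {}.".format(
                    rng.choice(ADJS), rng.choice(NOUNS), rng.choice(VERBS),
                    rng.choice(ADJS), rng.choice(NOUNS))
                for _ in range(n)]

    for s in generate_sus(12, seed=0):
        print(s)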
Mechanical Turk Intelligibility HITs
On cheshire, under /var/www/amt/, create one new directory per voice,
named with the next unused numbers, and put each voice's 12 SUS .wav
files into its directory. I typically also put a README file in each
directory saying which voice the audio files came from, just to keep
track.
Then, open write_csv.py and edit the folders variable to contain the
folder numbers of the new voices you want to evaluate. Run the
script, and upload the resulting .csv file to our "Transcription Data
Sel New" task.
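
For reference, here is a minimal sketch of the kind of CSV such a
script produces; the folder numbers, URL prefix, file naming, and
column header below are assumptions for illustration, not the actual
contents of write_csv.py.

    import csv

    # Assumed values -- edit to match the actual setup on cheshire.
    folders = [41, 42, 43]  # numbered per-voice directories under /var/www/amt/
    base_url = "https://cheshire.cs.columbia.edu/amt"

    with open("hit_input.csv", "w", newline="") as f:
        writer = csv.writer(f)
        # MTurk exposes each CSV column header as an input variable
        # of the HIT template, so the header must match the template.
        writer.writerow(["audio_url"])
        for folder in folders:
            for i in range(1, 13):  # 12 SUS .wav files per voice
                writer.writerow(["{}/{}/{}.wav".format(base_url, folder, i)])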
Latin Square Intelligibility HIT
Since listener bias is known to influence results, we want to
distribute each listener's bias over every voice. Thus, we have
switched from having each listener evaluate sentences all spoken by
one voice to a Latin-square setup.
To use Kai-Zhan's scripts for generating Latin-square CSV files for
MTurk, please see:
cheshire.cs.columbia.edu:/var/www/macrophone/README.md
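
The core idea: with a Latin square over voices and sentence sets,
each listener hears every voice, but on a different set of sentences,
so each listener's bias is spread evenly across all voices. Below is
a minimal sketch of such an assignment (not Kai-Zhan's actual
scripts; the voice names are placeholders).

    def latin_square(n):
        """Row i is a cyclic shift of 0..n-1, so every value appears
        exactly once in each row and each column."""
        return [[(i + j) % n for j in range(n)] for i in range(n)]

    voices = ["voice_a", "voice_b", "voice_c"]  # placeholder names
    # Listener i hears sentence set j spoken by voices[square[i][j]].
    for i, row in enumerate(latin_square(len(voices))):
        for j, v in enumerate(row):
            print("listener {}, sentence set {} -> {}".format(i, j, voices[v]))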
Automatic Intelligibility Evaluation using ASR APIs
- IBM Watson: This API costs 2 cents per minute of audio, so please
use it carefully and sparingly. Our credentials are in
/proj/tts/results/watson/watson_credentials.txt.
Documentation on the speech-to-text API is at
https://www.ibm.com/watson/developercloud/doc/speech-to-text/getting-started.html.
We generally call the API through curl; a generic sketch of this kind
of call appears after this list. Note that our payment for Watson has
expired, so if we want to use it again, we will have to set up
payment again.
- Google: This API costs money after the first hour of audio each
month, so please use it carefully and sparingly. Pricing information
is on Google's Cloud Speech-to-Text pricing page.
An example of how to call the API, including our API key, can be found
under /proj/tts/results/google/do_google_asr.py. [Note
that APIs are constantly changing and this script is likely out
of date.]
We have also observed that, for less intelligible speech in languages
other than English, this API sometimes returns English
transcriptions; this is because of code-switched language models.
Note that payment for our Google API account has expired. We have not
renewed it because we no longer really use the Google API. If we do
want to use it again in the future, we will have to set up a payment
method again.
- Wit.ai: the scripts under /proj/tts/results/wit.ai/ send audio to
the API. wit_asr.sh sends all the .wav files in a specified directory
to the API; run_multi.py runs wit_asr.sh on multiple voices (multiple
directories) at once; write_wit_output_csv.py formats the API results
so we can compute WER on them using our scripts. Wit.ai is free to
use.
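
As a generic illustration of calling an HTTP speech-to-text endpoint
(referenced from the Watson item above): the endpoint URL, token,
headers, and response key below are placeholders, not any service's
actual API. Each service's request and response formats differ and
change over time, so check the current documentation before use.

    import requests

    # Placeholders -- substitute the target service's real endpoint
    # and our stored credentials.
    ENDPOINT = "https://api.example.com/v1/recognize"
    TOKEN = "YOUR_API_TOKEN"

    def transcribe(wav_path):
        """POST one .wav file to an HTTP speech-to-text endpoint and
        return the transcript field of the JSON response."""
        with open(wav_path, "rb") as f:
            resp = requests.post(
                ENDPOINT,
                headers={"Authorization": "Bearer " + TOKEN,
                         "Content-Type": "audio/wav"},
                data=f)
        resp.raise_for_status()
        return resp.json().get("text", "")  # "text" is a placeholder key

    print(transcribe("1.wav"))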
Computing Word Error Rate
For evaluating Latin square HIT results from MTurk, please see:
/proj/tts/examples/wer/process_latinsquare.py
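
For reference, WER is the word-level edit (Levenshtein) distance
between the reference sentence and the transcription, divided by the
number of reference words. Below is a minimal sketch of the
computation (not the implementation in process_latinsquare.py, which
also handles the HIT formatting).

    def wer(ref, hyp):
        """Word error rate: (substitutions + deletions + insertions)
        divided by the number of reference words, computed by dynamic
        programming over the two word sequences."""
        r, h = ref.split(), hyp.split()
        # d[i][j] = minimum edit distance between r[:i] and h[:j]
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),
                              d[i - 1][j] + 1,   # deletion
                              d[i][j - 1] + 1)   # insertion
        return d[len(r)][len(h)] / len(r)

    # Two substitutions out of seven reference words -> WER = 2/7
    print(wer("the green table eats the quiet river",
              "the green cable eats a quiet river"))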