Pairwise Naturalness HITs
We have a pairwise preference HIT on Mechanical Turk for comparing two
voices. Workers can compare as many or as few utterances as they
want. We collect 5 ratings per pair, and typically post 12 sentences per
voice. The two voices are presented in A/B or B/A order at random, to
control for order effects. The HIT is a forced choice: there is no "no
preference" option. Once the HITs are completed, we test for
significance with a z-test and a two-tailed p-value.
Instructions below are for Speech Lab students.
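The significance test mentioned above can be sketched as follows. This is a minimal sketch, not necessarily what our scripts do: it assumes a one-proportion z-test against the null hypothesis that each voice is preferred half the time, using the normal approximation. The function name and arguments are hypothetical.

```python
import math


def pairwise_preference_z_test(wins_a, wins_b):
    """Return (z, two-tailed p) for a forced-choice A vs. B comparison.

    Assumption: the null hypothesis is that each voice wins with
    probability 0.5, and ratings are treated as independent.
    """
    n = wins_a + wins_b
    if n == 0:
        raise ValueError("need at least one rating")
    p_hat = wins_a / n
    # Standard error of a proportion under the null p = 0.5.
    std_err = math.sqrt(0.25 / n)
    z = (p_hat - 0.5) / std_err
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, if 12 sentences with 5 ratings each give 60 ratings in total, a made-up split of 40 to 20 would be passed as pairwise_preference_z_test(40, 20). The normal approximation is only reasonable when the number of ratings is fairly large.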
Posting HITs
This work must be done on cheshire, under /var/www.
cp -r amt_EMPTY amt##
cd amt##/scripts
Here ## is the next run number that has not been used yet. Then change the following variables in setup.py:
- run_id
- run_name
- path
- theVoices
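Finding the next unused ## for the copy step above can be automated. The helper below is a hypothetical sketch: it assumes earlier runs sit side by side in one directory and are named amt followed only by digits, so amt_EMPTY is ignored. It returns an integer and leaves any zero-padding to you.

```python
import os
import re


def next_run_number(base_dir):
    """Return one more than the highest existing amt<number> directory.

    Assumption: run directories are named 'amt' plus digits only.
    Returns 1 if no such directory exists yet.
    """
    pattern = re.compile(r"^amt(\d+)$")
    used = []
    for name in os.listdir(base_dir):
        match = pattern.match(name)
        # Only count real directories whose name is amt + digits.
        if match and os.path.isdir(os.path.join(base_dir, name)):
            used.append(int(match.group(1)))
    return max(used, default=0) + 1
```

On cheshire this would be called as next_run_number("/var/www").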
Then alter make_linked_dirs.sh if necessary. It assumes
that your test utterances are in a directory structure
like voicename/nat/hts_engine/, which may not always
be the case.
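Before deciding whether make_linked_dirs.sh needs changes, it can help to check which voices actually follow the expected layout. The sketch below is hypothetical and not part of our scripts: the function name, the root argument, and the list of voice names are all assumptions for illustration.

```python
import os


def missing_utterance_dirs(root, voices,
                           subpath=os.path.join("nat", "hts_engine")):
    """Return the expected utterance directories that do not exist.

    Assumption: each voice is a directory under `root`, with its test
    utterances in <voice>/nat/hts_engine/.
    """
    missing = []
    for voice in voices:
        expected = os.path.join(root, voice, subpath)
        if not os.path.isdir(expected):
            missing.append(expected)
    return missing
```

An empty result means every listed voice matches the layout that make_linked_dirs.sh assumes. Any path that is returned points to a voice for which the script would need to be adapted.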
Then run setup.py.
Then get the input CSV from docs/upload/[name].csv,
upload it to our "Naturalness HIT Pairwise Batch" task, and post the batch.
Evaluating Results
Download the results CSV file, put it
under amt##/docs/download, and rename
it to batch_results.csv.
Then, from amt##/scripts, run processResults.py.