Voice Training

Voice Training Checklist

Speech lab students: please see the voice training checklist to make sure everything is in place before running voice training.

Running the Training

Once all the data is in place, run this from the top-level voice directory:

perl scripts/Training.pl scripts/Config.pm > train.log 2> err.log

Training typically takes a long time, so it is best run inside a screen session. If you want an email when it finishes, so you don't have to keep checking on it, chain the mail command after it like this:

( perl scripts/Training.pl scripts/Config.pm > train.log 2> err.log ; echo "email content" | mail -s "email subject" your@email.address )

If the training fails before it is finished, you can see how far it got:

grep "Start " train.log

These steps match the switches at the end of scripts/Config.pm. Once you have debugged the problem, you can pick up the training where it left off instead of starting over: in scripts/Config.pm, switch the steps that already completed successfully from 1 to 0.
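Editing the switches by hand is easy to get wrong when there are many steps. A minimal Python sketch of the same edit, assuming each switch at the end of scripts/Config.pm is a line of the form `$MKEMV = 1;` and that only step switches use five-character upper-case names set to 0 or 1; the function name `resume_from` is hypothetical:

```python
import re

# Assumed switch format (one per line near the end of scripts/Config.pm):
#   $MKEMV = 1;    # preparing environments
# Only five-character upper-case names set to 0 or 1 are treated as step switches.
SWITCH_RE = re.compile(r"^(\s*\$)([A-Z0-9_]{5})(\s*=\s*)([01])(\s*;.*)$")


def resume_from(config_text: str, step: str) -> str:
    """Return config_text with every switch listed before `step` set to 0.

    `step` itself and all later switches keep their current values,
    so training restarts at `step`.
    """
    lines = config_text.split("\n")
    reached = False
    for i, line in enumerate(lines):
        m = SWITCH_RE.match(line)
        if m is None:
            continue  # not a step switch
        if m.group(2) == step:
            reached = True
        if not reached:
            prefix, name, assign, _old, rest = m.groups()
            lines[i] = f"{prefix}{name}{assign}0{rest}"
    if not reached:
        raise ValueError(f"no switch named {step!r} found")
    return "\n".join(lines)
```

For example, with `cfg = Path("scripts/Config.pm")`, run `cfg.write_text(resume_from(cfg.read_text(), "MN2FL"))` after making a backup copy of the file.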

You will know the voice has trained successfully when there are .wav files in the gen/qst001/ver1/hts_engine/ directory.
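To check for this from a script (for example before sending the notification email), a minimal Python sketch; the function name `training_succeeded` is hypothetical, and it assumes you pass the top-level voice directory:

```python
from pathlib import Path


def training_succeeded(voice_dir: str = ".") -> bool:
    """Return True if gen/qst001/ver1/hts_engine under voice_dir holds at least one .wav file."""
    out_dir = Path(voice_dir) / "gen" / "qst001" / "ver1" / "hts_engine"
    return out_dir.is_dir() and any(out_dir.glob("*.wav"))
```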

Voice Output

Test synthesis utterances are written under gen/qst001/ver1. The hts_engine directory contains utterances synthesized with hts_engine, and the 1mix, 2mix, and stc directories contain utterances synthesized with either SPTK or STRAIGHT.

The different speech waveform generation methods are as follows (from this thread):

1mix
- Distributions: Single Gaussian distributions
- Variances: Diagonal covariance matrices
- Engine: HMGenS
2mix
- Distributions: Gaussian mixture distributions (2 mixture components)
- Variances: Diagonal covariance matrices
- Engine: HMGenS
stc
- Distributions: Single Gaussian distributions
- Variances: Semi-tied covariance matrices
- Engine: HMGenS
hts_engine
- Distributions: Single Gaussian distributions
- Variances: Diagonal covariance matrices
- Engine: hts_engine

Errors and Solutions

MKEMV
- If this step doesn't output anything: in my case, Perl was loading a different Config.pm that was already on its path (in my .cpanplus directory). Fix this by giving an explicit path to the Config.pm you want to use, e.g. ./Config.pm.
IN_RE
- ERROR [+6510] LOpen: Unable to open label file /local/users/ecooper/voices/bdc/data/cmp/h1r/cu_us_bdc_h1r_0001.lab
  The paths in data/labels/full.mlf and/or data/labels/mono.mlf are wrong.
- ERROR [+2121] HInit: Too Few Observation Sequences [0] FATAL ERROR - Terminating program /proj/speech/tools/HTS/htk/bin/HInit Error in /proj/speech/tools/HTS/htk/bin/HInit -A -C /local2/ecooper/datasel/short15/configs/trn.cnf -D -T 1 -S /local2/ecooper/datasel/short15/data/scp/train.scp -m 1 -u tmvw -w 5000 -H /local2/ecooper/datasel/short15/models/qst001/ver1/cmp/init.mmf -M /local2/ecooper/datasel/short15/models/qst001/ver1/cmp/HInit -I /local2/ecooper/datasel/short15/data/labels/mono.mlf -l zh -o zh /local2/ecooper/datasel/short15/proto/qst001/ver1/state-5_stream-4_mgc-105_lf0-3.prt
  This happened when we trained a voice on the 15 minutes of shortest utterances. These utterances were mostly empty, so many phonemes probably had too few usable examples. The error means there is not enough usable data to train a model for at least one phoneme (here zh, the label given to HInit with -l). A usable example contains enough frames to model with a 5-state HMM: an example with fewer than 5 frames can't be used to train a model. Adding more examples of that phoneme won't necessarily help unless they are at least 5 frames long. cf. http://hts.sp.nitech.ac.jp/hts-users/spool/2007/msg00232.html
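To find phonemes with no usable examples before running training, you can count segments per phoneme in the mono labels. A minimal Python sketch, assuming HTK-format labels (`start end phone`, times in 100 ns units) and a 5 ms frame shift; both the frame shift and the function names are assumptions to adapt to your own configs:

```python
from collections import Counter

# Assumptions: HTK-style mono labels ("start end phone", times in 100 ns units)
# and a 5 ms frame shift; change FRAME_SHIFT to match your configuration.
FRAME_SHIFT = 50000   # 5 ms in 100 ns units
MIN_FRAMES = 5        # a 5-state left-to-right HMM without skips needs one frame per state


def count_examples(label_texts, frame_shift=FRAME_SHIFT, min_frames=MIN_FRAMES):
    """Return (usable, total) Counters of phone segments across all label files."""
    usable, total = Counter(), Counter()
    for text in label_texts:
        for line in text.splitlines():
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            start, end, phone = int(fields[0]), int(fields[1]), fields[2]
            total[phone] += 1
            if (end - start) // frame_shift >= min_frames:
                usable[phone] += 1
    return usable, total


def phones_without_usable_examples(label_texts, **kwargs):
    """Return phones that occur in the labels but never in a long-enough segment."""
    usable, total = count_examples(label_texts, **kwargs)
    return sorted(p for p in total if usable[p] == 0)
```

Pass the contents of the mono .lab file for every utterance in train.scp; any phoneme returned by `phones_without_usable_examples` is a likely cause of this HInit error.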
MMMMF
- The training hangs at this step and never continues, but running top shows that nothing is running. Killing the training with Ctrl-C shows this:
  Error in /proj/speech/tools/HTS/htk/bin/HHEd -A -B -C /local2/chevlev/datasel/F0meanMiddle15min/configs/trn.cnf -D -T 1 -p -i -d /local2/chevlev/datasel/F0meanMiddle15min/models/qst001/ver1/cmp/HRest -w /local2/chevlev/datasel/F0meanMiddle15min/models/qst001/ver1/cmp/monophone.mmf /local2/chevlev/datasel/F0meanMiddle15min/edfiles/qst001/ver1/cmp/lvf.hed /local2/chevlev/datasel/F0meanMiddle15min/data/lists/mono.list
  We never figured out what caused this, but the same voice trained successfully on a different machine with more memory, so it may have been a memory issue.
ERST0
- The output just says "error". This is probably also an out-of-memory issue.
MN2FL
- An error saying that it can't allocate that much memory: we were running training on a 32-bit machine, and the same voice trained successfully on a 64-bit machine.
- Error in /proj/speech/tools/HTS/htk/bin/HHEd -A -B -C /proj/tts/voices/fisher/overarticulated_one_2_copy/configs/trn.cnf -D -T 1 -p -i -H /proj/tts/voices/fisher/overarticulated_one_2_copy/models/qst001/ver1/cmp/monophone.mmf -w /proj/tts/voices/fisher/overarticulated_one_2_copy/models/qst001/ver1/cmp/fullcontext.mmf /proj/tts/voices/fisher/overarticulated_one_2_copy/edfiles/qst001/ver1/cmp/m2f.hed /proj/tts/voices/fisher/overarticulated_one_2_copy/data/lists/mono.list
  Running the command by itself just prints "Killed". This is also suspected to be an out-of-memory error, because the same voice trained successfully on a machine with more memory.
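On Linux, "Killed" usually means the process received SIGKILL, which is what the kernel's out-of-memory killer sends. A minimal Python sketch for re-running a failing command and reporting how it ended; the function name is hypothetical, and SIGKILL can also come from elsewhere (e.g. a job scheduler), so treat the result as a hint rather than a diagnosis:

```python
import signal
import subprocess


def run_and_diagnose(cmd):
    """Run cmd (a list of arguments) and return (returncode, explanation)."""
    rc = subprocess.run(cmd).returncode
    if rc == 0:
        return rc, "completed successfully"
    if rc == -signal.SIGKILL:
        # A negative return code means the child was terminated by that signal.
        # The out-of-memory killer uses SIGKILL; a shell reports it as "Killed".
        return rc, "killed by SIGKILL; check the kernel log (dmesg) for out-of-memory messages"
    if rc < 0:
        return rc, f"terminated by signal {signal.Signals(-rc).name}"
    return rc, f"exited with status {rc}"
```

Pass the HHEd or HERest command line copied from err.log, split into a list of arguments (`shlex.split` does this).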
ERST1
- Processing Data: f1a_0001.cmp; Label f1a_0001.lab ERROR [+7321] CreateInsts: Unknown label pau
  This happened because the path in our full.mlf pointed to the mono labels.
- Error in /proj/speech/tools/HTS/htk/bin/HERest, followed by the rest of the command line. Running the command by itself just terminates with "Killed". We suspect it ran out of RAM. More information and suggestions from the mailing list:
  http://hts.sp.nitech.ac.jp/hts-users/spool/2014/msg00146.html
- ERROR [+5010] InitSource: Cannot open source file fullcontextlabel
  Something is probably missing from one of the lists under data/lists. Check that they were made properly and re-make them if necessary. When you restart the training, start from the previous step, MN2FL, rather than from ERST1.
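Two errors on this page come from label MLFs pointing at the wrong place: LOpen under IN_RE (wrong paths) and Unknown label pau above (full.mlf pointing at the mono labels). A minimal Python sketch that checks an MLF against train.scp, assuming the MLF uses HTK directory-search entries of the form `"*/*.lab" -> "/path/to/labels"`; the function names and the full-context heuristic are assumptions, not part of the HTS scripts:

```python
import re
from pathlib import Path

# Assumed MLF layout: a "#!MLF!#" header followed by HTK directory-search entries
#   "*/*.lab" -> "/path/to/voice/data/labels/full"
ENTRY_RE = re.compile(r'^"([^"]+)"\s*->\s*"([^"]+)"$')


def label_dirs(mlf_text):
    """Return the directories named on the right-hand side of '->' entries."""
    return [m.group(2) for line in mlf_text.splitlines()
            if (m := ENTRY_RE.match(line.strip()))]


def is_full_context(label):
    # Heuristic: full-context labels such as x^x-pau+ae=... have '-' and '+'
    # around the centre phone; monophone labels such as pau do not.
    return "-" in label and "+" in label


def check_mlf(mlf_text, scp_lines, expect_full):
    """Return a list of problems (empty if none were found).

    Utterance names come from the .cmp paths in train.scp. HTK's pattern
    matching is not emulated; only the named directories are checked.
    """
    dirs = label_dirs(mlf_text)
    if not dirs:
        return ["no '->' directory entries found in the MLF"]
    problems = []
    for label_dir in dirs:
        for entry in (e.strip() for e in scp_lines):
            if not entry:
                continue
            lab = Path(label_dir) / f"{Path(entry).stem}.lab"
            if not lab.is_file():
                problems.append(f"missing {lab}")
                continue
            # The label is the last field, whether or not times are present.
            labels = [ln.split()[-1] for ln in lab.read_text().splitlines() if ln.split()]
            if labels and is_full_context(labels[0]) != expect_full:
                kind = "monophone" if expect_full else "full-context"
                problems.append(f"{lab} contains {kind} labels")
    return problems
```

For example, `check_mlf(Path("data/labels/full.mlf").read_text(), Path("data/scp/train.scp").read_text().splitlines(), expect_full=True)`, and the same with `expect_full=False` for mono.mlf.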
CXCL1
- ERROR [+2661] LoadQuestion: Question name RR-Vowel invalid
  I had a typo in my custom questions file: question RR-Vowel was defined twice, when the second definition was supposed to be RR-Vowel_R.
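Duplicate question names are easy to spot mechanically. A minimal Python sketch, assuming HTS-style question lines such as `QS "C-Vowel" {*-a+*,*-e+*}`; the function name is hypothetical:

```python
import re
from collections import Counter

# Assumed HTS question format, one question per line:
#   QS "C-Vowel" {*-a+*,*-e+*}
QS_RE = re.compile(r'^\s*QS\s+"?([^"\s]+)"?\s+\{(.*)\}\s*$')


def duplicate_questions(hed_text):
    """Return question names that are defined more than once, sorted."""
    counts = Counter()
    for line in hed_text.splitlines():
        m = QS_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return sorted(name for name, n in counts.items() if n > 1)
```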
FALGN
MCDGV
- MCDGV: making data, labels, and scp from fae_0210.lab for GV...Cannot open No such file or directory at Training.pl line 1216, <SCP> line 210.
  Every .cmp file in train.scp needs a corresponding .lab file in gv/qst001/ver1/fal/. These files are created during training, in an alignment step. If an utterance is too difficult to align (often because of background noise or errors in the transcription), its .lab file simply won't be created. There are two possible fixes:
  1. Remove the utterances that couldn't be aligned from train.scp: check which .lab files are missing from that directory and delete the corresponding lines from train.scp. Then pick up the training from this step ($MCDGV). Speech lab students: we have built this into the voice training recipe.
  2. Force the training to accept a bad alignment by increasing the beam width ($beam in Config.pm). You can set it to 0 to disable the beam (i.e. an infinite beam). Note that wider beams consume more memory. More info here: http://hts.sp.nitech.ac.jp/hts-users/spool/2014/msg00119.html Then pick up the training from the step right before this one, $FALGN.
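Fix 1 can be scripted. A minimal Python sketch, run from the top-level voice directory and assuming each line of train.scp is a .cmp path whose file name (without extension) matches the .lab name in gv/qst001/ver1/fal/; the function names are hypothetical:

```python
from pathlib import Path


def split_scp(scp_lines, fal_dir):
    """Split train.scp entries into (kept, dropped) depending on whether
    <fal_dir>/<utterance>.lab exists (utterance = .cmp file name without extension)."""
    fal = Path(fal_dir)
    kept, dropped = [], []
    for line in scp_lines:
        entry = line.strip()
        if not entry:
            continue
        target = kept if (fal / f"{Path(entry).stem}.lab").is_file() else dropped
        target.append(entry)
    return kept, dropped


def prune_scp(scp_path, fal_dir="gv/qst001/ver1/fal"):
    """Rewrite scp_path keeping only aligned utterances; save the original as <scp_path>.bak."""
    path = Path(scp_path)
    original = path.read_text()
    kept, dropped = split_scp(original.splitlines(), fal_dir)
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text("".join(f"{entry}\n" for entry in kept))
    return dropped
```

`prune_scp("data/scp/train.scp")` returns the dropped entries and keeps the original file as data/scp/train.scp.bak.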
TMSPF
- The training gets stuck here, writing to mspf/stats/nat/mgc_dim0.data indefinitely until the drive fills up:
  Check your questions file and make sure that every phoneme in your phoneset is represented there.
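One way to check this is to look for a centre-phone question for each phoneme in data/lists/mono.list. A minimal Python sketch, assuming HTS-style full-context labels in which the centre phone sits between `-` and `+` (as in `x^x-pau+ae=...`) and question lines like `QS "C-a" {*-a+*}`; this is a heuristic, and the function name is hypothetical:

```python
import re

# Assumed HTS question format, one question per line:
#   QS "C-a" {*-a+*}
QS_RE = re.compile(r'^\s*QS\s+"?[^"\s]+"?\s+\{(.*)\}\s*$')


def phones_without_center_question(phones, hed_text):
    """Return phones (in input order) that never appear as '-phone+' in any question pattern."""
    patterns = []
    for line in hed_text.splitlines():
        m = QS_RE.match(line)
        if m:
            patterns.extend(p.strip() for p in m.group(1).split(","))
    return [ph for ph in phones if not any(f"-{ph}+" in pat for pat in patterns)]
```

The phone list can be read with `Path("data/lists/mono.list").read_text().split()`.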
PGEN1
- Generating Label alice01.lab ERROR [+9935] Generator: Cannot find duration model X^x-pau+ae=l@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_2/G:0_0/H:x=x^1=10|0/I:19=12/J:79+57-10 in current list
  Your data/lists/full_all.list does not include the full-context labels from this gen label file, possibly because you added gen labels after creating full_all.list. Re-make full_all.list so that it includes all gen labels.
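To see which labels are missing before re-making the list, a minimal Python sketch; it assumes the gen labels are .lab files in one directory (e.g. data/labels/gen, an assumption) with the full-context label as the last field of each line, and that full_all.list has one label per line. The function name is hypothetical:

```python
from pathlib import Path


def labels_missing_from_list(gen_dir, full_all_list):
    """Return full-context labels used in gen_dir/*.lab that are absent from full_all.list."""
    known = {ln.strip() for ln in Path(full_all_list).read_text().splitlines() if ln.strip()}
    missing = set()
    for lab in sorted(Path(gen_dir).glob("*.lab")):
        for line in lab.read_text().splitlines():
            fields = line.split()
            if fields and fields[-1] not in known:
                # Last field works whether or not start/end times are present.
                missing.add(fields[-1])
    return sorted(missing)
```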
CONVM
- ERROR [+7035] Failed to find macroname .SAT+dec_feat3
  Make sure that your $spkr in scripts/Config.pm is correct.
SEMIT
PGENS