Voice Training
Voice Training Checklist
Speech lab students: please see the voice
training checklist to make sure everything is in place before
running voice training.
Running the Training
Once all the data is in place, run this from the top-level voice
directory:
perl scripts/Training.pl scripts/Config.pm > train.log 2> err.log
This typically takes a long time and is best run inside a screen
session. If you want it to email you when it is done, so you don't have
to keep checking on it, you can run it like this using the mail command:
( perl scripts/Training.pl scripts/Config.pm > train.log 2> err.log ; echo "email content" | mail -s "email subject" your@email.address )
If the training fails before it is finished, you can see how far it
got:
grep "Start " train.log
These steps match the switches at the end of scripts/Config.pm. Once
you have debugged the problem, you can pick up the voice training where
it left off, rather than starting over from the beginning: in
scripts/Config.pm, switch off the steps that already completed
successfully by changing their 1s to 0s.
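The grep-and-edit routine above can be scripted. Below is a minimal Python sketch with hypothetical function names. It assumes that each switch at the end of scripts/Config.pm sits on its own line in the form `$MKEMV = 1;` (optionally followed by a comment); check your own Config.pm first. The last "Start " line in train.log is the step that was running when training stopped, so the steps to switch off are the ones before it. The log lines describe steps in words, so you still decide which switch names to pass in.

```python
import re


def started_steps(log_text):
    """Return the text after "Start " for each line of train.log that has it.

    Mirrors `grep "Start " train.log`; the last entry is the step that was
    running when training stopped.
    """
    steps = []
    for line in log_text.splitlines():
        idx = line.find("Start ")
        if idx != -1:
            steps.append(line[idx + len("Start "):].strip())
    return steps


def switch_off(config_text, step_names):
    """Rewrite `$NAME = 1;` as `$NAME = 0;` for each given switch name.

    Assumption: every switch in Config.pm is on its own line in the form
    `$NAME = 1;`, optionally followed by a comment, which is left untouched.
    """
    for name in step_names:
        pattern = r"^(\s*\$" + re.escape(name) + r"\s*=\s*)1(\s*;)"
        config_text = re.sub(pattern, r"\g<1>0\g<2>", config_text,
                             flags=re.MULTILINE)
    return config_text
```

For example, after confirming that MKEMV and IN_RE finished, `switch_off(text, ["MKEMV", "IN_RE"])` returns the Config.pm text with only those two switches set to 0; write it back only after reviewing the result.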
You will know the voice has trained successfully when there are .wav
files in the gen/qst001/ver1/hts_engine/ directory.
Voice Output
Test synthesis utterances can be found under gen/qst001/ver1.
hts_engine contains utterances synthesized using hts_engine, and 1mix,
2mix, and stc contain utterances synthesized using either SPTK or
STRAIGHT.
The different speech waveform generation methods are as follows (from
this thread):
- 1mix
  - Distributions: Single Gaussian distributions
  - Variances: Diagonal covariance matrices
  - Engine: HMGenS
- 2mix
  - Distributions: Multivariate Gaussian distributions (2 mixture
    components)
  - Variances: Diagonal covariance matrices
  - Engine: HMGenS
- stc
  - Distributions: Single Gaussian distributions
  - Variances: Semi-tied covariance matrices
  - Engine: HMGenS
- hts_engine
  - Distributions: Single Gaussian distributions
  - Variances: Diagonal covariance matrices
  - Engine: hts_engine
Errors and Solutions
- MKEMV
  - If this step doesn't output anything: this happened when I had a
    different Config.pm already on my Perl library path (in my
    .cpanplus directory) that it was pulling in instead. Fix this by
    forcing a path to the Config.pm that you want to use, e.g.
    ./Config.pm.
- IN_RE
  - ERROR [+6510] LOpen: Unable to open label file
    /local/users/ecooper/voices/bdc/data/cmp/h1r/cu_us_bdc_h1r_0001.lab
    You have the wrong paths in data/labels/full.mlf and/or
    data/labels/mono.mlf.
  - ERROR [+2121] HInit: Too Few Observation Sequences [0]
    FATAL ERROR - Terminating program /proj/speech/tools/HTS/htk/bin/HInit
    Error in /proj/speech/tools/HTS/htk/bin/HInit -A -C /local2/ecooper/datasel/short15/configs/trn.cnf -D -T 1 -S /local2/ecooper/datasel/short15/data/scp/train.scp -m 1 -u tmvw -w 5000 -H /local2/ecooper/datasel/short15/models/qst001/ver1/cmp/init.mmf -M /local2/ecooper/datasel/short15/models/qst001/ver1/cmp/HInit -I /local2/ecooper/datasel/short15/data/labels/mono.mlf -l zh -o zh /local2/ecooper/datasel/short15/proto/qst001/ver1/state-5_stream-4_mgc-105_lf0-3.prt
    This happened when we were training a voice on the 15 minutes of
    shortest utterances. These utterances were mostly empty, and there
    probably weren't enough good examples of many phonemes. This error
    basically just means there isn't enough good data to train a voice.
    "Good" examples are examples that contain enough frames to model
    with a 5-state HMM: if an example has fewer than 5 frames, it can't
    be used to train the model. Adding more examples of a phoneme won't
    necessarily help unless they are at least 5 frames long. Cf.
    http://hts.sp.nitech.ac.jp/hts-users/spool/2007/msg00232.html
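To see which phonemes lack usable examples before (re)starting training, you can count, for each phoneme, how many instances in the mono labels are at least 5 frames long. This is a minimal Python sketch with a hypothetical function name. It assumes HTK-style label lines of the form `start end phone` (times in 100 ns units) and a 5 ms frame shift (50000 units); the frame shift is an assumption, so set it to match your configs.

```python
from collections import defaultdict


def usable_examples(label_texts, frame_shift=50000, min_frames=5):
    """Count, per phoneme, the instances long enough for a min_frames-state HMM.

    label_texts: iterable of label file contents, one "start end phone" per
    line, times in HTK 100 ns units. frame_shift=50000 (5 ms) is an assumed
    default. Returns {phone: (usable_count, total_count)}.
    """
    usable = defaultdict(int)
    total = defaultdict(int)
    for text in label_texts:
        for line in text.splitlines():
            fields = line.split()
            if len(fields) < 3:
                continue  # skip lines without timing information
            start, end, phone = int(fields[0]), int(fields[1]), fields[2]
            total[phone] += 1
            if (end - start) // frame_shift >= min_frames:
                usable[phone] += 1
    return {p: (usable[p], total[p]) for p in total}
```

For example, `[p for p, (u, t) in counts.items() if u == 0]` lists the phonemes that have no usable example at all, which are the likely cause of "Too Few Observation Sequences".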
- MMMMF
  - If the training just hangs at this step and doesn't continue, and
    running top shows that nothing is running, pressing Ctrl-C to kill
    the training shows this:
    Error in /proj/speech/tools/HTS/htk/bin/HHEd -A -B -C /local2/chevlev/datasel/F0meanMiddle15min/configs/trn.cnf -D -T 1 -p -i -d /local2/chevlev/datasel/F0meanMiddle15min/models/qst001/ver1/cmp/HRest -w /local2/chevlev/datasel/F0meanMiddle15min/models/qst001/ver1/cmp/monophone.mmf /local2/chevlev/datasel/F0meanMiddle15min/edfiles/qst001/ver1/cmp/lvf.hed /local2/chevlev/datasel/F0meanMiddle15min/data/lists/mono.list
    We never really figured out what this problem was, but we were able
    to train the voice successfully on a different machine with more
    memory, so it may have been a memory issue.
- ERST0
  - It just says error. This is probably also an out-of-memory issue.
- MN2FL
  - An error about not being able to allocate that much memory: we were
    running training on a 32-bit machine. The same voice trained
    successfully on a 64-bit machine.
  - Error in /proj/speech/tools/HTS/htk/bin/HHEd -A -B -C /proj/tts/voices/fisher/overarticulated_one_2_copy/configs/trn.cnf -D -T 1 -p -i -H /proj/tts/voices/fisher/overarticulated_one_2_copy/models/qst001/ver1/cmp/monophone.mmf -w /proj/tts/voices/fisher/overarticulated_one_2_copy/models/qst001/ver1/cmp/fullcontext.mmf /proj/tts/voices/fisher/overarticulated_one_2_copy/edfiles/qst001/ver1/cmp/m2f.hed /proj/tts/voices/fisher/overarticulated_one_2_copy/data/lists/mono.list
    When you run the command by itself, it just says Killed. This is
    also suspected to be an out-of-memory error, because the same voice
    trained successfully on a machine with more memory.
- ERST1
  - Processing Data: f1a_0001.cmp; Label f1a_0001.lab
    ERROR [+7321] CreateInsts: Unknown label pau
    This happened because our full.mlf path was pointing to the mono
    labels.
  - Error in /proj/speech/tools/HTS/htk/bin/HERest (followed by the rest
    of the command line). Running the command by itself just terminates
    with Killed. Suspected out of RAM. More info and suggestions on this
    from the mailing list:
    http://hts.sp.nitech.ac.jp/hts-users/spool/2014/msg00146.html
  - ERROR [+5010] InitSource: Cannot open source file fullcontextlabel
    Something is probably missing from one of your lists under
    data/lists. Make sure those were made properly, and re-make them if
    necessary. When you restart the training, you will actually need to
    start from the previous step, MN2FL.
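Both the IN_RE "Unable to open label file" error and the "Unknown label pau" error under ERST1 come down to where data/labels/full.mlf and data/labels/mono.mlf point. The Python sketch below (hypothetical function names) checks this. It assumes the MLF maps labels to a directory with lines like `"*/*.lab" -> "/path/to/labels/full"` rather than listing labels inline. It also uses a heuristic: full-context labels contain "/A:", as in the PGEN1 example further down, while mono labels do not.

```python
import os
import re

# Assumed MLF directory-mapping line: "pattern" -> "directory"
MAPPING = re.compile(r'^\s*"([^"]+)"\s*->\s*"([^"]+)"')


def label_dirs(mlf_text):
    """Return the directories named in `"pattern" -> "dir"` lines of an MLF."""
    return [m.group(2) for m in map(MAPPING.match, mlf_text.splitlines()) if m]


def looks_full_context(label_text):
    """Heuristic: full-context labels contain "/A:"; mono labels do not."""
    return "/A:" in label_text


def check_mlf(mlf_path, expect_full):
    """Return a list of human-readable problems with an MLF's label directories."""
    with open(mlf_path) as f:
        dirs = label_dirs(f.read())
    if not dirs:
        return [f"{mlf_path}: no directory mapping lines found (labels may be inline)"]
    problems = []
    for d in dirs:
        if not os.path.isdir(d):
            problems.append(f"{mlf_path}: directory does not exist: {d}")
            continue
        labs = sorted(n for n in os.listdir(d) if n.endswith(".lab"))
        if not labs:
            problems.append(f"{mlf_path}: no .lab files in {d}")
            continue
        # Inspect one label file to see which kind of labels the directory holds.
        with open(os.path.join(d, labs[0])) as f:
            full = looks_full_context(f.read())
        if full != expect_full:
            kind = "full-context" if expect_full else "mono"
            problems.append(f"{mlf_path}: expected {kind} labels in {d}")
    return problems
```

You would call `check_mlf("data/labels/full.mlf", True)` and `check_mlf("data/labels/mono.mlf", False)`; an empty list means nothing obviously wrong was found.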
- CXCL1
  - ERROR [+2661] LoadQuestion: Question name RR-Vowel invalid
    I had a typo in my custom questions file: a duplicate definition for
    question RR-Vowel when the second one was supposed to be RR-Vowel_R.
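A duplicate question name like this is easy to catch before training. This is a minimal Python sketch with a hypothetical function name. It assumes each question in the questions file is written as `QS "Name" {pattern,...}` at the start of its own line.

```python
import re
from collections import Counter

# Assumed questions-file line format: QS "Name" {pattern,...}
QS_LINE = re.compile(r'^\s*QS\s+"([^"]+)"')


def duplicate_questions(questions_text):
    """Return, sorted, the question names defined more than once."""
    names = [m.group(1) for m in map(QS_LINE.match, questions_text.splitlines()) if m]
    return sorted(name for name, n in Counter(names).items() if n > 1)
```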
- FALGN
- MCDGV
  - MCDGV: making data, labels, and scp from fae_0210.lab for GV...Cannot open
    No such file or directory at Training.pl line 1216, <SCP> line 210.
    Every .cmp file in train.scp needs to have a corresponding .lab file
    in gv/qst001/ver1/fal/. These files are created by an alignment step
    during training. If some utterance is too difficult to align (often
    because of background noise or errors in the transcription), its
    .lab file just won't get created. There are two possible fixes for
    this:
    - Remove the utterances that couldn't be aligned from your
      train.scp: check which .lab files are missing from that directory,
      then delete the corresponding lines in train.scp. Then pick up the
      training from this step ($MCDGV). Speech lab students: we have
      built this into the voice training recipe.
    - Force the training to accept a bad alignment by increasing the
      beam width ($beam in Config.pm). You can set it to 0 to disable
      the beam (i.e. an infinite beam). Note that wider beams consume
      more memory. More info here:
      http://hts.sp.nitech.ac.jp/hts-users/spool/2014/msg00119.html
      Pick up the training from the step right before this one, $FALGN.
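The first fix (removing utterances that couldn't be aligned) can be sketched in Python as below; the function name is hypothetical. It assumes train.scp holds one .cmp path per line and that the matching label is `<basename>.lab` in the alignment directory, e.g. gv/qst001/ver1/fal/.

```python
import os


def split_scp(scp_lines, fal_dir):
    """Split train.scp entries by whether a matching .lab exists in fal_dir.

    For each .cmp path, looks for <basename>.lab in fal_dir.
    Returns (kept, dropped), two lists of paths; blank lines are ignored.
    """
    kept, dropped = [], []
    for line in scp_lines:
        path = line.strip()
        if not path:
            continue
        stem = os.path.splitext(os.path.basename(path))[0]
        lab = os.path.join(fal_dir, stem + ".lab")
        (kept if os.path.isfile(lab) else dropped).append(path)
    return kept, dropped
```

Keep a backup of the original train.scp, write `kept` back one path per line, and then restart from $MCDGV as described above.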
- TMSPF
  - It gets stuck here, writing to mspf/stats/nat/mgc_dim0.data
    indefinitely until your drive fills up.
    Check your questions file and make sure that all of the phonemes in
    your phoneset are represented there.
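A rough way to check this is to collect every phone name that appears in the question patterns and compare it with your phoneset (for example, the lines of data/lists/mono.list). The Python sketch below uses a hypothetical function name and a heuristic: it assumes `QS "Name" {pattern,...}` lines and splits each pattern on the wildcard `*` and the context delimiters `^ - + = @` seen in the full-context labels. Phone names containing those characters would confuse it.

```python
import re

# Assumed questions-file line format: QS "Name" {pattern,...}
QS_PATTERNS = re.compile(r'^\s*QS\s+"[^"]+"\s*\{([^}]*)\}')


def missing_phones(phones, questions_text):
    """Return the phones that never appear in any question pattern."""
    seen = set()
    for line in questions_text.splitlines():
        m = QS_PATTERNS.match(line)
        if not m:
            continue
        for pattern in m.group(1).split(","):
            # e.g. "*-a+*" -> ["", "", "a", "", ""]
            seen.update(re.split(r"[*^\-+=@]", pattern.strip()))
    return [p for p in phones if p not in seen]
```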
- PGEN1
  - Generating Label alice01.lab
    ERROR [+9935] Generator: Cannot find duration model
    X^x-pau+ae=l@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:1+1+2/D:0_0/E:x+x@x+x&x+x#x+x/F:content_2/G:0_0/H:x=x^1=10|0/I:19=12/J:79+57-10
    in current list
    Your data/lists/full_all.list does not include information from this
    gen label, possibly because you added gen labels after creating
    full_all.list. Re-make full_all.list to include all gen labels.
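Before synthesis, you can list the full-context labels from your gen labels that are missing from full_all.list. This is a minimal Python sketch with a hypothetical function name. It assumes full_all.list holds one label per line, and that each gen label line ends with the full-context label (optionally preceded by start and end times).

```python
def missing_from_list(gen_label_texts, full_all_text):
    """Return gen-label contexts absent from full_all.list, in first-seen order.

    Assumes the full-context label is the last whitespace-separated field
    of each gen label line.
    """
    known = {line.strip() for line in full_all_text.splitlines() if line.strip()}
    missing = []
    for text in gen_label_texts:
        for line in text.splitlines():
            fields = line.split()
            if fields and fields[-1] not in known and fields[-1] not in missing:
                missing.append(fields[-1])
    return missing
```

A non-empty result means full_all.list needs to be re-made to include those gen labels.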
- CONVM
  - ERROR [+7035] Failed to find macroname .SAT+dec_feat3
    Make sure that your $spkr in scripts/Config.pm is correct.
- SEMIT
- PGENS