Merlin Instructions and Troubleshooting
Before You Get Started
- Install Merlin from GitHub:
https://github.com/CSTR-Edinburgh/merlin
Do not install any of the dependencies from pip. The dependencies are
already installed on Speech Lab machines kata, paka, muur, and
iring. If you install your own versions, they will be the wrong
versions and Merlin will not work. Please only run the compilation
step from the installation instructions (compile_tools.sh).
- Only one process may be running on a given GPU at a time.
Please check that no one is using the GPU on the machine you are
working on. You can check by
running merlin/src/gpu_lock.py. You can also see if anyone
is logged into the machine (and possibly about to start using the
GPU) by running who. If someone is already
using the GPU, then your process will run on CPU and take much
longer. Other speech lab students may also be using the GPU but not
with Merlin, so in general, always check who to see if anyone else
is using the machine.
- Always make sure your GPU lock has been released when you are done.
If your voice trains successfully, then the lock should get released
automatically. If you kill the process manually or if it errors
out, then it is possible that the lock will not get released. Check
and release manually by using gpu_lock.py.
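For example, to check the lock status and manually free a lock (the --free flag here is from the lock script bundled with Merlin; run the script with no arguments to see its current status report, and check its usage message if the flags differ in your version):
python merlin/src/gpu_lock.py
python merlin/src/gpu_lock.py --free 0
The first command reports which GPU boards are locked and by whom; the second releases the lock on board 0.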
Speech Lab GPU Machines
Our GPU machines in lab are kata, paka, muur, iring, and hecate.
- If you want to use hecate, you will need an account on the
machine. Please ask Rose if you need one.
- If you are using paka, muur, or iring, the device setting in
your recipename/scripts/submit.sh should be:
gpu$gpu_id
- If you are using kata, the device setting in
your recipename/scripts/submit.sh should be:
cuda$gpu_id
This is because kata has a newer version of the GPU drivers than the
other machines.
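In both cases the device value is the part you change in the THEANO_FLAGS line of submit.sh, which looks like this:
THEANO_FLAGS="mode=FAST_RUN,device=gpu$gpu_id,"$MERLIN_THEANO_FLAGS
on paka, muur, and iring, and like this on kata:
THEANO_FLAGS="mode=FAST_RUN,device=cuda$gpu_id,"$MERLIN_THEANO_FLAGS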
Merlin Instructions:
- Check that the GPU on the machine you are using is currently
free by running python merlin/src/gpu_lock.py
- Navigate to egs/build_your_own_voice/s1
- Run ./01_setup.sh voice_name
- Navigate to the experiments/voice_name directory. You should see
directories labeled duration_model and acoustic_model.
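For example (my_voice is a placeholder for whatever name you chose):
./01_setup.sh my_voice
ls experiments/my_voice
acoustic_model  duration_model  test_synthesis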
- Go to duration_model/data and create a text file with the
filenames of every utterance you want to train on, one per line with
no file extensions. Name this file file_id_list.scp. (If you're
training on a subset, you can probably get this file by copying over
a file we've already made of filenames. If you're training on the
whole corpus, you can use the command "ls [directory containing
label files] | sed 's/.\{4\}$//' > file_id_list.scp".)
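For example, if your label files are utt0001.lab, utt0002.lab, and so on (hypothetical names), file_id_list.scp should contain:
utt0001
utt0002
...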
- Using the script
merlin/misc/scripts/frontend/utils/normalize_lab_for_merlin.py,
normalize the label files and create a directory of labels inside
the data directory named label_phone_align. It takes as command line
arguments the input directory of label files, the output directory,
the label style (which will be phone_align), and the text file with
the filenames.
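For example, run from duration_model/data (the bracketed path is a placeholder; argument order is as described above):
python merlin/misc/scripts/frontend/utils/normalize_lab_for_merlin.py [dir_of_raw_labels] label_phone_align phone_align file_id_list.scp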
- Copy the label_phone_align directory and the file_id_list.scp text file to acoustic_model/data.
- Run ./03_prepare_acoustic_features.sh [path_to_wav_dir] [path_to_feat_dir] in merlin/egs/build_your_own_voice/s1
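In the demos the feature directory is the voice's acoustic_model/data directory, so the call typically looks like this (paths illustrative):
./03_prepare_acoustic_features.sh [path_to_wav_dir] experiments/[voice_name]/acoustic_model/data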
- Go to experiments/[voice_name]/test_synthesis, add your own test files, and list their names in test_id_list.scp.
- Then, make a directory within test_synthesis named prompt-lab
containing normalized label files for your test utterances. Because
Merlin's normalization script requires timestamps and our test label
files don't have them, first use the Python script
/proj/tts/examples/addtimestamps.py, which takes the input directory
of the label files, the output directory for the label files, and
the text file with the list of filenames as command line arguments. Once
you've output those label files, use the same normalization script
you used to set up your training data.
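For example (bracketed paths are placeholders; argument order as described above):
python /proj/tts/examples/addtimestamps.py [test_lab_dir] [timestamped_lab_dir] test_id_list.scp
python merlin/misc/scripts/frontend/utils/normalize_lab_for_merlin.py [timestamped_lab_dir] prompt-lab phone_align test_id_list.scp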
- Return to the s1 directory and open up the file
conf/global_settings.cfg and edit the Train, Valid, and Test values
to be the sizes of your training, validation, and test sets. (You
can check the total size of your corpus by running the wc
command on the file_id_list.scp file you've created. I
generally follow the demos and make the test and validation sets
each 1/10 the size of the training set; that is, 5/6 of the
corpus is training, 1/12 is validation, and 1/12 is test.) You will
also need to edit QuestionFile to point to the question file
associated with your language. Question files in Merlin are located
in merlin/misc/questions.
- Also in global_settings.cfg, change the label style setting to match your own labels (phone_align if you normalized them as above).
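For example, for a 1,200-utterance corpus split as above (5/6 training, 1/12 each for validation and test), the edited values would look something like this (variable names follow the demo configs, and the question file shown is the standard English one shipped with Merlin; substitute your own):
Train=1000
Valid=100
Test=100
QuestionFile=questions-radio_dnn_416.hed
Labels=phone_align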
- Run scripts 04 through 07 in merlin/egs/build_your_own_voice/s1.
- Synthesized wav files can be found in
experiments/[your_voice]/acoustic_model/gen for the validation and
test portions of your training corpus and in
experiments/[your_voice]/test_synthesis/wav.
Things to check if you get an error:
- Are any of your data files (either label files or acoustic
features) empty? The latter can be caused by the feature extraction
script throwing errors (even if the wav files aren't empty). In
these cases, you can either try regenerating the relevant file or
simply remove that utterance from the file id list.
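A quick way to find empty files (adjust the paths to your own setup):
find experiments/[voice_name]/acoustic_model/data -type f -size 0
find experiments/[voice_name]/duration_model/data -type f -size 0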
- Are the values for the number of utterances correct in
conf/global_settings.cfg? Double check this, especially if you've
removed utterances due to other problems.
- Are your labels normalized for Merlin? If you're getting a lot of
"silence not found" warnings, you may have forgotten to run the
normalization script on your labels (since Merlin indicates silence
differently than in HTS). The quick way to check this is to open up your
label file and check if silence is indicated using "sil" (correct
for Merlin) or "pau" (needs to be normalized).
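For example, assuming standard HTS-style context labels where the current phone sits between "-" and "+", this lists any files that still use "pau" (directory name is illustrative):
grep -l -e '-pau+' label_phone_align/*.lab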
- Are you using the most recent version of the label files? If you're
getting unintelligible voices, you may be using an old version of
the label files that is missing a feature. Open up a label file and
check what the last feature before "/C:" is in a random line. If
it's a vowel, you're good to go. If it's 0, regenerate the labels
using the latest version of festival.
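One quick way to eyeball this (the file name is a placeholder):
sed 's|/C:.*||' label_phone_align/[some_file].lab | head
The last feature printed at the end of each line is the one to check.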
- Are your label files aligned properly? If you're getting errors about
mismatched numbers of frames, you are probably using label files
generated from either misaligned utterances or an older alignment
script. Again, regenerate the labels (using the latest version of
festival and ehmm alignment).
- Is validation loss increasing on every iteration after the first 5?
Merlin doesn't output the model during the first 5 rounds of
training, and it doesn't update the model if validation loss
increases, so in some rare cases, if validation loss increases every
time after the first 5 iterations, the acoustic model will finish
training without ever outputting a model. If this is the case, you
can edit the source code so that it's forced to output a model
anyway. (The easiest way to do this is probably to edit the code so
it saves the best model even in the first 5 iterations, which you
can modify around line 325 in run_merlin.py.) However, if validation
loss is increasing that much, it's likely the voice won't be very
good anyway, so you should consider going back and checking for
errors in your data instead.
- If you get a MemoryError during acoustic model
training, you can set the buffer size smaller
in conf/acoustic_voicename.conf. However, don't set
it too low, or you will get a "ValueError: could not
broadcast input array" instead. I have found that a buffer
size of 10000 avoids both errors, but it may depend on your data.
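The setting is a single line in the voice's acoustic config (typically under the [Data] section, though section placement may differ between Merlin versions):
buffer_size: 10000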
Miscellaneous Tips:
- If you need to free up some space, the following directories under
experiments/voicename/[acoustic,duration]_model/inter_module can be
deleted, and the voice will still be able to synthesize new
utterances:
- binary_label_425
- nn_mgc_lf0_vuv_bap_187
- nn_norm_mgc_lf0_vuv_bap_187
- nn_no_silence_lab_425
- nn_no_silence_lab_norm_425
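For example, to delete these from the acoustic model (run from experiments/voicename, and repeat for duration_model, skipping any directory that doesn't exist there; double-check paths before deleting):
rm -r acoustic_model/inter_module/{binary_label_425,nn_mgc_lf0_vuv_bap_187,nn_norm_mgc_lf0_vuv_bap_187,nn_no_silence_lab_425,nn_no_silence_lab_norm_425}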
- Does your voice just sound a lot worse than you'd expect, given
your data? Check your label files. Is the "syllable vowel" feature
correct? If not, then Festival didn't know which phonemes in your
phoneset were vowels. Are there symbols in your phoneset that are
also delimiter symbols in the label file format? This will break
things. Also make sure that the phoneset you are using in your
training labels matches the phoneset used in the test synthesis labels.
- If you are running the slt_arctic demo and you see an
error about pygpu, then go into this
file: merlin/egs/slt_arctic/s1/scripts/submit.sh and change
this line:
THEANO_FLAGS="mode=FAST_RUN,device=cuda$gpu_id,"$MERLIN_THEANO_FLAGS
Change cuda$gpu_id to gpu$gpu_id and
the error should no longer appear.
Adding In Phrasing:
If you have information about where the phrase breaks are in your file, you can do the following to train your voice to incorporate this phrasing.
- Add an extra feature to each line of each of your label files. Add "/K:" to the end of each line, followed by "B" if the line corresponds to a phone in the last word before a phrase break or "NB" if it does not.
- Add the following line to your questions file:
QS "Word_Brk" {/K:B}
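With this added, the tail of each label line goes from something like this (the context values shown are illustrative):
.../J:13+9-2
to:
.../J:13+9-2/K:B
for phones in the last word before a phrase break, and /K:NB otherwise.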
Last updated 10/31/2018 by ecooper
Speech lab students: to edit this page, go to /proj/speech/html/merlin.html