See this reference page for a list of the Babel languages and their language codes.
export PATH=/proj/tts/tools/babel_scripts/build/festival/bin:$PATH
export BABELDIR=/proj/tts/data/babeldir
export ESTDIR=/proj/tts/tools/babel_scripts/build/speech_tools
export FESTVOXDIR=/proj/tts/tools/babel_scripts/build/festvox
export SPTKDIR=/proj/tts/tools/babel_scripts/build/SPTK
Then make sure the Babel language pack you want (e.g. BABEL_BP_105 for Turkish) is present in $BABELDIR. If it is not, symlink it in.
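The check-and-symlink step can be sketched as a small shell function. The function name ensure_babel_lang and the source path in the usage line are hypothetical; only $BABELDIR and the pack name BABEL_BP_105 come from the text above.

```shell
# Hypothetical helper: make sure a Babel language pack is visible under $BABELDIR.
# $1 = pack name (e.g. BABEL_BP_105)
# $2 = absolute path to the directory where the pack actually lives
ensure_babel_lang() {
    pack=$1
    src=$2
    if [ -e "$BABELDIR/$pack" ]; then
        echo "present: $BABELDIR/$pack"
    elif [ -d "$src" ]; then
        # An absolute target keeps the link valid from any working directory.
        ln -s "$src" "$BABELDIR/$pack" && echo "linked: $BABELDIR/$pack -> $src"
    else
        echo "not found: $src" >&2
        return 1
    fi
}
```

Usage would be something like ensure_babel_lang BABEL_BP_105 /path/to/BABEL_BP_105, where the second argument is a placeholder for wherever the data is stored on your system.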
cd /proj/tts/tools/babel_scripts
mkdir turkish_omniglot
cd turkish_omniglot
Then run the following as a single command (the trailing backslashes continue it across lines):
/proj/tts/tools/babel_scripts/make_build \
    setup_voice turkish_omniglot \
    $BABELDIR/BABEL_BP_105/conversational/reference_materials/lexicon.txt \
    $BABELDIR/BABEL_BP_105/conversational/training/transcription \
    $BABELDIR/BABEL_BP_105/conversational/training/audio
Create a txt.done.data file under etc/ containing the transcripts. Each line pairs an utterance ID (the audio filename without its extension) with its transcript. It should look something like this:
( uniph_0001 "a whole joy was reaping." )
( uniph_0002 "but they've gone south." )
( uniph_0003 "you should fetch azure mike." )
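A malformed line in txt.done.data can be hard to spot by eye, so a quick format check may help. The sketch below assumes every line follows the pattern shown above exactly, i.e. ( id "text" ) with single spaces and no trailing whitespace. The function name bad_prompt_lines is hypothetical.

```shell
# Print "lineno: line" for every line that is not of the form ( id "text" ).
# $1 = path to txt.done.data
bad_prompt_lines() {
    awk '!/^[(] [^ "()]+ ".*" [)]$/ { printf "%d: %s\n", NR, $0 }' "$1"
}
```

Running bad_prompt_lines etc/txt.done.data prints nothing when every line matches. Lines with Windows line endings or trailing spaces are reported too, since they do not match the pattern.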
We recommend running this command in an Emacs shell, since it appears to handle UTF-8 best and causes the fewest problems.
This step will reveal words that are not in the lexicon. Get a list of those words and use Sequitur G2P or Phonetisaurus to generate their pronunciations.
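One way to get that list without reading through the build output is to compare the transcripts against the lexicon directly. This is only a sketch under stated assumptions: the lexicon is tab-separated with the word in the first column, words in the transcripts are separated by whitespace, and the only punctuation to strip is . , ? ! at word edges. The function name oov_words is hypothetical, and no case folding is done.

```shell
# Print each transcript word that has no entry in the lexicon, one per line.
# $1 = txt.done.data
# $2 = lexicon file, word in the first tab-separated column (assumed layout)
oov_words() {
    awk -F'\t' '
        FILENAME == ARGV[1] { lex[$1] = 1; next }   # first file: remember lexicon words
        {
            line = $0
            sub(/^[^"]*"/, "", line)                # drop everything up to the opening quote
            sub(/"[^"]*$/, "", line)                # drop the closing quote and what follows
            n = split(line, w, /[ \t]+/)
            for (i = 1; i <= n; i++) {
                gsub(/^[.,?!]+|[.,?!]+$/, "", w[i]) # strip punctuation at word edges
                if (w[i] != "" && !(w[i] in lex)) print w[i]
            }
        }
    ' "$2" "$1" | sort -u
}
```

The output can be redirected to a file, e.g. oov_words etc/txt.done.data lexicon.txt > oov.txt, and used as the word list for the G2P tool. How that list is passed to Sequitur G2P or Phonetisaurus depends on the tool and is not shown here.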
For syllabification, there are a few options:
For the stress markers (the number at the end of each syllable unit, currently always 0), we continue to use 0 for now. Stress information is available in the Babel lexicon, but it is not currently incorporated.
Once you have both the pronunciations and the syllabifications for the OOV words, add them to festvox/lex.scm in the proper format, anywhere in the file. Then, use this Festival command to sort them into the appropriate order and create the final lexicon:
cd yourvoicedirectory
$ESTDIR/../festival/bin/festival -b festvox/yourvoicename_phoneset.scm '(set! lex_syllabification nil)' '(lex.compile "festvox/lex.scm" "festvox/cmu_babel_lex.out")'
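As an illustration of the entry format, the sketch below builds one lexicon line from a word and its syllabified phones. It assumes lex.scm uses the usual Festival syllabified entry layout, ("word" nil (((phones) stress) ...)), with nil as the part of speech and 0 as every stress value. Check an existing entry in your own lex.scm before relying on this. The function name lex_entry and the "-" syllable separator are hypothetical choices.

```shell
# Print one Festival-style lexicon entry.
# $1 = word
# $2 = phones separated by spaces, with "-" between syllables, e.g. "m e r - h a - b a"
lex_entry() {
    printf '%s\n' "$2" | awk -v w="$1" '
    {
        n = split($0, syl, /[ ]*-[ ]*/)       # one array element per syllable
        out = ""
        for (i = 1; i <= n; i++) {
            gsub(/^ +| +$/, "", syl[i])       # trim stray spaces
            out = out (i > 1 ? " " : "") "((" syl[i] ") 0)"   # stress fixed at 0
        }
        printf "(\"%s\" nil (%s))\n", w, out
    }'
}
```

For example, lex_entry merhaba "m e r - h a - b a" prints ("merhaba" nil (((m e r) 0) ((h a) 0) ((b a) 0))). The phones here are made up for illustration and are not taken from the Babel lexicon. A phone name that itself contains "-" would be split incorrectly.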
Once all of the OOV pronunciations have been added, re-run the build_prompts command. If errors remain, fix them and re-run until none are reported.
If you get the error "Wave files are missing. Aborting ehmm.", compare the file names in txt.done.data with those in wav/. Something is likely missing or duplicated. We have also found that symlinked wavs sometimes cause this error. If you do symlink, use full paths, not relative paths.
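The comparison between txt.done.data and wav/ can be sketched as a shell function. It assumes the utterance ID is the second whitespace-separated field of each line and that each wav is named id.wav. The function name check_wavs is hypothetical.

```shell
# Report duplicate IDs, IDs without a usable wav, and wavs without an ID.
# $1 = txt.done.data
# $2 = wav directory
check_wavs() {
    ids=$(awk 'NF >= 2 { print $2 }' "$1")
    # IDs listed more than once in txt.done.data
    printf '%s\n' "$ids" | sort | uniq -d | sed 's/^/duplicate id: /'
    # IDs with no usable wav; [ -e ] follows symlinks, so a dangling link counts as missing
    printf '%s\n' "$ids" | sort -u | while IFS= read -r id; do
        [ -n "$id" ] || continue
        [ -e "$2/$id.wav" ] || echo "missing wav: $id"
    done
    # wavs (including dangling symlinks) that have no entry in txt.done.data
    for f in "$2"/*.wav; do
        [ -e "$f" ] || [ -L "$f" ] || continue
        id=$(basename "$f" .wav)
        printf '%s\n' "$ids" | grep -qxF -e "$id" || echo "extra wav: $id"
    done
}
```

Run it from the voice directory as check_wavs etc/txt.done.data wav. No output means the two sides agree. A relative symlink that no longer resolves shows up as a missing wav.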
After a successful run, the .utt files should be in festival/utts.
You will also need to do phone mapping for Merlin if any phoneme name coincides with a delimiter in the label file format. Do the mapping before converting to lab, then run lab normalization for Merlin.
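To find phones that need mapping, the sketch below flags any phone name containing a delimiter character. The delimiter set used here (^ - + = @ _ / : & # $ ! ; |) is an assumption based on the common HTS-style full-context label layout, so adjust it to the label format you actually use. It is also stricter than the condition above, because it flags names that merely contain a delimiter. The function name clashing_phones is hypothetical.

```shell
# Read phone names on stdin, one per line, and print those containing a
# character from the assumed delimiter set: ^ - + = @ _ / : & # $ ! ; |
clashing_phones() {
    # grep exits 1 when nothing matches; treat that as success.
    grep -e '[-^+=@_/:&#$!;|]' || true
}
```

Feed it a list of the phone names in your phoneset, one per line. Any name it prints is a candidate for renaming to a plain alphanumeric symbol in the phone map.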