Amharic
Questions File
The questions file for Amharic lives here:
/proj/tts/voices/merlin/misc/questions/questions_amharic_v3.hed
Babel Conversational Data
HTS baseline: /proj/tts/voices/babel/amharic
David has trained a Merlin baseline as well as many 4-hour subset
voices. These are located here:
/proj/tts/voices/merlin/egs/build_your_own_voice/s1/experiments
This data had the wav/sph issue that we encountered in a lot of the
Babel data. Use amharic_sphonly and amharic_sphtxt.
Some phonemes have > and < in their names, which breaks things
further down the pipeline, so these characters are replaced with
GT and LT in the names. The scripts originally did the renaming in
some places but not others, so we had to make sure the names were
changed everywhere. David created a patch to resolve this. Speech
lab students can find it under /proj/tts/examples/babel_scripts to
see how it was fixed, but the patch has already been applied to the
main script, so you don't need to do anything different.
Rare phonemes in loan words had to be merged with similar phonemes so
that each phoneme had enough data for HTS voice training.
The phoneme 1 was automatically marked as a
consonant in the phoneset.scm file. This results in
"novowel" for the "syllable vowel" feature when you do
utt->lab conversion. We corrected the entry by hand and then
reran the utt->lab conversion (remember to select the Amharic
voice for DUMPFEATS in the Makefile).
There were still some special-character phonemes, other than the
brackets, that had to be mapped to HTS-compatible phoneme names in
the utt phase. These were 1 -> one, @ -> at, and # -> wb.
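The renaming rules in the notes above can be sketched in a few lines of Python. This is only an illustration and not the patch under /proj/tts/examples/babel_scripts: the function names and the example phoneme "t>" are hypothetical, and it assumes that 1, @ and # are renamed only when they are the whole phoneme name, while > and < are replaced wherever they occur inside a name.

```python
# Whole-name replacements listed in the notes (1 -> one, @ -> at, # -> wb).
# Assumption: these apply only when the symbol is the entire phoneme name.
WHOLE_NAME_MAP = {"1": "one", "@": "at", "#": "wb"}

# Bracket characters are replaced wherever they occur inside a name.
BRACKET_MAP = {">": "GT", "<": "LT"}


def sanitize_phoneme(name):
    """Return the renamed form of a phoneme name (hypothetical helper)."""
    if name in WHOLE_NAME_MAP:
        return WHOLE_NAME_MAP[name]
    for char, replacement in BRACKET_MAP.items():
        name = name.replace(char, replacement)
    return name


def needs_renaming(name):
    """True if the name would be changed by sanitize_phoneme."""
    return sanitize_phoneme(name) != name


if __name__ == "__main__":
    # "t>" is a made-up example, not taken from the Amharic phoneset.
    for phoneme in ["a", "1", "@", "#", "t>"]:
        print(phoneme, "->", sanitize_phoneme(phoneme))
```

Running something like needs_renaming over every phoneme name that appears in the lexicon, phoneset, and label files is one way to confirm that the renaming was applied everywhere and not only in some stages.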
Babel Read Speech
Elshadai has trained some voices with this data, but they ended up
sounding a lot worse than we expected, and we're not sure why. She
even created a hand-selected set of high-quality utterances and re-aligned, but the output from the trained
voice still sounds worse than we'd expect. This voice (along with the
scp file saying which utterances were used) is here:
/proj/tts/tools/ebiru/merlin/egs/build_your_own_voice/s1/experiments/male_scripted_cleanv1
Audio Bible
This data was collected and segmented by David. Data can be found
under /proj/tts/data/amharic/bible.
Voices can be found
under /proj/tts/voices/merlin/egs/build_your_own_voice/s1/experiments/.
- amharic_bible_sample: trained on 5 hours of data.
Omniglot and Field Support
Collected by Olivia. There is probably not enough data here to build
a voice on its own, but we could potentially use it for adaptation
experiments.
Read Religious Texts
Data currently being prepared by Yishak.