Tuning F0 Extraction Ranges
Setting reasonable values for UPPERF0 and LOWERF0 for each speaker is
important: any f0 value outside that range is automatically treated as
unvoiced and is not extracted. If the range is too narrow for the
speaker, the resulting voice will sound "hoarse" because of
inappropriate devoicing. We have observed this especially when using
the STRAIGHT vocoder. Setting a very wide range is safe in this
respect, and it seems that many people do this. However, it is also
common advice not to set the f0 range wider than it has to be for that
speaker, so that the f0 extraction does not mistakenly bring in f0
values from background noise or produce aliasing errors. Using BURNC
data, we could not hear much difference between a voice trained on one
speaker with a wide f0 range and one trained with a tighter range, most
likely because that data does not contain background noise. There may
be more of a difference depending on your data.
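To make the effect of the range concrete, here is a minimal sketch of how a
range limit turns out-of-range frames into unvoiced frames. It is only an
illustration, not the code of any particular f0 extractor or vocoder: the
function names are hypothetical, and the use of 0.0 to mark an unvoiced frame
is an assumption.

```python
def apply_f0_range(f0_track, lower_f0, upper_f0, unvoiced=0.0):
    """Return a copy of f0_track with out-of-range frames marked unvoiced.

    f0_track is a list of per-frame f0 values in Hz. The `unvoiced` marker
    (0.0 here) is an assumption made for this sketch.
    """
    return [f0 if lower_f0 <= f0 <= upper_f0 else unvoiced for f0 in f0_track]


def devoiced_fraction(f0_track, lower_f0, upper_f0):
    """Fraction of originally voiced frames that the range would devoice."""
    voiced = [f0 for f0 in f0_track if f0 > 0]
    if not voiced:
        return 0.0
    lost = sum(1 for f0 in voiced if not (lower_f0 <= f0 <= upper_f0))
    return lost / len(voiced)


# Made-up example track: one unvoiced frame followed by four voiced frames.
track = [0.0, 95.0, 110.0, 150.0, 260.0]
limited = apply_f0_range(track, lower_f0=100.0, upper_f0=250.0)
lost = devoiced_fraction(track, lower_f0=100.0, upper_f0=250.0)
```

A large devoiced fraction for a candidate LOWERF0/UPPERF0 pair suggests the
range is too tight for that speaker.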
Here are some relevant threads from the hts-users mailing list:
Tuning F0 parameters
Someone using a very wide F0 range
Another wide F0 range and advice to adjust it
Using the Reaper F0 Extractor to Pick your F0 Range
We chose the Reaper F0 extraction tool because it does not require you
to set an initial F0 range for extraction, but you could use any f0 tool
and just start by setting a very wide range.
REAPER: Robust Epoch And Pitch EstimatoR
Speech lab students: our reaper installation is here:
/proj/tts/tools/REAPER/build/reaper
See the --help output for how to run it on a sample of a speaker's
audio (make sure the sample is reasonably large) and write the f0
output in ASCII format. Then, also under build, plug your f0 output
filename into f0_minmax_hist.py and run it. The script outputs the
minimum and maximum f0 values extracted by REAPER, as well as a
histogram of the f0 distribution, so that you can select a sensible f0
range that excludes outliers.
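If you do not have access to f0_minmax_hist.py, the same idea can be sketched
in a few lines of Python. This is not that script, only a rough stand-in under
stated assumptions: it assumes each data line of the ASCII f0 output has three
fields (time, voicing flag, f0 in Hz), that any header ends with a line reading
EST_Header_End, and that unvoiced frames carry a voicing flag of 0. Check these
against your own output file. The function names and the percentile cutoffs are
hypothetical choices for illustration.

```python
import math
from collections import Counter


def read_voiced_f0(lines):
    """Collect voiced f0 values (Hz) from ASCII f0 output lines.

    Assumed line layout: "<time> <voiced flag> <f0>". Header lines up to
    and including "EST_Header_End" are skipped when that marker is present.
    """
    lines = list(lines)
    in_header = any(line.strip() == "EST_Header_End" for line in lines)
    values = []
    for line in lines:
        stripped = line.strip()
        if in_header:
            if stripped == "EST_Header_End":
                in_header = False
            continue
        fields = stripped.split()
        if len(fields) != 3:
            continue
        try:
            f0 = float(fields[2])
        except ValueError:
            continue
        if fields[1] == "1" and f0 > 0:
            values.append(f0)
    return values


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def suggest_range(f0_values, low_pct=1.0, high_pct=99.0):
    """Report min/max plus a percentile-trimmed range that drops outliers."""
    if not f0_values:
        raise ValueError("no voiced f0 values found")
    ordered = sorted(f0_values)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "lower": percentile(ordered, low_pct),
        "upper": percentile(ordered, high_pct),
    }


def histogram(f0_values, bin_width=20.0):
    """Count voiced frames per f0 bin, keyed by the bin's lower edge in Hz."""
    counts = Counter(int(f0 // bin_width) * bin_width for f0 in f0_values)
    return dict(sorted(counts.items()))
```

To use it, read your f0 file with open(path).readlines(), pass the lines to
read_voiced_f0, and inspect the histogram before trusting the suggested lower
and upper values. The percentile cutoffs are only a starting point; the
histogram is what shows whether the extremes are real or extraction errors.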