Example soxi output for a 16 kHz .wav input file:
Input File : 'fae_0001.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:08.49 = 135872 samples ~ 636.9 CDDA sectors
File Size : 272k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
First, make sure these are on your $PATH:
/proj/tts/hts-2.3/SPTK-3.9/installation/bin/
/proj/tts/hts-2.3/speech_tools/bin
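One way to set this up for the current shell session is sketched below. It uses the two directories listed above and then checks for the four tools the conversion command calls; adjust the paths if your installation lives elsewhere.

```shell
# Prepend the SPTK and speech_tools binaries to PATH for this shell session.
# The two directories are the ones listed above; adjust them for your installation.
PATH="/proj/tts/hts-2.3/SPTK-3.9/installation/bin:/proj/tts/hts-2.3/speech_tools/bin:$PATH"
export PATH

# Warn about any tool used by the conversion command that still cannot be found.
for tool in ch_wave x2x interpolate ds; do
    command -v "$tool" >/dev/null 2>&1 || echo "warning: $tool not found on PATH" >&2
done
```

To make the change permanent, the same PATH line can go in your shell startup file.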
Next, assuming your audio is already in 16 kHz .wav format, use this command to convert it to the appropriate .raw format for HTS:
ch_wave -c 0 -F 32000 -otype raw in.wav | x2x +sf | interpolate -p 2 -d | ds -s 43 | x2x +fs > out.raw
See below for converting other formats.
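For more than a handful of files, the conversion command above can be wrapped in a small function and run over a directory. This is only a sketch: the function names wav_to_hts_raw and convert_dir are made up for illustration, and the pipeline itself is copied unchanged from the command above. As far as the SPTK reference describes these options, the chain goes from 32 kHz (ch_wave -F 32000) to 64 kHz (interpolate -p 2) to 48 kHz (ds -s 43, a 4:3 downsample); treat that reading as an assumption and check it against your SPTK version.

```shell
# wav_to_hts_raw IN.wav OUT.raw
# Runs the conversion pipeline from the text on a single file.
wav_to_hts_raw() {
    ch_wave -c 0 -F 32000 -otype raw "$1" \
        | x2x +sf \
        | interpolate -p 2 -d \
        | ds -s 43 \
        | x2x +fs > "$2"
}

# convert_dir WAV_DIR RAW_DIR
# Converts every .wav file in WAV_DIR, writing NAME.raw into RAW_DIR.
convert_dir() {
    mkdir -p "$2" || return 1
    for wav in "$1"/*.wav; do
        [ -e "$wav" ] || continue          # directory had no .wav files
        wav_to_hts_raw "$wav" "$2/$(basename "$wav" .wav).raw" || return 1
    done
}
```

With the functions defined, something like `convert_dir wav raw` would convert everything in a (hypothetical) wav/ directory into raw/.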
Errors and Solutions
To resample a .wav file to 16 kHz:
sox input.wav -r 16000 output.wav
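If you are not sure whether a file is already at 16 kHz, you can ask soxi for the sample rate first and resample only when needed. A minimal sketch, assuming soxi is installed alongside sox; the function name ensure_16k is hypothetical:

```shell
# ensure_16k IN.wav OUT.wav
# Copies IN.wav to OUT.wav unchanged when it is already 16000 Hz,
# otherwise resamples it with the sox command shown above.
ensure_16k() {
    rate=$(soxi -r "$1") || return 1       # soxi -r prints just the sample rate
    if [ "$rate" -eq 16000 ]; then
        cp "$1" "$2"
    else
        sox "$1" -r 16000 "$2"
    fi
}
```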
To convert a .sph file to .wav:
sox inputfile.sph outputfile.wav
This won't work, however, if the file is in the particular .sph format that uses
'shorten' compression. If that's the case, you'll see this error:
sph: unsupported coding `ulaw,embedded-shorten-v2.00'
In that case, you need to use the NIST tool sph2pipe:
sph2pipe -p [-c 1|2] infile outfile
The -p forces conversion to 16-bit linear PCM.
The -c picks channel 1 or 2 if you want to separate them, e.g. for speakers.
Speech lab students: we have a copy of this
under /proj/speech/tools/sph2pipe_v2.5
Even after you do this, the output still seems to retain a .sph header, so
you'll next have to use
sox infile outfile
to force convert it to regular .wav.
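The two steps above (sph2pipe, then sox) can be chained through a temporary file. This is a sketch under the assumptions stated in the text, not a tested recipe for every .sph variant; the function name sph_to_wav and the temporary file name stage.sph are made up for illustration.

```shell
# sph_to_wav IN.sph OUT.wav [CHANNEL]
# Step 1: sph2pipe -p decompresses to 16-bit linear PCM (still with a .sph header).
# Step 2: sox rewrites the result as a regular .wav file.
# CHANNEL (1 or 2) is optional and is passed to sph2pipe as -c.
sph_to_wav() {
    stage_dir=$(mktemp -d "${TMPDIR:-/tmp}/sph2wav.XXXXXX") || return 1
    stage="$stage_dir/stage.sph"
    status=0
    if [ -n "${3:-}" ]; then
        sph2pipe -p -c "$3" "$1" "$stage" || status=$?
    else
        sph2pipe -p "$1" "$stage" || status=$?
    fi
    if [ "$status" -eq 0 ]; then
        sox "$stage" "$2" || status=$?
    fi
    rm -rf "$stage_dir"                    # remove the intermediate file either way
    return "$status"
}
```

For example, `sph_to_wav call.sph speakerA.wav 1` would extract channel 1 of a (hypothetical) two-channel call.sph.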
Also, depending on the particular version of the .sph format, you sometimes need to run
sph2pipe -p -f wav in.sph out.wav
The sph2pipe options used here:
-p -- force conversion to 16-bit linear pcm
-f typ -- select alternate output header format 'typ'
five types: sph, raw, au, rif(wav), aif(mac)
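Since you usually can't tell in advance which kind of .sph file you have, one option is to try plain sox first and fall back to sph2pipe only when sox fails. A minimal sketch, assuming sox exits with a non-zero status when it hits the unsupported-coding error; the function name any_sph_to_wav is hypothetical:

```shell
# any_sph_to_wav IN.sph OUT.wav
# Tries plain sox first; if sox fails (for example with the
# "unsupported coding" error), falls back to sph2pipe with a .wav header.
any_sph_to_wav() {
    if sox "$1" "$2"; then
        return 0
    fi
    rm -f "$2"                             # discard any partial sox output
    sph2pipe -p -f wav "$1" "$2"
}
```

It is still worth checking the result with soxi afterwards, since some files may need the extra sox pass described above.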