Columbia Games Corpus » ToBI Labeling Guidelines
» 0. Files and Links
» 1. Getting started
» 2. Getting the annotation files
» 3. Correcting common mistakes in the .words files
» 4. Guidelines and examples
» 5. Labeling ambiguities
» 6. How to save and submit your work
0. Files and Links
ToBI Documentation
1. Getting started
- Go to the File Locking page.
- Look for the wav file you want to label and check the following:
its "Orthographic Transcription" and "Word Alignment" tasks must have "finished" status,
and its "ToBI Labeling" task must have "not started" status and no person assigned to it.
If these conditions are not met, do not work on this file.
- Click "EDIT" in the "ToBI Labeling" task of the wav file you want to label.
- On the next screen, change its status to "in progress", and enter your name.
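The eligibility conditions above can be written as a small check. This is only an illustrative sketch: the File Locking page is operated by hand, and the `tasks` dictionary and its `status` and `person` keys are hypothetical names chosen here, not part of any real interface.

```python
def can_start_tobi_labeling(tasks):
    """Return True if a wav file may be picked up for ToBI labeling.

    `tasks` is a hypothetical dict mapping a task name to a dict with
    "status" and "person" entries, mirroring the File Locking page.
    """
    # Both prerequisite tasks must be finished.
    for name in ("Orthographic Transcription", "Word Alignment"):
        if tasks.get(name, {}).get("status") != "finished":
            return False
    # The ToBI task must be untouched and unassigned.
    tobi = tasks.get("ToBI Labeling", {})
    return tobi.get("status") == "not started" and not tobi.get("person")
```

If the function returns False for a file, that file should be left alone, as stated above.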
2. Getting the annotation files
- In the "ToBI Labeling" column of the File Locking page,
under the status label, you will find
links to several files.
WaveSurfer
- Save a copy on your computer of the ".wav", ".words",
".breaks" and ".misc" files
(right-click on each of them and choose Save target as...,
Save link as..., or similar).
Important: Do not change the name of these files,
since they must match exactly the name of the .wav file.
Example: "s09.objects.1.A.wav", "s09.objects.1.A.words",
"s09.objects.1.A.breaks", "s09.objects.1.A.misc".
- Now you are ready to begin labeling with WaveSurfer.
Put all four files (".wav", ".words", ".breaks", ".misc") in the same
directory, and open the ".wav" file from WaveSurfer, choosing
the "ToBI-Games" configuration.
- The ".breaks" file corresponds to ToBI's breaks tier, and has one
X label at each word boundary. To change a label, click on the X
and replace it with a new label using the keyboard.
- The ".misc" file corresponds to the miscellaneous tier, with
labels for self-repairs, coughs, laughs and breaths. This tier
should already be labeled, but if you find incorrect labels,
feel free to correct them.
- The ".words" file corresponds to the orthographic tier. There should
be no need to modify this file, but feel free to update it if you
do not agree with the orthographic transcription or alignment.
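Before opening the ".wav" file in WaveSurfer, it can help to confirm that all four files sit in the same directory and share the same base name. The following is a minimal sketch of such a check; the function name `missing_annotation_files` is an assumption made for illustration and is not part of WaveSurfer or the corpus tools.

```python
from pathlib import Path

# The four files the WaveSurfer setup expects next to each other.
REQUIRED_EXTENSIONS = (".wav", ".words", ".breaks", ".misc")

def missing_annotation_files(wav_path):
    """Return the names of required files that are absent.

    Only the final ".wav" extension is swapped, so a name such as
    "s09.objects.1.A.wav" is checked against "s09.objects.1.A.words", etc.
    """
    wav = Path(wav_path)
    missing = []
    for ext in REQUIRED_EXTENSIONS:
        candidate = wav.with_suffix(ext)
        if not candidate.is_file():
            missing.append(candidate.name)
    return missing
```

An empty list means the four files are present with matching names; any name returned was probably renamed or not downloaded.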
Praat
- Save a copy on your computer of the ".wav" and ".TextGrid"
files
(right-click on each of them and choose Save target as...,
Save link as..., or similar).
Important: Do not change the name of the ".TextGrid" file,
since it must match exactly the name of the .wav file.
Example: "s09.objects.1.A.wav", "s09.objects.1.A.TextGrid".
- Now you are ready to begin labeling with Praat.
3. Correcting common mistakes in the .words files
In the orthographic tier you might find some common
mistakes. If you do, please correct them and submit the .words file along with the rest.
4. Guidelines and examples
Whispery or creaky voice
Label as much of this speech as possible, using X*?, X-? and X%? when you cannot tell the type of pitch accent
or phrasal tone.
HiF0 and pitch halving/doubling
If the HiF0 occurs in a region that is pitch halved or doubled, mark it as usual in the tones tier
and specify its actual value at the same position in the misc tier using the notation "HiF0:215".
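Anyone post-processing the misc tier will need to pick out these labels. A minimal sketch of reading the "HiF0:215" notation follows; the function name `parse_hif0_label` is hypothetical, and accepting a decimal value is an assumption, since the guidelines only show an integer.

```python
import re

# "HiF0:" followed by a number; the decimal part is an assumption,
# the guidelines only show an integer example ("HiF0:215").
HIF0_PATTERN = re.compile(r"^HiF0:(\d+(?:\.\d+)?)$")

def parse_hif0_label(label):
    """Return the value from a "HiF0:<value>" misc label, or None."""
    match = HIF0_PATTERN.match(label.strip())
    return float(match.group(1)) if match else None
```

Other misc labels, such as "interrupt", do not match the pattern and yield None.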
Restarts (1p and %r)
(We do not use the "isofrag" label.)
Interruptions
If the speaker you are labeling is interrupted by the other speaker, mark this in the misc tier with the special "interrupt" label.
- Folder with all examples
5. Labeling ambiguities
L- L% vs. !H- L%
In cases where it is not clear how best to indicate a gradual final
fall, use X-? L%. In the example, listen to the word "is".
TextGrid file
L* vs. !H*
Sometimes at the end of a series of downsteps it is hard to differentiate between L* and !H*.
In cases in which both pitch accents are acceptable, choose !H*. Look at the word "mime" in this
example.
TextGrid file
- Folder with all examples
6. How to save and submit your work
Once you have finished labeling a complete .wav file, follow these steps:
- WaveSurfer:
Send the new ".breaks", ".tones", and ".misc" files to agus [at] cs.columbia.edu,
maintaining the original file names. If you modified the ".words" file, submit it too.
Praat:
Send the new ".TextGrid" file to agus [at] cs.columbia.edu.
- Go to the File Locking page.
- Click "EDIT" in the "ToBI Labeling" task of the wav file you have labeled.
- On the next screen, change the task's status to "finished", and enter the current date.