Columbia Games Corpus » Orthographic Alignment Guidelines
» 0. Files and Links
» 1. Getting started
» 2. How to download files
» 3. How to start a new alignment session
» 4. How to play a wav file
» 5. General guidelines and common mistakes
» 6. Miscellaneous tier (Do not forget this!)
» 7. How to save and submit your work
» Alignment examples
0. Files and links
- Download WaveSurfer from the WaveSurfer homepage.
- Download the following package, uncompress it, and follow the
instructions in the file "README.txt".
» Package for Linux, Windows and Mac (v1.2)
1. Getting started
- Go to the File Locking page.
- Look for the wav file you want to align and check the following:
- its "Orthographic Transcription" task must have "finished" status,
- its "Word Alignment" task must have "not started" status
and no person assigned to it,
- under the status of the "Orthographic Transcription" there must be
two links to the ".wav" and ".words-auto" files.
If these conditions are not met, do not work on this file.
- Click "EDIT" in the "Word Alignment" task of the wav file you want to align.
- On the next screen, change its status to "in progress", and enter your name.
2. How to download files
- In the "Word Alignment" column of the "File Locking" page,
you will notice that under the status label there are
links to the ".wav" and ".words-auto" files. Save a copy of them on your computer
(right-click (Mac: CTRL + left-click) on each of them and choose Save target as...,
Save link as..., or similar).
- On your computer, change the extension of the .words-auto file to .words.
Important: Apart from the extension, the name of the .words file must
exactly match the name of the .wav file.
Example: "s09.objects.1.A.wav", "s09.objects.1.A.words".
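The renaming step can also be scripted. The following is a minimal Python sketch, not part of the official tooling; the function names rename_words_auto and names_match are hypothetical and were chosen only for this illustration.

```python
from pathlib import Path


def rename_words_auto(words_auto_path):
    """Rename 'NAME.words-auto' to 'NAME.words' and return the new path."""
    src = Path(words_auto_path)
    if src.suffix != ".words-auto":
        raise ValueError(f"expected a .words-auto file, got {src.name}")
    dst = src.with_suffix(".words")
    if dst.exists():
        # Refuse to overwrite an alignment that may already be in progress.
        raise FileExistsError(f"{dst.name} already exists")
    src.rename(dst)
    return dst


def names_match(wav_path, words_path):
    """True if both files share the same name apart from the extension."""
    return Path(wav_path).stem == Path(words_path).stem
```

For instance, rename_words_auto("s09.objects.1.A.words-auto") would produce "s09.objects.1.A.words" in the same folder, which names_match then accepts against "s09.objects.1.A.wav".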
3. How to start a new alignment session
- Place the .wav and the .words files in the same folder.
Both files must have the same name, with the appropriate extension.
Example: "s06.cards.2.B.wav" and "s06.cards.2.B.words".
- Open WaveSurfer.
- Open the wav file you want to align, selecting the "Games
Transcription" configuration when prompted.
This will open a window with four panes: waveform, spectrogram,
"words" pane and "misc" pane.
You will be working in the "words" pane,
where you will correct an automatically generated alignment,
and in the "misc" pane,
where you will label other phenomena such as coughs and laughs.
4. How to play a wav file
To play the wav file you just opened, use the toolbar at the upper-right corner:
- Play: play starting at the current position, or play the current selection only (shortcut: SPACE BAR).
- Loop: loop the current selection.
- Pause (shortcut: SPACE BAR).
- Stop.
- Close: close this wav file.
To play a single word regardless of what is selected,
click on the "words" pane, drag the mouse over the word and press CTRL + SPACE BAR.
5. General guidelines and common mistakes
- Word tags can be moved by dragging them with the mouse in the
usual way. Align each word tag at the right edge of the word.
- Mark each silence as a separate word, labeled only with the special
symbol '#', and place its tag at the end of the silence.
Only silences longer than 50 milliseconds should be labeled.
Note, however, that stop consonants (for example
[p], [t], [k], [b], [d], and [g]) generally begin with a closure,
in which airflow is completely blocked and no sound is produced before the release.
This "stop closure" should not be marked as a silence preceding or following a word.
» Examples
Another type of stop is the glottal stop, which
can appear word-initially if the word begins with a vowel.
A common example occurs between the two syllables of "uh-oh".
» Examples
- When the last sound of one word is the same as, or very similar to,
the first sound of the next word
(examples: "fish shop", "top part", "with this"),
it may be difficult to place the word boundary. A
good strategy is to first allocate a portion of the shared sound to each of the
two words (starting with a 50/50 allocation). Then play each word separately,
moving the boundary between the two words, trying to make each sound complete
and natural.
» Examples
- Make sure you place the boundaries of words before and after
silences correctly.
Often, the automatic aligner is inaccurate in marking the true beginning and true
end of words. You must correct that.
» Example
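The 50-millisecond rule for silences can be checked mechanically once the labels are available as data. The sketch below is only an illustration: it assumes the segments have already been loaded as (start, end, label) tuples with times in seconds, and the function name too_short_silences is hypothetical. It cannot tell a stop closure from a true silence, so it complements listening rather than replacing it.

```python
MIN_SILENCE_SEC = 0.050  # threshold stated in the guidelines (50 ms)


def too_short_silences(segments, min_dur=MIN_SILENCE_SEC):
    """Return the '#' segments whose duration does not exceed min_dur.

    segments: iterable of (start_sec, end_sec, label) tuples (assumed layout).
    """
    return [
        (start, end, label)
        for start, end, label in segments
        if label == "#" and (end - start) <= min_dur
    ]


segments = [
    (0.00, 0.42, "okay"),
    (0.42, 0.45, "#"),   # about 30 ms: too short to be labeled
    (0.45, 0.90, "so"),
    (0.90, 1.40, "#"),   # 500 ms: a legitimate silence
]
print(too_short_silences(segments))
```

Any segment the function returns is a candidate for removal, with its time merged into a neighboring word.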
Transcription Errors
If you find an error in the transcription,
correct it in the new .words file that you are
working on. Do not correct the
original transcription.
- If you think a word or silence is missing, you can add it:
- left-click on the place you think it should be, and
- type it in with the keyboard.
- If you think a word or silence is incorrect,
you can edit it:
- left-click on it, and
- make the correction with the keyboard.
- Remember the Transcription Guidelines.
If you find an error related to them, correct it
(example: possessives should not use an apostrophe).
The "Browse" Dialog:
WaveSurfer provides another useful feature for doing text
transcription. By right-clicking (Mac: CTRL + left-click)
in the words tier and choosing the "Browse..." option,
you can open a dialog that allows:
- viewing the complete text transcription file,
- searching for specific words, and
- playing the wav file, highlighting the current word.
6. Miscellaneous tier
Label the following events in the "misc" tier:
- Coughs as 'cough'
- Breaths as 'breath'
- Sniffs as 'sniff'
- Laughs as 'laugh'
- Throat-clearings as 'throat-clearing'
- Lipsmacks and other labial or alveolar clicks as 'smack'
- Self-repairs as 'self-repair':
Self-repairs are speech events in which speakers interrupt
themselves to correct or rephrase what they have just said. They are
typically divided into 3 parts:
- the reparandum, which is what gets
rephrased subsequently, and may end in a word fragment or in an abrupt cut-off,
but need not;
- a disfluent portion, which may contain silence, one or more filled
pauses (e.g. 'um' or 'er'), or words such as 'I mean', 'that is to
say';
- the repair, which is the phrase substituted for the
reparandum.
In the first example below, "it's in line" is the
reparandum, the silence (indicated by "#") is the disfluent portion,
and "his suspenders are in line" is the repair.
Stammers are also considered self-repairs, and are marked
once somewhere among the reparanda (see the last three examples).
Examples of self-repairs:
- it's in line # his suspenders are in line
- just above the mer- oh look at that rhinoceros
- it's to the l- right of the mermaid
- so it should be it should hit the bottom of the ear
- just above the mer- the yellow mermaid
- like one two inches away
- it looks # looks like a piano
- I I I I think it's in line
- It's an a- a- apple
- and the lion the blue lion
- If you are in doubt about a particular label, append a '?' to it.
Examples: 'self-repair?', 'laugh?', 'cough?', etc.
- Articulatory errors:
If the speaker utters a word with an articulatory error (example:
"sho should" instead of "so should"), transcribe the correct phrase
in the orthographic tier, and add an 'arterr' label in the misc tier,
followed in brackets by an approximate spelling of what was actually uttered
(example: 'arterr[sho should]').
How and where should these things be marked?
- Mark coughs, breaths, laughs, throat-clearings and smacks in the approximate place where they happen.
- Mark self-repairs in the reparandum (the string of words that is later repaired).
To label these events, right-click (Mac: CTRL + left-click) on the "misc" pane and select the
corresponding option from the menu.
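The label inventory above is small enough to check automatically. The following Python sketch is an illustration only; the names BASE_LABELS, ARTERR and is_valid_misc_label are hypothetical, and accepting a trailing '?' on 'arterr[...]' labels is an assumption, since the guidelines only show '?' on the plain labels.

```python
import re

# Labels listed in this section for the "misc" tier.
BASE_LABELS = {
    "cough", "breath", "sniff", "laugh",
    "throat-clearing", "smack", "self-repair",
}

# 'arterr' followed by a non-empty bracketed rendering of what was uttered.
ARTERR = re.compile(r"arterr\[[^\[\]]+\]")


def is_valid_misc_label(label):
    """True if label follows the conventions described in this section."""
    # A trailing '?' marks an uncertain label and is stripped before checking.
    core = label[:-1] if label.endswith("?") else label
    return core in BASE_LABELS or ARTERR.fullmatch(core) is not None
```

With these definitions, 'laugh', 'self-repair?' and 'arterr[sho should]' are accepted, while a misspelled or unlisted label such as 'giggle' is rejected.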
7. How to save and submit your work
To save the transcription:
- Right-click (Mac: CTRL + left-click) on the "words" pane and select
"Save Transcription As...".
Save the file with the exact same name
as the wav file, now with the extension ".words", and in the same folder.
(Replace the previous version if necessary.)
- Right-click (Mac: CTRL + left-click) on the "misc" pane and select
"Save Transcription As...".
Save the file with the exact same name
as the wav file, now with the extension ".misc", and in the same folder.
(Replace the previous version if necessary.)
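Before submitting, it can help to confirm that both output files sit next to the wav file under the expected names. This is a minimal sketch of such a check; the function name missing_outputs is hypothetical and is not part of WaveSurfer or of the corpus tools.

```python
from pathlib import Path


def missing_outputs(wav_path):
    """Return the names of the expected .words/.misc files that do not exist.

    The expected files share the wav file's folder and name, differing
    only in the extension.
    """
    wav = Path(wav_path)
    expected = [wav.with_suffix(".words"), wav.with_suffix(".misc")]
    return [p.name for p in expected if not p.is_file()]
```

An empty list means both files are in place; otherwise the returned names show which file still needs to be saved.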
Once you have finished the alignment, follow these steps:
- Send the new .words and .misc files to agus [at] cs.columbia.edu,
maintaining the file names (example: "s06.cards.2.B.words" and
"s06.cards.2.B.misc").
- Go to the File Locking page.
- Click "EDIT" in the "Word Alignment" task of the wav file you have aligned.
- On the next screen, change the task's status to "finished",
and enter the current date.