Columbia Games Corpus » Orthographic Transcription Guidelines
1. Getting started
- Go to the File Locking page.
- Check that no task for the wav file you want to transcribe has "in progress" status, and
that no person has been assigned to it.
If these conditions are not met, do not work on this file.
- Click "EDIT" in the "Orthographic Transcription" task of the wav file you want to transcribe.
- On the next screen, change the status of the wav file to "in progress",
and enter your name.
- The status labels for all of the "Orthographic Transcription" tasks are links.
Clicking on one of the links brings you to a web interface to transcribe
that file.
- The files are broken up into segments (by splitting on silences). For each
segment, you should --fingers crossed, etc.-- see an audio player in your
browser and a text box to enter the transcription of that particular short
segment.
2. Transcription guidelines
- Mark silences with the special # symbol.
- Some of the segments will be nothing but breathing, crosstalk,
etc., in which case you should just enter a single # for the
transcription.
Punctuation/Symbols
The only punctuation marks allowed are apostrophes, which are used to indicate absent letters in a contraction (with no spaces), such as don’t.
There are no periods (.) or commas (,). If you hear saint or street, type it that way; do not abbreviate.
Do not use dashes except to indicate false starts (e.g. p- papa). For example, mother in law is written as three separate words.
Do not use apostrophes to mark possessives. For example, John's car would be Johns car.
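The punctuation rules above lend themselves to a quick automated check before a segment is saved. The following is only a minimal sketch, not part of the transcription interface: the function names are hypothetical, and the set of allowed characters is an assumption pieced together from these guidelines (letters, apostrophes, hyphens, and the # and ? markers; digits are tolerated here and left to the Digit Rules). It cannot tell a possessive from a contraction, so apostrophes still need a human eye.

```python
import re

# Assumed set of permitted characters: letters, digits, whitespace,
# straight or curly apostrophes, hyphens, and the markers '#' and '?'.
FORBIDDEN = re.compile(r"[^A-Za-z0-9\s'’#?\-]")


def forbidden_characters(segment):
    """Return the distinct disallowed characters found in a segment."""
    return sorted(set(FORBIDDEN.findall(segment)))


def internal_hyphens(segment):
    """Return tokens with a hyphen followed by a letter, e.g. mother-in-law."""
    return [tok for tok in segment.split() if re.search(r"-[A-Za-z]", tok)]
```

For instance, forbidden_characters("turn left on main st., then stop") reports the period and the comma, while internal_hyphens leaves a false start such as p- alone because its hyphen ends the token.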
Capital Letters
There are five instances in which capital letters are to be used:
- The personal pronoun “I”
- Anything that is spelled, such as “A T and T”, "S M I T H", "A O L"; make sure to leave a space between each capital letter to indicate the letter itself was said.
- Proper nouns and adjectives, for example:
  - names of places (cities, states, rivers, etc.): Florham Park, New Jersey, New England, the United States, Mississippi River, Rocky Mountains, Death Valley, East Tennessee, the South, Main Street, River Road, Mountain Avenue; if they say University Ave, type it just that way, don't expand Ave to Avenue
  - names of companies: Charlie Browns, Texaco, American Express
  - names of people: Ann, Jim Jones, Walt, Honey and Hon (not dear and darling)
  - names of groups: Senate, Congress
  - months of the year, days of the week: December, Thursday
  - God and words for God: God, Lord, Allah, Buddha
  - holidays: Thanksgiving, Christmas, New Years Day, New Years Eve, Easter
- Letters in isolation, for example J T Jones, A M (as in the time of day)
- False starts, if the intended word is known (Aug- August; note that each is aligned as if it were a word)
Examples of times when letter(s) should not be capitalized:
- The first word of the transcription should not be capitalized, even if it is the start of a sentence, unless it falls under one of the rules for capitalization listed above.
- The first word of any sentence should not be capitalized, unless it falls under one of the rules for capitalization listed above.
- noon, midnight
Titles
Spell out all words:
mister, doctor, junior, miss, misses, miz (for Ms.), monsignor,
father.
Shortened Words
When the speaker shortens a word, we sometimes add letters to restore the full word, for example:
tryin → trying
an → and
cause → because
til → till
cuse → excuse
ok → okay
mm'kay → okay
na → no
'bout → about
'em → them
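The expansions listed above can be kept in a small lookup table and used to flag tokens for review. This is a hedged sketch only: the table contains just the pairs given in these guidelines, the function name suggest_expansions is hypothetical, and the output is a list of suggestions rather than an automatic replacement, because forms such as an and cause are also ordinary English words that a speaker may have actually said.

```python
# Shortened form -> full form, copied from the examples in the guidelines.
SHORTENED = {
    "tryin": "trying",
    "an": "and",
    "cause": "because",
    "til": "till",
    "cuse": "excuse",
    "ok": "okay",
    "mm'kay": "okay",
    "na": "no",
    "'bout": "about",
    "'em": "them",
}


def suggest_expansions(segment):
    """Return (position, token, suggestion) for each token found in the table."""
    suggestions = []
    for position, token in enumerate(segment.split()):
        # Compare in lower case and treat curly apostrophes like straight ones.
        key = token.lower().replace("’", "'")
        if key in SHORTENED:
            suggestions.append((position, token, SHORTENED[key]))
    return suggestions
```

Returning positions alongside the tokens lets the transcriber listen again to the exact spot before accepting a suggestion.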
The following words, however, should be transcribed as indicated:
Disfluencies
Filled pauses are non-speech vocalizations. They are transcribed as uh, um, er, ah, mm, eh and sometimes oh, in the orthographic tier. They can also be combined into forms such as uhhm, uhhuh, mmhm.
False starts, whether they are repaired or not, are indicated as such with a hyphen. These occur when the speaker stops in the middle of a word and either substitutes another word or continues with the same word. If the incomplete word is not known, indicate so with '?-'.
Examples:
- “I wanna call nine 0o two sev- nine four nine 0o six hundred”
- “hi I want to make a ph- phone call”
- “directory assistance ple- ”
- “it is below the ?- blue lion”
Digit Rules
All numbers are to be typed out as words:
- one eight hundred, two forty five Eighth Street, three o’clock
- four A M, December fifth
The number 0 said as “oh” is typed as 0o (the number zero
followed by a lower case letter o).
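A simple way to catch digits that slipped into a transcription is to flag every token containing a digit, with the single exception of 0o. The sketch below assumes tokens are separated by whitespace; the function name digit_violations is hypothetical and not part of any existing tool.

```python
import re


def digit_violations(segment):
    """Return tokens that contain a digit, except the special token 0o."""
    return [
        token
        for token in segment.split()
        if re.search(r"\d", token) and token != "0o"
    ]
```

A segment such as "one eight hundred" passes with no violations, while "nine 0o two 4 A M" is flagged only for the token 4.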
Unintelligible Speech
- When you do not have a clue as to what is being said, use 'uu' in
the words tier.
- When you are not sure about a word, but have an idea of what is being said,
add '?' after the word. Example: "the big blue? lion".
- When you do not know whether the speaker produced a false start or not, use '-?' (do
not confuse with '?-', see Disfluencies above).
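Because the markers #, uu, ?, ?- and -? are easy to mix up, it can help to see how each token in a segment would be read. The following is an illustrative sketch under stated assumptions: the label names and the function classify_token are hypothetical, and it assumes that '-?' appears at the end of a token (or alone), which these guidelines do not spell out.

```python
def classify_token(token):
    """Map one whitespace-separated token to an (assumed) descriptive label."""
    if token == "#":
        return "silence"
    if token == "uu":
        return "unintelligible"
    if token == "?-":
        return "unknown_false_start"
    # Assumption: '-?' is written at the end of the token it qualifies.
    if token.endswith("-?"):
        return "possible_false_start"
    if token.endswith("-"):
        return "false_start"
    if token.endswith("?"):
        return "uncertain_word"
    return "word"


def classify_segment(segment):
    """Classify every token in a segment, preserving order."""
    return [(token, classify_token(token)) for token in segment.split()]
```

The order of the checks matters: '?-' and '-?' are tested before the generic trailing hyphen and trailing question mark, so the two markers are never mistaken for an ordinary false start or an uncertain word.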