Coding Overheard Speech (OvS) Files
This page contains instructions for annotating the overheard speech files. For any questions, email Jasenia Hartman at [email protected].
NOTE: When you are working on a file, mark the file as in progress on Asana by moving it to the "in progress" column
General Naming System
OvS files are named using the following format: OvS_subID_subMo
The first code stands for Overheard Speech.
The second code refers to the subject's SEEDLingS subject ID.
The third code refers to the infant's age at the time of the recording.
Ex 1: OvS_17_07 refers to SEEDLingS subject 17 at 07 months.
Ex 2: OvS_17_SF5 refers to SEEDLingS subject 17's Second Follow-up (SF) clips, recorded when the child is 5 years of age.
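If you ever need to script over these files (e.g., to list your assignments), the naming scheme is easy to take apart programmatically. The snippet below is only an illustrative sketch, not part of the official workflow; parse_ovs_name is a made-up helper name for this example.

```python
# Illustrative sketch only: split an OvS file name such as "OvS_17_07" or
# "OvS_17_SF5" into its three codes. parse_ovs_name is a made-up helper name.
def parse_ovs_name(filename: str) -> dict:
    stem = filename[:-4] if filename.endswith(".eaf") else filename
    prefix, sub_id, sub_mo = stem.split("_", 2)   # e.g. "OvS", "17", "SF5"
    if prefix != "OvS":
        raise ValueError(f"Not an OvS file name: {filename}")
    return {"subject_id": sub_id, "age_code": sub_mo}

print(parse_ovs_name("OvS_17_07"))    # {'subject_id': '17', 'age_code': '07'}
print(parse_ovs_name("OvS_17_SF5"))   # {'subject_id': '17', 'age_code': 'SF5'}
```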
Paths and Folders
The path to the overheard speech directory is /Volumes/Fas-Phyc-PEB-Lab/OvSpeech/SubjectFiles/Seedlings/overheard_speech. Here, you will find several relevant subfolders:
eafs
annotations-in-progress
annotations-to-be-superchecked
annotations-complete
Getting Started
1. Go to the eafs folder.
2. Find the folder that corresponds to the file you were assigned to annotate on ClickUp.
3. Copy the assigned file folder into the annotations-in-progress folder.
4. In the annotations-in-progress folder, rename the copied folder and eaf file by adding your initials to the end of the folder name (e.g. OvS_45_07 >> OvS_45_07_JH). (Steps 3-4 can also be scripted; see the optional sketch after this list.)
5. Before you begin annotation, you will need to (1) link the audio file and (2) add the CDS tier type.
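The copy-and-rename step can be done by hand in Finder, but here is an optional Python sketch of steps 3-4 for annotators who prefer scripting. The assigned folder name and initials below are placeholders; adjust them to your own assignment.

```python
# Optional sketch of steps 3-4: copy the assigned folder from eafs/ into
# annotations-in-progress/ and append your initials to the folder and .eaf names.
import shutil
from pathlib import Path

BASE = Path("/Volumes/Fas-Phyc-PEB-Lab/OvSpeech/SubjectFiles/Seedlings/overheard_speech")
assigned = "OvS_45_07"   # the file folder you were assigned on ClickUp
initials = "JH"          # your initials

src = BASE / "eafs" / assigned
dst = BASE / "annotations-in-progress" / f"{assigned}_{initials}"

shutil.copytree(src, dst)                            # copy the whole folder
eaf = dst / f"{assigned}.eaf"
if eaf.exists():
    eaf.rename(dst / f"{assigned}_{initials}.eaf")   # rename the eaf inside it
```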
Link audio file:
In ELAN, open the eaf file >> "Edit" >> "Linked Files…"
Click "Add..."
For SF5 files, go to:
Seedlings/Subject_Files/subID/subID_subMo/audio_recordings/subID_subMo.wav
Note: subID refers to the second code of the OvS file, and subMo refers to the third code of the OvS file (e.g. the OvS_17_SF5 path is
Seedlings/Subject_Files/17/17_SF5/audio_recordings/17_SF5.wav)
For files other than SF5, go to:
Seedlings/Subject_Files/subID/subID_subMo/Home_Visit/Processing/Audio_Files/subID_subMo.wav
Note: subID refers to the second code of the OvS file, and subMo refers to the third code of the OvS file (e.g. the OvS_17_07 path is
Seedlings/Subject_Files/17/17_07/Home_Visit/Processing/Audio_Files/17_07.wav)
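To summarize the two cases above: the location of the .wav depends only on whether the third code is SF5. The sketch below is illustrative only (audio_file_path is a made-up helper; the Seedlings root is written as a relative path here, so prepend the lab volume mount point as appropriate).

```python
# Illustrative sketch: build the expected .wav path from the two OvS codes,
# following the SF5 vs. non-SF5 patterns described above.
from pathlib import Path

# Relative root as written in the instructions; prepend the lab volume as needed.
SEEDLINGS = Path("Seedlings/Subject_Files")

def audio_file_path(sub_id: str, sub_mo: str) -> Path:
    session = f"{sub_id}_{sub_mo}"
    if sub_mo == "SF5":
        # Second Follow-up recordings live under audio_recordings/
        return SEEDLINGS / sub_id / session / "audio_recordings" / f"{session}.wav"
    # Monthly home-visit recordings live under Home_Visit/Processing/Audio_Files/
    return SEEDLINGS / sub_id / session / "Home_Visit" / "Processing" / "Audio_Files" / f"{session}.wav"

print(audio_file_path("17", "07"))   # .../17/17_07/Home_Visit/Processing/Audio_Files/17_07.wav
print(audio_file_path("17", "SF5"))  # .../17/17_SF5/audio_recordings/17_SF5.wav
```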
Add CDS tier type:
In ELAN, go to "Edit" >> "Edit Control Vocabularies..."
Click "External CV"
Enter this URL link:
https://raw.githubusercontent.com/BergelsonLab/public-files/main/ACLEW-blab-vocabularies.ecv
Click "Ok " >> "Close"
Go to "Type" >> "Add New Tier Type..."
Enter the following for CDS:
Click "Add" >> "Close"
After completing the above steps, you can begin annotating. Annotate the .eaf file one clip at a time, following ACLEW standards, working in your copy of the folder in annotations-in-progress.
General Approach/Tips for Coding OvS files:
Speech Segmentation:
Listen to the context + 2 min clip in its entirety (without segmenting)
Identify whether there is (a) speech or (b) silence >> document on ClickUp under the respective clip number (e.g. clip 1 = speech, clip 2 = silence, clip 3= speech…)
If there is speech, keep a mental count of how many speakers are talking while you listen to the clip
After listening to the clip, identify the most prominent speaker (e.g. the most talkative, loudest, producing the clearest speech)
Focus on segmenting the most prominent speaker through the entire clip first (rather than coding one speaker at a time across all 15 clips)
If there are multiple speakers:
Select the loudest speaker >> work your way down to the quietest one
Pick a time frame (30 sec) and identify multiple speakers within that time frame
If the speech is indecipherable and you cannot identify a clear-cut boundary for where it begins and ends, do not annotate it
Annotation/minCHAT:
Start with the most prominent speaker (e.g. the most talkative, loudest, producing the clearest speech)
When unsure about minCHAT format, immediately consult the GS tutorials/PPT slides
For not-so-clear speech:
Go to the "Controls" tab >> change the playback rate to 50-80% and check whether some/all of the speech is audible
Listen to the speech segment at least 3 times
If you are at least 80% sure of what the speaker is saying, include it in the transcription; if not, default to xxx.
XDS/CDS
Once an utterance's xds value is coded as C, immediately add the dependent cds tier for that speaker to avoid missing tiers
General tips:
Work through one clip at a time
For SF5 files: you'll hear a lot of actual CHI speech with real words
Distinguish between CHI vs. other children in the file
Code CHI for lex and mwu accordingly (no need for vcm)
Take notes about any coding issues you face as you go (e.g., if there is an annotation that you would like reviewed, or if there is anything weird/difficult to code in this file)
If there are any utterances directed to a child, you will have to create a dependent cds subtier under the xds tier for that speaker (for instructions, see Step 4 under Gold Standard Test on the coding for different types of child directed speech page).
Once you're done, move the task to the "ready for superchecker" column on Asana.