Coding Overheard Speech (OvS) Files
This page contains instructions for annotating the overheard speech files. For any questions, email Jasenia Hartman at [email protected].
NOTE: When you are working on a file, mark the file as in progress on Asana by moving it to the "in progress" column
General Naming System
OvS files are named using the following format: OvS_subID_subMo
The first code stands for Overheard Speech.
The second code refers to the subject's SEEDLingS subject ID.
The third code refers to the infant's age at the time of the recording.
Ex 1: OvS_17_07 refers to SEEDLingS subject 17 at 07 months.
Ex 2: OvS_17_SF5 refers to SEEDLingS subject 17's Second Follow-up (SF) clips, recorded when the child is 5 years of age.
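If you ever need to script over these files (e.g., to list your assignments), the naming scheme is easy to take apart programmatically. The snippet below is only an illustrative sketch, not part of the official workflow; parse_ovs_name is a made-up helper name for this example.

```python
# Illustrative sketch only: split an OvS file name such as "OvS_17_07" or
# "OvS_17_SF5" into its three codes. parse_ovs_name is a made-up helper name.
def parse_ovs_name(filename: str) -> dict:
    stem = filename[:-4] if filename.endswith(".eaf") else filename
    prefix, sub_id, sub_mo = stem.split("_", 2)   # e.g. "OvS", "17", "SF5"
    if prefix != "OvS":
        raise ValueError(f"Not an OvS file name: {filename}")
    return {"subject_id": sub_id, "age_code": sub_mo}

print(parse_ovs_name("OvS_17_07"))    # {'subject_id': '17', 'age_code': '07'}
print(parse_ovs_name("OvS_17_SF5"))   # {'subject_id': '17', 'age_code': 'SF5'}
```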
Paths and Folders
The path to the overheard speech directory is /Volumes/Fas-Phyc-PEB-Lab/OvSpeech/SubjectFiles/Seedlings/overheard_speech. Here, you will find several relevant subfolders:
eafs
annotations-in-progress
annotations-to-be-superchecked
annotations-complete
Getting Started
1. Go to the eafs folder.
2. Find the folder that corresponds to the file you were assigned to annotate on ClickUp.
3. Copy the assigned file folder into the annotations-in-progress folder.
4. In the annotations-in-progress folder, rename the copied folder and eaf file by adding your initials to the end of the folder name (e.g. OvS_45_07 >> OvS_45_07_JH). (Steps 3-4 can also be scripted; see the optional sketch after this list.)
5. Before you begin annotation, you will need to (1) link the audio file and (2) add the CDS tier type.
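The copy-and-rename step can be done by hand in Finder, but here is an optional Python sketch of steps 3-4 for annotators who prefer scripting. The assigned folder name and initials below are placeholders; adjust them to your own assignment.

```python
# Optional sketch of steps 3-4: copy the assigned folder from eafs/ into
# annotations-in-progress/ and append your initials to the folder and .eaf names.
import shutil
from pathlib import Path

BASE = Path("/Volumes/Fas-Phyc-PEB-Lab/OvSpeech/SubjectFiles/Seedlings/overheard_speech")
assigned = "OvS_45_07"   # the file folder you were assigned on ClickUp
initials = "JH"          # your initials

src = BASE / "eafs" / assigned
dst = BASE / "annotations-in-progress" / f"{assigned}_{initials}"

shutil.copytree(src, dst)                            # copy the whole folder
eaf = dst / f"{assigned}.eaf"
if eaf.exists():
    eaf.rename(dst / f"{assigned}_{initials}.eaf")   # rename the eaf inside it
```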
Link audio file:
In ELAN, open the eaf file >> "Edit" >> "Linked Files…"
Click "Add..."
For SF5 files, go to:
Seedlings/Subject_Files/subID/subID_subMo/audio_recordings/subID_subMo.wav
Note: subID refers to the second code of the OvS file, and subMo refers to the third code of the OvS file (e.g. the OvS_17_SF5 path is
Seedlings/Subject_Files/17/17_SF5/audio_recordings/17_SF5.wav)
For files other than SF5, go to:
Seedlings/Subject_Files/subID/subID_subMo/Home_Visit/Processing/Audio_Files/subID_subMo.wav
Note: subID refers to the second code of the OvS file, and subMo refers to the third code of the OvS file (e.g. the OvS_17_07 path is
Seedlings/Subject_Files/17/17_07/Home_Visit/Processing/Audio_Files/17_07.wav)
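To summarize the two cases above: the location of the .wav depends only on whether the third code is SF5. The sketch below is illustrative only (audio_file_path is a made-up helper; the Seedlings root is written as a relative path here, so prepend the lab volume mount point as appropriate).

```python
# Illustrative sketch: build the expected .wav path from the two OvS codes,
# following the SF5 vs. non-SF5 patterns described above.
from pathlib import Path

# Relative root as written in the instructions; prepend the lab volume as needed.
SEEDLINGS = Path("Seedlings/Subject_Files")

def audio_file_path(sub_id: str, sub_mo: str) -> Path:
    session = f"{sub_id}_{sub_mo}"
    if sub_mo == "SF5":
        # Second Follow-up recordings live under audio_recordings/
        return SEEDLINGS / sub_id / session / "audio_recordings" / f"{session}.wav"
    # Monthly home-visit recordings live under Home_Visit/Processing/Audio_Files/
    return SEEDLINGS / sub_id / session / "Home_Visit" / "Processing" / "Audio_Files" / f"{session}.wav"

print(audio_file_path("17", "07"))   # .../17/17_07/Home_Visit/Processing/Audio_Files/17_07.wav
print(audio_file_path("17", "SF5"))  # .../17/17_SF5/audio_recordings/17_SF5.wav
```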
Add CDS tier type:
In ELAN, go to "Edit" >> "Edit Control Vocabularies..."
Click "External CV"
Enter this URL link:
https://raw.githubusercontent.com/BergelsonLab/public-files/main/ACLEW-blab-vocabularies.ecv
Click "Ok " >> "Close"
Go to "Type" >> "Add New Tier Type..."
Enter the following for CDS:
Click "Add" >> "Close"
After completing the above steps, you can begin annotating. Annotate the .eaf file one clip at a time, following ACLEW standards, working in your copy of the folder in annotations-in-progress.
General Approach/Tips for Coding OvS files:
Speech Segmentation:
Listen to the context + 2 min clip in its entirety (without segmenting)
Identify whether there is (a) speech or (b) silence >> document on ClickUp under the respective clip number (e.g. clip 1 = speech, clip 2 = silence, clip 3= speech…)
If there is speech, keep a mental count of how many speakers are talking while you listen to the clip
After listening to the clip, identify the most prominent speaker (e.g. the most talkative, loudest, producing the clearest speech)
Focus on segmenting the most prominent speaker through the entire clip first (rather than coding one speaker at a time across all 15 clips)
If there are multiple speakers:
Select the loudest speaker >> work your way down to the quietest one
Pick a time frame (30 sec) and identify multiple speakers within that time frame
If the speech is indecipherable and you cannot identify a clear-cut boundary for where it begins and ends, do not annotate it
Annotation/minCHAT:
Start with the most prominent speaker (e.g. the most talkative, loudest, producing the clearest speech)
When unsure about minCHAT format, immediately consult the GS tutorials/PPT slides
For not-so-clear speech:
Go to the "Controls" tab >> change the playback rate to 50-80% and check whether some/all of the speech is audible
Listen to the speech segment at least 3 times
If you are at least 80% sure of what the speaker is saying, include it in the transcription; if not, default to xxx.
XDS/CDS
Once an utterance's xds value is coded as C, immediately add the dependent cds tier for that speaker to avoid missing tiers
General tips:
Work through one clip at a time
For SF5 files: you'll hear a lot of actual CHI speech with real words
Distinguish between CHI vs. other children in the file
Code CHI for lex and mwu accordingly (no need for vcm)
Take notes about any coding issues you face as you go (e.g., if there is an annotation that you would like reviewed, or if there is anything weird/difficult to code in this file)
If there are any utterances directed to a child, you will have to create a dependent cds subtier under the xds tier for that speaker (for instructions, see Step 4 under Gold Standard Test on the coding for different types of child directed speech page).
Once you're done, move the task to the "ready for superchecker" column on Asana.