High Volubility Project

Project completed during the 2019-2020 academic year.

Overview of the project

The ACLEW High Volubility Project seeks to annotate segments of recordings from the SEEDLINGS corpus and other lab corpora using the ACLEW Annotation Scheme.

Specifically, fifteen 2-minute clips with the highest frequency of speech, or volubility, have been identified in each recording for annotation, based on vetting criteria (the number of child utterances, adult utterances, and turn exchanges).

  • Round 1 = clips 1-5

  • Round 2 = clips 6-10

  • Round 3 = clips 11-15
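The round/clip split above is fixed arithmetic (five clips per round, three rounds), so it can be checked mechanically when assigning or logging work. A minimal sketch, assuming clips are numbered 1-15 as in the list above; the function names are hypothetical helpers, not part of any ACLEW tool:

```python
ROUND_SIZE = 5  # clips per round
N_ROUNDS = 3    # rounds per recording


def round_for_clip(clip: int) -> int:
    """Return the round (1-3) that a clip number (1-15) belongs to."""
    if not 1 <= clip <= ROUND_SIZE * N_ROUNDS:
        raise ValueError(f"clip must be between 1 and {ROUND_SIZE * N_ROUNDS}, got {clip}")
    # Clips 1-5 -> round 1, 6-10 -> round 2, 11-15 -> round 3
    return (clip - 1) // ROUND_SIZE + 1


def clips_in_round(round_number: int) -> list[int]:
    """Return the five clip numbers that make up a given round."""
    if not 1 <= round_number <= N_ROUNDS:
        raise ValueError(f"round must be between 1 and {N_ROUNDS}, got {round_number}")
    start = (round_number - 1) * ROUND_SIZE + 1
    return list(range(start, start + ROUND_SIZE))


for r in range(1, N_ROUNDS + 1):
    clips = clips_in_round(r)
    print(f"Round {r} = clips {clips[0]}-{clips[-1]}")
```

Running it prints the same three lines as the list above (`Round 1 = clips 1-5`, and so on).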

Setting up the coding process

Overview: every file in this repository needs to be coded; the files are distributed across the participating labs according to the ACLEW_list_of_corpora spreadsheet (located in the lab drive). The Bergelson Lab is responsible for coding the WAR and BER subsets of these files.

  1. Copy the relevant .eaf files from https://github.com/aclew/Highvol_templates into raw_WAR_HV and raw_BER_HV

  2. Copy the relevant .wav files from ownCloud into new folders titled WAR_hv_media and BER_hv_media

  3. Create a spreadsheet to log progress (here it is)

  4. Create an Asana project board to show who is working on what
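Steps 1 and 2 amount to filtering files by corpus and extension and copying them into a per-corpus folder. The sketch below assumes that template and media file names begin with their corpus code (for example `WAR...` or `BER...`); that naming convention is an assumption, so check the actual file names in Highvol_templates and ownCloud before relying on it. `select_corpus_files` and `copy_corpus_files` are hypothetical helpers:

```python
import shutil
from pathlib import Path


def select_corpus_files(filenames, corpus_prefix, suffix):
    """Return the names belonging to one corpus.

    Assumption: a file belongs to a corpus if its name starts with the corpus
    code (e.g. "WAR") and ends with the given extension (e.g. ".eaf").
    """
    return sorted(
        name
        for name in filenames
        if name.startswith(corpus_prefix) and name.lower().endswith(suffix.lower())
    )


def copy_corpus_files(src_dir, dest_dir, corpus_prefix, suffix):
    """Copy matching files from src_dir into dest_dir, creating it if needed."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    names = select_corpus_files(
        (p.name for p in src.iterdir() if p.is_file()), corpus_prefix, suffix
    )
    for name in names:
        # copy2 keeps file timestamps along with the contents
        shutil.copy2(src / name, dest / name)
    return names
```

For example, `copy_corpus_files("Highvol_templates", "raw_WAR_HV", "WAR", ".eaf")` would copy the WAR templates from a local clone of the templates repository, under the naming assumption above.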

Workflow

Ideally, steps A, B, and C are done in this order:

  (A) For each round in [round1, round2, round3]:

    1. code the round
    2. run the minchat checker
    3. review the round
    4. run the minchat checker

  (B) Final review

    1. do a final review of all three rounds
    2. run the minchat checker

  (C) Secondpass

    1. make the secondpass edits on all three rounds
    2. run the minchat checker

However, because assignments are distributed across coders, this ideal order is not always followed. Nonetheless, every step must be completed. Details for each step are described below.
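Since steps may be done out of order but none may be skipped, it can help to treat the workflow as a checklist and ask which steps remain. A minimal sketch, assuming steps are tracked as `(phase, scope, task)` tuples; the tuple format and function names are hypothetical and not tied to the planning spreadsheet:

```python
def ideal_workflow(rounds=("round1", "round2", "round3")):
    """Return the ideal ordered checklist as (phase, scope, task) tuples."""
    steps = []
    # (A) For each round: code, check, review, check again.
    # The two checker runs get distinct labels so both must be ticked off.
    for r in rounds:
        steps += [
            ("A", r, "code the round"),
            ("A", r, "run the minCHAT checker after coding"),
            ("A", r, "review the round"),
            ("A", r, "run the minCHAT checker after review"),
        ]
    # (B) Final review of all rounds, then another minCHAT check
    steps += [
        ("B", "all rounds", "final review"),
        ("B", "all rounds", "run the minCHAT checker"),
    ]
    # (C) Secondpass edits, then a last minCHAT check
    steps += [
        ("C", "all rounds", "secondpass edits"),
        ("C", "all rounds", "run the minCHAT checker"),
    ]
    return steps


def missing_steps(completed):
    """Return the steps not yet done, listed in the ideal order.

    Completion order does not matter here; this only checks that no step is skipped.
    """
    done = set(completed)
    return [step for step in ideal_workflow() if step not in done]
```

With nothing completed, `missing_steps([])` returns all 16 steps, starting with `("A", "round1", "code the round")`.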

A) Coding

  • One coder will annotate the first round, i.e., the first five 2-minute clips from the identified recording

    • The first five clips = the first five chronologically, not necessarily the top five most voluble

  • Coders should follow the same workflow implemented in the Gold Standard Tutorials

    • beginning with segmentation for clip 1

    • then annotation of the CHI tier and its dependent tiers for clip 1

    • then annotation of all other xds (non-CHI) tiers for clip 1

    • then ending with transcription of all non-CHI speakers for clip 1

    • repeat for clip 2, and so on

  • The same process applies to the second and third rounds (clips 6-10 and clips 11-15, respectively)

    • Try to distribute the rounds across three different coders, so that no one person codes all 15 clips. This limits the effect that any single coder's habits or biases can have on the annotations.
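Spreading the three rounds of each recording across three different coders can be done with a simple rotation: for recording `i`, round `j` goes to coder `(i + j) mod n`. With at least three coders, the three rounds of one recording always land on three distinct coders, and rotating per recording spreads the load. A minimal sketch; the recording IDs and coder names are hypothetical placeholders:

```python
def assign_rounds(recordings, coders, n_rounds=3):
    """Map (recording, round) -> coder so no coder gets two rounds of one recording.

    The coder list is rotated by one position per recording, which also spreads
    the workload across coders when there are several recordings.
    """
    if len(coders) < n_rounds:
        raise ValueError(f"need at least {n_rounds} coders, got {len(coders)}")
    assignments = {}
    for i, recording in enumerate(recordings):
        for j in range(n_rounds):
            # n_rounds consecutive indices mod len(coders) are all different
            # because n_rounds <= len(coders)
            assignments[(recording, j + 1)] = coders[(i + j) % len(coders)]
    return assignments


# Hypothetical recording IDs and coder names, for illustration only
plan = assign_rounds(["WAR_rec_A", "BER_rec_B"], ["coder1", "coder2", "coder3"])
for (recording, rnd), coder in sorted(plan.items()):
    print(recording, f"round {rnd}", coder)
```

In practice availability matters more than a fixed rotation, so a plan like this is a starting point to adjust by hand on the Asana board.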

B) Minchat

The Minchat checker lets annotators automatically check their transcriptions for basic minCHAT errors, so that they can fix those errors manually before submitting.

See the repo's README for details and instructions.

C) Reviewing rounds

  • Initial review: after a coder finishes annotating a round, the clips in that round will be reviewed by another coder.

    • Carefully look through and listen to the five annotated clips, and fix any errors you find. If you notice recurring errors, let the original coder know so that they can correct those errors themselves.

  • Secondpass: a further pass to make the annotations more consistent across the different ACLEW teams; the points below are especially important here

Things to look out for during review

  • tight segmentations: make sure there is no excessive silence at the beginning and end of each segment

  • consistent speakers across rounds: make sure each speaker keeps the same label across rounds (e.g., the speaker coder 1 labeled FA1 should not be labeled FA2 by coder 2)

  • correct codes: make sure things are correctly classified as canonical, non-canonical, adult-directed speech, etc.

  • correct transcriptions: make sure transcriptions follow minCHAT conventions, syllables are spelled consistently across rounds and files, etc.
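Whether FA1 in one round and FA2 in another are the same person can only be settled by listening, but a quick inventory of which speaker labels appear in which round shows where to listen. In ELAN's .eaf XML, each tier is a `TIER` element with a `TIER_ID`, and dependent tiers (such as the ACLEW `vcm@CHI` or `xds@FA1` tiers) carry a `PARENT_REF` attribute. The sketch below assumes that speaker tiers are exactly the independent tiers; `speaker_tiers` and `speaker_presence` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET


def speaker_tiers(root: ET.Element) -> set[str]:
    """Return the IDs of independent tiers (assumed to be speakers, e.g. CHI, FA1)."""
    speakers = set()
    for tier in root.iter("TIER"):
        # Dependent tiers such as xds@FA1 point to their parent via PARENT_REF
        if tier.get("PARENT_REF") is None:
            speakers.add(tier.get("TIER_ID"))
    return speakers


def speaker_presence(tiers_by_round: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each speaker label to the sorted list of rounds it appears in."""
    presence: dict[str, list[str]] = {}
    for rnd, speakers in tiers_by_round.items():
        for spk in speakers:
            presence.setdefault(spk, []).append(rnd)
    return {spk: sorted(rnds) for spk, rnds in sorted(presence.items())}
```

For real files, `speaker_tiers(ET.parse(path).getroot())` reads one .eaf file. A label that shows up only in one coder's round (e.g., FA1 only in round 1 and FA2 only in rounds 2-3) is a cue to listen and check whether it is the same person, not proof of an error.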

Logging progress

  • Once you finish a step, log the date in the ACLEW Planning spreadsheet.

  • The lab manager will push progress to the raw_WAR_hv and raw_BER_hv GitHub repos.
