# Audio Reliability Checks

{% hint style="warning" %}
Caution! Lines become REVERSED during reliability checks: for utterances that contain TWO OR MORE coded nouns, the coded nouns do not appear in the order that they are uttered.

Pay especially close attention when the SAME WORD appears twice within one utterance. The two instances can be distinguished only by their Annotation ID.
{% endhint %}

**Background: review** [**Audio Annotation Checks**](https://gitbook.bergelsonlab.com/data-pipeline/audio/annotation-checks) **if necessary**

## 1) Generate Reliability files \[Lab Coordinator]

{% hint style="info" %}
Do not do this until AFTER you have already [sent the recodes back to SubjectFiles](https://gitbook.bergelsonlab.com/archive/seedlings/scatter)!
{% endhint %}

Relevant repo: <https://github.com/SeedlingsBabylab/reliability>

Go to `/Volumes/Fas-Phyc-PEB-Lab/Seedlings/Working_Files/`

Create a directory for the month you're running reliability on, called "reliability\_\[month]". Within that directory, make an audio and video subfolder. In audio, make the following subfolders:

* full\_files
* orig\_10\_percent
* reliability\_checks
* spreadsheets
* debug
  * compare\_csvs (folder within debug)
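
If you would rather script the folder setup, the layout above can be created in one go. This is a minimal sketch and not part of the reliability repo: `make_reliability_dirs` is a hypothetical helper, and `working_files` is assumed to be wherever the Working_Files share is mounted on your machine.

```python
from pathlib import Path

# Subfolders of "audio" listed above; compare_csvs lives inside debug.
AUDIO_SUBFOLDERS = [
    "full_files",
    "orig_10_percent",
    "reliability_checks",
    "spreadsheets",
    "debug/compare_csvs",
]


def make_reliability_dirs(working_files: Path, month: str) -> Path:
    """Create reliability_[month] with its audio and video subfolders.

    Hypothetical helper: existing folders are left untouched.
    """
    root = working_files / f"reliability_{month}"
    (root / "video").mkdir(parents=True, exist_ok=True)
    for sub in AUDIO_SUBFOLDERS:
        (root / "audio" / sub).mkdir(parents=True, exist_ok=True)
    return root
```

For example, `make_reliability_dirs(Path("/Volumes/Fas-Phyc-PEB-Lab/Seedlings/Working_Files"), "11")` would create the tree for month 11.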

Open [get\_cha.py](https://github.com/SeedlingsBabylab/collect/blob/master/get_cha.py) in Atom (this is in "collect" scripts). The `start_dir` should be set to Subject Files. The `out_dir` should be set to the full\_files folder that you just created. The `subj` should be set to "" and the `month` should be set to whichever month you want to gather (ex.: "11"). Run the script from the terminal:

```
python get_cha.py
```

Now we need to extract the 10% of annotations and fill them into new cha files. To do this, we run the `batch_sample.py` script. It takes one argument: the path to the full\_files folder. This script will output files with 10% of annotations replaced with "word &=X\_X\_MOT" into the "orig\_10\_percent" and "reliability\_checks" folders.
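
To make the sampling step concrete, here is a rough sketch of blanking roughly 10% of the codes in a list of lines. It is not the actual `batch_sample.py` and may differ from it in every detail. It assumes each code looks like `word &=d_y_MOT...`, with a single letter for utterance type and a single letter for object presence; `blank_sample` and `CODE_RE` are hypothetical names.

```python
import random
import re

# Assumed code shape: &=<utt_type>_<obj_pres>_ with one letter in each slot.
CODE_RE = re.compile(r"&=[a-zA-Z]_[a-zA-Z]_")


def blank_sample(lines, fraction=0.10, seed=None):
    """Return a copy of lines with about `fraction` of the codes set to X_X."""
    rng = random.Random(seed)
    # Record every code as (line index, character offset).
    hits = [
        (i, m.start())
        for i, line in enumerate(lines)
        for m in CODE_RE.finditer(line)
    ]
    if not hits:
        return list(lines)
    n = max(1, round(len(hits) * fraction))
    out = list(lines)
    for i, start in rng.sample(hits, n):
        # "&=d_y_" and "&=X_X_" are both 6 characters, so offsets stay valid.
        out[i] = out[i][:start] + "&=X_X_" + out[i][start + 6:]
    return out
```

Passing a `seed` makes the sample repeatable, which helps if the files ever need to be regenerated.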

## 2) Conduct Reliability Recodes \[RA]

### a) Setup

* Navigate to your assigned .cha in the following directory:\
  `/Volumes/Fas-Phyc-PEB-Lab/Seedlings/Working_Files/reliability_[month]/audio/reliability_checks`
* **Copy the .cha** into its corresponding **Processing/Audio\_Files** folder (in Subject\_Files):\
  `/Volumes/Fas-Phyc-PEB-Lab/Seedlings/Subject_Files/.../Home_Visit/Processing/Audio_Files`
  * NOTE: Please **DO NOT** leave the file in this folder when not in use! If you do not finish an audio reliability file by the end of your shift, move the .cha back into the reliability\_checks folder and replace the older version, and leave a note for yourself in Asana to finish the file.
* Open the file in CLAN. Go to **Mode -> turn OFF Chat mode** (shortcut: Esc-m)
* Also under **Mode**, click **'Expand bullets'** (shortcut: Esc-a); **timestamps** will then show up at the end of each segment tier.
* Go to Edit -> CLAN Options -> Make sure Auto-Wrap in TEXT mode and Auto-Wrap in CLAN Output are both **unchecked** (not on!).

### b) Check codes

Search (Mac shortcut: ⌘ + F, PC shortcut: Ctrl + F) for "X\_X". The codes that you need to fill in have the form "word &=X\_X\_\[SPEAKER]\_\[annotid]", where the first X stands for utterance type and the second X for object presence of the coded word.

Place your cursor a few lines above the word in order to hear a bit of context. Listen to the utterance and replace the X in each code with what you think are the appropriate codes.

To move to the next "X\_X" use ⌘ + G or Ctrl + G. Continue until you fill in all of the codes.
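
Before you move a file back, it can help to confirm that no blank codes are left. The sketch below lists the lines that still have an `X` in either slot. It assumes that `X` is never a valid utterance type or object presence value; `unfilled_codes` is a hypothetical helper, not one of the lab scripts.

```python
import re

# A code is unfilled if either the utterance type or object presence is still X.
UNFILLED_RE = re.compile(r"&=(?:X_[a-zA-Z]|[a-zA-Z]_X)_")


def unfilled_codes(lines):
    """Return (line number, text) pairs for lines with an unfilled code."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if UNFILLED_RE.search(line)
    ]
```

An empty result means every sampled code in the file has been filled in.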

If the word **should not have been coded**, make the utterance type and object presence "o". Don't delete any codes from the blank\_rel\_10.cha file.

If you notice that the speaker code is wrong, or if there are missing words in the section of the file you're listening to, **don't** change it in the recode file, but **do** leave a note in this Google Doc with the timestamp and a description of the change you need to make or the missing word you need to add:

<https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing>

\*Since this doc is stored in the cloud, DO NOT write any identifiable info (e.g. names) in it.

When you're finished, move the .cha back into the reliability\_checks directory, replacing the older version.

Mark your task complete on Asana.

## 3) Generate the spreadsheet \[Lab Coordinator]

After all the blank annotations in the **reliability\_checks** folder have been recoded, we need to compare those annotations with the original ones. Run the `batch_compare.py` script. It takes one argument: the path to the folder that contains the full\_files, reliability\_checks, etc. folders. It will generate a csv of the mismatches between the recodes and originals, which it will output to the spreadsheets folder.
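
The comparison can be pictured as a lookup by annotation ID, which is also why reversed word order within an utterance does not matter. The following is only a sketch of that idea and does not reproduce `batch_compare.py`. The regular expression assumes codes of the form `word &=u_p_SPK_annotid` with a three-character speaker code, and all function names are hypothetical.

```python
import re

# Assumed shape: word &=<utt_type>_<obj_pres>_<SPEAKER>_<annotid>
CODE_RE = re.compile(r"(\S+) &=([a-zA-Z])_([a-zA-Z])_([A-Z0-9]{3})_(\w+)")


def codes_by_annotid(lines):
    """Map each annotation ID to (word, utt_type, obj_pres)."""
    codes = {}
    for line in lines:
        for word, utt, obj, _speaker, annotid in CODE_RE.findall(line):
            codes[annotid] = (word, utt, obj)
    return codes


def find_mismatches(orig_lines, recode_lines):
    """List recoded annotations whose codes differ from the original."""
    orig = codes_by_annotid(orig_lines)
    rows = []
    for annotid, (word, utt, obj) in codes_by_annotid(recode_lines).items():
        if annotid not in orig:
            continue  # not a sampled annotation; nothing to compare
        _, orig_utt, orig_obj = orig[annotid]
        if (orig_utt, orig_obj) != (utt, obj):
            rows.append({
                "annotid": annotid,
                "word": word,
                "orig": f"{orig_utt}_{orig_obj}",
                "recode": f"{utt}_{obj}",
            })
    return rows
```

Because the match is keyed on the annotation ID, two instances of the same word in one utterance are still compared against the correct original.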

If the script crashes, compare the two csv files that were sent to the **debug** folder. Find where there is a mismatch and fix accordingly in the orig or recode .cha.

Place a copy of the reliability spreadsheet in the folder: `/Volumes/Fas-Phyc-PEB-Lab/Seedlings/Compiled_Data/reliability_sheets_FINAL`. This copy will remain clean and will not be used by RAs for consensus meetings.

On the copy of the spreadsheet that will remain inside the month folder, use the following directions to make the consensus spreadsheet:

* Use the EXACT text function to compare the utt\_type and obj\_pres columns from orig and recode. The EXACT formula should look like this: `=EXACT(C2:C1779, E2:E1779)`. Then use CONCATENATE to join the two TRUE/FALSE columns. The CONCATENATE formula should look like this: `=CONCATENATE(I2:I1779,"_",J2:J1779)`. Filter out TRUE\_TRUE so that only the mismatched codes remain.
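
For reference, the spreadsheet logic above amounts to two case-sensitive comparisons joined with an underscore. This is a small Python sketch of the same calculation; `true_false_label` is a hypothetical name, and nothing here replaces the spreadsheet itself.

```python
def true_false_label(orig_utt, recode_utt, orig_obj, recode_obj):
    """Mimic EXACT on each pair of columns, then CONCATENATE with "_"."""
    # EXACT is case-sensitive, as is == on Python strings.
    utt_match = orig_utt == recode_utt
    obj_match = orig_obj == recode_obj
    return f"{str(utt_match).upper()}_{str(obj_match).upper()}"


def is_mismatch(label):
    """Rows to keep after filtering out TRUE_TRUE."""
    return label != "TRUE_TRUE"
```

A row where utterance type matches but object presence differs gets the label `TRUE_FALSE` and stays in the filtered view.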

Assign RAs audio consensus and wordmerge tasks. For audio, [changes are made directly in the .cha file in Subject Files](#steps-for-audio-consensus).

## 4) Conduct consensus \[RA]

The **consensus spreadsheet** points out any differences in the file between the original codes and your recodes.

### a) Set up the consensus spreadsheet

1. Open the **consensus spreadsheet** for the month you are doing reliability for\
   `Fas-Phyc-PEB-Lab/Seedlings/Working_Files/reliability_[month]/audio/spreadsheets`
2. Select all the cells. Click the **Filter** icon in the upper right-hand corner.
3. On the **True\_False column**, select the drop-down arrow. **Un-select** **TRUE\_TRUE** (this removes the matching codes; now only mismatches remain).
4. On the **File column**, select the drop-down arrow and select only the name of the reliability file you want to check.

![For "baba," object presence is mismatched. For "diaper+bag," utterance type is mismatched.](https://3364608434-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LD2B3y86yJyNeihFLqD%2F-LJKKd_co6cFGBc_5aFp%2F-LJKSRjd_eglN_uevWU2%2FScreen%20Shot%202018-08-07%20at%201.35.46%20PM.png?alt=media\&token=aebe12b1-ea78-48e3-ad60-b85e440812e1)

### b) Set up the sparse\_code.cha

1. Find the sparse\_code.cha for the file in Subject Files\
   `Fas-Phyc-PEB-Lab/Seedlings/Subject_Files/[SubjectNo]/[SubjectNo_month]/Home_Visit/Coding/Audio_Annotation`
2. **Copy-paste** the sparse\_code.cha into its `Home_Visit/Processing/Audio_Files` folder
   1. ***Do not drag and drop!***
3. Open the .cha file. Press Ctrl+a on your keyboard to view timestamps in the file.
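
If you ever do this step from a script instead of Finder, the important point is the same: copy, never move. Below is a minimal sketch using the standard library; `copy_to_processing` is a hypothetical helper, and refusing to overwrite an existing file is an assumption chosen to avoid clobbering someone else's work in progress.

```python
import shutil
from pathlib import Path


def copy_to_processing(cha: Path, processing_dir: Path) -> Path:
    """Copy a .cha into Processing/Audio_Files, leaving the original in place."""
    dest = processing_dir / cha.name
    if dest.exists():
        # Someone may already be working on this file; stop instead of overwriting.
        raise FileExistsError(f"{dest} already exists")
    shutil.copy2(cha, dest)  # copy2 also preserves timestamps
    return dest
```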

### c) Do the consensus

#### *If you have a consensus buddy...*

1. Grab a consensus buddy (any research assistant who is currently present)
2. Grab a headphones splitter. Plug both sets of headphones in.
3. Look at the spreadsheet. For each mismatched code...
   1. On the spreadsheet, copy the **onset** timestamp.
   2. Go to the .cha file. Press Ctrl+f. Paste the timestamp into the search box. Press Enter.
   3. Listen from a little before the utterance until a little after (for context)
   4. What do you think the code should be? Discuss with your partner. Reference the spreadsheet, [Annotation Notes](https://gitbook.bergelsonlab.com/data-pipeline/annotation-notes), CWI, or other documentation as necessary. Come to a conclusion.
   5. If you disagreed with the original code, make any necessary changes in the **sparse\_code.cha directly in Subject Files** (NOT in the reliability spreadsheet or anywhere else!).
   6. Repeat for every mismatched code from the spreadsheet.
   7. Clan check + Save
   8. Move the sparse\_code.cha back to its `Home_Visit/Coding/Audio_Annotation` folder
      1. No need to run the add\_annotid, parseclan, wordmerge, etc. scripts at this time; that happens in the post-consensus step

#### ***If you don't have a consensus buddy (i.e. nobody else is present at the moment)...***

1. Look at the spreadsheet. For each mismatched code...
   1. On the spreadsheet, copy the **onset** timestamp.
   2. Go to the .cha file. Press Ctrl+f. Paste the timestamp into the search box. Press Enter.
   3. Listen from a little before the utterance until a little after (for context)
   4. Without being influenced by the original code (as indicated in the .cha) or by the recode (as indicated in the spreadsheet), what do you think the code should be?
      1. If you agree with the original code: leave as is
      2. If you agree with the recode: change it to the recode
      3. If you don't agree with either:
         1. Consult documentation such as Annotation Notes and CWI
         2. Then, if you still don't agree with either the original OR recode, make a note in the Reliability Issues doc saying that you need to consult with someone about it.
   5. Make any necessary changes in the **sparse\_code.cha directly in Subject Files** (NOT in the reliability spreadsheet or anywhere else!).
   6. Clan check + Save
   7. Move the .cha file back to `Coding/Audio_Annotation`
      1. Replace the old version of the .cha file in `Coding/Audio_Annotation`
      2. Delete the one that you were working with in `Processing/Audio_Files`.
      3. No need to run the add\_annotid, parseclan, wordmerge, etc. scripts at this time; that will happen in the post-consensus step
2. Repeat for every mismatched code from the spreadsheet.

### d) Post-consensus

***For each of your assigned files:***

1. **Reliability issues doc**
   1. Copy-paste the .cha from `Coding/Audio_Annotation` into `Processing/Audio_Files`
   2. Open the **Reliability Check Coding Issues** document. This is the doc that you used to write notes to yourself when you were doing the recodes.\
      <https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing>
   3. Check if the doc contains any notes about your file.
   4. Implement those fixes directly in the .cha in Subject Files and then highlight the comment on the doc once it is taken care of.
2. **Run the scripts and check the basic levels**
   1. Move the .cha file back to `Coding/Audio_Annotation`
      1. Replace the old version of the .cha file in `Coding/Audio_Annotation`
      2. Delete the one that you were working with in `Processing/Audio_Files`.
   2. Ask Zhenya to add annotation IDs. See [here](https://gitbook.bergelsonlab.com/data-pipeline/audio/audio-add-annotation-ids).
   3. [update-sparse\_code.csvs](https://gitbook.bergelsonlab.com/data-pipeline/basic-levels/update-sparse_code.csvs "mention")


