# Video Reliability Checks

{% hint style="warning" %}
**Note:** This page is outdated, and the Python scripts most likely won't work as-is. We don't plan to go through this exact process again, so this is not a problem we need to solve. If someone does need to revive it in the future, they should take inspiration from the scripts but otherwise write new code using blabpy/blabr as much as possible.
{% endhint %}

## 1. Generate Reliability Files \[Lab coordinator]

* **Create the following empty directories within `reliability_13/video/`**
  * batch\_wordmerge\_output
  * converge\_out
  * final\_out
  * full\_files
  * orig\_10\_percent
  * processed\_and\_old
  * recode\_and\_orig\_opfs
  * reliability\_checks
  * spreadsheets
* **Gather a copy of all of the month's .opf files**
  * The script to use is `collect/opf_sparsecode.py`. It is located in [this repo](https://github.com/SeedlingsBabylab/collect/blob/master/opf_sparsecode.py).
  * `python3 opf_sparsecode.py [path/to/opf_directories.txt] [path/to/full_files/] [month]`
  * This puts things into **full\_files**
* **Generate the to-be-recoded files**
  * The script to use is `datavyu_scripts/batch_recode.rb`. It is located in [this repo](https://github.com/SeedlingsBabylab/datavyu_scripts).
    * You must run it from within Datavyu by going to Script > Run Script
    * Make sure to use Datavyu v.1.3.6 or later
    * Set the following parameters by editing the script itself; this is necessary only because arguments can't be passed to scripts run through Datavyu
      * $input\_dir
        * This is the full\_files directory with all the original XX month opf files
      * $output\_dir
        * This is where it will output the "recode.opf" files
        * Set it to the empty reliability\_checks directory mentioned above
      * $original\_out
        * This is where the "recode\_orig.opf" files will be output
        * Set it to the empty orig\_10\_percent folder mentioned above
  * This puts files into two of the above folders:
    * 10% of the cells in each `full_files/*.opf` file get extracted and blanked into a new `reliability_checks/*_recode.opf` file --> to be recoded
    * The same 10% of the cells in each `full_files/*.opf` file get extracted but NOT blanked into a new `orig_10_percent/*_recode_orig.opf` file --> a record of the original annotations
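
The empty working folders listed at the top of this section can also be created in one go. The following is only a minimal sketch: the function name `create_reliability_dirs` is hypothetical, and the folder names are copied from the list above.

```python
from pathlib import Path

# Folder names copied from the list at the top of this section.
RELIABILITY_SUBDIRS = [
    "batch_wordmerge_output",
    "converge_out",
    "final_out",
    "full_files",
    "orig_10_percent",
    "processed_and_old",
    "recode_and_orig_opfs",
    "reliability_checks",
    "spreadsheets",
]


def create_reliability_dirs(video_dir):
    """Create the working folders under e.g. reliability_13/video/.

    Hypothetical helper, not part of any lab repo.
    """
    video_dir = Path(video_dir)
    created = []
    for name in RELIABILITY_SUBDIRS:
        subdir = video_dir / name
        # exist_ok=True makes the function safe to run more than once.
        subdir.mkdir(parents=True, exist_ok=True)
        created.append(subdir)
    return created
```

For example, `create_reliability_dirs("reliability_13/video")` would create all nine folders relative to the current directory.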

## 2. Recode 'em \[RA]

Have every file in the reliability\_checks folder recoded.

#### Step 1: Check codes

1. Navigate to your assigned .opf in the following directory and open it in Datavyu: `Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_[month]/video/reliability_checks`
2. Add video data from the appropriate Subject File
3. Watch through the relevant section of video and fill in the **utterance type** and **object presence** of each cell.
   * If any changes **other than** utterance type and object presence are needed, ***make a note of them but do not make them in this file***
     * **Notes should be kept in this Google document:** [**https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing**](https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing)
     * Since this doc is stored in the cloud, DO NOT write any identifiable information in it
   * Example: if a word should not have been coded, set both the utterance type and object presence codes to a lowercase "o"
   * If a word was coded wrong in a way that changes the utterance type (e.g., the book title "Baby+animals" coded as "animals"), make the utterance type "o"
4. Save the file and mark the task as complete on Asana. You're done for now!

## 3. Generate consensus files \[Lab Coordinator]

### 3a) Generate the comparison spreadsheet

1. After all the files are recoded, copy all the files from (a) **reliability\_checks** and (b) **orig\_10\_percent** into a single directory called **recode\_and\_orig\_opfs**
2. **Generate a comparison spreadsheet**
   * (a) Use `batch_basic_level.rb` to create a .csv for each subject
     * First open this script in Atom (or another text editor of choice) and set some variables
       * $input\_dir = the **recode\_and\_orig\_opfs** folder
       * $output\_dir = the **spreadsheets** folder
     * Then run the script via Datavyu (i.e. Script > Run Script > batch\_basic\_level.rb)
     * The **spreadsheets** folder will fill with a total of 88 spreadsheets corresponding to the contents of **recode\_and\_orig\_opfs**
   * (b) Use `video_compare_spreadsheet.py` to create a second spreadsheet that places original and recoded cells side by side
     * The script takes one argument: a path to the **spreadsheets** folder from step (a)
     * Command to run: `python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/reliability/video_compare_spreadsheet.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/spreadsheets/`
     * The script will write a spreadsheet, titled **video\_reliability\_comparison.csv**, into the **spreadsheets** folder
   * (c) Place a copy of this **video\_reliability\_comparison.csv** spreadsheet into `Fas-Phyc-PEB-Lab/Duke/Seedlings/Compiled_Data/reliability_sheets_FINAL/[month]`
     * Rename the copy to **13\_video\_reliability.csv**
   * (d) Assess the spreadsheet
     * Use Excel's filter feature to check whether anything in the new\_utt\_type or new\_present columns looks off. Nothing should be blank; every new\_utt\_type value should be one of d/i/n/o/q/r/s/u, and every new\_present value should be one of y/n/o/u. Note that 'u' should be used sparingly
     * If anything looks off, tell the RA to go back and fix it
     * Repeat steps 1-2 until the spreadsheet looks okay
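
The check in step (d) can also be done programmatically instead of by filtering in Excel. The sketch below assumes that the comparison spreadsheet has a header row containing columns named exactly `new_utt_type` and `new_present`; the function name `find_bad_rows` is hypothetical and not part of any lab repo.

```python
import csv

# Allowed codes, as listed in step (d).
VALID_UTT_TYPES = set("dinoqrsu")
VALID_PRESENT = set("ynou")


def find_bad_rows(csv_path):
    """Return (row_number, column, value) for every blank or unexpected code.

    Assumes a header row with 'new_utt_type' and 'new_present' columns.
    """
    checks = (("new_utt_type", VALID_UTT_TYPES), ("new_present", VALID_PRESENT))
    problems = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # start=2 because row 1 of the file is the header.
        for row_number, row in enumerate(reader, start=2):
            for column, valid in checks:
                value = (row.get(column) or "").strip()
                if value not in valid:
                    problems.append((row_number, column, value))
    return problems
```

An empty list means every row passed; otherwise each tuple points to a spreadsheet row to send back to the RA. The sketch does not judge whether 'u' is overused, which still needs a human look.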

### 3b) Generate the comparison opfs

1. Run the `combine_recode.rb` script, which merges the columns from each pair of recode.opf and recode\_orig.opf files into a single file with 2 columns.
   * Before running the script, set the directory paths within it:
     * $input\_dir
       * This is the directory with both recode.opf and recode\_orig.opf files: **recode\_and\_orig\_opfs**
     * $output\_dir
       * This is the empty **converge\_out** directory mentioned above
   * This will fill the **converge\_out** directory with opf files containing 2 columns (filenames ending in "converge\_rel.opf").
     * These files contain the cells that were mismatched between the recode.opf and recode\_orig.opf files. If a "converge\_rel.opf" file is empty, it means there were no mismatches.
     * ***SEE NOTE BELOW***

## 4. Conduct consensus \[RA]

**For each of your assigned video reliability files:**

1. **Open the opf from the following directory:** `Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_[month]/video/converge_out`
2. **Check if there are mismatches**
   1. If the file is empty, this means there were no mismatches. You are done; move on to the next file
   2. If the file is not empty, continue to step 3 or 4
   3. ***SEE NOTE BELOW***
3. **Grab a consensus buddy: any other coder.**
   1. What should the code be? Come to a conclusion.
   2. Change the codes in the right column (the "recode" column) to reflect your final conclusion.
   3. Again, if you don't think the word should be coded at all, mark it as "o"
4. **If you don't have a consensus buddy (i.e. nobody else is present at the moment)...**
   1. Listen from a little before the utterance until a little after (for context)
   2. Without being influenced by the original code (as indicated in the left column) or by the recode (as indicated in the right column), what do you think the code should be?
      1. If you agree with the original code OR the recode: change the codes in the right column (the "recode" column) to reflect your final conclusion
      2. If you don't agree with either: consult documentation such as Annotation Notes and CWI. Then, if you still don't agree with either the original or recode, make a note in the [Reliability Issues doc](https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing) saying that you need to consult with someone about it.

{% hint style="warning" %}
**This is the note referenced above.**

Versions of this GitBook going back to before July 2019 say that no mismatches = empty file. However, at least as of the 2019-20 academic year, it seems that when there are no mismatches the script doesn't output a file at all. If that is the case, it would be nice if the script raised an error whenever it doesn't parse through all 44 files, so that it is possible to tell (no output file = no mismatches = okay) apart from (no output file = didn't get parsed = not okay).
{% endhint %}
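
Until the script reports this itself, a rough check along the following lines can narrow things down. This is only a sketch under assumptions: the filename suffixes (`_recode.opf`, `_recode_orig.opf`, `_converge_rel.opf`) are guessed from the names quoted on this page and may need adjusting, and `summarize_converge_run` is a hypothetical helper. It cannot tell *why* a pair has no output; it only confirms that all expected input pairs exist and lists the pairs worth spot-checking.

```python
from pathlib import Path

# Assumed filename suffixes, based on the names quoted on this page.
RECODE_SUFFIX = "_recode.opf"
ORIG_SUFFIX = "_recode_orig.opf"
CONVERGE_SUFFIX = "_converge_rel.opf"


def _prefixes(directory, suffix):
    """Collect filename prefixes of all files in `directory` ending in `suffix`."""
    return {p.name[: -len(suffix)] for p in Path(directory).glob("*" + suffix)}


def summarize_converge_run(input_dir, output_dir, expected_pairs=44):
    """Compare recode_and_orig_opfs (input) against converge_out (output)."""
    recoded = _prefixes(input_dir, RECODE_SUFFIX)
    originals = _prefixes(input_dir, ORIG_SUFFIX)
    converged = _prefixes(output_dir, CONVERGE_SUFFIX)
    complete_pairs = recoded & originals
    return {
        # Files that have a recode but no original, or vice versa.
        "unpaired_inputs": sorted(recoded ^ originals),
        # True only if the number of complete pairs matches expectations.
        "pair_count_ok": len(complete_pairs) == expected_pairs,
        # Complete pairs with no output: either no mismatches or not parsed.
        "no_output": sorted(complete_pairs - converged),
    }
```

Each entry under `no_output` still needs a manual look (for instance, opening the recode and original files side by side) to decide which of the two cases applies.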

## 5. Merge in the changes \[Lab Coordinator]

Use `merge_reliability.rb` to merge the consensus changes from **converge\_out** back into the original files from **full\_files**.

* First open this script in Atom (or another text editor of choice) and set a few variables:
  * $origin\_in = the **full\_files** folder
  * $recode\_in = the **converge\_out** folder
  * $output\_dir = the **final\_out** folder
* Then run the script via Datavyu (i.e. Script > Run Script > merge\_reliability.rb)
* The final, updated .opfs will populate into the **final\_out** folder

### n.b. where scripts are locally stored:

* `batch_basic_level.rb` Fas-Phyc-PEB-Lab/Seedlings/Scripts\_And\_Apps/Github/seedlings/datavyu
* `video_compare_spreadsheet.py` Fas-Phyc-PEB-Lab/Seedlings/Scripts\_And\_Apps/Github/seedlings/reliability
* `merge_reliability.rb` Fas-Phyc-PEB-Lab/Seedlings/Scripts\_And\_Apps/Github/seedlings/datavyu
* `scatter.py` Fas-Phyc-PEB-Lab/Seedlings/Scripts\_And\_Apps/Github/seedlings/scatter
* `video_bl.py` Fas-Phyc-PEB-Lab/Seedlings/Scripts\_And\_Apps/Github/seedlings/collect

## 6. Post-consensus processing \[RA]

Open the reliability coding issues doc:

<https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing>

* See if it contains any notes that you left about what needs to be changed in your file
* If there are changes, make them in your file in this folder:
  * `Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_14/video/final_out`
* Once you have made the relevant changes, highlight the corresponding comments in the coding issues doc
* Remember to save
* If you have added any words, remember to [run add\_annotation\_id\_video.py](https://app.gitbook.com/@bergelsonlab/s/blab/~/drafts/-MXcaD5_hcQqBEGvyK1O/data-pipeline/video/video-annotation-checks#video-scripts)
* [Check for errors](https://app.gitbook.com/@bergelsonlab/s/blab/~/drafts/-MXcaD5_hcQqBEGvyK1O/data-pipeline/video/video-annotation-checks#video-scripts) with `run_all_postannotation.rb`

## 7. More post-consensus processing \[Lab coordinator]

### 7a) Send 'em back: updated .opfs

Use `scatter/opf.py` to send these final/updated .opfs back to Subject Files

* This script takes one argument: the path to the **final\_out** directory
* Also, use the --rename flag (which will rename all video opfs to "\_sparse\_code")
* Command to run:

  `python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/scatter/opf.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/final_out --rename`

  * NOTE: as of 11/26/19, users need to cd into the scatter repo in order to run the script, because of the way paths are hard-coded into it
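
If the command is launched from Python rather than typed into a terminal, the `cd` requirement can be handled with the `cwd` argument of `subprocess.run`. This is a minimal sketch, not the lab's method: `build_scatter_command` and `run_scatter` are hypothetical helpers, and the sketch assumes that the folder containing `opf.py` is the scatter repo that needs to be the working directory.

```python
import subprocess
from pathlib import Path


def build_scatter_command(opf_script, final_out, rename=True):
    """Build the argument list for scatter/opf.py as described in 7a."""
    command = ["python", str(opf_script), str(final_out)]
    if rename:
        command.append("--rename")
    return command


def run_scatter(opf_script, final_out, rename=True):
    """Run opf.py with the working directory set to the folder that holds it."""
    opf_script = Path(opf_script)
    command = build_scatter_command(opf_script, final_out, rename)
    # cwd= replaces the manual `cd` into the scatter repo (assumed to be
    # the parent folder of opf.py). check=True raises if the script fails.
    return subprocess.run(command, cwd=opf_script.parent, check=True)
```

Keeping the command construction in its own function makes it possible to print and inspect the exact command before anything is sent back to Subject Files.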

### 7b) Batch-wordmerge the video files

* (7bi) Use `batch_basic_level.rb` again to put things into a new folder
  * First create a folder called **processed\_and\_old** by hand; i.e. create reliability\_13/video/processed\_and\_old
    * "processed" = freshly processed during this round of reliability
    * "old" = the older, previous version currently in Subject\_Files
  * Then open this script in Atom (or another text editor of choice) and set some variables
    * $input\_dir = the **final\_out** folder
    * $output\_dir = the **processed\_and\_old** folder
  * Then run the script via Datavyu (i.e. Script > Run Script > batch\_basic\_level.rb)
    * Files will populate into the **processed\_and\_old** folder. These are the "processed" component of "processed\_and\_old".
* (7bii) Use `collect/video_bl.py` to copy "old" basic levels from Subject\_Files into **processed\_and\_old**
  * First open the script in Atom (or another text editor of choice) and set some variables
    * `months = ['13']`
  * Then run the script
    * This script takes two arguments:
      * `argv[1]` is the path/to/Subject\_Files
      * `argv[2]` is the path/to/processed\_and\_old
    * Command to run:\
      `python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/collect/video_bl.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Subject_Files /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/processed_and_old`
    * More stuff will populate into the **processed\_and\_old** folder. This is the "old" component of "processed\_and\_old"
* (7biii) Run the batch-wordmerge script
  * Command to run:\
    `python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/wordmerge2/wordmerge2_annotid.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_14/video/processed_and_old /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_14/video/batch_wordmerge_output video`
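
The example commands on this page were written for different months (some use `reliability_13`, others `reliability_14`), so it is easy to point one argument at the wrong folder. One way to guard against that is to derive every path from a single month value. The sketch below is illustrative only: `wordmerge_command` is a hypothetical helper, and the root and script paths are copied from the commands above.

```python
from pathlib import PurePosixPath

# Locations copied from the commands on this page.
SEEDLINGS = PurePosixPath("/Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings")
WORDMERGE_SCRIPT = (
    SEEDLINGS / "Scripts_and_Apps/Github/seedlings/wordmerge2/wordmerge2_annotid.py"
)


def wordmerge_command(month):
    """Build the batch-wordmerge command for one month, e.g. month='13'."""
    video_dir = SEEDLINGS / "Working_Files" / f"reliability_{month}" / "video"
    return [
        "python",
        str(WORDMERGE_SCRIPT),
        str(video_dir / "processed_and_old"),       # input folder
        str(video_dir / "batch_wordmerge_output"),  # output folder
        "video",
    ]
```

Printing `" ".join(wordmerge_command("13"))` gives a command line that can be pasted into a terminal, with both folders guaranteed to belong to the same month.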

### 7c) Check basic levels

* One by one, open each subject's month 13 sparse code (audio: in Subject Files; video: in **batch\_wordmerge\_output**) and double-check the basic levels, fixing anything that needs to be fixed

### 7d) Send 'em back: video basic levels

* The basic level .csv files that batch wordmerge just created in **batch\_wordmerge\_output** are sent back at this point using scatter; see here for [how to use scatter to send back basic levels](https://gitbook.bergelsonlab.com/archive/seedlings/scatter#to-send-back-basic-level-wordmerged-csv-files)
