Video Reliability Checks

assessing inter-coder reliability

Note: this page is outdated, and the Python scripts most likely won't work as-is. We aren't going to go through this exact process again, so this is not a problem we need to solve. If someone does need to revive this in the future, they should take inspiration from the scripts but otherwise write new code using blabpy/blabr as much as possible.

1. Generate Reliability Files [Lab coordinator]

  • Create the following empty directories within reliability_13/video/

    • batch_wordmerge_output

    • converge_out

    • final_out

    • full_files

    • orig_10_percent

    • processed_and_old

    • recode_and_orig_opfs

    • reliability_checks

    • spreadsheets

  • Gather a copy of all of the month's .opf files

    • The script to use is collect/opf_sparsecode.py. It is located in this repo.

    • Command to run: python3 opf_sparsecode.py [path/to/opf_directories.txt] [path/to/full_files/] [month]

    • This copies the month's .opf files into full_files

  • Generate the to-be-recoded files

    • The script to use is datavyu_scripts/batch_recode.rb. It is located in this repo.

      • Must run it using Datavyu by going to Script > Run Script

      • Make sure to use Datavyu v.1.3.6 or later

      • Set some parameters by editing the script itself, because arguments can't be passed to scripts that are run through Datavyu

        • $input_dir

          • This is the full_files directory containing all of the month's original .opf files

        • $output_dir

          • This is where it will output the "recode.opf" files

          • Set it to the empty reliability_checks directory mentioned above

        • $original_out

          • This is where the "recode_orig.opf" files will be output

          • Set it to the empty orig_10_percent folder mentioned above

    • This puts files into two of the above folders:

      • 10% of the cells in each full_files/*.opf file get extracted and blanked into a new reliability_checks/*_recode.opf file --> to be recoded

      • The same 10% of the cells in each full_files/*.opf file get extracted but NOT blanked into a new orig_10_percent/*_recode_orig.opf file --> a record of the original annotations
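The empty directories listed at the start of this step can also be created in one go rather than by hand. The following is a minimal Python sketch, not one of the lab's scripts: the function name make_reliability_dirs is hypothetical, and the directory names are copied from the list above.

```python
from pathlib import Path

# Subdirectory names copied from the list at the start of this step.
SUBDIRS = [
    "batch_wordmerge_output",
    "converge_out",
    "final_out",
    "full_files",
    "orig_10_percent",
    "processed_and_old",
    "recode_and_orig_opfs",
    "reliability_checks",
    "spreadsheets",
]


def make_reliability_dirs(video_root):
    """Create each working subdirectory under video_root and return the paths.

    Hypothetical helper, not part of the seedlings repo.
    """
    root = Path(video_root)
    created = []
    for name in SUBDIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, make_reliability_dirs("reliability_13/video") would create all nine folders under reliability_13/video/, leaving any that already exist untouched.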

2. Recode 'em [RA]

Have everything in the reliability_checks folder recoded

Step 1: Check codes

  1. Navigate to your assigned .opf in the following directory and open it in Datavyu: Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_[month]/video/reliability_checks

  2. Add video data from the appropriate Subject File

  3. Watch through the relevant section of video and fill in the utterance type and object presence of each cell.

    • If any other changes are needed besides utt type and obj presence, make a note of them but do not make them in this file

    • Ex: if a word should not have been coded, make the utterance type and object presence codes lowercase letter "o"

    • If a word was coded wrong in a way that changes the utterance type (e.g. book title "Baby+animals" coded as "animals"), make the utterance type "o"

  4. Save the file and mark task as complete on Asana. You're done for now!

3. Generate consensus files [Lab Coordinator]

3a) Generate the comparison spreadsheet

  1. After all the files are recoded, copy all the files from (a) reliability_checks and (b) orig_10_percent into a single directory called recode_and_orig_opfs

  2. Generate a comparison spreadsheet

    • (a) Use batch_basic_level.rb to create a sort of .csv for each subject

      • First open this script in Atom (or another text editor of choice) and set some variables

        • $input_dir = the recode_and_orig_opfs folder

        • $output_dir = the spreadsheets folder

      • Then run the script via Datavyu (i.e. Script > Run Script > batch_basic_level.rb)

      • Stuff will populate into the spreadsheets folder -- namely, a total of 88 spreadsheets corresponding to the contents of recode_and_orig_opfs

    • (b) Use video_compare_spreadsheet.py to create another sort of spreadsheet that juxtaposes original and recoded cells

      • The script takes one argument: a path to the spreadsheets folder from step (a)

      • Command to run: python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/reliability/video_compare_spreadsheet.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/spreadsheets/

      • The script will spit out a spreadsheet, titled video_reliability_comparison.csv, into the spreadsheets folder

    • (c) Place a copy of this video_reliability_comparison.csv spreadsheet into Fas-Phyc-PEB-Lab/Duke/Seedlings/Compiled_Data/reliability_sheets_FINAL/[month]

      • Rename the copy to 13_video_reliability.csv

    • (d) Assess the spreadsheet

      • Use Excel's filter feature to check whether anything in the new_utt_type or new_present columns looks off. Nothing should be blank; new_utt_type should contain only d/i/n/o/q/r/s/u and new_present only y/n/o/u. Note that 'u' should be used sparingly.

      • If anything looks off, tell the RA to go back and fix it

      • Repeat steps 1-2 until the columns look okay
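The check in step (d) can also be done programmatically instead of with Excel filters. Below is a minimal Python sketch under two assumptions: the comparison .csv has a header row, and its columns are named new_utt_type and new_present as in the text above. The function name find_bad_rows is hypothetical.

```python
import csv

# Allowed codes as listed in step (d); the comparison is case-sensitive.
VALID_UTT_TYPES = set("dinoqrsu")
VALID_PRESENT = set("ynou")


def find_bad_rows(csv_path):
    """Return (row_number, column, value) for every blank or unexpected code."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Row 1 is the header, so data rows start at 2 (same numbering as Excel).
        for row_number, row in enumerate(reader, start=2):
            for column, allowed in (
                ("new_utt_type", VALID_UTT_TYPES),
                ("new_present", VALID_PRESENT),
            ):
                # A missing column or empty cell is treated as blank.
                value = (row.get(column) or "").strip()
                if value not in allowed:
                    problems.append((row_number, column, value))
    return problems
```

An empty list means every cell in both columns holds one of the allowed single-letter codes. The sketch does not count how often 'u' appears, so the "use sparingly" judgment still has to be made by eye.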

3b) Generate the comparison opfs

  1. Run a script that merges the columns from each pair of recode.opf and recode_orig.opf files into a single file which will have 2 columns. This is the "combine_recode.rb" script.

    • In order to run the script, you need to set the directory paths within this script:

      • $input_dir

        • this is the directory with both recode.opf and recode_orig.opf files: recode_and_orig_opfs

      • $output_dir

        • this is the empty converge_out directory mentioned above

    • This will fill the converge_out directory with opf files containing 2 columns (filenames ending in "converge_rel.opf").

      • These are the cells that were mismatched between recode.opf and recode_orig.opf files. If a "converge_rel.opf" file is empty, it means there were no mismatches.

      • SEE NOTE BELOW
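To make the idea of a "mismatch" concrete, here is a small Python illustration. It is not a port of combine_recode.rb: cells are represented as plain dicts, and the field names (onset, word, utt_type, present) and the pairing key are assumptions made for the example.

```python
def find_mismatches(original_cells, recoded_cells):
    """Pair cells by (onset, word) and return the pairs whose codes differ.

    Each cell is a dict with the assumed keys onset, word, utt_type, present.
    """
    recoded_by_key = {(c["onset"], c["word"]): c for c in recoded_cells}
    mismatches = []
    for orig in original_cells:
        key = (orig["onset"], orig["word"])
        recode = recoded_by_key.get(key)
        if recode is None:
            # Both files should hold the same 10% of cells, so a missing
            # partner points to a problem rather than to agreement.
            raise ValueError(f"no recoded cell found for {key}")
        orig_codes = (orig["utt_type"], orig["present"])
        new_codes = (recode["utt_type"], recode["present"])
        if orig_codes != new_codes:
            mismatches.append((orig, recode))
    return mismatches
```

An empty result corresponds to the "no mismatches" case; each returned pair corresponds to one row that would need a consensus decision.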

4. Conduct consensus [RA]

For each of your assigned video reliability files:

  1. Open the opf from the following directory: Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_[month]/video/converge_out

  2. Check if there are mismatches

    1. If the file is empty, this means there were no mismatches. You are done; move on to the next file

    2. If the file is not empty, continue to step 3 or 4

    3. SEE NOTE BELOW

  3. Grab a consensus buddy (any other coder).

    1. What should the code be? Come to a conclusion.

    2. Change the codes in the right column (the "recode" column) to reflect your final conclusion.

    3. Again, if you don't think the word should be coded at all, mark it as "o"

  4. If you don't have a consensus buddy (i.e. nobody else is present at the moment)...

    1. Listen from a little before the utterance until a little after (for context)

    2. Without being influenced by the original code (as indicated in the left column) or by the recode (as indicated in the right column), what do you think the code should be?

      1. If you agree with the original code OR the recode: change the codes in the right column (the "recode" column) to reflect your final conclusion

      2. If you don't agree with either: consult documentation such as Annotation Notes and CWI. Then, if you still don't agree with either the original or recode, make a note in the Reliability Issues doc saying that you need to consult with someone about it.

Note on missing converge_rel.opf files (this is the note referenced above)

The gitbook has said, at least since its pre-July 2019 version, that no mismatches = empty file. At least as of the 2019-20 academic year, however, it seems that when there are no mismatches the script does not output a file at all. If that is the case, it would be nice if the script raised an error whenever it does not parse through all 44 files, so that we can tell whether (noOutputFile = noMismatches = okay) or (noOutputFile = didn't get parsed = not okay).
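Until the script reports this itself, a check outside Datavyu can at least narrow down which files to inspect. The following Python sketch rests on assumptions: each subject's files share a filename prefix, and the suffixes are _recode.opf, _recode_orig.opf and _converge_rel.opf (the text gives the endings but not the exact naming scheme). The function name report_converge_coverage is hypothetical. The sketch cannot tell "no mismatches" apart from "not parsed" on its own; it only confirms that all expected input pairs exist and lists the subjects without an output file.

```python
from pathlib import Path

# Suffixes based on the file endings named in the text; the shared
# per-subject prefix is an assumption.
RECODE = "_recode.opf"
ORIG = "_recode_orig.opf"
CONVERGE = "_converge_rel.opf"


def _prefixes(directory, suffix):
    """Collect the filename prefix of every file in directory ending in suffix."""
    return {p.name[: -len(suffix)] for p in Path(directory).glob("*" + suffix)}


def report_converge_coverage(input_dir, output_dir, expected_pairs=44):
    """Summarize which input pairs exist and which of them have no output file."""
    recode = _prefixes(input_dir, RECODE)
    orig = _prefixes(input_dir, ORIG)
    pairs = recode & orig
    return {
        # True when the number of complete pairs matches the expected count.
        "pair_count_ok": len(pairs) == expected_pairs,
        # Prefixes that have only one of the two input files.
        "unpaired": sorted(recode ^ orig),
        # Complete pairs with no converge_rel.opf: verify these by hand.
        "no_output": sorted(pairs - _prefixes(output_dir, CONVERGE)),
    }
```

Subjects listed under no_output can then be cross-checked against the comparison spreadsheet: if the original and recoded codes agree on every row for that subject, the missing file is expected.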

5. Merge in the changes [Lab Coordinator]

Use merge_reliability.rb to merge the changes into where they are supposed to go

  • First open this script in Atom (or another text editor of choice) and set a few variables:

    • $origin_in = the full_files folder

    • $recode_in = the converge_out folder

    • $output_dir = the final_out folder

  • Then run the script via Datavyu (i.e. Script > Run Script > merge_reliability.rb)

  • Stuff (namely, the final/updated .opfs) will populate into the final_out folder
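Conceptually, this merge overlays the consensus codes onto the matching cells of the full file and leaves every other cell alone. The Python sketch below illustrates that idea only. It is not how merge_reliability.rb is implemented: the dict representation, the field names (onset, word, utt_type, present) and the matching key are all assumptions.

```python
def merge_consensus(full_cells, consensus_cells):
    """Return a copy of full_cells with consensus codes applied where keys match.

    Cells are dicts with the assumed keys onset, word, utt_type, present.
    """
    final_by_key = {(c["onset"], c["word"]): c for c in consensus_cells}
    merged = []
    for cell in full_cells:
        updated = dict(cell)  # copy so the input list is not modified
        final = final_by_key.get((cell["onset"], cell["word"]))
        if final is not None:
            updated["utt_type"] = final["utt_type"]
            updated["present"] = final["present"]
        merged.append(updated)
    return merged
```

Cells that were never part of the 10% sample, or that had no mismatch, pass through unchanged.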

n.b. where scripts are locally stored:

  • batch_basic_level.rb: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/datavyu

  • video_compare_spreadsheet.py: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/reliability

  • merge_reliability.rb: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/datavyu

  • scatter.py: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/scatter

  • video_bl.py: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/collect

6. Post-consensus processing [RA]

Open the reliability coding issues doc:

https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing

  • see if it contains any notes that you left about what needs to be changed in your file

  • if there are changes, then make them to your file in this directory:

    • Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_14/video/final_out

  • once you have made the relevant changes, highlight the corresponding comments in the coding issues doc

  • remember to save

  • If you have added any words, remember to run add_annotation_id_video.py

  • Check for errors with run_all_postannotation.rb

7. More post-consensus processing [Lab coordinator]

7a) Send 'em back: updated .opfs

Use scatter/opf.py to send these final/updated .opfs back to Subject Files

  • This script takes one argument: the path to the final_out directory

  • Also, use the --rename flag (which will rename all video opfs to "_sparse_code")

  • Command to run:

    python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/scatter/opf.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/final_out --rename

    • NOTE: as of 11/26/19, users need to cd into the scatter repo in order to run this, due to the way paths are hard-coded into the script
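If this step is ever scripted, the "cd into the scatter repo" requirement can be met by setting the working directory of the subprocess. The Python sketch below is a hedged illustration: the helper names are hypothetical, and it assumes opf.py sits at the top level of the scatter directory and accepts the final_out path plus --rename, as in the command above.

```python
import subprocess
from pathlib import Path


def build_scatter_command(scatter_repo, final_out, python="python"):
    """Return (argument list, working directory) for running opf.py.

    Hypothetical helper; assumes opf.py is at the top level of scatter_repo.
    """
    command = [python, "opf.py", str(final_out), "--rename"]
    return command, Path(scatter_repo)


def run_scatter(scatter_repo, final_out):
    """Run opf.py from inside the scatter repo, raising if it exits non-zero."""
    command, repo = build_scatter_command(scatter_repo, final_out)
    # cwd=repo has the same effect as running `cd` into the repo first.
    return subprocess.run(command, cwd=repo, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths, and check=True turns a failed run into an exception instead of a silent non-zero exit code.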

7b) Batch-wordmerge the video files

  • (7bi) Use batch_basic_level.rb again to put things into a new folder

    • First create a folder called processed_and_old by hand; i.e. create reliability_13/video/processed_and_old

      • "processed" = freshly processed during this round of reliability

      • "old" = the older, previous version currently in Subject_Files

    • Then open this script in Atom (or another text editor of choice) and set some variables

      • $input_dir = the final_out folder

      • $output_dir = the processed_and_old folder

    • Then run the script via Datavyu (i.e. Script > Run Script > batch_basic_level.rb)

      • Stuff will populate into the processed_and_old folder. This stuff is the "processed" component of "processed_and_old".

  • (7bii) Use collect/video_bl.py to copy "old" basic levels from Subject_Files into processed_and_old

    • First open the script in Atom (or another text editor of choice) and set some variables

      • months = ['13']

    • Then run the script

      • This script takes two arguments:

        • argv[1] is the path/to/Subject_Files

        • argv[2] is the path/to/processed_and_old

      • Command to run: python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/collect/video_bl.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Subject_Files /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/processed_and_old

      • More stuff will populate into the processed_and_old folder. This is the "old" component of "processed_and_old"

  • (7biii) Run the batch-wordmerge script

    • Command to run: python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/wordmerge2/wordmerge2_annotid.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_14/video/processed_and_old /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_14/video/batch_wordmerge_output video

7c) Check basic levels

  • One by one open each subject's month 13 sparse code in {audio: Subject Files, video: batch_wordmerge_output} and double-check the basic levels, fixing anything that needs to be fixed

7d) Send 'em back: video basic levels
