Video Reliability Checks
Assessing inter-coder reliability
Note: this page is outdated, and the Python scripts most likely won't work as-is. We aren't going to go through this exact process again, so this is not a problem we need to solve. If someone does need to revive this process in the future, they should take inspiration from the scripts but otherwise write new code using blabpy/blabr as much as possible.
1. Generate Reliability Files [Lab coordinator]
Create the following empty directories within reliability_13/video/:
batch_wordmerge_output
converge_out
final_out
full_files
orig_10_percent
processed_and_old
recode_and_orig_opfs
reliability_checks
spreadsheets
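If you ever need to redo this setup, creating the folders by hand is easy to get wrong. Below is a minimal Python sketch that makes all nine folders listed above under a given base directory. The function name make_reliability_dirs is hypothetical and is not part of the seedlings repo or blabpy.

```python
from pathlib import Path

# The empty working folders listed in step 1 of this page.
SUBDIRS = [
    "batch_wordmerge_output",
    "converge_out",
    "final_out",
    "full_files",
    "orig_10_percent",
    "processed_and_old",
    "recode_and_orig_opfs",
    "reliability_checks",
    "spreadsheets",
]


def make_reliability_dirs(base: Path) -> list[Path]:
    """Create the working folders under base, e.g. reliability_13/video."""
    created = []
    for name in SUBDIRS:
        path = base / name
        # parents=True also creates reliability_13/video if needed;
        # exist_ok=True makes the function safe to re-run.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, make_reliability_dirs(Path("reliability_13") / "video") run from Working_Files would set up month 13.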
Gather a copy of all of the month's .opf files
The script to use is collect/opf_sparsecode.py. It is located in this repo.
python3 opf_sparsecode.py [path/to/opf_directories.txt] [path/to/full_files/] [month]
This puts the files into full_files.
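As a rough illustration of what this collection step involves, here is a Python sketch that reads a text file listing one directory per line and copies every .opf file in those directories into full_files. It is not a reimplementation of opf_sparsecode.py: the assumed one-path-per-line format of opf_directories.txt is a guess, and any month filtering the real script does is left out.

```python
import shutil
from pathlib import Path


def collect_opfs(dir_list_file: Path, dest: Path) -> list[Path]:
    """Copy every .opf file from the directories listed (one per line)
    in dir_list_file into dest. Blank lines are skipped."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for line in dir_list_file.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        for opf in sorted(Path(line).glob("*.opf")):
            target = dest / opf.name
            if target.exists():
                # Two source files with the same name would otherwise
                # silently overwrite each other in full_files.
                raise FileExistsError(f"duplicate file name: {opf.name}")
            shutil.copy2(opf, target)  # copy2 also keeps timestamps
            copied.append(target)
    return copied
```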
Generate the to-be-recoded files
The script to use is datavyu_scripts/batch_recode.rb. It is located in this repo. You must run it from Datavyu by going to Script > Run Script
Make sure to use Datavyu v1.3.6 or later
You need to set some parameters by editing the script itself, because arguments can't be passed to scripts run through Datavyu
$input_dir
This is the full_files directory with all the original XX-month .opf files
$output_dir
This is where it will output the "recode.opf" files
Set it to the empty reliability_checks directory mentioned above
$original_out
This is where the "recode_orig.opf" files will be output
Set it to the empty orig_10_percent folder mentioned above
This puts files into two of the above folders:
10% of the cells in each full_files/*.opf file get extracted and blanked into a new reliability_checks/*_recode.opf file --> to be recoded
The same 10% of the cells in each full_files/*.opf file get extracted but NOT blanked into a new orig_10_percent/*_recode_orig.opf file --> a record of the original annotations
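The core idea of batch_recode.rb, pulling the same random 10% of cells into a blanked copy and an untouched copy, can be sketched in Python as below. This is an illustration under stated assumptions, not a port of the Ruby script: cells are plain dicts rather than Datavyu cells, the field names utt_type and present are hypothetical, and rounding up to at least one cell is a guess at how the real script sizes the sample.

```python
import copy
import math
import random

# Hypothetical field names for the two codes the RA re-enters.
CODED_FIELDS = ("utt_type", "present")


def sample_for_recode(cells, fraction=0.10, seed=0):
    """Return (blanked, original): the same randomly chosen subset of cells,
    once with the coded fields emptied and once untouched."""
    if not cells:
        return [], []
    k = max(1, math.ceil(len(cells) * fraction))
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    # Keep the chosen cells in their original (time) order.
    chosen = sorted(rng.sample(range(len(cells)), k))
    original = [copy.deepcopy(cells[i]) for i in chosen]
    blanked = []
    for cell in original:
        b = dict(cell)
        for field in CODED_FIELDS:
            b[field] = ""  # blank only the codes being checked
        blanked.append(b)
    return blanked, original
```

Because both outputs come from the same list of chosen indices, every blanked cell has an original counterpart at the same onset, which is what makes the later comparison possible.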
2. Recode 'em [RA]
Have everything in the reliability_checks folder recoded
Step 1: Check codes
Navigate to your assigned .opf in the following directory and open it in Datavyu.
Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_[month]/video/reliability_checks
Add video data from the appropriate Subject File
Watch through the relevant section of video and fill in the utterance type and object presence of each cell.
If anything other than the utterance type and object presence needs changing, make a note of it but do not change it in this file
Notes should be kept in this Google document: https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing
Since this is stored in the cloud, DO NOT write any identifiable info in the doc
Ex: if a word should not have been coded, set both the utterance type and object presence codes to the lowercase letter "o"
If a word was coded wrong in a way that changes the utterance type (e.g., the book title "Baby+animals" coded as "animals"), make the utterance type "o"
Save the file and mark the task as complete on Asana. You're done for now!
3. Generate consensus files [Lab Coordinator]
3a) Generate the comparison spreadsheet
After all the files are recoded, copy all the files from (a) reliability_checks and (b) orig_10_percent into a single directory called recode_and_orig_opfs
Generate a comparison spreadsheet
(a) Use batch_basic_level.rb to create a .csv for each .opf file
First open this script in Atom (or another text editor of choice) and set some variables
$input_dir = the recode_and_orig_opfs folder
$output_dir = the spreadsheets folder
Then run the script via Datavyu (i.e. Script > Run Script > batch_basic_level.rb)
This will populate the spreadsheets folder with a total of 88 spreadsheets, one for each file in recode_and_orig_opfs
(b) Use video_compare_spreadsheet.py to create another spreadsheet that juxtaposes original and recoded cells
The script takes one argument: the path to the spreadsheets folder from step (a)
Command to run:
python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/reliability/video_compare_spreadsheet.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/spreadsheets/
The script will spit out a spreadsheet, titled video_reliability_comparison.csv, into the spreadsheets folder
(c) Place a copy of this video_reliability_comparison.csv spreadsheet into
Fas-Phyc-PEB-Lab/Duke/Seedlings/Compiled_Data/reliability_sheets_FINAL/[month]
and rename it to [month]_video_reliability.csv (e.g., 13_video_reliability.csv)
(d) Assess the spreadsheet
Use Excel's filter feature: does anything in the new_utt_type or new_present columns look off? Nothing should be blank, new_utt_type values should all be d/i/n/o/q/r/s/u, and new_present values should be y/n/o/u. Note that 'u' should be used sparingly
If anything looks off, tell the RA to go back and fix it
Repeat steps 1-2 until they look okay
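The filter check in (d) can also be done programmatically. The sketch below reads the comparison CSV and reports every row where new_utt_type or new_present is blank or outside the allowed codes listed above. The column names come from this page; the function name find_bad_rows is hypothetical, and the sketch assumes codes are lowercase single letters.

```python
import csv

# Allowed codes, as listed in step (d).
ALLOWED = {
    "new_utt_type": set("dinoqrsu"),
    "new_present": set("ynou"),
}


def find_bad_rows(csv_file):
    """Yield (row_number, column, value) for blank or unexpected codes.
    Row numbers count the header as row 1, matching Excel's numbering."""
    reader = csv.DictReader(csv_file)
    for row_num, row in enumerate(reader, start=2):
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            # A blank value is never in the allowed set, so it is reported too.
            if value not in allowed:
                yield row_num, column, value
```

Usage: open video_reliability_comparison.csv with open(path, newline="") and print each tuple from find_bad_rows; the row numbers line up with Excel, which makes it easy to point the RA at specific cells.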
3b) Generate the comparison opfs
Run the combine_recode.rb script, which merges the columns from each pair of recode.opf and recode_orig.opf files into a single file with 2 columns.
In order to run the script, you need to set the directory paths within this script:
$input_dir
this is the directory with both recode.opf and recode_orig.opf files: recode_and_orig_opfs
$output_dir
this is the empty converge_out directory mentioned above
This will fill the converge_out directory with .opf files containing 2 columns (filenames ending in "converge_rel.opf").
These files contain only the cells that were mismatched between the recode.opf and recode_orig.opf files. If a "converge_rel.opf" file is empty, it means there were no mismatches.
SEE NOTE BELOW
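What combine_recode.rb computes can be sketched as a pairing-and-compare step. In this Python illustration, cells are dicts (a hypothetical stand-in for Datavyu cells), original and recoded cells are paired by onset, which is assumed to be unique within a file, and a pair counts as a mismatch if either coded field differs.

```python
def find_mismatches(original, recode, fields=("utt_type", "present")):
    """Pair original and recoded cells by onset and return the
    (original, recode) pairs whose coded fields differ."""
    recode_by_onset = {cell["onset"]: cell for cell in recode}
    mismatches = []
    for orig in original:
        rec = recode_by_onset.get(orig["onset"])
        if rec is None:
            # Both files come from the same sample, so every cell should pair up.
            raise KeyError(f"no recoded cell at onset {orig['onset']}")
        if any(orig.get(f) != rec.get(f) for f in fields):
            mismatches.append((orig, rec))
    return mismatches
```

An empty return value corresponds to the "no mismatches" case described above.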
4. Conduct consensus [RA]
For each of your assigned video reliability files:
Open the opf from the following directory:
Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_[month]/video/converge_out
Check if there are mismatches
If the file is empty, this means there were no mismatches. You are done; move on to the next file
If the file is not empty, continue with the steps below (with a consensus buddy if one is available, otherwise on your own)
SEE NOTE BELOW
Grab a consensus buddy (any other coder).
What should the code be? Come to a conclusion.
Change the codes in the right column (the "recode" column) to reflect your final conclusion.
Again, if you don't think the word should be coded at all, mark it as "o"
If you don't have a consensus buddy (i.e. nobody else is present at the moment)...
Listen from a little before the utterance until a little after (for context)
Without being influenced by the original code (as indicated in the left column) or by the recode (as indicated in the right column), what do you think the code should be?
If you agree with the original code OR the recode: change the codes in the right column (the "recode" column) to reflect your final conclusion
If you don't agree with either: consult documentation such as Annotation Notes and CWI. Then, if you still don't agree with either the original or recode, make a note in the Reliability Issues doc saying that you need to consult with someone about it.
NOTE:
Since at least the pre-July 2019 version of this GitBook, these instructions have said that no mismatches = empty file, but at least as of the 2019-20 academic year, it seems that if there are no mismatches, the script doesn't output a file at all. If that is the case, it would be nice if the script could output an error if it doesn't parse through 44 files, so that I can tell whether (no output file = no mismatches = okay) or (no output file = didn't get parsed = not okay).
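Short of changing combine_recode.rb, a separate check can at least list which recoded files have no output, so each one can be confirmed by hand. This Python sketch assumes the output for <prefix>_recode.opf is named <prefix>_converge_rel.opf; that naming is a guess based on the suffixes mentioned above, and the function name is hypothetical. Passing expected=44 raises an error if the number of recode files found is not 44.

```python
from pathlib import Path

RECODE_SUFFIX = "_recode.opf"
OUTPUT_SUFFIX = "_converge_rel.opf"  # assumed naming: <prefix>_converge_rel.opf


def recoded_without_output(recode_dir: Path, converge_dir: Path, expected=None) -> list[str]:
    """Return the prefixes of recode files that have no converge_rel output.
    Note that "*_recode.opf" does not match "*_recode_orig.opf"."""
    prefixes = sorted(
        p.name[: -len(RECODE_SUFFIX)] for p in recode_dir.glob("*" + RECODE_SUFFIX)
    )
    if expected is not None and len(prefixes) != expected:
        raise ValueError(f"found {len(prefixes)} recode files, expected {expected}")
    return [pre for pre in prefixes if not (converge_dir / (pre + OUTPUT_SUFFIX)).exists()]
```

Every prefix this returns is either a file with no mismatches or a file that was never parsed; comparing its recode.opf and recode_orig.opf directly tells the two cases apart.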
5. Merge in the changes [Lab Coordinator]
Use merge_reliability.rb to merge the consensus changes back into the full .opf files
First open this script in Atom (or another text editor of choice) and set a few variables:
$origin_in = the full_files folder
$recode_in = the converge_out folder
$output_dir = the final_out folder
Then run the script via Datavyu (i.e. Script > Run Script > merge_reliability.rb)
Stuff (namely, the final/updated .opfs) will populate into the final_out folder
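Conceptually, the merge overwrites the coded fields of the matching cells in each full file with the consensus values and leaves every other cell alone. A minimal Python sketch, again assuming cells are dicts matched by a unique onset and hypothetical field names utt_type and present (merge_reliability.rb itself may match cells differently):

```python
def merge_consensus(full_cells, consensus_cells, fields=("utt_type", "present")):
    """Return a copy of full_cells in which cells at the same onset as a
    consensus cell take the consensus values for the coded fields."""
    consensus_by_onset = {c["onset"]: c for c in consensus_cells}
    merged = []
    for cell in full_cells:
        new = dict(cell)  # copy, so the input list is left untouched
        agreed = consensus_by_onset.get(cell["onset"])
        if agreed is not None:
            for f in fields:
                new[f] = agreed[f]
        merged.append(new)
    return merged
```

Returning a new list rather than editing in place mirrors the folder layout above: full_files stays as the untouched input and final_out receives the merged result.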
n.b. where scripts are locally stored:
batch_basic_level.rb: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/datavyu
video_compare_spreadsheet.py: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/reliability
merge_reliability.rb: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/datavyu
scatter.py: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/scatter
video_bl.py: Fas-Phyc-PEB-Lab/Seedlings/Scripts_And_Apps/Github/seedlings/collect
6. Post-consensus processing [RA]
Open the reliability coding issues doc:
https://docs.google.com/document/d/1eKncqrDu5OXwDb559--ILXAdgZSzAMqUedcuV65hFvw/edit?usp=sharing
See if it contains any notes that you left about what needs to be changed in your file
If there are changes, make them in your file in this directory:
Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_[month]/video/final_out
Once you have made the relevant changes, highlight the comments in the coding issues doc
Remember to save
If you have added any words, remember to run add_annotation_id_video.py
Check for errors with run_all_postannotation.rb
7. More post-consensus processing [Lab coordinator]
7a) Send 'em back: updated .opfs
Use scatter/opf.py to send these final/updated .opfs back to Subject Files
This script takes one argument: the path to the final_out directory
Also, use the --rename flag (which will rename all video opfs to "_sparse_code")
Command to run:
python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/scatter/opf.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/final_out --rename
NOTE: as of 11/26/19, you need to cd into the scatter repo before running the script, because of the way paths are hard-coded into it
7b) Batch-wordmerge the video files
(7bi) Use batch_basic_level.rb again to put things into a new folder
First make sure the processed_and_old folder exists, i.e. reliability_13/video/processed_and_old (it should have been created in step 1; if not, create it by hand)
"processed" = freshly processed during this round of reliability
"old" = the older, previous version currently in Subject_Files
Then open this script in Atom (or another text editor of choice) and set some variables
$input_dir = the final_out folder
$output_dir = the processed_and_old folder
Then run the script via Datavyu (i.e. Script > Run Script > batch_basic_level.rb)
Stuff will populate into the processed_and_old folder. This stuff is the "processed" component of "processed_and_old".
(7bii) Use collect/video_bl.py to copy "old" basic levels from Subject_Files into processed_and_old
First open the script in Atom (or another text editor of choice) and set some variables
months = ['13']
Then run the script
This script takes two arguments:
argv[1] is the path/to/Subject_Files
argv[2] is the path/to/processed_and_old
Command to run:
python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/collect/video_bl.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Subject_Files /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/processed_and_old
More stuff will populate into the processed_and_old folder. This is the "old" component of "processed_and_old"
(7biii) Run the batch-wordmerge script
Command to run:
python /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Scripts_and_Apps/Github/seedlings/wordmerge2/wordmerge2_annotid.py /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/processed_and_old /Volumes/Fas-Phyc-PEB-Lab/Duke/Seedlings/Working_Files/reliability_13/video/batch_wordmerge_output video
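The internals of wordmerge2_annotid.py aren't described on this page, but the idea its name suggests, carrying "old" basic levels over to "processed" rows that share an annotation id, can be sketched as follows. The column names annotid, word and basic_level are hypothetical; rows are dicts such as those produced by csv.DictReader. Rows whose id is new, or whose word changed since the old version, get a blank basic level and are returned separately for the hand check in 7c.

```python
def wordmerge(processed_rows, old_rows, id_col="annotid",
              word_col="word", bl_col="basic_level"):
    """Carry old basic levels over to processed rows with the same annotation
    id and word. Returns (merged, needs_review)."""
    old_by_id = {row[id_col]: row for row in old_rows}
    merged, needs_review = [], []
    for row in processed_rows:
        out = dict(row)  # leave the input rows unchanged
        prev = old_by_id.get(row[id_col])
        if prev is not None and prev.get(word_col) == row.get(word_col):
            out[bl_col] = prev.get(bl_col, "")
        else:
            # New word, or the word was edited: the old basic level may not apply.
            out[bl_col] = ""
        if not out[bl_col]:
            needs_review.append(out)
        merged.append(out)
    return merged, needs_review
```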
7c) Check basic levels
One by one, open each subject's month 13 video sparse code file in batch_wordmerge_output (for audio, the files are in Subject Files instead) and double-check the basic levels, fixing anything that needs to be fixed
7d) Send 'em back: video basic levels
The .csv basic level stuff, created during batch wordmerge and populated into batch_wordmerge_output just now, will be sent back at this time using scatter; see here for how to use scatter to send back basic levels