Datavyu Scripts
Datavyu Scripts on Git
The repository for our Datavyu scripts can be found here.
This repository contains a collection of Ruby scripts that check a Datavyu file for coding errors. One of them, run_all.rb, runs all the checks at once; you should usually run it rather than the individual scripts. Clone the repository to your local machine and select it as the favorites folder for scripts from within Datavyu (Script -> Set Favorites Folder). You can then run a script by double-clicking its name in the bottom left corner of the Datavyu window.
Before annotating
Run this script before you start coding:
insert_columnCodes_labeledObject_preannotation.rb
This inserts a new column named "labeled_object" and adds our four fields to it.
After annotating
Run these scripts once you've finished coding. Usually this means running run_all_postannotation.rb rather than each individual script. The work is split across four separate scripts, most of which perform several checks:
check_codes.rb
entered values are one of the predefined codes
speaker code is exactly 3 letters long
none of the codes are empty
check_comments.rb
all of the non-comment codes are "NA"
offset and onset are equal
check_intervals.rb
all onsets come prior to offsets
personalinfo.rb
pulls out the timestamps for the personal info comments and writes them out to a file to be used by videoscrub.py
run_all_postannotation.rb
runs all of the above scripts (does not include compare_columns.rb)
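The checks listed above can be sketched in plain Ruby. This is only an illustration of the rules, not the code in the repository: the real scripts use Datavyu's own Ruby API, while CodedCell, ALLOWED_UTTERANCE_TYPES, the "object" and "utterance_type" code names, and the rule that a comment cell is one whose "object" code starts with "%com:" are all assumptions made for the example. Because comment cells have equal onset and offset, the interval check here only flags an onset that comes after the offset.

```ruby
# Hypothetical stand-in for a Datavyu cell (times in milliseconds).
CodedCell = Struct.new(:onset, :offset, :codes)

# Assumed list of predefined values; the real lists live in check_codes.rb.
ALLOWED_UTTERANCE_TYPES = %w[d q r s n i].freeze

# Assumption: a comment cell is one whose "object" code starts with "%com:".
def comment?(cell)
  cell.codes["object"].to_s.start_with?("%com:")
end

# Returns a list of human-readable problems found in one cell.
def check_cell(cell)
  errors = []

  # check_intervals: the onset must not come after the offset.
  errors << "onset comes after offset" if cell.onset > cell.offset

  if comment?(cell)
    # check_comments: every non-comment code is "NA" and the cell is a point cell.
    non_na = cell.codes.reject { |name, value| name == "object" || value == "NA" }
    errors << "comment has non-NA codes: #{non_na.keys.join(', ')}" unless non_na.empty?
    errors << "comment onset and offset differ" unless cell.onset == cell.offset
  else
    # check_codes: no empty codes, 3-letter speaker, predefined values only.
    empty = cell.codes.select { |_name, value| value.to_s.strip.empty? }
    errors << "empty codes: #{empty.keys.join(', ')}" unless empty.empty?
    unless cell.codes["speaker"].to_s.match?(/\A[A-Za-z]{3}\z/)
      errors << "speaker code is not exactly 3 letters"
    end
    unless ALLOWED_UTTERANCE_TYPES.include?(cell.codes["utterance_type"])
      errors << "utterance_type is not a predefined code"
    end
  end

  errors
end
```

Calling check_cell on every cell of a column and printing the non-empty results gives roughly the kind of report the post-annotation scripts are meant to produce.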
Personal Information
The personal information script outputs a .csv file in the folder containing the .opf file being worked on.
Formatting:
If there are no personal info regions annotated in the .opf file, the name of the file is added to a running list in /seedlings/Scripts_and_Apps/no_personal_info.txt (so we know the file has already been processed by personalinfo.rb).
CHIs in Videos
Pulling Out Child Productions
Getting child and comment rows
To pull out every row in the first column that has a child (CHI) speaker, or that is a comment marking a multi-word utterance (%com: mwu) or a first word (%com: first word), run the following script:
get_child.rb
This script creates a new column named 'child_labeled_object', which can be edited. Each CHI utterance cell also gets an accompanying %pho phonetic transcription cell directly under it. The %pho cell is a point cell: its onset and offset are both set to the offset of the cell it refers to.
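The selection rule and the placement of the %pho point cells can be sketched as follows. This is a minimal illustration in plain Ruby, not the actual get_child.rb: ChildCell, the "object" and "speaker" code names, and the bare "%pho" placeholder text are assumptions for the example, and the real script works through Datavyu's Ruby API.

```ruby
# Hypothetical stand-in for a Datavyu cell (times in milliseconds).
ChildCell = Struct.new(:onset, :offset, :codes)

# A row is pulled out if the speaker is CHI or it is an mwu / first word comment.
def child_row?(cell)
  text = cell.codes["object"].to_s
  cell.codes["speaker"] == "CHI" ||
    text.start_with?("%com: mwu") ||
    text.start_with?("%com: first word")
end

# Build the point cell that holds the phonetic transcription: onset and
# offset are both the offset of the utterance it refers to.
# "%pho" is only a placeholder; the transcription format is not defined here.
def pho_cell_for(cell)
  ChildCell.new(cell.offset, cell.offset, { "object" => "%pho" })
end

# Returns the cells for the new column, with a %pho cell after each CHI utterance.
def pull_child_rows(cells)
  cells.each_with_object([]) do |cell, pulled|
    next unless child_row?(cell)
    pulled << cell
    pulled << pho_cell_for(cell) if cell.codes["speaker"] == "CHI"
  end
end
```

For a CHI utterance running from 100 to 400, the sketch places the %pho cell at 400 to 400, directly after the utterance in the new column.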
If you need to add a new cell to the column that was pulled out, make sure to add "NEW" to the cell_number code. This is how the merge script will know how to handle the new cell when merging.
Phonetic annotations should be added as point cell comments right under each utterance, using this format:
Merging edits from get_child.rb
Once you have made all of the necessary edits to the rows in the 'child_labeled_object' column, run the following script to merge your edits:
merge_child_edits.rb
All of your changes will then be applied to the first column.
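The role of the "NEW" marker in the merge can be sketched like this. It is a rough illustration under stated assumptions, not the actual merge_child_edits.rb: Row is a hypothetical stand-in for a Datavyu cell, and the sketch assumes that cell_number holds the 1-based position of the original cell in the first column, which this page does not confirm.

```ruby
# Hypothetical stand-in for a Datavyu cell (times in milliseconds).
Row = Struct.new(:onset, :offset, :codes)

# Merge edited rows back into a copy of the original column.
# Assumption: "cell_number" is either "NEW" or the 1-based position of the
# original cell that the edited row came from.
def merge_child_edits(original, edited)
  merged = original.map { |row| Row.new(row.onset, row.offset, row.codes.dup) }

  edited.each do |row|
    number = row.codes["cell_number"]
    codes = row.codes.reject { |name, _value| name == "cell_number" }

    if number == "NEW"
      # A cell added by the annotator: insert it as a brand new cell.
      merged << Row.new(row.onset, row.offset, codes)
    else
      # An existing cell: overwrite the original with the edited values.
      target = merged[Integer(number) - 1]
      raise ArgumentError, "no original cell number #{number}" if target.nil?
      target.onset = row.onset
      target.offset = row.offset
      target.codes = codes
    end
  end

  # Keep the column in time order; the index breaks ties so the sort is stable.
  merged.each_with_index.sort_by { |row, index| [row.onset, index] }.map(&:first)
end
```

Without the "NEW" marker, a merge of this kind has no original cell to match an added row against, which is why the marker is required for cells you add by hand.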