Datavyu Script Repository

Datavyu Scripts

The repository for our Datavyu scripts can be found here. The usual directions for working with git repositories still apply.

This repository contains a collection of Ruby scripts that check that everything looks right in a Datavyu file. One of them, run_all.rb, runs all the checks at once; you should usually run it rather than the individual scripts. Clone the repository to your local machine and select it as the favorites folder for scripts within Datavyu (Script -> Set Favorites Folder). You can then run a script by double-clicking its name in the bottom left corner of the Datavyu window.

Before annotating

You should run this script before you've started coding:

  • insert_columnCodes_labeledObject_preannotation.rb

This inserts a new column named "labeled_object" and adds our four codes to that column.

After annotating

You should run these scripts once you've finished coding. Usually this means running run_all_postannotation.rb rather than each individual script. The checks are split across four scripts, each performing one or more checks, plus a fifth script that runs them all:

  1. check_codes.rb

    • entered values are one of the predefined codes

    • speaker code is exactly 3 letters long

    • none of the codes are empty

  2. check_comments.rb

    • all of the non-comment codes are "NA"

    • offset and onset are equal

  3. check_intervals.rb

    • all onsets come prior to offsets

  4. personalinfo.rb

    • pulls out the timestamps for the personal info comments and writes them out to a file to be used by videoscrub.py

  5. run_all_postannotation.rb

    • runs all of the above scripts (does not include compare_columns.rb)
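The checks listed above can be pictured with a small plain-Ruby sketch. It does not use the Datavyu API and is not the repository's code: the CheckedCell struct, the code names (object, utterance_type, object_present, speaker), and the VALID_UTTERANCE_TYPES values are all hypothetical stand-ins chosen for illustration.

```ruby
# Hypothetical stand-in for a Datavyu cell; the real scripts use the Datavyu Ruby API.
# codes is a Hash of code name => entered value.
CheckedCell = Struct.new(:onset, :offset, :codes)

# Placeholder values only; the real list of predefined codes lives in the repository.
VALID_UTTERANCE_TYPES = %w[d q r].freeze

# Assumption: a comment cell is one whose object code starts with "%com:".
def comment?(cell)
  cell.codes[:object].to_s.start_with?("%com:")
end

# Returns a list of problems found in one cell (empty when the cell passes).
def check_cell(cell)
  errors = []
  # Equal onset and offset is allowed here so that point cells pass.
  errors << "onset is after offset" if cell.onset > cell.offset

  if comment?(cell)
    others = cell.codes.reject { |name, _| name == :object }
    errors << "non-comment codes must be NA" unless others.values.all? { |v| v == "NA" }
    errors << "comment onset and offset must be equal" unless cell.onset == cell.offset
  else
    cell.codes.each do |name, value|
      errors << "#{name} is empty" if value.to_s.strip.empty?
    end
    unless VALID_UTTERANCE_TYPES.include?(cell.codes[:utterance_type])
      errors << "utterance_type is not a predefined code"
    end
    unless cell.codes[:speaker].to_s.match?(/\A[A-Za-z]{3}\z/)
      errors << "speaker must be exactly 3 letters"
    end
  end
  errors
end
```

Collecting messages per cell, rather than stopping at the first failure, lets an annotator fix everything in one pass.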

Personal Information

The personal information script outputs a .csv file in the folder containing the .opf file being worked on.

Formatting:

source,onset_ms,offset_ms
audio,3000,5000
audio,8000,14000
audio,22000,34000
video,1000,3000
video,24000,34000

If the .opf file has no annotated personal info regions, its name is added to a running list in /seedlings/Scripts_and_Apps/no_personal_info.txt, so we know the file has already been processed by personalinfo.rb.
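The output behaviour described above could look roughly like the following sketch, which uses only Ruby's standard csv library. The Region struct, the write_personal_info helper, and its parameters are hypothetical names for illustration; how personalinfo.rb actually collects regions from the .opf file is not shown.

```ruby
require "csv"

# Hypothetical record for one personal info region: source is "audio" or "video",
# and the span is given in milliseconds.
Region = Struct.new(:source, :onset_ms, :offset_ms)

# Writes the regions to csv_path in the format shown above. When there are no
# regions, appends opf_name to the running list at list_path instead.
def write_personal_info(regions, csv_path, opf_name, list_path)
  if regions.empty?
    File.open(list_path, "a") { |f| f.puts(opf_name) }
    return
  end

  CSV.open(csv_path, "w") do |csv|
    csv << %w[source onset_ms offset_ms]
    regions.each { |r| csv << [r.source, r.onset_ms, r.offset_ms] }
  end
end
```

Opening the list in append mode keeps earlier entries, which matches the idea of a running list.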

Compare Columns

For instructions on using compare_columns.rb, see the Datavyu Consensus page.

Pulling Out Child Productions

Getting child and comment rows

In order to get all rows in the first column that have a child (CHI) speaker, a line commented with a multi word utterance (%com: mwu), or a line commented with a first word (%com: first word), you should run the following script:

  • get_child.rb

This script creates a new column named 'child_labeled_object', which can be edited. Each CHI utterance cell also gets an accompanying %pho phonetic transcription cell directly under it. The %pho cell is a point cell: its onset and offset are both set to the offset of the utterance cell it refers to.
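As a rough illustration of the selection and point-cell rules described here, the sketch below works on plain Ruby structs rather than the Datavyu API. ChildCell, KEPT_COMMENTS, child_rows, and the placeholder "%pho: " text with an "NA" speaker are all assumptions, not the contents of get_child.rb.

```ruby
# Hypothetical stand-in for a cell in the first column.
ChildCell = Struct.new(:onset, :offset, :object, :speaker)

# Comment markers named in the description above.
KEPT_COMMENTS = ["%com: mwu", "%com: first word"].freeze

# Keeps CHI utterances and the listed comments; adds a %pho point cell after each CHI utterance.
def child_rows(cells)
  cells.each_with_object([]) do |cell, out|
    if cell.speaker == "CHI"
      out << cell
      # Point cell: onset and offset both equal the utterance's offset.
      out << ChildCell.new(cell.offset, cell.offset, "%pho: ", "NA")
    elsif KEPT_COMMENTS.any? { |marker| cell.object.to_s.start_with?(marker) }
      out << cell
    end
  end
end
```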

If you need to add a new cell to the column that was pulled out, make sure to add "NEW" to the cell_number code. This is how the merge script will know how to handle the new cell when merging.

Phonetic annotations should be added as point cell comments right under each utterance, using this format:

%pho: phonetic trans,NA,NA,NEW
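A small helper can build that text consistently. This is only a sketch: pho_annotation is a hypothetical name, and rejecting commas is an assumption based on the commas separating the values in the format above.

```ruby
# Builds the text for a new phonetic annotation cell in the format shown above.
def pho_annotation(transcription)
  text = transcription.to_s.strip
  raise ArgumentError, "transcription is empty" if text.empty?
  # Assumption: commas separate the values, so the transcription itself must not contain one.
  raise ArgumentError, "transcription must not contain commas" if text.include?(",")
  "%pho: #{text},NA,NA,NEW"
end
```

For example, pho_annotation("phonetic trans") gives the line shown above.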

Merging edits from get_child.rb

Once you have made all of the necessary edits to the rows in the 'child_labeled_object' column, run the following script to merge your edits:

  • merge_child_edits.rb

All of your changes will be merged back into the first column.
