Annotations in csv format
All complete annotations are collected into two large csv files covering all tiers and intervals (annotations.csv and intervals.csv). The annotations are exported with a blabpy function and saved to a repo as these csv files, from which they can be loaded with a blabr function. See the vihi_annotations repo for details.
blabpy
blabpy.pipeline.extract_aclew_data extracts annotations from a file, or recursively from a folder, and returns two dataframes: annotations and intervals. This function can be used on any eaf files and yields the rawest version of the data.
The intervals table has one row per coding interval; its contents are extracted from the code, code_num, sampling_type, onset/offset and context tiers. The two tables can be merged using the eaf_filename and code_num columns.
The annotations table has one row per participant-level annotation; all extra annotations (vcm, xds, etc.) are in their own columns. A missing child-tier segment is represented as NA, an empty one as an empty string. Any annotation that falls outside every coded interval is assigned a code_num of -1.
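The merge described above can be sketched with pandas on toy data. This is only an illustration under stated assumptions: the dataframes are built by hand rather than returned by blabpy, and every column name or value other than eaf_filename, code_num, xds and sampling_type is made up for the example.

```python
import pandas as pd

# Toy stand-ins for the two tables. The real ones come from blabpy;
# the "annotation" column and all cell values here are illustrative.
annotations = pd.DataFrame({
    "eaf_filename": ["a.eaf", "a.eaf", "a.eaf"],
    "code_num": [1, 2, -1],       # -1: annotation outside any coded interval
    "annotation": ["hi baby", "look", "stray segment"],
    "xds": ["C", "", pd.NA],      # "" = empty child segment, NA = missing one
})
intervals = pd.DataFrame({
    "eaf_filename": ["a.eaf", "a.eaf"],
    "code_num": [1, 2],
    "sampling_type": ["random", "high-volume"],
})

# A left merge keeps every annotation. Rows with code_num == -1 have no
# matching interval, so their interval columns come back as missing.
merged = annotations.merge(
    intervals,
    on=["eaf_filename", "code_num"],
    how="left",
    validate="many_to_one",  # each (file, code_num) pair appears once in intervals
)

# Annotations that fall outside every coded interval.
outside = merged[merged["code_num"] == -1]
```

A left merge is chosen here so that annotations outside coded intervals are not silently dropped; an inner merge would discard them.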
A version of this function in blabpy.vihi.pipeline has been adapted to combine this data with additional information extracted from selected_regions.csv files (specifically, the rank of each code_num).
vihi_annotations
This repo contains the current version of the large VIHI csv files.
It contains an update.sh script which clones the most recent version of VIHI_LENA and calls the blabpy function above to collect all the intervals and update the current csv files. The lab technician should run it every time a new file has gone through the annotation/supercheck/merge pipeline and is ready to be part of the final csv files.
As noted above, these csv files are the most "raw" version of the annotations.
annotations.csv has every tier, along with every annotation and code in the eaf files (with the exception of one or two entire tiers that can be excluded in the update.sh script), while intervals.csv has only the intervals that were explicitly coded in the eaf files. In addition, no checks are performed on these files once they have been through the superchecking steps. Thus, using these csv files directly is not recommended.
The repo needs to be cloned to your local ~/BLAB_DATA (instructions) to be accessible by blabr::get_vihi_annotations().
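For a quick look at the raw files from Python (not a substitute for blabr::get_vihi_annotations(), which remains the checked route), a minimal sketch might look like the following. It rests on several assumptions that are not confirmed here: that the clone lives at ~/BLAB_DATA/vihi_annotations, that both csv files sit at the top level of the repo, and that missing values are written as the literal string NA. The function name read_raw_csvs is hypothetical.

```python
from pathlib import Path

import pandas as pd

# Assumed location of the local clone; adjust if the repo lives elsewhere.
DEFAULT_REPO = Path.home() / "BLAB_DATA" / "vihi_annotations"


def read_raw_csvs(repo_dir=DEFAULT_REPO):
    """Read both raw tables, keeping empty strings distinct from NA.

    Hypothetical helper for inspection only; it performs none of the
    checks that blabr::get_vihi_annotations() does.
    """
    repo_dir = Path(repo_dir)
    if not repo_dir.is_dir():
        raise FileNotFoundError(
            f"{repo_dir} not found; clone vihi_annotations there first"
        )
    # By default pandas turns both "" and "NA" into missing values.
    # Restricting the NA markers to "NA" (assumed spelling) keeps
    # empty segments as empty strings.
    read_kwargs = dict(keep_default_na=False, na_values=["NA"])
    annotations = pd.read_csv(repo_dir / "annotations.csv", **read_kwargs)
    intervals = pd.read_csv(repo_dir / "intervals.csv", **read_kwargs)
    return annotations, intervals
```

The non-default NA handling matters because a missing child-tier segment and an empty one mean different things in these tables, and the pandas defaults would collapse them into one.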
An older version of this page notes that the current version of update.sh uses the "Dev versions (0.0.0.9xxx)". I have no idea what this means but will look into it.
blabr
There is blabr::get_vihi_annotations(), which loads the csv files from the repo mentioned above.
The vihi_annotations repo needs to be cloned to your local ~/BLAB_DATA (instructions) to be accessible by blabr::get_vihi_annotations().
This function is the *preferred* way to work with the full VIHI corpus since:
It performs various checks to make sure that there are no errors among annotations according to ACLEW standards
It ensures the correct data type of every column (including empty/NA values)
It processes the data to add derived columns that are useful but not included in the original, such as is_top_5_hivol, and adds the first 90-minute interval (interval 0), which is not explicitly coded in the eaf files.
It has options for loading:
Older versions of the csv files
Only the annotations, only the intervals, or the annotations with interval data merged into them
Only the random samples, only VI and TD matches, or the entire corpus
Annotations with or without PI
See details of all that here