Annotations Validation Script

This page is a work in progress

This is a guide to using blabpy.validate to perform super-checking on any project that uses the ACLEW annotation scheme.

Prerequisites

  • Python 3 installed. See instructions here.

  • blabpy installed (version ≥ 0.38.1). See instructions here on how to install it and here on how to check the version and upgrade (a quick check is also sketched right after this list).

  • BLAB_SHARE mounted.
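If you are not sure whether the first two prerequisites are met, a quick check from the Terminal looks like this (the pip command assumes blabpy was installed with pip):

# Check that Python 3 is available
python3 --version
# Show the installed blabpy version (should be 0.38.1 or higher)
pip show blabpy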

Usage

This script generates one Markdown report per .eaf annotation file and performs several validation checks covering general ACLEW conventions, tier hierarchy, blank codes, and interval coding. It should perform all the checks that were included in the minCHAT checker, and more. Currently, most of the checks are based on the minimal standard ACLEW scheme and do not yet incorporate the novel tiers and coding schemes unique to each project. Specifically, the script performs the following checks:

  • Listing all unique speakers and reporting any speakers that do not conform to the ACLEW naming scheme.

  • Validating the standard tier hierarchy (e.g. cds should be a child of xds) and reporting any unconventional tiers and their dependencies.

Currently, the script does NOT check for tiers that should be coded (e.g. if the lex tier is W, there must be an mwu tier). I am not sure how useful this check would be, since certain projects might choose not to code certain tiers.

  • Reporting the number of annotations per interval, whether any intervals contain blank annotations, and any annotations not coded for an interval.

  • Reporting any tier that is blank.

  • Validating that the code in each tier belongs to that tier's controlled vocabulary.

  • Validating parent-tier dependency values (e.g. if there is an mwu tier, its parent lex tier must be coded as W).

  • Validating the transcription text according to ACLEW transcription conventions: in particular, checking that transcriptions end with exactly one terminal punctuation mark, that square-bracketed annotations are correctly formatted as <blabla> [: blabla], <blabla> [=! blabla], or [- abc], and that at-sign annotations are correctly formatted as bla@c, bla@l, or bla@s:eng (see the examples below).

The transcription validation check currently does not handle nested brackets well (e.g. <get [: moving]> [=! sings]) and may mistakenly flag them as errors.
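For reference, here are a few made-up snippets in the formats the checker accepts. The glosses follow standard CHAT/minCHAT usage ([: text] for a replacement, [=! text] for a paralinguistic note, [- lng] for an utterance in another language, @c for a child-invented form, @l for a letter, @s:lng for a word from another language); the words themselves are purely illustrative:

<dunno> [: don't know] .
<ha ha> [=! laughs] .
[- spa] hola .
that's the letter b@l .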

How to run the script

For brevity, in this document I will refer to the file path of the BLAB share as BLAB_SHARE_PATH, the path to the VIHI folder as VIHI_PATH, and the path to OvS as OVS_PATH.

  • Open Terminal.

  • Change directory into the folder containing the files to be super-checked. The script is designed to search recursively: it checks every file in this folder, descends into every subfolder and checks every file there, and so on. It will find every .eaf file and generate one report file for each (a quick way to preview which files it will find is shown after the example below).

If you only want to super-check one file, change directory into the individual annotator's folder (say, VIHI_PATH/annotations/VI_001_676_Lilli-Righter). If you want to batch super-check multiple files, then either 1) cd into the folder that stores all the annotations to be super-checked, if such a folder exists, or 2) create a separate folder for this purpose and copy every .eaf file to be super-checked into it, which is what I would recommend.

# Example: say you want to go to the OvS annotations-to-be-superchecked folder
cd ~
cd /Volumes/Fas-Phyc-PEB-Lab # or other filepath to blab_share
cd OvSpeech/SubjectFiles/Seedlings/overheard_speech/annotations-to-be-superchecked
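Once you are in the right folder, you can preview which files the script will pick up using a standard find command:

# List every .eaf file in the current folder and all its subfolders
find . -name "*.eaf"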
  • Run the following line in the Terminal. The script will create a new folder called {today's date}_validation_reports inside your current folder and generate a .md report file for each .eaf file found. You can open these .md files in RStudio and select Preview to view the reports as HTML pages.

validate .

Optionally, you can use a custom name for the output folder and run validate . output-folder-name instead.
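For example, to write the reports to a folder named my-validation-reports (the folder name here is just an illustration):

validate . my-validation-reports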

Development Notes

Since the script is still in development, please let me know of any errors or issues you encounter. In addition, even though this script covers all the functionality of the minCHAT checker, it might be prudent to also run your files through the minCHAT checker for cross-validation.
