Annotations Validation Script
This is a guide to using blabpy.validate to super-check any project that uses the ACLEW annotation scheme.
Prerequisites
Python 3 installed. See instructions here.
BLAB_SHARE mounted.
Usage
This script generates one Markdown file per .eaf annotation file and performs several validation checks covering general ACLEW conventions, tier hierarchy, blank codes, and interval coding. The script should perform all the checks that were included in the minCHAT checker, and more. Currently, most of the checks are based on the minimal standard ACLEW scheme and do not yet incorporate the novel tiers and coding schemes unique to each project. Specifically, the checks are:
Listing all unique speakers and reporting any speakers that do not conform to the ACLEW naming scheme.
Validating the standard tier hierarchy (e.g. cds should be a child of xds) and reporting any unconventional tiers and their dependencies.
Reporting the number of annotations per interval, whether any intervals contain blank annotations, and any annotations that are not coded for an interval.
Reporting any tier that is blank.
Validating that the codes in each tier belong to that tier's controlled vocabulary.
Validating parent-tier dependency values (e.g. if there is a mwu tier, its parent lex tier must be coded as W).
Validating the transcription text according to ACLEW transcription conventions, in particular checking that transcriptions end with exactly one terminal punctuation mark, that square-bracketed annotations are correctly formatted as <blabla> [: blabla], <blabla> [=! blabla] or [- abc], and that at-sign annotations are correctly formatted as bla@c, bla@l, or bla@s:eng.
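To make the last check more concrete, here is a minimal Python sketch of what such transcription checks could look like. It is not the blabpy.validate implementation: the function name check_transcription, the set of terminal marks, the three-letter language codes, and the regular expressions are all assumptions based only on the examples listed above.

```python
import re

# Assumed for illustration: the terminal marks and patterns below are guesses
# based on the examples in this guide, not blabpy.validate's real rules.
TERMINALS = ".?!"
AT_SIGN_OK = re.compile(r"\S+@(c|l|s:[a-z]{3})")
BRACKET_OK = re.compile(
    r"<[^<>\[\]]+> \[(?::|=!) [^\[\]]+\]"  # <blabla> [: blabla] or <blabla> [=! blabla]
    r"|\[- [a-z]{3}\]"                      # [- abc]
)


def check_transcription(text):
    """Return a list of problems found in one transcription string."""
    problems = []
    stripped = text.strip()

    # Count terminal marks at the very end; exactly one is expected.
    trailing = len(stripped) - len(stripped.rstrip(TERMINALS))
    if trailing != 1:
        problems.append("does not end with exactly one terminal punctuation mark")

    # Remove well-formed bracket annotations; any bracket left over is malformed.
    remainder = BRACKET_OK.sub("", stripped)
    if "[" in remainder or "]" in remainder:
        problems.append("malformed square-bracket annotation")

    # Check each token containing '@' after trimming adjacent punctuation.
    for token in re.findall(r"\S*@\S*", stripped):
        if not AT_SIGN_OK.fullmatch(token.strip("<>,.?!")):
            problems.append(f"malformed at-sign annotation: {token}")

    return problems
```

In this sketch, a well-formed transcription yields an empty list, while each violated convention adds one message, which is roughly the kind of information a per-file report needs to collect.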
How to run the script
For brevity, in this document I will refer to the path to BLAB_SHARE as BLAB_SHARE_PATH, the path to the VIHI folder as VIHI_PATH, and the path to OvS as OVS_PATH.
Open Terminal.
Change directory into the folder containing the files to be super-checked. The script checks every file in this folder and recursively descends into every subfolder, at any depth. It finds every .eaf file and generates one report file for each.
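The recursive search described in this step can be sketched in Python as follows. This only illustrates the idea of walking every nested folder for .eaf files; the helper name find_eaf_files is hypothetical, and the way blabpy.validate actually discovers files may differ.

```python
from pathlib import Path


def find_eaf_files(root):
    """Return every .eaf file under root, including all nested subfolders."""
    # rglob("*.eaf") matches files in root itself and at any depth below it.
    return sorted(Path(root).rglob("*.eaf"))


if __name__ == "__main__":
    # List the files that would be picked up from the current folder.
    for eaf_path in find_eaf_files("."):
        print(eaf_path)
```

Running a listing like this before validating can be a quick way to confirm that you are in the intended folder and that it contains the .eaf files you expect.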
# Example: go to the OvS annotations-to-be-superchecked folder
cd ~
cd /Volumes/Fas-Phyc-PEB-Lab # or other filepath to blab_share
cd OvSpeech/SubjectFiles/Seedlings/overheard_speech/annotations-to-be-superchecked
Run the following line in the Terminal. The script will create a new folder called {today's date}_validation_reports inside your current folder and generate a .md report file for each .eaf file found. You can open these .md files in RStudio and select Preview to view the reports as HTML pages.
validate .
Developmental Notes
Since the script is still in development, please let me know of any errors or issues you run into. For the same reason, even though this script covers all the functionality of the minCHAT checker, it may be prudent to also run your files through the minCHAT checker for cross-validation.
Last updated