blabr and blabpy
Documentation for the BLab in-house libraries (in progress)
This is a reference for the functions available in our two in-house libraries, blabr and blabpy, which are designed to support work with various BLab datasets. These libraries provide reusable code for handling more complex or coding-intensive tasks, particularly those involving specialized file formats or data wrangling operations. If you encounter a challenging or repetitive task, consult this documentation first (or ask the lab technician). In many cases, a function has already been developed to do just that.
This is not an exhaustive guide; it covers only the functions you'll most likely find useful.
Installation guides can be found here.
blabr
This library provides functions for accessing clean csv files of the different corpora, with parameters for selecting exactly which version of a corpus you need. It also simplifies access to the different kinds of metadata for the same file (such as vtc output for VIHI or cdi for seedlings).
See full documentation here.
blabpy
blabpy has Python scripts and modules for directly handling and modifying the different corpora in our lab. It is most useful for loading and navigating files on blab_share.
The package itself is quite formulaic, with a submodule for each project, as well as general code for handling different file types. Each submodule typically has a path.py for easy navigation of blab_share, a pipeline.py with commonly used functions for that submodule, and a cli.py for command line interfaces.
Base level modules
blabpy.pipeline
For working with ACLEW-style annotations:
extract_aclew_data(path, recursive=True, show_tqdm_pbar=False)
find_eaf_paths(path, recursive=True)
blabpy.path
Utility functions for getting the paths to blab_data and blab_share.
blabpy.vtc
Functions for working with VTC output files. These files have an ".rttm" extension and are space-separated text files without column names. VTC (Voice Type Classifier) is a set of voice classification models and the code to apply them. See GitHub for more details.
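To illustrate what "space-separated text files without column names" means in practice, here is a minimal sketch that parses RTTM lines using only the standard library. It does not use blabpy.vtc, whose functions are not listed here. The field positions are an assumption based on the common RTTM convention (file ID in field 2, onset in field 4, duration in field 5, label in field 8), and the sample lines, the VtcSegment type, and parse_rttm_lines are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class VtcSegment(NamedTuple):
    """One line of an .rttm file (hypothetical container, not part of blabpy)."""
    file_id: str
    onset: float      # seconds from the start of the recording
    duration: float   # seconds
    voice_type: str   # label assigned by the classifier


def parse_rttm_lines(lines: Iterable[str]) -> List[VtcSegment]:
    """Parse RTTM lines, assuming the conventional field order:
    type, file, channel, onset, duration, ortho, stype, name, conf, slat."""
    segments = []
    for line in lines:
        fields = line.split()  # fields are separated by whitespace
        if not fields:
            continue  # skip blank lines
        if len(fields) < 8:
            raise ValueError(f"Unexpected RTTM line: {line!r}")
        segments.append(
            VtcSegment(
                file_id=fields[1],
                onset=float(fields[3]),
                duration=float(fields[4]),
                voice_type=fields[7],
            )
        )
    return segments


# Made-up sample lines in the assumed layout.
sample_lines = [
    "SPEAKER rec01 1 0.500 1.250 <NA> <NA> KCHI <NA> <NA>",
    "",
    "SPEAKER rec01 1 2.000 0.750 <NA> <NA> FEM <NA> <NA>",
]

segments = parse_rttm_lines(sample_lines)
for segment in segments:
    print(segment.file_id, segment.onset, segment.duration, segment.voice_type)
```

For a real file, the same function can be applied to an open file handle, since iterating over a file yields its lines. Check the field order against an actual VTC output file before relying on it.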
Subpackages
blabpy.seedlings
TBA
blabpy.eaf
Subpackage with classes and functions for working with ELAN .eaf files. There are several ways of working with .eaf files using this module:
EafPlus class, which is just pympi.Eaf plus a few extra methods. It has a lot implemented, but it makes the data hard to navigate, and edits made with it result in unnecessarily large diffs.
EafTree class, which is a wrapper around an ElementTree object. It is good for navigating and editing data but lacks any functionality for adding new elements.
An assortment of functions in eaf_utils that treat .eaf files as the XML trees that they are. They allow adding new elements to .eaf files. These functions should eventually be moved to EafTree.
There is also an etree_utils module that contains functions for working with xml.etree.ElementTree.Element objects that aren't specific to .eaf files.
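Since .eaf files are XML, it can help to see what navigating one as a tree looks like. The sketch below uses only the standard library's xml.etree.ElementTree and does not use EafPlus, EafTree, eaf_utils, or etree_utils, whose methods are not listed here. The element and attribute names follow the ELAN annotation format; the sample document is made up and far smaller than a real .eaf file, and list_annotations is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A made-up, heavily trimmed .eaf document: one tier with one annotation.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1000"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
  </TIME_ORDER>
  <TIER TIER_ID="CHI" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>ba ba</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def list_annotations(root):
    """Return (tier_id, onset_ms, offset_ms, value) for each time-aligned annotation."""
    # Map time slot IDs to millisecond values; slots without a value are skipped.
    time_slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for annotation in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append(
                (
                    tier.get("TIER_ID"),
                    time_slots.get(annotation.get("TIME_SLOT_REF1")),
                    time_slots.get(annotation.get("TIME_SLOT_REF2")),
                    annotation.findtext("ANNOTATION_VALUE", default=""),
                )
            )
    return rows


root = ET.fromstring(SAMPLE_EAF)
rows = list_annotations(root)
print(rows)
```

To read a file from disk instead, ET.parse(path).getroot() gives the same kind of root element. Annotations that reference other annotations instead of time slots (REF_ANNOTATION elements) are not handled in this sketch.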
eaf.eaf_plus
vihi
vihi.pipeline