# Code

Where to find code that helps you work with the Seedlings data, so that you don't have to reinvent the wheel.
There are two principal ways that code for working with Seedlings is distributed:

**Default one.** Through two libraries: `blabr` and `blabpy`. Neither library has any narrative documentation at the moment, though most of the functions are documented individually. So, a good way to find out whether something already exists is a GitHub search. Many of the functions are not very robust; if something isn't working as expected, feel free to open an issue on GitHub or Slack the lab technician. Zhenya thinks that all code that is run more than once should be moved to these two libraries.
**Old one.** An assortment of `*.py` scripts that live in several GitHub repositories, cloned to `Fas_Phyc-PEB-Lab/Seedlings/Scripts_and_Apps/Github/seedlings`. Many of the instructions in the Seedlings chapter will point you to specific scripts. Avoid using those scripts; use `blabpy` as much as possible. Do not rely on the clones in `Scripts_and_Apps` either: they often contain uncommitted changes that either should have been committed or should have been undone - your guess is as good as mine. So, if you do want to use one of those scripts:

1. Clone the repo to your computer.
2. Try running the script.
3. If it works - great!
4. If it doesn't, see if the clone in `Scripts_and_Apps` has changes that might be useful. If those changes are useful, commit them, push them, and update the clone in `Scripts_and_Apps`.
## blabr

GitHub repo here.

An R package. It is not on CRAN, so it has to be installed from GitHub, e.g. with `remotes::install_github()`.
Here are some functions and modules one might find useful:

- `get_blab_share_path` - finds the location of the BLab share.
- Module `get_data` - functions to download a specific version of a dataset from our data repositories on GitHub (currently; we might move them later). `get_all_basiclevel` downloads a specific version of the `all_basiclevel` dataset.
- `big_aggregate` - aggregates information in `all_basiclevel` (see `get_all_basiclevel`), outputting a big dataframe.
- `blabr:::make_new_global_basic_level` - adds a column with the global (corpus-wide) basic level information.
- Module `lena` - functions to calculate annotation metrics for a set of intervals and then select the intervals with the most X.
- Module `seedlings` - functions to read sparse code csvs: csv versions of the annotations with an extra `basic_level` column.
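For orientation, a sparse code csv is an ordinary csv with one extra `basic_level` column, so any csv reader can open one. A minimal Python sketch (the column names other than `basic_level` are illustrative, not the actual schema):

```python
import csv
import io

# Illustrative sparse code csv; real ones have more columns than this.
raw = """word,utterance_type,speaker,basic_level
doggie,d,MOT,dog
kitty,q,FAT,cat
"""

rows = list(csv.DictReader(io.StringIO(raw)))
basic_levels = [row["basic_level"] for row in rows]
print(basic_levels)
```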
## blabpy

GitHub repo here.

A Python package. It is on PyPI and can be installed with `pip install blabpy`.
Here are some functions and modules one might find useful:

Annotations-to-all_basiclevel pipeline:

- `blabpy.seedlings.cha` - functions to read/write/extract information from the `.cha` files (CLAN's CHAT annotation files for audio recordings).
- `blabpy.seedlings.opf` - same, but for the Datavyu `.opf` files with video annotations.
- `blabpy.seedlings.merge` - functions that combine new sparse code csvs with data from the existing ones.
- `blabpy.seedlings.gather` - functions to assemble the `all_basiclevel` dataset.
- `blabpy.seedlings.pipeline` - puts all of the above together so that it can be run on the whole corpus at once.
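Conceptually, the pipeline chains the other modules: extract audio annotations from `.cha` files and video annotations from `.opf` files, merge them with the existing sparse code csvs, then gather everything into `all_basiclevel`. A schematic sketch with local stub functions - the real `blabpy` function names and signatures differ:

```python
# Schematic only: these stubs stand in for blabpy.seedlings.{cha,opf,merge,gather}.
def extract_cha(path):
    """Stub: read audio annotations from a .cha file."""
    return [{"source": path, "modality": "audio"}]

def extract_opf(path):
    """Stub: read video annotations from an .opf file."""
    return [{"source": path, "modality": "video"}]

def merge_with_existing(new_rows, existing_rows):
    """Stub: combine new sparse codes with existing ones."""
    return existing_rows + new_rows

def gather(rows):
    """Stub: assemble the all_basiclevel dataset."""
    return sorted(rows, key=lambda r: r["source"])

existing = []
new_rows = extract_cha("01_06.cha") + extract_opf("01_06.opf")
all_basiclevel = gather(merge_with_existing(new_rows, existing))
print(len(all_basiclevel))
```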
- `blabpy.seedlings.listened_time` - functions that help figure out how much of the recordings has been listened to and annotated.
- `blabpy.seedlings.paths` - functions that help locate certain types of files. The signatures are not always consistent - raising issues is highly appreciated.
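The core calculation behind "how much has been listened to" is merging possibly overlapping annotated intervals and summing their total duration. A self-contained sketch of that calculation (not the actual `blabpy.seedlings.listened_time` code):

```python
def total_listened_ms(intervals):
    """Sum interval durations after merging overlaps.

    intervals: list of (onset_ms, offset_ms) pairs.
    """
    merged = []
    for onset, offset in sorted(intervals):
        if merged and onset <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], offset)
        else:
            merged.append([onset, offset])
    return sum(offset - onset for onset, offset in merged)

# Two overlapping intervals merge into 0-150_000; total is 150_000 + 50_000 ms.
print(total_listened_ms([(0, 100_000), (50_000, 150_000), (200_000, 250_000)]))
```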