Code
Where to find code that helps you work with the Seedlings data, so that you don't have to reinvent the wheel.
There are two principal ways that code for working with Seedlings is distributed:
Default one. Through two libraries: blabr and blabpy.
Neither of the libraries has any narrative documentation at the moment, though most of the functions are documented. So, a good way to find out whether something already exists is a GitHub search.
Many of the functions are not very robust. If something isn't working as expected, feel free to open an issue on GitHub or message the lab technician on Slack.
Zhenya thinks that all the code that is run more than once should be moved to these two libraries.
Old one. An assortment of *.py scripts that live in several GitHub repositories that are cloned to Fas_Phyc-PEB-Lab/Seedlings/Scripts_and_Apps/Github/seedlings. Many of the instructions in the Seedlings chapter will point you to specific scripts. Avoid using those scripts; use blabpy as much as possible. Do not rely on the clones in Scripts_and_Apps either. They often contain uncommitted changes that either should have been committed or should have been undone - your guess is as good as mine. So, if you do want to use one of those scripts:
Clone the repo to your computer.
Try running the script.
If it works - great!
If it doesn't, see if the clone in Scripts_and_Apps has changes that might be useful. If those changes are useful, commit them, push them, and update the clone in Scripts_and_Apps.
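The last step, checking whether a clone has uncommitted changes, can be sketched in Python as below. This is only an illustration, not an existing lab tool: the function names parse_porcelain and uncommitted_changes are made up for this example, and the sketch assumes that git is installed and on your PATH.

```python
import subprocess


def parse_porcelain(output):
    """Parse `git status --porcelain` output into (status, path) pairs."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Porcelain v1 lines look like "XY path": two status characters,
        # one space, then the path. Do not strip the line: a leading
        # space is part of the status code.
        changes.append((line[:2], line[3:]))
    return changes


def uncommitted_changes(repo_path):
    """Return the uncommitted changes of the clone at repo_path."""
    result = subprocess.run(
        ["git", "-C", str(repo_path), "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)


# Hypothetical usage (the path is a placeholder for wherever the clone lives):
# for status, path in uncommitted_changes("path/to/clone"):
#     print(status, path)
```

An empty result means the clone matches its last commit, so there is nothing there worth salvaging. Anything else is a candidate to review with git diff before deciding whether to commit it or undo it.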
blabr
GitHub repo here.
R package. It is not on CRAN, so it has to be installed from GitHub with
remotes::install_github('BergelsonLab/blabr')
Here are some functions and modules one might find useful:
get_blab_share_path finds the location of the BLab share.
module get_data has functions to download a specific version of the data from our data repositories, which currently live on GitHub (we might move them later).
get_all_basiclevel downloads a specific version of the all_basiclevel dataset.
big_aggregate aggregates information in all_basiclevel (see get_all_basiclevel), outputting a big dataframe.
blabr:::make_new_global_basic_level adds a column with the global (corpus-wide) basic level information.
module lena contains functions to calculate annotation metrics for a set of intervals and then select the intervals with the most X.
module seedlings has functions to read sparse code csvs - csv versions of annotations with an extra basic_level column.
blabpy
GitHub repo here.
Python package. It is on PyPI and can be installed with
pip install blabpy
Here are some functions and modules one might find useful:
Annotations-to-all_basiclevel pipeline:
blabpy.seedlings.cha - functions to read/write/extract information from the .cha files (CLAN's CHAT annotation files for audio recordings).
blabpy.seedlings.opf - same but for Datavyu's .opf files with video annotations.
blabpy.seedlings.merge - functions that combine new sparse code csvs with data from the existing ones.
blabpy.seedlings.gather - functions to assemble the all_basiclevel dataset.
blabpy.seedlings.pipeline - puts all of the above together so that it can be run on the whole corpus at once.
blabpy.seedlings.listened_time - functions that help figure out how much of the recordings has been listened to and annotated.
blabpy.seedlings.paths - functions that help locate certain types of files. The function signatures are not always consistent - raising issues is highly appreciated.
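Since there is no narrative documentation, another way to see what a module offers, besides a GitHub search, is to list its documented functions from Python. The sketch below uses only the standard library. The helper names installed_version and summarize_public_functions are made up for this example and are not part of blabpy.

```python
import importlib
import inspect
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


def summarize_public_functions(module_name):
    """Map each public function defined in a module to its docstring's first line."""
    module = importlib.import_module(module_name)
    summary = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions merely imported from elsewhere.
        if name.startswith("_") or obj.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(obj) or ""
        summary[name] = doc.splitlines()[0] if doc else "(no docstring)"
    return summary


# Hypothetical usage, assuming blabpy is installed in the current environment:
# if installed_version("blabpy") is not None:
#     functions = summarize_public_functions("blabpy.seedlings.paths")
#     for name, first_line in functions.items():
#         print(f"{name}: {first_line}")
```

The same helper works for any importable module, so it can be pointed at the other blabpy.seedlings modules listed above.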