Code

Where to find code for working with the Seedlings data, so that you don't have to reinvent the wheel

There are two principal ways that code for working with Seedlings is distributed:

  • The default way: through two libraries, blabr and blabpy.

    • Neither library has narrative documentation at the moment, though most of the individual functions are documented. So, a good way to find out whether something already exists is a GitHub search.

    • Many of the functions are not especially robust. If something isn't working as expected, feel free to open an issue on GitHub or Slack the lab technician.

    • Zhenya thinks that all the code that is run more than once should be moved to these two libraries.

  • The old way: an assortment of *.py scripts that live in several GitHub repositories, which are cloned to Fas_Phyc-PEB-Lab/Seedlings/Scripts_and_Apps/Github/seedlings. Many of the instructions in the Seedlings chapter will point you to specific scripts. Avoid using those scripts; use blabpy as much as possible. Do not rely on the clones in Scripts_and_Apps either: they often contain uncommitted changes that either should have been committed or should have been undone - your guess is as good as mine. So, if you do want to use one of those scripts:

    • Clone the repo to your computer.

    • Try running the script.

    • If it works - great!

    • If it doesn't, check whether the clone in Scripts_and_Apps has changes that might be useful (see the sketch below this list). If those changes are useful, commit and push them, then update the clone in Scripts_and_Apps.
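A quick way to check whether a clone in Scripts_and_Apps has uncommitted changes is to ask git directly. Below is a minimal sketch in Python; it is not part of blabpy, and the clone_path value is a placeholder - point it at the specific repository clone you are inspecting.

import subprocess

# Placeholder path - substitute the actual clone you are checking.
clone_path = "Fas_Phyc-PEB-Lab/Seedlings/Scripts_and_Apps/Github/seedlings/<repo>"

# `git status --porcelain` prints one line per modified or untracked
# file and prints nothing at all if the working tree is clean.
result = subprocess.run(
    ["git", "-C", clone_path, "status", "--porcelain"],
    capture_output=True, text=True, check=True,
)
if result.stdout.strip():
    print("Uncommitted changes:\n" + result.stdout)
else:
    print("Working tree is clean.")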

blabr

GitHub repo: https://github.com/BergelsonLab/blabr.

R package. It is not on CRAN, so it has to be installed from GitHub with the remotes package (if you don't have remotes, install it from CRAN first):

remotes::install_github('BergelsonLab/blabr')

Here are some functions and modules one might find useful:

  • get_blab_share_path finds the location of the BLab share.

  • The get_data module has functions to download specific versions of datasets from our data repositories (currently on GitHub; we might move them later).

    • get_all_basiclevel downloads a specific version of the all_basiclevel dataset.

  • big_aggregate aggregates the information in all_basiclevel (see get_all_basiclevel), outputting a big dataframe.

  • blabr:::make_new_global_basic_level adds a column with the global (corpus-wide) basic level information.

  • The lena module contains functions to calculate annotation metrics for a set of intervals and then select the intervals that score highest on a given metric.

  • The seedlings module has functions to read sparse code CSVs - CSV versions of the annotations with an extra basic_level column.

blabpy

GitHub repo: https://github.com/BergelsonLab/blabpy.

Python package. It is on PyPI and can be installed with:

pip install blabpy
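To sanity-check the installation, you can import the package and print the installed version; importlib.metadata ships with Python 3.8+:

# Quick check that blabpy installed correctly.
from importlib.metadata import version

import blabpy  # this import fails if the installation went wrong
print(version("blabpy"))  # prints the installed version string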

Here are some functions and modules one might find useful (a short discovery sketch follows the list):

  • Annotations-to-all_basiclevel pipeline:

    • blabpy.seedlings.cha - functions to read/write/extract information from the .cha files (CLAN's CHAT annotation files for the audio recordings).

    • blabpy.seedlings.opf - same, but for Datavyu's .opf files with the video annotations.

    • blabpy.seedlings.merge - functions that combine new sparse code CSVs with data from the existing ones.

    • blabpy.seedlings.gather - functions to assemble the all_basiclevel dataset.

    • blabpy.seedlings.pipeline - puts all of the above together so that the pipeline can be run on the whole corpus at once.

  • blabpy.seedlings.listened_time - functions that help figure out how much of the recordings have been listened to and annotated.

  • blabpy.seedlings.paths - functions that help locate certain types of files. The signatures are not always consistent - raising issues is highly appreciated.
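Since there is no narrative documentation yet, Python's built-in help is the quickest way to see what each of these modules actually offers. A minimal discovery sketch, assuming only the module names listed above (the functions inside are deliberately not named here, since their signatures aren't documented on this page):

# Print the docstrings and signatures of whatever each module exports.
# Module names come from the list above; swap in cha, opf, merge,
# gather, or listened_time as needed.
from blabpy.seedlings import paths, pipeline

help(paths)     # path-locating helpers (signatures vary)
help(pipeline)  # the corpus-wide annotations-to-all_basiclevel pipeline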
