# Code

There are two principal ways that code for working with Seedlings is distributed:

* Default one. Through two libraries: `blabr` and `blabpy`.
  * Neither library has any narrative documentation at the moment, though most of the functions are documented. So, a good way to find out whether something already exists is a GitHub search.
  * Many of the functions are not very robust. If something isn't working as expected, feel free to open an issue on GitHub or message the lab technician on Slack.
  * Zhenya thinks that all code that is run more than once should be moved to these two libraries.
* Old one. An assortment of `*.py` scripts that live in several GitHub repositories cloned to `Fas_Phyc-PEB-Lab/Seedlings/Scripts_and_Apps/Github/seedlings`. Many of the instructions in the Seedlings chapter will point you to specific scripts. Avoid using those scripts and use `blabpy` as much as possible. Do not rely on the clones in `Scripts_and_Apps` either: they often contain uncommitted changes that either should have been committed or should have been undone, and your guess is as good as mine as to which. So, if you do want to use one of those scripts:
  * Clone the repo to your computer.
  * Try running the script.
  * If it works - great!
  * If it doesn't, see if the clone in `Scripts_and_Apps` has changes that might be useful. If those changes *are* useful, commit them, push them, and update the clone in `Scripts_and_Apps`.
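
The check for uncommitted changes in the last step can be scripted. Below is a minimal sketch, assuming `git` is available on your `PATH`. The helper names `parse_porcelain` and `uncommitted_changes` are made up for this illustration and are not part of `blabpy`.

```python
import subprocess


def parse_porcelain(output):
    """Turn `git status --porcelain` output into (status, path) pairs.

    Each line has a two-character status code, a space, and then the path.
    Renamed files appear as "old -> new" in the path part.
    """
    changes = []
    for line in output.splitlines():
        if line:
            changes.append((line[:2].strip(), line[3:]))
    return changes


def uncommitted_changes(repo_dir):
    """List uncommitted changes (including untracked files) in a clone."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. repo_dir is not a repository
    )
    return parse_porcelain(result.stdout)
```

Calling `uncommitted_changes` on the path of a clone returns an empty list when the clone is clean. Anything else is worth a closer look with `git diff` before deciding whether to commit or discard it.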

### blabr

GitHub repo [here](https://github.com/BergelsonLab/blabr).

An R package. It is not on CRAN, so it has to be installed from GitHub with:

```
remotes::install_github('BergelsonLab/blabr')
```

Here are some functions and modules one might find useful:

* `get_blab_share_path` finds the location of the BLab share.
* module `get_data` has functions to download a specific version of a dataset from our data repositories, which currently live on GitHub (we might move them later).
  * `get_all_basiclevel` downloads a specific version of the `all_basiclevel` dataset.
* `big_aggregate` aggregates information in `all_basiclevel` (see `get_all_basiclevel`), outputting a big dataframe.
* `blabr:::make_new_global_basic_level` adds a column with the global (corpus-wide) basic level information.
* module `lena` contains functions to calculate annotation metrics for a set of intervals and then select intervals with the most `X`.
* module `seedlings` has functions to read sparse code csvs, which are csv versions of annotations with an extra `basic_level` column.

### blabpy

GitHub repo [here](https://github.com/BergelsonLab/blabpy).

A Python package. It is on PyPI and can be installed with:

```
pip install blabpy
```
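
To confirm that the installation worked, you can ask Python which version it sees. This is a minimal sketch that uses only the standard library. `installed_version` is a hypothetical helper name, not something `blabpy` provides.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("blabpy")
    print(version if version is not None else "blabpy is not installed")
```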

Here are some functions and modules one might find useful:

* Annotations-to-all\_basiclevel pipeline:
  * `blabpy.seedlings.cha` - functions to read, write, and extract information from `.cha` files (CLAN's CHAT annotation files for audio recordings).
  * `blabpy.seedlings.opf` - same, but for Datavyu's `.opf` files with video annotations.
  * `blabpy.seedlings.merge` - functions that combine new sparse code csvs with data from the existing ones.
  * `blabpy.seedlings.gather` - functions to assemble the `all_basiclevel` dataset.
  * `blabpy.seedlings.pipeline` - puts all of the above together so that it can be run on the whole corpus at once.
* `blabpy.seedlings.listened_time` - functions that help figure out how much of the recordings have been listened to and annotated.
* `blabpy.seedlings.paths` - functions that help locate certain types of files. Their signatures are not always consistent, so raising issues about that is highly appreciated.
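
Since there is no narrative documentation but most functions have docstrings, one way to get an overview of a module is to list its functions with the first line of each docstring. This is a minimal sketch that uses only the standard library. `summarize_module` is a hypothetical helper, not part of `blabpy`, and the example runs on the standard `json` module so that it works without `blabpy` installed.

```python
import importlib
import inspect


def summarize_module(module_name):
    """Map each public function defined in a module to the first line of its docstring."""
    module = importlib.import_module(module_name)
    summary = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions merely imported from elsewhere
        if name.startswith("_") or func.__module__ != module.__name__:
            continue
        doc = inspect.getdoc(func) or "(no docstring)"
        summary[name] = doc.splitlines()[0]
    return summary


if __name__ == "__main__":
    for name, first_line in summarize_module("json").items():
        print(f"{name}: {first_line}")
```

With `blabpy` installed, passing a module name from the list above, such as `"blabpy.seedlings.paths"`, should work the same way.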
