Parallel Annotation: Merging

This should be done by the lab technician once the lab member responsible for a given type of annotation asks them to.

Below, I use $BLAB_SHARE to refer to the path to BLab share on your computer. At the time of writing, it is most likely /Volumes/Fas-Phyc-PEB-Lab on Macs; on Windows in git-bash, it is the letter of the mapped drive, e.g., /x.

Let $LENA_PATH refer to $BLAB_SHARE/VIHI/SubjectFiles/LENA.

Let's say we are merging XX_123_456 which was annotated by Jane Doe. The annotation should live under

$IN_PROGRESS_PATH=$LENA_PATH/annotations-in-progress/XX_123_456_Jane-Doe
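
To show how these paths fit together, here is a minimal Python sketch. The helper name in_progress_path and its parameters are hypothetical (they are not part of blabpy), and the rule that spaces in the annotator's name become hyphens is an assumption based on the Jane Doe example above.

```python
from pathlib import PurePosixPath


def in_progress_path(blab_share, recording_id, annotator):
    """Build the expected in-progress folder for one annotator's copy of a recording.

    Hypothetical helper for illustration only; the layout follows the
    definitions of $LENA_PATH and $IN_PROGRESS_PATH given above.
    """
    lena_path = PurePosixPath(blab_share) / "VIHI" / "SubjectFiles" / "LENA"
    # Assumption: the folder is named <recording id>_<annotator name, spaces -> hyphens>
    folder_name = f"{recording_id}_{annotator.replace(' ', '-')}"
    return lena_path / "annotations-in-progress" / folder_name


print(in_progress_path("/Volumes/Fas-Phyc-PEB-Lab", "XX_123_456", "Jane Doe"))
```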

Check the recording is ready to merge

First, make sure that all the changes have been committed and pushed to BLab share:

git -C $IN_PROGRESS_PATH status

If it doesn't say

Your branch is up to date with 'blab_share/annotating/XX_123_456_Jane-Doe'.

and

nothing to commit, working tree clean

ask the lab member who asked you to merge the recording to deal with that first.
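
If you check many folders, the same test can be scripted. The following is a rough sketch, not an existing lab tool: the function names are made up for illustration, and it assumes git prints its messages in English.

```python
import subprocess

# The two messages that `git status` should contain, as described above
UP_TO_DATE = "Your branch is up to date with"
CLEAN = "nothing to commit, working tree clean"


def looks_ready_to_merge(status_output):
    """Return True if the `git status` text contains both expected messages."""
    return UP_TO_DATE in status_output and CLEAN in status_output


def git_status(folder):
    """Run `git -C <folder> status` and return its text output."""
    result = subprocess.run(
        ["git", "-C", str(folder), "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling looks_ready_to_merge(git_status(path)) on the in-progress folder would then answer the question above. If git_status raises an error, that may be the permission problem described below.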

Sometimes they will have trouble pushing to BLab share, and you might need to push for them. In other cases, you won't be able to run any commands in their folder due to permission issues. If so, ask them to double-check that everything has been committed and pushed and, after you've successfully merged their branch, to delete the folder.

If everything has been pushed and committed, move on to merging.

Merge

If you are merging just a couple of recordings, you can work directly in $LENA_PATH/annotations. Otherwise, use a local clone; it will be a lot faster.

Merging in $LENA_PATH/annotations

First, cd into $LENA_PATH/annotations

Basic git-based merge:

git merge -m "merge: XX_123_456_Jane-Doe" annotating/XX_123_456_Jane-Doe

In case of git conflicts

There is a CLI utility in blabpy that helps you merge the files:

eaf merge XX/XX_123/XX_123_456/XX_123_456.eaf

If eaf merge fails because of conflicts or other issues it can't resolve, it saves three new files next to the original: XX_123_456.eaf.<suffix>, where <suffix> is one of OURS, THEIRS, BASE. Edit these files as necessary.
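
Before editing, it can help to confirm that all three files are actually there. This is a small sketch under one assumption: the files are named by appending the suffix to the full EAF path, which is how the Python snippets in this section build them. The function names are hypothetical.

```python
from pathlib import Path

SIDES = ("OURS", "THEIRS", "BASE")


def side_paths(eaf_path):
    """Map each side name to the path obtained by appending .<SIDE> to the EAF path."""
    return {side: Path(f"{eaf_path}.{side}") for side in SIDES}


def missing_sides(eaf_path):
    """List the sides whose temporary file does not exist on disk."""
    return [side for side, path in side_paths(eaf_path).items() if not path.exists()]
```

For example, missing_sides('XX/XX_123/XX_123_456/XX_123_456.eaf') should return an empty list if eaf merge saved all three files.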

Below are a few examples of potential issues and what was done to fix those:

Controlled vocabulary codes were updated on the main branch but not on the annotating/XX_123_456_Jane-Doe branch.
from blabpy.eaf.eaf_tree import EafTree
eaf_path_original = 'XX/XX_123/XX_123_456/XX_123_456.eaf'
# Only THEIRS and BASE still carry the outdated codes
for side in ('THEIRS', 'BASE'):
    eaf_path = f'{eaf_path_original}.{side}'
    # Skip CV validation on load: these files still contain the outdated codes
    eaf_tree = EafTree.from_eaf(eaf_path, validate_cv_entries=False)
    for tier in eaf_tree.tiers.values():
        if tier.linguistic_type.id == 'CDS':
            for ann_id, annotation in tier.annotations.items():
                # Replace each outdated code with its updated counterpart
                if annotation.value == 'U':
                    annotation.value = 'X'
                elif annotation.value == 'B':
                    annotation.value = 'M'
    eaf_tree.to_eaf(eaf_path)
    # Reload with default settings to test that the cv entries match now
    eaf_tree = EafTree.from_eaf(eaf_path)

A rogue tier (SYL in this example) conflicted with other tiers but didn't need to exist at all.
from blabpy.eaf.eaf_tree import EafTree
eaf_path_original = 'XX/XX_123/XX_123_456/XX_123_456.eaf'
# The tier has to go from all three sides
for side in ('THEIRS', 'BASE', 'OURS'):
    eaf_path = f'{eaf_path_original}.{side}'
    eaf_tree = EafTree.from_eaf(eaf_path)
    if 'SYL' in eaf_tree.tiers:
        syl_tier = eaf_tree.tiers['SYL']
        # Empty the tier first, then remove the tier itself
        syl_tier.drop_all_annotations()
        eaf_tree.drop_tier('SYL')
        eaf_tree.to_eaf(eaf_path)

Once the three sides are compatible, run

eaf merge XX/XX_123/XX_123_456/XX_123_456.eaf --use-temps

And, once the above runs successfully,

git add XX/XX_123/XX_123_456/XX_123_456.eaf
git commit

Merging using a local clone

Push all the branches to GitHub:

# from $LENA_PATH/annotations
git push origin 'refs/heads/annotating/*:refs/heads/annotating/*'

Switch to the local folder you want the clone to live in, then clone the repository and fetch all branches:

gh repo clone bergelsonlab/VIHI_LENA
cd VIHI_LENA
git fetch --all

Continue with the merging process as described in Merging in $LENA_PATH/annotations with the following change: whereas before you had the branch named like this:

annotating/XX_123_456_Jane-Doe

It will now be

origin/annotating/XX_123_456_Jane-Doe

The former is a local branch in $LENA_PATH/annotations and the latter is a remote-tracking branch in your local clone, hence the added origin/ at the beginning.
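
The difference between the two setups comes down to the branch reference passed to git merge. A minimal sketch that builds the command for either case (merge_command is a hypothetical helper, not part of blabpy):

```python
def merge_command(annotation_name, local_clone=False):
    """Build the `git merge` command for one annotation branch.

    annotation_name is the <recording>_<annotator> part, e.g. XX_123_456_Jane-Doe.
    In a local clone, the branch is only available as a remote-tracking
    branch, so it gets the origin/ prefix.
    """
    branch = f"annotating/{annotation_name}"
    if local_clone:
        branch = f"origin/{branch}"
    return ["git", "merge", "-m", f"merge: {annotation_name}", branch]


print(" ".join(merge_command("XX_123_456_Jane-Doe", local_clone=True)))
```

The printed line drops the quotes around the commit message; the list form can be passed to subprocess.run as is.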

Once done, run git push from the local clone and git pull in $LENA_PATH/annotations.

Cleanup

Once the actual merging is done, go to $LENA_PATH/annotations and then

  1. git branch -d annotating/XX_123_456_Jane-Doe
  2. Delete the folder at $IN_PROGRESS_PATH.

If you were using a local clone, you'll additionally need to run

git push origin --delete annotating/XX_123_456_Jane-Doe

either from your local clone or from $LENA_PATH/annotations.
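
As a summary of the git side of the cleanup, here is a sketch that lists the commands to run for one merged annotation. The helper cleanup_commands is hypothetical; deleting the folder at $IN_PROGRESS_PATH is deliberately left as a manual step.

```python
def cleanup_commands(annotation_name, used_local_clone=False):
    """Return the git commands for cleaning up after a merge, in order.

    The first command is meant to be run in $LENA_PATH/annotations. The
    second is only needed if the branches were pushed to GitHub for
    merging in a local clone.
    """
    branch = f"annotating/{annotation_name}"
    commands = [["git", "branch", "-d", branch]]
    if used_local_clone:
        commands.append(["git", "push", "origin", "--delete", branch])
    return commands


for command in cleanup_commands("XX_123_456_Jane-Doe", used_local_clone=True):
    print(" ".join(command))
```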

Status check

This check should be done weekly.

There are principally two checks we need to run. We have scripts in the blabsh repo that can help with that.

  • There is no unsaved work in the child folders of annotations-in-progress. Use the script par-ann_check-in-progress-folders.sh.

  • There are no loose (merged but not deleted) branches in annotations (the main BLab share clone). Use the script par-ann_check-branches.

A few caveats:

  • It is OK to have unsaved work in a folder that's been modified today - this is probably someone actively working on it.

  • par-ann_check-branches distinguishes two cases when a branch appears merged into main:

    • After the branching, no new commits were added to the branch. This is OK if
