Proposal for Parallel Annotation of VIHI

Where annotation files can live.
GitHub repo/GitHub. VIHI_LENA repository on GitHub.
Individual folder. An individual annotator-recording folder, e.g., Fas-Phyc-PEB-Lab/VIHI/annotations-in-progress/LENA/George-Romero_AB_123_456. Inside that folder is a clone of VIHI_LENA with only the files for this one recording checked out, i.e., .../George-Romero_AB_123_456/AB/AB_123/AB_123_456/*.* are the only files in that folder. This is also the only folder that ELAN touches.
Individual branch. A branch checked out in the individual folder. Named something like George-Romero_AB_123_456.
LENA folder. A folder on the BLab share at Fas-Phyc-PEB-Lab/VIHI/SubjectFiles/LENA to which the GitHub repo is cloned with the main branch checked out. There are three distinct ways that data are stored here:
Working Tree. That’s all the tracked files inside the LENA folder. There should never be any files that are in the modified, deleted, etc. state, i.e., git status should always say "working tree clean, nothing to see here". The only thing that should touch that folder is pull --ff-only. Other than that, this should be considered a read-only folder. I haven’t yet come up with a way to enforce this while allowing pulling at the same time. I’ll think of something.
Ignored files. Files in the LENA folder but not tracked by git - they are either large files (wav/its) or files we won’t miss if something happens to them. Ideally, the important files wouldn’t be here at all and would be stored separately and linked here. In any case, I’ll change them to read-only to avoid losing them.
BLab share repo. The Fas-Phyc-PEB-Lab/VIHI/SubjectFiles/LENA/.git folder. That’s where copies of the branches in annotations-in-progress are pushed to after every commit. One staff member (by default, Zhenya) regularly pushes these branches to GitHub.
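The naming scheme described above (a folder and branch called George-Romero_AB_123_456, with the recording's files under AB/AB_123/AB_123_456) could be sketched roughly as follows. This is only an illustration, not blabpy's actual code: the function names are hypothetical, and the ID pattern (two uppercase letters followed by two three-digit groups) is an assumption inferred from the single AB_123_456 example.

```python
import re
from pathlib import PurePosixPath

# Assumed ID shape, inferred only from the example "AB_123_456".
# The group names are guesses made for this sketch.
RECORDING_ID = re.compile(r"^(?P<prefix>[A-Z]{2})_(?P<first>\d{3})_(?P<second>\d{3})$")


def individual_name(first_name: str, last_name: str, recording_id: str) -> str:
    """Build the name shared by the individual folder and the individual branch."""
    if RECORDING_ID.match(recording_id) is None:
        raise ValueError(f"Unexpected recording ID: {recording_id!r}")
    return f"{first_name}-{last_name}_{recording_id}"


def recording_subpath(recording_id: str) -> PurePosixPath:
    """Path of the recording's files relative to the root of the VIHI_LENA clone."""
    match = RECORDING_ID.match(recording_id)
    if match is None:
        raise ValueError(f"Unexpected recording ID: {recording_id!r}")
    prefix = match["prefix"]
    # "AB_123_456" lives in AB/AB_123/AB_123_456
    return PurePosixPath(prefix, f"{prefix}_{match['first']}", recording_id)
```

Keeping both names in one place would make it harder for the folder name and the branch name to drift apart.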
Ensuring no data is lost
Ideally, we should achieve a state where we can delete the LENA folder and the annotations-in-progress folder at any given moment.
The working tree is just a mirror of the GitHub repo - recoverable.
Objects on branches saved in the BLab share repo are pushed to GitHub - recoverable too.
The ignored but important files (.wav, .its, etc.) are not even in the folders.
We don't care about other ignored files.
🤦🏻 And I’ve just realized that I am sort of re-inventing the child-project system.
Operations that change states
Annotating. Changes the .eaf and the .pfsx files
Saving locally. In the individual folder,
git-add all changes (we are not affecting the main branch so it is OK in this situation),
git-commit them,
git-push them to the BLab share repo.
Pushing to GitHub. Push branches from the BLab share repo to GitHub. Not the main branch though, which we only ever pull --ff-only to.
Rebasing. Replaying commits on a super-checked branch onto the main branch on GitHub.
Updating BLab share repo. Run pull --ff-only in the LENA folder.
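The "Saving locally" and "Updating BLab share repo" operations map onto short git command sequences. Below is a minimal sketch that only builds the argument lists, so the sequence can be inspected before anything is run. The function names are hypothetical, and it assumes that the remote called origin in the individual folder points at the BLab share repo.

```python
from typing import List


def save_locally_commands(branch: str, message: str, remote: str = "origin") -> List[List[str]]:
    """Commands for "Saving locally", to be run inside the individual folder."""
    return [
        ["git", "add", "--all"],          # stage every change in the individual folder
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],  # assumed: `remote` is the BLab share repo
    ]


def update_lena_folder_commands() -> List[List[str]]:
    """Commands for "Updating BLab share repo", to be run inside the LENA folder."""
    return [
        ["git", "status", "--porcelain"],  # should print nothing before we pull
        ["git", "pull", "--ff-only"],
    ]


def is_clean(porcelain_output: str) -> bool:
    """True if `git status --porcelain` printed nothing.

    The porcelain format prints one line per modified, deleted, or untracked
    file and skips ignored files, so empty output means a clean working tree.
    """
    return porcelain_output.strip() == ""
```

Each inner list could be passed to subprocess.run(command, cwd=folder, check=True), stopping at the first failure.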
Annotation process
(The level of detail decreases as we move down the list.)
Annotation
A recording is assigned to an annotator.
Annotator tells blabpy about that.
An individual folder is created and opens in Finder/Explorer.
Annotator opens ELAN and annotates as annotators do.
(optional) They tell blabpy that they are finished for the day and blabpy saves the annotations locally.
They tell blabpy that they are finished with the recording and blabpy saves the annotations locally and notifies Lilli.
Super-checking
Lilli tells blabpy that she wants to super-check.
An individual folder is created, Lilli makes edits, optionally finishes for the day, finishes fully, tells blabpy about that, blabpy saves changes locally.
Incorporating changes into GitHub and the local repos. blabpy:
rebases the branch in the BLab share repo onto the main branch,
merges the branch into the main branch without affecting the working tree,
pushes the merged branch to GitHub.
Updates the BLab share repo.
(if blabpy fails) Zhenya gets a notification, tells blabpy that he needs to work on that one recording annotated by the annotator and then finishes the steps in the previous list item.
(if Zhenya fails at resolving conflicts) Zhenya asks Lilli to resolve the conflicts that require thinking, not coding. Lilli tells blabpy, resolves conflicts, tells blabpy about that, it does the rest.
UX
An annotator starting a new recording:
Opens Terminal.
$ vihi annotation start XX_NNN_MMM
> Hi, I am your VIHI annotation assistant.
> My name is HAL 2023. What is your name?
(First Last): <types-John-space-Doe>
If the name hasn’t been used yet:
> Hi, John Doe!
>
> It looks like you are not on my annotators' list. Have you worked with me before? Select from the options below.
> 1. It is your first time working with me, and I need to add you to the list.
> 2. You misspelled your name and want to type it again.
> 3. You have worked with me before but possibly used a different version of your name, like Margaret instead of Peggie. You would like to see the list of annotators to see if you are on it.
> 999. You want to continue some other time.
Select a number: <x>

# 1
> Nice to meet you, John Doe! I am looking forward to working with you.
> Just one more thing: what is the email address that I can use to sign your work and write to you?
email address: <[email protected]>

# 2
(First Last): <types-John-space-Doe>

# 3
Here is the list of annotators. If you find yourself on it, type the corresponding number. Otherwise, use one of the options below the list.
> 1. Jane Doe
> 2. Snow White
> 3. Jack Doe
>
> 777. It is your first time working with me, and I need to add you to the list.
> 888. You misspelled your name and want to type it again.
> 999. You want to continue some other time.

# 1-3 -> Hi, Snow White. GOTO next step.
# 777. -> GOTO
Prompt changes to:
(John Doe working on XX_NNN_MMM) $
Finder/Explorer opens on the folder with the EAF file
Opens EAF in ELAN, annotates, saves, and closes ELAN.
If not yet finished with the recording:
Saves an in-progress version as a commit in (I forgot the alternative we came up with yesterday).
(John Doe working on XX_NNN_MMM) $ vihi annotation pause
> Describe where you finished: <I finished annotating coding segment 7>
> Saving and backing up.
> Done!
> When you get back to annotating this recording, run
> "vihi annotation start XX_NNN_MMM"
> again.
> See you next time!
$
Next time, GOTO 1.
If done with the recording:
Saves the finished version.
(John Doe working on XX_NNN_MMM) $ vihi annotation finish
> Great job! Thank you, John Doe
> Saving and backing up.
> Done!
> Lilli is gonna get a notification that she can super-check XX_NNN_MMM.
> Slack her anyway, just in case.
$
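Two small pieces of the annotator flow above lend themselves to a sketch: looking the typed name up in the annotators' list, and building the prompt that appears once a recording is started. The function names are hypothetical, and matching names while ignoring case and extra spaces is an assumption of this sketch, not something the proposal specifies.

```python
from typing import List, Optional


def _normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore case when comparing names."""
    return " ".join(name.split()).casefold()


def find_annotator(typed_name: str, annotators: List[str]) -> Optional[str]:
    """Return the stored spelling of the name, or None if it is not on the list."""
    wanted = _normalize(typed_name)
    for known in annotators:
        if _normalize(known) == wanted:
            return known
    return None


def shell_prompt(annotator: str, recording_id: str) -> str:
    """Prompt shown while a recording is in progress."""
    return f"({annotator} working on {recording_id}) $ "
```

A None result would lead to the "not on my annotators' list" menu, while a match would go straight to the changed prompt.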
Zhenya
Receives a notification about a version conflict.
vihi annotation resolve-conflicts --manual XX_NNN_MMM John Doe
Resolves conflicts if it is a technical thing and finishes.
Lilli
Super-checking.
vihi annotation super-check start XX_NNN_MMM
A Finder window opens with the folder that has the EAF.
Lilli does the super-checking.
If super-checking isn’t complete:
(Lilli working on XX_NNN_MMM) $ vihi annotation super-check pause
> Describe where you finished: <I finished super-checking on annotation XYZ>
> Saving, backing up, and pushing files.
> Done!
> When you get back to annotating this recording, run
> "annotation super-check start"
> again.
> See you next time!
$
GOTO 1
If super-checking is complete:
(Lilli working on XX_NNN_MMM) $ vihi annotation finish
> Saving, backing up, and pushing files.
> Done!
> Replaying changes on top of the current main branch.
> Done!
> Pushing to GitHub.
> Done!
> Updating the BLab share repo.
> Done!
If “Replaying changes…” or “Checking if…” reported any conflicts, go to “Zhenya” → “If there are conflicts…”
Conflict resolution
vihi annotation resolve-conflicts --elan XX_NNN_MMM John Doe
A Finder window opens with the folder that has the EAF.
Opens EAF in ELAN.
Resolves conflicts by editing conflicting annotations that are easy to find because the script did something helpful (no idea what that is yet :-).
vihi annotation conflicts-resolved
blabpy does the rest.
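The commands used throughout the UX section form a small tree under vihi annotation. One possible way to lay that tree out with Python's standard argparse module is sketched below. It is an illustration rather than the actual vihi entry point: the dest names (tool, action, step) are hypothetical, and treating --manual and --elan as mutually exclusive and required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vihi")
    tools = parser.add_subparsers(dest="tool", required=True)

    annotation = tools.add_parser("annotation")
    actions = annotation.add_subparsers(dest="action", required=True)

    # vihi annotation start XX_NNN_MMM / pause / finish
    start = actions.add_parser("start")
    start.add_argument("recording_id")
    actions.add_parser("pause")
    actions.add_parser("finish")

    # vihi annotation super-check start XX_NNN_MMM / pause
    super_check = actions.add_parser("super-check")
    steps = super_check.add_subparsers(dest="step", required=True)
    steps.add_parser("start").add_argument("recording_id")
    steps.add_parser("pause")

    # vihi annotation resolve-conflicts --manual|--elan XX_NNN_MMM John Doe
    resolve = actions.add_parser("resolve-conflicts")
    mode = resolve.add_mutually_exclusive_group(required=True)  # assumption
    mode.add_argument("--manual", action="store_true")
    mode.add_argument("--elan", action="store_true")
    resolve.add_argument("recording_id")
    resolve.add_argument("annotator", nargs="+")  # "John Doe" arrives as two words

    # vihi annotation conflicts-resolved
    actions.add_parser("conflicts-resolved")
    return parser
```

With this layout, a misspelled subcommand such as resolove-conflicts is rejected by argparse with a usage message instead of being silently ignored.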