Proposal for Parallel Annotation of VIHI

Flowchart of the Parallel Annotation Process Proposal

Where annotation files can live.

  1. GitHub repo/GitHub. VIHI_LENA repository on GitHub.

  2. Individual folder. An individual annotator-recording folder, e.g., Fas-Phyc-PEB-Lab/VIHI/annotations-in-progress/LENA/George-Romero_AB_123_456. Inside that folder is a clone of VIHI_LENA with only the files for this one recording checked out, i.e., .../George-Romero_AB_123_456/AB/AB_123/AB_123_456/*.* are the only files in that folder. This is also the only folder that ELAN touches.

  3. Individual branch. A branch checked out in the individual folder. Named something like George-Romero_AB_123_456

  4. LENA folder. A folder on the BLab share at Fas-Phyc-PEB-Lab/VIHI/SubjectFiles/LENA to which the GitHub repo is cloned with the main branch checked out. There are three distinct ways that data are stored here:

    1. Working Tree. That’s all the tracked files inside the LENA folder. There should never be any files in the modified, deleted, etc. state, i.e., git status should always say working tree clean, nothing to see here. The only thing that should touch that folder is pull --ff-only. Other than that, this should be considered a read-only folder. I haven’t yet come up with a way to enforce this while allowing pulling at the same time. I’ll think of something.

    2. Ignored files. Files in the LENA folder but not tracked by git - they are either large files (wav/its) or files we won’t miss if something happens to them. Ideally, the important files wouldn’t be here at all and would be stored separately and linked here. In any case, I’ll change them to read-only to avoid losing them.

    3. BLab share repo. Fas-Phyc-PEB-Lab/VIHI/SubjectFiles/LENA/.git folder. That’s where copies of the branches in annotations-in-progress are pushed to after every commit. One staff member (by default, Zhenya) regularly pushes these branches to GitHub.
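
The individual folder and branch (items 2 and 3) could be created roughly like this. It is a minimal sketch, not what blabpy does: the function names and the IN_PROGRESS constant are made up for illustration, and it assumes recording IDs always have three underscore-separated parts, as in AB_123_456. The git subcommands themselves (clone --no-checkout, sparse-checkout set, checkout -b) are standard.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical root for individual folders; the real share path may differ.
IN_PROGRESS = PurePosixPath("Fas-Phyc-PEB-Lab/VIHI/annotations-in-progress/LENA")


def recording_subdir(recording_id: str) -> str:
    """AB_123_456 -> AB/AB_123/AB_123_456 (layout taken from the example path)."""
    prefix, subject_number, _ = recording_id.split("_")
    return f"{prefix}/{prefix}_{subject_number}/{recording_id}"


def workspace_name(annotator: str, recording_id: str) -> str:
    """'George Romero', 'AB_123_456' -> 'George-Romero_AB_123_456'."""
    return f"{'-'.join(annotator.split())}_{recording_id}"


def setup_commands(repo_url: str, annotator: str, recording_id: str) -> List[List[str]]:
    """Git commands that would create the individual folder and its branch."""
    name = workspace_name(annotator, recording_id)
    folder = str(IN_PROGRESS / name)
    return [
        ["git", "clone", "--no-checkout", repo_url, folder],
        # Restrict the working tree to this one recording. In cone mode (the
        # default in recent git) files sitting directly in the parent folders
        # are checked out as well.
        ["git", "-C", folder, "sparse-checkout", "set", recording_subdir(recording_id)],
        ["git", "-C", folder, "checkout", "main"],
        # The individual branch is named after the folder.
        ["git", "-C", folder, "checkout", "-b", name],
    ]
```

The functions only build the command lists, so they can be inspected or logged before anything is run with subprocess.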

Ensuring no data is lost

Ideally, we should achieve a state where we can delete the LENA folder and the annotations-in-progress folder at any given moment.

  • The working tree is just a mirror of the GitHub repo - recoverable.

  • Objects on branches saved in the BLab share repo are pushed to GitHub - recoverable too.

  • The ignored but important files (.wav, .its, etc.) are not even in the folders.

  • We don't care about other ignored files.
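
Before deleting anything, the second bullet could be checked by comparing branch tips in the BLab share repo with their remote-tracking counterparts. The sketch below is an illustration under assumptions: the GitHub remote is assumed to be called origin, the helper names are hypothetical, and the input is the text printed by git for-each-ref --format='%(refname) %(objectname)'. Remote-tracking refs only reflect the last fetch or push, so the answer is as fresh as those are.

```python
from typing import Dict, List


def parse_refs(output: str, prefix: str) -> Dict[str, str]:
    """Map branch name -> commit hash for refs that start with `prefix`."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        refname, sha = line.split()
        if refname.startswith(prefix):
            refs[refname[len(prefix):]] = sha
    return refs


def unpushed_branches(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Branches whose tip is missing on the remote or differs from it.

    main is skipped because it is only ever updated by pull --ff-only.
    """
    return sorted(
        branch
        for branch, sha in local.items()
        if branch != "main" and remote.get(branch) != sha
    )
```

An empty result would mean every annotation branch in the share repo has the same tip on GitHub.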

🤦🏻 And I’ve just realized that I am sort of re-inventing the child-project system.

Operations that change states

  • Annotating. Changes the .eaf and the .pfsx files

  • Saving locally. In the individual folder,

    • git-add all changes (we are not affecting the main branch so it is OK in this situation),

    • git-commit them,

    • git-push them to the BLab share repo.

  • Pushing to GitHub. Push branches from the BLab share repo to GitHub. Not the main branch though, which we only ever pull --ff-only to.

  • Rebasing. Replaying commits on a super-checked branch onto the main branch on GitHub.

  • Updating BLab share repo. Run pull --ff-only in the LENA folder.
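
The "Saving locally" and "Updating BLab share repo" operations map onto a handful of git calls. A minimal sketch, assuming the individual folder knows the BLab share repo under a remote named blab-share (a made-up name) and using hypothetical function names:

```python
import subprocess
from typing import List


def save_locally_commands(branch: str, message: str, remote: str = "blab-share") -> List[List[str]]:
    """Commands to run inside the individual folder, in order."""
    return [
        ["git", "add", "--all"],
        # Exits with a non-zero status if there is nothing to commit.
        ["git", "commit", "--message", message],
        ["git", "push", remote, branch],
    ]


def is_clean(porcelain_output: str) -> bool:
    """True if `git status --porcelain` printed nothing.

    Ignored files are not listed by that command, so they do not count.
    """
    return porcelain_output.strip() == ""


def update_lena_folder(lena_folder: str) -> None:
    """Fast-forward the LENA folder, refusing to run on a dirty working tree."""
    status = subprocess.run(
        ["git", "-C", lena_folder, "status", "--porcelain"],
        check=True, capture_output=True, text=True,
    ).stdout
    if not is_clean(status):
        raise RuntimeError("LENA working tree is not clean; refusing to pull")
    subprocess.run(["git", "-C", lena_folder, "pull", "--ff-only"], check=True)
```

Checking for a clean tree before pulling does not make the folder read-only, but it would at least surface accidental edits early.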

Annotation process

(The level of detail decreases as we move down the list.)

  1. Annotation

    1. A recording is assigned to an annotator.

    2. Annotator tells blabpy about that.

    3. An individual folder is created and opens in Finder/Explorer.

    4. Annotator opens ELAN and annotates as annotators do.

    5. (optional) They tell blabpy that they are finished for the day and blabpy saves the annotations locally.

    6. They tell blabpy that they are finished with the recording and blabpy saves the annotations locally and notifies Lilli.

  2. Super-checking

    1. Lilli tells blabpy that she wants to super-check.

    2. An individual folder is created, Lilli makes edits, optionally finishes for the day, finishes fully, tells blabpy about that, blabpy saves changes locally.

  3. Incorporating changes into the GitHub and local repos. blabpy:

    1. rebases the branch in the BLab share repo onto the main branch,

    2. merges the branch into the main branch without affecting the working tree,

    3. pushes the merged branch to GitHub.

    4. Updates the BLab share repo.

    5. (if blabpy fails) Zhenya gets a notification, tells blabpy that he needs to work on that one recording annotated by the annotator and then finishes the steps in the previous list item.

    6. (if Zhenya fails at resolving conflicts) Zhenya asks Lilli to resolve the conflicts that require thinking, not coding. Lilli tells blabpy, resolves conflicts, tells blabpy about that, it does the rest.
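
One possible shape for step 3, sketched under several assumptions: the GitHub remote is called origin, the rebase happens in a scratch git worktree so that the LENA working tree is never touched, and the merge and push steps are collapsed into a single fast-forward push of the rebased branch to main on GitHub. The function name and the scratch directory are hypothetical; this is not a description of what blabpy does.

```python
from typing import List


def incorporate_commands(lena_folder: str, branch: str, scratch_dir: str,
                         remote: str = "origin") -> List[List[str]]:
    """Commands that would fold a super-checked branch into main."""
    return [
        # Make sure the local main matches GitHub before replaying onto it.
        ["git", "-C", lena_folder, "pull", "--ff-only"],
        # Check the branch out in a separate worktree, away from the LENA folder.
        ["git", "-C", lena_folder, "worktree", "add", scratch_dir, branch],
        # Stops with a non-zero exit status on conflicts (the "blabpy fails" case).
        ["git", "-C", scratch_dir, "rebase", "main"],
        # The rebased branch descends from main, so this is a fast-forward;
        # git rejects the push if main on GitHub has moved in the meantime.
        ["git", "-C", scratch_dir, "push", remote, f"{branch}:main"],
        ["git", "-C", lena_folder, "worktree", "remove", scratch_dir],
        # Update the BLab share repo and its working tree.
        ["git", "-C", lena_folder, "pull", "--ff-only"],
    ]
```

Any command that fails would be the point at which Zhenya gets the notification.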

UX

An annotator starting a new recording:

  1. Opens Terminal.

    1. $ vihi annotation start XX_NNN_MMM

    2. The assistant asks for their name:

      > Hi, I am your VIHI annotation assistant.
      > My name is HAL 2023. What is your name?
      (First Last): <types-John-space-Doe>
    3. If the name hasn’t been used yet:

      > Hi, John Doe!
      >
      > It looks like you are not on my annotators' list. Have you worked with me before? Select from the options below.
      > 1. It is your first time working with me, and I need to add you to the list.
      > 2. You misspelled your name and want to type it again.
      > 3. You have worked with me before but possibly used a different version of your name, like Margaret instead of Peggie. You would like to see the list of annotators to see if you are on it.
      > 999. You want to continue some other time.
      Select a number: <x>
      
      # 1
      > Nice to meet you, John Doe! I am looking forward to working with you.
      > Just one more thing: what is the email address that I can use to sign your work and write to you?
      email address: <[email protected]>
      
      # 2
      (First Last): <types-John-space-Doe>
      
      # 3
      > Here is the list of annotators. If you find yourself on it, type the corresponding number. Otherwise, use one of the options below the list.
      > 1. Jane Doe
      > 2. Snow White
      > 3. Jack Doe
      >
      > 777. It is your first time working with me, and I need to add you to the list.
      > 888. You misspelled your name and want to type it again.
      > 999. You want to continue some other time. 
      
      # 1-3 -> Hi, Snow White. GOTO next step.
      # 777. -> GOTO
    4. Prompt changes to:

      (John Doe working on XX_NNN_MMM)
      $ 
    5. Finder/Explorer opens on the folder with the EAF file

  2. Opens EAF in ELAN, annotates, saves, and closes ELAN.

  3. If not yet finished with the recording:

    1. Saves an in-progress version as a commit in (I forgot the alternative) we came up with yesterday

      (John Doe working on XX_NNN_MMM)
      $ vihi annotation pause
      > Describe where you finished:
      <I finished annotating coding segment 7>
      > Saving and backing up.
      > Done!
      > When you get back to annotating this recording, run
      > "vihi annotation start XX_NNN_MMM"
      > again.
      > See you next time!
      $
    2. Next time, GOTO 1.

  4. If done with the recording:

    1. Saves the finished version.

      (John Doe working on XX_NNN_MMM)
      $ vihi annotation finish
      > Great job! Thank you, John Doe
      > Saving and backing up.
      > Done!
      > Lilli is gonna get a notification that she can super-check XX_NNN_MMM.
      > Slack her anyway, just in case.
      $
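
The name check in step 1.3 and the prompt in step 1.4 could be backed by something as small as the sketch below. The names are hypothetical, and matching names case-insensitively with collapsed whitespace is my assumption, not something the dialogue above specifies.

```python
from typing import List, Optional

# Options shown when the typed name is not on the annotators' list.
NOT_ON_LIST_MENU = {
    1: "add",        # first time working with the assistant
    2: "retype",     # misspelled name, type it again
    3: "show_list",  # possibly listed under a different version of the name
    999: "quit",     # continue some other time
}


def find_annotator(typed_name: str, annotators: List[str]) -> Optional[str]:
    """Return the listed spelling of the name, or None if it is not on the list."""
    wanted = " ".join(typed_name.split()).casefold()
    for annotator in annotators:
        if annotator.casefold() == wanted:
            return annotator
    return None


def prompt_label(annotator: str, recording_id: str) -> str:
    """Text shown above the shell prompt while a recording is in progress."""
    return f"({annotator} working on {recording_id})"
```

If find_annotator returns None, the assistant would show the menu and act on the selected key.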

Zhenya

  • Receives a notification about a version conflict.

  • vihi annotation resolve-conflicts --manual XX_NNN_MMM John Doe

  • Resolves conflicts if it is a technical thing and finishes

Lilli

Super-checking.

  • vihi annotation super-check start XX_NNN_MMM

  • A Finder window opens with the folder that has the EAF.

  • Lilli does the super-checking.

  • If super-checking isn’t complete:

    (Lilli working on XX_NNN_MMM)
    $ vihi annotation super-check pause
    > Describe where you finished:
    <I finished super-checking on annotation XYZ>
    > Saving, backing up, and pushing files.
    > Done!
    > When you get back to super-checking this recording, run
    > "vihi annotation super-check start XX_NNN_MMM"
    > again.
    > See you next time!
    $

    GOTO 1

  • If super-checking is complete:

    (Lilli working on XX_NNN_MMM)
    $ vihi annotation finish
    > Saving, backing up, and pushing files.
    > Done!
    > Replaying changes on top of the current main branch.
    > Done!
    > Pushing to GitHub.
    > Done!
    > Updating the BLab share repo.
    > Done!
  • If “Replaying changes…” or “Checking if…” reported any conflicts, go to “Zhenya” → “If there are conflicts…”

Conflict resolution

  • vihi annotation resolve-conflicts --elan XX_NNN_MMM John Doe

  • A Finder window opens with the folder that has the EAF.

  • Opens EAF in ELAN.

  • Resolves conflicts by editing conflicting annotations that are easy to find because the script did something helpful (no idea what that is yet :-).

  • vihi annotation conflicts-resolved

  • blabpy does the rest.
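
The commands used throughout the UX section form a small tree that Python's argparse can express directly. This is only a sketch of the command-line surface as written in this proposal; build_parser and the attribute names are hypothetical, and nothing here is wired to git or ELAN.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="vihi")
    areas = parser.add_subparsers(dest="area", required=True)
    annotation = areas.add_parser("annotation")
    actions = annotation.add_subparsers(dest="action", required=True)

    # Annotator commands.
    start = actions.add_parser("start")
    start.add_argument("recording_id")
    actions.add_parser("pause")
    actions.add_parser("finish")

    # Super-checking commands.
    super_check = actions.add_parser("super-check")
    sc_actions = super_check.add_subparsers(dest="super_check_action", required=True)
    sc_start = sc_actions.add_parser("start")
    sc_start.add_argument("recording_id")
    sc_actions.add_parser("pause")

    # Conflict resolution: exactly one of --manual / --elan.
    resolve = actions.add_parser("resolve-conflicts")
    mode = resolve.add_mutually_exclusive_group(required=True)
    mode.add_argument("--manual", action="store_true")
    mode.add_argument("--elan", action="store_true")
    resolve.add_argument("recording_id")
    # The annotator's name is typed as separate words, e.g. John Doe.
    resolve.add_argument("annotator", nargs="+")
    actions.add_parser("conflicts-resolved")
    return parser
```

For example, build_parser().parse_args(["annotation", "start", "XX_NNN_MMM"]) yields a namespace whose action is "start" and whose recording_id is "XX_NNN_MMM".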
