Marking PI in VIHI files

If you come across information in a file that is sensitive or could reveal someone's identity, mark it so that it can be scrubbed later.

Things you should mark:

Identifying information

  • Full names, excluding celebrities.

    • E.g. “I saw Joe Smith from high school!” gets marked, but the TV in the background saying “police found Joe Smith with 200 stolen flowerpots” does not.

    • DO mark first names and last initials, e.g. "Shannon D."

  • Last names, excluding celebrities (same examples as above hold for “Mrs. Smith”)

  • Addresses, physical and email.

    • E.g. “we’re the last red house on Mount Hope before the gas station” gets marked, as does “no, your aunt’s address is 123 Doe St”

  • Names of neighborhoods, or of non-chain gyms, daycares, churches, etc.

    • Mark only IF this could be used to identify the family (could you find their house based on that information alone?)

  • Full birthdates of the infants, e.g. 5/2/15.

  • Social security numbers

  • Financial information (credit card info, bank statements, account numbers)

  • Contact information (email addresses, phone numbers, social media handles)

  • Passwords (including phone passcodes)

  • License plates

Sensitive information/conversations

  • Family matters or drama

    • Mark what you deem sensitive, seeking second opinions from other RAs and staff as needed, e.g. big arguments, embarrassing family discussions, etc.

  • Medical information (medications, severe illnesses, surgeries, etc.)

    • Again, only mark what you deem sensitive. Don't mark talk about non-serious illnesses (like colds), but if dad is holding the baby wearing the recorder while a sibling is vomiting, mark it.

    • Serious medical conditions/medications (e.g. prescription painkillers, cancer) should be marked. If the family wouldn't readily share the information with an acquaintance, mark it.

  • Any explicit information (such as sexual activities or discussions of drug use). If drug use is suspected DURING the recording in the presence of the baby, let Elika know immediately.


  • Sections with adults and children who have not consented: mark sections where people clearly don't know they're being recorded and are interacting more than in passing with the family.

    • Only mark if it's over 5 minutes OR they have a significant interaction.

      • E.g. no need to mark cashiers or waitresses, but DO mark a babysitter who didn’t realize she was being recorded.

      • Do NOT mark if you can just vaguely hear children running around in the background.

    • Check if permission was received on sign-off for any significant interactions.

    • Mark if someone says "I wish that hadn't been recorded" or "I'm not comfortable with being recorded" (in seriousness, not sarcastically or jokingly) or something similar.

  • Any sections that the family has requested us not to share

  • Or anything else that involves sensitive or identifiable information

Note: If any child or substance abuse is suspected to occur during the recording, in the presence of the baby, let Elika know immediately.

How to format PI in ELAN:

  1. In the ELAN toolbar, go to Tier > Add New Tier. Add a new subtier of type "PI" whose parent tier is the tier of the speaker who uttered the PI.

    If the file does not already have a PI tier type:

    a. Go to Edit > Edit Controlled Vocabularies. Type "PI" into the CV name field and "has PI" into the description field, then click Add. This creates the controlled vocabulary. In the bottom half of the pop-up, type "PI" into the Entry value field and "has PI" into the Entry description field, then click Add. This is the only value you need. Close the pop-up.

    b. Go to Type > Add New Tier Type. Type "PI" into the type name field, select "Symbolic Association" in the Stereotype dropdown, and select "PI" in the Use Controlled Vocabulary dropdown. Click Add. Close the pop-up.

    c. Follow the instructions in step 1 to add a new tier associated with the speaker.

  2. Copy the annotation that contains the PI to this new tier.

  3. "PI" type tiers have a controlled vocabulary. Double-click the utterance, or select it and press Ctrl/Command-M, then choose "PI" as the annotation (the only option).

  4. Note whether the file contains PI (yes/no) in the "PI?" column of this Google spreadsheet, in the row corresponding to the file.

  5. Back in ELAN, obscure the PI in the transcription (on the parent tier, e.g. FA1, UC2, etc.) by replacing it with the type of PI in all caps. E.g.:

    • "Elika_Bergelson you stop hitting your sister right now!" becomes "Elika_LASTNAME you stop hitting your sister right now!"

    • "that's right, your aunt's going to visit her friends at 123 New Hope Street." becomes "that's right, your aunt's going to visit her friends at ADDRESS."

    • "my credit card pin is one two three four five." becomes "my credit card pin is FINANCIALINFO."
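
The steps above leave two things that should always go together: an annotation on a "PI" tier, and an all-caps placeholder in the parent annotation it points to. As an optional sanity check (not part of the procedure itself), the Python sketch below reads an ELAN .eaf file, which is XML, and lists parent annotations that are flagged as PI but still contain no placeholder. It rests on several assumptions: the tier type is named "PI" as in step 1, the parent annotations are time-aligned annotations on the speaker tier, and a placeholder is any run of four or more capital letters (a guess based on the examples LASTNAME, ADDRESS and FINANCIALINFO). The function name unobscured_pi and the tier ID "PI@FA1" in the sample are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: placeholders are runs of 4+ capital letters, as in the
# examples LASTNAME, ADDRESS and FINANCIALINFO from step 5.
PLACEHOLDER = re.compile(r"[A-Z]{4,}")


def unobscured_pi(eaf_xml: str, pi_type: str = "PI") -> list[str]:
    """Return the text of PI-flagged parent annotations that lack a placeholder."""
    root = ET.fromstring(eaf_xml)

    # Map each time-aligned annotation ID to its transcription text.
    text_by_id = {}
    for ann in root.iter("ALIGNABLE_ANNOTATION"):
        text_by_id[ann.get("ANNOTATION_ID")] = ann.findtext(
            "ANNOTATION_VALUE", default=""
        )

    problems = []
    for tier in root.iter("TIER"):
        # Only look at tiers whose tier type is the PI type.
        if tier.get("LINGUISTIC_TYPE_REF") != pi_type:
            continue
        for ref in tier.iter("REF_ANNOTATION"):
            parent_text = text_by_id.get(ref.get("ANNOTATION_REF"), "")
            if not PLACEHOLDER.search(parent_text):
                problems.append(parent_text)
    return problems


# Tiny hand-written sample; a real .eaf file has more elements (header,
# time slots, tier type definitions) that this check does not need.
SAMPLE = """<ANNOTATION_DOCUMENT>
<TIER TIER_ID="FA1" LINGUISTIC_TYPE_REF="default-lt">
<ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
<ANNOTATION_VALUE>we live at ADDRESS.</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
<ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
<ANNOTATION_VALUE>we live at 123 Doe St.</ANNOTATION_VALUE></ALIGNABLE_ANNOTATION></ANNOTATION>
</TIER>
<TIER TIER_ID="PI@FA1" LINGUISTIC_TYPE_REF="PI" PARENT_REF="FA1">
<ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a1">
<ANNOTATION_VALUE>PI</ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
<ANNOTATION><REF_ANNOTATION ANNOTATION_ID="a4" ANNOTATION_REF="a2">
<ANNOTATION_VALUE>PI</ANNOTATION_VALUE></REF_ANNOTATION></ANNOTATION>
</TIER>
</ANNOTATION_DOCUMENT>"""

# Prints the parent annotations that still need step 5 applied.
print(unobscured_pi(SAMPLE))
```

To run this on a real file, read it as text first (for example with pathlib.Path(path).read_text(encoding="utf-8")) and pass the result to unobscured_pi. The capital-letter rule is only a heuristic: it will miss PI that was left in place next to an unrelated all-caps word, so it complements rather than replaces listening through the marked sections.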

