Extracting individual tokens from Box

A little background

  • As a postdoc, Dr. Bergelson collected day-long audio recordings and hour-long video recordings from 44 infants over the course of a year, from when they were 6 months old until they were 18 months old.

    • Every month, each baby was audio-recorded for a full day, plus an additional hour that was also video-recorded

    • We call this the SEEDLingS corpus, and it gives us a lot of data about what kids heard over the course of a year

    • There are a lot of things we can do with these recordings (some of the things we have already done/looked at can be found on our website), but what I'm interested in is figuring out just how different individual words sound for kids

    • For example, a child hears the word 'baby' from many people in their environment, and those people say it in different ways (in questions, songs, book reading, exclamations, etc.). On top of people purposely saying words in different ways in different contexts, every time a word is said (even if it is said by the same person in the exact same sentence) it is going to sound slightly different. I want to understand just how different words sound for infants.

    • This project helps with this! In order to understand just how different words sound, we need to find every instance of a word, make an audio file that includes only that word, and then measure a variety of acoustic properties of that word (such as how long it is, how high or low in pitch it is, how variable it is, etc.).

Therefore, in this project, you will be listening to short audio clips that all contain a specific word (in this case: car, and then head) and writing down everything you hear in them. I then use a forced aligner, which is software that finds word boundaries in a sound file based on the words we tell it are in that file, to make individual sound clips containing only the word of interest (e.g. hat) and measure what each instance sounded like.
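To give a concrete sense of the "measure" step at the end of this pipeline, here is a minimal sketch (not the lab's actual analysis code) that computes one simple acoustic property, duration, for each extracted word clip. The folder path is a placeholder, and only Python's standard library is used.

```python
# Minimal sketch (not the lab's actual analysis code): compute the duration
# of every extracted word clip in a folder, using only the standard library.
import contextlib
import wave
from pathlib import Path

CLIP_DIR = Path("Output_folders_hat/01/All")  # placeholder folder path

for wav_path in sorted(CLIP_DIR.glob("*.wav")):
    # wave only handles uncompressed PCM wav files, which is assumed here
    with contextlib.closing(wave.open(str(wav_path), "rb")) as wav:
        duration_s = wav.getnframes() / float(wav.getframerate())
    print(f"{wav_path.name}\t{duration_s:.3f} s")
```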

We've already done this for some of the words in the corpus, and the presentation found here explains the process and what we've found (if you're interested in more details): https://bergelsonlab.com/subpages/presentations/bulgarelli_bergelson_bucld2019.html

Download necessary software:

If you have access to the BergelsonLab Drive:

  • Log into the VPN and the BergelsonLab Drive

  • To find out what you should work on, please go to the Google TVS tracking sheet (found here: https://docs.google.com/spreadsheets/d/1bO4JVkqVEFKuqAyDp9i_mWnzT23hBlRkLE6ceizWGr4/edit?usp=sharing). This spreadsheet contains a list of folders that contain sound files that need to be transcribed.

  • For example, Subj 01, source All, word Nose means you should navigate to BergelsonLab > Talker_Variability > Output_folders_nose > 01 > All.

  • Once there, you will see a list of wav files with names that look like this:

If you do not have access to the VPN/Bergelson Lab (you are a volunteer):

  • Set up Box Sync on your desktop: https://support.box.com/hc/en-us/articles/360043697194-Installing-Box-Sync *

    • Log into Box using your Duke guest account

    • Then, you will need to sync the folder you will be working on to your desktop (Federica will tell you which folder this is)

      • In order to do this, go to box.duke.edu and sign in. Then go to the folder "Output_folders_hat" that has been shared with you, go to the "Details" tab on the right, and switch it to "sync to desktop" (see image below; if the slider is grey the folder is not synced, and it will turn blue when you click on it)

  • If this has all been done correctly, you should now see the "Output_folders_hat" folder in your Box Sync folder like below (note that I have other folders synced as well):

Transcribing sound files using Praat TextGrids

You will be listening to a series of short sound files and writing down everything you hear in them (this is called transcribing)

  • In the appropriate word folder (e.g. Output_folders_hat on Box or on the drive), you will find a series of numbered folders (1, 2, 3, etc.); these are the numbers of the different subjects

    • Inside each of these are two folders: All and Video

      • Inside All and Video you will find a list of wav files with names that look like this:

      • The naming convention is complicated, and doesn't directly matter for this task, but it basically tells you (see the sketch after this list for an illustration):

        • the 'target' word we were interested in that will appear in the sound file (in this case hat),

        • the subject (01),

        • the month of the recording the sound clip came from (06),

        • who said it (MOT = mother),

        • the timestamp of the whole recording where it happened (the long numbers in the middle are the start and end of the sound clip),

        • whether the object was present or absent when it was named (was the hat there or were they talking about a hat that the baby could not see? y = yes, n = no, u = unclear)

        • whether it came from an audio recording or a video recording,

        • the type of sentence it occurred in (question, singing, during reading, etc)

        • and a unique identifier that allows us to link that particular instance to the same instance in a different dataset
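As a purely illustrative sketch of how these pieces fit together, the snippet below splits a hypothetical file name into labelled fields. The field order, separators, and example name are assumptions made for illustration; the real file names may differ, so don't rely on this for actual parsing.

```python
# Illustrative only: the real file names may use a different field order or
# different separators. This assumes underscore-separated fields in the
# order described above.
from pathlib import Path

FIELDS = [
    "target_word",     # word of interest (e.g. hat)
    "subject",         # subject number (e.g. 01)
    "month",           # month of the recording (e.g. 06)
    "speaker",         # who said it (e.g. MOT = mother)
    "onset",           # start of the clip within the whole recording
    "offset",          # end of the clip within the whole recording
    "object_present",  # y / n / u
    "source",          # audio or video recording
    "utterance_type",  # question, singing, reading, etc.
    "unique_id",       # identifier linking this instance to other datasets
]

def parse_clip_name(path: Path) -> dict:
    """Split a clip's file name into labelled fields (illustrative only)."""
    return dict(zip(FIELDS, path.stem.split("_")))

# Hypothetical example name, just to show the idea:
print(parse_clip_name(Path("hat_01_06_MOT_1234567_1239876_y_audio_q_000123.wav")))
```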

You will use Praat and a Praat script to open up each file, listen to it, and type out what you hear. This is how it works:

First, open Praat. You can close the Praat Picture window. Then go up to the left of the menu bar and click on "Praat". There will be a drop-down menu where you should choose "Open Praat script..."

On the BergelsonLab Drive, the script is at: BergelsonLab > Talker_Variability > Other scripts > Text grid maker

On Box, navigate to Box Sync > Output_folders_WORD and open Text grid maker.

A window will pop up that looks like this:

In order to begin running the script, you can either click on the "Run" tab at the top of the window or press Command + R. That will take you to this window:

Clear out whatever is in the "Directory" slot. In Finder, navigate to the folder with the subject number (e.g. 01) and source (All or Video) that you want to work on (e.g. Box Sync > Output_folders_hat > [subject number] > [All/Video]), and drag the folder into the "Directory" slot. Then type a forward slash at the end of the file path in the Directory slot, after the last number or word. Clear out the "Word" slot if there is anything in it. Keep the Filetype as "wav". Before you click OK, it should look something like this:
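If you'd like to double-check the folder before running the script, here is an optional sketch (not part of the lab's workflow) that verifies the path ends in a forward slash and actually contains wav files. The path shown is a placeholder for whichever subject/source folder you are working on.

```python
# Optional sketch: sanity-check the path you plan to paste into the
# "Directory" slot. The path below is a placeholder.
from pathlib import Path

directory = "/Users/you/Box Sync/Output_folders_hat/01/All/"  # placeholder

assert directory.endswith("/"), "Remember the forward slash at the end"
folder = Path(directory)
assert folder.is_dir(), f"Folder not found: {folder}"

wav_files = sorted(folder.glob("*.wav"))
print(f"Found {len(wav_files)} wav files in {folder}")
```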

Click OK. A window like the one below will pop up. On the top is the waveform of the sound file. On the bottom (yellow) is where you will type in everything you hear.

To play the sound file, you need to put the cursor at the beginning of the sound clip, and then press TAB. You can always move the cursor to specific spots if you want to listen to those again.

In the yellow box, you want to type everything you hear in the sound clip, without any punctuation. If a word is cut off at the beginning or the end, write down the part of the word that you do hear.

After all of these are transcribed, we will be using speech alignment software (a forced aligner) to align the typed-out words to the waveform, and therefore the words and sounds we type out should be English words (when possible). Individual syllables are fine, but do your best to make it possible for the software to know what the sounds are supposed to be (see the sketch after the list below for a quick way to check a transcription).

  • Things that you might run into:

    • If a child produced a word, write down what the word sounds like, not what it was intended to be. For example, if a child clearly means 'baby' but just says the final syllable, you could type that as 'bee' (since that is a real English word)

    • If a word is cut off, write down the beginning or the end of it
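As a hypothetical helper (not part of the lab's scripts), the sketch below checks that a transcription follows the rules above: plain lowercase words with no punctuation. Whether apostrophes in contractions are acceptable is an assumption here; check with Federica if unsure.

```python
# Hypothetical helper: quick check that a transcription is plain words with
# no punctuation. Allowing apostrophes in contractions is an assumption.
import string

ALLOWED = set(string.ascii_lowercase + " '")

def looks_ok(transcription: str) -> bool:
    """Return True if the transcription is lowercase words without punctuation."""
    text = transcription.lower().strip()
    return bool(text) and all(ch in ALLOWED for ch in text)

print(looks_ok("where is your hat"))    # True
print(looks_ok("Where is your hat?!"))  # False: contains punctuation
```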

After you have typed in everything that you heard in the sound clip, click Continue on the "Pause: stop or continue" window; this will save the work you did and open up the next file in the folder. Once you've listened to and transcribed all of the sound files in a folder, the script will alert you that it has finished.

Once you are done with a folder, mark it as done in the Google spreadsheet found here: https://docs.google.com/spreadsheets/d/1bO4JVkqVEFKuqAyDp9i_mWnzT23hBlRkLE6ceizWGr4/edit#gid=206859204. Add your initials to that row, as well as the date that it was completed, then start on the next one!

Things to note

  • If you start a folder, please try to finish it. However, if you can't and you stop in the middle of a folder, you can run the script again for that folder and it will start generating TextGrids where you left off (see the sketch below for a way to check which files still need to be transcribed).

  • If you accidentally hit Continue before transcribing a TextGrid, the TextGrid has been created, it's just blank. You can transcribe it by selecting both the .wav and .TextGrid file in Praat and following the instructions here: (Transcribing audio clips using Text Grids).
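If you want to see exactly where you left off, here is a small sketch (assuming, as described above, that the script saves one .TextGrid next to each .wav) that lists the wav files in a folder that don't yet have a TextGrid. The path is a placeholder.

```python
# Sketch: list the wav files in a folder that do not yet have a matching
# .TextGrid (assumes TextGrids are saved alongside the wav files).
from pathlib import Path

folder = Path("/Users/you/Box Sync/Output_folders_hat/01/All")  # placeholder

for wav_path in sorted(folder.glob("*.wav")):
    if not wav_path.with_suffix(".TextGrid").exists():
        print("still needs transcribing:", wav_path.name)
```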
