Extract individual tokens

Step-by-step instructions for transcribing wave files using TextGrids, so they can then be used with a forced aligner

Background

In order to do some acoustic analyses of how words sound for infants in their daily lives, we need to extract certain words from the corpus.

We extract them by looking at the transcripts and finding the time stamp where we identified a certain word. For example, we look at the sparse code file to see when the word 'baby' occurred, and then take that chunk of audio and export it. The extracted chunk will contain the word 'baby', but it will also contain some other things, so after we extract the larger chunk, we need to go through and separate out just the word 'baby'. This can be done in Praat.
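
The extraction step can be sketched in a few lines of Python. This is only an illustration of the idea, not the lab's actual script: the function name `extract_clip` and the assumption that time stamps are given in milliseconds are hypothetical.

```python
import wave

def extract_clip(src_path, dst_path, onset_ms, offset_ms):
    """Copy the audio between onset_ms and offset_ms into a new wav file.

    Hypothetical helper: assumes time stamps are in milliseconds.
    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert milliseconds to frame positions, clamped to the file length.
        start = min(int(onset_ms * rate / 1000), total)
        end = min(int(offset_ms * rate / 1000), total)
        n_frames = max(end - start, 0)
        src.setpos(start)
        frames = src.readframes(n_frames)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        # Reuse the channel count, sample width and sample rate of the source.
        dst.setparams(params)
        dst.writeframes(frames)
    return n_frames
```

A call such as `extract_clip("01_06.wav", "clip.wav", 250, 750)` would write the half second of audio between 0.25 s and 0.75 s to `clip.wav`.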

Running python script to extract tokens

Open a terminal window, navigate to the home directory, and type 'python' followed by a space. Note: on the lab computers, the directory shown in the terminal will not include my name.

Next, you will drag in the python script called "read_sparse_csv.py" which can be found under: BergelsonLab > Talker_variability > Python_script

Next, drag in the sparse_csv file for Subj 01, month 06. In this case, the file is called "01_06_audio_sparce_code.csv" and can be found at: Fas-Phyc-PEB-Lab/Seedlings/Subject_Files/01/01_06/Home_Visit/Analysis/Audio_Analysis

Next, drag in the original sound file from that month's recording. In this case, the file is called "01_06.wav" and can be found at: Fas-Phyc-PEB-Lab/Seedlings/Subject_Files/01/01_06/Home_Visit/Processing/Audio_Files

Next, you will drag in the output folder so the script knows where to put all of the smaller file chunks. Unlike the individual files you dragged in before, this time you drag the entire folder. Each subject has their own folder, and each month has its own folder inside the subject-specific folder. These folders are inside "Output_folder"; in this case, the one you need is: Fas-Phyc-PEB-Lab/Talker_variability/Output_folder/01/06

Then, you will tell the script which word you want the script to look for. In this case, you will just type 'baby'.

Lastly, you will type either "audio" or "video", depending on where the file comes from. Below is an example of finding the target word 'ball' from audio files.

Press Enter and the script will run. In the terminal window you will see that it is running through all of the iterations. Once it is done, go to the output folder you selected and you will see there are now 18 files with names such as those seen in the picture below.
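
Taken together, the steps above build a single command line. The sketch below assembles it in the order the items are dragged in. It assumes the script reads its arguments positionally in that order; the function name `build_extract_command` is hypothetical, and the short relative paths are placeholders for the full paths listed above.

```python
import shlex

def build_extract_command(script, sparse_csv, wav, output_dir, word, source):
    """Return the argument list for one run of the extraction script.

    Assumption: the arguments are positional, in the order they are
    dragged into the terminal in the steps above.
    """
    if source not in ("audio", "video"):
        raise ValueError('source must be "audio" or "video"')
    return ["python", script, sparse_csv, wav, output_dir, word, source]

cmd = build_extract_command(
    "read_sparse_csv.py",           # the Python script
    "01_06_audio_sparce_code.csv",  # sparse code file for Subj 01, month 06
    "01_06.wav",                    # original recording for that month
    "Output_folder/01/06",          # where the smaller chunks are written
    "baby",                         # target word
    "audio",                        # where the file comes from
)
print(shlex.join(cmd))
# prints: python read_sparse_csv.py 01_06_audio_sparce_code.csv 01_06.wav Output_folder/01/06 baby audio
```

Dragging files into the terminal fills in the full paths for you, so in practice the command will be much longer than this, but the order of the pieces is the same.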

Transcribing audio clips using Text Grids

Locate Praat on the computer, and open it. Two windows will pop up that look like those below. You can close the Praat Picture window, and will only need to use the Praat Objects window.

From the Praat Objects window, click on Open > Read from File. A window will pop up, and you should navigate to the folder where the extracted sound files are (Fas-Phyc-PEB-Lab > Talker_Variability > Output_folder > 01 > 06). Double click on the first file, and you will see it listed in the Praat Objects window.

Select the sound file so it is highlighted in blue (as above) and select Annotate > To TextGrid. A window will pop up that looks like this, and you will want to put the word you are looking for in the "All tier names" box. You can leave the other box blank.

Press OK, and a new object will appear in your Praat Objects window, which has the same name as the sound file but now says TextGrid in front of it.

Hold down the control key to select both, and then click "View and Edit" on the right side. A new window will pop up with the sound wave at the top, and a yellow box underneath. In the yellow box, you want to type everything you hear in the sound clip, without any punctuation (see below). If a word is cut off at the beginning or the end, write down the part of the word that you do hear.

After you have typed in everything that is said, press Control + S to save the file. It should automatically save with the same name as the audio file and .TextGrid as the extension. Make sure it is saved in the same spot as the audio file (Fas-Phyc-PEB-Lab > Talker_Variability > Output_folder > 01 > 06).

Repeat this process with every sound file.
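
Since each .TextGrid is saved next to its sound file under the same name, a short check can list the sound files that still need one. This is an optional sketch, not part of the lab's scripts; the function name `wavs_missing_textgrids` is hypothetical.

```python
from pathlib import Path

def wavs_missing_textgrids(folder):
    """Return the names of .wav files in folder that have no matching .TextGrid."""
    folder = Path(folder)
    return sorted(
        wav.name
        for wav in folder.glob("*.wav")
        # A finished clip has a file with the same name but a .TextGrid extension.
        if not wav.with_suffix(".TextGrid").exists()
    )
```

An empty list means every sound file in the folder has a TextGrid saved beside it.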

We will be using speech alignment software to align the typed-out words to the waveform, so the words and sounds we type out should be English words (when possible). Individual syllables are fine, but do your best to make it possible for the software to know what the sounds are supposed to be.

Things that you might run into:
- If a child produced a word, write down what the word sounds like, not what it is intended to be. For example, if a child clearly means 'baby' but just says the final syllable, you could type that as 'bee'
- If a word is cut off, write down the beginning or the end of it
- If a sound file says it is produced by Mom, but you only hear the subject producing it, skip that file and make a note in the Talker_Variability > Progress Log > Notes tab
  - For example, you might write in the file name, and comment that mom did not produce the word baby in that clip
- If the target word (whichever one you are coding for, but in this example 'baby') is cut off, do not transcribe the sound clip, but also make a note in the Talker_Variability > Progress Log > Notes tab
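
Because transcriptions should contain no punctuation, a quick check like the one below can catch stray characters before the files go to the aligner. This is a hedged sketch: the function name `punctuation_in` is hypothetical, and the `allowed` parameter is an assumption for cases where some character (such as an apostrophe) turns out to be acceptable.

```python
import string

def punctuation_in(transcript, allowed=""):
    """Return the sorted punctuation characters found in a transcript.

    `allowed` lists characters that should not be flagged. It is empty
    by default, matching the "no punctuation" rule above.
    """
    return sorted(
        {ch for ch in transcript if ch in string.punctuation and ch not in allowed}
    )

print(punctuation_in("look at the baby"))  # prints: []
print(punctuation_in("look, a baby!"))     # prints: ['!', ',']
```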

In order to not have people working on things at the same time (and redoing unnecessary work), when you start a new subject month, open the Google Sheet TVS Progress Log and put down your initials for that subject + month. That way the next person who starts working on it knows to start on a different month.

You can work through this systematically, working through each subject months 06-18 before moving on to the next subject.

New Praat script for creating TextGrids

To find out what you should work on, please go to the Google TVS tracking sheet (found here: https://docs.google.com/spreadsheets/d/1bO4JVkqVEFKuqAyDp9i_mWnzT23hBlRkLE6ceizWGr4/edit?usp=sharing). This spreadsheet contains a list of folders that contain sound files that need to be transcribed.

For example, Subj 01, source All, word Nose means you should navigate to Talker_Variability > Output_folders_nose > 01 > All.
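
The mapping from a spreadsheet row to a folder can be sketched as follows. The function name `folder_for_row` is hypothetical, and the lowercasing of the word and the two-digit subject number are inferred from the single example above.

```python
from pathlib import PurePosixPath

def folder_for_row(subject, source, word, root="Talker_Variability"):
    """Build the folder path for one row of the tracking sheet.

    Assumptions: the word is lowercased in the folder name and the
    subject number is zero-padded to two digits, as in the example.
    """
    return (
        PurePosixPath(root)
        / f"Output_folders_{word.lower()}"
        / f"{int(subject):02d}"
        / source
    )

print(folder_for_row("01", "All", "Nose"))
# prints: Talker_Variability/Output_folders_nose/01/All
```
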
Once there, you will see a list of wav files with names that look like this:

There is a script (Talker_Variability > Other scripts > Text grid maker) that will automatically generate TextGrids for all of the .wav files in a folder. This section explains how to use it.

First, open Praat. You can close the Praat Picture window. Then go up to the left of the menu bar and click on "Praat". There will be a drop-down menu where you should choose "Open Praat script..."

Navigate to Talker_Variability>Other scripts and open Text grid maker.

A window will pop up that looks like this:

To begin running the script, you can either click on the "Run" menu at the top of the window or press Command + R. That will take you to this window:

Clear out whatever is in the "Directory" slot. In Finder, navigate to the subject number and source that you want to segment (Talker_variability > Output_folders_[word] > [subject number] > [all/video]), and drag the folder into the "Directory" slot. Type a forward slash in the Directory after the last number or word in the file path. Clear out the "Word" slot. Keep the Filetype "wav". Before you click OK, it should look something like this:

Click OK. The TextGrid file corresponding to the first .wav file in the folder will pop up, along with another window:

To play the sound file, put the cursor at the beginning of the sound clip and then press TAB. You can always move the cursor to specific spots if you want to listen to those again.

In the yellow box, you want to type everything you hear in the sound clip, without any punctuation. If a word is cut off at the beginning or the end, write down the part of the word that you do hear.

After all of these are transcribed, we will be using speech alignment software to align the typed-out words to the waveform, so the words and sounds we type out should be English words (when possible). Individual syllables are fine, but do your best to make it possible for the software to know what the sounds are supposed to be.

Things that you might run into:
- If a child produced a word, write down what the word sounds like, not what it is intended to be. For example, if a child clearly means 'baby' but just says the final syllable, you could type that as 'bee' (since that is a real English word)
- If a word is cut off, write down the beginning or the end of it
- If a sound file says it is produced by Mom, but you only hear the subject producing it, skip that file and make a note in the Talker_Variability > Progress Log > Notes tab
  - For example, you might write in the file name, and comment that mom did not produce the word baby in that clip
- If the target word (whichever one you are coding for, but in this example 'baby') is cut off, do not transcribe the sound clip, but also make a note in the Talker_Variability > Progress Log > Notes tab

After you have typed everything in that you heard in the sound clip, click Continue on the "Pause: stop or continue" window in order to generate the next TextGrid. Once you've generated TextGrids for all the .wav files in a folder, the script will alert you that it has finished.

Once you are done with a folder, return to the Google spreadsheet and add your initials to that row, as well as the date that it was completed, then start on the next one!

Things to note

If you start a folder, please try to finish it. However, if you can't and you stop in the middle of a folder, you can run the script again for that folder and it will start generating TextGrids where you left off.
If you accidentally hit Continue before transcribing a TextGrid, the TextGrid has still been created; it is just blank. You can transcribe it by opening the .wav and .TextGrid files in Praat the old way and transcribing it from there (see "Transcribing audio clips using Text Grids" above).
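
To find TextGrids that were left blank by accident, a sketch like the following can scan a folder. It assumes the files were saved in Praat's default long text format, where each interval label appears as `text = "..."`, and that they are encoded as UTF-8 or UTF-16. The function names are hypothetical.

```python
import re
from pathlib import Path

# In Praat's long text format each interval label is written as: text = "..."
LABEL = re.compile(r'text = "([^"]*)"')

def read_textgrid(path):
    """Read a TextGrid as text, handling UTF-16 (with byte order mark) or UTF-8."""
    raw = Path(path).read_bytes()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")
    return raw.decode("utf-8", errors="replace")

def is_blank_textgrid(path):
    """True if no interval label in the TextGrid contains any text."""
    labels = LABEL.findall(read_textgrid(path))
    return all(label.strip() == "" for label in labels)

def blank_textgrids(folder):
    """Return the names of the blank .TextGrid files in folder."""
    return sorted(
        p.name for p in Path(folder).glob("*.TextGrid") if is_blank_textgrid(p)
    )
```

Any file name this returns is a candidate for re-opening in Praat and transcribing by hand.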
