OLD Extracting tokens in Praat for TVS-Corpus
Prior to this next step, make a folder with your name on it in: /Volumes/psych/BergelsonLab/Talker_variability/Output_folders/Segmented_output_files
Locate Praat on the computer, and open it. Two windows will pop up that look like those below. You can close the Praat Picture window, and will only need to use the Praat Objects window.
From the Praat Objects window, click on Open > Read from File. A window will pop up, and you should navigate to the folder where the extracted sound files are (BergelsonLab > Talker_Variability > Output_folder > 01 > 06). Double click on the first file, and you will see it listed in the Praat Objects window.
Select the sound file in Praat so it is blue (as above) and click on "View & Edit", which will pop up a new window.
This is the waveform of the sound file we extracted. For this specific file, it says something like "baby kick, baby wiggle, baby". We need to pull out each instance of the word 'baby' on its own. To do that, listen to the file and find the exact times when the word 'baby' starts and stops. Once you have found them, highlight that stretch of the waveform by clicking and dragging with your cursor. It will look like this:
To make sure you highlighted all of the word, and nothing more, you want to listen to it. Once it's highlighted, press 'tab' on your keyboard to hear just what is highlighted. You want to make sure it sounds like the whole word (the beginning or end aren't cut off) and that no other sounds are included. If you need to adjust your highlight, do it now.
A couple of tips:
Command + I will zoom in
Command + O will zoom out
If you're not sure exactly where the word starts, highlight just the beginning and listen to it. If you hear anything that sounds like the beginning of the word, then adjust where the word starts. If you hear silence, static, or the end of another word, then you can leave it out. You can also use this strategy for the end of the word.
As you do more of these, you will start to recognize the shape of the waveform for the word, which will make it easier.
When you have highlighted the whole word correctly, extract it by choosing File > Save selected sound as WAV file.
A window will pop up to ask you where to save it. Navigate to the folder you made with your name on it in "BergelsonLab > Talker_Variability > Output_files > Segmented_output_files" and save it there.
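For anyone curious what "Save selected sound as WAV file" amounts to, here is a minimal sketch in Python using only the standard `wave` module. It is not what Praat runs internally and it does not replace the manual steps above. The function name `extract_segment` and its parameters are made up for illustration, and it assumes an uncompressed PCM WAV file.

```python
import wave


def extract_segment(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file.

    Hypothetical helper for illustration; assumes an uncompressed PCM WAV input.
    """
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Convert the selection boundaries from seconds to sample frames.
        start_frame = int(round(start_s * rate))
        end_frame = int(round(end_s * rate))
        if not 0 <= start_frame < end_frame <= src.getnframes():
            raise ValueError("selection must lie inside the recording")
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)

    with wave.open(str(dst_path), "wb") as dst:
        # Same channels, sample width and rate as the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
```

The start and end times would be the ones you found by listening in Praat, so the highlighting and checking steps still have to be done by ear.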
Repeat this process for all the other words in the folder.
If there are multiple tokens of 'baby' in any one sound file, you should segment each one separately. When a file contains more than one token, you will need to add a number at the end of the file name so that each new file does not overwrite the previous one. This will look like the screenshot below. You do not need to add a number to the first one (shown at the bottom of the screenshot), but add an underscore and the subsequent number for the rest of them.
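The naming rule above can be written out as a small Python sketch. This is only an illustration: the function name `token_filename` is made up, and it assumes that the second token gets `_2`, the third `_3`, and so on. Check the screenshot or the files in the FB folder if you are unsure which number the second token should get.

```python
from pathlib import Path


def token_filename(source_name, token_index):
    """Return the file name for the token_index-th token (1-based) from one sound file.

    Assumption for illustration: the first token keeps the original name and
    later tokens get an underscore plus their position (_2, _3, ...).
    """
    if token_index < 1:
        raise ValueError("token_index starts at 1")
    stem = Path(source_name).stem
    if token_index == 1:
        return f"{stem}.wav"
    return f"{stem}_{token_index}.wav"
```

With a hypothetical source file called `clip.wav`, the first three tokens would be saved as `clip.wav`, `clip_2.wav` and `clip_3.wav`.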
If the word gets cut off at the end of the file and you cannot hear the whole thing, do not segment it; just leave it. Note that you were not able to segment the word in the "Segmented comments" Excel file, where you will add the file name, the comment (in this case, "word was cut off"), and your initials. Use this file for any other strange things you would like me to check, such as "couldn't segment this word because X".
I segmented 24 instances of 'baby' from the files for subject 1 at month 6. If your count does not match, feel free to compare your file names (which are based on time stamps) with the ones in the FB folder in Segmented_output_files.
Since I will be conducting a series of more complicated analyses on the segments you have identified as 'baby', we want to check that everybody is equally accurate at finding the words. In this next step, we will check that the files you segmented as 'baby' match mine in length. To do this, we will measure the duration of each file.
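To give a sense of what this check looks for, here is a rough Python sketch of comparing two sets of durations. It is not the actual analysis. The function name `compare_durations` and the 50 ms tolerance are made up for illustration, and the durations are assumed to be in seconds, keyed by file name.

```python
def compare_durations(reference, candidate, tolerance_s=0.05):
    """List files whose durations disagree between two people.

    reference and candidate map file name -> duration in seconds.
    tolerance_s is a hypothetical cut-off, not a lab standard.
    Each returned row is (file name, reference, candidate, difference);
    the difference is None when one person has no file with that name.
    """
    rows = []
    for name in sorted(set(reference) | set(candidate)):
        ref = reference.get(name)
        cand = candidate.get(name)
        if ref is None or cand is None:
            # One of the two people did not segment this file.
            rows.append((name, ref, cand, None))
        elif abs(ref - cand) > tolerance_s:
            rows.append((name, ref, cand, cand - ref))
    return rows
```

Files that come back from a check like this are the ones worth listening to again together.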
In the Praat Objects window, select all of the files (hold shift while selecting them) and click the Remove button at the bottom. This will clear everything from the Praat Objects window. Now, open the files you segmented by going to your folder in "BergelsonLab > Talker_Variability > Output_files > Segmented_output_files".
Click on a file in the Praat Objects window, and select "View & Edit". A new window will pop up as before. At the bottom of the window, it will say "Total duration" with a value in seconds next to it.
In /Volumes/psych/BergelsonLab/Talker_variability/Output_folders/Segmented_output_files you will find an Excel file called "Segmentation_comparison". Open that file, and make a new column with your initials (next to FB). In that column, write down the total duration found in Praat in the row for that sound file.
Repeat this process for each of the sound files you segmented.
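If you want to double-check the numbers you copied from Praat, the total duration of a WAV file is just its number of sample frames divided by its sampling rate. Here is a minimal Python sketch using the standard `wave` module. The function names are made up for illustration, and it assumes the files are uncompressed PCM WAV files, which the `wave` module can read.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        # Duration = number of sample frames / frames per second.
        return w.getnframes() / w.getframerate()


def folder_durations(folder):
    """Map each .wav file name in a folder to its duration in seconds."""
    return {p.name: wav_duration(p) for p in sorted(Path(folder).glob("*.wav"))}
```

Calling `folder_durations` on your own segmentation folder would give one value per file, which should agree with the "Total duration" shown in Praat up to rounding.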
Once you have done this, let me know (fb82@duke.edu, or pop down to room 002), and we can compare and go over anything that was confusing before moving on.