Reliability of token exclusions

After we extract the individual tokens of each word and force-align them, we listen to each one to make sure it is a good, clear token of the word, so that when we measure its acoustic properties we are not measuring something else in the signal.

Basically, we want to listen to and exclude tokens that would give us inaccurate measurements for one of two reasons:

They contain more than one speaker
- This could be another person talking in the background, or a child cooing/screaming/crying in the background
They contain 'environmental' noise.
- This is typically music, or some other sudden or persistent noise (ball bouncing, parent clapping, wind chimes, etc).

First, go to BergelsonLab > Talker_variability > Token_exclusion_reliability > Exclusion_examples

Here you will find a couple of examples of tokens that should be excluded because of either overlap or environmental noise. Listen to them and come ask me (Federica) if you have any questions.

Then, open up "baby_spectral_measures_for_reliability.csv"

This csv lists the files you will need to listen to. For each one, you will decide whether it should be excluded and for what reason.
Sort by the file name. This sorts the rows by subject number, which will make the files easier to find.
For each subject, navigate to that subject's folder and find the segmented file to listen to (its name should exactly match the name listed in the csv) in the Segmented folder.
- BergelsonLab > Talker_variability > Output_folders_baby > 01 > All > Segmented
- Note that sometimes you will have to go to the Video folder instead of the All folder, and sometimes there will be another folder called "Output" between All and Segmented.
  - e.g. BergelsonLab > Talker_variability > Output_folders_baby > 11 > All > Output > Segmented
- If you cannot find a file, look for it in the Video folder instead.
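The folder search described above can also be sketched in code. This is only an illustration, not part of the lab's workflow: the function name `find_segmented_file` and the `OUTPUT_ROOT` location are hypothetical, and it assumes the optional "Output" folder can also appear under Video, which the instructions above do not confirm.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location of the shared folder; adjust to wherever it is mounted.
OUTPUT_ROOT = Path("BergelsonLab") / "Talker_variability" / "Output_folders_baby"


def find_segmented_file(subject: str, file_name: str,
                        root: Path = OUTPUT_ROOT) -> Optional[Path]:
    """Return the path to file_name in the subject's Segmented folder, or None."""
    for top in ("All", "Video"):          # try All first, then Video
        for middle in ((), ("Output",)):  # without, then with, the extra Output folder
            folder = root.joinpath(subject, top, *middle, "Segmented")
            candidate = folder / file_name
            if candidate.is_file():
                return candidate
    return None  # not found in any of the layouts described above
```

For example, `find_segmented_file("11", name)` with `name` copied from the csv would check 11 > All > Segmented, then 11 > All > Output > Segmented, and then the same two layouts under Video.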
Listen to each file (on a Mac, you can select it in the Finder window and press the space bar to play the sound without opening it anywhere) and decide whether it should be excluded for one of the above reasons. If not, type no in the "exclude_JR" column. If so, type yes in the "exclude_JR" column and then type the reason (overlap or environmental) in the "reason_JR" column.
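If you want to double-check your entries before handing the csv back, a small script along these lines could flag rows that are not filled in as described above. It is a minimal sketch under some assumptions: the function names are hypothetical, capitalization and stray spaces are ignored, and each excluded token is assumed to get exactly one of the two reasons.

```python
import csv
from typing import Dict, Iterable, List, Tuple

ALLOWED_REASONS = {"overlap", "environmental"}


def check_annotations(rows: Iterable[Dict[str, str]]) -> List[Tuple[int, str]]:
    """Return (row_number, problem) pairs for rows that are not filled in correctly."""
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 of the csv is the header
        exclude = (row.get("exclude_JR") or "").strip().lower()
        reason = (row.get("reason_JR") or "").strip().lower()
        if exclude not in ("yes", "no"):
            problems.append((number, "exclude_JR should be yes or no"))
        elif exclude == "yes" and reason not in ALLOWED_REASONS:
            problems.append((number, "reason_JR should be overlap or environmental"))
    return problems


def check_csv(path: str) -> List[Tuple[int, str]]:
    """Read the csv at path and check its exclude_JR and reason_JR columns."""
    with open(path, newline="", encoding="utf-8") as handle:
        return check_annotations(csv.DictReader(handle))
```

Calling `check_csv("baby_spectral_measures_for_reliability.csv")` returns an empty list when every row has a valid entry. Rows you have not listened to yet will show up as problems, so the check is most useful once you have finished.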
Try to keep track of how long this is taking you (for example, if you work on it for one whole hour, how many tokens did you get through?), as it will help me figure out how long the whole task will take.
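The arithmetic behind that estimate is simple enough to sketch. The helper names below are hypothetical, and the sketch assumes your pace stays roughly constant across sessions.

```python
def tokens_per_hour(tokens_done: int, minutes_worked: float) -> float:
    """Convert a count of tokens and the minutes spent on them into an hourly rate."""
    if minutes_worked <= 0:
        raise ValueError("minutes_worked must be positive")
    return tokens_done * 60.0 / minutes_worked


def hours_remaining(tokens_left: int, rate_per_hour: float) -> float:
    """Estimate the hours needed for the remaining tokens at the given rate."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return tokens_left / rate_per_hour
```

Writing down the number of tokens and the minutes worked for each session is all that is needed to fill in these two values.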
Let me know (Federica) if you have any questions!
