Making New Audio Stimuli
These are the steps to follow when recording and editing stimuli. This is a living document!

Recording Best Practices
We use the Blue Snowflake to record stimuli. It lives on the desk outside of the headturn preference soundbooth. It requires a USB port, but you can also use the dongle to connect it to a newer laptop.
Test out your setup on the laptop you intend to use before you record. The age and quality of the laptop can have an effect on the sound quality of the recording.
The Blue Snowflake is pretty forgiving, so you should be able to set it up as you see fit and still get a decent recording. I recommend setting the mic up 4-12 inches away from the speaker's mouth. You can hook it over the top of the laptop screen, or you can set it up so that its stand is supporting it on the table.
We use Audacity to record. It's free, does a great job, and is easier to record in than Praat. Recording in Audacity is very easy: click the Record button to begin and the Stop button to finish. Two important things need to be checked before you start recording:
The microphone being used: the default is the laptop's built-in microphone. Do not use this. In the dropdown next to the picture of the microphone, under the big buttons, change the input to say "Blue Snowflake" (screenshot forthcoming).
The gain level: this setting dictates how much sound the microphone actually picks up. Adjust it with the slider under the microphone selection, next to the volume slider. Set it to .5 as the default. Your recording will look like it's too quiet, but it isn't: in the sound booth, this setting produces the least background hum while still capturing the full acoustics of the speaker's voice.
The sound booth has our discrimination paradigms set up in it, so be careful with the TVs and wires, and don't remove anything from the booth. There's a foldable table that can hold the laptop and the microphone; it's located on top of the tall, gray filing cabinet in the headturn preference room. Bring that in and have the speaker sit in the chair that's already there. Turn the fan all the way OFF during the recording to minimize ambient noise: turn the knob clockwise until you hear a satisfying click and the fan shuts completely off. (Then remember to turn it back on when you're done!)
Have a written list of everything you want your speaker to say so that you don't forget any words or carrier phrases. You may think you don't need this, but you do.
Do a test recording before jumping into the whole thing. Stay with your speaker for the first few trials to make sure that they're correctly pronouncing things and close enough to the mic. Listen to your test recording and adjust things as needed before you start the full recording. I recommend staying in the room with your speaker.
For best results, have your speaker produce each utterance 3 times. The easiest thing is to record one long file, but breaking it into 2 or 3 files is fine. Do not record individual files for each utterance; splitting into individual files happens much faster with the scripts in the later steps.
Once you have your recordings, it's time to turn them into stimuli! There are a few things that need to happen in a specific order.
The easiest way to differentiate what will eventually be your individual audio files is to use a TextGrid in Praat. Open your file in Praat, select it, and choose Annotate > To TextGrid.... With the TextGrid, you can delimit each word you want. Establish a naming scheme for your words. I usually do something like targetword_carrier#, e.g. mouse_can3. This way I can listen to all my tokens of "mouse," sorted by carrier phrase. If there's a token of an utterance that you know you won't use because there's background noise or the speaker made an error, just don't write anything in the TextGrid. Same with all silences. These will all be removed for you in the following steps.
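As a concrete illustration of why this naming scheme is handy, here's a short Python sketch (the labels here are invented) that groups tokens by target word and sorts each group by carrier phrase:

```python
labels = ["mouse_can1", "dog_see2", "mouse_see1", "dog_can3", "mouse_can3"]

# Group tokens by target word, then sort each group by carrier phrase.
tokens = {}
for label in labels:
    target, carrier = label.split("_")
    tokens.setdefault(target, []).append(carrier)
for target in tokens:
    tokens[target].sort()

print(tokens)  # {'mouse': ['can1', 'can3', 'see1'], 'dog': ['can3', 'see2']}
```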
General note for all Praat scripts: depending on whether you're on Windows or a Mac, the slashes in file paths need to go different ways. The scripts are all currently set up for Macs.
Depending on what you're doing, you might want to ensure that your cuts land on zero-crossings: points in the waveform where the amplitude is 0. If you're splicing together different files or splicing words, this is crucial to avoid pops and clicks at the splice points. Run the Praat script here: https://github.com/BergelsonLab/praat_utilities/blob/master/zero_crossing_boundaries.praat
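The underlying idea can be sketched in a few lines of Python. This is just an illustration of the concept, not the Praat script itself; the function name and the toy waveform are made up for the example:

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index nearest to `index` where the signal
    crosses zero (a sign change between consecutive samples)."""
    def is_crossing(i):
        return i + 1 < len(samples) and (
            samples[i] == 0 or (samples[i] < 0) != (samples[i + 1] < 0)
        )
    # Search outward from the proposed boundary in both directions.
    for offset in range(len(samples)):
        for candidate in (index - offset, index + offset):
            if 0 <= candidate < len(samples) and is_crossing(candidate):
                return candidate
    return index  # no crossing found (e.g., a pure DC signal)

# Toy waveform: positive, then negative; the crossing is between indices 2 and 3.
wave = [0.5, 0.9, 0.4, -0.3, -0.8, -0.2]
print(nearest_zero_crossing(wave, 5))  # 2
```

Cutting at such points means the spliced signals both start and end at (or next to) zero amplitude, so the joined waveform has no discontinuity to produce a click.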
At this point you may be asking, "why don't we normalize volume earlier? Then we'd only need to normalize one file." Normalization averages over the amplitude of the entire file, so normalizing the one long recording would not bring the individual utterances to the same level; normalizing each small file separately does. We therefore do it in batch, after splitting, with https://github.com/BergelsonLab/praat_utilities/blob/master/normalize_intensity.praat
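For intuition, here is a toy Python sketch of normalization (the function name and target level are mine; the real script works with intensity in Praat, but the scaling idea is the same). Applied per small file, every token ends up at the same level, which one pass over the long file would not achieve:

```python
import math

def normalize_rms(samples, target_rms):
    """Scale float samples so their root-mean-square amplitude equals target_rms."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / rms
    return [s * gain for s in samples]

quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.4, -0.4, 0.4, -0.4]
# After per-file normalization, both "files" sit at the same RMS level.
print(normalize_rms(quiet, 0.25))
print(normalize_rms(loud, 0.25))
```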
This script (https://github.com/BergelsonLab/praat_utilities/blob/master/longfile_to_labeled_wavs.praat) saves each interval in the selected Tier of a TextGrid to a separate WAV sound file. The source sound must be a LongSound object, and both the TextGrid and the LongSound must have identical names, and they have to be selected before running the script. Files are named with the corresponding interval labels (plus a running index number when necessary).
This script (https://github.com/BergelsonLab/praat_utilities/blob/master/textgrid%20to%20labeled%20wav%20files) is an edited version of the one above that works with regular Sound files. It can also be found on the BergelsonLab drive (BergelsonLab > ScriptsandApps > PraatScripts > textgrid_to_labeled_wav_files).
The source Sound and the TextGrid must have identical names, and they have to be selected before running the script. Files are named with the corresponding interval labels (plus a running index number when necessary). See here for more detailed instructions.
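Conceptually, the splitting scripts do something like the following. This Python sketch uses the standard-library wave module; the function name, file names, and intervals are invented for illustration, and the real work is done by the Praat scripts above:

```python
import wave

def split_wav(source_path, intervals, out_dir="."):
    """Write each labeled (start_sec, end_sec, label) interval of source_path
    to its own WAV file, named after the label. Unlabeled intervals are skipped."""
    written = []
    with wave.open(source_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for start, end, label in intervals:
            if not label:  # empty label = silence or a bad token: skip it
                continue
            src.setpos(int(start * rate))
            frames = src.readframes(int((end - start) * rate))
            out_path = f"{out_dir}/{label}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written

# Hypothetical usage, with intervals as read off a TextGrid tier:
# split_wav("session1.wav", [(0.50, 1.20, "mouse_can3"), (1.20, 2.00, "")])
```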
Because we have a consistent audio setup and because some of the sounds on the eyetracker can't be manually altered, we need to make sure our audio is set at an appropriate relative volume. We've found that 72dB is the right amplitude to set speech at so that it measures around 65dB when you're sitting in the participant's chair at baby height. Double-check this before you start running participants: download a decibel-meter app on your phone and listen through your experiment to see what the readings are. We used an app called Sound Meter.
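The decibel-meter check at the chair is the real calibration, but it can also help to confirm that your stimulus files sit at comparable digital levels before playing them. This Python sketch (function name mine) computes RMS level in dBFS; note that dBFS is relative to digital full scale, not the SPL your phone meter reads:

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples (range -1..1) in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

# A full-scale square wave sits at 0 dBFS; halving the amplitude drops it by about 6 dB.
print(rms_dbfs([1.0, -1.0, 1.0, -1.0]))  # 0.0
print(rms_dbfs([0.5, -0.5, 0.5, -0.5]))  # about -6.02
```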