Audio Processing

Processing directions from SEEDLingS Wiki--last updated 2016

Exports

You'll need to create a .csv, a .cha, and an .its file.

  • Open LENA Pro (elephant insignia) on Lily

  • Select the Client Manager icon in the upper left-hand corner

  • Double-click on the subject that you will be processing

  • In the Report List window, check whether the visit you are creating exports for spans one day or two (i.e., if the recorder was still on past midnight, the visit spans two days)

[Screenshot: Report List showing only one date per month]
[Screenshot: Report List showing two back-to-back dates for month 7, which add up to 16h]
  • Select the date(s) that correspond to the recording you want to export and click on the Excel icon in the bottom left-hand corner

  • Make the following selections in the Export Data window:

    • Make sure that 5 Minute Detail is selected under the Report Elements heading

    • Under Specify Dates, select one or two days

  • Under Export Now, select CSV

    • Rename the newly exported file as XX_XX_lena5min.csv (ex: 01_08_lena5min.csv)

    • The lena5min file is a data sheet that chunks the 16-hour audio file into 5-minute segments with speaker categories (ex: child/adult vocalizations, TV/radio/media, distant speech, etc.)

  • Select CHA

    • Rename the newly exported file as XX_XX.cha (ex: 01_08.cha)

    • This export will create two files (a .cha file and a .wav file); keep the .wav and delete the .cha, as we will create one in CLAN later

  • Select ITS

    • Rename the newly exported file as XX_XX.its (ex: 01_08.its)
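
To make the one-day-or-two check and the 5-minute chunking concrete, here is a minimal sketch. It is not part of LENA Pro or of any lab script; the function names and the example start times are made up for illustration.

```python
from datetime import datetime, timedelta

SEGMENT = timedelta(minutes=5)  # length of one lena5min row


def spans_two_days(start, duration):
    """True if a recording that starts at `start` ends on a later calendar date."""
    return (start + duration).date() > start.date()


def segment_count(duration):
    """Number of 5-minute segments needed to cover `duration` (the last may be partial)."""
    return -(-duration // SEGMENT)  # ceiling division on timedeltas


# Hypothetical recordings, both 16 hours long
morning_start = datetime(2016, 7, 1, 7, 0)   # ends at 23:00 the same day
late_start = datetime(2016, 7, 1, 10, 0)     # ends at 02:00 the next day
full_day = timedelta(hours=16)

print(spans_two_days(morning_start, full_day))
print(spans_two_days(late_start, full_day))
print(segment_count(full_day))
```

A recorder that is still on past midnight gives a later end date than start date, which is the case where you need to select two dates in the Report List.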

CLAN: .its to .cha

  • Now that we have all of our exports in place, we need to convert the .its file to a .cha file

  • Open CLAN, select Commands, and change the working directory to the folder where your .its file is saved

  • Write the following command in the window:

    • lena2chat XX_XX.its

  • This creates a .lena.cha file (ex: 01_08.lena.cha)

Sound Finder

  • There are long periods of time in these recordings where nothing is going on because the child is asleep

  • We have a silence finder script that puts this information into the CLAN file to save coders time (so they're not listening to naps)

  • The first step involves finding these long silences in a program called Audacity

  • Then you refine this list of silent times with a Python script that looks for brief interruptions to the silences (pops, single cries, etc.) so that you can ignore those too

  • Make sure you have pulled the most recent version of audiowords.py from the GitHub repository audiowords [this script requires Python (2.x) to function properly]

  • Open Audacity, go to the same directory you have been using for the above steps, and drag the .wav file into your new Audacity window

  • In the menu bar, select Analyze then Sound Finder

  • In the Sound Finder window, change your settings to match those below

Change the Minimum duration of silence setting last, as changes to the other settings will affect it

Guidelines for Processing

  1. Here are some general guidelines to follow when scanning through the audio file within Audacity

  2. There will likely be 2 to 3 regions of minimal activity (naps, sleeping, possibly a car ride, etc.)

    a. Pay close attention to these regions as they may contain some verbal production that the script did not pick up based on its parameters

    b. Caveat: Only keep the bits of these regions where there is verbal production (concrete object words); if the speaker is soothing the baby back to sleep, that does not have to be included, as it would not be coded later on

  3. Scan through the stretches of sound, focusing on moments with lower peaks of activity or bits with constant amplitudes (these areas denote a possible car ride or noise/sound machine, for example, that will help you indicate the onsets/offsets of naps, television, radio, etc.)

  4. Take care when zooming into regions (Ctrl+Scroll) where you think there is little to no activity, as sometimes there may be!

Remember: This is a preliminary pass whose main purpose is to save time for the in-depth coding that will be performed in CLAN. Keep this in mind when processing, as it will help you decide which bits to keep.

Exporting

  1. Export the sound segments

  2. In the menu bar, select File then Export Labels

  3. Rename the new file as Label_Track.txt and save it to the same directory
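
To see what the exported label file contains, here is a minimal sketch that reads it and lists the gaps between labels. It is not the logic of audiowords.py. It assumes the standard Audacity label export (one label per line: start time, end time, and label text, separated by tabs, with times in seconds) and that the labels mark stretches of sound, so the silences are the gaps between them. The function names and the `min_gap_s` parameter are made up for illustration.

```python
def parse_labels(text):
    """Parse Audacity label-track text into (start_s, end_s, label) tuples."""
    regions = []
    for line in text.splitlines():
        parts = line.split("\t")
        # Skip blank lines and the backslash-prefixed frequency lines
        # that some Audacity versions add under each label.
        if len(parts) < 2 or line.startswith("\\"):
            continue
        label = parts[2] if len(parts) > 2 else ""
        regions.append((float(parts[0]), float(parts[1]), label))
    return regions


def gaps_between(regions, min_gap_s=0.0):
    """Return (start_s, end_s) gaps between consecutive labelled regions."""
    ordered = sorted(regions)
    gaps = []
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > min_gap_s:
            gaps.append((prev_end, next_start))
    return gaps


# Hypothetical two-label export
sample = "0.000000\t10.500000\t1\n3600.000000\t3700.000000\t2\n"
labels = parse_labels(sample)
print(gaps_between(labels))
```

To try it on a real export, read the file first, for example `parse_labels(open("Label_Track.txt").read())`.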

Audiowords: Running the Script

  1. We now have all the files that we need to complete processing

  2. In order to do so, we will use a script called audiowords.py from the audiowords GitHub repository

Click Load All (cha) and select the main CLAN file (ex: 01_08.lena.cha)

Info: The program will load and generate the other files that are necessary, running through all the steps at once. It assumes that all the necessary files are within the same directory as the original CLAN file that was loaded. It will output the XX_XX_silences.txt regions, XX_XX_silences_added.cha, and XX_XX_subregions.cha exports to this same directory.

The file names it expects:

  • 01_08.lena.cha

  • 01_08_lena5min.csv

  • Label_Track.txt

  • 01_08_silences.txt (output)

  • 01_08_silences_added.cha (output)

  • 01_08_subregions.cha (output)
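
Before loading the CLAN file, it can help to confirm that the input files are where the script expects them. The sketch below derives the file names listed above from the path of the .lena.cha file. It is a hypothetical helper, not part of audiowords.py, and the function names are made up for illustration.

```python
from pathlib import Path

CHA_SUFFIX = ".lena.cha"


def expected_files(cha_path):
    """Return (inputs, outputs) as lists of Paths, following the naming scheme above."""
    cha = Path(cha_path)
    if not cha.name.endswith(CHA_SUFFIX):
        raise ValueError("expected a file named like 01_08.lena.cha")
    prefix = cha.name[: -len(CHA_SUFFIX)]  # e.g. "01_08"
    folder = cha.parent
    inputs = [
        cha,
        folder / f"{prefix}_lena5min.csv",
        folder / "Label_Track.txt",
    ]
    outputs = [
        folder / f"{prefix}_silences.txt",
        folder / f"{prefix}_silences_added.cha",
        folder / f"{prefix}_subregions.cha",
    ]
    return inputs, outputs


def missing_inputs(cha_path):
    """Names of the input files that are not present next to the CLAN file."""
    inputs, _ = expected_files(cha_path)
    return [path.name for path in inputs if not path.exists()]


print(missing_inputs("01_08.lena.cha"))
```

An empty list means all three input files are in the same directory as the CLAN file.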

After the script is finished running, the terminal window will give you the following information:

  • The number of silences, which should match up with the XX_XX_silences.txt file that was just generated

  • Whether the file spans one day or two

  • Any issue within the CLAN file (XX_XX_subregions.cha) where the onset is greater than the offset; this is very important and will be surrounded by asterisks in the terminal window

    • Open the XX_XX_subregions.cha file and use the Esc+L function to search for the Line Number specified in the terminal window

    • To make the correction, open timestamps (Esc+a) and change the bad offset to match the onset of the next line

    • Insert the following as a comment under the line you just changed: %com: OR %xcom: (tab) manually adjusted timestamp

    • Run the Esc+L command to check for any remaining issues in the file that may have flown under the radar
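
As a cross-check on the terminal output, here is a minimal sketch that scans the text of a .cha file for timestamps whose onset is greater than the offset. It is not taken from audiowords.py. It assumes that each timestamp is stored as onset_offset in milliseconds between two \x15 control characters, and the line numbers it reports are counted by newline, which may not match the numbering CLAN uses for Esc+L. The function name is made up for illustration.

```python
import re

# Assumed timestamp format: \x15<onset_ms>_<offset_ms>\x15
BULLET = re.compile(r"\x15(\d+)_(\d+)\x15")


def bad_timestamps(cha_text):
    """Return (line_number, onset_ms, offset_ms) for timestamps with onset > offset."""
    problems = []
    for number, line in enumerate(cha_text.split("\n"), start=1):
        for onset, offset in BULLET.findall(line):
            if int(onset) > int(offset):
                problems.append((number, int(onset), int(offset)))
    return problems


# Hypothetical three-line excerpt; the second line has a bad offset
sample_cha = (
    "*CHN:\tfoo . \x150_1000\x15\n"
    "*FAN:\tbar . \x152000_1500\x15\n"
    "*CHN:\tbaz . \x151500_3000\x15"
)
print(bad_timestamps(sample_cha))
```

An empty result after your manual corrections suggests no onset/offset inversions remain, under the format assumption stated above.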
