Audio Processing

Processing directions from the SEEDLingS Wiki (last updated 2016)

Exports

You'll need to export a .csv, a .cha (which also produces the .wav you will keep), and an .its file.

Open LENA Pro (elephant insignia) on Lily
Select the Client Manager icon in the upper left-hand corner
Double-click on the subject that you will be processing
In the Report List window, check whether the visit you are creating exports for spans one day or two (i.e., if the recorder was still on past midnight, it spans two days)
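If you know when a recording started and how long it ran, the midnight check above can be sketched in Python, the language of the lab's audiowords.py. `spans_two_days` is a hypothetical helper and is not part of LENA Pro. You would read the start time and duration off the Report List yourself.

```python
from datetime import datetime, timedelta


def spans_two_days(start: datetime, duration: timedelta) -> bool:
    """Return True if a recording that starts at `start` and runs for
    `duration` is still going after midnight (hypothetical helper)."""
    if duration <= timedelta(0):
        return False
    # Use the last instant of the recording, so a file ending exactly at
    # 00:00 still counts as a single day.
    last_instant = start + duration - timedelta(microseconds=1)
    return last_instant.date() != start.date()


if __name__ == "__main__":
    # A 16-hour recording started at 9:30 AM ends at 1:30 AM the next day.
    print(spans_two_days(datetime(2015, 3, 2, 9, 30), timedelta(hours=16)))  # True
```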

Select the date(s) that correspond to the recording you want to export and click on the Excel icon in the bottom left-hand corner

Make the following selections in the Export Data window:
- Make sure that 5 Minute Detail is selected under the Report Elements heading
- Under Specify Dates, select one or two days

Under Export Now, select CSV
- Rename the newly exported file as XX_XX_lena5min.csv (ex: 01_08_lena5min.csv)
- The lena5min file is a data sheet that chunks the 16-hour audio file into 5-minute segments with speaker categories (ex: child/adult vocalizations, TV/radio/media, distant speech, etc.)
Select CHA
- Rename the newly exported file as XX_XX.cha (ex: 01_08.cha)
- This export will create two files (a .cha file and a .wav file); keep the .wav and delete the .cha, as we will create one in CLAN later
Select ITS
- Rename the newly exported file as XX_XX.its (ex: 01_08.its)
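The lena5min layout described above amounts to fixed 5-minute bins. The following sketch shows that arithmetic. It assumes the bins start at the recording onset, although real LENA rows may be aligned to clock time instead. `segment_index` and `segment_count` are illustrative names and are not part of LENA Pro.

```python
SEGMENT_MS = 5 * 60 * 1000  # one lena5min row covers 5 minutes (300,000 ms)


def segment_index(offset_ms: int) -> int:
    """0-based index of the 5-minute segment containing offset_ms,
    measured from the start of the recording (assumed alignment)."""
    if offset_ms < 0:
        raise ValueError("offset_ms must be non-negative")
    return offset_ms // SEGMENT_MS


def segment_count(duration_ms: int) -> int:
    """Number of 5-minute segments needed to cover duration_ms (rounded up)."""
    return -(-duration_ms // SEGMENT_MS)


if __name__ == "__main__":
    sixteen_hours_ms = 16 * 60 * 60 * 1000
    print(segment_count(sixteen_hours_ms))  # 192
```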

CLAN: .its to .cha

Now that we have all of our exports in place, we need to convert the .its file to a .cha file
Open CLAN, select Commands, and change the working directory to where your .its file is saved.
Write the following command in the window:
- lena2chat XX_XX.its

This makes a ".lena.cha" file (ex: 01_08.lena.cha)

Sound Finder

There are long periods of time in these recordings where nothing is going on because the child is asleep
We have a silence finder script that puts this information into the CLAN file to save coders time (so they're not listening to naps)
The first step involves finding these long silences in a program called Audacity
Then you edit this list of silent times with a Python script that looks for brief interruptions within the silences (pops, single cries, etc.) so that you can ignore those too
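The second step, which ignores brief interruptions inside long silences, is essentially interval merging. This sketch is not the actual audiowords.py logic. `max_gap_s` and `min_len_s` are hypothetical thresholds chosen only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) in seconds


def merge_silences(silences: List[Interval], max_gap_s: float,
                   min_len_s: float = 0.0) -> List[Interval]:
    """Join silent intervals separated by interruptions of at most max_gap_s
    seconds (a pop, a single cry), then drop results shorter than min_len_s."""
    merged: List[Interval] = []
    for start, end in sorted(silences):
        if merged and start - merged[-1][1] <= max_gap_s:
            # Short interruption: extend the previous silence over it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_len_s]


if __name__ == "__main__":
    found = [(0.0, 1200.0), (1201.5, 3000.0), (5000.0, 5100.0)]
    print(merge_silences(found, max_gap_s=2.0, min_len_s=600.0))  # [(0.0, 3000.0)]
```

A 1.5-second blip between two silences is absorbed. The short silence at 5000 s is dropped because it falls under the hypothetical 10-minute minimum.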

Make sure you have pulled the most recent version of audiowords.py from the audiowords GitHub repository [this script requires Python 2.x to function properly]
Open Audacity, go to the same directory you have been using for the above steps, and drag the .wav file into your new Audacity window
In the menu bar, select Analyze then Sound Finder
In the Sound Finder window, change your settings to match those below
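Audacity's Sound Finder labels stretches whose level rises above a silence threshold, and it treats sounds separated by less than a minimum silence as one region. The sketch below is a simplified stand-in for that idea and is not Audacity's implementation. `threshold_db`, `min_silence_s`, and `block_s` are placeholder values, not the lab's settings.

```python
import math
from typing import List, Sequence, Tuple


def find_sounds(samples: Sequence[float], rate: int, threshold_db: float = -26.0,
                min_silence_s: float = 1.0, block_s: float = 0.1) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) regions whose block RMS level (dB re full scale)
    exceeds threshold_db, joining regions separated by less than min_silence_s."""
    block = max(1, int(rate * block_s))
    regions: List[Tuple[float, float]] = []
    for i in range(0, len(samples), block):
        chunk = samples[i:i + block]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db:
            start, end = i / rate, min(i + block, len(samples)) / rate
            if regions and start - regions[-1][1] < min_silence_s:
                # The silence since the last region is too short: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

Pure Python keeps the example self-contained, but it would be slow on a 16-hour file. The stdlib `wave` module can supply raw frames, which must then be scaled to the range -1.0 to 1.0.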

Guidelines for Processing

Here are some general guidelines to follow when scanning through the audio file within Audacity
There will likely be 2 to 3 regions of minimal activity (naps, sleeping, possibly a car ride, etc.)
a. Pay close attention to these regions, as they may contain some verbal production that the script did not pick up based on its parameters
b. Caveat: Only keep the parts of these regions that contain verbal production (concrete object words); if the speaker is soothing the baby back to sleep, that does not need to be included, as it will not be coded later on
Scan through the stretches of sound, focusing on moments with lower peaks of activity or stretches with constant amplitude (these may indicate a car ride or a noise/sound machine, for example, and will help you mark the onsets/offsets of naps, television, radio, etc.)
Take care when zooming into regions (Ctrl+Scroll) where you think there is little to no activity, as sometimes there is some!
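You spot constant-amplitude stretches, such as a car ride or a sound machine, by eye in Audacity. The same idea can be sketched as a heuristic over per-block loudness values, for example RMS levels in dB computed over short blocks. `steady_stretches` is a hypothetical helper that is not part of audiowords.py, and `tolerance_db` and `min_s` are illustrative guesses.

```python
from typing import List, Optional, Sequence, Tuple


def steady_stretches(levels_db: Sequence[float], block_s: float,
                     tolerance_db: float = 3.0,
                     min_s: float = 600.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) stretches in which the per-block level stays
    within a tolerance_db band for at least min_s seconds."""
    stretches: List[Tuple[float, float]] = []
    start = 0
    lo: Optional[float] = None
    hi: Optional[float] = None
    for i, level in enumerate(levels_db):
        new_lo = level if lo is None else min(lo, level)
        new_hi = level if hi is None else max(hi, level)
        if new_hi - new_lo > tolerance_db:
            # Level left the band: close the current run before block i.
            if (i - start) * block_s >= min_s:
                stretches.append((start * block_s, i * block_s))
            start, lo, hi = i, level, level
        else:
            lo, hi = new_lo, new_hi
    if levels_db and (len(levels_db) - start) * block_s >= min_s:
        stretches.append((start * block_s, len(levels_db) * block_s))
    return stretches
```

Any stretch this flags is only a place to look more closely. Whether it is a nap, a car ride, or media is still your call.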

Remember: This is a preliminary pass whose main purpose is to save time for the in-depth coding that will be performed in CLAN. Keep this in mind while processing; it will help you decide which bits to keep

Exporting

Export the sound segments
In the menu bar, select File then Export Labels
Rename the new file as Label_Track.txt and save it to the same directory
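Audacity writes label files as plain text with one label per line. Each line holds a start time, an end time (both in seconds), and the label text, separated by tabs. Here is a minimal parser that assumes that standard layout. `parse_labels` and `to_ms` are illustrative names.

```python
from typing import Iterable, List, Tuple

Label = Tuple[float, float, str]  # (start_s, end_s, text)


def parse_labels(lines: Iterable[str]) -> List[Label]:
    """Parse Audacity label-track lines of the form start<TAB>end<TAB>text."""
    labels: List[Label] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank lines and, if present, the extra lines Audacity writes
        # for spectral selections (these start with a backslash).
        if not line.strip() or line.startswith("\\"):
            continue
        fields = line.split("\t")
        start, end = float(fields[0]), float(fields[1])
        text = fields[2] if len(fields) > 2 else ""
        labels.append((start, end, text))
    return labels


def to_ms(seconds: float) -> int:
    """Convert seconds to whole milliseconds, the unit CHAT time bullets use."""
    return int(round(seconds * 1000))
```

Usage: `with open("Label_Track.txt", encoding="utf-8") as f: labels = parse_labels(f)`.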

Audiowords: Running the Script

We now have all the files that we need to complete processing
To do so, we will use the audiowords.py script from the audiowords repository.

Run audiowords.py, click Load All (cha), and select the main CLAN file (ex: 01_08.lena.cha)

Info: The program loads and generates the other necessary files, running through all the steps at once. It assumes that all the necessary files are in the same directory as the CLAN file you loaded, and it writes its outputs (XX_XX_silences.txt, XX_XX_silences_added.cha, and XX_XX_subregions.cha) to that same directory.

The file names it expects (using 01_08 as an example):

- 01_08.lena.cha
- 01_08_lena5min.csv
- Label_Track.txt
- 01_08_silences.txt (output)
- 01_08_silences_added.cha (output)
- 01_08_subregions.cha (output)
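Before running the script, it can help to confirm that the inputs listed above are all in one folder. This is a minimal sketch of that check. `expected_files` and `missing_inputs` are hypothetical helpers that are not part of audiowords.py, and they assume the main CLAN file name ends in `.lena.cha`.

```python
from typing import Dict, Iterable, List

SUFFIX = ".lena.cha"


def expected_files(cha_name: str) -> Dict[str, List[str]]:
    """List the inputs and outputs for a main CLAN file such as
    01_08.lena.cha (hypothetical helper)."""
    if not cha_name.endswith(SUFFIX):
        raise ValueError(f"expected a name ending in {SUFFIX!r}")
    prefix = cha_name[: -len(SUFFIX)]
    return {
        "inputs": [cha_name, f"{prefix}_lena5min.csv", "Label_Track.txt"],
        "outputs": [f"{prefix}_silences.txt",
                    f"{prefix}_silences_added.cha",
                    f"{prefix}_subregions.cha"],
    }


def missing_inputs(cha_name: str, present: Iterable[str]) -> List[str]:
    """Return the expected input files that are not among `present`."""
    present_set = set(present)
    return [name for name in expected_files(cha_name)["inputs"]
            if name not in present_set]
```

Usage: `missing_inputs("01_08.lena.cha", os.listdir("."))` (after `import os`) returns an empty list when everything is in place.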

After the script is finished running, the terminal window will give you the following information:

The number of silences, which should match the number of regions in the XX_XX_silences.txt file that was just generated
Whether the file spans one day or two
Any place in the CLAN file (XX_XX_subregions.cha) where an onset is greater than its offset; this is extremely important and will be surrounded by asterisks in the terminal window
- Open the XX_XX_subregions.cha file and use the Esc+L function to search for the line number specified in the terminal window
- To make the correction, open timestamps (Esc+a) and change the bad offset to match the onset of the next line
- Insert a comment under the line you just changed: %com: (or %xcom:) followed by a tab and "manually adjusted timestamp"
- Run the Esc+L command to check for any remaining issues in the file that may have slipped under the radar
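In a CHAT file, each time bullet is stored as onset_offset in milliseconds between two NAK characters (`\x15`), which CLAN displays as bullets. The sketch below lists bullets whose onset exceeds the offset. For each one it suggests the next bullet's onset as the new offset, mirroring the manual fix above. `suggest_fixes` is a hypothetical helper that only reports problems and does not edit the file. Its line numbers count raw file lines, so they may not match the numbers reported by CLAN or audiowords.py.

```python
import re
from typing import List, Optional, Tuple

# CHAT time bullets: NAK (0x15) + onset_offset in ms + NAK.
BULLET_RE = re.compile(r"\x15(\d+)_(\d+)\x15")


def suggest_fixes(lines: List[str]) -> List[Tuple[int, int, int, Optional[int]]]:
    """Return (line_no, onset, offset, suggested_offset) for each bullet whose
    onset is greater than its offset. suggested_offset is the onset of the
    next bullet in the file, or None if there is no later bullet."""
    bullets = [(n, int(m.group(1)), int(m.group(2)))
               for n, line in enumerate(lines, start=1)
               for m in BULLET_RE.finditer(line)]
    fixes = []
    for i, (n, onset, offset) in enumerate(bullets):
        if onset > offset:
            nxt = bullets[i + 1][1] if i + 1 < len(bullets) else None
            fixes.append((n, onset, offset, nxt))
    return fixes
```

Usage: `with open("01_08_subregions.cha", encoding="utf-8") as f: print(suggest_fixes(f.read().splitlines()))`.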
