Subregion Splicing

Subregsplice Timestamps

You will be creating a list of timestamps corresponding to the sections of an audio file that have been listened to by a human coder. This will be used for our script subregsplice.

Notes

This does not need to be done for any 06 or 07 month files, as they were all listened to in their entirety.
Theoretically, what has been coded is as follows:
- 08 through 13 month files: the top 4 subregions (plus make-up time)
  - For these, the lowest-ranked subregion (ranked 5 of 5) should not have been coded, except for make-up time.
- 14 through 17 month files: the top 3 subregions (plus make-up time)
  - For these, subregions ranked 4th and 5th should not have been coded, except for make-up time.
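
The month-based rule above can be written as a small lookup. This is only a sketch: the function name `expected_coding` and its return format are hypothetical and not part of any lab script. The make-up ranks follow the make-up region note further down this page.

```python
def expected_coding(month):
    """Return (ranks expected to be fully coded, rank holding make-up time).

    Hypothetical helper: months are integers, subregions are ranked 1 (top) to 5.
    """
    if month in (6, 7):
        # Listened to in full, so no subregsplice timestamps are needed.
        return None, None
    if 8 <= month <= 13:
        return [1, 2, 3, 4], 5
    if 14 <= month <= 17:
        return [1, 2, 3], 4
    raise ValueError(f"no coding rule stated for month {month}")
```
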
Silences: Do NOT count these as listened-to sections. If there is a silence in a subregion, only count the section of the subregion outside of the silence. (DO count skips; they have been listened to.)
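
As an illustration of the silence rule, the sketch below removes silent intervals from a subregion and returns the listened-to pieces. It assumes all times are in milliseconds and that intervals are (start, end) pairs; the function name `listened_sections` is hypothetical.

```python
def listened_sections(subregion, silences):
    """Split a (start, end) subregion around any overlapping silences."""
    start, end = subregion
    pieces = []
    cursor = start  # earliest point not yet assigned to a piece or a silence
    for s_start, s_end in sorted(silences):
        if s_end <= cursor or s_start >= end:
            continue  # silence lies outside the remaining part of the subregion
        if s_start > cursor:
            pieces.append((cursor, s_start))
        cursor = max(cursor, s_end)
    if cursor < end:
        pieces.append((cursor, end))
    return pieces
```

Each returned pair would presumably get its own row in the timestamps file, which is consistent with sr_num being allowed to repeat.
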

What to do with non-subregions

Some non-subregion sections listed in the timestamps.csv should start and stop at a conversation block. Conversation blocks and pause blocks are marked with "@Bg" (beginning) and "@Eg" (end) tiers. You will have to listen through sections that haven't been coded yet.
1. When you find a make-up region or extra time, first find the beginning comment.
2. Then, go up in the .cha file until you find the next "@Bg" tier, e.g. "@Bg: Pause 364" or "@Bg: Conversation 23". If and only if it is a "Conversation" tier AND it is within 10 lines of the make-up region, use the timestamp of the @Bg: Conversation tier in the subregsplice.csv.
  a. Again, do NOT change the timestamp if the make-up region begins within a pause block.
3. Now you have to listen through this section.
  a. Mark any PI
  b. Listen for any concrete, imageable nouns. These are described here. If you hear any (or aren't sure but think there might be), put a comment "SD check" in the .cha AND assign Shannon a general task on Asana: "XX_XX audio check for codable words."
4. Do the same thing at the end of every make-up region or extra time. This time, go down in the .cha file until you find the next "@Eg" tier, e.g. "@Eg: Pause 364" or "@Eg: Conversation 23". If and only if it is a "Conversation" tier AND it is within 10 lines of the make-up region, use the timestamp of the @Eg: Conversation tier in the subregsplice.csv.
  a. Follow the same guidelines above for listening through the file.
Make-up regions: These regions make up for time in top-ranked regions that were skips. They should be at the beginning of the next-ranked region (listed below), unless it's a silence or skip. There should be comments for these regions.
- Months 08-13: 5th ranked region
- Months 14-17: 4th ranked region
Extra time: Time coded outside subregions and make-up regions occurs in very few files, only in instances when the subregions aren't codable. Such sections should be clearly commented in both the .cha and the Audio_Coding_Issues.

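The @Bg / @Eg lookup described in the steps above can be sketched as follows. This is a hedged illustration, not an existing lab tool: `nearest_block_tier` is a hypothetical name, the .cha file is assumed to be already read into a list of lines, and "within 10 lines" is interpreted as a line-count distance from the make-up comment.

```python
def nearest_block_tier(lines, start, marker, step, max_dist=10):
    """Find the first `marker` tier ("@Bg" or "@Eg") walking away from `start`.

    step is -1 to go up the file and +1 to go down. The index of the tier is
    returned only if it is a Conversation tier within `max_dist` lines;
    otherwise None is returned and the timestamp should be left unchanged.
    """
    i = start + step
    while 0 <= i < len(lines):
        if lines[i].startswith(marker):
            is_conversation = "Conversation" in lines[i]
            is_close = abs(i - start) <= max_dist
            return i if (is_conversation and is_close) else None
        i += step
    return None
```

For the beginning of a region one would call it with `"@Bg"` and `step=-1`; for the end, with `"@Eg"` and `step=+1`.
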
Instructions

1. Open the coded audio file (e.g. XX_XX_coderXX_final.cha).
2. Open the file’s Audio Coding Issues. This will list what subregions were coded, what make-up time was needed, and any issues with the file. Read through the audio notes for any relevant information.
  a. Earlier files may not all have Audio Coding Issues. In this case, you'll have to rely on the clancomments output.
3. Run the clancomments script. This will output all of the comments in a .cha file, including the subregion comments with their timestamps.
  a. NOTE: The timestamp listed for each comment in the clancomments.csv is always that of the line before the comment. This means you CANNOT use the clancomments.csv timestamps for the onsets!
4. Fill out the template (Template_subregsplice.csv in seedlings/Scripts_and_Apps) with the appropriate information.

For example:

sr_or_ex, sr_num, chron_num, onset, offset, comments

1234_2345

3456_4567

make-up region

5678_6789

7890_8901

10345_11456

extra region

14123_15234

18234_19234

22345_24234

26765_29876

a. sr_or_ex = subregion or extra region (values: "sr" or "ex")

b. sr_num = subregion number (can have repeated entries). Value should be NA for extra non-subregions.

c. chron_num = the chronological order of each entry, subregion or not. Each entry must be unique.

d. onset = the timestamp of the first line within the region (e.g. 12345_27890)

e. offset = the timestamp of the last line within the region (e.g. 82345_97890)

f. comments = either "make-up region" for skip time made up in a subregion OR "extra region" for other time coded (this is not common)
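
The column rules above can be checked mechanically before running the script. The sketch below is an assumption-laden illustration: `check_rows` is a hypothetical helper, the example rows are invented rather than taken from a real file, and the timestamp pattern simply mirrors the `12345_27890` form shown above.

```python
import csv
import io
import re

TIMESTAMP = re.compile(r"^\d+_\d+$")

def check_rows(rows):
    """Return a list of problems found in subregsplice rows (empty if none)."""
    problems = []
    seen = set()
    for n, row in enumerate(rows, start=1):
        if row["sr_or_ex"] not in ("sr", "ex"):
            problems.append(f"row {n}: sr_or_ex must be 'sr' or 'ex'")
        if row["sr_or_ex"] == "ex" and row["sr_num"] != "NA":
            problems.append(f"row {n}: sr_num must be NA for extra regions")
        if row["chron_num"] in seen:
            problems.append(f"row {n}: chron_num {row['chron_num']} is repeated")
        seen.add(row["chron_num"])
        for col in ("onset", "offset"):
            if not TIMESTAMP.match(row[col]):
                problems.append(f"row {n}: {col} is not of the form 12345_27890")
    return problems

# Hypothetical example content, not taken from a real file.
example = """sr_or_ex,sr_num,chron_num,onset,offset,comments
sr,2,1,1234_2345,3456_4567,
sr,5,2,5678_6789,7890_8901,make-up region
ex,NA,3,10345_11456,14123_15234,extra region
"""
rows = list(csv.DictReader(io.StringIO(example)))
problems = check_rows(rows)
```
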

Save the file as XX_XX_subregsplice.csv (it must be a .csv) in the Audio_Annotation directory (alongside the .cha), e.g. Subject_Files/01_01-01-2000/01_06/Home_Visit/Coding/Audio_Annotation.

Running the script

The subregsplice script can be found on GitHub. This script slices an audio track into component subregions, then combines these subregions into a new concatenated audio file. A corresponding .cha file will be produced with new timestamps to reflect the new positions of the subregions within the concatenated audio file. Each subregion will have an associated comment describing the displacement in time required to produce the original timestamps.
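
To make the timestamp shift concrete, here is a minimal sketch of the bookkeeping involved in concatenating regions. It is not the script's actual code: `concat_layout` is a hypothetical name, and it assumes each region is a (start, end) pair in milliseconds, given in chronological order.

```python
def concat_layout(regions):
    """For each (start, end) region, return (new_start, new_end, displacement).

    displacement is the amount to add to a timestamp in the concatenated
    file to recover the corresponding timestamp in the original file.
    """
    layout = []
    position = 0  # running length of the concatenated audio so far
    for start, end in regions:
        length = end - start
        layout.append((position, position + length, start - position))
        position += length
    return layout
```

For instance, two regions at (1000, 3000) and (10000, 12000) would occupy 0-2000 and 2000-4000 in the new file, with displacements of 1000 and 8000 respectively.
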

The script takes 4 arguments:
- Original .cha: For now, use the final annotated .cha. This will be called either newclan_merged.cha or final.cha.
- Timestamps .csv: XX_XX_subregsplice.csv (created in the previous steps, described above).
- Audio file: XX_XX_scrubbed.wav or, if there is no scrubbed version, XX_XX.wav. Be sure to use the scrubbed version if one exists.
  - If there is a scrubbed.aif but no scrubbed.wav, follow the instructions here for converting it.
- Output directory: the folder where the new .cha and the new .wav will be written (Home_Visit/Processing/Audio_Files).

$: python subrsplice.py [original_cha_file.cha] [subregsplice timestamps.csv] [audio file.wav] [output directory]

AGAIN, BE SURE TO USE THE SCRUBBED AUDIO FILE IF IT EXISTS.

Although it looks a little overwhelming, it is probably easiest just to use the full path names to each of these files (instead of copying them into one directory). For example:

$: python subrsplice.py /Volumes/seedlings/Subject_Files/01_01-01-2016/01_06/Home_Visit/Coding/Audio_Annotation/01_06_coderSD_final.cha /Volumes/seedlings/Subject_Files/01_01-01-2016/01_06/Home_Visit/Coding/Audio_Annotation/01_06_subregsplice.csv /Volumes/seedlings/Subject_Files/01_01-01-2016/01_06/Home_Visit/Processing/Audio_Files/01_06_scrubbed.wav /Volumes/seedlings/Subject_Files/01_01-01-2016/01_06/Home_Visit/Processing/Audio_Files/
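
Because the paths are long and the scrubbed audio must be preferred, it may help to assemble the command programmatically. The following is a sketch under assumptions: `pick_audio` and `build_command` are hypothetical helpers, and the script name and argument order are copied from the command shown above.

```python
from pathlib import Path

def pick_audio(audio_dir, prefix):
    """Prefer XX_XX_scrubbed.wav; fall back to XX_XX.wav if it does not exist."""
    audio_dir = Path(audio_dir)
    scrubbed = audio_dir / f"{prefix}_scrubbed.wav"
    return scrubbed if scrubbed.exists() else audio_dir / f"{prefix}.wav"

def build_command(cha, timestamps_csv, audio, out_dir):
    """Assemble the four positional arguments in the order given above."""
    return ["python", "subrsplice.py",
            str(cha), str(timestamps_csv), str(audio), str(out_dir)]
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the directory containing the script.
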

After the script has run, check that it worked properly. There should be two new files: XX_XX_subregion_concat.wav AND XX_XX_subregion_concat.cha.
Rename the concat.cha file to include "_precheck", so it looks like XX_XX_subregion_concat_precheck.cha
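
The check and rename can also be scripted. This is a minimal sketch, assuming the output file names given above; `finish_outputs` is a hypothetical helper, not part of the subregsplice repository.

```python
from pathlib import Path

def finish_outputs(out_dir, prefix):
    """Verify both concat files exist, then add "_precheck" to the .cha name."""
    out_dir = Path(out_dir)
    wav = out_dir / f"{prefix}_subregion_concat.wav"
    cha = out_dir / f"{prefix}_subregion_concat.cha"
    missing = [p.name for p in (wav, cha) if not p.exists()]
    if missing:
        raise FileNotFoundError(f"expected output not found: {', '.join(missing)}")
    # Path.rename returns the new path (Python 3.8+).
    return cha.rename(out_dir / f"{prefix}_subregion_concat_precheck.cha")
```
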

