Generate an .eaf using the segment sampling script

Generating random and high-volubility samples

There are three main parts to this:

  • Running VTC on the audio file on DCC. Instructions here.

We need to run VTC because the high-volubility intervals are selected by choosing 2-minute-long intervals that maximize the total duration of segments that VTC marks as speech.
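The selection idea described above can be sketched as follows. This is only an illustration, not blabpy's actual implementation: the function names, the 60-second step between candidate windows, the greedy non-overlapping choice, and the example segments are all assumptions made for the sketch.

```python
def speech_in_window(segments, start, end):
    """Total seconds of speech segments that fall inside [start, end)."""
    return sum(
        max(0.0, min(seg_end, end) - max(seg_start, start))
        for seg_start, seg_end in segments
    )


def top_windows(segments, recording_length, n_windows, window=120.0, step=60.0):
    """Greedily pick n_windows non-overlapping windows with the most speech.

    segments: list of (onset, offset) pairs in seconds, e.g. taken from VTC output.
    window=120.0 matches the 2-minute intervals; step is an assumed grid spacing.
    """
    candidates = []
    start = 0.0
    while start + window <= recording_length:
        speech = speech_in_window(segments, start, start + window)
        candidates.append((speech, start))
        start += step

    # Most speech first; ties go to the earlier window.
    candidates.sort(key=lambda c: (-c[0], c[1]))

    chosen = []
    for _speech, start in candidates:
        # Keep a window only if it does not overlap any already chosen one.
        if all(start + window <= s or start >= s + window for s in chosen):
            chosen.append(start)
        if len(chosen) == n_windows:
            break
    return sorted(chosen)


# Made-up speech segments (seconds) in an 8-minute recording.
example_segments = [(10, 50), (130, 200), (300, 310)]
print(top_windows(example_segments, recording_length=480, n_windows=2))
```

The window starting at 120 s contains the most speech (70 s), so it is chosen first; the next-best candidate overlaps it and is skipped in favor of the window starting at 0 s.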

  • Running blabpy functions that create an eaf with random intervals and then add high-volubility intervals:

    • Calculate the age in months.

    • Calculate the audio duration in minutes. Use soxi -D xxx.wav to get the duration in seconds, then divide by 60 and round down. If you get "command not found", install sox with Homebrew (brew install sox) or conda (conda install -c conda-forge sox).

    • In Python/IPython (substitute the ID, age, and length of your recording):

      • Set the recording ID, age, and duration:

      full_recording_id = 'XX_NNN_MMM'
      age_in_months = KK
      length_in_minutes = LLL

      • Run the following code:

      from blabpy.vihi.intervals.intervals import create_files_with_random_regions
      from blabpy.vihi.pipeline import add_intervals_for_annotation
      
      # Random intervals
      create_files_with_random_regions(
          full_recording_id=full_recording_id,
          age=age_in_months,
          length_of_recording=length_in_minutes)
          
      # High-volubility ones
      add_intervals_for_annotation(full_recording_id=full_recording_id)
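The two values computed by hand above (age in months and duration in minutes) could also be derived with a small helper like the one below. This is a sketch under assumptions: the page does not say how age should be rounded, so completed calendar months are assumed here, and the function names and example dates are hypothetical. The seconds value is whatever soxi -D printed for your file.

```python
import math
from datetime import date


def age_in_whole_months(birth_date, recording_date):
    """Completed calendar months between two dates (assumed rounding rule)."""
    months = (recording_date.year - birth_date.year) * 12
    months += recording_date.month - birth_date.month
    # The current month only counts once the birth day-of-month is reached.
    if recording_date.day < birth_date.day:
        months -= 1
    return months


def whole_minutes(duration_in_seconds):
    """Divide the duration in seconds by 60 and round down."""
    return math.floor(float(duration_in_seconds) / 60)


# Placeholder values; substitute the ones for your recording.
age_in_months = age_in_whole_months(date(2020, 1, 15), date(2021, 3, 14))
length_in_minutes = whole_minutes("57623.45")  # string as printed by soxi -D
print(age_in_months, length_in_minutes)
```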

  • Check and log:

    • Check *.eaf and selected_regions.csv and confirm that there are 15 random, 15 high-volubility, and 5 high-volubility-extra regions.

    • If everything looks good, open processed_participants.csv and fill in the columns for your files, including the onset and offset times from selected_regions.csv.
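The count check on selected_regions.csv can be scripted rather than done by eye. The sketch below is hedged: the column name sampling_type and the label strings are assumptions for illustration, so open your own selected_regions.csv and adjust them to match its actual header and values.

```python
import csv
import io
from collections import Counter

# Expected number of regions per type, as listed in the check above.
# The label strings are assumed; adjust them to what the file contains.
EXPECTED = {"random": 15, "high-volubility": 15, "high-volubility-extra": 5}


def count_regions(csv_file, type_column="sampling_type"):
    """Count rows per region type. type_column is a hypothetical column name."""
    reader = csv.DictReader(csv_file)
    return Counter(row[type_column] for row in reader)


def find_mismatches(counts, expected=EXPECTED):
    """Return {label: (found, expected)} for every label with a wrong count."""
    return {
        label: (counts.get(label, 0), n)
        for label, n in expected.items()
        if counts.get(label, 0) != n
    }


# Demo on made-up in-memory data with the assumed layout.
demo = io.StringIO(
    "sampling_type,onset,offset\n"
    + "random,0,120000\n" * 15
    + "high-volubility,0,120000\n" * 15
    + "high-volubility-extra,0,120000\n" * 5
)
print(find_mismatches(count_regions(demo)))
```

For a real file, replace the demo with something like `with open("selected_regions.csv", newline="") as f: print(find_mismatches(count_regions(f)))`. An empty dictionary means all three counts match.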

Page Status: needs updating

Status details: Duke-related keywords found on the page.
