VTC (Voice Type Classifier)

How to run the voice type classifier on a set of wav files from VIHI

Wav files must already be renamed to "AB_XXX_YYY.wav" before you start.
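If you want to double-check the names before uploading, a minimal bash sketch like this one can flag files that don't match. It assumes each placeholder letter stands for exactly one letter or digit (2, then 3, then 3 characters), which may be looser or stricter than the real VIHI ID scheme; `check_wav_names` is a hypothetical helper, not part of blabpy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files in a folder whose names don't match AB_XXX_YYY.wav.
# ASSUMPTION: A, B, X and Y each stand for one letter or digit (2, 3 and 3 characters).
check_wav_names() {
  local dir=$1 bad=0 f name
  local re='^[[:alnum:]]{2}_[[:alnum:]]{3}_[[:alnum:]]{3}\.wav$'
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip subfolders and the unexpanded glob of an empty folder
    name=${f##*/}             # strip the folder part of the path
    if ! [[ $name =~ $re ]]; then
      echo "Unexpected name: $name"
      bad=1
    fi
  done
  return "$bad"               # 0 if every file matched, 1 otherwise
}
```

For example, `check_wav_names /path/to/wavs && echo "All names look fine"`.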

Prerequisites:

  • You have set up the connection to the Duke Computing Cluster (DCC). This requires Elika's involvement, so budget your time accordingly.

  • On your computer

    • You have a working Python installation with blabpy installed and up to date (version 0.15.0 or later). Check with `pip show blabpy`.

  • On cluster (connect with ssh <netid>@dcc-login.oit.duke.edu)

    • Check that /hpc/group/bergelsonlab/VTC/VTC_repo exists and isn't empty. If the folder doesn't exist or is empty, clone the VTC repo into that location:

  ```shell
  cd /hpc/group/bergelsonlab/VTC
  git clone --recurse-submodules https://github.com/MarvinLvn/voice_type_classifier.git VTC_repo
  ```
  
  • Check that you have a conda environment called "pyannote":

  ```shell
  conda activate pyannote
  ```

  If it doesn't exist, create it and check again:

  ```shell
  cd /hpc/group/bergelsonlab/VTC/VTC_repo
  srun --mem=16G --pty bash -i
  # wait until a prompt appears
  conda env create -f vtc.yml
  exit
  conda activate pyannote
  ```

  • Make sure you have sox installed (check with `which sox`). If you don't, run:

  ```shell
  conda install -c conda-forge sox
  ```
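The two prerequisite checks that are easy to script (the blabpy version on your computer, the VTC_repo folder on the cluster) might look like this in bash. It's a sketch with hypothetical helper names; `version_at_least` assumes plain numeric versions like 0.15.0, and `dir_has_files` only checks that the folder isn't empty, not that it holds a working VTC clone.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the prerequisite checks (not part of blabpy or VTC).

# Succeed if version $1 >= version $2.
# ASSUMPTION: versions are plain numbers separated by dots (e.g. 0.15.0).
version_at_least() {
  local -a have want
  local i h w
  IFS=. read -ra have <<< "$1"
  IFS=. read -ra want <<< "$2"
  for i in 0 1 2; do
    h=${have[i]:-0}; w=${want[i]:-0}   # missing parts count as 0
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0                             # all parts equal
}

# Print the "Version:" field of `pip show` output read from stdin.
pip_show_version() {
  awk -F': ' '$1 == "Version" { print $2 }'
}

# Succeed if $1 is a directory with at least one entry (dotfiles included).
dir_has_files() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}
```

On your computer you could run `version_at_least "$(pip show blabpy | pip_show_version)" 0.15.0 && echo "blabpy is recent enough"`; on the cluster, `dir_has_files /hpc/group/bergelsonlab/VTC/VTC_repo || echo "clone VTC_repo first"`.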
  

General steps

  1. Connect to the cluster in the terminal, copy the wav files there, run VTC, delete the wavs, and disconnect.

  2. Copy the all.rttm file (VTC output) back to your computer and run a function that distributes its contents into the individual subjects' folders.

Detailed version

  • Open two terminal windows. Connect to the cluster in one of them:

```shell
ssh <netid>@dcc-login.oit.duke.edu
```
  • On the terminal window that is connected to the cluster:

    • Create a new folder under /hpc/group/bergelsonlab/VTC/wavs:

      ```shell
      vtc_dir=/hpc/group/bergelsonlab/VTC
      wav_dir=$vtc_dir/wavs/<my-new-folder>
      mkdir -p $wav_dir
      ```

      We'll use wav_dir later on, so make sure to define this variable. It's best to create a new, uniquely named folder each time you run this process.

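  • On your computer (in the terminal window that is still local, or in FileZilla): copy the wav files (with VIHI-formatted names!) into the new folder on the cluster using scp or FileZilla.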

  • On the terminal window that is connected to the cluster:

    1. Activate the conda environment "pyannote": `conda activate pyannote`

    2. Check that all the files are wav files sampled at 16 kHz:

  ```shell
  soxi -t $(find "$wav_dir" -type f)  # file types
  soxi -r $(find "$wav_dir" -type f)  # sampling rate
  ```
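To avoid eyeballing a long list of values, the two soxi outputs can be checked mechanically. This sketch assumes, as the commands above suggest, that `soxi -t` and `soxi -r` print one value per file on its own line; `all_equal` is a hypothetical helper that also compares the number of values with the number of files, so a file soxi couldn't read doesn't slip through silently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if stdin has exactly $2 lines, all equal to $1.
all_equal() {
  local expected=$1 n_expected=$2
  awk -v want="$expected" -v n="$n_expected" '
    $0 != want { printf "unexpected value: %s\n", $0; bad = 1 }
    END {
      if (NR != n) { printf "got %d values, expected %d\n", NR, n; bad = 1 }
      exit bad   # an unset bad counts as 0, i.e. success
    }'
}
```

On the cluster you could then run `n=$(find "$wav_dir" -type f | wc -l)`, followed by `soxi -t $(find "$wav_dir" -type f) | all_equal wav "$n"` and `soxi -r $(find "$wav_dir" -type f) | all_equal 16000 "$n"`.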
  
  3. Change into the VTC_repo folder:

  ```shell
  cd $vtc_dir/VTC_repo
  ```

  4. Switch to a GPU-enabled node:

  ```shell
  srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
  ```
  
  Check that it worked:

  1. Wait for the following to appear (the job number will be different):
     `srun: job 19332600 has been allocated resources`
  2. Check that your prompt now ends with `<net-id>@dcc-core-gpu-<x>` (where `<x>` is some number)
  3. Check that your prompt still starts with `(pyannote)`. If it doesn't, activate `pyannote` again with `conda activate pyannote`.
  4. Re-set your variables if needed. Shell variables defined before `srun` without `export` are not carried over into the new shell, so if you get an "access denied" or "no such directory" error when you try to run VTC, rerun these two commands before setting the error and output log variables in the next step:

      ```shell
      vtc_dir=/hpc/group/bergelsonlab/VTC
      wav_dir=$vtc_dir/wavs/<my-new-folder>
      ```
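The checks above can also be run in one go. A hypothetical sketch: it assumes GPU nodes are named `dcc-core-gpu-<x>`, as in the expected prompt, and uses conda's `CONDA_DEFAULT_ENV` variable to see which environment is active.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check to run right after `srun` allocates a GPU node.
# ASSUMPTION: GPU node names start with "dcc-core-gpu-" (as in the prompt).
# Pass a hostname to test; by default it uses this machine's name.
check_gpu_session() {
  local host=${1:-$(uname -n)} ok=0
  case "$host" in
    dcc-core-gpu-*) ;;
    *) echo "Not on a GPU node (host: $host)"; ok=1 ;;
  esac
  # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
  if [ "${CONDA_DEFAULT_ENV:-}" != "pyannote" ]; then
    echo "pyannote is not active; run: conda activate pyannote"
    ok=1
  fi
  # wav_dir must be set in this shell and point to an existing folder.
  if [ -z "${wav_dir:-}" ] || [ ! -d "$wav_dir" ]; then
    echo "wav_dir is unset or missing; re-set vtc_dir and wav_dir"
    ok=1
  fi
  return "$ok"
}
```

Running `check_gpu_session` with no arguments prints nothing and succeeds when all three checks pass.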

5. Start VTC and wait (about 15 minutes per file, though this varies a lot):

  ```shell
  error_log=$wav_dir/error.log
  output_log=$wav_dir/output.log
  ./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
  ```
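If you'd rather not keep checking back, you can record the job's PID right after starting it and let a small function wait for it. This is a hypothetical sketch: it assumes you call it from the same shell that launched apply.sh (bash's `wait` only works for that shell's own child processes) and that the "Took ... sec" line shown in the sample output.log in the next step is written only when VTC finishes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the background apply.sh run, then look for
# the "Took ... sec" line that output.log ends with when VTC finishes.
# ASSUMPTION: that line is only written after a successful run.
wait_for_vtc() {
  local output_log=$1 pid=$2
  # Blocks until the job exits; its exit status is ignored, we rely on the log.
  wait "$pid" 2>/dev/null || true
  if grep -q '^Took [0-9]* sec' "$output_log"; then
    grep '^Took [0-9]* sec' "$output_log"
  else
    echo "No 'Took ... sec' line in $output_log; check the error log" >&2
    return 1
  fi
}
```

Usage: type `vtc_pid=$!` immediately after the `./apply.sh ... &` line, then `wait_for_vtc "$output_log" "$vtc_pid"`.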
  
  6. Check that everything went well. Either open the error.log and output.log files in FileZilla (right-click and choose "View or edit"; don't double-click, even though that may be your instinct) or view them in the terminal with less (run `less $error_log` to open a log and press [Q] to exit the viewer). Here is what your error.log should look like:

    ```
    Test set: <N>it [07:06, 426.16s/it]
    Test set: <N>it [01:14, 74.71s/it]
    Test set: <N>it [01:14, 74.95s/it]
    Test set: <N>it [01:15, 75.11s/it]
    Test set: <N>it [01:15, 75.93s/it]
    Test set: <N>it [01:15, 75.00s/it]
    ```

    where <N> is the number of files you processed. And here is what output.log should look like:

    ```
    Creating config for pyannote.
    Done creating config for pyannote.
    Took 3430 sec on <wav_dir>.
    ```

    ⚠️ Continue only if all is good! ⚠️

  7. Once the job has finished and there were no errors:

    1. Copy the output to $wav_dir.

     ```shell
     wav_dir_name=$(basename $wav_dir)
     vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
     cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
     ```
     
    2. Delete the wav files from DCC:

     ```shell
     rm $wav_dir/*.wav
     ```
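Because `rm` is irreversible, one option is to guard the deletion so it only happens once the VTC output has actually been copied into $wav_dir. A hypothetical sketch, which treats a non-empty all.rttm as the sign that the copy worked (it does not inspect the file's contents):

```shell
#!/usr/bin/env bash
# Hypothetical guard: delete the wavs in $1 only if a non-empty all.rttm
# is already sitting next to them (i.e. the VTC output was copied over).
safe_delete_wavs() {
  local dir=$1
  if [ ! -s "$dir/all.rttm" ]; then
    echo "No non-empty all.rttm in $dir; not deleting anything" >&2
    return 1
  fi
  rm -f -- "$dir"/*.wav   # -f: no error if there are no wavs left
}
```

You would then run `safe_delete_wavs "$wav_dir"` in place of `rm $wav_dir/*.wav`.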
     
  • Back on your computer:

    1. Make an empty folder and copy the all.rttm file from $wav_dir into it (use scp or FileZilla).

    2. cd into that folder in the terminal.

    3. Run `vihi distribute-all-rttm` from your command line (this command comes with blabpy) and check the output.
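To sanity-check all.rttm before or after distributing it, you can count segments per recording and label. This sketch assumes the standard RTTM layout (field 2 is the recording ID, field 8 the label; VTC's labels are KCHI, CHI, FEM, MAL and SPEECH). `rttm_summary` is a hypothetical helper and is not what `vihi distribute-all-rttm` checks.

```shell
#!/usr/bin/env bash
# Hypothetical summary of an RTTM file: number of segments per recording
# and label, one "<recording> <label> <count>" line each, sorted.
# ASSUMPTION: standard RTTM columns, field 2 = recording ID, field 8 = label.
rttm_summary() {
  awk '$1 == "SPEAKER" { n[$2 " " $8]++ }
       END { for (k in n) print k, n[k] }' "$1" | LC_ALL=C sort
}
```

Running `rttm_summary all.rttm` should list every recording you uploaded; a missing recording suggests VTC skipped or failed on that file.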

In short

  • Set a few variables and make a new folder on the cluster.

If not on a Duke computer: substitute your Net ID for $USER in the first line.

```shell
net_id=$USER
ssh $net_id@dcc-login.oit.duke.edu
# everything below runs on the cluster, where $USER is your NetID
vtc_dir=/hpc/group/bergelsonlab/VTC
wav_dir=$vtc_dir/wavs/$(date +%Y-%m-%d)_$USER
mkdir -p $wav_dir
echo "Copy wav files to:"
echo $wav_dir
```

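The folder name above combines the date and your NetID, so running the process twice on the same day would point at a folder that already exists. A hypothetical variant that appends `_2`, `_3`, ... until it finds an unused name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a folder that doesn't exist yet and print its path,
# appending _2, _3, ... to the base name if needed.
make_unique_dir() {
  local base=$1 dir=$1 i=2
  while [ -e "$dir" ]; do
    dir=${base}_$i
    i=$((i + 1))
  done
  mkdir -p -- "$dir" && printf '%s\n' "$dir"
}
```

On the cluster this would replace the `wav_dir=` and `mkdir` lines with `wav_dir=$(make_unique_dir "$vtc_dir/wavs/$(date +%Y-%m-%d)_$USER")`.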
  • Copy the wav files (with VIHI-formatted names!) to the folder printed (use scp/FileZilla).

  • Check file types (must be "wav") and sampling rates (must be 16000):

```shell
conda activate pyannote
soxi -t $(find "$wav_dir" -type f)  # file types must be "wav"
soxi -r $(find "$wav_dir" -type f)  # sampling rates must be 16000
```

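  • Start VTC:

```shell
cd $vtc_dir/VTC_repo
export vtc_dir wav_dir  # so these variables survive into the srun shell
srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
# wait for allocation; check that the prompt still starts with (pyannote)
error_log=$wav_dir/error.log
output_log=$wav_dir/output.log
./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
```
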
  • After ~15 minutes per file, check error.log (should have six rows with "<N>it") and output.log (look for "Took X sec ...").
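The check in the bullet above can be scripted. This hypothetical `check_vtc_logs` encodes the expectations from the sample logs in the detailed version: six lines reading "Test set: <N>it" in error.log, where <N> is the number of files, and a "Took ... sec" line in output.log. If your logs legitimately differ, adjust the numbers rather than trusting the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical check of the two VTC logs.
# ASSUMPTION: a good run leaves six "Test set: <N>it" lines in error.log
# (N = number of files) and a "Took <X> sec" line in output.log.
check_vtc_logs() {
  local error_log=$1 output_log=$2 n_files=$3 ok=0 n_rows
  # Count lines containing the final progress value; 0 if the file is missing.
  n_rows=$(grep -cF "Test set: ${n_files}it" "$error_log" 2>/dev/null)
  n_rows=${n_rows:-0}
  if [ "$n_rows" -ne 6 ]; then
    echo "Expected 6 'Test set: ${n_files}it' rows in $error_log, found $n_rows"
    ok=1
  fi
  if ! grep -q '^Took [0-9]* sec' "$output_log" 2>/dev/null; then
    echo "No 'Took ... sec' line in $output_log"
    ok=1
  fi
  return "$ok"
}
```

Before deleting the wavs, you could run `check_vtc_logs "$error_log" "$output_log" "$(find "$wav_dir" -name '*.wav' | wc -l)"`.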

  • If all is good, copy the output and delete the wavs:

```shell
wav_dir_name=$(basename $wav_dir)
vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
rm $wav_dir/*.wav
```

  • Copy (scp/FileZilla) all.rttm from wav_dir to a new folder on your local computer (not on PN-OPUS) and cd into that folder.

  • Run vihi distribute-all-rttm.

Page Status: needs updating

Status details: Duke-related keywords found on the page.

Last updated by ?? on ??/??/??.
