VTC (Voice Type Classifier)

How to run the voice type classifier on a set of wav files from VIHI

Prerequisites:

  • You have set up the connection to the Duke Computing Cluster (DCC). This will require the involvement of Elika so budget your time accordingly.

  • On your computer

    • You have a working Python installation with blabpy installed and updated (>=0.15.0). Check with pip show blabpy.

  • On cluster (connect with ssh <netid>@dcc-login.oit.duke.edu)

    • Check that /hpc/group/bergelsonlab/VTC/VTC_repo exists and isn't empty. If the folder doesn't exist or is empty, clone the VTC repo into that location:

  ```shell
  cd /hpc/group/bergelsonlab/VTC
  git clone --recurse-submodules https://github.com/MarvinLvn/voice_type_classifier.git VTC_repo
  ```
  
  • Check that you have a conda environment called "pyannote":

    conda activate pyannote

    If it doesn't exist, create it and check again:

    cd /hpc/group/bergelsonlab/VTC/VTC_repo
    srun --mem=16G --pty bash -i
    # wait until a prompt appears
    conda env create -f vtc.yml
    exit
    conda activate pyannote
  • Make sure you have sox installed (check with which sox). If you don't, run:

  ```shell
  conda install -c conda-forge sox
  ```
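
The cluster-side checks above can be bundled into a single helper. This is a minimal sketch, not part of the VTC repo or blabpy: the function names dir_is_populated and check_vtc_prereqs are made up for illustration, and the conda check assumes that conda env list prints each environment name at the start of a line.

```shell
# Return success if the directory exists and contains at least one entry.
dir_is_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Print one OK/MISSING line per prerequisite.
# Returns non-zero if anything is missing.
check_vtc_prereqs() {
    repo_dir=${1:-/hpc/group/bergelsonlab/VTC/VTC_repo}
    missing=0

    if dir_is_populated "$repo_dir"; then
        echo "OK       VTC repo at $repo_dir"
    else
        echo "MISSING  VTC repo at $repo_dir"
        missing=1
    fi

    if command -v sox >/dev/null 2>&1; then
        echo "OK       sox"
    else
        echo "MISSING  sox"
        missing=1
    fi

    # Assumption: environment names start at the beginning of a line.
    if command -v conda >/dev/null 2>&1 &&
        conda env list 2>/dev/null | grep -q '^pyannote[[:space:]]'; then
        echo "OK       conda environment pyannote"
    else
        echo "MISSING  conda environment pyannote"
        missing=1
    fi

    return "$missing"
}
```

After pasting the functions into your cluster shell, run check_vtc_prereqs and fix anything reported as MISSING using the instructions above.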
  

General steps

  1. Copy the wav file(s) to the cluster from your computer's shell. This is necessary because the cluster doesn't have access to PN-OPUS. Use FileZilla or scp; see Copying files.

  2. Connect to the cluster in the terminal. Run VTC. Delete wavs, disconnect.

  3. Copy the all.rttm file (VTC output) back to your computer and run a function that will distribute its contents into the individual subjects' folders.
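
If you use scp for steps 1 and 3, the two copies might look like the sketch below. The helper names run_or_print, upload_wavs and download_rttm and the RUN switch are made up for illustration, and <my-new-folder> is a placeholder you must replace. By default the helpers only print the scp command so you can inspect it; nothing is copied unless RUN=1 is set.

```shell
# Placeholder values: replace <my-new-folder> with the folder you created.
cluster_host=dcc-login.oit.duke.edu
remote_wav_dir='/hpc/group/bergelsonlab/VTC/wavs/<my-new-folder>'

# Print the command; execute it only when RUN=1 is set.
run_or_print() {
    echo "$@"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    fi
}

# Step 1: upload wav files. Usage: upload_wavs <netid> file1.wav file2.wav ...
upload_wavs() {
    netid=$1
    shift
    run_or_print scp "$@" "$netid@$cluster_host:$remote_wav_dir/"
}

# Step 3: download all.rttm. Usage: download_rttm <netid> <local-folder>
download_rttm() {
    run_or_print scp "$1@$cluster_host:$remote_wav_dir/all.rttm" "$2/"
}
```

Run these on your own computer, not on the cluster. Once the printed command looks right, repeat the call with RUN=1 in front of it.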

Detailed version

  • Open two terminal windows. Connect to the cluster in one of them:

```shell
ssh <netid>@dcc-login.oit.duke.edu
```
  • On the terminal window that is connected to the cluster:

    • Create a new folder under /hpc/group/bergelsonlab/VTC/wavs

      vtc_dir=/hpc/group/bergelsonlab/VTC
      wav_dir=$vtc_dir/wavs/<my-new-folder>
      mkdir -p $wav_dir

      We'll use wav_dir later on, so do define this variable. It's better to create a unique folder each time you do this process.

  • On your terminal window that is still local to your computer:

    • Copy the wav files to wav_dir. Use FileZilla (drag the files from PN-OPUS and drop them into the wav_dir you just made) or scp; see Copying files.

  • On the terminal window that is connected to the cluster:

    1. Activate conda environment "pyannote": conda activate pyannote

    2. Check that all the files are wav files sampled at 16 kHz:

  ```shell
  soxi -t $(find "$wav_dir" -type f)  # file types
  soxi -r $(find "$wav_dir" -type f)  # sampling rate
  ```
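
With many files, the soxi output is easy to misread. The sketch below lists only the files that fail the check. The function name find_bad_wavs is made up for illustration; it relies only on soxi -t and soxi -r, as used above, and expects them to print wav and 16000 for a valid file.

```shell
# List every file under a directory that is not a 16 kHz wav file.
# Returns non-zero if at least one such file is found.
find_bad_wavs() {
    bad=0
    # Read find's output line by line so file names with spaces survive.
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        file_type=$(soxi -t "$f" 2>/dev/null)
        rate=$(soxi -r "$f" 2>/dev/null)
        if [ "$file_type" != wav ] || [ "$rate" != 16000 ]; then
            echo "BAD: $f (type=${file_type:-unknown}, rate=${rate:-unknown})"
            bad=1
        fi
    done <<EOF
$(find "$1" -type f)
EOF
    return "$bad"
}
```

Call it as find_bad_wavs "$wav_dir". No output means every file passed.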
  
  3. Change into the VTC_repo folder with

    cd $vtc_dir/VTC_repo

  4. Switch to a GPU-enabled "computer":

  ```shell
  srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
  ```
  
  Check that it worked:

  1. Wait for the following to appear (the number will be different):\
     `srun: job 19332600 has been allocated resources`
  2. Check that your prompt now ends with `<netid>@dcc-core-gpu-<x>` (where `<x>` is some number).
  3. Check that your prompt still starts with `(pyannote)`. If it doesn't, activate `pyannote` again with `conda activate pyannote`.
  4. You may also need to redefine the variables you set earlier, since srun starts a new shell. If you get an "access denied" or "no such directory" error when you try to run VTC, rerun these two commands before setting the error log and output log variables in the next step:

      vtc_dir=/hpc/group/bergelsonlab/VTC
      wav_dir=$vtc_dir/wavs/<my-new-folder>

5. Start VTC and wait (~15 minutes per file but can vary a lot):

  ```shell
  error_log=$wav_dir/error.log
  output_log=$wav_dir/output.log
  ./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
  ```
  
  6. Check that everything went well. Either open the error.log and output.log files in FileZilla (right-click and choose "View or edit"; don't double-click, even if that is your instinct) or view them in the terminal with less (run less $error_log to open a file, press [Q] to exit the viewer). Here is what your error.log should look like:

    Test set: <N>it [07:06, 426.16s/it]
    Test set: <N>it [01:14, 74.71s/it]
    Test set: <N>it [01:14, 74.95s/it]
    Test set: <N>it [01:15, 75.11s/it]
    Test set: <N>it [01:15, 75.93s/it]
    Test set: <N>it [01:15, 75.00s/it]

    Here <N> is the number of files you were processing. And this is what output.log should look like:

    Creating config for pyannote.
    Done creating config for pyannote. 
    Took 3430 sec on <wav_dir>.

    ⚠️ Continue only if all is good! ⚠️

  7. Once the job has finished, and only if there were no errors:

    1. Copy the output to $wav_dir.

     ```shell
     wav_dir_name=$(basename $wav_dir)
     vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
     cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
     ```
     
    2. Delete the wav files from the DCC:

     
     ```shell
     rm $wav_dir/*.wav
     ```
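
Because the wav files are deleted for good, you may want to confirm that the output was really copied first. This is a minimal sketch that refuses to delete anything unless all.rttm is present and non-empty in the folder. The function name delete_wavs_if_output_copied is made up for illustration.

```shell
# Delete the wav files in a folder only if all.rttm is already there
# and is not empty; otherwise leave everything untouched.
delete_wavs_if_output_copied() {
    if [ -s "$1/all.rttm" ]; then
        rm -f "$1"/*.wav
        echo "Deleted wav files from $1"
    else
        echo "all.rttm is missing or empty in $1; wav files were kept" >&2
        return 1
    fi
}
```

Call it as delete_wavs_if_output_copied "$wav_dir" in place of the plain rm command.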
     
  • Back on your computer:

    1. Make an empty folder and copy the all.rttm file from wav_dir into it (use scp or FileZilla).

    2. cd into that folder in the terminal.

    3. Run vihi distribute-all-rttm from your command line (this is a function in blabpy) and check the output.
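
Before distributing, it can help to confirm that all.rttm covers every recording you processed. The sketch below counts segments per recording. It assumes the usual RTTM layout, in which the second whitespace-separated field of each line is the recording name. The function name rttm_segments_per_file is made up for illustration and is not part of blabpy.

```shell
# Print "<recording> <number of segments>" for each recording in an RTTM file.
# Assumes the second field of every line is the recording name.
rttm_segments_per_file() {
    awk '{ count[$2]++ } END { for (name in count) print name, count[name] }' "$1" | sort
}
```

Run rttm_segments_per_file all.rttm in the folder you just created. A recording that is missing from the list did not produce any output.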

In short

  • Set a few variables and make a new folder on the cluster.

ssh <netid>@dcc-login.oit.duke.edu
net_id=$USER  # set on the cluster; assumes your cluster username is your NetID
vtc_dir=/hpc/group/bergelsonlab/VTC
wav_dir=$vtc_dir/wavs/$(date +%Y-%m-%d)_$net_id
mkdir -p $wav_dir
echo "Copy wav files to:"
echo $wav_dir
  • Copy the wav files (with VIHI-formatted names!) to the folder printed (use scp/FileZilla).

  • Check filetypes (must be "wav") and sampling rates (must be 16000)

conda activate pyannote
soxi -t $(find "$wav_dir" -type f)  # file types must be "wav"
soxi -r $(find "$wav_dir" -type f)  # sampling rates must be 16000
  • Start VTC

cd $vtc_dir/VTC_repo
srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
# wait for allocation; if vtc_dir and wav_dir are empty in the new shell, set them again
error_log=$wav_dir/error.log
output_log=$wav_dir/output.log
./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
  • After ~15 minutes per file, check error.log (should have six rows with "<N>it") and output.log (look for "Took X sec ...").

  • If all is good, copy the output and delete the wavs:

wav_dir_name=$(basename $wav_dir)
vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
rm $wav_dir/*.wav
  • Copy (scp/FileZilla) all.rttm from wav_dir to a new folder on your local computer (not on PN-OPUS) and cd into that folder.

  • Run vihi distribute-all-rttm.
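
The log check described above can also be scripted. This is a minimal sketch, with a made-up function name vtc_logs_look_ok, that looks for the "Took X sec" line in output.log and counts the "Test set: <N>it" rows in error.log. It only checks that these lines exist, so it is no substitute for reading the logs when something looks off.

```shell
# Return success if the VTC logs in a folder look like a finished run:
# output.log has a "Took <number> sec" line and error.log has at least
# one progress row of the form "Test set: <N>it [...]".
vtc_logs_look_ok() {
    grep -Eq 'Took [0-9.]+ sec' "$1/output.log" 2>/dev/null || {
        echo "output.log has no 'Took ... sec' line" >&2
        return 1
    }
    rows=$(grep -c 'Test set: .*it \[' "$1/error.log" 2>/dev/null || true)
    echo "error.log has ${rows:-0} 'Test set' rows"
    [ "${rows:-0}" -gt 0 ]
}
```

Call it as vtc_logs_look_ok "$wav_dir" and compare the reported row count with the six rows expected above.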

Page Status: needs updating

Status details: Duke-related keywords found on the page.

Last updated by ?? on ??/??/??.
