VTC (Voice Type Classifier)

How to run the voice type classifier on a set of wav files from VIHI

Wav files must already be renamed to "AB_XXX_YYY.wav" before you start.
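Before copying anything, you can list the files that break the naming convention. This is a minimal bash sketch: `check_vihi_names` is a hypothetical helper, and its pattern assumes AB is two uppercase letters and XXX and YYY are three digits each. Adjust the pattern if real VIHI names differ.

```shell
# Hypothetical helper: report .wav files in a folder whose names don't follow
# AB_XXX_YYY.wav. ASSUMPTION: AB = two uppercase letters, XXX and YYY = three
# digits each. Returns 1 if any name is off, 0 otherwise.
check_vihi_names() {
  local dir=$1 bad=0 f name
  local pattern='^[[:upper:]]{2}_[[:digit:]]{3}_[[:digit:]]{3}\.wav$'
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue   # the glob matched nothing
    name=$(basename "$f")
    if [[ ! $name =~ $pattern ]]; then
      echo "Bad name: $name"
      bad=1
    fi
  done
  return "$bad"
}
```

For example: `check_vihi_names /path/to/wavs && echo "All names look fine"`.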

Prerequisites:

You have set up the connection to the Duke Computing Cluster (DCC). This requires Elika's involvement, so budget your time accordingly.
On your computer
- You have a working Python installation with blabpy installed and up to date (version 0.15.0 or later). Check with `pip show blabpy`.
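To check the version requirement in one step, here is a minimal bash sketch. `version_ge` and `check_blabpy` are hypothetical helpers (not part of blabpy); the sketch assumes `pip` belongs to the Python you use for blabpy and that your `sort` supports `-V`, as GNU `sort` does.

```shell
# Sketch (run on your LOCAL computer): is blabpy installed and >= 0.15.0?
# version_ge A B succeeds when version A >= version B.
# ASSUMPTION: `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper built on the "Version:" line of `pip show blabpy`.
check_blabpy() {
  local installed
  installed=$(pip show blabpy 2>/dev/null | awk '/^Version:/ {print $2}')
  if [ -z "$installed" ]; then
    echo "blabpy is not installed"
    return 1
  fi
  if version_ge "$installed" 0.15.0; then
    echo "blabpy $installed is recent enough"
  else
    echo "blabpy $installed is older than 0.15.0; update it"
    return 1
  fi
}
```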
On the cluster (connect with `ssh <netid>@dcc-login.oit.duke.edu`):
- Check that /hpc/group/bergelsonlab/VTC/VTC_repo exists and isn't empty. If the folder doesn't exist or is empty, clone the VTC repo into that location:

  ```shell
  cd /hpc/group/bergelsonlab/VTC
  git clone --recurse-submodules https://github.com/MarvinLvn/voice_type_classifier.git VTC_repo
  ```
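The "exists and isn't empty" check and the clone can be wrapped together. This is a sketch: `dir_is_nonempty` and `ensure_vtc_repo` are hypothetical helpers that run the same `git clone` command as above, inside a subshell so your current directory doesn't change.

```shell
# Sketch (run on the cluster). Hypothetical helpers.
# dir_is_nonempty DIR succeeds if DIR exists and contains at least one entry.
dir_is_nonempty() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# ensure_vtc_repo [VTC_DIR] clones VTC into VTC_DIR/VTC_repo only when that
# folder is missing or empty.
ensure_vtc_repo() {
  local vtc_dir=${1:-/hpc/group/bergelsonlab/VTC}
  if dir_is_nonempty "$vtc_dir/VTC_repo"; then
    echo "VTC_repo is present: $vtc_dir/VTC_repo"
  else
    ( cd "$vtc_dir" &&
      git clone --recurse-submodules https://github.com/MarvinLvn/voice_type_classifier.git VTC_repo )
  fi
}
```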

Check that you have a conda environment called "pyannote":

```shell
conda activate pyannote
```

If it doesn't exist, create it and check again:

```shell
cd /hpc/group/bergelsonlab/VTC/VTC_repo
srun --mem=16G --pty bash -i
# wait until a prompt appears
conda env create -f vtc.yml
exit
conda activate pyannote
```
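If you'd rather check for the environment without activating it, one option is to look for it in `conda env list`. A minimal sketch: `conda_env_exists` is a hypothetical helper that assumes the environment name is the first column of that listing, which is how conda prints named environments.

```shell
# Hypothetical helper: succeeds if a conda environment with this name exists.
# ASSUMPTION: `conda env list` prints each named environment's name in the
# first column (header lines start with "#").
conda_env_exists() {
  conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

For example: `conda_env_exists pyannote || echo "pyannote is missing"`.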

Make sure you have sox installed (check with `which sox`). If you don't, run:

  ```shell
  conda install -c conda-forge sox
  ```

General steps

1. Copy the wav file(s) from your computer to the cluster. This is necessary because the cluster doesn't have access to PN-OPUS. Use FileZilla or scp (see Copying files).
2. Connect to the cluster in the terminal. Run VTC, then delete the wavs and disconnect.
3. Copy the all.rttm file (VTC output) back to your computer and run a function that distributes its contents into the individual subjects' folders.

Detailed version

Open two terminal windows. Connect to the cluster in one of them:

```shell
ssh <netid>@dcc-login.oit.duke.edu
```

On the terminal window that is connected to the cluster:
- Create a new folder under /hpc/group/bergelsonlab/VTC/wavs:

  ```shell
  vtc_dir=/hpc/group/bergelsonlab/VTC
  wav_dir=$vtc_dir/wavs/<my-new-folder>
  mkdir -p $wav_dir
  ```

  We'll use wav_dir later on, so be sure to define this variable. It's best to create a new, uniquely named folder each time you run this process.
On your terminal window that is still local to your computer:
- Copy the wav files to wav_dir. Use FileZilla (drag them from PN-OPUS and drop them into the new wav_dir you just made) or scp (see Copying files).
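If you use scp, the copy can be wrapped in a small function on your computer. This is a sketch: `copy_wavs_to_dcc` is a hypothetical helper, and because wav_dir is only defined on the cluster, you pass its value (the full path) explicitly.

```shell
# Hypothetical helper (run on your LOCAL computer): copy wav files into
# wav_dir on the DCC. Arguments: NetID, the cluster folder (the value of
# wav_dir), then one or more local wav files.
copy_wavs_to_dcc() {
  local net_id=$1 remote_dir=$2
  shift 2
  scp "$@" "${net_id}@dcc-login.oit.duke.edu:${remote_dir}/"
}
```

For example: `copy_wavs_to_dcc <netid> /hpc/group/bergelsonlab/VTC/wavs/<my-new-folder> /path/to/wavs/*.wav`.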
On the terminal window that is connected to the cluster:
1. Activate the conda environment "pyannote": `conda activate pyannote`
2. Check that all the files are wav files sampled at 16 kHz:

  ```shell
  soxi -t $(find "$wav_dir" -type f)  # file types
  soxi -r $(find "$wav_dir" -type f)  # sampling rate
  ```

3. Change into the VTC_repo folder:

  ```shell
  cd $vtc_dir/VTC_repo
  ```
4. Switch to a GPU-enabled compute node:

  ```shell
  srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
  ```

  Check that it worked:

  1. Wait for the following to appear (the number will be different):\
     `srun: job 19332600 has been allocated resources`
  2. Check that your prompt now ends with `<net-id>@dcc-core-gpu-<x>` (where `<x>` is some number)
  3. Check that your prompt still starts with `(pyannote)`. If it doesn't, activate `pyannote` again with `conda activate pyannote`.
  4. You may also need to redefine variables you set earlier: only exported variables are passed to the new shell that `srun` starts, so vtc_dir and wav_dir may not be defined there. If you get an "access denied" or "No such file or directory" error when you try to run VTC, rerun these two commands before setting the error log and output log variables in the next step:

      ```shell
      vtc_dir=/hpc/group/bergelsonlab/VTC
      wav_dir=$vtc_dir/wavs/<my-new-folder>
      ```

5. Start VTC and wait (about 15 minutes per file, though this varies a lot):

  ```shell
  error_log=$wav_dir/error.log
  output_log=$wav_dir/output.log
  ./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
  ```
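The checks from step 4 can also be run as one command on the GPU node, ideally before the `apply.sh` line. A sketch under a few assumptions: `check_gpu_session` is a hypothetical helper, the GPU node's hostname starts with `dcc-core-gpu-` (as the prompt in step 4 suggests), Slurm sets `SLURM_JOB_ID` inside the `srun` shell, and conda sets `CONDA_DEFAULT_ENV` to the active environment's name.

```shell
# Hypothetical helper (run on the GPU node): the step 4 checks in one go.
# ASSUMPTIONS: GPU node hostnames start with dcc-core-gpu-, Slurm sets
# SLURM_JOB_ID in the srun shell, conda sets CONDA_DEFAULT_ENV.
check_gpu_session() {
  local status=0 host
  host=$(hostname)
  case "$host" in
    dcc-core-gpu-*) ;;
    *) echo "Not on a GPU node (hostname: $host)"; status=1 ;;
  esac
  if [ -z "${SLURM_JOB_ID:-}" ]; then
    echo "No Slurm job found: did srun allocate resources?"; status=1
  fi
  if [ "${CONDA_DEFAULT_ENV:-}" != pyannote ]; then
    echo "Run: conda activate pyannote"; status=1
  fi
  if [ -z "${vtc_dir:-}" ] || [ ! -d "${wav_dir:-}" ]; then
    echo "Redefine vtc_dir and wav_dir (see step 4)"; status=1
  fi
  return "$status"
}
```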

Check that everything went well. Either open the error.log and output.log files in FileZilla (right-click and choose "View or edit"; don't double-click, even though that may be your instinct) or view them in the terminal with less (run `less $error_log` to open one, press [Q] to exit the viewer, then do the same with `$output_log`). Here is what your error.log should look like:
```
Test set: <N>it [07:06, 426.16s/it]
Test set: <N>it [01:14, 74.71s/it]
Test set: <N>it [01:14, 74.95s/it]
Test set: <N>it [01:15, 75.11s/it]
Test set: <N>it [01:15, 75.93s/it]
Test set: <N>it [01:15, 75.00s/it]
```
Here <N> is the number of files you processed. And here is what output.log should look like:
```
Creating config for pyannote.
Done creating config for pyannote. 
Took 3430 sec on <wav_dir>.
```
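The same checks can be scripted. A minimal sketch: `check_vtc_logs` is a hypothetical helper, and it assumes a good run looks exactly like the examples above (six "Test set:" lines, no Python traceback, and a "Took ... sec" line), so treat a failure as a cue to read the logs yourself rather than as proof that something broke.

```shell
# Hypothetical helper: check both VTC logs in one go.
# ASSUMPTION: a good run leaves six "Test set:" lines and no Python traceback
# in error.log, and a "Took ... sec" line in output.log.
check_vtc_logs() {
  local error_log=$1 output_log=$2 n
  if [ ! -f "$error_log" ] || [ ! -f "$output_log" ]; then
    echo "A log file is missing"
    return 1
  fi
  if grep -q 'Traceback' "$error_log"; then
    echo "error.log contains a Python traceback"
    return 1
  fi
  n=$(grep -c '^Test set:' "$error_log" || true)   # grep -c prints 0 on no match
  if [ "$n" -ne 6 ]; then
    echo "Expected 6 'Test set:' lines, found $n"
    return 1
  fi
  if ! grep -q '^Took [0-9]* sec' "$output_log"; then
    echo "No 'Took ... sec' line yet: VTC may still be running, or it failed"
    return 1
  fi
  echo "Logs look OK"
}
```

For example: `check_vtc_logs "$error_log" "$output_log"`.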
⚠️ Continue only if all is good! ⚠️
Once the job is finished and if there were no errors:
1. Copy the output to $wav_dir.

     ```shell
     wav_dir_name=$(basename $wav_dir)
     vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
     cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
     ```

2. Delete the wav files from the DCC:

     ```shell
     rm $wav_dir/*.wav
     ```
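To avoid deleting the wavs before the output has actually been copied, the `rm` can be guarded. A sketch: `delete_wavs_if_done` is a hypothetical helper that treats a non-empty all.rttm in wav_dir as the sign that the copy step worked.

```shell
# Hypothetical helper: delete the wavs in a folder only if a non-empty
# all.rttm is already there (i.e. the VTC output was copied in).
delete_wavs_if_done() {
  local dir=$1
  if [ -s "$dir/all.rttm" ]; then
    rm -f -- "$dir"/*.wav
    echo "Deleted wav files from $dir"
  else
    echo "No non-empty all.rttm in $dir; not deleting anything"
    return 1
  fi
}
```

For example: `delete_wavs_if_done "$wav_dir"`.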

Back on your computer:
1. Make an empty folder and copy the all.rttm file from wav_dir into it (use scp or FileZilla).
2. cd into that folder in the terminal.
3. Run `vihi distribute-all-rttm` from your command line (this command is part of blabpy) and check the output.
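The three steps above can be combined into one function on your computer. A sketch: `fetch_and_distribute` is a hypothetical helper. It refuses to reuse an existing folder (so the folder starts empty) and runs `vihi distribute-all-rttm` from inside the new folder, as the steps describe.

```shell
# Hypothetical helper (run on your LOCAL computer, not on PN-OPUS):
# copy all.rttm from the cluster into a brand-new folder and distribute it.
# Arguments: NetID, the cluster wav_dir path, the new local folder.
fetch_and_distribute() {
  local net_id=$1 remote_dir=$2 local_dir=$3
  mkdir "$local_dir" || return 1   # fails if the folder already exists
  scp "${net_id}@dcc-login.oit.duke.edu:${remote_dir}/all.rttm" "$local_dir/" || return 1
  ( cd "$local_dir" && vihi distribute-all-rttm )
}
```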

In short

Set a few variables and make a new folder on the cluster.

If you are not on a Duke computer, substitute your Net ID for $USER in the first line.

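```shell
# On your computer:
net_id=$USER
ssh $net_id@dcc-login.oit.duke.edu
# On the cluster ($USER there is the Net ID you logged in with):
vtc_dir=/hpc/group/bergelsonlab/VTC
wav_dir=$vtc_dir/wavs/$(date +%Y-%m-%d)_$USER
mkdir $wav_dir
echo "Copy wav files to:"
echo $wav_dir
```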

Copy the wav files (with VIHI-formatted names!) to the printed folder (use scp/FileZilla).
Check file types (must be "wav") and sampling rates (must be 16000):

```shell
conda activate pyannote
soxi -t $(find "$wav_dir" -type f)  # file types must be "wav"
soxi -r $(find "$wav_dir" -type f)  # sampling rates must be 16000
```
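Instead of reading soxi's output by eye, you can print only the files that fail. A sketch: `check_wav_format` is a hypothetical helper that calls `soxi -t` and `soxi -r` once per file and expects "wav" and "16000".

```shell
# Hypothetical helper: print every file in a folder that is not a 16 kHz wav
# (per soxi) and return 1 if there was any.
check_wav_format() {
  local dir=$1 bad=0 f type rate
  while IFS= read -r -d '' f; do
    type=$(soxi -t "$f")
    rate=$(soxi -r "$f")
    if [ "$type" != wav ] || [ "$rate" != 16000 ]; then
      echo "$(basename "$f"): type=$type rate=$rate"
      bad=1
    fi
  done < <(find "$dir" -type f -print0)
  return "$bad"
}
```

For example: `check_wav_format "$wav_dir" && echo "All files are 16 kHz wavs"`.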

Start VTC

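```shell
# export so these variables reach the new shell that srun starts on the GPU node
export vtc_dir wav_dir
export error_log=$wav_dir/error.log
export output_log=$wav_dir/output.log
cd $vtc_dir/VTC_repo
srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
# wait for allocation; if the prompt doesn't start with (pyannote), run: conda activate pyannote
./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
```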

After ~15 minutes per file, check error.log (should have six rows with "<N>it") and output.log (look for "Took X sec ...").
If all is good, copy the output and delete the wavs:

```shell
wav_dir_name=$(basename $wav_dir)
vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
rm $wav_dir/*.wav
```

Copy (scp/FileZilla) all.rttm from wav_dir to a new folder on your local computer (not on PN-OPUS) and cd into that folder.
Run vihi distribute-all-rttm.
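As a quick sanity check on all.rttm, before or after distributing it, you can count segments per file and voice type to confirm every processed file appears. A sketch: `rttm_summary` is a hypothetical helper, and it assumes all.rttm uses the standard RTTM layout (lines starting with SPEAKER, file ID in field 2, label in field 8).

```shell
# Hypothetical helper: count segments per file and voice type in all.rttm.
# ASSUMPTION: standard RTTM lines, "SPEAKER <file> 1 <onset> <duration>
# <NA> <NA> <label> <NA> <NA>", so field 2 is the file and field 8 the label.
rttm_summary() {
  awk '$1 == "SPEAKER" { n[$2 " " $8]++ } END { for (k in n) print k, n[k] }' "${1:-all.rttm}" |
    LC_ALL=C sort
}
```

Run it in the folder that holds all.rttm: `rttm_summary`.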

Page Status: needs updating

Status details: Duke-related keywords found on the page.

Last updated by ?? on ??/??/??.

