VTC (Voice Type Classifier)
How to run the voice type classifier on a set of wav files from VIHI
Wav files must be already renamed to "AB_XXX_YYY.wav" before you start.
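A quick way to spot misnamed files before copying them is to test each name against the expected pattern. The sketch below assumes that "AB" stands for two uppercase letters and that "XXX" and "YYY" are three-digit numbers; that reading of the pattern and the helper names `is_vihi_name` and `list_misnamed` are assumptions, so adjust the pattern if the lab convention differs.

```shell
# Hypothetical helper: succeeds if a file name looks like AB_XXX_YYY.wav.
# Assumption: AB = two uppercase letters, XXX and YYY = three digits each.
is_vihi_name() {
    case "$(basename "$1")" in
        [[:upper:]][[:upper:]]_[0-9][0-9][0-9]_[0-9][0-9][0-9].wav) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every wav file in folder $1 whose name does not match the pattern.
list_misnamed() {
    local f
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue   # the folder contains no wav files
        is_vihi_name "$f" || echo "$f"
    done
}
```

Running `list_misnamed /path/to/wavs` prints nothing when every file name matches.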
Prerequisites:
You have set up the connection to the Duke Computing Cluster (DCC). This will require the involvement of Elika so budget your time accordingly.
On your computer
You have a working Python installation with blabpy installed and updated (>=0.15.0). Check with `pip show blabpy`.
On the cluster (connect with `ssh <netid>@dcc-login.oit.duke.edu`):
Check that `/hpc/group/bergelsonlab/VTC/VTC_repo` exists and isn't empty. If the folder doesn't exist or is empty, clone the VTC repo into that location:
```shell
cd /hpc/group/bergelsonlab/VTC
git clone --recurse-submodules https://github.com/MarvinLvn/voice_type_classifier.git VTC_repo
```
Check that you have a conda environment called "pyannote":
conda activate pyannote
If it doesn't exist, create it and check again:
cd /hpc/group/bergelsonlab/VTC/VTC_repo
srun --mem=16G --pty bash -i  # wait until a prompt appears
conda env create -f vtc.yml
exit
conda activate pyannote
Make sure you have `sox` installed (check with `which sox`). If you don't, run
```shell
conda install -c conda-forge sox
```
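The prerequisite checks above can also be scripted. The following is a minimal sketch, not part of blabpy or VTC: the helper names are hypothetical, `version_ge` relies on `sort -V` (available in GNU coreutils, possibly not in every `sort`), and `conda_env_exists` assumes that `conda env list` prints environment names in the first column.

```shell
# Succeeds if version $1 is at least version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds if directory $1 exists and contains at least one entry.
dir_not_empty() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Succeeds if a conda environment named $1 is listed by `conda env list`.
conda_env_exists() {
    conda env list | awk '{print $1}' | grep -qx "$1"
}

# Example use (the first line on your computer, the other two on the cluster):
#   version_ge "$(pip show blabpy | awk '/^Version:/ {print $2}')" 0.15.0 || echo "update blabpy"
#   dir_not_empty /hpc/group/bergelsonlab/VTC/VTC_repo || echo "clone the VTC repo"
#   conda_env_exists pyannote || echo "create the pyannote environment"
```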
General steps
1. Copy wav file(s) to the cluster from your computer's shell. This is necessary because the cluster doesn't have access to PN-OPUS. Use FileZilla or `scp`, see Copying files.
2. Connect to the cluster in the terminal. Run VTC. Delete wavs, disconnect.
3. Copy the `all.rttm` file (VTC output) back to your computer and run a function that will distribute its contents into the individual subjects' folders.
Detailed version
Open two terminal windows. Connect to the cluster in one of them:
```shell
ssh <netid>@dcc-login.oit.duke.edu
```
On the terminal window that is connected to the cluster:
Create a new folder under `/hpc/group/bergelsonlab/VTC/wavs`:
vtc_dir=/hpc/group/bergelsonlab/VTC
wav_dir=$vtc_dir/wavs/<my-new-folder>
mkdir -p $wav_dir
We'll use `wav_dir` later on, so do define this variable. It's better to create a unique folder each time you do this process.
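One way to get a distinct folder name, following the date-plus-user pattern used in the short version at the end of this page, is sketched below. The helper name `unique_wav_dir_name` is hypothetical, and two runs on the same day would still produce the same name.

```shell
# Build a folder name from today's date and a user name, e.g. <date>_<netid>.
# Without an argument, $USER is used; on the cluster this should be your Net ID.
unique_wav_dir_name() {
    printf '%s_%s\n' "$(date +%Y-%m-%d)" "${1:-$USER}"
}

# Example use on the cluster:
#   wav_dir=$vtc_dir/wavs/$(unique_wav_dir_name)
#   mkdir -p "$wav_dir"
```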
On your terminal window that is still local to your computer:
Copy the wav files to `wav_dir`. Use FileZilla (by dragging from PN-OPUS and dropping into the new `wav_dir` you just made) or `scp`, see Copying files.
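For the `scp` route, the destination can be assembled as shown below. This is a sketch that assumes the login host used for `ssh` also accepts `scp` transfers; `dcc_target` is a hypothetical helper name, and the local path is a placeholder.

```shell
# Hypothetical helper: print the scp destination for Net ID $1 and cluster folder $2.
dcc_target() {
    printf '%s@dcc-login.oit.duke.edu:%s/\n' "$1" "$2"
}

# Example use (on your computer, not on the cluster); replace the placeholders:
#   wav_dir=/hpc/group/bergelsonlab/VTC/wavs/<my-new-folder>
#   scp /path/to/wavs/*.wav "$(dcc_target <netid> "$wav_dir")"
```

Note that `wav_dir` has to be set again in the local terminal, because variables defined on the cluster are not visible there.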
On the terminal window that is connected to the cluster:
Activate conda environment "pyannote":
conda activate pyannote
Check that all the files are wav files sampled at 16 kHz:
```shell
soxi -t $(find "$wav_dir" -type f) # file types
soxi -r $(find "$wav_dir" -type f) # sampling rate
```
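The two commands above print the values for you to read. If you would rather have the comparison done for you, a sketch like the following reports only the problem files. It assumes that `soxi -t` prints `wav` and `soxi -r` prints `16000` for a compliant file, and it only looks at files directly inside the folder; `check_wavs` is a hypothetical name.

```shell
# Report every file in folder $1 that is not a 16 kHz wav file.
# Returns non-zero if at least one problem file was found.
check_wavs() {
    local problems=0
    local f ftype rate
    for f in "$1"/*; do
        [ -f "$f" ] || continue        # skip subfolders and an empty folder
        ftype=$(soxi -t "$f")          # file type, expected "wav"
        rate=$(soxi -r "$f")           # sampling rate, expected "16000"
        if [ "$ftype" != "wav" ] || [ "$rate" != "16000" ]; then
            echo "PROBLEM: $f (type=$ftype, rate=$rate)" >&2
            problems=1
        fi
    done
    return "$problems"
}

# Example use:
#   check_wavs "$wav_dir" && echo "All files are 16 kHz wav files"
```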
Change into the `VTC_repo` folder with `cd $vtc_dir/VTC_repo`.
Switch to a gpu-enabled "computer":
```shell
srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
```
Check that it worked:
1. Wait for the following to appear (the number will be different):\
`srun: job 19332600 has been allocated resources`
2. Check that your prompt now ends with `<net-id>@dcc-core-gpu-<x>` (where `<x>` is some number)
3. Check that your prompt still starts with `(pyannote)`. If it doesn't, activate `pyannote` again with `conda activate pyannote`.
4. You may also need to set your variables again in the new shell. If you get an "access denied" or "no such directory" error when you try to run VTC, rerun these two commands before setting the error log and output log variables in the next step:
<pre><code>vtc_dir=/hpc/group/bergelsonlab/VTC
wav_dir=$vtc_dir/wavs/&lt;my-new-folder&gt;
</code></pre>
5. Start VTC and wait (~15 minutes per file but can vary a lot):
```shell
error_log=$wav_dir/error.log
output_log=$wav_dir/output.log
./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
```
Check that everything went well. Either open the `error.log` and `output.log` files in FileZilla (right-click and choose "View or edit"; don't double-click, which may be your instinct) or use `less` on `$error_log` and `$output_log` to view the log files in the terminal window (run `less $error_log` to open, press [Q] to exit the viewer). Here is what your `error.log` should look like:
Test set: <N>it [07:06, 426.16s/it]
Test set: <N>it [01:14, 74.71s/it]
Test set: <N>it [01:14, 74.95s/it]
Test set: <N>it [01:15, 75.11s/it]
Test set: <N>it [01:15, 75.93s/it]
Test set: <N>it [01:15, 75.00s/it]
Where <N> is the number of files you were processing. And here is `output.log`:
Creating config for pyannote.
Done creating config for pyannote.
Took 3430 sec on <wav_dir>.
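The same check can be approximated from the command line. The sketch below only encodes the two signs described above (six "Test set" rows in `error.log` and a "Took ... sec" line in `output.log`); it is not an official VTC check, `vtc_logs_ok` is a hypothetical name, and it is no substitute for reading the logs when something looks off.

```shell
# Succeeds if the logs look like the examples above.
# $1 = error log, $2 = output log, $3 = expected number of "Test set" rows (default 6).
vtc_logs_ok() {
    local rows
    [ -f "$1" ] && [ -f "$2" ] || return 1
    rows=$(grep -c 'Test set: .*it \[' "$1" || true)   # grep -c prints 0 when nothing matches
    [ "$rows" -eq "${3:-6}" ] && grep -Eq 'Took [0-9]+ sec' "$2"
}

# Example use:
#   vtc_logs_ok "$error_log" "$output_log" && echo "Logs look fine"
```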
⚠️ Continue only if all is good! ⚠️
Once the job is finished and if there were no errors:
1. Copy the output to `$wav_dir`.
```shell
wav_dir_name=$(basename $wav_dir)
vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
```
2. Delete wav files from DCC:
```shell
rm $wav_dir/*.wav
```
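To avoid deleting the wavs before the output has actually been copied, the delete step can be guarded. This is a minimal sketch, assuming that a non-empty `all.rttm` in `$wav_dir` means the copy succeeded; `delete_wavs_if_done` is a hypothetical name.

```shell
# Delete the wav files in folder $1 only if it already contains a non-empty all.rttm.
delete_wavs_if_done() {
    if [ -s "$1/all.rttm" ]; then
        rm -f -- "$1"/*.wav
    else
        echo "No all.rttm in $1; keeping the wav files." >&2
        return 1
    fi
}

# Example use:
#   delete_wavs_if_done "$wav_dir"
```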
Back on your computer:
Make an empty folder and copy the `all.rttm` file from `wav_dir` to it (use `scp` or FileZilla).
`cd` into that folder in the terminal.
Run `vihi distribute-all-rttm` from your command line (this is a function in blabpy) and check the output.
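To sanity-check `all.rttm` itself, for example to confirm that every recording you processed is present, a short summary can help. The sketch below assumes the standard RTTM column layout, in which lines start with `SPEAKER`, the second field is the recording name, and the eighth field is the label. It does not use blabpy, and `rttm_summary` is a hypothetical name.

```shell
# Count segments per recording and label in RTTM file $1.
# Assumption: field 2 = recording name, field 8 = label (standard RTTM layout).
rttm_summary() {
    awk '$1 == "SPEAKER" {n[$2 " " $8]++} END {for (k in n) print k, n[k]}' "$1" | sort
}

# Example use (prints one "<recording> <label> <count>" line per combination):
#   rttm_summary all.rttm
```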
In short
Set a few variables and make a new folder on the cluster.
If not on a Duke computer: substitute your Net ID for `$USER` in the first line.
net_id=$USER
ssh $net_id@dcc-login.oit.duke.edu
vtc_dir=/hpc/group/bergelsonlab/VTC
wav_dir=$vtc_dir/wavs/$(date +%Y-%m-%d)_$USER  # on the cluster, $USER is your Net ID
mkdir $wav_dir
echo "Copy wav files to:"
echo $wav_dir
Copy the wav files (with VIHI-formatted names!) to the folder printed (use `scp`/FileZilla).
Check file types (must be "wav") and sampling rates (must be 16000):
conda activate pyannote
soxi -t $(find "$wav_dir" -type f) # file types must be "wav"
soxi -r $(find "$wav_dir" -type f) # sampling rates must be 16000
Start VTC
error_log=$wav_dir/error.log
output_log=$wav_dir/output.log
cd $vtc_dir/VTC_repo
srun -p gpu-common --gres=gpu:1 --mem=32G -c 8 --pty bash -i
# wait for allocation
./apply.sh $wav_dir --device=gpu 2> $error_log 1> $output_log &
After ~15 minutes per file, check `error.log` (should have six rows with "<N>it") and `output.log` (look for "Took X sec ...").
If all is good, copy the output and delete the wavs:
wav_dir_name=$(basename $wav_dir)
vtc_output_dir=$vtc_dir/VTC_repo/output_voice_type_classifier
cp -a $vtc_output_dir/$wav_dir_name/. $wav_dir
rm $wav_dir/*.wav
Copy (`scp`/FileZilla) `all.rttm` from `wav_dir` to a new folder on your local computer (not on PN-OPUS) and `cd` into that folder.
Run `vihi distribute-all-rttm`.