VTC (Voice Type Classifier)
How to run the voice type classifier on a set of wav files from VIHI
Last updated
Wav files must already be renamed to "AB_XXX_YYY.wav" before you start.
You have access to the Duke Computing Cluster (DCC). Getting access requires Elika's involvement, so budget your time accordingly.
On your computer
You have a working Python installation with blabpy installed and updated (>=0.15.0). Check with pip show blabpy.
On the cluster (connect with ssh <netid>@dcc-login.oit.duke.edu)
Check that /hpc/group/bergelsonlab/VTC/VTC_repo exists and isn't empty. If the folder doesn't exist or is empty, clone the VTC repository into that location.
Check that you have a conda environment called "pyannote":
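One way to run this check is sketched below. The helper name has_conda_env is hypothetical and not part of the original instructions; it relies only on conda env list, which prints one environment per line with the name in the first column.

```shell
# Hypothetical helper (not from the original instructions):
# succeeds if a conda environment with the given name is listed.
has_conda_env() {
    local env_name="$1"
    # Skip the "#" header lines and blank lines, keep the first column
    # (the environment name), and look for an exact match.
    conda env list | awk '!/^#/ && NF {print $1}' | grep -qx -- "$env_name"
}

# Usage on the cluster:
#   has_conda_env pyannote && echo "pyannote found" || echo "pyannote missing"
```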
If it doesn't exist, create it and check again:
Make sure you have sox installed (check with which sox). If you don't, run
Connect to the cluster in the terminal. Run VTC. Delete wavs, disconnect.
Copy the all.rttm file (VTC output) back to your computer and run a function that distributes its contents into the individual subjects' folders.
Open two terminal windows. Connect to the cluster in one of them:
On the terminal window that is connected to the cluster:
Create a new folder under /hpc/group/bergelsonlab/VTC/wavs. We'll use wav_dir later on, so do define this variable. It's better to create a unique folder each time you do this process.
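A minimal sketch of one way to get a unique folder and define wav_dir in one go. The helper name make_wav_dir and the timestamp-plus-process-ID naming scheme are assumptions made for illustration; the lab's own commands may name the folder differently.

```shell
# Hypothetical helper: create a uniquely named folder under a parent
# folder and print its path. The naming scheme is an assumption.
make_wav_dir() {
    local parent="$1"
    local dir
    # Timestamp plus shell process ID keeps repeated runs from colliding.
    dir="$parent/$(date +%Y%m%d_%H%M%S)_$$"
    # Plain mkdir (no -p) on the last level fails if the folder already exists.
    mkdir -p "$parent" && mkdir "$dir" && printf '%s\n' "$dir"
}

# Usage on the cluster:
#   wav_dir=$(make_wav_dir /hpc/group/bergelsonlab/VTC/wavs)
#   echo "$wav_dir"
```

Printing the path at the end makes it easy to copy into the local terminal window for the file transfer.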
On your terminal window that is still local to your computer:
On the terminal window that is connected to the cluster:
Activate conda environment "pyannote": conda activate pyannote
Check that all the files are wav files sampled at 16 kHz:
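The check could look roughly like the sketch below. It assumes soxi (installed together with sox) is on the PATH and uses its -t (file type) and -r (sample rate) options. The helper name check_wavs is hypothetical.

```shell
# Hypothetical helper: report every file in a folder that is not a wav
# file sampled at 16000 Hz. Returns non-zero if any file fails the check.
check_wavs() {
    local dir="$1" status=0 f ftype rate
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        ftype=$(soxi -t "$f" 2>/dev/null)   # detected file type, e.g. "wav"
        rate=$(soxi -r "$f" 2>/dev/null)    # sample rate in Hz
        if [ "$ftype" != "wav" ] || [ "$rate" != "16000" ]; then
            echo "PROBLEM: $f (type: ${ftype:-unknown}, rate: ${rate:-unknown})"
            status=1
        fi
    done
    return "$status"
}

# Usage on the cluster:
#   check_wavs "$wav_dir" && echo "All files are 16 kHz wavs"
```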
Change into the VTC_repo folder with
Switch to a gpu-enabled "computer":
Start VTC and wait (~15 minutes per file but can vary a lot):
Check that everything went well. Either open the error.log and output.log files in FileZilla (right-click and choose "View or edit"; don't double-click, which may be your instinct) or use less on $error_log and $output_log to view the log files in the terminal window (run less $error_log to open, press [Q] to exit the viewer).
Here is what your error.log should look like:
Where <N> is the number of files you were processing. And here is output.log:
Once the job is finished and if there were no errors:
Copy the output to $wav_dir.
Back on your computer:
cd into that folder in the terminal.
Run vihi distribute-all-rttm from your command line (this is a function in blabpy) and check the output.
Set a few variables and make a new folder on the cluster.
If not on a Duke computer: substitute your Net ID for $USER in the first line.
Copy the wav files (with VIHI-formatted names!) to the folder printed (use scp/FileZilla).
Check filetypes (must be "wav") and sampling rates (must be 16000)
Start VTC
After ~15 minutes per file, check error.log (should have six rows with "<N>it") and output.log (look for "Took X sec ...").
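The two log checks can be scripted along these lines. This is only a sketch: the helper name check_vtc_logs is hypothetical, and it does a loose substring match on "<N>it" and "Took", so reading the logs by eye remains the authoritative check.

```shell
# Hypothetical helper: n_files is the number of wav files processed (<N>).
check_vtc_logs() {
    local error_log="$1" output_log="$2" n_files="$3" n_rows
    # Count the rows of error.log that contain "<N>it".
    n_rows=$(grep -c -- "${n_files}it" "$error_log" 2>/dev/null) || true
    if [ "${n_rows:-0}" -ne 6 ]; then
        echo "error.log: expected 6 rows with '${n_files}it', found ${n_rows:-0}"
        return 1
    fi
    # output.log should report how long the run took.
    if ! grep -q "Took" "$output_log" 2>/dev/null; then
        echo "output.log: no 'Took X sec' line found"
        return 1
    fi
    echo "Logs look fine"
}

# Usage on the cluster, for a run with 3 wav files:
#   check_vtc_logs "$error_log" "$output_log" 3
```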
If all is good, copy the output and delete the wavs:
Copy (scp/FileZilla) all.rttm from wav_dir to a new folder on your local computer (not on PN-OPUS) and cd into that folder.
Run vihi distribute-all-rttm.
Copy the wav file(s) to the cluster from your computer's shell. This is necessary because the cluster doesn't have access to PN-OPUS. Use FileZilla or scp, see
Copy the wav files to wav_dir. Use FileZilla (by dragging from PN-OPUS and dropping into the new wav_dir you just made) or scp, see
Continue only if all is good!
Make an empty folder and copy the all.rttm file from wav_dir to it (use scp or FileZilla).