About RecSpe - Automatic Speaker Recognition Toolkit

This application provides speaker recognition and segmentation by running experiments, implemented as shell scripts, in separate processes. It is based on the Qt Plugin System, so it can be easily extended. As a result, it consists of many libraries, which are located in the lib directory of the application and often depend on other libraries. The most significant ones are LNKnet (MIT, USA) and HTK (Cambridge University, England). Both can be downloaded free of charge, but you have to register to get HTK. Besides these, the program uses the following libraries: the ALSA sound system, which is part of the Linux kernel; libasound2, which gives programs access to ALSA; libresample (CCRMA, Stanford, USA, with the Asterisk modification); sndlib (CCRMA, Stanford, USA); and the MPEG Audio Decoder library (Underbit Technologies).

Configuration of application

The application is configured through XML files. The default configuration file is conf.xml, which is validated against the conf.xsd file. These files contain the following elements:

parametrization element specifies all parametrizer libraries. It has one attribute, algorithm, which gives the name of the selected algorithm, and it consists of parametrizer elements. Each parametrizer element has one attribute, name, which gives the name of a library stored in the lib directory. This program ships with one such library, libparamdefault.so, which provides a dialog for changing its parameters.

classifiers element holds all classifier libraries. It consists of classifier elements. Each classifier element has one attribute, name, which gives the name of a library stored in the lib directory. It also contains param elements that hold the parameters of that classifier: the name attribute gives the parameter name and the element's content gives the parameter value. This program ships with the following libraries:
Gaussian Mixture Model: libgmm.so
Hidden Markov Model: libhmm.so
Multi Layer Perceptron: libmlp.so
Nearest Cluster: libncclassifier.so
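
The classifiers structure described above can be read with a few lines of code. The following is only a sketch using Python's standard xml.etree.ElementTree module, not part of RecSpe itself. The element and attribute names follow the description above; the parameter name mixtures and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample fragment following the structure described in this manual.
# The parameter "mixtures" and its value are hypothetical examples.
SAMPLE = """
<classifiers>
  <classifier name="libgmm.so">
    <param name="mixtures">32</param>
  </classifier>
  <classifier name="libhmm.so"/>
</classifiers>
"""

def classifier_params(xml_text):
    """Map each classifier library name to a dict of its parameters."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for classifier in root.iter("classifier"):
        result[classifier.get("name")] = {
            p.get("name"): (p.text or "").strip()
            for p in classifier.findall("param")
        }
    return result

params = classifier_params(SAMPLE)
print(params)
```

Parameter values are kept as strings here, because the manual does not say which type each parameter has.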

clusters element holds all cluster libraries. It consists of cluster elements, each with one attribute, name, which gives the name of a cluster library in the lib directory. A cluster library provides at least one clustering algorithm. The parameters of each algorithm are specified in a params element whose name attribute equals the algorithm name. Each parameter is then held in a param element, with the parameter name in the name attribute and the parameter value as its value.

train element holds all training audio files used in recognition. It consists of file elements whose values are absolute or relative paths to the files.
test element holds all testing audio files used in recognition. It consists of file elements whose values are paths to the files. It has one attribute, seconds, which tells whether the duration of the testing audio files is limited. If its value is non-zero, the duration is limited; otherwise it is not.
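
As an illustration of the train and test elements, the sketch below reads both file lists and interprets the seconds attribute. It uses Python's standard library and is not part of RecSpe. The root element name conf and the file paths are hypothetical, and the code assumes that a non-zero seconds value is an integer number of seconds to use from each testing file.

```python
import xml.etree.ElementTree as ET

# Hypothetical root element name and file paths, for illustration only.
SAMPLE = """
<conf>
  <train>
    <file>data/anna_01.pcm</file>
    <file>/corpus/boris_01.pcm</file>
  </train>
  <test seconds="10">
    <file>data/anna_02.pcm</file>
  </test>
</conf>
"""

def read_file_lists(xml_text):
    """Return (training files, testing files, duration limit or None)."""
    root = ET.fromstring(xml_text.strip())
    train = [(f.text or "").strip() for f in root.findall("train/file")]
    test_el = root.find("test")
    test = [(f.text or "").strip() for f in test_el.findall("file")]
    # Assumption: the attribute holds whole seconds; zero means "no limit".
    seconds = int(test_el.get("seconds", "0"))
    limit = seconds if seconds != 0 else None
    return train, test, limit

train_files, test_files, limit = read_file_lists(SAMPLE)
print(train_files, test_files, limit)
```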

sound element holds all sound libraries. It consists of soundLib elements, each with an attribute, name, which equals the name of a sound library in the lib directory.
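
To show how the elements above fit together, here is a minimal sketch that builds a configuration skeleton with Python's standard xml.etree.ElementTree module. Several details are assumptions and not taken from conf.xsd: the root element name conf, the order of the elements, the algorithm name default, the library names libcluster.so and libsound.so, the clustering algorithm kmeans with its parameter k, and storing a cluster parameter value as element text. A file produced this way should still be validated against conf.xsd.

```python
import xml.etree.ElementTree as ET

def build_config():
    """Build a configuration skeleton with the six elements described above."""
    root = ET.Element("conf")  # hypothetical root element name

    par = ET.SubElement(root, "parametrization", algorithm="default")  # hypothetical algorithm
    ET.SubElement(par, "parametrizer", name="libparamdefault.so")

    classifiers = ET.SubElement(root, "classifiers")
    ET.SubElement(classifiers, "classifier", name="libgmm.so")

    clusters = ET.SubElement(root, "clusters")
    cluster = ET.SubElement(clusters, "cluster", name="libcluster.so")  # hypothetical library
    params = ET.SubElement(cluster, "params", name="kmeans")  # hypothetical algorithm
    ET.SubElement(params, "param", name="k").text = "8"  # value as text: an assumption

    ET.SubElement(root, "train")
    ET.SubElement(root, "test", seconds="0")  # zero: no duration limit

    sound = ET.SubElement(root, "sound")
    ET.SubElement(sound, "soundLib", name="libsound.so")  # hypothetical library
    return root

config = build_config()
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```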

Recognition experiments

To perform a recognition experiment, you have to set up the following:

Optional are:

Segmentation

To perform segmentation of one selected file, you have to set up the following:

Optional are:

Training files setup

To set up the list of training files, click the Add voices [F10] button next to the Trained voices label. A dialog opens for adding new files, which can have the suffix list or pcm. Find the directory with the corpus files and select the pcm files, or select a list file in the data directory, and press Open. Afterwards you can remove some or all of the files in the Trained voices list. To remove all files, click the trash icon {14}. To remove some files, select them in the list and press the Delete key. You can also move files from Trained voices to Tested voices by selecting them and clicking the > button.

Testing files setup

To set up the list of testing files, click the Add voices [F11] button next to the Tested voices label. A dialog opens for adding new files, which can have the suffix list or pcm. Find the directory with the corpus files and select the pcm files, or select a list file in the data directory, and press Open. Afterwards you can remove some or all of the files in the Tested voices list. To remove all files, click the trash icon {18}. To remove some files, select them in the list and press the Delete key. You can also move files from Tested voices to Trained voices by selecting them and clicking the < button.

Parametrization algorithm setup

To set up the parametrization algorithm, select it from the Extraction coefficients box {19}. To change its parameters, click the Change parameters [Ctrl+Alt+P] button below the algorithm box. If the library provides a parameter dialog, it opens with boxes for editing the values; otherwise an error message reports that no dialog is available for these parameters. To apply your changes, set the values in the boxes and click the Close button. To discard your changes, close the dialog by clicking the cross on the window border.

Classification algorithm setup

To set up the classification algorithm, select it from the box {11}. To change its parameters, click the Change parameters [Ctrl+Alt+C] button below the box with the algorithm name. If the library provides a parameter dialog, it opens with boxes for editing the values; otherwise an error message reports that no dialog is available for these parameters. To apply your changes, set the values in the boxes and click the Close button. To discard your changes, close the dialog by clicking the cross on the window border.

Result file setup

To set up a result file, click the Select Result File button; a file selection dialog opens. In the combo box labeled Files of type, select the file type to be stored. The provided file types are:

Open-set or Closed-set classification setup

To set up open-set classification, select the Open Set [Ctrl+Alt+O] radio button and set the threshold value. The threshold is the percentage of classified vectors, out of all vectors in the file, below which the speaker is recognized as Unknown. To set up closed-set classification, select the Closed Set [Ctrl+Alt+S] radio button.
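
The open-set threshold can be illustrated with a short sketch. This is not RecSpe's implementation; it assumes that every feature vector of a tested file is assigned to one speaker and that the candidate is the speaker with the most vectors. The function name decide_speaker and the speaker names are hypothetical.

```python
from collections import Counter

def decide_speaker(vector_labels, threshold_percent):
    """Return the winning speaker, or "Unknown" if its share is below the threshold.

    vector_labels holds one speaker label per feature vector of the file.
    """
    if not vector_labels:
        return "Unknown"
    speaker, count = Counter(vector_labels).most_common(1)[0]
    share = 100.0 * count / len(vector_labels)
    return speaker if share >= threshold_percent else "Unknown"

# Six of ten vectors are assigned to "anna", so her share is 60 percent.
labels = ["anna"] * 6 + ["boris"] * 4
print(decide_speaker(labels, 50))
print(decide_speaker(labels, 70))
```

With a threshold of 50 the file is attributed to anna; with a threshold of 70 her share of 60 percent is too low and the result is Unknown. In closed-set mode no such check applies.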

Testing files duration setup

To limit the duration of the testing files, check the Seconds of Tested File box and set the value in the box to the number of seconds you want to use for recognition.

Selection of segmentation file

To select the file to segment, click the Open Audio File [F12] button, select the file in the dialog, and click Open. The duration of the file is then displayed. The combo box in the audio input frame should be set to File. To use the microphone as the source instead, set the combo box to Microphone.

Setup output pcm file

To set up the output file of a segmentation run, check the Save the output audio box and click the Select path [Ctrl+H] button to choose the audio file name. A dialog opens for selecting the output file; after selecting the file, click the Save button.

Playing audio

To play the audio file selected with the Open Audio File [F12] button, click the button with the play icon [Ctrl+P]. To stop playback, click the stop icon [Ctrl+X].

Showing recognition results

To view recognition results, use the Result console, or click the button [Ctrl+W] to show tables with the complete results and a confusion matrix.

Save results from results console

To save the results from the results console, click the Save log [F5] button and select the file to save to.

Reset console

To reset the console, click the Reset results [F2] button.

Loading and storing configuration

To load or store an XML configuration file, click the Load Conf [F4] or Save Conf [F3] button, respectively, and select the file you want to load or save.

Visualization of segmentation

To visualize segmentation results stored in a results file with the suffix res, click the Visualize [F6] button. A dialog opens with components that show the results. In the dialog, load the results file by clicking the Load button and selecting the file in the Open File dialog. If you select a file with the correct structure, the waveform is displayed colored by speaker. To play the file, click the play button; to stop, click the stop button.

Add new speaker

To add a new speaker to the corpus, click the Add [F7] button. A dialog opens for this purpose. Select the directory in which to store the pcm files with the recorded sentences, specify the speaker name, and select the file with the sentences to be recorded. Once the sentence file is loaded, a sentence appears in the text box, and you can start recording the speaker by clicking the rec icon. To stop recording, click the stop button; to play the recording back, click the play button. When a sentence has been recorded, continue to the next one by clicking the next icon. To redo the recording of the previous sentence, click the previous button and record it again.