Software


Enhanced Local Binary Patterns (E-LBP) method

This is an implementation of a novel automatic face recognition approach based on local binary patterns (LBP). The LBP descriptor computes its features from the local neighbourhood of a pixel. The original operator is not very robust to image noise, image variations and differing illumination conditions. Our method addresses these issues by extending the original LBP operator: it considers more pixels and different neighbourhoods when computing the feature vector. We call the result the enhanced local binary patterns (E-LBP) method.
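To make the starting point concrete, the following is a minimal sketch of the original 3 x 3 LBP operator only, not of E-LBP itself. The function names, the clockwise bit ordering and the "greater than or equal" threshold are assumptions chosen for illustration; the released source code may differ in all of these respects.

```python
def lbp_code(image, row, col):
    """Return the 8-bit basic LBP code of the pixel at (row, col).

    `image` is a list of rows of grayscale intensities. The pixel must
    not lie on the image border.
    """
    # Neighbour offsets, clockwise from the top-left corner.
    # This ordering is an assumed convention, not taken from the paper.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return the 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


sample = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
print(lbp_code(sample, 1, 1))
```

Each code depends on only eight direct comparisons with the centre pixel, so a single noisy pixel can flip bits. This is the sensitivity that the extension to more pixels and different neighbourhoods is meant to reduce.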

We evaluated this method on two benchmark corpora, the UFI and FERET face datasets. Our experiments show that the approach significantly outperforms several other state-of-the-art methods and is particularly effective in real conditions, where the above-mentioned issues are pronounced.

For further information about the E-LBP approach, see the paper below. Please cite this paper when you use this source code.



RecSpe - Automatic Speaker Recognition Toolkit

RecSpe is a toolkit for automatic speaker recognition. It provides the following main functionality: speech recording; speech parametrisation (MFCC, PLPC, LPREFC, LPCEPSTRA and Discrete Wavelet Transform algorithms are available); speaker classification (training and testing procedures; several classifiers such as GMM, MLP, etc.); speaker segmentation (when more than one speaker is speaking). RecSpe is based on the Qt plug-in system, so its functionality can be easily extended.



jDALabeler - tool for Dialog Act Corpus Labeling

jDALabeler is a tool for manual Dialog Act (DA) corpus labeling. Dialog acts are saved in predefined schemes (Meeting Recorder Dialogue Act, Verbmobil, etc.). jDALabeler also allows users to create additional DA schemes if necessary.



AutoFaceRec - Automatic Face Recognition System

Automatic Face Recognition System (AutoFaceRec) is a toolkit designed for face detection and automatic recognition in real-world photographs, that is, ordinary photographs that were not acquired in a controlled environment. The quality of such photographs is significantly lower than that of the photographs usually used for testing Automatic Face Recognition (AFR) methods. Faces in these photographs are often rotated, tilted or occluded, and the pose is not uniform, so recognition is very difficult. The toolkit allows users to build a fully automated face recognition system from the following modules, depending on their needs. Five main modules are implemented:



Corpora

Czech Text Document Corpus v 2.0

General Information

Czech Text Document Corpus v 2.0 is a collection of text documents for automatic document classification in the Czech language. It is composed of 11,955 text documents provided by the Czech News Agency (CTK) and is freely available for research purposes. The corpus was created to facilitate a straightforward comparison of document classification approaches on Czech data. It is particularly suited to the evaluation of multi-label document classification approaches, because a document is usually labelled with more than one label. Besides the information about the document classes, the corpus is annotated at the morphological layer.

This corpus was created from Czech Text Document Corpus v 1.0, which lacked the morphological annotation and a development set.

Technical Details

The text documents are stored in individual text files using UTF-8 encoding. Each filename is composed of a serial number and a list of category abbreviations, separated by the underscore symbol, followed by the .txt suffix. Serial numbers are composed of five digits, and the numerical series starts from one. For instance, the file 00046_kul_nab_mag.txt represents document number 46, annotated with the categories kul (culture), nab (religion) and mag (magazine selection).
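The naming scheme can be decoded with a few lines of code. The following is a minimal sketch; the function name parse_document_name is hypothetical and is not part of any tool distributed with the corpus.

```python
from pathlib import Path


def parse_document_name(filename):
    """Split a corpus filename into (serial number, category abbreviations).

    Expects names of the form <5-digit serial>_<cat>_..._<cat>.txt.
    """
    name = Path(filename).name  # drop any leading directories
    if not name.endswith(".txt"):
        raise ValueError(f"not a .txt document file: {filename}")
    parts = name[: -len(".txt")].split("_")
    serial, categories = parts[0], parts[1:]
    if len(serial) != 5 or not serial.isdigit():
        raise ValueError(f"serial number must have five digits: {filename}")
    return int(serial), categories


print(parse_document_name("00046_kul_nab_mag.txt"))
```

Because the labels are encoded in the filename, the multi-label ground truth for a document can be recovered without opening the file.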

The content of each document, i.e. its word tokens, is stored on a single line, with tokens separated by spaces. Every text document also has a lemmatized form, stored in a file with the suffix .lemma, and is further associated with its POS tags, stored in a .pos file.
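Reading a document therefore amounts to decoding the file as UTF-8 and splitting on whitespace. The sketch below illustrates this; the function name read_tokens is hypothetical.

```python
from pathlib import Path


def read_tokens(path):
    """Return the word tokens of one corpus document as a list of strings."""
    text = Path(path).read_text(encoding="utf-8")
    # Splitting on any whitespace also discards a trailing newline, if present.
    return text.split()
```

Whether the .lemma and .pos files use the same single-line, space-separated layout is an assumption here. If they do, the same function could be applied to them and the three resulting lists aligned position by position.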

Download

This dataset is licensed under the Attribution-NonCommercial-ShareAlike 3.0 Unported License. Commercial use in any form is excluded. For further information about this corpus, please see the paper below:

  • P. Kral, L. Lenc, Czech Text Document Corpus v 2.0, arXiv preprint arXiv:1701.03849.
  • Please cite this paper when you use this corpus in your experiments.


Real Face Recognition Corpus (REFARECO) v 1.0

The Real Face Recognition Corpus (REFARECO) is a set of real-world photographs randomly selected from the large Photobank of the Czech News Agency. It is intended for the evaluation of face detection and automatic face recognition algorithms. It is composed of images of individuals taken in an uncontrolled environment. All images were obtained over a long time period (20 years or more). The corpus contains grayscale images of 561 individuals, each 384 x 384 pixels in size. At least 10 images are available for each person.

This corpus is available free of charge for research purposes only. Commercial use in any form is strictly excluded.

Because of the large corpus size, only a sample of the corpus can be downloaded directly. The whole corpus will be sent on DVD upon request to the authors: llenc@kiv.zcu.cz or pkral@kiv.zcu.cz.