This document is the technical documentation accompanying the OCTAVE deliverable D22 ‘Methods for environmental robustness’. The deliverable consists of software modules, including code and wrapper scripts, together with this supporting document. The document describes the modules, their formal evaluation, and the code framework used to run a large number of experiments, from whose results an optimum combination of methods is selected and then evaluated. The main purpose of D22 is to provide algorithmic modules that improve the performance of the OCTAVE TBAS platform in real-world, often adverse, environments where noise and other distortions are likely to be encountered. The main focus is on front-end processing: robust voice activity detection (VAD), robust feature extraction, speech enhancement and noise characterization. Model-domain acoustic normalization, score normalization, and data collection using a throat microphone are also investigated. The back-end automatic speaker verification system is the one from D11 (single mode voice biometric engines), and it is used to evaluate the performance of each module. A large array of noise-robustness algorithms and an optimised end-to-end system have been evaluated on the standard RSR2015 database, to which a variety of noise types were manually added and a speech codec applied. Several speech enhancement algorithms are also evaluated through subjective listening tests. The experiments and the comparison of a wide range of methods for environmental robustness provide a solid basis and guidance for tailoring the TBAS platform to various application scenarios. The software modules form a pool of code that can facilitate the execution of trials, the implementation of robust voice biometric engines and the development of real applications for user validation.
Source: WP 3 Robustness in Speaker Verification
Dissemination level: Public