Technologies
The Project uses technologies in the areas of Signal Processing, Voice Processing, Secure Identity Management, and Component Integration and Networking.
Countermeasures help a voice biometrics system maintain high performance when it is subjected to malicious attacks, as the performance comparison shows. With the introduction of advanced countermeasure modules, OCTAVE aims to almost completely eliminate the degradation caused by malicious attacks, bringing the voice biometrics system back to state-of-the-art performance. The following figure compares the performance of a voice biometrics system under attack: with no countermeasures (red line), with standard countermeasures (blue line), and with advanced countermeasures (green line).
In the OCTAVE Platform, spoofing detection is performed in parallel with automatic speaker verification. Innovative speech features based on perceptual characteristics have been introduced, with very promising results. Specific algorithms have been considered for each of the different spoofing technologies.
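The parallel arrangement can be illustrated with a minimal Python sketch. It is not the OCTAVE implementation: the `asv_scorer` and `cm_scorer` callables, the thresholds, and the rule of accepting only when both checks pass are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Decision:
    accepted: bool
    reason: str


def verify(utterance, asv_scorer, cm_scorer, asv_threshold=0.0, cm_threshold=0.0):
    """Run speaker verification and spoofing detection side by side.

    asv_scorer and cm_scorer are hypothetical callables: each maps an
    utterance to a score, where higher means "target speaker" and
    "genuine (non-spoofed) speech" respectively.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        asv_future = pool.submit(asv_scorer, utterance)
        cm_future = pool.submit(cm_scorer, utterance)
        asv_score = asv_future.result()
        cm_score = cm_future.result()

    # Assumed fusion rule: both subsystems must pass their threshold.
    if cm_score < cm_threshold:
        return Decision(False, "spoofing suspected")
    if asv_score < asv_threshold:
        return Decision(False, "speaker mismatch")
    return Decision(True, "accepted")
```

With this rule, an utterance that matches the claimed speaker is still rejected if the countermeasure score falls below its threshold. Other fusion schemes, such as combining the two scores before thresholding, are equally possible.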

Voice biometrics interfaces have three main modes of operation.
- In the fixed-passphrase mode the user is asked to say a fixed passphrase or password, which they have to know in advance and remember. The speaker acoustic models used in this mode are trained with enrolment recordings of the same fixed passphrase. This mode of operation offers high speaker verification accuracy; however, it is vulnerable to spoofing attacks such as audio replay.
- In the text-dependent mode the user is asked to read a prompted message, which is usually randomly selected from a list of utterances. The speaker acoustic models used in this mode are trained with enrolment recordings of the same (i.e. text-dependent) utterances. This mode of operation offers good speaker verification accuracy and is less vulnerable to spoofing attacks than the fixed-passphrase mode.
- In the text-independent mode the user is asked to read a prompted message, which can be produced by a random word sequence generator. The speaker acoustic models used in this mode are trained with enrolment recordings of different (i.e. text-independent) utterances. This mode of operation offers lower speaker verification accuracy; however, it is robust to spoofing attacks such as synthetic speech and voice conversion (audio replay attacks are practically not applicable in this case).
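As a rough illustration of the random word sequence generator mentioned for the text-independent mode, the Python sketch below draws a prompt from a word list. The function name, the vocabulary and the default use of `random.SystemRandom` (so that prompts are hard to predict) are assumptions, not details of the OCTAVE Platform.

```python
import random


def random_prompt(vocabulary, n_words, rng=None):
    """Build a prompt of n_words words drawn at random from vocabulary.

    rng is optional; by default an OS-backed generator is used so that
    an attacker cannot easily predict the next prompt.
    """
    if rng is None:
        rng = random.SystemRandom()
    return " ".join(rng.choice(vocabulary) for _ in range(n_words))


# Hypothetical vocabulary, for illustration only.
words = ["river", "seven", "orange", "window", "market", "silver"]
print(random_prompt(words, 4))
```

Because the prompt changes on every attempt, a previously recorded utterance is unlikely to match it, which is why replay attacks are of little use in this mode.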
The main focus has been on front-end processing, specifically robust voice activity detection (VAD), robust feature extraction, speech enhancement and noise characterization. Further, model-domain acoustic normalization, score normalization, and data collection using a throat microphone are investigated. To evaluate the performance of these different approaches, a code framework is used that consists of software modules, including code and wrapper scripts.
A GMM-UBM (Gaussian mixture model with universal background model) system is chosen as the back-end automatic speaker verification system and is used to evaluate the performance of each module.
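In a GMM-UBM back end, a trial is commonly scored as the average per-frame log-likelihood ratio between the speaker model and the universal background model. The Python sketch below shows that scoring step only, assuming diagonal-covariance mixtures and tiny hand-made models; how the project trained and adapted its models is not described here and is not shown.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm), where gmm is a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)


# Toy one-dimensional models, for illustration only.
ubm = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
speaker = [(0.5, [1.0], [1.0]), (0.5, [4.0], [1.0])]
print(llr_score([[1.0], [1.2]], speaker, ubm) > 0)  # frames near the speaker mean
```

A positive score favours the claimed speaker and a negative score favours the background population. The decision threshold is a tuning parameter.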
A large array of noise-robustness algorithms and an optimised end-to-end system have been evaluated using the standard RSR2015 database, with various types of manually added noise as well as a speech codec. Several speech enhancement algorithms have also been evaluated through subjective listening tests.
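Adding noise to clean recordings is usually done at a controlled signal-to-noise ratio (SNR). The Python sketch below shows one common way of doing this, assuming plain lists of samples. The function names and the choice to loop short noise segments are assumptions, and the SNR levels and noise types used in the evaluation are not specified here.

```python
import math


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def add_noise(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR (dB)."""
    noise = list(noise)
    if len(noise) < len(speech):
        # Loop the noise so that it covers the whole utterance.
        repeats = -(-len(speech) // len(noise))  # ceiling division
        noise = noise * repeats
    noise = noise[: len(speech)]

    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment is silent")

    # Choose the gain so that p_speech / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic signals, for illustration only.
speech = [math.sin(0.1 * i) for i in range(200)]
noise = [math.cos(1.7 * i) for i in range(50)]
noisy = add_noise(speech, noise, snr_db=10.0)
```

The gain is derived from the powers of the speech and of the truncated noise segment, so the SNR of the resulting mixture matches the requested value over the whole utterance rather than per frame.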
The audio signal of a voice utterance recorded with the normal audio microphone (top) and the throat microphone (bottom).
A throat microphone can be used for voice activity detection, which also makes it possible to remove ambient noise from the audio signal.
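One simple way to realise this idea is to detect speech frames from the energy of the throat channel and keep only the corresponding frames of the audio channel. The Python sketch below assumes time-aligned channels, non-overlapping frames and a fixed energy threshold. These are illustrative choices, not the method developed in the project.

```python
def frame_energies(signal, frame_len):
    """Mean energy of consecutive non-overlapping frames."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def throat_vad(throat, frame_len, threshold):
    """Flag a frame as speech when the throat-channel energy exceeds threshold."""
    return [e > threshold for e in frame_energies(throat, frame_len)]


def keep_speech_frames(audio, speech_flags, frame_len):
    """Keep only the audio-microphone samples from frames flagged as speech."""
    kept = []
    for idx, is_speech in enumerate(speech_flags):
        if is_speech:
            kept.extend(audio[idx * frame_len:(idx + 1) * frame_len])
    return kept


# Toy signals: the speaker is active only in the middle frame.
throat = [0.0] * 4 + [1.0] * 4 + [0.0] * 4
audio = [0.1] * 4 + [0.6] * 4 + [0.1] * 4  # ambient noise is present throughout
flags = throat_vad(throat, frame_len=4, threshold=0.5)
speech_only = keep_speech_frames(audio, flags, frame_len=4)
```

Because the throat microphone picks up little ambient sound, its energy is a comparatively clean indicator of when the wearer is speaking. Frames that contain only noise in the audio channel are then discarded.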
APLcomp provided research equipment to AAU and UEF, who used it to study and develop methods for throat-microphone-based speaker verification.
This research demonstrated the considerable improvement potential of dual-microphone technology.
Building on this experience, APLcomp developed a pre-commercial hardware product called DualMic, which integrates the voice capture tasks on a single board. It can use commodity-quality audio and throat microphones to output high-quality stereo voice.
DualMic can be used for speaker verification and possibly for speech enhancement applications.
Research data acquisition equipment (left) and single-board DualMic prototype (right).