Work Package 3

This work package will identify a set of baseline automatic speaker verification (ASV) systems. Whereas all components form part of the background technology, they will be enhanced as part of the collaborative effort in WP3 with the central aim of improving ASV robustness to varying acoustic environments and threats from spoofing. Outcomes will take the form of pluggable software modules for integration into the OCTAVE Trusted Biometric Authentication Service (TBAS).


WP3 aims to establish a set of TBAS enhancement modules necessary to ensure (i) its proper functioning in relevant operational environments and (ii) protection from spoofing attacks. The objectives are to:

  • identify existing, baseline text-independent and text-dependent voice biometric systems to meet OCTAVE application requirements;
  • assess the vulnerability of such systems to diverse spoofing attacks;
  • innovate new countermeasures to improve their robustness to spoofing attacks;
  • deliver new solutions to ensure the proper functioning of speaker recognition in the face of variable acoustic conditions; and
  • establish specific technology and modules for integration into the OCTAVE platform.


To embrace the widest possible array of application scenarios, OCTAVE targets unsupervised speaker verification scenarios. In such cases the ambient conditions are uncontrolled and there is potential for systems to be attacked through spoofing.

Unsupervised settings entail unknown acoustic conditions involving varying amounts and types of background and channel noise. Background noise results in a lower signal-to-noise ratio and also leads to increased vocal effort. Both result in speech signals whose acoustic properties can differ significantly from those at enrolment. Channel variation can also introduce acoustic mismatch which generally degrades speaker recognition performance. Thus, noise reduction, speech enhancement and acoustic normalization techniques will be integrated into the OCTAVE platform so as to ensure satisfactory performance for all operating conditions.
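
To make the signal-to-noise ratio mentioned above concrete, the following is a minimal sketch of how SNR in decibels could be computed when the speech and noise components are available separately, as in a simulation with artificially added noise. The function name `snr_db` and the toy signals are illustrative assumptions and are not part of the OCTAVE platform; estimating SNR from an already-mixed recording is a harder problem that this sketch does not address.

```python
import math

def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from separate speech and noise samples.

    Assumes both inputs are non-empty sequences of samples and that the
    noise is not identically zero (hypothetical helper for illustration).
    """
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(speech_power / noise_power)

# Toy example: the noise amplitude is ten times smaller than the speech
# amplitude, so the power ratio is about 100, i.e. roughly 20 dB.
clean = [1.0, -1.0, 1.0, -1.0]
background = [0.1, -0.1, 0.1, -0.1]
print(round(snr_db(clean, background), 2))
```

A louder background lowers this value, which is the sense in which background noise degrades the conditions relative to those at enrolment.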

Almost all biometric systems are vulnerable to spoofing, whereby a malicious user masquerades as a legitimate user in order to gain access to systems or services to which they are not entitled. Automatic speaker verification systems are no exception, and a growing body of work shows that, when presented with spoofed speech, the performance of such systems can degrade to below that expected by chance. Spoofing is a genuine threat to security in biometric access control applications and needs to be properly analysed and effectively addressed. Emerging anti-spoofing technology, which aims to counter the threat of spoofing in automatic speaker verification, will be integrated into the OCTAVE platform. This will entail innovative, automatic detection approaches which aim to decide whether a given speech signal is the result of a genuine access attempt or of an illegitimate one stemming from replayed, converted or synthesized speech.


Baseline voice biometric systems and their vulnerability to circumvention

  • Major state-of-the-art ASV systems have been identified in order to assess their vulnerability to diverse spoofing attacks: replay, speech synthesis and voice conversion.
  • ASV performance degradation as a result of varied spoofing attacks (ASVspoof) has been evaluated under different conditions.

New countermeasures against spoofing

  • Robustness to spoofing, with existing countermeasures: explicit (detection) and implicit (prevention)
  • Characterization of performance for known and unknown attacks
  • Greater focus on replay detection: the most probable attack and one that is more difficult to assess, addressed by means of new evaluation, testing and training data generated within the project

Unprecedented solutions to noise-robust automatic speaker verification

  • multi-stage noise compensation, including SNR estimation, speech enhancement, spectral feature extraction and voice activity detection
  • model-domain acoustic normalisation modules
  • multiple simulations to find the best-performing NR method at each stage in the system: the best method from each stage is then selected to form the best system
  • text-dependent speaker recognition results in terms of EER (in %) for matched and mismatched conditions using the GMM-UBM system
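
For readers unfamiliar with the equal error rate (EER) used to report the results above, the following is a minimal sketch of how it can be estimated from verification scores. The EER is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The function name `equal_error_rate`, the simple threshold sweep and the toy scores are illustrative assumptions; this is not the evaluation code used within the project, and the scores are not project results.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER and its threshold by sweeping all observed scores.

    A trial is accepted when its score is >= the threshold. The function
    returns (eer, threshold) at the threshold where the false acceptance
    and false rejection rates are closest, with the EER taken as their mean.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptance: impostor (non-target) trials that are accepted.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: genuine (target) trials that are rejected.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, threshold)
    return best[1], best[2]

# Toy scores (made up for illustration only).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {100 * eer:.1f}% at threshold {threshold}")
```

With these toy scores, one genuine trial falls below and one impostor trial falls at the threshold of 0.6, so both error rates are 25%. Mismatched conditions typically push genuine and impostor score distributions closer together, which raises the EER.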


The vulnerability of automatic speaker verification (ASV) systems to spoofing attacks is now well acknowledged. If exposed, these vulnerabilities not ...
This document is an accompanying technical documentation to the OCTAVE deliverable D22 ‘Methods for environmental robustness’. The deliverable consists of ...
This deliverable outlines existing countermeasures for the protection of automatic speaker verification technology from spoofing. This can take the form ...
This report provides an overview of speaker verification and its potential vulnerabilities, particularly with reference to the objectives within OCTAVE ...


Nicholas Evans

WP3 Leader and PCB Member

Nicholas Evans is an Assistant Professor at EURECOM where he heads research in Speech and Audio Processing. In addition to other interests in speaker diarization, speech signal processing for mobile device applications and multimodal biometrics, he is studying the threat of spoofing to automatic speaker verification systems and working to develop new spoofing countermeasures. Previously, his work in spoofing was funded by the EU FP7 ICT TABULA RASA project, continuing today through OCTAVE. Together with OCTAVE partner UEF, he co-organised the Spoofing and Countermeasures for Automatic Speaker Verification special session at Interspeech in 2013 and the ASVspoof evaluation at Interspeech in 2015. He was Lead Guest Editor for the IEEE Transactions on Information Forensics and Security special issue in Biometrics Spoofing and Countermeasures and the IEEE SPM special issue on Biometric Security and Privacy. He is a member of the IEEE and its Signal Processing Society and currently serves as an Associate Editor of the EURASIP Journal on Audio, Speech and Music Processing and as an elected member of the IEEE Speech and Language Technical Committee.