Work Package 4

This work package will deliver hybrid approaches to automatic speaker verification (ASV) by combining and fusing the different verification algorithms originating in WP3. Single-mode text-dependent, text-prompted and text-independent ASV, as well as fused-mode (hybrid) ASV systems, will be aligned with the different access control application scenarios and use-case requirements. The work will also consider the impact of each mode on the balance between user convenience and security. The outcome is a set of pluggable software modules for integration into the OCTAVE Trusted Biometric Authentication Service (TBAS).


  • To determine the single and fused (hybrid) modes of speaker verification (SV) that are suitable for different classes of access control applications in practice.
  • To develop the required SV systems in accordance with the identified (selected) modes of operation (single/hybrid) and to determine fusion approaches for hybrid voice biometrics with reference to the application requirements.
  • To produce procedures for the various phases of SV operation, from user enrolment to single- and multi-stage identity verification, that meet the requirements of the considered access control applications.


The aim of this WP is to realise the operational modes of voice biometrics that are identified as appropriate for the considered access control applications. The work therefore involves a thorough evaluation of several conventional operational modes: passphrase-based (text-dependent), text-prompted and text-independent. A particular focus of this WP, however, is the potential advantage offered by combining complementary operational modes, an approach referred to here as hybrid voice biometrics. The final selection of operational modes for the considered applications is based on a comparative analysis of the various single and combined (hybrid) modes.

The following bullets briefly describe the individual modes of operation considered in this WP.

  • Passphrase-based voice biometrics involves assigning a fixed word or sentence to each client during enrolment. This text-dependent approach can generally achieve high recognition accuracy, even when only a small amount of training material is available from each client. Its major disadvantage is vulnerability to replay attacks; nevertheless, it remains an important operational mode in applications where a high level of user convenience is a priority. Passphrase-based voice biometrics using short utterances will therefore be included in this WP in order to determine the recognition accuracy achievable in this case.
  • It should also be noted that, depending on the application, users can be asked (and therefore expected) to keep their spoken passphrases secret. In that case both excellent convenience and high security are achievable, making this mode of operation a suitable choice for that specific scenario.
  • The aim of text-prompted voice biometrics is to significantly reduce the vulnerability of passphrase-based operation to replay attacks. This is achieved through a challenge-response approach: the system selects a phrase or a sequence of phrases, and the client is challenged to speak it when prompted in order to be verified. Systems in this category can be further divided into text-dependent (TD) and text-independent (TI) variants. In the TD case, a number of phrases are registered at the enrolment stage, and in each verification trial the system constructs an arbitrary combination of these phrases as the challenge. In the TI case, the system creates challenges whose textual content is unrestricted.
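As a minimal sketch of the text-dependent, text-prompted case described above, assuming each client has enrolled a small set of phrases (the spoken-digit vocabulary here is purely hypothetical), a challenge can be built by drawing an unpredictable sequence of enrolled phrases for every trial. The names `ENROLLED_PHRASES`, `make_challenge` and `transcript_matches` are illustrative only; the content check assumes a separate speech recogniser, and the speaker-verification score itself is out of scope.

```python
import secrets

# Hypothetical enrolment record: phrases (here, spoken digits) that one client
# registered during enrolment. Illustrative only.
ENROLLED_PHRASES = ["zero", "one", "two", "three", "four",
                    "five", "six", "seven", "eight", "nine"]


def make_challenge(enrolled, length=4):
    """Return a random sequence of distinct enrolled phrases to prompt."""
    if length > len(enrolled):
        raise ValueError("challenge is longer than the enrolled phrase set")
    rng = secrets.SystemRandom()  # OS-backed randomness: prompts are hard to predict
    return rng.sample(enrolled, length)


def transcript_matches(challenge, recognised_words):
    """Check the spoken content against the prompt.

    `recognised_words` is assumed to come from a separate speech recogniser;
    the voice (speaker) score is computed elsewhere and combined with this check.
    """
    return [w.lower() for w in recognised_words] == [p.lower() for p in challenge]


if __name__ == "__main__":
    challenge = make_challenge(ENROLLED_PHRASES)
    print("Please say:", " ".join(challenge))
```

Sampling without repetition is an arbitrary choice in this sketch; allowing repeated phrases enlarges the space of possible challenges. Because the prompt changes on each trial, a recording replayed from an earlier session would generally fail the content check.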

The term hybrid voice biometrics is introduced to refer to approaches that involve a fusion of the operational modes described above. This can be highly beneficial in practice, as possible drawbacks associated with individual operational modes can be alleviated through a combined approach.


Determine and develop hybrid SV systems in accordance with the selected modes of operation

The optimal convenience-security trade-off is reached by fusing three operational modes: fixed passphrase, prompted text-dependent and prompted text-independent.
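Deliverable D23 is described as fusing the scores from three modes of speaker verification. One common way to do this is weighted-sum score-level fusion after per-mode normalisation, and the sketch below assumes that scheme. The mode names, normalisation statistics, weights and threshold are all hypothetical, and this is not the OCTAVE fusion architecture, which this page does not specify.

```python
from dataclasses import dataclass


@dataclass
class ModeScore:
    raw: float   # raw score from one single-mode verification engine
    mean: float  # impostor-score mean for this engine (assumed, from dev data)
    std: float   # impostor-score standard deviation (assumed, from dev data)

    def normalised(self) -> float:
        # Z-normalisation puts scores from different engines on a common scale
        return (self.raw - self.mean) / self.std


def fuse(scores: dict, weights: dict) -> float:
    """Weighted-sum fusion of normalised per-mode scores."""
    if set(scores) != set(weights):
        raise ValueError("each mode needs exactly one weight")
    return sum(weights[mode] * scores[mode].normalised() for mode in scores)


def decide(fused_score: float, threshold: float) -> bool:
    """Accept the identity claim if the fused score reaches the threshold."""
    return fused_score >= threshold


if __name__ == "__main__":
    # Hypothetical scores and weights for the three modes named above
    scores = {
        "fixed_passphrase": ModeScore(raw=2.0, mean=0.0, std=1.0),
        "prompted_td": ModeScore(raw=1.5, mean=0.5, std=0.5),
        "prompted_ti": ModeScore(raw=0.8, mean=0.2, std=0.3),
    }
    weights = {"fixed_passphrase": 0.4, "prompted_td": 0.3, "prompted_ti": 0.3}
    fused = fuse(scores, weights)
    print(f"fused score {fused:.2f}, accept={decide(fused, threshold=1.0)}")
```

In practice the weights would be learned on development data (for example, by logistic regression), and the threshold would be tuned to the convenience-security operating point each application requires.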

Determine the single/fused (hybrid) modes of SV for different classes of access control applications

Different single and fused (hybrid) modes of SV have been assessed for different classes of access control applications in order to evaluate their accuracy and vulnerability.
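This page does not state which metrics were used for the assessment. A common choice in ASV is the equal error rate (EER) for accuracy and, for vulnerability, the fraction of spoofed (for example, replayed) trials accepted at the operating threshold. The sketch below assumes those two metrics; the function names and all scores are hypothetical toy values, not OCTAVE results.

```python
def error_rates(genuine, impostor, threshold):
    """False acceptance and false rejection rates at a given threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over every observed score.

    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2, t)
    return best[1], best[2]


def spoof_acceptance_rate(spoof_scores, threshold):
    """Fraction of spoofed (e.g. replayed) trials accepted at the threshold."""
    return sum(s >= threshold for s in spoof_scores) / len(spoof_scores)


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]   # toy target-speaker scores
    impostor = [0.1, 0.2, 0.3, 0.6]  # toy zero-effort impostor scores
    replay = [0.65, 0.5, 0.95, 0.2]  # toy replay-attack scores
    eer, thr = equal_error_rate(genuine, impostor)
    print(f"EER={eer:.2f} at threshold {thr}")
    print(f"replay acceptance={spoof_acceptance_rate(replay, thr):.2f}")
```

Reporting both numbers side by side makes the trade-off visible: a mode can show a low EER against zero-effort impostors while still accepting many replayed trials, which is the weakness of fixed passphrases that the prompted modes are intended to address.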


This deliverable combines much of the work within OCTAVE which has implemented hybrid solutions to automatic speaker verification and also innovative ...
This deliverable focuses on the use of fusion to attain better performance than single-mode speaker verification engines in Automatic Speaker ...
This deliverable represents optimization and evaluation of single-mode voice biometric engines defined in the earlier OCTAVE project deliverable D11 ...
This deliverable, D23, details different hybrid architectures for fusing the scores from three different modes of speaker verification. The suitability ...
This document accompanies the program code of the OCTAVE deliverable D11 ‘Single mode biometric engines’. Here, an engine is understood ...
This deliverable details the key considerations and requirements for the effective realisation and deployment of voice biometrics in the application ...


Iosif Mporas

WP4 Leader and PCB member

Dr Iosif Mporas (male) has been a Senior Lecturer in Information Engineering at the University of Hertfordshire since February 2016. He received the Diploma degree in Electrical and Computer Engineering and the PhD degree in signal processing from the University of Patras, Greece, in 2004 and 2009, respectively. From 2010 until February 2016 he was an Adjunct Assistant Professor at the TEI of Western Greece. He has participated as Senior Researcher or PI in more than 10 FP6, FP7, Life+ and H2020 European R&D projects in the areas of speech and audio processing, spoken and multimodal human-machine interaction, ICT-based biodiversity monitoring, brain and biomedical signal analysis, and healthcare monitoring. His research interests include statistical signal modelling, speech and audio processing, natural language processing, brain and biosignal data processing, data mining and machine learning. He has authored or co-authored more than 80 journal and international conference articles.