This Deliverable opens the phase of the Project dedicated to providing the speech and audio data required to complete the laboratory tests that assess the voice biometrics and enhancement modules of the TBAS, the Trusted Biometric Authentication Platform envisaged by OCTAVE. The subsequent Deliverables, namely “Corpora collection” and “Spoofing corpora”, will close that phase. The Deliverable is organised as described below.

The first chapter introduces “spoken corpora” as a linguistic resource and outlines the main characteristics to look for when considering a corpus for voice biometric assessment. The chapter concludes with an explanation of the importance and function of the present report within the Project.

The second chapter, after framing the role of speech corpora in the research and development of speaker verification technologies, provides a history and review of the activities, organisations and projects devoted to voice corpora or, more generally, to “linguistic resources”. The chapter also discusses and justifies the main characteristics that a speech corpus must have.

The third chapter describes the linguistic resources that are available in OCTAVE and useful to its objectives. The strategies implemented to collect this information and the resulting material are illustrated in detail, first for the speech material, organised by language, and then for the noise corpora. Thereafter, spoofing corpora are introduced and reviewed, as a forward reference to a subsequent Deliverable.

The fourth chapter is of crucial value: here we exploit the concept of a “corpus of corpora”, already announced in the OCTAVE Technical Plan as having a relevant role in the Project. The chapter also defines the standards and formats required for the “corpus of corpora”, together with possible software tools for its management. It then provides a forecast of the corpora that might be used to assess the various test conditions defined in the plan. A preview of the activities planned for the next stages of work, and particularly for the forthcoming “Corpora collection” report, concludes chapter 4.

A concluding section summarises the key points, advantages, risks and mitigation measures of the proposed approach to vocal corpora for the objectives of OCTAVE.

Source: WP 7 Test and Verification

Dissemination level: Confidential. A public version of this report is available as Deliverable D55.