This Deliverable addresses the objective of assembling, in the most appropriate way, the datasets and associated metadata needed to test the modules implemented for the OCTAVE platform (TBAS) and to validate the TBAS as a whole. The Description of Work (DoW), that is, Annex 1 of the Grant Agreement, anticipated a challenging and innovative solution: the “corpus of corpora”. With this Deliverable, that solution materialises as a product implemented using open source software and the necessary hardware resources. The solution rests on two important factors that characterise the “corpus”:

  • the normalisation of the different available datasets (the audio samples of the native corpora) into a unified and very simple format;
  • the organisation of all the associated information (metadata) into an indexing database, which can be accessed through a database management system (MySQL in our case).

Made remotely accessible as a service on the Internet, the “corpus” can be seen as the fuel that powers the test and validation process distributed among several Beneficiaries in the Consortium. Specifically, it will drive trial execution in the laboratory and support the optimisation of the TBAS by collecting data from later user validation activities.

This report describes the database structure and the format of the physical audio files and of the different metadata. Software tools have also been implemented to enrich the description of the speech samples with additional descriptors, such as automatic transcription and quality measures. Specific attention has been devoted to describing the data that will be collected in the field trials and used to validate and evaluate the TBAS with the end-users accessing the pilot services for SEA and FINDOMESTIC. The report closes with a description of the current status of the “corpus” and of its potential use in the second year of the project.

Source: WP 7 Test and Verification

Dissemination level: Confidential

