This project investigates current Speech-To-Text technologies (such as Amazon Transcribe,
Google Cloud Speech-to-Text, Azure Speech and others) to test their viability for use with the
Libraries and Learning Services (LLS) audio-visual archive content. It includes the development
of functionality such as:
(1) Production of searchable text indexes from spoken audio content;
(2) Creation of subtitle files for optional display in video playback;
(3) Enabling jump-to-word playback through text and media timeline linking.
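As a rough illustration of how the three functions above could share one data source, the sketch below assumes the transcription service returns word-level timings. The `WORDS` sample, the tuple layout and the function names are hypothetical and do not reflect any particular vendor's output format. Under that assumption, the same timings can feed an SRT subtitle file, a searchable word index, and jump-to-word playback (by seeking the player to an indexed start time).

```python
from collections import defaultdict

# Hypothetical word-level output: (word, start_seconds, end_seconds).
# Real services return comparable data, each in its own format.
WORDS = [
    ("Kia", 0.0, 0.4),
    ("ora", 0.4, 0.9),
    ("and", 1.0, 1.2),
    ("welcome", 1.2, 1.8),
    ("to", 1.8, 1.9),
    ("the", 1.9, 2.0),
    ("archive", 2.0, 2.7),
]


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(words, words_per_cue=4):
    """Group words into fixed-size cues and render them as SRT text."""
    cues = []
    for n, i in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[i:i + words_per_cue]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word for word, _, _ in chunk)
        cues.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


def build_index(words):
    """Map each lower-cased word to the start times at which it is spoken."""
    index = defaultdict(list)
    for word, start, _ in words:
        index[word.lower()].append(start)
    return dict(index)


if __name__ == "__main__":
    print(to_srt(WORDS))
    # Jump-to-word: look up a term, then seek the player to its start time.
    print(build_index(WORDS).get("archive"))
```

Grouping a fixed number of words per cue is the simplest possible choice; a real subtitle generator would more likely split on pauses or punctuation.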
The project will be carried out in close collaboration with LLS at the University of Auckland,
which will provide the resources needed for the research.
Research Components:
(1) Assessment of various speech-to-text technologies, including cost-vs-benefit evaluation;
(2) Optimisation of these technologies to work with:
    + New Zealand (and other non-US/UK) English accents
    + Māori language recognition (if possible)
    + Variable quality audio content (e.g., broadcast quality materials vs. researcher field recordings)
Implement a software tool for Speech-To-Text conversion, and evaluate the outcome in terms of accuracy, scalability, indexing, etc.
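The project description does not say how accuracy will be measured. One common choice for speech recognition is word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal version under that assumption; the function name and the simple lower-case, whitespace-split normalisation are illustrative choices only.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A per-recording WER of this kind could be compared across services, accents and audio quality levels, although a fair comparison would also need consistent handling of punctuation, numerals and macrons.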
Lab allocations have not been finalised