Multi-Language Audio Annotation for Automatic Speech Recognition Software
Our client, an international internet technology company, was looking for a partner to support a large-volume project: it needed 10,000 audio hours annotated in 20 languages for its automatic speech recognition (ASR) software. Because the timeline for the full scope was tight, the team had to deliver a minimum of 2,000 annotated audio hours per week. The work also had to be done remotely and on the client’s own platform, so the service provider needed both a strong community-sourcing network and the capacity to manage very large teams.
Using our global community database of over 1.3 million contributors, DataForce quickly assembled and trained the teams. Once the project kicked off, the DataForce sourcing team screened between 1,000 and 2,000 applicants per week and onboarded over 100 contributors per day. All participants were then qualified, trained, and assigned to language-specific teams of between 600 and 2,000 annotators each. Our team of over 30 project managers ensured that the full scope of the project was delivered within the agreed timeframe.
Working closely with the client and drawing on our industry knowledge and community resources, we annotated and transcribed audio in 20 languages within one year: 1,000 audio hours for each of 10 languages, 10,000 audio hours for each of four languages, and over 100 audio hours for each of six languages. On completion of the project, we delivered the high-quality transcribed data the client needed to train its ASR software.