Bridging Text & Speech
STEP 1 — THE VOICE
For a virtual advisor, the voice is the face of the brand. Although Crystal had no physical features, the sound of her voice alone needed to create a specific feel, atmosphere, and lasting first impression. To create a synthetic voice, we turned to the vast network of linguists and phoneticians at MediaNEXT and DataForce, and iGenius auditioned samples from a range of voice talents and styles. By comparing these options, along with their inflections and mannerisms, iGenius was able to identify who they wanted Crystal to be.
STEP 2 — THE SCRIPT
DataForce and iGenius worked together to define the overall length of the script, the number of sentences, the speaking duration of each sentence, and, most importantly, a balance of phonemes across the corpus that matched the overall distribution of phonemes in English.
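The phoneme-balancing goal in Step 2 can be illustrated with a minimal sketch. It assumes a tiny, hypothetical pronunciation lexicon (ARPAbet-style symbols) and a short hypothetical reference text standing in for real English phoneme frequencies; a production pipeline would use a full pronunciation dictionary or grapheme-to-phoneme model and measured frequencies. The names `LEXICON`, `REFERENCE_TEXT`, and `select_balanced`, as well as the greedy strategy scored by an L1 distance, are illustrative choices, not the method DataForce or iGenius actually used.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to ARPAbet-style phonemes
# (stress markers omitted). A real pipeline would use a full
# pronunciation dictionary or a grapheme-to-phoneme model.
LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "on": ["AA", "N"],
    "mat": ["M", "AE", "T"],
    "she": ["SH", "IY"],
    "sees": ["S", "IY", "Z"],
    "ships": ["SH", "IH", "P", "S"],
    "big": ["B", "IH", "G"],
    "dog": ["D", "AO", "G"],
}

# Hypothetical reference text standing in for real English phoneme frequencies.
REFERENCE_TEXT = "the cat sat on the mat she sees big ships the dog"


def phonemes(sentence):
    """Return the phoneme sequence for a sentence using the toy lexicon."""
    result = []
    for word in sentence.lower().split():
        result.extend(LEXICON[word])
    return result


def distribution(counts):
    """Normalize phoneme counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


def l1_distance(p, q):
    """Sum of absolute differences between two distributions (0 means identical)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def select_balanced(candidates, target, k):
    """Greedily pick up to k sentences whose combined phoneme
    distribution stays as close as possible to the target."""
    chosen = []
    counts = Counter()
    remaining = list(candidates)
    for _ in range(min(k, len(remaining))):
        # Score each remaining sentence by how far the corpus would be
        # from the target if that sentence were added next.
        best = min(
            remaining,
            key=lambda s: l1_distance(
                distribution(counts + Counter(phonemes(s))), target
            ),
        )
        chosen.append(best)
        counts += Counter(phonemes(best))
        remaining.remove(best)
    return chosen, counts


if __name__ == "__main__":
    target = distribution(Counter(phonemes(REFERENCE_TEXT)))
    candidates = [
        "the cat sat",
        "she sees ships",
        "the big dog",
        "on the mat",
        "the cat sat on the mat",
    ]
    chosen, counts = select_balanced(candidates, target, k=3)
    print("Selected:", chosen)
    print("L1 distance to target:", round(l1_distance(distribution(counts), target), 3))
```

Greedy selection is a simple heuristic for building a phonetically balanced subset; it does not guarantee the best possible subset, and a real script would also have to respect the sentence-count and duration targets described in this step.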
STEP 3 — THE RECORDING
Working with Jennifer, the voice of Crystal, we brought the script to life through StudioNEXT in a fully remote environment. With her cloud recording kit, Jennifer could log in and out without having to remember where she left off, because everything was uploaded to the cloud. Although she had never done anything like this before and was dealing with noisy construction in her building, Jennifer was able to move her equipment to a quiet location and complete the project with ease.
The main focus for the research department at iGenius was to create a synthetic, customized voice for our main product, Crystal. Through my LinkedIn connection Sofia Silva, account executive for DataForce by TransPerfect, we were able to turn this idea into reality. Working together with DataForce and MediaNEXT gave us the confidence to create a large dataset of phonetically balanced sentences and corresponding high-quality audio clips, which we could use to train a high-performance AI model with text-to-speech capabilities.
Marco Bocchio, PhD Machine Learning & Data Science
Team Lead, iGenius