Amazon Polly’s Brand Voice taps AI to generate custom spokespeople
Brand Voice allows organizations to differentiate their brand by incorporating unique vocal identities into their products and services.
The system can learn to adopt a new speaking style from just a few hours of training data, as opposed to the tens of hours it might take a voice actor to read in a target style.
How it Works:
Amazon’s AI model consists of two components.
The first is a generative neural network that converts a sequence of phonemes into a sequence of spectrograms, which are visual representations of how the spectrum of sound frequencies varies over time.
The second is a vocoder that converts those spectrograms into a continuous audio signal.
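The two-stage pipeline described above can be illustrated with a toy sketch. This is not Amazon's model: the phoneme-to-frequency table, the frame sizes, and both function names are hypothetical stand-ins chosen only to show the data flow from phonemes to spectrogram frames to audio samples.

```python
import math
from typing import List

SAMPLE_RATE = 16000   # samples per second (assumed for illustration)
FRAME_LEN = 160       # samples per spectrogram frame, i.e. 10 ms
N_BINS = 8            # number of frequency bins in the toy spectrogram
BIN_WIDTH = 125.0     # Hz covered by each bin

# Hypothetical lookup standing in for a learned generative network.
PHONEME_FREQ = {"HH": 300.0, "AH": 700.0, "L": 400.0, "OW": 500.0}


def phonemes_to_spectrogram(phonemes: List[str],
                            frames_per_phoneme: int = 5) -> List[List[float]]:
    """Stage 1 (toy): map each phoneme to frames with energy in one bin."""
    frames = []
    for p in phonemes:
        bin_idx = min(int(PHONEME_FREQ[p] // BIN_WIDTH), N_BINS - 1)
        frame = [0.0] * N_BINS
        frame[bin_idx] = 1.0
        frames.extend(list(frame) for _ in range(frames_per_phoneme))
    return frames


def vocoder(spectrogram: List[List[float]]) -> List[float]:
    """Stage 2 (toy): turn frames into audio by summing weighted sinusoids."""
    samples = []
    for t, frame in enumerate(spectrogram):
        for n in range(FRAME_LEN):
            time = (t * FRAME_LEN + n) / SAMPLE_RATE
            samples.append(sum(
                mag * math.sin(2 * math.pi * (k + 0.5) * BIN_WIDTH * time)
                for k, mag in enumerate(frame)
            ))
    return samples


def synthesize(phonemes: List[str]) -> List[float]:
    """Chain the two stages: phonemes -> spectrogram -> audio signal."""
    return vocoder(phonemes_to_spectrogram(phonemes))


audio = synthesize(["HH", "AH", "L", "OW"])
print(len(audio), "samples")
```

In a real system both stages would be trained neural networks; the point here is only that the first stage outputs a time-by-frequency grid and the second turns that grid into a waveform.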
Brand Voice relies on two things: a model-training method that combines a large amount of neutral-style speech data with only a few hours of supplementary data in the desired style, and an AI system capable of distinguishing the elements of speech that are independent of a speaking style from those that are unique to that style.
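As a rough illustration of the data-mixing idea above, the sketch below tags each recording with a style label and merges a large neutral set with a small styled set. The `Utterance` type, the helper names, the "newscaster" label, and the 40-hour and 3-hour figures are all assumptions made for the example; the source gives no exact numbers and does not describe Amazon's actual data format.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    text: str
    hours: float   # duration of the recording, in hours
    style: str     # style label the model could condition on


def build_training_set(neutral: List[Utterance],
                       styled: List[Utterance],
                       seed: int = 0) -> List[Utterance]:
    """Merge neutral and styled recordings into one shuffled training set."""
    combined = list(neutral) + list(styled)
    random.Random(seed).shuffle(combined)
    return combined


def style_share(dataset: List[Utterance], style: str) -> float:
    """Fraction of total audio hours that carries the given style label."""
    total = sum(u.hours for u in dataset)
    return sum(u.hours for u in dataset if u.style == style) / total


# Hypothetical corpus sizes: 40 hours neutral, 3 hours in the target style.
neutral = [Utterance(f"neutral line {i}", 1.0, "neutral") for i in range(40)]
styled = [Utterance(f"styled line {i}", 1.0, "newscaster") for i in range(3)]

dataset = build_training_set(neutral, styled)
print(f"styled share of audio: {style_share(dataset, 'newscaster'):.1%}")
```

Keeping the style label on every example is what would let a model separate what is common to all speech from what is specific to the target style.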
Amazon’s Brand Voice competes with offerings from Google (WaveNet), Microsoft (Azure), and startups including Voicery, iSpeech, Modulate, Respeecher, Resemble AI, Descript and Bengaluru.