Introducing Amazon Nova Sonic: A New Gen AI Model for Building Voice Applications and Agents

SEATTLE, April 08, 2025--(BUSINESS WIRE)--Today, Amazon.com Inc (NASDAQ: AMZN) introduced Amazon Nova Sonic, a new foundation model that unifies speech understanding and speech generation into a single model, to enable more human-like voice conversations in artificial intelligence (AI) applications. Available in Amazon Bedrock via a new bi-directional streaming API, the model simplifies the development of voice applications, such as customer service call automation and AI agents across a broad range of industries, including travel, education, healthcare, entertainment, and more.

"From the invention of the world’s best personal AI assistant with Alexa, to developing AWS services like Connect, Lex, and Polly that are used across a wide range of industries, Amazon has long believed that voice-powered applications can make all of our customers’ lives better and easier," said Rohit Prasad, SVP of Amazon Artificial General Intelligence. "With Amazon Nova Sonic, we are releasing a new foundation model in Amazon Bedrock that makes it simpler for developers to build voice-powered applications that can complete tasks for customers with higher accuracy, while being more natural, and engaging."

Traditional approaches to building voice-enabled applications involve complex orchestration of multiple models, such as speech recognition to convert speech to text, large language models (LLMs) to understand and generate responses, and text-to-speech to convert text back to audio. This fragmented approach not only increases development complexity but also fails to preserve crucial acoustic context and nuances like tone, prosody, and speaking style that are essential for natural conversations.
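As a rough illustration of the cascaded approach described above, the sketch below chains three stub stages. Every name in it (Utterance, speech_to_text, generate_reply, text_to_speech) is hypothetical and stands in for a real model; the `tone` field is an assumed placeholder for acoustic context. It only shows where that context is lost when stages hand off plain text.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    words: str  # what was said
    tone: str   # how it was said (assumed stand-in for acoustic context)


def speech_to_text(utterance: Utterance) -> str:
    # Stub speech recognition: only the words survive, the tone is dropped.
    return utterance.words


def generate_reply(text: str) -> str:
    # Stub language model: it only ever sees text.
    return f"You said: {text}"


def text_to_speech(text: str) -> Utterance:
    # Stub synthesis: it cannot see the caller's tone, so it uses a default.
    return Utterance(words=text, tone="neutral")


def cascaded_pipeline(utterance: Utterance) -> Utterance:
    # Three separate stages, each passing plain text to the next.
    return text_to_speech(generate_reply(speech_to_text(utterance)))


reply = cascaded_pipeline(Utterance("my flight was cancelled", tone="frustrated"))
```

In this toy version the reply always comes back with a neutral tone, whatever tone the caller used, because the text handoff between stages carries no acoustic information.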

Nova Sonic solves these challenges through a unified model architecture that delivers speech understanding and generation, without requiring a separate model for each of these steps. This unification enables the model to adapt the generated voice response to the acoustic context (e.g., tone, style) and the spoken input, resulting in more natural dialog. Nova Sonic even understands the nuances of human conversation, including the speaker’s natural pauses and hesitations, waiting to speak until the appropriate time, and gracefully handling barge-ins. It also generates a text transcript of the user’s speech, enabling developers to use that text to call specific tools and APIs for building voice-enabled AI agents (e.g., an AI-powered travel agent that can book flights by retrieving up-to-date flight information). These capabilities, along with its lightning-fast inference, make voice applications powered by Nova Sonic more natural and useful.
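The transcript-to-tool idea can be sketched as follows. This is not the Amazon Bedrock API: the tool name `lookup_flights`, the `TOOLS` registry, the keyword routing, and the flight data are all made-up placeholders, chosen only to show how application code might act on a text transcript of the user's speech.

```python
def lookup_flights(destination: str) -> list[str]:
    # Hypothetical tool with canned placeholder data, not real flights.
    flights = {"seattle": ["XX100 08:00", "XX200 14:30"]}
    return flights.get(destination.lower(), [])


# Hypothetical registry mapping tool names to callables.
TOOLS = {"lookup_flights": lookup_flights}


def handle_transcript(transcript: str) -> list[str]:
    # Naive routing, assumed for illustration: look for "flights ... to <city>".
    words = transcript.lower().rstrip("?.!").split()
    if "flights" in words and "to" in words:
        idx = words.index("to")
        if idx + 1 < len(words):
            return TOOLS["lookup_flights"](words[idx + 1])
    return []


results = handle_transcript("Show me flights to Seattle")
```

A production agent would presumably let the model decide which tool to call and with what arguments; the keyword match here is only a stand-in for that step.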