OpenAI releases ChatGPT’s Advanced Voice Mode

OpenAI’s latest release, ChatGPT’s Advanced Voice Mode, is starting to roll out to a small group of ChatGPT Plus users, with a rollout to all ChatGPT Plus users planned by September.

The excitement around GPT-4o’s voice capabilities began in May, when OpenAI demonstrated its uncanny, human-like responses. While there were some bumps in the road—like the controversy over its similarity to Scarlett Johansson’s voice—the anticipation has only grown now that the alpha version is reaching users.

Unlike the previous version, which chained three separate models for speech-to-text, processing, and text-to-speech, GPT-4o is natively multimodal and handles all of these tasks within a single model, reducing latency and creating smoother, more natural conversations. It can even detect emotional intonation and respond to excitement or sadness.
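For context, the older pipeline approach the article describes can be sketched with OpenAI’s Python SDK: audio is transcribed, the text is processed, and the reply is synthesized back to speech in three separate calls. This is a minimal illustration of that pattern, not the implementation behind Voice Mode; the file names, model IDs, and the "alloy" voice are illustrative choices.

```python
# Minimal sketch of the legacy three-stage voice pipeline:
# speech-to-text -> text processing -> text-to-speech.
# Each hop adds latency, which GPT-4o's single end-to-end
# model is designed to avoid.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech-to-text: transcribe the user's audio.
with open("user_question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Processing: generate a reply from the transcribed text.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = reply.choices[0].message.content

# 3. Text-to-speech: synthesize the reply back into audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",  # illustrative voice, not one of Voice Mode's presets
    input=reply_text,
)
with open("assistant_reply.mp3", "wb") as out_file:
    out_file.write(speech.content)
```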

The result is more natural, real-time conversation: you can interrupt the model at any time, and it adjusts its responses to the emotion it hears in your voice.

The alpha release is limited to four preset voices: Juniper, Breeze, Cove, and Ember, with more to be released.

OpenAI says it has tested the voice capabilities with more than 100 external red teamers speaking 45 different languages as part of its safety work ahead of the broader rollout.

Looking forward, OpenAI aims to expand these capabilities further, though the timeline for features like video and screen sharing remains uncertain.