GPT-4o Advanced Voice Is Even Better Than We Expected

OpenAI is gradually providing access to its ChatGPT Advanced Voice assistant to a select number of ChatGPT Plus subscribers. While all paying users are expected to enjoy this feature by the end of the year, for now, only a fortunate few have access.

Last week, I was notified that my account was among those chosen to interact with this emotionally aware, highly responsive, and Yoda-voiced artificial intelligence. After spending a full weekend with the Advanced Voice feature, I found it to be more impressive and expressive than the initial demos indicated.

One of its standout capabilities is how it handles interruptions: you can cut in mid-sentence and it immediately adjusts to the new direction. For instance, I had it narrate a story about Paddington Train Station in London in a Yoda voice, then interrupted it and asked it to count quickly to 100.

What truly sets Advanced Voice apart is its ‘human-like’ quality, which is far superior to any other AI voice assistant I’ve encountered. The interaction feels natural, with the voice adjusting its tone and speed to match your speaking style.

It’s easy to see why OpenAI is concerned about users forming emotional attachments to the AI voice. Paired with the natural language processing and knowledge of GPT-4o, it offers a remarkable experience.

ChatGPT with GPT-4o is already adept at writing stories, but the addition of speech-to-speech in Advanced Voice elevates it to an exceptional storyteller, capable of dynamically adapting narratives and even incorporating multiple voices and varying energy levels.

I began by asking it to tell a story about an AI gaining sentience, which it narrated like an audiobook. I then asked it to add elements like space travel and real scientific equations, and even had it speak in a ‘vampire Yoda’ voice, which it did flawlessly. The voice was exactly as you’d imagine.

Next, I had it create a story about the first humans on Mars encountering an unexpected discovery, complete with sound effects—though it used these sparingly.

When I asked for a more dramatic reading, it delivered perfectly. It can also generate a ‘choose your own adventure’ story where you guide the narrative. I asked it to have the characters find a human skeleton.

Advanced Voice as a City Guide

When I travel to London for work, I often like to explore the area. My office is near Paddington Station, so I asked ChatGPT Advanced Voice for information on nearby sights and landmarks.

This feature will become even more valuable as OpenAI integrates SearchGPT and other real-time data capabilities into Voice.

Even without live data, the AI’s training is recent enough to provide details about the Paddington Bear statue, the station’s history, and its unique architectural features.

Advanced Voice as a Personal Trainer

After years of avoiding exercise, I finally decided to get fit. I now have a personal trainer, regularly visit the gym, have cut down on cherryade, and adopted healthier eating habits.

After a particularly intense workout, I asked Voice for advice. It guided me through a stretching routine, counting down from 10 to help me hold a position or stretch correctly.

It also offered healthy recipe ideas and motivated me on the treadmill with encouraging phrases, adjusting its tone from gentle encouragement to a more intense drill sergeant style.

Final Thoughts

I believe I’ve only scratched the surface of what’s possible with Advanced Voice. Once I can access it by simply saying “Hey, ChatGPT” or tapping a button on my phone, I expect it to become even more useful. I hope Apple eventually offers alternatives to Siri.

During a walk, instead of typing into Google or ChatGPT, I found myself asking Advanced Voice about a building I was curious about.

Initially, I used the feature for playful tasks like trying different voices, speaking as Yoda, counting quickly, singing, speaking in different languages, and even performing a short stand-up routine about space. I won’t be getting a Netflix special anytime soon.

However, as I continued using it, I realized it was becoming my default way of searching for information or interacting with my phone. At the supermarket, I used it to keep track of my shopping and even get suggestions for alternative ingredients.

Its natural and responsive interaction, coupled with the ability to easily interrupt and steer the conversation, represents a significant leap forward in computer interaction—on par with the invention of the mouse and touchscreen.
