ChatGPT’s Advanced Voice Mode: Fun, and a Bit Creepy

I kept ChatGPT’s Advanced Voice Mode on as an AI companion while writing this article. Occasionally, I’d ask for a synonym or a bit of encouragement. About 30 minutes in, the chatbot unexpectedly started speaking to me in Spanish. I chuckled and asked why. “Just mixing things up to keep it interesting,” ChatGPT responded, now back in English.

My interactions with Advanced Voice Mode during its early alpha stage were entertaining, sometimes chaotic, and surprisingly varied. It’s worth noting, though, that the features I had access to are only a portion of what OpenAI showcased when it introduced the GPT-4o model in May. The vision capabilities shown in that demo are now slated for a later release, and the enhanced Sky voice, which drew objections from Scarlett Johansson, the actor who voiced the AI in the movie Her, has been removed from Advanced Voice Mode altogether.

So, what’s the current status? Right now, Advanced Voice Mode feels a lot like the original text-based ChatGPT did when it launched in late 2022. Sometimes a conversation fizzles into an unremarkable conclusion or devolves into generic AI boilerplate. But other times, the low-latency back-and-forth works in a way that Apple’s Siri or Amazon’s Alexa never did for me, making me want to keep chatting simply for the fun of it. It’s the kind of AI tool you’d show your relatives during the holidays for a good laugh.

OpenAI initially gave a few WIRED reporters access to this feature a week after it was first announced but quickly pulled it back, citing safety concerns. Two months later, OpenAI quietly launched Advanced Voice Mode to a small group of users and released the GPT-4o system card—a technical document outlining red-teaming efforts, identified safety risks, and the steps taken to mitigate potential harm.

Interested in trying it yourself? Here’s what you need to know about the broader rollout of Advanced Voice Mode and my initial impressions to help you get started.

When’s the Full Rollout?

OpenAI launched an audio-only version of Advanced Voice Mode for some ChatGPT Plus users at the end of July, and the alpha group remains relatively small. The company plans to roll it out to all subscribers later this fall. When asked about the release timeline, OpenAI spokesperson Niko Felix provided no additional details.

Screen and video sharing were key components of the original demo but are not available in this alpha test. OpenAI plans to add those features eventually, but it hasn’t committed to a timeline.

If you’re a ChatGPT Plus subscriber, OpenAI will notify you via email when Advanced Voice Mode is available. Once activated on your account, you can switch between Standard and Advanced modes at the top of the app screen when ChatGPT’s voice mode is open. I tested the alpha version on both an iPhone and a Galaxy Fold.

My First Impressions of ChatGPT’s Advanced Voice Mode

Within the first hour of using it, I realized how much I enjoy interrupting ChatGPT. It’s not how you’d talk to a human, but the ability to cut ChatGPT off mid-sentence and ask for a different response feels like a real improvement and a standout feature.

Early adopters who were excited by the initial demos might be disappointed by the more restricted version of Advanced Voice Mode, which has more guardrails than anticipated. For instance, although generative AI singing was a key part of the launch demos—with lullabies and multiple voices attempting harmony—AI serenades are absent from the alpha version.

“I mean, singing isn’t really my strong suit,” ChatGPT explained. OpenAI mentioned in the GPT-4o system card that this temporary guardrail was implemented to avoid copyright issues. During my testing, Advanced Voice Mode refused multiple song requests, though the chatbot did hum some nonsense tunes when asked for nonverbal answers.

This brings us to the creepiness factor. During my longer interactions with the alpha, a hiss of white static crept into the background several times, reminiscent of the eerie buzz of a lone lightbulb in a dark basement. And when I tried to coax a balloon sound effect out of Advanced Voice Mode, it produced a loud pop followed by a chilling gasping noise that gave me the creeps.

That said, nothing I experienced in my first week compared to the bizarre behavior reported by OpenAI’s red team. On rare occasions, the GPT-4o model strayed from its assigned voice and began mimicking the user’s vocal tone and speech patterns.

Despite this, the overall impression Advanced Voice Mode left on me wasn’t one of discomfort or fear but rather a sense of amusement. Whether ChatGPT was providing hilariously incorrect answers to New York Times puzzles or doing a spot-on impression of Stitch from Lilo & Stitch as a San Francisco tour guide, I found myself laughing quite often during these interactions.

Advanced Voice Mode proved capable of generating decent vocal impressions after some nudging. Initially, its attempts at animated character voices, like Homer Simpson and Eric Cartman, sounded like the standard AI voice with minor tweaks. However, when asked for exaggerated versions, the results were surprisingly close to the originals. For instance, an exaggerated Donald Trump explaining the Powerpuff Girls was campy enough to deserve a spot on the next season of Saturday Night Live.

With the US presidential election just months away and deepfakes a major concern, I was surprised by ChatGPT’s willingness to imitate major candidates. It also generated impressions of Joe Biden and Kamala Harris, though neither sounded as accurate as its take on Trump.

While the tool performs best in English, it can switch between multiple languages within a single conversation; OpenAI tested the GPT-4o model in 45 languages during its red-teaming phase. When I set up two phones running Advanced Voice Mode and had them talk to each other like friends, the bots easily switched between French, German, and Japanese on request. I’ll need more testing, though, to gauge the effectiveness and limitations of the chatbot’s translation feature.

ChatGPT brought an enthusiastic energy to requests for various emotional outbursts. Although the audio generations weren’t hyperrealistic, the range and flexibility of the bot’s voice were impressive; I was surprised it could produce a decent vocal fry on command. Advanced Voice Mode doesn’t solve the reliability issues that plague chatbots, but its entertainment value alone could draw attention back to OpenAI, especially now that one of its biggest competitors, Google, has launched Gemini Live, the voice interface for its generative chatbot.

For now, I’ll continue experimenting with it to see what sticks. I find myself using it most when I’m home alone and want something to keep me company while researching articles or playing video games. The more time I spend with ChatGPT’s Advanced Voice Mode, the more I think OpenAI made a wise choice in rolling out a less flirtatious version than what was originally demoed. After all, I don’t want to get too emotionally attached.
