Earlier this week OpenAI launched GPT-4o (“o” for “omni”), a new version of the artificial intelligence (AI) system powering the popular ChatGPT chatbot. GPT-4o is promoted as a step towards more natural engagement with AI.
According to the demonstration video, it can have voice conversations with users in near real-time, exhibiting human-like personality and behaviour.
This emphasis on personality is likely to be a point of contention. In OpenAI’s demos, GPT-4o sounds friendly, empathetic and engaging. It tells “spontaneous” jokes, giggles, flirts and even sings.
The AI system also shows it can respond to users’ body language and emotional tone.
Launched with a streamlined interface, OpenAI’s new version of the ChatGPT chatbot appears designed to increase user engagement and facilitate the creation of new apps based on its text, image and audio capabilities.
GPT-4o is another leap forward for AI development.
However, the focus on engagement and personality raises important questions about whether it will truly serve the interests of users, and the ethical implications of creating AI that can simulate human emotions and behaviours.
The personality factor
OpenAI envisions GPT-4o as a more enjoyable and engaging conversational AI. In principle, this could make interactions more effective and increase user satisfaction.
Studies show users are more likely to trust and cooperate with chatbots exhibiting social intelligence and personality traits. This could prove relevant in fields such as education, where studies have indicated AI chatbots can boost learning outcomes and motivation.
However, some commentators worry users may become overly attached to AI systems with human-like personalities, or be emotionally harmed by the one-way nature of human-computer interaction.
The Her effect
GPT-4o immediately inspired comparisons – including from OpenAI boss Sam Altman – to the 2013 science-fiction movie Her, which paints a vivid picture of the potential pitfalls of human-AI interaction.
In the movie, the protagonist, Theodore, becomes deeply fascinated by and attached to Samantha, an AI system with a sophisticated and witty personality.
Their bond blurs the lines between the real and the virtual, raising questions about the nature of love and intimacy, and the value of human-AI connection.
While we should not seriously compare GPT-4o to Samantha, it raises similar concerns. AI companions are already here. As AI becomes more adept at mimicking human emotions and behaviours, the risk of users forming deep emotional attachments increases. This could lead to over-reliance, manipulation and even harm.
While OpenAI demonstrates concern with ensuring its AI tools behave safely and are deployed in a responsible way, we have yet to learn the broader implications of unleashing charismatic AIs onto the world. Current AI systems are not explicitly designed to meet human psychological needs – a goal that is hard to define and measure.
GPT-4o’s impressive capabilities show how important it is that we have some system or framework for ensuring AI tools are developed and used in ways that are aligned with public values and priorities.
Expanding capabilities
GPT-4o can also work with video (of the user and their surrounds, via a device camera, or pre-recorded videos), and respond conversationally. In OpenAI’s demonstrations, GPT-4o comments on a user’s environment and clothes, recognises objects, animals and text, and reacts to facial expressions.
Google’s Project Astra AI assistant, unveiled just one day after GPT-4o, displays similar capabilities. It also appears to have visual memory: in one of Google’s promotional videos, it helps a user find her glasses in a busy office, even though they are not currently visible to the AI.
GPT-4o and Astra continue the trend towards more “multimodal” models that can work with text, images, audio and video. GPT-4o’s predecessor, GPT-4 Turbo, can process text and images together, but not audio and video. The original version of ChatGPT, released less than two years ago, was based only on text.
GPT-4o is also significantly faster than its predecessor.
The ability to work across audio, vision and text in real time is considered crucial to developing advanced AI systems that can understand the world and effectively achieve complex and meaningful goals.
But some critics argue that GPT-4o’s text capabilities are only incrementally better than those of GPT-4 Turbo and competitors such as Google’s Gemini Ultra and Anthropic’s Claude 3 Opus.
Will major AI labs be able to sustain the recent rapid pace of improvement by continuing to build bigger and more sophisticated models? This is a hot topic of debate among experts, and the outcome will determine the impact of the technology over the coming years.
Wider access
A less flashy but significant aspect of GPT-4o’s launch is that, unlike its GPT-4 family precursors, the new AI system is available to all users in the free version of ChatGPT, subject to usage limits.
This means millions of users worldwide just got an upgrade from GPT-3.5 to a more powerful AI system with more features. GPT-4o is significantly more useful than GPT-3.5 for various purposes, such as work and education.
The impact of this development will become more apparent over time.
What’s next?
OpenAI’s unveiling of GPT-4o disappointed enthusiasts for ever more powerful AI systems, who hoped GPT-5’s arrival was imminent, more than a year after GPT-4’s launch.
Instead, this week’s unveiling of GPT-4o and Google’s latest AI announcements emphasise the features the two companies are incorporating into their products. These new developments point to possibilities such as more sophisticated virtual assistants capable of performing complex tasks on behalf of users, involving richer interaction and planning. (The Conversation)