Waymo has long highlighted its ties to Google DeepMind and its extensive history of AI research as a competitive edge in the autonomous driving sector. Now the Alphabet-owned company is taking that strategy a step further with a new model for its robotaxis built on Gemini, Google's multimodal large language model (MLLM).
In a newly released research paper, Waymo introduces the "End-to-End Multimodal Model for Autonomous Driving," or EMMA. The model processes sensor data to generate "future trajectories for autonomous vehicles," helping Waymo's driverless cars decide where to go and how to avoid obstacles.
Significantly, this is one of the first signs that Waymo, a leader in autonomous driving, intends to integrate MLLMs into its operations. It suggests that these models, typically used for chatbots, email sorting, and image generation, may soon be put to work in a very different setting: the road. Waymo's paper proposes an autonomous driving system in which the MLLM is a primary component.
Historically, autonomous driving systems have been built from distinct "modules" dedicated to specific functions, such as perception, mapping, prediction, and planning. Although this modular approach has served the industry well for years, it is difficult to scale. Errors accumulate as outputs pass from one module to the next, and the modules can communicate with each other only in limited ways. These predefined modules can also struggle to adapt to new and unfamiliar environments.
Waymo argues that MLLMs like Gemini can help address these challenges for two reasons. First, they are trained on vast datasets scraped from the internet, which gives them rich "world knowledge" that goes beyond what standard driving logs contain. Second, they show strong reasoning abilities through techniques such as "chain-of-thought reasoning." Much as a person works through a problem, this technique breaks a complex task into a series of manageable steps.
Waymo developed EMMA to help its robotaxis navigate complicated scenarios. The company identified several situations in which the model helped its driverless cars find the correct path, such as encountering animals or construction zones.
Other companies, including Tesla, have also claimed to be developing end-to-end models for their autonomous vehicles. Elon Musk has said that the latest version of Tesla's Full Self-Driving system (12.5.5) uses an "end-to-end neural nets" AI system that translates camera images into driving decisions.
EMMA shows that Waymo, which currently leads Tesla in deploying fully driverless vehicles, is pursuing an end-to-end system too. The company says EMMA performed well at key tasks, including trajectory prediction, object detection, and road graph understanding.
“This suggests a promising avenue for future research, where additional core autonomous driving tasks could be integrated into a similarly scaled setup,” Waymo noted in a blog post.
However, Waymo acknowledges EMMA's limitations and says more research is needed before the model can be put into practice. For instance, EMMA does not incorporate 3D sensor inputs from lidar or radar, which Waymo describes as "computationally expensive." The model can also process only a limited number of image frames at once.
Using MLLMs to drive robotaxis also carries risks that the research paper does not explicitly address. Models like Gemini can hallucinate or fail at basic tasks, such as telling time or accurately counting objects. Because Waymo's autonomous vehicles travel at speeds of up to 40 mph on busy roads, there is little room for error. Substantial further research will be needed before such models can be deployed widely, a point Waymo itself makes clear.
"We hope that our findings will inspire additional research to address these issues," the company's research team wrote, "and to advance the state of the art in autonomous driving model architectures." As the industry evolves, Waymo's efforts to integrate cutting-edge AI could reshape the landscape of autonomous driving.