Google is Reportedly Developing a ‘Computer-Using Agent’ AI System

Google is reportedly preparing to unveil its own version of the kind of AI system Rabbit calls a “large action model,” according to a report from The Information. The project, codenamed “Project Jarvis,” could be previewed as early as December. Jarvis is designed to carry out tasks on a user’s behalf, such as gathering research, purchasing products, or booking flights. The Information’s report is based on three people familiar with the project.

Jarvis is expected to be powered by a future version of Google’s Gemini AI model and is designed to run in a web browser, specifically tuned for Chrome. Its main goal is to automate common web-based tasks: it takes screenshots of the page, interprets them, and then clicks buttons or enters text as needed. According to the report, the system currently takes “a few seconds” between actions, suggesting it may still need optimization before it feels responsive.
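
The reported workflow is a loop: look at a screenshot, decide on an action, click or type, and repeat. The minimal Python sketch below illustrates that observe-act cycle under loud assumptions: `FakeBrowser` and `scripted_policy` are hypothetical stand-ins for a real browser and a Gemini model, and nothing here reflects how Google actually built Jarvis.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Done:
    pass


Action = Union[Click, TypeText, Done]


@dataclass
class FakeBrowser:
    """Hypothetical browser stand-in: records actions instead of driving Chrome."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; this "screenshot" is just a step counter.
        return f"step={len(self.log)}"

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def run_agent(browser: FakeBrowser,
              policy: Callable[[str, str], Action],
              goal: str,
              max_steps: int = 10,
              step_delay: float = 0.0) -> List[str]:
    """Observe (screenshot), decide (policy), act (click/type) until Done or max_steps."""
    for _ in range(max_steps):
        shot = browser.screenshot()
        action = policy(goal, shot)
        if isinstance(action, Done):
            break
        if isinstance(action, Click):
            browser.click(action.x, action.y)
        elif isinstance(action, TypeText):
            browser.type_text(action.text)
        # step_delay models the per-action pause the report describes ("a few seconds").
        time.sleep(step_delay)
    return browser.log


def scripted_policy(goal: str, screenshot: str) -> Action:
    """Hypothetical model stand-in: replays a fixed plan indexed by the step counter."""
    plan = [Click(120, 40), TypeText("SFO to JFK"), Click(300, 40), Done()]
    step = int(screenshot.split("=")[1])
    return plan[step]
```

Running `run_agent(FakeBrowser(), scripted_policy, goal="book a flight")` returns the three recorded actions before the policy signals it is done. The `max_steps` cap is a design choice that keeps a confused model from looping forever.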

Many major AI companies are developing models with capabilities similar to what Project Jarvis aims to offer. Microsoft, for example, is working on Copilot Vision, which lets users talk to the system about the webpages they are viewing. Apple’s upcoming AI features are expected to understand what is on a user’s screen and perform tasks across apps. Anthropic has released a beta capability that lets its Claude model use a computer on the user’s behalf, although early reviews have described it as “cumbersome and error-prone.” OpenAI is also said to be developing a similar tool, though details remain under wraps.

The Information notes that Google’s plans for a December preview of Jarvis may be subject to change. The company is reportedly considering releasing the tool to a limited number of testers to identify and address potential bugs before making it more widely available.

As envisioned, Jarvis represents Google’s entry into a rapidly evolving field in which AI models are designed not only to answer user queries but also to act on users’ behalf. By automating everyday web tasks, such a tool could save users time on routine chores like making purchases or arranging travel, sparing them from clicking through each step themselves.

As reported, Jarvis’s distinguishing feature is its focus on browser-based automation. Rather than working through a separate interface or app, Jarvis operates inside the browser itself, tuned for Chrome, and performs the same actions a user would, such as clicking buttons and entering information on a website. This approach suggests Google is aiming for an assistant that is practical and easy to use within an environment people already know.

Other tech giants are also racing to introduce AI tools that can handle similar tasks. Microsoft’s Copilot Vision aims to make webpages easier to work with by letting users talk with the AI about what they are seeing on screen. This could make browsing more interactive, blurring the line between passively reading a page and actively working with its content.

Apple, on the other hand, is working on features that would allow its AI to understand and respond to what is displayed on users’ screens across multiple apps. This cross-app reach would likely give Apple’s tool a broader range of capabilities than tools limited to the web browser.

Beyond Google, Microsoft, and Apple, Anthropic and OpenAI are also active in this space. Anthropic’s Claude can already control a computer for the user, though the capability is still in beta and early feedback suggests it needs refinement to become more reliable and user-friendly. OpenAI is reportedly developing its own tool, though how it might work remains unclear for now.

The rapid pace of innovation in this space reflects the growing demand for AI tools that go beyond answering questions or providing information. Instead, these next-generation AI assistants are being built to take action, completing tasks and making decisions for users. Whether it’s booking a flight, purchasing an item online, or managing daily tasks, these tools aim to automate many of the activities people currently do themselves, freeing up time and reducing the mental load of managing online interactions.

If Jarvis lives up to its potential, it could become a major player in this space. Its success, however, will depend on how well Google refines the tool so that it handles a wide range of tasks efficiently and accurately. Google’s reported plan to release Jarvis first to a small group of testers suggests a cautious approach that prioritizes quality and reliability before a wider rollout.

Google has not confirmed whether the December preview will happen as planned, but the project underscores its ambition to lead the next wave of AI development: tools that don’t just assist users but act on their behalf.
