In a bold step toward more autonomous AI agents, Google DeepMind has launched Gemini 2.5 Computer Use, a specialized model powered by the advanced Gemini 2.5 Pro architecture. This innovative tool equips AI with the ability to mimic human-like interactions on digital interfaces - clicking buttons, typing text, scrolling through pages, and even filling out forms - all without relying on traditional APIs. Announced on October 8, 2025, the model is now available in public preview through the Gemini API on Google AI Studio and Vertex AI, marking a significant leap in agentic AI capabilities.
At its core, Gemini 2.5 Computer Use leverages the visual understanding and reasoning prowess of Gemini 2.5 Pro to analyze screenshots of user interfaces (UIs) and execute a series of actions in a continuous feedback loop.
Developers can prompt the model with natural language instructions, such as "Fill out this CRM form with customer details" or "Browse Hacker News for trending debates," and it will autonomously navigate the environment.
Supported actions include opening browsers, typing, hovering, scrolling, dragging and dropping, and using keyboard shortcuts. After each action, a new screenshot and URL are fed back to the model, allowing it to refine its approach iteratively. This feedback keeps the model grounded in the current state of the interface, letting it correct course as pages change.
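The announcement doesn't include code, but the loop described above can be sketched in plain Python. Everything here - `ToyModel`, `ToyBrowser`, the action names, and `run_agent_loop` - is a hypothetical stand-in for illustration, not the real Gemini API:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                       # e.g. "click", "type_text", "scroll", "done"
    payload: dict = field(default_factory=dict)

class ToyModel:
    """Stand-in for a computer-use model: inspects a 'screenshot', picks an action."""
    def next_action(self, screenshot: str, goal: str) -> Action:
        if "form_filled" in screenshot:
            return Action("done")
        return Action("type_text", {"field": "name", "text": "Ada Lovelace"})

class ToyBrowser:
    """Stand-in for the environment: executes an action, returns screenshot + URL."""
    def __init__(self):
        self.state = "empty_form"
    def execute(self, action: Action) -> tuple[str, str]:
        if action.kind == "type_text":
            self.state = "form_filled"
        return self.state, "https://example.com/crm"

def run_agent_loop(model, env, goal: str, max_steps: int = 10) -> list[Action]:
    """Screenshot -> model -> action -> execute -> new screenshot, until done."""
    history = []
    screenshot = env.state
    for _ in range(max_steps):
        action = model.next_action(screenshot, goal)
        history.append(action)
        if action.kind == "done":
            break
        screenshot, url = env.execute(action)  # fed back into the next iteration
    return history

actions = run_agent_loop(ToyModel(), ToyBrowser(), "Fill out this CRM form")
print([a.kind for a in actions])  # -> ['type_text', 'done']
```

The shape is the point: the model never calls an API on the target site; it only sees pixels (here, a stand-in string) and emits UI actions, with each result looped back as fresh context.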
While the functionality echoes similar "computer use" features from competitors like Anthropic's Claude and OpenAI's ChatGPT agents - which also enable UI manipulation - Google positions Gemini 2.5 Computer Use as a superior alternative.
The company highlights its edge in benchmarks, where it outperforms both rivals on web and mobile control tasks.
For instance, on web navigation benchmarks, it achieves higher success rates with notably lower latency, enabling faster and more responsive interactions. Optimized primarily for web browsers, the model also shows promise in mobile UI control via Google's custom "AndroidWorld" benchmark, though it's not yet tuned for full desktop OS manipulation.
Internally, Google has already integrated versions of this model into its workflows, powering tools like Project Mariner and AI Mode for agentic tasks. One standout application is in UI testing for software development, where Google's payments platform team used it to repair more than 60% of previously failing test executions, substantially speeding up quality assurance.
Early access programs have also extended these capabilities to third-party developers building virtual assistants and automation tools, demonstrating real-world potential beyond the lab.
Safety remains a top priority, as Google emphasizes built-in safeguards to mitigate risks. The model includes mechanisms to prevent high-risk behaviors, such as unauthorized system access or CAPTCHA circumvention, requiring explicit user consent for sensitive actions.
Developers gain access to granular controls via the API, ensuring responsible deployment in production environments. This proactive approach aligns with broader industry efforts to balance innovation with ethical AI use.
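Google hasn't published the exact shape of these controls, but the pattern they describe - pausing before high-risk actions until the user confirms - can be sketched as a simple gate. The risk categories and callback below are illustrative assumptions, not the real API:

```python
# Hypothetical developer-side confirmation gate for sensitive agent actions.
# The category set and callback signature are assumptions for illustration.
SENSITIVE_KINDS = {"purchase", "send_email", "system_access"}

def gated_execute(action_kind: str, execute, confirm) -> str:
    """Run routine actions directly; require user confirmation for risky ones."""
    if action_kind in SENSITIVE_KINDS and not confirm(action_kind):
        return "blocked"
    execute(action_kind)
    return "executed"

log = []
result = gated_execute("click", log.append, confirm=lambda kind: False)
print(result)    # routine action runs without asking -> "executed"
result2 = gated_execute("purchase", log.append, confirm=lambda kind: False)
print(result2)   # sensitive action halted when the user declines -> "blocked"
```

In a real deployment the `confirm` callback would surface a prompt to the end user, so the agent can act freely on low-stakes steps while escalating anything irreversible.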
For developers eager to experiment, getting started is straightforward. Head to Google AI Studio or Vertex AI, enable the `computer_use` tool in the Gemini API, and provide initial prompts with UI screenshots.
Google has also shared public demos on Browserbase, showcasing the model in action - from playing 2048 to scouring forums for insights - all accelerated to 3x speed for brevity. Feedback is encouraged through the Developer Forum to refine future iterations.
As AI agents evolve from passive responders to proactive operators, Gemini 2.5 Computer Use underscores Google's commitment to practical, high-performance tools. While it may not reinvent the wheel compared to Claude or ChatGPT, its benchmark-leading efficiency and internal validations suggest it's well positioned to streamline digital workflows. Whether you're automating tedious form-filling or enhancing app testing, this preview release invites builders to push the boundaries of what's possible with AI-driven interfaces.
*This article is based on official announcements and reports as of October 10, 2025.*

