Google Unveils Gemini 2.5 Computer Use: A Smarter AI Agent for Seamless UI Interaction

In a bold step toward more autonomous AI agents, Google DeepMind has launched Gemini 2.5 Computer Use, a specialized model powered by the advanced Gemini 2.5 Pro architecture. This innovative tool equips AI with the ability to mimic human-like interactions on digital interfaces - clicking buttons, typing text, scrolling through pages, and even filling out forms - all without relying on traditional APIs. Announced on October 8, 2025, the model is now available in public preview through the Gemini API on Google AI Studio and Vertex AI, marking a significant leap in agentic AI capabilities.

Developers can prompt the model with natural language instructions, such as "Fill out this CRM form with customer details" or "Browse Hacker News for trending debates," and it will autonomously navigate the environment.
Supported actions include opening browsers, typing, hovering, scrolling, dragging and dropping, and using keyboard shortcuts. After each action, a new screenshot and URL are fed back to the model, allowing it to refine its approach iteratively. This loop ensures contextual awareness, making the AI more adaptive over time.
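The screenshot-act-screenshot loop described above can be sketched in a few lines. This is a hypothetical, simplified stand-in, not Google's implementation: the function names (`ask_model`, `capture_screenshot`) and the `Action` shape are assumptions for illustration, and a real integration would call the Gemini API's `computer_use` tool and drive a live browser (e.g. via an automation layer such as Playwright).

```python
# Minimal sketch of an agentic UI loop: observe (screenshot + URL),
# ask the model for the next action, execute it, repeat.
# All names here are illustrative placeholders, not a real API.

from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "scroll", "done"
    payload: dict = field(default_factory=dict)


def ask_model(goal, screenshot, url, history):
    # Placeholder for a Gemini API call that returns the next UI action.
    # This stub finishes immediately so the sketch is runnable offline.
    return Action(kind="done")


def capture_screenshot(url):
    return b"<png bytes>"          # placeholder for a real browser capture


def run_agent(goal, start_url, max_steps=10):
    url, history = start_url, []
    for _ in range(max_steps):
        shot = capture_screenshot(url)
        action = ask_model(goal, shot, url, history)
        if action.kind == "done":
            break
        # A real agent would execute the action in the browser here and
        # read back the resulting URL before the next iteration.
        history.append(action)
    return history


steps = run_agent("Browse Hacker News for trending debates",
                  "https://news.ycombinator.com")
print(len(steps))  # 0 with this stub: the mock model finishes immediately
```

The loop structure is the point: each iteration feeds a fresh screenshot and URL back to the model, which is what gives the agent its iterative, self-correcting behavior.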
While the functionality echoes similar "computer use" features from competitors like Anthropic's Claude and OpenAI's ChatGPT agents - which also enable UI manipulation - Google positions Gemini 2.5 Computer Use as a superior alternative.
The company highlights its edge in benchmarks, where it outperforms both rivals on web and mobile control tasks.
For instance, on web navigation benchmarks, it achieves higher success rates with notably lower latency, enabling faster and more responsive interactions. Optimized primarily for web browsers, the model also shows promise in mobile UI control via Google's custom "AndroidWorld" benchmark, though it's not yet tuned for full desktop OS manipulation.

Early access programs have also extended these capabilities to third-party developers building virtual assistants and automation tools, demonstrating real-world potential beyond the lab.

Safety is a stated focus of the preview as well. Developers gain access to granular controls via the API, such as the ability to require end-user confirmation before high-stakes actions, supporting responsible deployment in production environments. This proactive approach aligns with broader industry efforts to balance innovation with ethical AI use.
For developers eager to experiment, getting started is straightforward. Head to Google AI Studio or Vertex AI, enable the `computer_use` tool in the Gemini API, and provide initial prompts with UI screenshots.
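As a rough illustration of those steps, the request to the Gemini API might look like the following raw JSON body. The field names and the preview model identifier (`gemini-2.5-computer-use-preview-10-2025`) are based on the announcement but should be verified against the official API documentation before use.

```python
# Hedged sketch of a first computer_use request body: a natural-language
# goal plus an initial UI screenshot. Field names are assumptions.

import json


def build_request(prompt: str, screenshot_b64: str) -> dict:
    return {
        "model": "gemini-2.5-computer-use-preview-10-2025",
        "tools": [{"computer_use": {"environment": "ENVIRONMENT_BROWSER"}}],
        "contents": [{
            "role": "user",
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": "image/png",
                                 "data": screenshot_b64}},
            ],
        }],
    }


body = build_request("Fill out this CRM form with customer details",
                     "<base64-encoded PNG>")
print(json.dumps(body, indent=2)[:80])
```

From there, the model's response contains the proposed UI action, which the client executes before sending back the next screenshot, closing the loop described earlier.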
Google has also shared public demos on Browserbase, showcasing the model in action - from playing 2048 to scouring forums for insights - all accelerated to 3x speed for brevity. Feedback is encouraged through the Developer Forum to refine future iterations.
As AI agents evolve from passive responders to proactive operators, Gemini 2.5 Computer Use underscores Google's commitment to practical, high-performance tools. While it may not reinvent the wheel compared to Claude or ChatGPT, its benchmark-leading efficiency and internal validations suggest it's poised to streamline digital workflows like never before. Whether you're automating tedious form-filling or enhancing app testing, this preview release invites builders to push the boundaries of what's possible with AI-driven interfaces.
*This article is based on official announcements and reports as of October 10, 2025.*