Gemini 3.5 Flash can now see and control your screen, and Google wants enterprises to trust it



TL;DR

Computer use is now a built-in tool in Gemini 3.5 Flash, replacing the standalone Gemini 2.5 computer use model with enterprise safeguards.

Google has made computer use a built-in tool inside Gemini 3.5 Flash, the model it launched at I/O 2026 as its fastest agentic AI model. The capability, which lets AI agents see screens, click, type, and scroll across browsers, mobile devices, and desktops, previously required a separate standalone model and is now available as a native tool through the Gemini API and the Gemini Enterprise Agent Platform, the renamed version of Vertex AI.

The update means developers no longer need to call a dedicated computer use model to build agents that interact with graphical interfaces. Instead, they can activate computer use as one of several tools within Flash, alongside code execution, search, and function calling. Product manager Mateo Quiros described the integration as giving Flash the ability to see, reason about, and take action on screens.

Google first released a standalone Gemini computer use model in October 2025, designed specifically for browser-based agent workflows. That model achieved roughly 70 percent accuracy on the Online-Mind2Web benchmark and was built around a screenshot-action loop where developers fed it a screen capture, received a structured command, executed it, and sent back the updated view. Folding the capability into Flash consolidates what was a two-model workflow into one.

The enterprise pitch centres on automation that goes beyond chatbots. Google says the tool enables continuous software testing, where agents navigate applications and verify functionality without human testers stepping through each screen. Knowledge workers could use agents to complete multi-step browser tasks, fill forms, extract data from dashboards, or navigate internal tools.

The safety architecture is where Google is drawing the sharpest lines. The company says it applied targeted adversarial training specifically for prompt injection, the attack where malicious instructions embedded in a webpage or document trick an AI agent into performing unintended actions. The threat is not theoretical, as researchers have repeatedly demonstrated that AI agents can be manipulated through content they encounter while carrying out tasks.

Google is offering two optional enterprise safeguards on top of the base model. The first requires explicit user confirmation before the agent executes any action flagged as sensitive or irreversible, such as submitting a form, making a purchase, or deleting data. The second automatically halts the agent if it detects an indirect prompt injection attempt, stopping execution rather than risking a compromised action.

Both safeguards are opt-in, not defaults. Google recommends a “defense-in-depth” approach where developers layer multiple protections rather than relying on any single mechanism. The company’s documentation acknowledges that no individual safeguard is sufficient on its own, a candid framing that contrasts with the more confident marketing language around other AI capabilities.

The competitive landscape has shifted considerably since Anthropic pioneered the category. Anthropic’s Claude Computer Use works across operating systems and can interact with file systems, not just browsers, making it more versatile for desktop workflows. Google’s own Chrome Enterprise already added agentic browsing features earlier this year, including Auto Browse for autonomous multi-step tasks.

The new Flash integration extends that philosophy beyond Chrome to any screen an agent can see. OpenAI has also entered the space, and the three companies are now competing on different axes. The question for enterprise buyers is less about which model can click a button and more about which one can do it safely inside a regulated environment.

Google has not published updated benchmark scores for computer use as a built-in Flash tool versus the previous standalone model. The company has not disclosed how many enterprises are using the capability or provided case studies with named customers. The claims about targeted adversarial training for prompt injection are described in the blog post but not backed by published research or red-team results.

The Gemini Enterprise Agent Platform, where the tool is available, uses pay-as-you-go pricing. Flash is one of the cheaper models in Google’s lineup, which could make computer use more accessible for large-scale automation than running it through a heavier model. Whether the cost advantage holds depends on how many actions a typical agent workflow requires and how often the safety guardrails interrupt execution to request confirmation.

Computer use in AI is still early. The models can navigate familiar interfaces but struggle with unexpected pop-ups, CAPTCHAs, dynamically loaded content, and layouts they have not seen before. Google’s decision to make it a built-in tool rather than a standalone model signals confidence that the capability is mature enough for general availability, but the opt-in safety guardrails signal equal awareness that it is not yet mature enough to run unsupervised.



Source link

Leave a Reply

Subscribe to Our Newsletter

Get our latest articles delivered straight to your inbox. No spam, we promise.

Recent Reviews


Microsoft has spent the last several years pushing Copilot and new user interface designs, which has meant that several great features included with Windows don’t get the recognition that they deserve. These are some of my favorites that will run on any Windows 11-compatible PC.

Clipboard history remembers everything you copy

Win+V replaces one of the oldest frustrations in computing

Windows’s default clipboard has been a source of minor but constant annoyance: it holds exactly one thing. If you copy something new, the previous item is wiped out. It is enough of a problem that multiple third-party apps were created to address the shortcoming.

Now, Windows has Clipboard History built in, though it isn’t enabled by default. To turn it on, press Windows+i, then navigate to System > Clipboard, and click the toggle next to Clipboard history.

Once it is enabled, you can press Win+V to view up to 25 items in your clipboard history, including text, images, and links.

If you have specific pieces of information you use daily—like an email signature, a common code snippet, or a home address—you should pin up some of those items. Pinned items persist between system reboots and clipboard history clears, which means you never have to hunt to find something when you need it.

You can even enable sync in the Clipboard settings, allowing your copied text to follow you between different PCs signed in to the same Microsoft account. Once you get into the habit of using Win+V, the standard copy-paste function will feel useless by comparison.

Voice typing actually works now

Win+H lets you write with your voice

Notepad with Windows Voice Typing popup visible.

Windows dictation software has a reputation for being clunky and difficult to use, but that isn’t the case anymore. Thanks to the improvements in AI that we’ve seen since 2024, voice typing accuracy has improved significantly, especially for technical vocabulary. You don’t have to spend your time manually fixing formatting either. The tool supports punctuation commands like “period,” “new line,” and “question mark,” which prevents your text from turning into a rambling mess.

To use voice typing, press Windows+H anywhere there is a text field.

While it isn’t a full replacement for high-end professional software, it is free, built-in, and more than good enough for long-form writing, taking down a sudden idea, or writing quick messages when your hands are full.

Snap layouts make window management effortless

Hover over the maximize button and pick a layout

Notepad with the Windows Snap Layout window visible.

You can manually drag windows to the edges of your screen to split your display up, but you’re doing more work than is necessary in most cases. Windows’ Snap Layouts allow you to instantly arrange your Windows into predefined halves, thirds, or quarters. Just hover over the maximize button on any window or press Win+Z.

One of the most practical aspects of this system is the Snap Group. If you snap a browser and a document side-by-side, Windows remembers them as a pair. When you Alt+Tab, you can bring the entire group back together.

Live captions transcribe any audio on your device

Real-time subtitles for anything you’re watching

You can enable real-time subtitles for any audio playing through your speakers by going to Settings > Accessibility > Captions, or by pressing Win+Ctrl+L. The audio is processed locally on your device; nothing is sent to the cloud, which is critical if you’re privacy conscious or if whatever you’re captioning demands confidentiality.

I’ve mostly taken to using it when it is too hot to wear my headphones. I can just toggle it on and keep watching without disrupting anyone around me.

There are some hardware requirements you need to meet. Basic same-language captioning works on any Windows 11 PC running 22H2 and up, but if you want real-time translation, you will need Copilot+ hardware with an NPU and at least Windows 11 24H2.


The NZXT Capsule Elite USB microphone sitting on a desk.


Windows 11’s voice typing convinced me to skip Wispr Flow and other premium apps

Windows lets me turn my rambling thoughts into notes without typing anything.

Dynamic Lock locks your PC when you walk away

Pair your phone via Bluetooth and your computer can lock itself automatically

I can’t count how many times I’ve stepped away from my PC only to think, “Dang, I forgot to lock my PC.”

Fortunately, Windows has an easy way to handle that automatically by pairing your phone with your PC. When your phone gets out of range (about 20 feet in my house, though your wall materials and layout will affect that), your computer will automatically lock after about 30 seconds. There is no need to install a separate app on your phone, the setup just uses the Bluetooth connection itself. While the 30-second delay means it isn’t a guarantee no one can access my PC, it does mean it won’t remain unlocked if I step away for a long time.

I especially like this feature when I’m working on my laptop in public.

You can enable Dynamic Lock by navigating to Settings > Bluetooth & devices and pairing your phone, then enabling Dynamic Lock in Settings > Accounts > Sign-in options.


Microsoft includes tons of great tools if you dig for them

These tools aren’t alone either. There are tons of practical tools buried in Windows, unappreciated and underutilized.

Each of these tools takes less than a minute to enable, but they can make a significant difference in your day-to-day workflow. It is worth the small investment of time to find them and set them up.

If you’re looking for even more advanced customization options, I’d recommend checking out Microsoft PowerToys. It gives you a huge range of fantastic tools that make Windows much more pleasant to use.



Source link