AI agents need more than reasoning: they need to actually use the web



A company rolls out an AI customer service assistant. The model behind it is current and capable enough for the job. The assistant goes live. Within a week, support tickets are getting worse, not better.

The model isn’t the problem. The company’s own website is. The return policy the assistant needs to quote lives in a PDF. The shipping calculator it needs to reference is a multi-step form. The product specs it should be pulling sit behind a tabbed interface that only loads after a click. To a human visitor, the site works fine. To the AI trying to read it, half the site doesn’t exist.

This is the wall most agentic AI deployments are hitting right now, and it has almost nothing to do with the model.

McKinsey’s 2025 State of AI report found that 23% of organizations are now scaling agentic AI systems in at least one business function, with another 39% experimenting. Most of those deployments will run into the same wall: a web designed for humans, used by software that needs something humans never required. The next step for AI agents isn’t smarter reasoning. It’s the ability to actually navigate and use the live internet.

The three things an AI agent has to do on the web

The work breaks down into three jobs, and all three have to work for an agent to be useful in production.

Search. The agent needs to find the right information. Not a list of links, but actual content it can read and reason over. If a customer asks an insurance chatbot whether their policy covers a specific event, the agent needs to surface the relevant section of the policy, not a search results page.

Scrape. Once the agent finds the page, it needs to read it cleanly. Most modern websites don’t make this easy. Pages load through JavaScript that has to execute first. Content lives inside expandable accordions, tabs, and lazy-loaded sections. The HTML the agent receives often looks nothing like what a human sees in their browser.
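To make the gap concrete, here is a minimal sketch in Python using only the standard library. The product page, the `specs-tab` element, and the helper names are invented for illustration; this is not how Firecrawl or any particular scraper works. It only shows that a reader which never runs JavaScript sees an empty tab where a human would see the specs after a click.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the text a non-JavaScript reader would get from the raw HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the specs tab is only filled in by JavaScript after a click.
STATIC_HTML = """
<html><body>
  <h1>Trail Backpack</h1>
  <div id="specs-tab"></div>
  <script>
    document.getElementById("specs-tab").onclick = function () {
      this.textContent = "Capacity: 40 L";
    };
  </script>
</body></html>
"""

static_page_text = visible_text(STATIC_HTML)
print(static_page_text)
```

The heading comes through, but the capacity figure never does, because it only exists after the script runs in a browser.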

Interact. This is where most agent demos fall apart in production. A lot of the information humans care about isn’t on a simple URL. It’s behind a “load more” button, a search box, a multi-step form, a navigation menu, or a login. A scraper that can only read static pages can’t reach any of it. An agent that can interact (click, navigate, fill, submit) can. The difference between the two determines whether the AI can actually do its job.
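The difference can be sketched with a toy simulation. Everything here is hypothetical: `SimulatedListingPage` stands in for a real page with a “load more” button, and the two reader functions are illustrations rather than any vendor's implementation. A production agent would drive a real browser instead of a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedListingPage:
    """Hypothetical page that reveals items in batches behind a 'load more' button."""

    all_items: list
    batch_size: int = 2
    _shown: int = field(default=0, init=False)

    def __post_init__(self):
        # The first render shows only the first batch.
        self._shown = min(self.batch_size, len(self.all_items))

    def visible_items(self):
        return self.all_items[: self._shown]

    def has_load_more(self):
        return self._shown < len(self.all_items)

    def click_load_more(self):
        self._shown = min(self._shown + self.batch_size, len(self.all_items))


def read_static(page):
    """A read-only scraper sees only what the first render shows."""
    return page.visible_items()


def read_with_interaction(page, max_clicks=10):
    """An interacting agent clicks until the button is gone or its click budget runs out."""
    clicks = 0
    while page.has_load_more() and clicks < max_clicks:
        page.click_load_more()
        clicks += 1
    return page.visible_items()


ITEMS = ["policy-a", "policy-b", "policy-c", "policy-d", "policy-e"]
static_view = read_static(SimulatedListingPage(ITEMS))
full_view = read_with_interaction(SimulatedListingPage(ITEMS))
print(len(static_view), len(full_view))
```

The click budget (`max_clicks`) is a design choice worth keeping in any real version: pages with infinite scroll would otherwise keep an agent clicking forever.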

Of the three, interaction is the newest and the hardest. It’s also where the most useful agent applications live: shopping assistants that compare prices across sites, research tools that pull data from interactive dashboards, customer support bots that navigate documentation portals the way a real user would.

Firecrawl is building the layer underneath

Firecrawl is one of the companies building infrastructure designed to support all three functions. The platform sits between AI agents and the live web, handling search, scraping, and interaction as managed capabilities behind a single API. Its open-source project has more than 120,000 stars on GitHub. Customers including Lovable, Replit, and Zapier use it in production. Nexus Venture Partners led the company’s $14.5 million Series A in 2025, with Shopify CEO Tobi Lütke joining as an investor after first using Firecrawl as a customer.

The pitch is straightforward: an AI agent built on top of Firecrawl doesn’t need its development team to write custom code for every site it touches. It calls an API, and the platform handles much of the underlying technical work: rendering JavaScript, navigating dynamic pages, interacting with elements, and returning structured output that AI systems can use.

“Every AI company needed clean web data and nobody was solving it well,” says Eric Ciarla, one of Firecrawl’s cofounders. “So we built Firecrawl.”

Ciarla and his cofounders ran into the problem directly while building their previous company, Mendable, an AI search platform that was used by a range of organizations. The search product worked. The infrastructure pulling data from each customer’s website to feed it didn’t. Every new integration meant rebuilding fragile extraction code that broke the next time the customer’s site changed. Mendable wasn’t unusual in hitting that wall. Many AI companies integrating web data faced similar challenges, repeatedly rebuilding internal extraction tools.

How AI is becoming the new way people find things

There’s a shift happening alongside the technical one, and it changes the stakes for businesses that haven’t thought about AI agents reading their websites yet.

For two decades, the path from “a customer is looking for something” to “a customer finds your business” often ran through traditional search engines. AI assistants are increasingly where people start when they want a recommendation, a comparison, or an answer. The AI assistant goes off, pulls information from the relevant websites on the person’s behalf, and comes back with a synthesized answer. If the AI can’t parse your site, your business doesn’t appear in the answer.

Ciarla argues this changes how businesses should think about AI crawlers entirely. “Behind every AI agent is a human trying to find something,” he says. The dominant industry framing has treated AI crawlers as unwelcome automation: bots to defend against, traffic that drains server resources without sending human visitors in return. That framing made sense when the only things reading websites at scale were search engines indexing for human visitors later. It makes less sense when AI agents are the path the human is using to find what they need.

In Ciarla’s view, blocking AI crawlers today may be comparable to limiting visibility on an emerging discovery channel. He argues that doing so could reduce a business’s chances of being found as customer search behavior evolves.

What makes Firecrawl’s position in this shift unusual is that it doesn’t require businesses to do anything. Most approaches to AI visibility put the work on the site owner: add new markup, expose new endpoints, restructure pages, learn a new optimization discipline on top of the existing SEO one. Firecrawl works from the opposite direction. The platform handles the conversion between human-readable site and machine-readable data automatically, in real time. A business never needs to know AI agents are reading the page. The agents get what they need anyway.

The bigger question underneath

As agents pull more information from more sites, the relationship between AI systems and the sources they depend on becomes a real question. A model where AI extracts value from web content without anything flowing back to the people who created it isn’t durable. Publishers are pushing back through lawsuits and access blocks, and major sites are increasingly walling off their content from AI crawlers entirely. The underlying ecosystem isn’t healthy, and the long-term cost lands somewhere eventually.

In March 2026, Firecrawl partnered with Wikimedia Enterprise to route all of its Wikipedia traffic – 2 to 3 million requests per month – through Wikimedia’s commercial APIs rather than continuing to scrape Wikipedia pages directly. The arrangement replaces resource-intensive scraping with paid, structured access, and helps support the volunteer community that maintains one of the most-cited information sources on the open web.

“The community members who write and edit these articles hold immense power in the age of AI,” Ciarla said when the partnership was announced. “They are providing the essential service of defining what is true. We want to ensure our infrastructure supports their work rather than just consuming it.”

The Wikimedia deal is one model. Similar approaches may emerge elsewhere in the industry. As AI products move from demos into production at scale, the companies building the underlying infrastructure are helping shape how AI systems interact with the web.

What this means if you’re paying attention

If you’re building with AI, the practical takeaway is simple. The model is no longer the differentiator. Almost everyone has access to the same frontier models, and the gaps between them keep closing. What separates an AI product that works in production from one that doesn’t is increasingly the layer underneath, and whether the system can actually reach the information it needs to be useful. Investing in that layer may offer meaningful engineering benefits.

If you’re running a business and you’ve never thought about AI agents reading your website, now is the moment to start. The discovery channel is shifting. A customer who previously may have found a business through a traditional search engine may now use an AI assistant as part of the discovery process. If that assistant can’t read your site, they may not find you at all. Many businesses continue to optimize primarily for human readers and search engines while they evaluate how AI-driven discovery may affect their digital presence.

Digital Trends partners with external contributors. All contributor content is reviewed by the Digital Trends editorial staff.





Ghost CMS flaw abused to push ClickFix attacks on hundreds of sites

Pierluigi Paganini
May 25, 2026

Threat actors are actively exploiting CVE-2026-26980, a security flaw in Ghost CMS that was fixed months ago, in attacks against unpatched websites. According to Qianxin, the campaign has already affected more than 700 sites, including well-known organizations and universities.

The vulnerability is an SQL injection issue in Ghost’s Content API that can let an attacker read data from the database without logging in. In the worst case, this can expose the Admin API key, which can allow attackers to take over the site.

That key matters because it can be used to change published content. In this campaign, attackers used it to edit articles on compromised Ghost sites and insert malicious JavaScript at the end of pages. The goal was not just defacement, but to turn trusted websites into launch points for further malware delivery.

“After an in-depth investigation and analysis, we determined that this was not a targeted intrusion against the customer, but rather a large-scale poisoning campaign by an in-the-wild attack group targeting Ghost CMS. Although CVE-2026-26980 was publicly disclosed as early as February 19, a large number of users did not patch and upgrade in time, providing an opportunity for attackers,” reads the advisory published by Qianxin. “At least two groups are currently actively conducting such poisoning operations, and some sites have even become the target of competition between the two parties, with different malicious code being implanted one after another within a single day.”

The inserted code led visitors through a two-step chain. First, the page loaded a remote script that checked the browser and decided what the visitor should see. Then real victims were redirected to a fake verification page that looked like a normal “I’m human” check.

This is where the ClickFix part began. The page told users to press Windows+R, paste a command, and hit Enter. In practice, that command downloaded and started a malware payload on the victim’s machine. It was a classic social engineering trick: make the user do the dangerous part themselves.

Qianxin says the first signs of this activity appeared in early May. The malicious code found in the campaign had a compilation date of February 16, the same day Ghost announced the fix for CVE-2026-26980. That suggests the attackers moved quickly once they saw how many sites had not been updated.

The affected websites cover a wide range of sectors. Roughly half are personal blogs or independent sites, but the list also includes technology blogs, AI sites, media outlets, crypto projects, and educational institutions. Qianxin researchers say victims include sites linked to Harvard, Oxford, and DuckDuckGo.

The attack chain was also designed to be flexible. The loaders could fetch different payloads depending on the target, and the operators changed infrastructure several times.

“[The] entire attack process has obvious five-stage characteristics of ‘CMS Takeover → Page Poisoning → Two-stage Loading → Social Engineering Lure (FakeCaptcha/ClickFix) → Malware Delivery’, and the entire process is highly automated: bulk vulnerability scanning → automatic key extraction → bulk injection → dynamic C2 distribution,” states the report.

In some cases, they switched domains after detection, keeping the campaign alive even when part of the chain was blocked.

“Through feature scanning of publicly accessible pages, we have cumulatively identified more than 700 poisoned victim domains, and have proactively contacted the sites for which contact information could be obtained, notifying them of the poisoning,” continues the report.

Qianxin also believes at least two different groups are involved. In some cases, the same site was hit more than once, with one attacker replacing the code left by another. That makes the campaign harder to clean up and shows how attractive compromised Ghost sites have become for abuse.

For site owners, the advice is straightforward. Ghost should be updated immediately, all credentials should be rotated, and site logs should be reviewed for suspicious admin API activity. Any injected scripts should be removed from the database itself, not just from the visual editor. Visitors who may have reached a poisoned site should also be warned.
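As a rough illustration of the cleanup step, the sketch below flags script tags in stored post HTML that load from hosts the site owner does not recognize. It is an assumption-laden example, not Qianxin's detection method and not part of Ghost: the function names, the allowlist, and the sample HTML are hypothetical, and how the post HTML is exported from the database is left to the site owner. The report's IoCs remain the authoritative list of what to look for.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptSourceCollector(HTMLParser):
    """Records the src of every <script> tag and counts inline scripts."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self.inline_count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.sources.append(src)
        else:
            self.inline_count += 1


def suspicious_scripts(post_html, allowed_hosts):
    """Return (external script URLs from hosts outside the allowlist, inline script count)."""
    collector = ScriptSourceCollector()
    collector.feed(post_html)
    collector.close()
    flagged = []
    for src in collector.sources:
        host = urlparse(src).hostname
        # Relative URLs have no hostname and are treated as same-origin here.
        if host is not None and host not in allowed_hosts:
            flagged.append(src)
    return flagged, collector.inline_count


# Hypothetical stored post body with one expected embed and one unknown script.
POST_HTML = (
    "<p>Hello</p>"
    '<script src="https://cdn.example.com/embed.js"></script>'
    '<script src="https://unknown-host.example.net/l.js"></script>'
)
flagged, inline_count = suspicious_scripts(POST_HTML, {"cdn.example.com"})
print(flagged, inline_count)
```

Inline scripts are only counted, not judged, since a legitimate post can contain them; any nonzero count on a post that should have none deserves a manual look.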

The report includes Indicators of Compromise (IoCs) for the attacks observed by the researchers.


(SecurityAffairs – hacking, Ghost CMS)






