
ZDNET’s key takeaways
- AI is getting better at small tasks, but still lags on long-form analysis.
- The consequences of prolonged interactions with AI can be disastrous.
- Use AI like a tool for well-defined tasks, and avoid falling down a rabbit hole.
Better to do a little well than a great deal badly. So said the great philosopher Socrates, and his advice can apply to your use of artificial intelligence, including chatbots such as OpenAI's ChatGPT or Perplexity, as well as the agentic AI programs increasingly being tested in the enterprise.
AI research increasingly shows that the safest and most productive course is to use AI for small, limited tasks where outcomes can be well defined and results can be verified, rather than pursuing extensive interactions with the technology over hours, days, and weeks.
Also: Asking AI for medical advice? There’s a right and wrong way, one doctor explains
Extended interactions with chatbots such as ChatGPT and Perplexity can lead to misinformation at the very least, and in some cases to delusion and death. The technology is not yet ready to take on the most sophisticated demands of reasoning, logic, common sense, and deep analysis, areas where the human mind still reigns supreme.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
We are not yet at artificial general intelligence (AGI), the hypothesized human-level capability of AI, so you'd do well to keep the technology's limitations in mind when using it.
Put simply, use AI as a tool rather than letting yourself be sucked down a rabbit hole and lost in endless rounds of AI conversation.
What AI does well – and not so well
AI tends to do well at simple tasks, but poorly at complex and deep types of analysis.
The latest examples of that pattern are the main takeaways from this week's release of the Annual AI Index 2026 from the scholars at Stanford University's Institute for Human-Centered AI.
On the one hand, editor-in-chief Sha Sajadieh and her collaborators make clear that agentic AI is increasingly successful at tasks such as looking up information on the Web. In fact, agents are close to human-level on routine online processes.
Also: 10 ways AI can inflict unprecedented damage
Across three benchmark tests — GAIA, OSWorld, and WebArena — Sajadieh and team found that agents are approaching human-level performance on multi-step tasks such as opening a database, applying a policy rule, and then updating a customer record. On the GAIA test, agents have an accuracy rate of 74.5%, still below the 92% of human performance but way up from the 20% of a year ago.
On the OSWorld test, “Computer science students solve about 72% of these tasks with a median time of roughly two minutes,” while Anthropic's Claude Opus 4.5, until recently the company's most powerful model, reaches 66.3%. That means “the best model [is] within 6 percentage points of human performance.”
WebArena shows AI models “now within 4 percentage points of the human baseline of 78.2%” accuracy.
Agentic AI is getting better at online tasks such as Web browsing but still falls short of human-level accuracy. (Image: Stanford)
While Claude Opus and other large language models are not perfect, they are making rapid progress toward benchmark scores that approach human-level performance.
That makes sense, as manipulating a web browser or looking something up in a database should be among the easier scenarios for AI: the natural-language prompt can plug into APIs and external resources. In other words, AI has most of the equipment it requires to interface with applications in limited ways and carry out tasks, a pattern sketched in the example below.
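To make that pattern concrete, here is a minimal, hypothetical sketch of a tool-calling loop of the kind agentic systems use. Everything in it, the stand-in model, the tool names, and the customer record, is invented for illustration; a real agent would call a vendor's LLM API, which would decide which tool to invoke at each step.

```python
# A minimal, hypothetical sketch of the tool-calling pattern behind agentic AI.
# fake_model stands in for a real LLM planner; every name and record here
# is invented for illustration, not taken from any vendor's actual API.

def lookup_customer(customer_id: str) -> dict:
    """Hypothetical database lookup: one narrow, verifiable tool."""
    records = {"c-42": {"name": "Ada", "plan": "basic"}}
    return dict(records.get(customer_id, {}))

def update_plan(record: dict, plan: str) -> dict:
    """Hypothetical record update: apply a policy rule and return the result."""
    record["plan"] = plan
    return record

def fake_model(task: str) -> list:
    """Stand-in for an LLM: maps a prompt to a fixed plan of tool calls.
    A real agent would ask a model API to choose these steps dynamically."""
    return [
        ("lookup_customer", {"customer_id": "c-42"}),
        ("update_plan", {"plan": "premium"}),
    ]

def run_agent(task: str) -> dict:
    """Open a record, apply a rule, update the record: the kind of
    multi-step task benchmarks like GAIA, OSWorld, and WebArena measure."""
    record: dict = {}
    for tool_name, args in fake_model(task):
        if tool_name == "lookup_customer":
            record = lookup_customer(**args)
        elif tool_name == "update_plan":
            record = update_plan(record, **args)
    return record

if __name__ == "__main__":
    print(run_agent("Upgrade customer c-42 per the loyalty policy"))
    # -> {'name': 'Ada', 'plan': 'premium'}
```

The design point is that each tool does one narrow, checkable thing, which is exactly the kind of well-defined task where the benchmarks above show agents approaching human performance.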
Also: 40 million people globally are using ChatGPT for healthcare – but is it safe?
Note that even with well-defined, limited tasks, it helps to check what you’re getting from a bot, as the average score on these benchmarks still falls short of human capacity — and that’s in benchmark tests, a kind of simulated performance. In real-world settings, your results may vary, and not to the upside.
AI can’t handle the hard stuff
When they dug into deeper kinds of work, the Stanford scholars found much less encouraging results.
Research has found, they noted, that “models handle simple lookups well but struggle when asked to find multiple pieces of matching information or to apply conditions across a very long document — tasks that would be straightforward for a human scanning the same text.”
That finding aligns with my own anecdotal experience using ChatGPT to draft a business plan. Answers were fine in the first few rounds of prompting but then degraded, as the model snuck in facts and figures I had not specified, or details that had been relevant earlier in the process but had no business in the present context.
The lesson I drew: the longer your ChatGPT session runs, the more errors sneak in, which makes the experience infuriating.
Also: I built a business plan with ChatGPT and it turned into a cautionary tale
The results of unchecked bot elaboration can get more serious. An article last week in the journal Nature describes how Almira Osmanovic Thunström, a medical researcher at the University of Gothenburg, and her team invented a disease, “bixonimania,” which they described as an eye condition resulting from excessive exposure to blue light from computer screens.
They wrote formal research papers on the made-up condition, then published them online. The papers got picked up in bot-based searches, and most of the large language models, including Google's Gemini, began faithfully relating the condition bixonimania in chats, pointing to the faked research papers of Thunström and team.
The fact that bots will confidently assert the existence of the fake bixonimania speaks to a lack of oversight of the technology’s access to information. Without proper checking, you can’t know if a model will verify what it’s spitting out. As one scholar who wasn’t involved in the research noted, “We should evaluate [the AI model] and have a pipeline for continuous evaluation.”
Consequences can be serious
A more serious variant, in which a user seems to have gone down a rabbit hole of confiding in a bot, is described in a recent New York Times article by Teddy Rosenbluth about the case of an older man grappling with a cancer of the white blood cells.
Rather than following his oncologist's advice, the patient, Joe Riley, relied on extensive interaction with chatbots, especially Perplexity, to dispute the doctor's diagnosis. He insisted his AI research revealed he had what's called Richter's transformation, a complication of the cancer that the recommended treatment would make worse.
Also: Use Google AI Overview for health advice? It’s ‘really dangerous,’ investigation finds
Despite emails from experts on Richter's questioning the material in the Perplexity summaries of the condition, Riley stuck with his belief in his AI-generated reports and resisted his doctor's and his family's pleas. He missed the window for proper treatment; by the time he relented and agreed to try treatment, it was too late.
Rosenbluth draws a connection between Joe Riley's story and last year's case of Adam Raine, who died by suicide after extensive chats with ChatGPT about his desire to end his life.
Riley’s son, Ben Riley, wrote his own account of his father’s journey with AI. While the younger Riley doesn’t blame the technology per se, he points out that getting immersed in chats and losing perspective can have consequences.
“The fact remains that AI does exist in our world,” writes Riley, “and just as it can serve as fuel to those suffering manic psychosis, so too may it affirm or amplify our mistaken understanding of what’s happening to us physically and medically.”
Staying sane with unreliable AI
The inclination to engage in long-form discussions about depression, suicide, and serious health conditions is understandable. People have become habituated to hours-long engagements on social media. Some people are lonely, and a natural-language conversation with a bot is better than no conversation at all.
Also: Your chatbot is playing a character – why Anthropic says that’s dangerous
Bots have a tendency toward sycophancy, research has shown, which can make hours of engagement with a bot more fulfilling than the ordinary give and take with a person.
And the companies that make the technology, while warning users to verify bot output, have tended to downplay reports of harm from individuals such as Riley and Raine.
4 rules for avoiding the rabbit hole
A few rules can help mitigate the worst effects of leaning too heavily on the technology.
- Define what you are going to a chatbot for. Is there a well-defined task with a limited scope, one for which the bot's output can be fact-checked against other sources?
- Maintain a healthy skepticism. It's well known that chatbots are prone to confabulation, confidently asserting falsehoods. It doesn't matter how many chatbots you use to try to balance one against another; treat all of them as having only part of the truth, if any.
- Don't regard chatbots as friends or confidants. They are digital tools, like Word or Excel. You're not trying to have a relationship with a bot but rather to complete a task.
- Use proven digital overload skills. Take stretch breaks. Step away from the computer for a non-digital human interaction, such as playing card games with a friend or going for a walk.
Also: Stop saying AI hallucinates – it doesn’t. And the mischaracterization is dangerous
Falling down the rabbit hole happens partly as a result of simply being parked in front of a screen with no downtime.