I tested Claude and ChatGPT on a simple task—both failed spectacularly

AI chatbots are incredibly useful for my work. I use them for research, for proofreading and fact-checking, and for keeping track of articles and pitches. There’s one job I had really hoped AI could take off my hands. Sadly, both Claude and ChatGPT failed dismally.

I have no problem with AI handling the mundane tasks

AI should help make us more efficient

Image: 1X NEO humanoid robot still from promotional video.

There’s an excellent quote from author Joanna Maciejewska that completely sums up my feelings about AI. “I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.”

The same is true of my work. I don’t want AI to write articles for me, and I don’t want to read articles written by AI. AI models don’t have real-world experience of the things they write about.

There are plenty of parts of my work that AI does help with, and it’s the mundane parts, such as fact-checking, that I’m more than happy for AI to make easier. The less time I have to spend on routine tasks, the more time I can spend writing.

There’s one particularly mundane task that I still do manually, and it takes up a lot of time. I was hoping that AI might be able to take it off my hands.

Annotating screenshots is a thankless task

Adding arrows to multiple images eats up time

Image: The Add label button for an area in Home Assistant.

One of the most tedious parts of writing how-to guides is creating step-by-step screenshots. These are really useful for helping readers to follow the guides quickly and accurately, but they’re a pain to create.

First, I have to work through all of the steps and take screenshots of all the relevant parts. This isn’t too painful, as it only takes a second or two to take a screenshot.

The time-consuming part comes when I have to annotate the screenshots. When a screenshot is meant to illustrate a specific menu or button, it’s not always obvious which menu or button it is focused on. To solve this, I add arrows to the screenshots that point to the relevant objects.

Adding these annotations isn’t hard; it’s just time-consuming and tedious, especially when I have a lot of screenshots to annotate. My hope was that I’d be able to point an AI chatbot at a folder of screenshots, upload the text of the step-by-step instructions, provide some examples of previously annotated screenshots, and have the AI add the arrows to the images for me. It felt like this was something that should be possible with the current state of AI tools.
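To make concrete what placing a single arrow involves, here is a minimal Python sketch of the geometry only. It is an illustration, not the workflow described in this article: the function name `arrow_for_target`, the box format, and every default size are assumptions, and it presumes the button’s bounding box is already known. It does not draw anything; it only returns the points that a drawing tool would need.

```python
import math


def arrow_for_target(box, angle_deg=0.0, gap=8.0, tail_len=120.0,
                     head_len=18.0, head_angle_deg=25.0):
    """Return (tail, tip, barb_a, barb_b) for an arrow pointing at box.

    box is (left, top, right, bottom) in pixels, with y growing downwards.
    angle_deg says which side the arrow sits on: 0 is right of the target,
    90 is below it, 180 is left of it, 270 is above it.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w, half_h = (right - left) / 2, (bottom - top) / 2

    # Unit vector from the target's centre out towards the arrow's tail.
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))

    # Distance from the centre to the box edge along that direction.
    reach = []
    if abs(dx) > 1e-9:
        reach.append(half_w / abs(dx))
    if abs(dy) > 1e-9:
        reach.append(half_h / abs(dy))
    edge = min(reach)

    # The tip stops `gap` pixels outside the box so it does not cover the
    # target, and the fixed tail length keeps the shaft from getting long.
    tip = (cx + (edge + gap) * dx, cy + (edge + gap) * dy)
    tail = (cx + (edge + gap + tail_len) * dx,
            cy + (edge + gap + tail_len) * dy)

    # The two barbs fan out from the tip, back towards the tail.
    barbs = []
    for sign in (1, -1):
        a = math.radians(angle_deg + sign * head_angle_deg)
        barbs.append((tip[0] + head_len * math.cos(a),
                      tip[1] + head_len * math.sin(a)))
    return tail, tip, barbs[0], barbs[1]


# Example: a 100x40 pixel button, with the arrow coming in from the right.
tail, tip, barb_a, barb_b = arrow_for_target((100, 100, 200, 140))
```

With the defaults, the tip lands 8 pixels outside the button’s edge and the shaft is always 120 pixels long. The geometry is the easy part; the sketch says nothing about how to find the right button in the first place.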

ChatGPT wasn’t up to the job

The results were poor

ChatGPT currently has one of the best image generation models out there. The GPT Image 2 model can produce impressive results that can be very hard to distinguish from real images. I figured that if any chatbot could nail adding arrows to screenshots, it would be ChatGPT.

I uploaded the screenshots and step-by-step instructions and gave ChatGPT an example image so it knew what the annotations should look like. ChatGPT got to work and annotated all the screenshots for me.

The results were terrible. The arrows pointed to completely the wrong things or to nothing at all. Some of the arrows were mangled, and not a single screenshot was even close to being usable. I tried again with more explicit prompts to address these issues, but every attempt produced the same poor results.

Even with computer use, Claude failed

Full control of my computer wasn’t enough

Having failed with ChatGPT, I decided to give Claude a go. While Claude didn’t have native image generation tools, I’d used a Claude feature called computer use before, which lets Claude interact with a computer using screenshots, mouse control, and keyboard input. I figured that using this tool, Claude could take control of apps such as Preview, annotate the screenshots, see the results of its efforts, and adjust the positions of the arrows until they were exactly right.

I set everything up, pointed Claude at the folder of screenshots, uploaded the step-by-step instructions it needed to follow, provided some example screenshots, and gave it all the necessary permissions. I set Claude to work and sat back, waiting for my perfectly annotated screenshots to be generated.

Once again, I was hugely disappointed. The arrows often partially obscured the objects that they were meant to point to, pointed past them, or had ridiculously long tails. The icing on the cake was that Claude saved its annotated files over the top of the ones I’d generated with ChatGPT, completely destroying those files.
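The overwritten files are the one failure here that a small guard can prevent, whichever tool does the annotating. The following is a minimal Python sketch under stated assumptions: the helper name `safe_output_path`, the `-annotated` suffix, and the folder layout are all hypothetical and not part of any Claude or ChatGPT feature. It picks an output path in a separate folder and never returns a path that already exists.

```python
from pathlib import Path


def safe_output_path(source: Path, out_dir: Path,
                     suffix: str = "-annotated") -> Path:
    """Return a path inside out_dir that does not exist yet.

    The name is built from the source file's name plus `suffix`. If that
    name is already taken, a counter is appended instead of overwriting.
    The helper name and the naming scheme are illustrative assumptions.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    candidate = out_dir / f"{source.stem}{suffix}{source.suffix}"
    counter = 2
    while candidate.exists():
        candidate = out_dir / f"{source.stem}{suffix}-{counter}{source.suffix}"
        counter += 1
    return candidate
```

Giving each tool its own output folder, for example `safe_output_path(Path("step-01.png"), Path("annotated/claude"))`, keeps one set of results from landing on top of another.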

I tried refining my prompts and giving clearer instructions. Using Opus instead of Sonnet, I was able to get some of the results to be half-decent. The problem was that it wasn’t reliable; I could never manage to get a full set of images that didn’t need some kind of fixing, which completely defeats the purpose.

AI still can’t do everything perfectly

AI is really useful, but there are still things it can’t do without significant human review or correction. The problem is that when AI makes mistakes, fixing those mistakes can often take longer than just doing the job myself. For now, I’ll have to stick to annotating screenshots myself. Maybe Fable 5 can pull it off if we ever see it in action again.

Source link

Stephan Dorsey

Stephan is the sports journalist for the Maple Grove Report.

Leaving no-one behind – the Winter Surge project

Reaching people who have been let down so many times they’ve stopped expecting anything different takes time, consistency, and trust. The Winter Surge project does all these things and more.

Running every November to March for the past four years, the Winter Surge project – part of our Higher Needs Floating Support service – provides 17 beds of high-support temporary accommodation, daily welfare checks, and intensive, trauma-informed care for Bristol’s most entrenched rough sleepers.

Commissioned by Bristol City Council as part of its cold weather provision, it brings together a powerful network of partners including St Mungo’s Outreach, Social Care, Homeless Health, drug and alcohol services and housing providers.

Team Manager Sam Scott has been involved in shaping the project from the start – from planning how it works and selecting temporary accommodation providers, to troubleshooting, managing risk, and feeding back learning to improve the service year-on-year. She says it has been a privilege:

“Bristol City Council gave me the opportunity to run Winter Surge and the autonomy to shape it into what it’s become. From the planning stages right through to being on the ground – it’s an extraordinary project to be part of.”

A landmark year

This winter, 42 people came into the service and not one of them went back to the streets. This is the result of a small, skilled team of support workers focused on stabilisation, move-on planning, and wrap-around support covering mental health, safeguarding, benefits, addiction, and wellbeing. After the project ended on 31 March, the wider team makes sure clients move on from the service smoothly with no gap in care.

There are some truly amazing personal stories hidden behind the headline numbers. Four clients who had resisted support for years agreed to come in and stayed for the full duration. One man, who had been living with undiagnosed cancer for over three years, was supported by the team to access hospital treatment. He has now had two major operations and is receiving ongoing care. Sam said:

“It’s our patient, trauma-informed relationship building that makes all the difference. I’m so proud of the team and the work we’ve done, particularly this year when not one person went back onto the streets.”

Building trust where it’s been broken

At the heart of the Winter Surge is a commitment to breaking the cycle that sees the most vulnerable people going through many services and feeling constantly let down. The project successfully reduced evictions, improved access to housing, rebuilt confidence in receiving support, and promoted a My Team Around Me approach, ensuring every agency took genuine ownership of their role in a client’s journey.

This is what person-centred, trauma-informed care looks like in practice, and this year it worked for every single person who walked through the door.

Image L-R: Amy O’Loughlin, Sam Scott, Emma Ireland
