I tested whether Gemini, ChatGPT, and Claude can analyze videos – this one wins



David Gewirtz / Elyse Betters Picaro / ZDNET



ZDNET’s key takeaways

  • Gemini can watch YouTube, MP4, and MOV files.
  • Claude still can’t process video directly.
  • ChatGPT needs Codex help for deeper video work.

AIs do a fine job understanding text from prompts and documents. Most do quite well interpreting images, but what about video? Can your favorite AI actually understand what’s in a video? If it does understand, what can you do with it?

Also: I tested ChatGPT Plus vs. Gemini Pro to see which is better – and if it’s worth switching

In this article, we test ChatGPT, Claude, and Gemini to see how well they grok the contents of videos, both from YouTube and local files. The results are surprising.

The tests

I fed each AI a set of three videos. One is a YouTube video I published last year about the scientific process of annealing (yes, I am as exciting on video as I am on ZDNET). I tested the AIs to see if they can understand what’s in the video. Then, I tried to see if they could create a better thumbnail than I used on my YouTube channel.

The second video is a motion test for the DJI Neo 2 drone. It’s just a video of me standing in front of the drone, using gestures to control how the drone flies. No audio. I wanted to see if the AIs understand what’s happening there. That’s in MP4 format.

Also: I tested ChatGPT vs. Claude to see which is better – and if it’s worth switching

Finally, I have the original MOV file that I uploaded to YouTube for a walk-and-talk about my YouTube posting strategy. I posted it last year. For my test, I used the local video file. It’s the same one that I actually uploaded to YouTube. I’m using the local version for my AI test, though, because I wanted to see how well the AIs could ascertain what I’m talking about without any metadata, transcripts, or hints provided by YouTube. It’s just the video itself. If you want to see the after-uploaded version, here’s a link.

I tested the latest and best models. I tested the $20-per-month ChatGPT Plus plan, the $20-per-month Gemini Pro plan, and the $100-per-month Claude Max plan, which I use for Claude Code.

My prompt was simple: “Can you watch this video?” I found that “watch this video” worked better than “understand” or “summarize,” because both of those sent the AIs looking for metadata. By contrast, when I asked the AIs to “watch” the videos, they didn’t all go looking for metadata. All three AIs seemed to know that “watch” meant I wanted them to actually watch the videos themselves and demonstrate an understanding of what was in them.

And with that, let’s dig in.

Claude: nope

Sometimes prolonging the pain is just cruel, so let's rip the Band-Aid off and get it over with: Claude is a big no. Neither the app nor the web interface can process video. But, well, let's have Claude tell us itself, shall we?

Also: How I used Claude AI to plan an entire hiking trip to the Adirondacks in 30 minutes – for free

Claude says, “I can’t watch video content directly. I can’t watch videos. I don’t have the ability to process video or audio content from YouTube links. I don’t have the ability to process the visual or audio frames of an MP4 file. Same answer here — I can’t directly watch or listen to a .mov file. I don’t process video or audio streams.”

And so it goes.

Gemini: yep

Gemini does have a standalone app, but I didn’t need to run it. Gemini’s web interface did a great job understanding all the video formats.

It didn’t matter whether I fed Gemini a YouTube URL, a 625MB MP4 file, or a whopping 1.65GB MOV file. Gemini, right in a browser tab, could handle it.

Also: This powerful Gemini setting made my AI results way more personal and accurate

The most interesting one is my drone test. That’s because the video contains no audio (not even background noise) and no context other than me standing there and moving my arms.


Drone shot by David Gewirtz/ZDNET

And yet, Gemini was able to ascertain exactly what I was doing:

In the video, you’re testing out some hand gestures — raising your palm to the camera as if signaling it to stop or move. The camera follows your lead, changing its angle and distance as you guide it through the yard and eventually back toward the house.

Looks like a successful test of those drone gestures! Is there something specific about the footage or the camera’s response you wanted to go over?

I mean, like, wow. Let’s be clear. The drone itself was not visible in the video. It was acting as the camera. I’m betting there are a lot of humans who wouldn’t understand what was happening there (I’m looking at you, my neighbors!), let alone an AI.

It did successfully understand my annealing video. It was able to identify sections, report on specific points I made verbally, and otherwise demonstrate its understanding.

It also understood the uploaded walk-and-talk video, not only identifying the location, but the various aspects of my commentary throughout the video.

Also: I tested ChatGPT Images 2.0 vs. Gemini Nano Banana to see which is better – this model wins

The one place Gemini fell down was in the transition from Gemini’s understanding mode to Nano Banana’s image mode. Despite Nano Banana’s ability to make awesome images, it doesn’t understand life and the world the way the new ChatGPT Images 2.0 does, and it shows.

I fed Gemini the original thumbnail for the video and told it, “Choose a single frame for the maximum impact as a YouTube thumbnail, then, based on context of the video and my existing YouTube thumbnail style, use Nano Banana to create a high-click-value thumbnail.”

The image on the left is my original thumbnail. The next two were Gemini’s attempts. They’re certainly vivid and might attract more clicks, but Gemini decided to make up a dude with a beard and place him in the image rather than use my suave and sophisticated visage. And it spelled “FIRE” as “FCIRE.” And so it goes.


Screenshot by David Gewirtz/ZDNET and Gemini

If you want to quickly get an understanding of a video or ask questions about it, Gemini is definitely the best way to go. If you want to create thoughtful imagery from the video, Gemini is not yet up to the task.

ChatGPT: sorta, with a bit more work

Then there’s ChatGPT, which is more like a bad news, good news story.

First, the bad news. ChatGPT itself failed my tests. It couldn’t read the YouTube link. And while it theoretically can understand videos, they need to be under 500MB in size. Mine were not.
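That size cap is easy to check before you waste an upload. A trivial pre-flight sketch; the 500MB figure is the limit these tests ran into, so treat it as approximate:

```python
# Quick pre-flight check against ChatGPT's ~500MB video upload cap
# (the exact limit may vary; 500MB is what this article's tests hit).
UPLOAD_LIMIT_MB = 500

def fits_upload_limit(size_mb):
    """True if a video of size_mb megabytes is under the cap."""
    return size_mb <= UPLOAD_LIMIT_MB

# Both test files blow past the limit:
# the 625MB MP4 and the 1.65GB (~1650MB) MOV.
```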

Then, the good news. If you combine the OpenAI Codex app with ChatGPT, it can do some very cool stuff. Codex is the agentic workhorse of OpenAI’s offerings. I kind of liken Codex to that handy neighbor we all know who can build almost anything out of almost anything.

Also: I stopped using ChatGPT for everything: These AI models beat it at research, coding, and more

Codex was able to read both local files and understand their meaning. For the drone test, Codex reported:

It looks like a backyard drone test shot.

What’s happening:

  1. A person stands in a residential backyard and faces the camera/drone.
  2. They gesture a few times (including a hand raise/wave-like motion).
  3. The camera viewpoint moves around them over time, changing angle and distance while keeping them mostly centered.
  4. The sequence ends still in the same yard area, with no major scene change or additional people/activities.

Yep. That’s what it’s about.

Codex was also able to watch and understand the walk-and-talk MOV file I uploaded. Initially, it couldn’t process the content, so Codex asked permission to install some Python code and libraries to allow it to do audio transcription. Once it built all that out, it was able to not only view the video, but also understand its context.
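The article doesn't show the exact code Codex generated, but a plausible reconstruction of the approach is to extract the audio track with ffmpeg and then transcribe it locally, for example with the open-source Whisper model. Everything here (tool choice, model size, file names) is an assumption:

```python
# Speculative sketch of what Codex likely built: pull the audio out of
# a .mov with ffmpeg, then transcribe it. Tool and model choices are
# assumptions; the article doesn't name what Codex actually installed.
def extract_audio_cmd(video_path, audio_path="audio.wav"):
    # -vn drops the video stream; 16kHz mono WAV suits speech models
    return ["ffmpeg", "-y", "-i", video_path,
            "-vn", "-ac", "1", "-ar", "16000", audio_path]

def transcribe(audio_path):
    import whisper  # pip install openai-whisper
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]
```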

But then Codex couldn’t watch the YouTube stream. OK, fine. This is Codex. So, I asked, “Can you download the full video and then work on it locally?”

That worked. It automagically wrote a Python script, installed some libraries, helped invent video-downloading technology on the fly, and then watched my YouTube video.
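The download script Codex wrote isn't shown either, but a common way to do this in Python is to shell out to yt-dlp. A hedged sketch, with the tool choice being my assumption:

```python
# Hypothetical reconstruction: download a YouTube video for local
# analysis using yt-dlp (a widely used downloader; the article doesn't
# say which tool Codex actually picked).
def build_download_cmd(url, out_template="video.%(ext)s"):
    # -f mp4 prefers an MP4 stream; -o sets the output filename template
    return ["yt-dlp", "-f", "mp4", "-o", out_template, url]

def download(url):
    import subprocess
    subprocess.run(build_download_cmd(url), check=True)
```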

But then I wanted it to create a thumbnail. I first asked if it had access to ChatGPT Images 2.0 (remember, they’re both OpenAI tools). It responded, “I have access to image generation tools in this session, but I don’t have a tool explicitly labeled Images 2.0 exposed to me.”

Also: I tested ChatGPT and Perplexity AI as my CarPlay voice assistants – both made Siri look bad

Let’s just ignore the unfortunate “exposed to me” phrasing. I had to explain to Codex that Images 2.0 was a thing, and point it to OpenAI’s site for it to understand. At that point, the agentic tool was aware of the images tool, but still couldn’t do much with it.

So, that’s when I acted as the conduit between Codex and ChatGPT. I told Codex, “Choose a single frame for the maximum impact as a YouTube thumbnail, export that thumbnail somewhere so ChatGPT can get to it, or so I can upload it to ChatGPT, and then, based on context of the video and my existing YouTube thumbnail style, write a prompt for ChatGPT to create a high-click-value thumbnail.”

Then, in ChatGPT, I uploaded the original thumbnail image I showed you earlier, and the frame that Codex chose. I then pasted in the prompt Codex created. As you can see, Codex/ChatGPT got more right than Gemini did.


Screenshot via ChatGPT and Codex by David Gewirtz/ZDNET

It picked up on the white, yellow, and black color scheme for my lettering. It didn't include my logo, and it didn't include the yellow stripe I use for my titles, but I actually quite like the juxtaposition of my picture over the torch flame. ChatGPT and Codex actually used my image, unlike Gemini. But I do take issue with the aluminum bar. I used flat material. For some reason, the OpenAI tools decided to make it into square tubing.

Also: I tried ChatGPT Images 2.0: A fun, huge leap – and surprisingly useful for real work

Here’s where ChatGPT’s better image knowledge comes into play. I corrected it on the tubing vs. bar situation and asked it to regenerate. I prompted, “That aluminum is flat bar material about 1/8-inch thick, not square tubing. Keep everything else, but please fix the aluminum.”


Screenshot via ChatGPT and Codex by David Gewirtz/ZDNET

We were close. I didn’t like how it placed the Sharpie marks (which are used to tell when the metal is hot enough to bend), and the actual bend was far too sharp. One more prompt: “Good, but the bend is too sharp. It’s not a perfect right angle. There’s a curve because the aluminum needs to flex as it bends. Please revise. Also, the sharpie marks are perpendicular to the edge of the bar, not on an angle. They indicate where to bend.”


Screenshot via ChatGPT and Codex by David Gewirtz/ZDNET

That’s good enough. I think it’s possible to feed Codex and ChatGPT a video with no additional context and get out a YouTube thumbnail. You could probably use it to analyze other types of videos and produce images from those as well.

It’s not super-convenient, but it does work pretty well.

AI can indeed watch video

There are a few things to note. First, the AIs interpreted the videos in far less time than their actual runtime. Both the science video and the walk-and-talk are about 15 minutes long, but both Gemini and ChatGPT were able to "watch" and parse them for understanding in roughly two or three minutes each.

Second, both show fairly powerful interpretation skills. I found their ability to understand that the silent video I gave them was a drone test to be rather impressive. The drone mostly stayed at human height, yet they were both able to ascertain context from the frames in the video.

There are certainly some practical uses. I gave Gemini a YouTube video of a CBS report on the OpenAI trial and asked it to provide me details about what was discussed. I can definitely see using it to scan through security camera video to find a specific type of action quickly.

I can also definitely see giving the AI a longer video and having it pull out the major points. What was particularly useful is that Gemini time-stamped each of the key thoughts, so I could just click the time stamps and drop into the video at that point.

Then, of course, there’s the actual use of these tools to create YouTube thumbnails. I still prefer to do it by hand. But the fact that these AIs can extract usable frames and construct thumbnails means that creators have a new tool at their disposal.

Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic

Overall, I’m impressed with Gemini and the pairing of ChatGPT and Codex for video-watching ability. Isn’t it interesting that Gemini doesn’t need two tools (after all, it is called “Gemini”), but ChatGPT needs Codex? Things like this amuse me.

Even though Claude bombed at this test, it still has value. Claude is one of my favorites for vibe coding.

What productivity benefits can you see getting from the video-watching capabilities of these AIs? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.




