I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance




ZDNET’s key takeaways

  • GPT-5.5 delivers polished, useful answers across tasks.
  • Strong performance across writing, coding, and reasoning tasks.
  • Overeagerness hurts accuracy and instruction following.

OpenAI has released GPT-5.5, which can be reductively described as better and faster than GPT-5.4. The new large language model shows improvements in agentic coding, conceptual clarity, scientific research ability, and accuracy during knowledge work.

This release follows closely on the heels of the introduction of ChatGPT Images 2.0 earlier this week, which combines AI intelligence with image generation. And if it also feels like we just discussed the release of GPT-5.4, you’re not wrong.


As the following chart shows, OpenAI’s release cadence has sped up dramatically, most likely because AI coding has significantly reduced the company’s development time.

release-chart.png

David Gewirtz via ChatGPT Images/ZDNET

That chart was generated entirely by ChatGPT 5.5 Thinking using Images 2.0. All I did was tell the AI that I wanted to visualize the release cadence between GPT releases and wanted it presented in the ZDNET brand style. I also provided a PNG of the ZDNET logo.

The whole process, including some minor corrections, took less than 10 minutes. I have been researching data and creating professional-looking informational charts like this by hand since the invention of computer graphics. Something like this would take at least two hours to create, not 10 minutes.


I have already done some testing of the Images 2.0 capabilities. I’ll be back with more next week. In this article, I’m focusing on GPT-5.5’s knowledge capabilities.

I ran GPT-5.5 through my 10-point testing process. I was both impressed and annoyed. The results were solid, but the model tended to be a little too exuberant, doing work I didn’t ask it to do.

Since GPT-5.5 is only available in paid tiers (Plus and above), I used ChatGPT Plus for my tests. Right now, my Plus account only shows GPT-5.5 available for the Thinking effort level in both Standard and Extended. I picked Standard Thinking. That’s the effort I used for these tests.

gpt-options

Screenshot by David Gewirtz/ZDNET

Let’s get started.

Test 1: Summarize a news story

  • Available points: 10
  • Awarded points: 5

This test looks at how well the AI can read a story on the web and explain it. I used Yahoo News because Yahoo doesn’t block AI access. I also looked for a story that’s as non-political as possible. Today, that meant I had to go a good way down the news page to find a story on the recent LaGuardia runway crash.

GPT-5.5 did correctly summarize the meat of the story, but it didn’t follow my instructions to use Yahoo News as the source. For GPT-5.2, I deducted one point because ChatGPT used information from Axios and Yahoo. This time, I took off five points, because it used information from AP, The Sun, Wall Street Journal, The Guardian, and even Wikipedia.


If I had wanted a comprehensive news answer, that would have been fine. But the prompt specifically said to look at Yahoo News, and GPT-5.5 pretty much ignored that instruction.

There’s a big push from all the AI companies toward running autonomous agents. But if a model can’t correctly follow even a simple summary prompt, that doesn’t give me confidence it’s safe to let agents run wild on long-horizon projects. Just sayin’.

Test 2: Academic concept explanation

  • Available points: 10
  • Awarded points: 10

This challenge asked the AI to explain educational constructivism to a five-year-old. It tested how well the AI can research and report on a concept, and then adjust its explanation style to the desired target level.

GPT-5.5 provided a very clear answer that included an example that would be something a five-year-old could picture and understand. All 10 points were awarded.

Test 3: Math and analysis

  • Available points: 10
  • Awarded points: 10

This test measured the AI’s math and pattern-recognition abilities. I passed the model a sequence of numbers drawn from the famous Fibonacci sequence, but I didn’t tell the AI that.

When asked to fill in some numbers in the sequence, the AI had to understand the pattern and perform the calculations to provide the sequence. It did the math correctly.


The AI was also instructed to “explain your reasoning.” All I got back was, “The sequence is the Fibonacci sequence: each number is the sum of the two numbers before it.” This was a correct explanation and comparable to the results from earlier releases.

I awarded this test 10 points because, although brief, it was correct.
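The task the model solved can be sketched in a few lines of Python. This is my own hypothetical reconstruction of the test, not the actual prompt, so the function name and the use of None for the missing entries are illustrative:

```python
def fill_fibonacci_gaps(seq):
    """Fill in missing entries (None) in a Fibonacci-style sequence,
    where each term is the sum of the two terms before it."""
    out = list(seq)
    for i in range(2, len(out)):
        if out[i] is None:
            # Apply the recurrence: term = previous + the one before it
            out[i] = out[i - 1] + out[i - 2]
    return out

print(fill_fibonacci_gaps([1, 1, None, 3, None, 8, None]))
# -> [1, 1, 2, 3, 5, 8, 13]
```

That one-line rule, each number being the sum of the two before it, is exactly the reasoning GPT-5.5 reported back.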

Test 4: Cultural discussion

  • Available points: 10
  • Awarded points: 10

This test asked the AI to construct a case, form a coherent argument, and present an opinion on an issue that doesn’t have a definitive right or wrong answer. I asked, “Do you think social media has improved or worsened communication in society? Provide two reasons for your view.”

Interestingly, GPT-5.5 thought social media “has worsened communication overall.” I tended to agree. The model provided two solid reasons. The first was that it “often rewards speed and reaction over thoughtfulness.” The second was that social media “tends to create information bubbles.” For each reason, GPT-5.5 provided a supporting paragraph.


Both of those reasons were valid. It also shared a quick list of the positive benefits of social media, including helping people stay connected, organize for causes, and share information widely.

GPT-5.5 gave an answer that was concise, well-considered, and clear. It got 10 points for this test.

Test 5: Literary analysis

  • Available points: 10
  • Awarded points: 10

This approach tested the AI’s understanding of a piece of contemporary literature, A Game of Thrones, the first book in the A Song of Ice and Fire series. The test asked what the main themes are, and why they’re important.

GPT-5.5 gave me back a 632-word response that broke the book down into the following themes:

  • Power and its cost
  • The collapse of heroic fantasy ideals
  • Family, loyalty, and inherited conflict
  • Honor versus pragmatism
  • Identity and self-invention
  • The human cost of war
  • The danger of political distraction
  • Prophecy, religion, and uncertainty
  • Justice and revenge
  • The return of the ignored past

GPT-5.5 provided clear explanations for each theme, why it was included, how it related to the book, and what it meant to the overall series. It’s hard to be strictly objective with something like this, but I really got the feeling this was the most nuanced answer I’ve seen to this question from my various GPT version tests.

All 10 points were awarded.

Test 6: Travel itinerary

  • Available points: 10
  • Awarded points: 9

This test evaluated the AI’s knowledge of geographic regions and its ability to create a helpful travel itinerary based on specific interests. I asked it to plan a week-long vacation in Boston in March focused on technology and history.

Of all the times I’ve asked this question of AIs, GPT-5.5 produced the best version for points of interest and day schedules. The model didn’t just hit the major tourist landmarks; it also pointed out a nice mix of historical and tech points of interest. GPT-5.5 took into account that March is likely to be a bit unpleasant, so it mixed in both indoor and outdoor activities, including fallback plans.

While it did not recommend a wide range of eateries, GPT-5.5 did recommend Legal Sea Foods, one of my personal favorites. The model lost a point because it made no reference at all to costs.


I feel like GPT-5.5 really grokked (yes, I did that) what someone would want in an itinerary by providing a strong list of activities to get excited about. But the AI didn’t fulfill the travel advisor part of the process because it didn’t cover budgeting.

Test 7: Emotional support

  • Available points: 10
  • Awarded points: 10

The emotional support question asked for advice and words of encouragement for an upcoming job interview. I have to say I really liked this AI’s response.

The AI included some encouragement, like “The interview is not an interrogation. It’s a mutual fit conversation.” It also gave some practical advice. First, GPT-5.5 suggested preparing three stories the job seeker could use during the interview, one about solving a problem, one about working with others, and one about learning or recovering from something difficult.

The model gave a simple breathing exercise and said that it’s okay to pause before answering a question. It was also encouraging, noting that landing the interview meant there was already something about the candidate that the hiring company found interesting.


Good, solid, useful answers: 10 points.

Test 8: Translation and cultural relevance

  • Available points: 10
  • Awarded points: 9

My test prompt asked GPT-5.5 to translate a phrase from English to Latin and then explain the cultural relevance of Latin in today’s world.

The phrase I asked it to translate was, “The celebration will take place tomorrow in the town square.” GPT-5.5 gave me back two choices, “Celebratio cras in foro oppidi fiet,” and what it called a slightly more formal alternative, “Celebratio cras in foro publico oppidi habebitur.”


The first version is a word-for-word translation of the requested phrase. But the second one translates back to English as, “The celebration will be held tomorrow in the town’s public forum,” which was not the phrase I asked for.

GPT-5.5 may have thought it was helpful to provide an additional variation, but for someone who doesn’t speak Latin, all the approach does is confuse the issue. Which is the Latin phrase that should be used? I’m deducting a point for overeagerness that doesn’t strictly follow the prompt.

As for the second half of the question, GPT-5.5 answered briefly, but accurately.

Test 9: Coding test

  • Available points: 10
  • Awarded points: 10

Chatbot coding test results are interesting. They’re different in nature from the types of results you get when testing coding agents like Codex or Claude Code.


While the LLMs in the chatbots and coding agents are generally similar, I’ve found that the coding agents are considerably more accurate on requests than when running in the chatbots. I haven’t been able to get any of the AI companies to explain why, but I’m guessing it has something to do with how the two different tools allocate resources and training data.

The test case for this question was the second test in my coding metrics article, which asked the AI to clean up a buggy snippet of code for validating whether a dollar amount was properly entered into a field.

The AI passed this test. The only behavior that could be an issue is that it rejects numbers containing commas. But that’s actually still a safe response. If the user enters “1,000.00,” the code returns false. It might take the user a second to try again with “1000.00,” but it won’t harm the system.
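To illustrate the behavior described above, here is a minimal sketch of a dollar-amount validator with that same strict rule. This is my own hypothetical reconstruction, not the code from the test; the regex and function name are assumptions:

```python
import re

# Accept an optional "$", then plain digits, then an optional
# two-digit decimal part. Comma grouping is not matched, so
# "1,000.00" is rejected -- the safe failure mode described above.
_DOLLAR_RE = re.compile(r"^\$?\d+(\.\d{2})?$")

def is_valid_dollar_amount(text: str) -> bool:
    return bool(_DOLLAR_RE.match(text.strip()))

print(is_valid_dollar_amount("1000.00"))   # True
print(is_valid_dollar_amount("1,000.00"))  # False
```

Rejecting commas outright is conservative: the user simply retypes the amount without separators, and nothing invalid ever reaches the system.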

GPT-5.5 got all 10 points for this test.

Test 10: Creative writing

  • Available points: 10
  • Awarded points: 10

This test is among the most fun in the entire question suite. It asked GPT-5.5 to write a story longer than 1,500 words, as described in the second prompt in this article. The aim was to explore the creativity and comprehensiveness of the chatbot’s answer.

Unlike the other tests, I ran this evaluation in Extended mode to see just how good the story could get. I’m not sure the AI took much advantage of this option, because it only ran for eight seconds. Still, it was frickin’ awesome.

GPT-5.5 gave me back 4,049 words, which I think is the longest story I have gotten back from an AI in all my tests of this particular challenge.


I liked how GPT-5.5 opened the story by saying, “By the year 2339, most of Boston had become very good at pretending it was not old.” I was hooked.

I tried to get Voice Mode to read to me like a bedtime story. However, the AI first said the story was too long. It then offered to read the story to me section by section. When I agreed to that approach, nothing happened; it just hung. I’m not deducting points for that failure because it’s not part of the standard evaluation test, but it’s disappointing nonetheless.

Unfortunately, since I asked the AI to read the story via Voice Mode, I can’t share the output from within ChatGPT. What I didn’t know is that the three-dot icon after the response had a ‘Read aloud’ option, which probably would have worked.

read-aloud

Screenshot by David Gewirtz/ZDNET

That said, I copied the response to Google Docs, so you can still read it there, if you so wish.

Here are a few more quotes from the full response:

  • Jackson, who had clearly been waiting all his life to hear someone say “the one in the back” in a mysterious bookstore, looked radiant. Ophelia looked as though she was beginning to calculate exits.
  • “My dear,” Archibald said, “by 2339, evidence works however the wealthy can persuade it to.”
  • One stopped before Jackson: a slim manual bound in copper mesh titled The Gentleman’s Guide to Looking Ridiculous with Conviction. Jackson gasped. “I feel seen.”
  • This time, a small envelope slid out and landed in Archibald’s lap. It was addressed in his own hand. To myself, if I become insufferable.
  • The red door stood open behind them. Beyond it, the front of the shop looked warm, ordinary, and only mildly impossible.

I’ve given this writing assignment before, and in each incarnation it’s been impressive. But this output took the delightful cozy paranormality to an entirely new level. Enthusiastically 10 out of 10.

For kicks, I asked GPT-5.5 to “draw me a picture that perfectly illustrates this story in 16:9 aspect ratio.” Here’s what was returned:

bookstore.png

David Gewirtz via ChatGPT Images/ZDNET

The AI correctly illustrated all the characters to the point that I could identify each character. Jackson, mentioned above, is the guy with the hat. Archibald is the guy with the cane.

Overall test results

Overall, the tests can award up to 100 points. The current version, GPT-5.5, scored 93. GPT-5.2 scored 92, and GPT-5.1 scored 91. You might expect this latest build to improve by more than a point or two over previous versions, but the model’s own overeagerness brought it down.

On the first test, the one about current news, I asked the AI to summarize a single source. Instead, it pulled the same news from several other sources. It overreached and lost points.

The same problem happened with the translation assignment. I asked GPT-5.5 to translate a sentence to another language, one I presumably don’t speak. It gave back two translations to choose from. Now, how is that helpful? If I don’t speak the language, how would I choose which translation I like better?

These two overzealous reactions lost the model six points. It would have scored a 99 (losing one point for skipping budget information on the travel question). But, instead, it scored a mere 93.

That said, I quite like this release. The answers were all good, notwithstanding the excessive enthusiasm. The ability to add relevant images, such as the infographic at the beginning and the bookstore illustration at the end, opens avenues for fun and work effectiveness.

I see no reason to recommend against GPT-5.5. I will be using the model as my default choice moving forward. Stay tuned, because I’ll be doing a lot more with the enhanced image features of Images 2.0 in ChatGPT with GPT-5.5.

Do you prefer a model that gives one exact answer or one that offers extra options? Let us know in the comments below.


You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.







As I’m writing this, NVIDIA is the largest company in the world, with a market cap exceeding $4 trillion. Team Green is now the leader among the Magnificent Seven of the tech world, having surpassed them all in just a few short years.

The company has managed to reach these incredible heights with smart planning and by making the right moves for decades, the latest being the decision to sell shovels during the AI gold rush. Considering the current hardware landscape, there’s simply no reason for NVIDIA to rush a new gaming GPU generation for at least a few years. Here’s why.

Scarcity has become the new normal

Not even NVIDIA is powerful enough to overcome market constraints

Global memory shortages have been a reality since late 2025, and they aren’t just affecting RAM and storage manufacturers. Rather, this impacts every company making any product that contains memory or storage—including graphics cards.

NVIDIA sells GPU-and-memory bundles to its partners, who solder them onto PCBs and add cooling to create full-blown graphics cards. That means NVIDIA doesn’t just have to battle other tech giants to secure a chunk of TSMC’s limited production capacity for its GPU chips. It also has to procure massive amounts of GPU memory, which has never been harder or more expensive to obtain.

While a company as large as NVIDIA certainly has long-term contracts that guarantee stable memory prices, those contracts aren’t going to last forever. The company has likely had to sign new ones, considering the GPU price surge that began at the beginning of 2026, with gaming graphics cards still being overpriced.

With GPU memory costing more than ever, NVIDIA has little reason to rush a new gaming GPU generation, because its gaming earnings are just a drop in the bucket compared to its total earnings.

NVIDIA is an AI company now

Gaming GPUs are taking a back seat

A graph showing NVIDIA revenue breakdown in the last few years. Credit: appeconomyinsights.com

NVIDIA’s gaming division had been its golden goose for decades, but come 2022, the company’s data center and AI division’s revenue started to balloon dramatically. By the beginning of fiscal year 2023, data center and AI revenue had surpassed that of the gaming division.

In fiscal year 2026 (which began in late January 2025 and ends in late January 2026), NVIDIA’s gaming revenue has contributed less than 8% of the company’s total earnings so far. On the other hand, the data center division has generated almost 90% of NVIDIA’s total revenue in fiscal year 2026. What I’m trying to say is that NVIDIA is no longer a gaming company; it’s all about AI now.

Considering that we’re in the middle of the biggest memory shortage in history, and that its AI GPUs rake in almost ten times the revenue of gaming GPUs, there’s little reason for NVIDIA to funnel exorbitantly priced memory toward gaming GPUs. It’s much more profitable to put every memory chip it can get its hands on into AI GPU racks and keep receiving mountains of cash by selling them to AI behemoths.

The RTX 50 Super GPUs might never get released

A sign of times to come

NVIDIA’s RTX 50 Super series was supposed to increase the memory capacity of its most popular gaming GPUs. The 16GB RTX 5080 was to be superseded by a 24GB RTX 5080 Super; the same fate awaited the 16GB RTX 5070 Ti, while an 18GB RTX 5070 Super was to replace the 12GB non-Super model. But according to recent reports, NVIDIA has put the lineup on ice.

The RTX 50 Super launch had been slated for this year’s CES in January, but after missing the show, it now looks like NVIDIA has delayed the lineup indefinitely. According to a recent report, NVIDIA doesn’t plan to launch a single new gaming GPU in 2026. Worse still, the RTX 60 series, which had been expected to debut sometime in 2027, has also been delayed.

A report by The Information (via Tom’s Hardware) states that NVIDIA had finalized the design and specs of its RTX 50 Super refresh, but the RAM-pocalypse threw a wrench into the works, forcing the company to “deprioritize RTX 50 Super production.” In other words, it’s exactly what I said a few paragraphs ago: selling enterprise GPU racks to AI companies is far more lucrative than selling comparatively cheaper GPUs to gamers, especially now that memory prices have been skyrocketing.

Before putting the RTX 50 Super series on ice, NVIDIA had already slashed its gaming GPU supply by about a fifth and started prioritizing models with less VRAM, such as the 8GB versions of the RTX 5060 and RTX 5060 Ti, so this news isn’t that surprising.

So when can we expect RTX 60 GPUs?

Late 2028-ish?

A GPU with a pile of money around it. Credit: Lucas Gouveia / How-To Geek

The good news is that the RTX 60 series is definitely in the pipeline, and we will see it sooner or later. The bad news is that its release date is up in the air, and it’s best not to even think about pricing. The word on the street around CES 2026 was that NVIDIA would release the RTX 60 series in mid-2027, give or take a few months. But as of this writing, it’s increasingly likely we won’t see RTX 60 GPUs until 2028.

If you’ve been following the discussion around memory shortages, this won’t be surprising. In late 2025, the prognosis was that the RAM-pocalypse wouldn’t end until 2027, maybe 2028. But a recent statement by SK Hynix’s chairman (SK Hynix is one of the world’s three largest memory manufacturers) warns that the global memory shortage may last well into 2030.

If that turns out to be true, and if the global AI data center boom doesn’t slow down in the next few years, I expect NVIDIA to delay the RTX 60 GPUs as long as possible. There’s a good chance we won’t see them until the second half of 2028, and they could slip past that window too if memory supply hasn’t recovered by then. Data center GPUs are simply too profitable for NVIDIA to reserve a meaningful portion of memory for gaming graphics cards as long as shortages persist.


At least current-gen gaming GPUs are still a great option for any PC gamer

If there is a silver lining here, it is that current-gen gaming GPUs (NVIDIA RTX 50 and AMD Radeon RX 90) are still more than powerful enough for any current AAA title. Considering that Sony is reportedly delaying the PlayStation 6 and that global PC shipments are projected to see a sharp, double-digit decline in 2026, game developers have little incentive to push requirements beyond what current hardware can handle.

DLSS 5, on the other hand, may be the future of gaming, but no one likes it, and it will take a few years (and likely the arrival of the RTX 60 lineup) for it to mature and become usable on anything that’s not a heckin’ RTX 5090.

If you’re open to buying used GPUs, even last-gen gaming graphics cards offer tons of performance and can handle any AAA game you throw at them. While we likely won’t get a new gaming GPU from NVIDIA for at least a few years, the ones we’ve got are great today and will continue to chew through any game for the foreseeable future.


