There are more AI health tools than ever—but how well do they work?


Singhal, the OpenAI health lead, notes that the company’s current GPT-5 series of models, which had not yet been released when the original HealthBench study was conducted, do a much better job of soliciting additional information than their predecessors. However, OpenAI has reported that GPT-5.4, the current flagship, is actually worse at seeking context than GPT-5.2, an earlier version.

Ideally, Bean says, health chatbots would be subjected to controlled tests with human users, as they were in his study, before being released to the public. That might be a heavy lift, particularly given how fast the AI world moves and how long human studies can take. Bean’s own study used GPT-4o, which came out almost a year ago and is now outdated. 

Earlier this month, Google released a study that meets Bean’s standards. In the study, patients discussed medical concerns with the company’s Articulate Medical Intelligence Explorer (AMIE), a medical LLM chatbot that is not yet available to the public, before meeting with a human physician. Overall, AMIE’s diagnoses were just as accurate as physicians’, and none of the conversations raised major safety concerns for researchers. 

Despite the encouraging results, Google isn’t planning to release AMIE anytime soon. “While the research has advanced, there are significant limitations that must be addressed before real-world translation of systems for diagnosis and treatment, including further research into equity, fairness, and safety testing,” wrote Alan Karthikesalingam, a research scientist at Google DeepMind, in an email. Google did recently reveal that Health100, a health platform it is building in partnership with CVS, will include an AI assistant powered by its flagship Gemini models, though that tool will presumably not be intended for diagnosis or treatment.

Rodman, who led the AMIE study with Karthikesalingam, doesn’t think such extensive, multiyear studies are necessarily the right approach for chatbots like ChatGPT Health and Copilot Health. “There’s lots of reasons that the clinical trial paradigm doesn’t always work in generative AI,” he says. “And that’s where this benchmarking conversation comes in. Are there benchmarks [from] a trusted third party that we can agree are meaningful, that the labs can hold themselves to?”

They key there is “third party.” No matter how extensively companies evaluate their own products, it’s tough to trust their conclusions completely. Not only does a third-party evaluation bring impartiality, but if there are many third parties involved, it also helps protect against blind spots.

OpenAI’s Singhal says he’s strongly in favor of external evaluation. “We try our best to support the community,” he says. “Part of why we put out HealthBench was actually to give the community and other model developers an example of what a very good evaluation looks like.” 

Given how expensive it is to produce a high-quality evaluation, he says, he’s skeptical that any individual academic laboratory would be able to produce what he calls “the one evaluation to rule them all.” But he does speak highly of efforts that academic groups have made to bring preexisting and novel evaluations together into comprehensive evaluations suites—such as Stanford’s MedHELM framework, which tests models on a wide variety of medical tasks. Currently, OpenAI’s GPT-5 holds the highest MedHELM score.



Source link

Leave a Reply

Subscribe to Our Newsletter

Get our latest articles delivered straight to your inbox. No spam, we promise.

Recent Reviews


After being teased in the second beta, the new “Bubbles” feature is finally available in Android 17 Beta 3. This is the biggest change to Android multitasking since split-screen mode. I had to see how it worked—come along with me.

Now, it should be mentioned that this feature will probably look a bit familiar to Samsung Galaxy owners. One UI also allows for putting apps in floating windows, and they minimize into a floating widget. However, as you’ll see, Google’s approach is more restrained.

App Bubbles in Android 17

There’s a lot to like already

First and foremost, putting an app in a “Bubble” allows it to be used on top of whatever’s happening on the screen. The functionality is essentially identical to Android’s older feature of the exact same name, but now it can be used for apps in addition to messaging conversations.

To bubble an app, simply long-press the app icon anywhere you see it. That includes the home screen, app drawer, and the taskbar on foldables and tablets. Select “Bubble” or the small icon depicting a rectangle with an arrow pointing at a dot in the menu.

Bubbles on a phone screen

The app will immediately open in a floating window on top of your current activity. This is the full version of the app, and it works exactly how it would if you opened it normally. You can’t resize the app bubble, but on large-screen devices, you can choose which side it’s on. To minimize the bubble, simply tap outside of it or do the Home gesture—you won’t actually go to the Home Screen.

Multiple apps can be bubbled together—just repeat the process above—but only one can be shown at a time. This is a key difference compared to One UI’s pop-up windows, which can be resized and tiled anywhere on the screen. Here is also where things vary depending on the type of device you’re using.

If you’re using a phone, the current bubbled apps appear in a row of shortcuts above the window. Tap an app icon, and it will instantly come into view within the bubble. On foldables and tablets, the row of icons is much smaller and below the window.

Another difference is how the app bubbles are minimized. On phones, they live in a floating app icon (or stack of icons) on the edge of the screen. You are free to move this around the screen by dragging it. Tapping the minimized bubble will open the last active app in the bubble. On foldables and tablets, the bubble is minimized to the taskbar (if you have it enabled).

Bubbles on a foldable screen

Now, there are a few things to know about managing bubbles. First, tapping the “+” button in the shortcuts row shows previously dismissed bubbles—it’s not for adding a new app bubble. To dismiss an app bubble, you can drag the icon from the shortcuts row and drop it on the “X” that appears at the bottom of the screen.

To remove the entire bubble completely, simply drag it to the “X” at the bottom of the screen. On phones, there’s also an extra “Manage” button below the window with a “Dismiss bubble” option.

Better than split-screen?

Bubbles make sense on smaller screens

That’s pretty much all there is to it. As mentioned, there’s definitely not as much freedom with Bubbles as there is with pop-up windows in One UI. The latter allows you to treat apps like windows on a computer screen. Bubbles are a much more confined experience, but the benefit is that you don’t have to do any organizing.

Samsung One UI pop-up windows

Of course, Android has supported using multiple apps at once with split-screen mode for a while. So, what’s the benefit of Bubbles? On phones, especially, split-screen mode makes apps so small that they’re not very useful.

If you’re making a grocery list while checking the store website, you’re stuck in a very small browser window. Bubbles enables you to essentially use two apps in full size at the same time—it’s even quicker than swiping the gesture bar to switch between apps.

If you’d like to give App Bubbles a try, enroll your qualified Pixel phone in the Android Beta Program. The final release of Android 17 is only a few months away (Q2 2026), but this is an exciting feature to check out right now.

A desktop setup featuring an Android phone, monitor, and mascot, surrounded by red 'missing' labels


Android’s new desktop mode is cool, but it still needs these 5 things

For as long as Android phones have existed, people have dreamed of using them as the brains inside a desktop computing setup. Samsung accomplished this nearly a decade ago, but the rest of the Android world has been left out. Android 17 is finally changing that with a new desktop mode, and I tried it out.



Source link