Google’s TurboQuant compresses AI memory by 6x, rattles chip stocks


Google published a research blog post on Tuesday about a new compression algorithm for AI models. Within hours, memory stocks were falling. Micron dropped 3 per cent, Western Digital lost 4.7 per cent, and SanDisk fell 5.7 per cent, as investors recalculated how much physical memory the AI industry might actually need.

The algorithm is called TurboQuant, and it addresses one of the most expensive bottlenecks in running large language models: the key-value cache, a high-speed data store that holds context information so the model does not have to recompute it with every new token it generates. As models process longer inputs, the cache grows rapidly, consuming GPU memory that could otherwise be used to serve more users or run larger models. TurboQuant compresses the cache to just 3 bits per value, down from the standard 16, reducing its memory footprint by at least six times without, according to Google’s benchmarks, any measurable loss in accuracy.
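
To get a feel for the scale involved, the cache size can be estimated from a model's shape. The sketch below is a back-of-the-envelope calculation, not Google's accounting: the function name and the model dimensions are hypothetical, and it counts payload bits only.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_value):
    """Bytes needed to hold the keys and values for one sequence."""
    values = 2 * layers * kv_heads * head_dim * tokens  # 2 = keys + values
    return values * bits_per_value / 8

# Hypothetical model shape, chosen only for illustration.
shape = dict(layers=32, kv_heads=8, head_dim=128, tokens=128_000)

fp16 = kv_cache_bytes(**shape, bits_per_value=16)
q3 = kv_cache_bytes(**shape, bits_per_value=3)

print(f"16-bit cache: {fp16 / 2**30:.2f} GiB")
print(f" 3-bit cache: {q3 / 2**30:.2f} GiB")
print(f"payload ratio: {fp16 / q3:.2f}x")
```

Counting payload bits alone, going from 16 bits to 3 gives a ratio of 16/3, or roughly 5.3. The "at least six times" figure is Google's reported number, and this sketch does not attempt to reproduce how it was measured.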

The paper, which will be presented at ICLR 2026, was authored by Amir Zandieh, a research scientist at Google, and Vahab Mirrokni, a vice president and Google Fellow, along with collaborators at Google DeepMind, KAIST, and New York University. It builds on two earlier papers from the same group: QJL, published at AAAI 2025, and PolarQuant, which will appear at AISTATS 2026.

How it works

TurboQuant’s core innovation is eliminating the overhead that makes most compression techniques less effective than their headline numbers suggest. Traditional quantization methods reduce the size of data vectors but must also store additional constants: normalization values that the system needs in order to decompress the data accurately. These constants typically add one or two extra bits per number, partially undoing the compression.
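
The overhead is easy to see with a little arithmetic. The sketch below assumes a conventional block-wise scheme in which each block of values stores a 16-bit scale and a 16-bit zero point; those sizes, the block sizes, and the function name are assumptions for illustration, not details from the paper.

```python
def effective_bits(payload_bits, block_size, constant_bits=16, constants_per_block=2):
    """Average bits per value once per-block constants are counted."""
    overhead = constants_per_block * constant_bits / block_size
    return payload_bits + overhead

# A nominal 3-bit format with an assumed scale and zero point per block.
for block_size in (16, 32, 64):
    print(block_size, effective_bits(3, block_size))
```

Under these assumptions, a block size of 32 adds one extra bit per value and a block size of 16 adds two, which is the range the paragraph above describes.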

TurboQuant avoids this through a two-stage process. The first stage, called PolarQuant, converts data vectors from standard Cartesian coordinates into polar coordinates, separating each vector into a magnitude and a set of angles. Because the angular distributions follow predictable, concentrated patterns, the system can skip the expensive per-block normalization step entirely. The second stage applies QJL, a technique based on the Johnson-Lindenstrauss transform, which reduces the small residual error from the first stage to a single sign bit per dimension. The combined result is a representation that uses most of its compression budget on capturing the original data’s meaning and a minimal residual budget on error correction, with no overhead wasted on normalization constants.
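
The two stages can be illustrated with a toy version in plain Python. This is a heavily simplified sketch, not Google's implementation: it treats consecutive coordinate pairs as 2D points, keeps radii and the residual norm at full precision, and uses far more random projections than dimensions so that the estimate visibly converges. All function names and sizes are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def to_polar_pairs(vec):
    """Turn consecutive (x, y) coordinate pairs into (radius, angle) pairs."""
    return [(math.hypot(x, y), math.atan2(y, x))
            for x, y in zip(vec[0::2], vec[1::2])]

def quantize_angle(theta, bits):
    """Snap an angle in [-pi, pi] to the centre of one of 2**bits equal bins."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    index = min(int((theta + math.pi) / step), levels - 1)
    return -math.pi + (index + 0.5) * step

def reconstruct(pairs, bits):
    """Rebuild Cartesian coordinates from exact radii and quantized angles."""
    out = []
    for radius, theta in pairs:
        q = quantize_angle(theta, bits)
        out += [radius * math.cos(q), radius * math.sin(q)]
    return out

def sign_sketch(vec, projections):
    """Keep only the sign of each random projection: one bit per projection."""
    return [1 if dot(row, vec) >= 0 else -1 for row in projections]

def estimate_dot(query, signs, norm, projections):
    """Estimate <query, vec> from vec's sign bits and its norm."""
    total = sum(dot(row, query) * s for row, s in zip(projections, signs))
    return math.sqrt(math.pi / 2) * norm * total / len(projections)

rng = random.Random(0)
dim, m = 8, 4000  # illustrative sizes, not taken from the paper
key = [rng.gauss(0, 1) for _ in range(dim)]
query = [rng.gauss(0, 1) for _ in range(dim)]
projections = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(m)]

# Stage 1: coarse polar representation with 3-bit angles.
coarse = reconstruct(to_polar_pairs(key), bits=3)
residual = [k - c for k, c in zip(key, coarse)]

# Stage 2: sign-bit sketch of what stage 1 missed.
signs = sign_sketch(residual, projections)
residual_norm = math.sqrt(dot(residual, residual))

exact = dot(query, key)
stage1_only = dot(query, coarse)
corrected = stage1_only + estimate_dot(query, signs, residual_norm, projections)
print(f"exact={exact:.4f} stage1={stage1_only:.4f} corrected={corrected:.4f}")
```

The point of the second stage is that attention only needs inner products between queries and keys, so the residual does not have to be reconstructed: its sign bits are enough to estimate, without bias, its contribution to each inner product.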

Google tested TurboQuant across five standard benchmarks for long-context language models, including LongBench, Needle in a Haystack, and ZeroSCROLLS, using open-source models from the Gemma, Mistral, and Llama families. At 3 bits, TurboQuant matched or outperformed KIVI, the current standard baseline for key-value cache quantization, which was published at ICML 2024. On needle-in-a-haystack retrieval tasks, which test whether a model can locate a single piece of information buried in a long passage, TurboQuant achieved perfect scores while compressing the cache by a factor of six. At 4-bit precision, the algorithm delivered up to an eight-times speedup in computing attention on Nvidia H100 GPUs compared to the uncompressed 32-bit baseline.

What the market heard

The stock reaction was swift and, in the view of several analysts, disproportionate. Wells Fargo analyst Andrew Rocha noted that TurboQuant directly attacks the cost curve for memory in AI systems. If adopted broadly, he said, it quickly raises the question of how much memory capacity the industry actually needs. But Rocha and others also cautioned that the demand picture for AI memory remains strong, and that compression algorithms have existed for years without fundamentally altering procurement volumes.

The concern is not unfounded, however. AI infrastructure spending is growing at extraordinary rates, with Meta alone committing up to $27 billion in a recent deal with Nebius for dedicated compute capacity, and Google, Microsoft, and Amazon collectively planning hundreds of billions in capital expenditure on data centres through 2026. A technology that reduces memory requirements by six times does not reduce spending by six times, because memory is only one component of a data centre’s cost. But it changes the ratio, and in an industry spending at this scale, even marginal efficiency gains compound quickly.
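
The point about ratios can be made concrete with a one-line formula. In the sketch below, the 30 per cent memory share is a made-up figure used only to show the shape of the calculation, and it assumes for simplicity that compression applies to all memory spending.

```python
def relative_cost(memory_share, compression):
    """Total cost after compression, as a fraction of the original cost."""
    return (1 - memory_share) + memory_share / compression

share = 0.30  # hypothetical share of total cost spent on memory
after = relative_cost(share, compression=6)
print(f"cost after compression: {after:.0%} of the original")
```

With that hypothetical share, a sixfold reduction in memory cuts the total bill by a quarter rather than by five-sixths.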

The efficiency question

TurboQuant arrives at a moment when the AI industry is being forced to confront the economics of inference. Training a model is a one-time cost, however enormous. Running it, serving millions of queries per day with acceptable latency and accuracy, is the recurring expense that determines whether AI products are financially viable at scale. The key-value cache is central to this calculation: it is the bottleneck that limits how many concurrent users a single GPU can serve and how long a context window a model can practically support.

Compression techniques like TurboQuant are part of a broader push toward making inference cheaper, alongside hardware improvements such as Nvidia’s Vera Rubin architecture and Google’s own Ironwood TPUs. The question is whether these efficiency gains will reduce the total amount of hardware the industry buys, or whether they will simply enable more ambitious deployments at roughly the same cost. The history of computing suggests the latter: when storage gets cheaper, people store more; when bandwidth increases, applications consume it.

For Google, TurboQuant also has a direct commercial application beyond language models. The blog post notes that the algorithm improves vector search, the technology that powers semantic similarity lookups across billions of items. Google tested it against existing methods on the GloVe benchmark dataset and found it achieved superior recall ratios without requiring the large codebooks or dataset-specific tuning that competing approaches demand. This matters because vector search underpins everything from Google Search to YouTube recommendations to advertising targeting, which is to say, it underpins Google’s revenue.
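
Recall in vector search is usually measured by comparing an approximate search against an exact one. The sketch below uses the common recall@k definition with made-up item IDs; the exact metric used in Google's GloVe tests is not specified here and may differ.

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the true top-k neighbours that the approximate search also returned."""
    truth = set(exact_ids[:k])
    return len(truth & set(approx_ids[:k])) / k

# Hypothetical result lists for a single query.
exact_ids = [11, 42, 7, 93]
approx_ids = [42, 11, 58, 93]
print(recall_at_k(exact_ids, approx_ids, k=4))
```

A compressed index that keeps recall high means the search returns nearly the same neighbours as it would on uncompressed vectors.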

The paper’s contribution is real: a training-free compression method that achieves measurably better results than the existing state of the art, with strong theoretical foundations and practical implementation on production hardware. Whether it reshapes the economics of AI infrastructure or simply becomes one more optimisation absorbed into the industry’s insatiable appetite for compute is a question the market will answer over months, not hours.



Recent Reviews


Google Maps has a long list of hidden (and sometimes just underrated) features that help you navigate seamlessly. I was not a big fan of using Google Maps for walking, though, until I started using the right set of features.

Add layers to your map

See more information on the screen

Layers are an incredibly useful yet underrated feature that can be utilized for all modes of transport. These help add more details to your map beyond the default view, so you can plan your journey better.

To use layers, open your Google Maps app (Android, iPhone). Tap the layer icon on the upper right side (under your profile picture and nearby attractions options). You can switch your map type from default to satellite or terrain, and overlay your map with details such as traffic, transit, biking, street view (perfect for walking), and 3D buildings (called 3D on Android and raised buildings on iPhone). To turn off map details, go back to Layers and tap again on the details you want to disable.

In particular, adding the street view and 3D/raised buildings layers can help you gauge the terrain and get more information about the landscape, so you can avoid tricky paths and discover shortcuts.

Set up Live View

Just hold up your phone

Google Maps’ Live View is another feature that makes navigating on foot easier. It uses augmented reality (AR) to show real-time navigation: beyond the directions on your map, you see directions overlaid on the live view from your camera. This is especially useful when travelling or exploring new areas, since it gives you navigational cues for walking that go beyond a 2D map.

To use Live View, search for a location on Google Maps, then tap “Directions.” Once the route appears, tap “Walk,” then tap “Live View” in the navigation options. You will be prompted to point your camera at things like buildings, stores, and signs around you, so Google Maps can analyze your surroundings and give you accurate directions.

Download maps offline

Google Maps without an internet connection

Whether you’re on a hiking trip in a low-connectivity area or want offline maps for your favorite walking destinations, having specific map routes downloaded can be a great help. Google Maps lets you download maps to your device while you’re connected to Wi-Fi or mobile data, and use them when your device is offline.

For Android, open Google Maps and search for a specific place or location. In the placesheet, swipe right, then tap More > Download offline map > Download. For iPhone, search for a location on Google Maps, then, at the bottom of your screen, tap the name or address of the place. Tap More > Download offline map > Download.

After you download an area, use Google Maps as you normally would. If you go offline, your offline maps will guide you to your destination as long as the entire route is within the offline map.

Enable Detailed Voice Guidance

Get better instructions

Voice guidance is a basic yet powerful navigation tool that comes in handy on walks in unfamiliar locations and helps keep you on the right path. To make sure guidance audio is enabled, go to your Google Maps profile (upper right corner), then tap Settings > Navigation > Sound and Voice. Here, tap “Unmute” on “Guidance Audio.”

Apart from this, you can also use Google Assistant to help you along your journey, asking questions about your destination, nearby sights, detours, additional stops, etc. To use this feature on iPhone, map a walking route to a destination, then tap the mic icon in the upper-right corner. For Android, you can also say “Hey Google” after mapping your destination to activate the assistant.

Voice guidance is handy for both new and old places, like when you’re running errands and need to navigate hands-free.

Add multiple stops

Keep your trip going

If you walk regularly to run errands, Google Maps has a simple yet effective feature that can help you plan your route more efficiently. With Maps’ multiple stop feature, you can add several stops between your current location and final destination to minimize wasted time and unnecessary detours.

To add multiple stops on Google Maps, search for a destination, then tap “Directions.” Select the walking option, then tap the three dots on top (next to “Your Location”), and tap “Edit Stops.” You can now add a stop by searching for it and tapping “Add Stop,” and reorder the stops at your convenience. Repeat this process until your route is complete, then tap “Start” to begin your journey.

You can add up to ten stops in a single route on both mobile and desktop, and use the journey for multiple modes (walking, driving, and cycling) except public transport and flights. I find this Google Maps feature to be an essential tool for travel to walkable cities, especially when I’m planning a route I am unfamiliar with.


More to discover

A new feature to keep an eye out for, especially if you use Google Maps for walking and cycling, is Google’s Gemini boost, which will allow you to navigate hands-free and get real-time information about your journey. This feature has been rolling out for both Android and iOS users.


