The Raspberry Pi can now run local AI models that actually work


When you think of AI models, especially large language models, you probably imagine big data centers guzzling thousands of watts of power, or big, expensive GPUs packing VRAM worth the GDP of a small nation.

What you don’t think about are cheap little single-board computers like the Raspberry Pi. Yet there are people (as reported in Tom’s Hardware) running LLMs on late ’90s PCs that are far less powerful. So clearly there’s something going on here. The truth is that even low-end devices like a Raspberry Pi can run some subset of the AI models out there, but how useful that ends up being is debatable.

From novelty to viability: local AI on a $100 computer

So cheap it’s amazing it works

There has been an ongoing effort to shrink LLMs down without losing too much capability. One of the big hurdles to running an LLM is not necessarily processing power, but being able to fit the model into memory in the first place. The Raspberry Pi 5 tops out at 16GB of RAM, and most people probably have the smaller 8GB version or less.

A technique known as quantization reduces the precision of the weights in an LLM. Since there are billions of these values, slashing each one’s precision (making each value more approximate) has a big effect on how much space the model takes up.

Surprisingly, while this does make the model worse when it comes to output quality, the drop is not always proportional to the reduction in size. This means that the model may still be good enough for your needs, but now requires much less memory and processing power.
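To see why this matters so much on a Pi, here’s a back-of-the-envelope sketch in Python. The bits-per-weight figures are typical values for common GGUF quantization formats, not exact specs, and the totals cover the weights alone (real inference also needs memory for the context cache):

```python
# Approximate memory needed just to hold a model's weights at
# different precisions. Figures are typical values, not exact specs.

BITS_PER_WEIGHT = {
    "FP16 (unquantized)": 16,
    "Q8_0 (8-bit)": 8,
    "Q4_K_M (~4-bit)": 4.5,  # 4-bit GGUF quants average ~4.5 bits/weight
}

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """GB required to store the weights of a model of the given size."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, bits in BITS_PER_WEIGHT.items():
    print(f"7B model, {name}: ~{weight_memory_gb(7, bits):.1f} GB")
```

At FP16, a 7B model needs roughly 14GB for weights alone and simply won’t fit on an 8GB Pi 5; quantized to around 4 bits per weight, the same model drops to roughly 4GB and becomes viable.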

Real models you can actually run (and use)

It’s not just theoretical

Quantized versions of models like Llama 3, Mistral, and Qwen are commonly used on Pi hardware.

“Tiny” models in the 1B to 3B parameter range run comfortably on a Pi 5, and with careful tuning and managed expectations, small models of around 7B parameters are usable too on an 8GB Pi 5.

For example, in this LinkedIn post, the author used llama.cpp to run a Qwen-based coding assistant on a Pi.
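As a rough sketch of what that looks like (the post itself used llama.cpp directly; the llama-cpp-python bindings shown here are one common way to drive it from Python, and the GGUF filename is a placeholder for whichever quantized Qwen build you download):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder filename: substitute the quantized GGUF model you downloaded.
llm = Llama(
    model_path="qwen2.5-coder-1.5b-instruct-q4_k_m.gguf",
    n_ctx=2048,   # context window; larger values cost more RAM
    n_threads=4,  # one thread per Pi 5 CPU core
)

result = llm(
    "Write a Python function that reverses a string.",
    max_tokens=200,
    temperature=0.2,
)
print(result["choices"][0]["text"])
```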

Performance is limited yet practical

Good enough for the right job

While some models will fit in the limited memory of a Raspberry Pi 5 8GB, the elephant in the room is still processing power. Although the Pi 5 is remarkably powerful for its size and power consumption, especially if you fit it with better cooling, it’s still really only an entry-level computer in the greater scheme of things.

Based on the different benchmarks I’ve seen, a stock Pi 5 will give you a token rate ranging from a fraction of a token per second on larger models to the high single digits on tiny ones. That can be usable in many cases, depending on the job.

If you’re going to leave a model running overnight to chew on a problem, a lack of real-time responsiveness is less of an issue, and simple real-time AI use is still on the table.
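To put those rates in perspective, here’s a quick sketch of how long a response takes at Pi-class speeds (the token rates below are illustrative assumptions, not measured benchmarks):

```python
# Wall-clock time for a response at Pi-class token rates.
# The rates below are illustrative assumptions, not measured benchmarks.

def generation_minutes(tokens: int, tokens_per_sec: float) -> float:
    """Minutes needed to generate `tokens` at a given rate."""
    return tokens / tokens_per_sec / 60

for rate in (0.5, 3.0, 8.0):  # big model, mid-size model, tiny model
    print(f"500-token answer at {rate} tok/s: ~{generation_minutes(500, rate):.1f} min")

# 0.5 tok/s -> ~17 minutes: fine as an overnight batch job, painful live.
# 8 tok/s   -> ~1 minute: workable for short interactive prompts.
```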

Raspberry Pi 5

Brand: Raspberry Pi
CPU: Quad-core Arm Cortex-A76
Memory: 8GB
Storage: microSD card slot
Operating System: Raspberry Pi OS (formerly Raspbian)
Ports: 4x USB-A

It’s only recommended for tech-savvy users, but the Raspberry Pi 5 is a tinkerer’s dream. Cheap, highly customizable, and with great onboard specs, it’s a solid base for your next mini PC.


Hardware and ecosystem upgrades are accelerating progress

You can build it better

So far, I’ve been referring to the stock Raspberry Pi you get out of the box, other than adding a cooler, but it doesn’t have to stop there. There’s a series of official AI HATs that add a neural accelerator to your Pi, significantly boosting performance for the models the accelerator supports.

Yes, it costs more than the Pi itself in many cases, but that’s still a very low total cost of ownership for a local, private AI.

If we’re spending money to upgrade our Pis for AI performance, then there’s also the option of using an eGPU as Jeff Geerling did in this video.

Now the model runs on the GPU, and you get commensurate performance. But can we really still say we’re running local AI on a Raspberry Pi at this point? I’d say absolutely yes. It’s much cheaper than building a whole traditional computer around that GPU just for local AI, and the Raspberry Pi handles all the coordination and support the GPU can’t provide on its own, since a GPU isn’t a complete general-purpose computer.
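In llama.cpp terms, this is layer offloading: a build compiled with GPU support can push some or all of the model’s transformer layers onto the GPU. Here’s a minimal sketch with the same llama-cpp-python bindings as before (the model filename is again a placeholder, and whether a given eGPU works on a Pi depends on your kernel and driver setup):

```python
from llama_cpp import Llama  # requires a llama.cpp build compiled with GPU support

llm = Llama(
    model_path="qwen2.5-coder-7b-instruct-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # -1 offloads every layer to the GPU
    n_ctx=2048,
)

result = llm("Summarize why quantization helps small devices.", max_tokens=150)
print(result["choices"][0]["text"])
```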


Building your bot army

As models become more efficient and single-board computers become more powerful, there’s no doubt we’ll be making much more use of localized AI that doesn’t need a distant data center or gallons of water to run. I can’t wait to make my own distributed Jarvis or perhaps a robot that only has one job: to pass butter.



There’s no reason for NVIDIA to rush a new gaming GPU generation


As I’m writing this, NVIDIA is the largest company in the world, with a market cap exceeding $4 trillion. Team Green is now the leader among the Magnificent Seven of the tech world, having surpassed them all in just a few short years.

The company has managed to reach these incredible heights with smart planning and by making the right moves for decades, the latest being the decision to sell shovels during the AI gold rush. Considering the current hardware landscape, there’s simply no reason for NVIDIA to rush a new gaming GPU generation for at least a few years. Here’s why.

Scarcity has become the new normal

Not even Nvidia is powerful enough to overcome market constraints

Global memory shortages have been a reality since late 2025, and they aren’t just affecting RAM and storage manufacturers. Rather, this impacts every company making any product that contains memory or storage—including graphics cards.

Since NVIDIA sells GPU-and-memory bundles to its board partners, who solder them onto PCBs and add cooling to create full-blown graphics cards, NVIDIA doesn’t just have to battle other tech giants to secure a chunk of TSMC’s limited production capacity for its GPU chips. It also has to procure massive amounts of GPU memory, which has never been harder or more expensive to obtain.

While a company as large as NVIDIA certainly has long-term contracts that guarantee stable memory prices, those contracts aren’t going to last forever. The company has likely had to sign new ones, considering the GPU price surge that began at the beginning of 2026, with gaming graphics cards still being overpriced.

With GPU memory costing more than ever, NVIDIA has little reason to rush a new gaming GPU generation, because its gaming earnings are just a drop in the bucket compared to its total earnings.

NVIDIA is an AI company now

Gaming GPUs are taking a back seat

A graph showing NVIDIA revenue breakdown in the last few years. Credit: appeconomyinsights.com

NVIDIA’s gaming division had been its golden goose for decades, but come 2022, the company’s data center and AI division’s revenue started to balloon dramatically. By the beginning of fiscal year 2023, data center and AI revenue had surpassed that of the gaming division.

In fiscal year 2026 (which runs from late January 2025 to late January 2026), NVIDIA’s gaming revenue has contributed less than 8% of the company’s total earnings so far. The data center division, on the other hand, has generated almost 90% of NVIDIA’s total revenue over the same period. What I’m trying to say is that NVIDIA is no longer a gaming company; it’s all about AI now.

Considering that we’re in the middle of the biggest memory shortage in history, and that its AI GPUs rake in more than ten times the revenue of its gaming GPUs, there’s little reason for NVIDIA to funnel exorbitantly priced memory toward gaming cards. It’s much more profitable to put every memory chip it can get its hands on into AI GPU racks and keep collecting mountains of cash from AI behemoths.

The RTX 50 Super GPUs might never get released

A sign of times to come

NVIDIA’s RTX 50 Super series was supposed to increase the memory capacity of its most popular gaming GPUs: the 16GB RTX 5080 was to be superseded by a 24GB RTX 5080 Super, the 16GB RTX 5070 Ti was in line for the same treatment, and an 18GB RTX 5070 Super was to replace the 12GB non-Super model. But according to recent reports, NVIDIA has put the lineup on ice.

The RTX 50 Super launch had been slated for this year’s CES in January, but after missing the show, it now looks like NVIDIA has delayed the lineup indefinitely. According to a recent report, NVIDIA doesn’t plan to launch a single new gaming GPU in 2026. Worse still, the RTX 60 series, which had been expected to debut sometime in 2027, has also been delayed.

A report by The Information (via Tom’s Hardware) states that NVIDIA had finalized the design and specs of its RTX 50 Super refresh, but the RAM-pocalypse threw a wrench into the works, forcing the company to “deprioritize RTX 50 Super production.” In other words, it’s exactly what I said a few paragraphs ago: selling enterprise GPU racks to AI companies is far more lucrative than selling comparatively cheaper GPUs to gamers, especially now that memory prices have been skyrocketing.

Before putting the RTX 50 series on ice, NVIDIA had already slashed its gaming GPU supply by about a fifth and started prioritizing models with less VRAM, like the 8GB versions of the RTX 5060 and RTX 5060 Ti, so this news isn’t that surprising.

So when can we expect RTX 60 GPUs?

Late 2028-ish?

A GPU with a pile of money around it. Credit: Lucas Gouveia / How-To Geek

The good news is that the RTX 60 series is definitely in the pipeline, and we will see it sooner or later. The bad news is that its release date is up in the air, and it’s best not to even think about pricing. The word on the street around CES 2026 was that NVIDIA would release the RTX 60 series in mid-2027, give or take a few months. But as of this writing, it’s increasingly likely we won’t see RTX 60 GPUs until 2028.

If you’ve been following the discussion around memory shortages, this won’t be surprising. In late 2025, the prognosis was that we wouldn’t see the end of the RAM-pocalypse until 2027, maybe 2028. But a recent statement by SK Hynix’s chairman (SK Hynix is one of the world’s three largest memory manufacturers) warns that the global memory shortage may last well into 2030.

If that turns out to be true, and if the global AI data center boom doesn’t slow down in the next few years, I wouldn’t be surprised if NVIDIA delays the RTX 60 GPUs as long as possible. There’s a good chance we won’t see them until the second half of 2028, and I wouldn’t be surprised if they miss that window as well if memory supply doesn’t recover by then. Data center GPUs are simply too profitable for NVIDIA to reserve a meaningful portion of memory for gaming graphics cards as long as shortages persist.


At least current-gen gaming GPUs are still a great option for any PC gamer

If there is a silver lining here, it is that current-gen gaming GPUs (NVIDIA’s RTX 50 series and AMD’s Radeon RX 9000 series) are still more than powerful enough for any current AAA title. Considering that Sony is reportedly delaying the PlayStation 6 and that global PC shipments are projected to see a sharp, double-digit decline in 2026, game developers have little incentive to push requirements beyond what current hardware can handle.

DLSS 5, on the other hand, may be the future of gaming, but no one likes it, and it will take a few years (and likely the arrival of the RTX 60 lineup) for it to mature and become usable on anything that’s not a heckin’ RTX 5090.

If you’re open to buying used GPUs, even last-gen graphics cards offer tons of performance and can handle any AAA game you throw at them. While we likely won’t get a new gaming GPU from NVIDIA for at least a few years, the ones we’ve got are great today and will keep chewing through any game for the foreseeable future.


