This startup’s new mechanistic interpretability tool lets you debug LLMs


Mapping models

Silico lets you zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to see what those neurons do. (Assuming you have access to the model’s inner workings. Most people won’t be able to use Silico to poke around inside ChatGPT or Gemini, but you can use it to look at the parameters inside many open-source models.) You can then check what inputs make different neurons fire, and trace pathways upstream and downstream of a neuron to see how other neurons affect it and how it affects other neurons in turn.

For example, Goodfire found one neuron inside the open-source model Qwen 3 that was associated with the so-called trolley problem. Activating this neuron changed the model’s responses, making it frame its outputs as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” says Ho.

Pinpointing the source of odd behavior like this is now pretty standard practice. But Goodfire wants to make it easier to adjust that behavior. Using Silico, developers can now adjust the parameters connected to individual neurons to boost or suppress certain behaviors.

In another example, Goodfire researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing the negative business impact of such a disclosure.

Looking inside the model, the researchers found that boosting neurons associated with transparency and disclosure flipped the answer from no to yes nine times out of 10. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” says Ho.
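To make the idea of boosting a neuron concrete, here is a toy sketch in Python. It is not Silico's interface or Goodfire's method. The two-neuron "model", its weights, and the neuron names `disclosure` and `commercial_risk` are all invented for illustration. It only shows the general pattern: scale one hidden unit's activation and watch the output change.

```python
# Toy illustration only: a hand-built "model" with two hidden neurons.
# All weights and neuron names are hypothetical, not taken from any real model.


def relu(x):
    return max(0.0, x)


# Each hidden neuron is (input weights, bias).
HIDDEN = {
    "disclosure": ([1.0, 0.0], 0.0),
    "commercial_risk": ([0.0, 1.0], 0.0),
}
# Output weights: a positive value pushes toward "yes, disclose".
OUTPUT_WEIGHTS = {"disclosure": 1.0, "commercial_risk": -1.5}


def answer(inputs, boost=None):
    """Return (answer, score). `boost` maps a neuron name to a multiplier."""
    boost = boost or {}
    score = 0.0
    for name, (weights, bias) in HIDDEN.items():
        pre = sum(w * x for w, x in zip(weights, inputs)) + bias
        # Steering step: scale this neuron's activation if it is boosted.
        act = relu(pre) * boost.get(name, 1.0)
        score += OUTPUT_WEIGHTS[name] * act
    return ("yes" if score > 0 else "no"), score


prompt_features = [1.0, 1.0]  # both considerations are present in the prompt
baseline = answer(prompt_features)
steered = answer(prompt_features, boost={"disclosure": 2.0})
print(baseline)  # ('no', -0.5)
print(steered)   # ('yes', 0.5)
```

In this toy, the "disclosure" signal is present from the start but is outweighed by the larger negative weight on "commercial risk". Doubling the one activation is enough to tip the balance.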

Tweaking a model’s parameter values in this way is just one approach. Silico can also help steer the training process by filtering out certain training data to avoid setting unwanted values for certain parameters in the first place.

For example, many models will tell you that 9.11 is greater than 9.9. Looking inside a model to see what’s going on might reveal that it is being influenced by neurons associated with the Bible, in which verse 9.9 comes before 9.11, or by code repositories where consecutive updates are numbered 9.9, 9.10, 9.11 and so on. Using this information, the model can be retrained to make it avoid its “Bible” neurons when doing math.
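The two readings of "9.11" and "9.9" can be shown in a few lines of Python. This is only an illustration of why the same strings order differently as decimals and as verse or version numbers. It says nothing about how any particular model represents them, and the helper name `as_version` is made up for the example.

```python
from decimal import Decimal


def as_version(text):
    """Split "9.11" into (9, 11) so each part compares as a whole number."""
    return tuple(int(part) for part in text.split("."))


a, b = "9.11", "9.9"

# Read as decimal numbers, 9.11 is smaller than 9.9 (that is, 9.11 < 9.90).
decimal_says_a_is_greater = Decimal(a) > Decimal(b)

# Read as chapter.verse or version numbers, 9.11 comes after 9.9.
version_says_a_is_later = as_version(a) > as_version(b)

print(decimal_says_a_is_greater)  # False
print(version_says_a_is_later)    # True
```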

By releasing Silico, Goodfire wants to put techniques previously available to a few top labs into the hands of smaller firms and research teams that want to build their own model or adapt an open-source one. The tool will be available for a fee determined on a case-by-case basis according to customers’ requirements (Goodfire declined to give specific pricing details).





If you’ve bought a new Raspberry Pi, or just got your hands on an older model that someone else didn’t want, there are many ways to put that little computer to good use, and here are six of them.

Retro gaming galore

Recalbox running on a Raspberry Pi 500+. Credit: Tim Brookes / How-To Geek

One of the most popular uses for Raspberry Pi computers is as a retro gaming emulation system. Which systems can be emulated depends on which specific model of Pi you have, but even the oldest ones can do a great job with retro 8-bit and 16-bit titles, or MAME arcade titles. In fact, building your own arcade cabinet with a Pi at its heart is a common project, and you’ll find lots of instructional guides on the web to that effect.


Build your own NAS

A Raspberry Pi configured as a NAS. Credit: Raspberry Pi Foundation

A NAS, or network-attached storage device, is effectively a local file server that lets you store and access data on hard drives over your local network. You can buy a ready-made NAS, or you can follow the official Raspberry Pi NAS tutorial and turn your old USB hard drives into one using parts you already have or can get for just a few dollars.

Everyone loves local streaming tools like Plex or Jellyfin, but not everyone wants to dedicate an expensive computer to act as the streaming server. Well, as long as your requirements aren’t too fancy, you can use a Raspberry Pi as a Plex server.

Just don’t expect it to handle heavy-duty transcoding. The good news is that most of your client devices can probably play back videos without the need for transcoding.

Turn your Pi into a home automation hub

The Home Assistant Green smart home hub surrounded by smart home devices. Credit: home-assistant.io

Home automation hub devices can cost hundreds of dollars, but if you have an old Raspberry Pi, you can run your smart home off it. The most common and effective solution is an open-source app called Home Assistant.


Build a weather station

If you’re interested in the weather, want to contribute to weather data, or are just sick of getting rained on when you least expect it, you can get a weather station kit for your Raspberry Pi or use something like the Raspberry Pi Sense HAT, which can detect pressure, humidity, and temperature, but not wind speed. You can also buy generic wind and rain sensors to cover the gaps, and don’t forget an outdoor project enclosure.

There are a few guides on the web, but this weather station guide for Raspberry Pi is a good place to get some ideas.
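As a starting point, reading the three values the Sense HAT can measure takes only a few lines of Python. This is a minimal sketch, assuming the `sense_hat` Python library is installed on the Pi. `get_temperature`, `get_humidity`, and `get_pressure` are that library's methods, while `read_weather` and `main` are names chosen for this example.

```python
# Sketch: take one reading of temperature, humidity, and pressure.
# Assumes the `sense_hat` Python library is installed on the Pi.


def read_weather(sensor):
    """Return one reading from any object with the Sense HAT getter methods."""
    return {
        "temperature_c": round(sensor.get_temperature(), 1),
        "humidity_pct": round(sensor.get_humidity(), 1),
        "pressure_mbar": round(sensor.get_pressure(), 1),
    }


def main():
    # Imported here so read_weather() can be tried without the hardware.
    from sense_hat import SenseHat

    print(read_weather(SenseHat()))


# On a Pi with a Sense HAT attached, call main() to print one reading.
```

Keeping the reading logic separate from the hardware import means the same function could later accept a different sensor object, such as one wrapping a separate wind or rain sensor.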

Create a home web server

Another fun project is hosting your own little web server on a Raspberry Pi. You can make a website that only works on your home LAN, or even host something that people outside your home network can access. Using open-source software to host your own web resources is highly educational, and it can also be a way to do something genuinely useful without having to rely on a cloud service somewhere on the internet.

Imagine having your own little bulletin board at home, or hosting content like ebooks, music, or audiobooks.
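To give a sense of how little is needed to get started, here is a minimal sketch of a LAN-only web page served with nothing but Python's standard library. The names `BulletinBoard`, `make_server`, and `PAGE`, and the port 8000, are choices made for this example. A real project would more likely use a full web server package.

```python
# Minimal sketch using only Python's standard library.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = b"<h1>Hello from my Raspberry Pi</h1>"


class BulletinBoard(BaseHTTPRequestHandler):
    def do_GET(self):
        # Send the same small HTML page for every path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)


def make_server(host="0.0.0.0", port=8000):
    """Create (but do not start) a server other devices on the LAN can reach."""
    return ThreadingHTTPServer((host, port), BulletinBoard)


# To run it: make_server().serve_forever(), then open http://<pi-address>:8000
```

Binding to `0.0.0.0` makes the page reachable from other devices on the home network. Exposing it beyond the LAN needs extra router configuration and more attention to security than this sketch provides.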


Infinite possibilities

Despite lacking in the raw power department, all Raspberry Pi devices are little miracles: single-board computers that can (in principle) do anything their bigger cousins can, just more slowly. So if you have a few old Raspberry Pis hanging around, don’t be too quick to retire them yet.


