UK Biobank data from 500,000 volunteers listed for sale on Alibaba after Chinese research institutions broke access agreements


Summary: Genetic, medical, and lifestyle data from all 500,000 UK Biobank volunteers was listed for sale on Alibaba after three Chinese research institutions with legitimate access violated their data-sharing agreements. The data was de-identified but includes genome sequences, hospital diagnoses, and biological measures that experts say can be re-identified. Alibaba removed the listings before any sales were made, UK Biobank has paused all external data access, and the ICO is investigating. A March investigation had already found the data leaked dozens of times via GitHub.

The genetic, medical, and lifestyle data of 500,000 British volunteers was listed for sale on Alibaba’s e-commerce platform in China this week, the UK government confirmed on Wednesday, in a breach that did not require a single line of malicious code. Three research institutions in China that had been granted legitimate access to UK Biobank’s database downloaded the data, then listed it for sale. It was not a hack. It was a contract violation by trusted researchers, and that distinction makes it worse, not better, because it exposes a vulnerability that no firewall can fix: the entire model of open research data sharing assumes that everyone who receives the data will follow the rules.

Ian Murray, the Minister of State, told the House of Commons that UK Biobank informed the government on Monday 20 April that three listings had been identified on Alibaba, with at least one appearing to contain data from all 500,000 participants. The data was de-identified, meaning it did not include names, addresses, contact details, or NHS numbers. It did include gender, age, month and year of birth, socio-economic status, lifestyle habits, and measures from biological samples. With support from both the UK and Chinese governments, Alibaba removed the listings before any sales were made. The three institutions had their access revoked. UK Biobank has paused all external data access while it develops a technical solution to prevent bulk downloads, and has referred itself to the Information Commissioner’s Office.

What UK Biobank holds

UK Biobank is one of the most valuable biomedical research resources in the world. Between 2006 and 2010, it recruited 500,000 volunteers aged 40 to 69 across Great Britain, who consented to share their health data and be followed for at least 30 years. The database now holds more than 10,000 variables per participant, including whole genome sequences for all 500,000 volunteers (released in full in 2023), blood and urine biomarkers, brain and body imaging scans, hospital diagnosis records, GP data, and detailed lifestyle questionnaires. Approximately 22,000 researchers worldwide have access to the data for approved studies into cancer, heart disease, diabetes, Alzheimer’s, and other conditions. The resource has generated thousands of peer-reviewed papers and is considered foundational to modern genomic medicine.


The data is shared on the basis that it is de-identified. Researchers sign material transfer agreements prohibiting redistribution. The model depends on compliance with those agreements. What happened this week is that three institutions broke the agreement, and the only reason anyone knows is that they were brazen enough to list the data for sale on a public marketplace.

The re-identification problem

The government’s assurance that the data did not contain names or addresses is accurate but incomplete. A Guardian investigation published in March found that de-identified UK Biobank data had been exposed online dozens of times, with researchers inadvertently posting partial or complete datasets to GitHub, the code-sharing platform. Between July and December 2025, UK Biobank issued 80 legal notices to GitHub requesting removal. In one case, a dataset containing millions of hospital diagnoses and associated dates for more than 400,000 participants was published openly.

The Guardian demonstrated that the data is not as anonymous as it appears. A reporter was able to pinpoint a volunteer’s extensive hospital diagnosis records using only their month and year of birth and the details of a major surgery they had undergone, information that many people share in everyday conversation. Dr Luc Rocher, associate professor at the Oxford Internet Institute, told the paper that removing identifiers “often does not guarantee anonymity” and that knowing a person’s birthday and a specific medical event date might be sufficient to identify their record with high confidence. Once identified, that record could reveal psychiatric diagnoses, HIV test results, or histories of substance abuse.
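The linkage described above can be illustrated with a minimal sketch. The records below are entirely synthetic, and the field layout and function names are assumptions made for illustration; they do not reflect UK Biobank's actual schema or the method the Guardian used. The point is only that a few innocuous attributes, taken together, can narrow a "de-identified" dataset to a single row.

```python
from collections import Counter

# Synthetic toy records (not real data):
# (record_id, birth "YYYY-MM", procedure, procedure "YYYY-MM")
records = [
    ("r1", "1955-03", "hip replacement", "2014-06"),
    ("r2", "1955-03", "hip replacement", "2016-11"),
    ("r3", "1955-03", "appendectomy", "2014-06"),
    ("r4", "1961-09", "hip replacement", "2014-06"),
    ("r5", "1961-09", "hip replacement", "2014-06"),
]


def matching_records(rows, birth, procedure, procedure_date):
    """Return the ids of records consistent with what an outsider knows."""
    known = (birth, procedure, procedure_date)
    return [row[0] for row in rows if row[1:] == known]


def unique_fraction(rows):
    """Share of records whose quasi-identifier combination appears only once."""
    counts = Counter(row[1:] for row in rows)
    singled_out = sum(1 for row in rows if counts[row[1:]] == 1)
    return singled_out / len(rows)


if __name__ == "__main__":
    # Someone who knows a birth month and one surgery narrows the data to one row.
    print(matching_records(records, "1955-03", "hip replacement", "2014-06"))
    print(unique_fraction(records))
```

In this toy dataset, three of the five records can be singled out from birth month and one procedure alone. Real datasets with thousands of variables per participant offer far more combinations to match against.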

Under UK GDPR, data is only truly anonymised if individuals cannot be identified “by any reasonably likely means.” With datasets of this size and richness, especially those containing full genome sequences, the question is not whether re-identification is theoretically possible but whether it is practically difficult enough to constitute meaningful protection. The governance gap in data security is widening as datasets grow larger and AI tools make cross-referencing easier. Privacy experts argue that UK Biobank’s approach, treating de-identification as a sufficient safeguard, is at odds with the reality that many people share fragments of their health information online, and in the age of large language models, those fragments can be reassembled.

A pattern, not an incident

The Alibaba listings are the most dramatic manifestation of a structural problem that UK Biobank has been managing, with limited success, for months. The March investigation revealed that data leaks had occurred dozens of times, driven by the tension between two competing imperatives: journals and funders increasingly require researchers to publish the code they use to analyse large datasets, and that code sometimes includes the data itself, or enough of it to be reconstructed. UK Biobank prohibits this, but enforcement has depended on discovering violations after the fact and issuing takedown notices.

The breach also fits a broader pattern of institutional data exposure across Europe, which IBM identified as the world’s most targeted region for cyberattacks, with the UK accounting for 27% of all attacks on the continent. The Synnovis ransomware attack in June 2024 disrupted pathology services across southeast London for weeks, and the Qilin group published stolen patient data linked to Guy’s and St Thomas’ and King’s College Hospital trusts on the dark web. The Advanced Software ransomware attack in August 2022 took down NHS 111 services. WannaCry in 2017 hit 80 NHS organisations. Each of those was a traditional cyberattack, an external adversary exploiting a technical vulnerability. The Biobank breach is different. The adversary was inside the system, credentialled and approved, and the vulnerability was the access model itself.

The geopolitical dimension

That the data appeared on a Chinese platform will inevitably sharpen the political response. The UK has spent the past five years progressively restricting Chinese technology involvement in critical infrastructure, from the Huawei 5G ban to the National Security and Investment Act’s powers over sensitive data acquisitions. In March 2024, the government accused China-linked actors of cyberattacks on the Electoral Commission and parliamentarians. Chinese state-sponsored hackers have targeted Western governments repeatedly, including a campaign the Dutch government publicly attributed to Beijing that compromised more than 20,000 systems.

Murray thanked the Chinese government “for the speed and seriousness with which they worked to help remove these listings,” a diplomatic formulation that acknowledged cooperation while sidestepping the question of how three Chinese research institutions came to violate their data-sharing agreements simultaneously. The minister did not name the institutions. The ICO said it is “making enquiries.” Whether this was opportunistic misconduct by individual researchers or something more coordinated is a question the investigation will need to answer.

What happens next

UK Biobank has temporarily suspended all access to its research platform and is developing an automated checking system to prevent de-identified participant data from being extracted in bulk, with a target of having the system operational by the end of 2026. The organisation is also implementing strict limits on the size of files that can be taken off the platform. Conor O’Neill, chief executive of cybersecurity firm OnSecurity, said the breach “is a reminder that data protection failures are rarely the result of malicious intent” and pointed to “a cultural gap between policy and practice” in how researchers handle sensitive data.
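UK Biobank has not published how its planned checks will work. As a purely hypothetical sketch of the general idea, an export gate might refuse any file that is too large or that contains too many participant-level rows. Every name and threshold below (`ExportRequest`, `check_export`, `MAX_EXPORT_BYTES`, `MAX_PARTICIPANT_ROWS`) is an assumption invented for illustration, not a description of the real system.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
MAX_EXPORT_BYTES = 5 * 1024 * 1024  # 5 MiB per exported file
MAX_PARTICIPANT_ROWS = 1_000        # rows that look like per-participant records


@dataclass
class ExportRequest:
    filename: str
    size_bytes: int
    participant_rows: int  # count of participant-level rows detected in the file


def check_export(req: ExportRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for a request to take a file off the platform."""
    if req.size_bytes > MAX_EXPORT_BYTES:
        return False, "file exceeds size limit"
    if req.participant_rows > MAX_PARTICIPANT_ROWS:
        return False, "too many participant-level rows"
    return True, "ok"


if __name__ == "__main__":
    summary = ExportRequest("summary_stats.csv", 20_000, 0)
    bulk = ExportRequest("full_dump.csv", 2_000_000_000, 500_000)
    print(check_export(summary))
    print(check_export(bulk))
```

A gate like this only raises the cost of bulk extraction. It would not stop a determined insider from exporting data in many small pieces, which is why the article's point about reliance on contractual compliance still stands.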

The vulnerability of public institutions to data theft is not new. But the Biobank case is distinctive because the data was not stolen in any conventional sense. It was given away, under contract, to researchers who broke the contract. The 500,000 volunteers who signed up between 2006 and 2010 consented to share their most intimate biological information for the advancement of medical science. They did not consent to have it listed for sale on a Chinese e-commerce site. The distinction between a hack and a breach of trust may be legally significant. For the people whose genomes are in that database, it is not.


