Hugging Face Hijack: How Safetensors Service Became a Cybercriminal Playground

Beware, geeks and coders! Hugging Face’s Safetensors service might just hug your models goodbye. Cyber-sleuths warn of supply chain shenanigans where attackers play dress-up as conversion bots. #ModelHijackingMischief

Hot Take:

Oh, Safetensors, you were supposed to be the superhero of the machine learning realm, keeping our precious models safe from the clutches of cyber-villains. But alas, even heroes have their kryptonite. It seems that Hugging Face’s cuddly conversion service has left open a secret trapdoor for baddies to sneak in and throw neural backdoors into the mix. And here we were, thinking our biggest worry was getting ghosted by our data scientist crush!

Key Points:

  • The conversion service for Safetensors, Hugging Face’s secure storage format, has a vulnerability that could trigger a cyber domino effect in the machine learning supply chain.
  • Malicious pull requests can be sent from the conversion service to any repository, wearing the convincing disguise of the SFConvertbot.
  • Private repos aren’t safe either; attackers could swipe your Hugging Face token and start a toxic relationship with your internal models.
  • This flaw could turn public models into cyber Trojan horses, with the potential to impact anyone using the infected model.
  • Memory leak vulnerabilities like LeftoverLocals underscore that even our GPUs aren’t immune to a little unintended oversharing.

Need to know more?

When Conversion Services Go Rogue

Imagine sending your model to a spa, expecting a glow-up, but it comes back with a new personality engineered by the digital underworld. That’s the makeover nobody asked for, courtesy of a loophole in Hugging Face’s Safetensors conversion service. The analysis by HiddenLayer’s brainiacs, Eoin Wickens and Kasimir Schulz, is like a plot twist in a cyber-soap opera – one where the helpful conversion bot turns out to be a double agent for Team Chaos.
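If you’d rather not take the bot at face value, a modest first defense is simply auditing who has been opening pull requests against your repositories. Below is a minimal sketch using the huggingface_hub library’s get_repo_discussions helper; the repository ID is a placeholder, and flagging anything signed by the conversion bot for manual review is our own cautious habit, not an official Hugging Face mitigation.

# Minimal sketch: list pull requests on a Hugging Face repo and flag any
# that claim to come from the conversion bot, so a human reviews the diff
# before merging. The repo ID is a placeholder.
from huggingface_hub import get_repo_discussions

REPO_ID = "your-org/your-model"  # placeholder: the repo you want to audit

for discussion in get_repo_discussions(repo_id=REPO_ID):
    if not discussion.is_pull_request:
        continue
    # The real conversion service opens PRs as the SFConvertbot account,
    # but a forged PR can wear the same name - treat these as "review me".
    suspect = "convertbot" in discussion.author.lower()
    marker = "  <-- review manually" if suspect else ""
    print(f"PR #{discussion.num} by {discussion.author}: {discussion.title}{marker}")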

Snatching Tokens Like They’re Going Out of Style

Let’s talk tokens – not the arcade kind, but your Hugging Face access pass. In this cybersecurity thriller, an attacker can trick the service into handing over this golden ticket, making it a free-for-all on your private model repository. The idea of someone silently swapping out your machine learning masterpieces is enough to make any data scientist’s skin crawl.
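One low-tech antidote to silent swaps is to pin the exact commit you actually vetted instead of pulling whatever currently sits at the tip of the repo. Here is a minimal sketch using hf_hub_download from huggingface_hub; the repository ID, filename, and commit hash are placeholders you would replace with your own reviewed values.

# Minimal sketch: pin a model download to a specific, previously reviewed
# commit so a later (possibly malicious) push to the repo is never pulled
# in automatically. All identifiers below are placeholders.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="your-org/your-model",          # placeholder repo
    filename="model.safetensors",           # the artifact you vetted
    revision="<commit-hash-you-reviewed>",  # pin to a full commit hash
)
print(f"Downloaded pinned snapshot to {local_path}")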

Public Repositories: The Unwitting Accomplices

The plot thickens as we discover that this isn’t just a private party; public repositories can be gatecrashed too. It’s like finding out that the potluck you’re attending might have been catered by a chef with a penchant for digital mischief. Anyone could submit a conversion request for a public model, and voilà, you’ve got yourself a recipe for a widespread supply chain shindig that nobody wanted an invite to.
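If you are consuming a public model that anyone could have “helpfully” converted, a cheap sanity check is to compare the downloaded artifact against a checksum you recorded the last time you audited it. A small sketch follows; the repository, filename, and expected digest are placeholders, and the check is only as good as the trustworthiness of the copy you originally hashed.

# Minimal sketch: verify a downloaded model file against a known-good
# SHA-256 digest recorded during an earlier audit. All identifiers are
# placeholders.
import hashlib

from huggingface_hub import hf_hub_download

EXPECTED_SHA256 = "replace-with-the-digest-of-the-copy-you-audited"

path = hf_hub_download(repo_id="some-org/public-model", filename="model.safetensors")

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

if digest.hexdigest() != EXPECTED_SHA256:
    raise RuntimeError("Model file does not match the audited checksum - do not load it.")
print("Checksum matches the audited copy.")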

A Superhero’s Weakness

In a twist of irony, Safetensors, created as the secure alternative to the notoriously unsafe pickle format, finds its Achilles’ heel not in the format itself but in the conversion service meant to help everyone adopt it. It’s like watching your favorite superhero movie only to find out that the hero’s sidekick has been leaking the secret battle plans all along. The researchers paint a grim picture of a future where our trusted machine learning models could be turned into sleeper agents, awaiting activation by their cyber-overlords.
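It is worth spelling out why the format exists at all: a classic pickle-based checkpoint can execute arbitrary code the moment it is loaded, while a .safetensors file holds only tensor data and metadata. The sketch below illustrates the difference, assuming PyTorch and the safetensors package are installed; file names are placeholders.

# Minimal sketch of why the format matters. File paths are placeholders.
import torch
from safetensors.torch import load_file, save_file

# A .safetensors file contains only tensors and metadata, so loading it
# cannot execute attacker-supplied code.
save_file({"weight": torch.randn(4, 4)}, "model.safetensors")
tensors = load_file("model.safetensors")
print(tensors["weight"].shape)

# A pickle-based checkpoint, by contrast, is deserialized with pickle under
# the hood, so loading an untrusted .bin/.pt file this way can run arbitrary
# code. Newer PyTorch versions accept weights_only=True to restrict what
# gets unpickled.
# state_dict = torch.load("untrusted_model.bin")  # risky on untrusted files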

And Speaking of Leaks…

To sprinkle a bit more salt in the cybersecurity wound, remember LeftoverLocals? It’s the memory leak vulnerability that’s been letting GPUs spill the tea on sensitive data like a chatty aunt at a family reunion. It’s another reminder that in the world of cybersecurity, threats can come from the most unexpected places – including the hardware tasked with rendering your epic cat videos.

So, let’s put on our digital raincoats and prepare for the storm. It’s clear that in the fast-paced world of AI and machine learning, security is a race where the finish line keeps moving. Just when you think you’re safe, a new vulnerability pops up, ready to rain on your cyber parade. Stay vigilant, my friends, and maybe don’t trust that conversion bot too much – it’s got more faces than a politician in election season.

Title: GPU kernel implementations susceptible to memory leak
CVE ID: CVE-2023-4969
CVE state: PUBLISHED
CVE assigner (short name): certcc
CVE date updated: 01/16/2024
CVE description: A GPU kernel can read sensitive data from another GPU kernel (even from another user or app) through an optimized GPU memory region called local memory on various architectures.

Tags: conversion service exploit, machine learning security, memory leak flaw, neural backdoors, repository hijacking, Safetensors vulnerability, Supply Chain Risk