PrismML Releases 1-Bit Bonsai, the First Commercially Viable 1-Bit LLM That Runs on Your Phone
By Dr. Rachel Kim • April 1, 2026

A startup called PrismML just dropped something that could change how we think about running AI models on everyday devices. Their 1-Bit Bonsai family of large language models uses 1-bit weights instead of the standard 16-bit or 32-bit precision, and somehow manages to match full-precision 8B parameter models on major benchmarks. The 8B version needs just 1.15GB of memory. That's 14 times smaller than a standard 8B model.
Let that sink in for a second. We've spent the last three years watching companies build bigger and bigger data centers to run bigger and bigger models. PrismML went the other direction. And their results suggest that direction might actually work.
What 1-Bit Weights Actually Mean for AI Models
Traditional AI models store each parameter as a 16-bit or 32-bit floating point number. That gives the model lots of precision for each weight, but it also means an 8 billion parameter model takes up roughly 16GB of memory at 16-bit precision. That's more than most phones have, and it's why running serious AI locally has been so hard.
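The arithmetic behind that 16GB figure is simple enough to sketch. This is back-of-the-envelope math for the weights alone, ignoring activations, KV caches, and runtime overhead:

```python
# Rough memory footprint of model weights alone. Figures are
# illustrative arithmetic, not measured benchmarks.
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

fp16_gb = weight_memory_gb(8e9, 16)    # standard 16-bit 8B model
one_bit_gb = weight_memory_gb(8e9, 1)  # idealized pure 1-bit 8B model

print(f"8B @ 16-bit: {fp16_gb:.2f} GB")   # 16.00 GB
print(f"8B @ 1-bit:  {one_bit_gb:.2f} GB") # 1.00 GB
```

The idealized 1-bit figure comes out at 1GB rather than Bonsai's reported 1.15GB, which is consistent with some components (such as embeddings or normalization layers) being kept at higher precision, a common practice in low-bit models.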
PrismML's approach compresses each weight down to a single bit. One or zero. On or off. The math behind this isn't new. Microsoft Research published the BitNet b1.58 paper back in 2024 showing that 1.58-bit weights (allowing values of -1, 0, and 1) could theoretically work. But nobody had turned that theory into a model you could actually ship to customers.
Here's the thing: when you strip a weight down to one bit, you lose a massive amount of information. The trick is in the training process. PrismML developed what they call a "knowledge distillation pipeline" that trains the 1-bit model to mimic the behavior of a full-precision teacher model. The student doesn't need to store the same information in its weights. It just needs to produce the same outputs.
Think of it like learning to draw a face. A full-precision model has a high-resolution reference photo. A 1-bit model has a rough sketch. But if you train the sketch artist long enough with enough examples, they can capture the essential features even without all the detail.
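The idea of matching outputs rather than weights can be made concrete. PrismML hasn't published its pipeline, so the temperature scaling and KL-divergence objective below are standard distillation choices, not their confirmed method; this is a minimal sketch of the loss on a single next-token prediction:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student): penalizes the 1-bit student whenever its
    next-token distribution diverges from the full-precision teacher's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that matches the teacher exactly incurs zero loss;
# one that disagrees incurs a positive loss.
same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])  # ~0.0
diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])  # > 0
```

Training minimizes this loss over huge numbers of examples, which is how the "sketch artist" learns to reproduce the teacher's behavior without storing the same information in its weights.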
The Numbers Don't Lie
PrismML released three models in the Bonsai family, and the performance numbers are genuinely surprising.
The 8B version fits in 1.15GB of RAM and runs 8 times faster than a standard 8B model at full precision. PrismML claims it's also 5 times more energy efficient. On the MMLU benchmark, it scores within 2 points of Meta's Llama 3.1 8B. On coding benchmarks, the gap is slightly wider but still competitive for a model that's 14 times smaller.
The 4B version needs just 0.57GB. It hits 132 tokens per second on an M4 Pro MacBook. That's faster than most cloud API responses.
And then there's the 1.7B model. At 0.24GB, it runs at 130 tokens per second on an iPhone 17 Pro Max. A billion-parameter language model, running locally on a phone, at conversational speed. Two years ago that would've sounded ridiculous.
The question isn't whether the models are as good as GPT-5 or Claude Opus. They aren't. The question is whether they're good enough for practical edge use cases. And based on early benchmarks, the answer seems to be yes for a surprisingly wide range of tasks.
Why Edge AI Matters More Than You Think
There's a growing gap in the AI industry between what's possible in the cloud and what's possible on your device. Cloud models keep getting smarter, but they also keep getting more expensive to run and more dependent on fast internet connections. Edge AI closes that gap.
Consider robotics. A warehouse robot can't wait 200 milliseconds for a cloud API to respond when it needs to make split-second decisions about picking up packages. It needs a model running locally on its onboard processor. PrismML's Bonsai models are specifically designed for this kind of application.
Or think about privacy. A medical device that processes patient data locally never sends that data to a cloud server. A smart home assistant that runs locally doesn't need to record your conversations. These aren't hypothetical use cases. They're real problems that AI companies have been trying to solve for years.
The military applications are obvious too, though PrismML hasn't publicly discussed defense contracts. Any AI system that needs to operate in environments with limited or no connectivity, like a drone or a field robot, benefits enormously from models that fit in less than a gigabyte of memory.
How PrismML Got Here
PrismML was founded in 2024 by a team of researchers from MIT and Carnegie Mellon. They raised a $12 million seed round from Andreessen Horowitz and Google Ventures, then went quiet for almost a year. The Bonsai release is their first public product.
Their timing is interesting. The AI industry has been trending toward ever-larger models for years. OpenAI, Google, and Anthropic are all pushing models with hundreds of billions or even trillions of parameters. Training those models costs hundreds of millions of dollars. Running them costs even more.
PrismML is betting that the industry will eventually hit a wall with the "bigger is better" approach. Not because large models won't keep improving, but because there's a massive market for AI that works without a $10 million inference budget. Phones, cars, robots, IoT devices, drones, medical equipment. All of these need AI that's small, fast, and cheap to run.
The open source community responded fast. Within hours of the release, developers on GitHub had Bonsai running on Raspberry Pi 5 boards, Nintendo Switch consoles (via homebrew), and even some older Android phones with just 4GB of RAM. That kind of grassroots adoption is exactly what PrismML needs to build momentum.
What This Means for the AI Chip Market
Here's where it gets really interesting for the semiconductor side. If 1-bit models take off, the entire AI chip market could shift.
Current AI chips from NVIDIA, AMD, and Intel are optimized for floating-point operations. Multiply two 16-bit numbers, add them to an accumulator, repeat a billion times. That's what a GPU does. But 1-bit weights don't need floating-point multiplication. They need bitwise operations, which are fundamentally simpler and cheaper.
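To see why that matters, here is the classic trick for binary networks: a dot product of two {-1, +1} vectors reduces to an XOR and a popcount, with no floating-point multiplies at all. This is the standard XNOR-net style identity, not code from PrismML:

```python
# Encode -1 as bit 0 and +1 as bit 1, packing a length-n vector into
# an integer. Then, for vectors a and b:
#   dot(a, b) = n - 2 * popcount(a XOR b)
# because XOR counts the positions where the signs disagree.

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n {-1,+1} vectors packed into ints."""
    return n - 2 * bin(a_bits ^ b_bits).count("1")

# a = [+1, -1, +1, +1] -> 0b1011; b = [+1, +1, -1, +1] -> 0b1101
# Elementwise products: +1, -1, -1, +1, so the dot product is 0.
print(binary_dot(0b1011, 0b1101, 4))  # 0
```

A GPU burns transistors on wide floating-point multiply-accumulate units; the operation above needs only XOR gates and a population count, both of which are cheap and fast in silicon. That's the architectural opening for new chips.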
This opens the door for entirely new chip architectures. Instead of spending $30,000 on an NVIDIA H100, you could potentially run 1-bit models on custom ASICs that cost a fraction of the price. Startups like Groq and Cerebras have been working on alternative AI chip designs for years. 1-bit models could be the breakthrough they need to compete seriously with NVIDIA.
It's too early to say whether PrismML's approach will become the new standard. Full-precision models still outperform 1-bit models on the hardest tasks. But for the vast majority of practical applications, "good enough" at 14 times less memory and 8 times more speed is a trade-off most developers would take in a heartbeat.
The Bigger Picture
What PrismML has done with Bonsai isn't just a technical achievement. It's a philosophical statement about where AI is heading. The industry has been obsessed with making models bigger. Bonsai says maybe the smarter move is making them smaller.
In model architecture, the fundamental trade-off is always between model size and capability. PrismML just proved that trade-off isn't as steep as everyone assumed.
The real test comes over the next few months as developers build applications on top of Bonsai. If we start seeing production-quality apps running entirely on-device with 1-bit models, the pressure on big AI labs to offer efficient alternatives will be enormous. And that's good for everyone, whether you're building the next generation of AI-powered robots or just want a smart assistant that works without Wi-Fi.
Frequently Asked Questions
What makes 1-bit models different from regular quantized models?
Standard quantization takes a full-precision model and compresses it after training, usually to 4-bit or 8-bit precision. You lose some quality in the process. PrismML's 1-bit models are trained from scratch with 1-bit weights using a specialized distillation process. The model learns to work within the constraint rather than having it imposed afterward.

Can 1-bit Bonsai models replace GPT-5 or Claude?
Not for the hardest tasks. Complex reasoning, long document analysis, and creative writing still benefit from larger, full-precision models. But for tasks like classification, summarization, simple Q&A, and real-time decision making on edge devices, Bonsai models perform surprisingly well. It depends entirely on your use case.

What devices can run Bonsai models right now?
The 1.7B model runs on most modern smartphones. The 4B runs on any laptop from the last few years. The 8B works on machines with at least 2GB of free RAM. Developers have also gotten them running on Raspberry Pi 5 and other single-board computers. PrismML provides ready-to-use packages for iOS, Android, and desktop platforms.

Is PrismML's 1-bit approach open source?
The model weights are available under an open license for both research and commercial use. The training code and distillation pipeline aren't public yet, but PrismML says they plan to release a technical paper with full methodology details in the coming weeks.
Key Terms Explained
Anthropic: An AI safety company founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei.
Benchmark: A standardized test used to measure and compare AI model performance.
Classification: A machine learning task where the model assigns input data to predefined categories.
Claude: Anthropic's family of AI assistants, including Claude Haiku, Sonnet, and Opus.