MADE: Shaking Up Medical Text Classification
MADE is redefining benchmarks in medical text classification, tackling label imbalances and contamination. Big gains for healthcare AI.
JUST IN: There's a new sheriff in town for medical text classification. MADE, a revolutionary benchmark, is shaking things up. It's built from medical device adverse event reports, and it's continuously updated to prevent the pitfalls of data contamination.
Tackling the Beast of Multi-Label Classification
Multi-label text classification in healthcare is no walk in the park. The task is tough thanks to label imbalances, dependencies, and the sheer complexity involved. Until now, it's been a game of catch-up with existing benchmarks reaching their limits.
Enter MADE. This benchmark features a long-tailed distribution of hierarchical labels, which is a fancy way of saying a handful of labels show up constantly while most appear only rarely, all organized in a hierarchy of categories and subcategories. And it allows for reproducible evaluations, meaning results can be trusted over time. The labs are scrambling to see if their models can keep up.
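To make "long-tailed multi-label" concrete, here's a minimal sketch using hypothetical adverse-event reports and made-up label names (none of these come from MADE itself). Each report carries multiple labels, and counting them exposes the head-versus-tail split:

```python
from collections import Counter

# Hypothetical multi-label adverse-event reports (labels are invented
# for illustration, not taken from the MADE benchmark).
reports = [
    {"device malfunction", "patient injury"},
    {"device malfunction"},
    {"device malfunction", "software error"},
    {"patient injury"},
    {"device malfunction", "labeling issue"},
    {"battery failure"},
]

# Count how often each label occurs across all reports.
counts = Counter(label for report in reports for label in report)

# Sorting from most to least frequent reveals the long tail:
# one "head" label dominates while several "tail" labels appear once.
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

A model that only nails the head label still scores well on average, which is exactly why MADE reports accuracy across the head-to-tail spectrum rather than a single aggregate number.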
The Battle of Models
MADE puts over 20 encoder- and decoder-only models to the test. It's no cakewalk. Fine-tuning and few-shot settings are the name of the game, with instruction-tuned and reasoning variants being part of the mix.
Results are in, and they're wild. Smaller, discriminatively fine-tuned decoders are killing it on head-to-tail accuracy, showing they can handle everything from common to rare labels. But when it comes to reliable uncertainty quantification (UQ), generative models take the crown. Big reasoning models? They're surprisingly off their game in UQ, despite their prowess with rare labels.
Uncertainty: The Uncertain Frontier
Here's the kicker: self-verbalized confidence, the idea that a model can simply state how sure it is of its own answers, isn't cutting it. It's not a reliable proxy for uncertainty. This raises a big question: how can we trust AI in high-stakes domains like healthcare if it can't gauge its own certainty?
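How do you check whether stated confidence is trustworthy? One standard recipe (a sketch, not MADE's actual evaluation code) is expected calibration error: bin predictions by their claimed confidence and compare each bin's average confidence against its actual accuracy.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bucket predictions by stated confidence, then measure the gap
    between average stated confidence and observed accuracy per bucket.
    A large weighted gap means confidence is a poor uncertainty proxy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece

# Toy model that always verbalizes 90% confidence but is right only
# half the time -- the calibration gap exposes the overconfidence.
confs = [0.9] * 10
right = [True, False] * 5
print(round(expected_calibration_error(confs, right), 2))  # 0.4
```

A well-calibrated model would land near zero; the finding reported here is that verbalized confidence from large reasoning models drifts far from it.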
And just like that, the leaderboard shifts. MADE is setting a new standard, challenging current models to either evolve or get out of the way. The implications for healthcare are massive, promising more accurate and reliable AI systems. But it’s clear, there’s still a long road ahead.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Text classification: A machine learning task where the model assigns input data to predefined categories.
Decoder: The part of a neural network that generates output from an internal representation.
Encoder: The part of a neural network that processes input data into an internal representation.