EHRs Go Local with New AI: A Step Toward Smarter Healthcare
A new framework uses local open-source LLMs to answer clinical questions directly from EHRs, achieving up to 95.3% accuracy in tests. However, human oversight remains key.
In the quest to speed up healthcare, a new framework promises to change how clinicians retrieve patient data. Enter the locally deployable Clinical Contextual Question Answering (CCQA) system. This innovation allows clinical questions to be answered directly from electronic health records (EHRs) without any external data transfer, safeguarding patient privacy.
Performance Under the Microscope
The system was benchmarked offline with large language models (LLMs) ranging from 4 billion to 70 billion parameters, using 1,664 expert-annotated question-answer pairs drawn from the records of 183 patients. Notably, the dataset consisted predominantly of Finnish clinical text.
The results were compelling. The Llama-3.1-70B model achieved 95.3% accuracy and 97.3% consistency across semantically equivalent question variants. Surprisingly, a smaller model, the Qwen3-30B-A3B-2507, delivered comparable performance, challenging the notion that bigger always means better in AI.
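To make the two headline metrics concrete, here is a minimal sketch of how accuracy and consistency could be computed. The article does not spell out the exact scoring procedure, so the record layout (`group_id`, `predicted`, `expected`), the exact-match comparison, and the definition of consistency as "all variants of a question receive the same answer" are illustrative assumptions, not the study's method.

```python
from collections import defaultdict


def accuracy(records):
    """Fraction of answers that exactly match the reference answer."""
    return sum(r["predicted"] == r["expected"] for r in records) / len(records)


def consistency(records):
    """Fraction of question groups whose variants all received the same answer.

    Assumption: records sharing a group_id are semantically equivalent
    rewordings of one question.
    """
    groups = defaultdict(set)
    for r in records:
        groups[r["group_id"]].add(r["predicted"])
    return sum(len(answers) == 1 for answers in groups.values()) / len(groups)


# Hypothetical toy data: two questions, each asked in two wordings.
records = [
    {"group_id": "q1", "predicted": "yes", "expected": "yes"},
    {"group_id": "q1", "predicted": "yes", "expected": "yes"},
    {"group_id": "q2", "predicted": "no", "expected": "yes"},
    {"group_id": "q2", "predicted": "yes", "expected": "yes"},
]

print(accuracy(records))     # 0.75
print(consistency(records))  # 0.5
```

The toy data shows why the two numbers are reported separately: a model can answer most questions correctly and still give different answers to rewordings of the same question.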
Practical Deployment: Challenges and Solutions
Deploying these models in a clinical setting isn't a simple task. While accuracy was high, the models varied in how well calibrated they were on multiple-choice tests. Crucially, predictive performance held up under low-precision quantization, specifically 4-bit and 8-bit. Because quantization reduces GPU memory needs, this makes deployment more feasible.
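A rough back-of-the-envelope calculation shows why quantization matters for local deployment. The sketch below estimates the memory needed for model weights alone at different precisions. The 16-bit baseline is an assumption (the article does not state the unquantized precision), and the estimate ignores activations, the attention cache, and the small overhead of quantization metadata, so real requirements will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# A 70-billion-parameter model at an assumed 16-bit baseline, then 8-bit and 4-bit.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
```

Under these assumptions, a 70-billion-parameter model drops from roughly 140 GB of weights at 16-bit to about 70 GB at 8-bit and 35 GB at 4-bit, which is the difference between needing several high-end GPUs and fitting on far more modest hardware.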
Even with these advances, there are pitfalls. Clinically significant errors appeared in 2.9% of the outputs. Moreover, semantically equivalent questions sometimes produced discordant answers, with 0.96% of cases showing errors. This highlights an ongoing need for human oversight.
The Way Forward
Local deployment of open-source LLMs within EHR systems could revolutionize clinical data retrieval. But is AI ready to handle life-and-death decisions? Not yet. While the models are promising, the presence of critical errors underscores the necessity of human verification.
In healthcare, where precision is everything, integrating these systems will require careful validation. As the healthcare industry moves forward, collaboration between AI and clinicians could enhance decision-making. But until AI can guarantee flawless performance, human judgment will remain indispensable.