Enhancing Legal Classifiers: The Role of LLMs in Question Generation
The FETCH classifier makes strides in legal problem classification but struggles to generate impactful follow-up questions. Adding a high-cost model, GPT-5, improves results, yet critical facts are still elicited unevenly across legal categories, which raises doubts about the efficacy of existing screening protocols.
The FETCH classifier, an ensemble of low-cost large language models (LLMs), is designed to refine the classification of legal issues. It's an intriguing approach, yet one critical component remains elusive: generating high-quality follow-up questions.
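The article does not say how FETCH combines the outputs of its models. As an illustration only, one common way to run an ensemble of classifiers is a simple majority vote over their labels. The sketch below assumes each model is a function returning one category label; the names `Classifier` and `ensemble_classify`, and the toy keyword rules standing in for LLM calls, are hypothetical and not FETCH's actual design.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for one low-cost LLM call: takes the applicant's
# description and returns a single legal-category label.
Classifier = Callable[[str], str]


def ensemble_classify(text: str, models: Sequence[Classifier]) -> Tuple[str, float]:
    """Majority vote (an assumed scheme): return the winning label and the
    fraction of models that chose it."""
    if not models:
        raise ValueError("need at least one model")
    votes = Counter(model(text) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


if __name__ == "__main__":
    # Toy keyword rules standing in for three LLM calls (illustration only).
    models = [
        lambda t: "housing" if "landlord" in t else "family",
        lambda t: "housing" if "evict" in t else "family",
        lambda t: "family",
    ]
    label, agreement = ensemble_classify("My landlord says he will evict me.", models)
    print(label, round(agreement, 2))
```

Returning the agreement fraction alongside the label is a design choice: a low score is one plausible signal that a follow-up question might help, though the article does not say whether FETCH uses any such rule.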
Low-Cost Models Fall Short
FETCH's core function is categorizing legal problems, and for that task low-cost LLMs perform admirably. The same cannot be said for writing plain-language follow-up questions: their attempts lack the sophistication needed to draw out meaningful answers, which signals the need for more advanced, and more costly, models.
Why does this matter? When legal intake workers speak with clients, asking the right questions is essential: it underpins both accurate classification and effective assistance. Can a low-cost approach really handle the full complexity of legal problems?
Introducing GPT-5
Enter GPT-5, a high-cost model. Adding it significantly improves the classifier's performance: its stronger questions draw out pertinent details from applicants, which in turn boosts classification accuracy.
It is not a perfect solution, however. Prompt engineering alone is not enough to raise question quality, and when the generated questions are rated, LLM ratings and human ratings diverge. That gap underscores nuances that machine models still struggle to capture despite recent advances.
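The article does not say how the divergence between LLM and human ratings was measured. As a sketch under stated assumptions, if an LLM judge and a human rater each score the same questions on a shared integer scale (a 1 to 5 scale is assumed here), a few simple summaries make the gap concrete. The function name `rating_divergence` and the sample scores are hypothetical.

```python
from typing import Dict, Sequence


def rating_divergence(llm: Sequence[int], human: Sequence[int]) -> Dict[str, float]:
    """Compare paired ratings of the same questions by an LLM judge and a human."""
    if len(llm) != len(human) or not llm:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(llm)
    diffs = [abs(a - b) for a, b in zip(llm, human)]
    return {
        # Average size of the disagreement, in rating points.
        "mean_abs_diff": sum(diffs) / n,
        # Share of questions where both raters gave exactly the same score.
        "exact_agreement": sum(d == 0 for d in diffs) / n,
        # Signed bias: positive means the LLM rates higher on average.
        "mean_llm_minus_human": sum(a - b for a, b in zip(llm, human)) / n,
    }


if __name__ == "__main__":
    # Made-up scores for five questions, for illustration only.
    llm_scores = [4, 3, 5, 2, 4]
    human_scores = [4, 4, 3, 2, 5]
    print(rating_divergence(llm_scores, human_scores))
```

The signed bias is reported separately because a mean absolute difference alone cannot show whether one rater is systematically more generous; a real evaluation would likely add a chance-corrected agreement statistic as well.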
Inconsistent Elicitation and Protocol Gaps
FETCH's performance is not uniform across legal categories. In particular, its handling of issues such as domestic violence does not align with existing family law screening protocols. This inconsistency points to a broader need for specialized screening panels in certain legal domains.
This raises a question: are current legal intake systems prepared to integrate such advanced models effectively? The FETCH results suggest room for improvement not only in the technology but also in how intake procedures adapt to it.
The paper's key contribution lies in highlighting these gaps and proposing solutions. It is also a call to action for legal systems to embrace technological advances without sidelining human expertise.