Decoding AI Scoring Models: Why Shapley Values Matter
AI models score classroom transcripts but offer little insight into why. A new framework aims to change that, showing that Shapley-value attributions pinpoint what drives a score more reliably than LLM-generated justifications.
AI has stepped into classrooms, scoring complex language performances with increasing frequency. But here's the catch: these models often leave educators in the dark about why a particular score was produced. Enter a proposed framework that combines Shapley-value attributions with large language models (LLMs) to illuminate this opaque process.
The Framework in Action
The framework was tested on the Quality of Feedback dimension of the CLASS framework using the NCTE corpus, analyzing 6,000 annotated transcript segments. The results? Fine-tuned pretrained language models (PLMs) led in prediction accuracy but tended to compress scores toward the middle of the scale. Meanwhile, Shapley-value attributions (SHAP) showed their muscle, identifying the sentences that drive model predictions more reliably than LLM-generated justifications did.
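To make "identifying the sentences that drive model predictions" concrete, here is a minimal sketch of exact sentence-level Shapley values. Each sentence is treated as a player, and its value is its average marginal contribution to the score over all subsets of the other sentences. The scorer `toy_feedback_score` and the example segment are hypothetical stand-ins for a fine-tuned PLM and a real transcript; they are not the model, rubric, or data from the study. We also assume that "removing" a sentence means dropping it from the segment, which is only one of several possible masking choices.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(sentences: Sequence[str],
                   score: Callable[[list[str]], float]) -> list[float]:
    """Exact Shapley value of each sentence with respect to score(subset).

    Enumerates every coalition (2^n subsets), so it is only practical for
    a handful of sentences; SHAP-style tools approximate this for longer inputs.
    """
    n = len(sentences)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight for a coalition of size k that excludes player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                with_i = sorted(coalition + (i,))  # keep transcript order
                without_i = list(coalition)
                gain = (score([sentences[j] for j in with_i])
                        - score([sentences[j] for j in without_i]))
                values[i] += weight * gain
    return values


def toy_feedback_score(segment: list[str]) -> float:
    """Hypothetical scorer standing in for a fine-tuned PLM (not the study's model)."""
    text = " ".join(segment).lower()
    asks_why = "why" in text          # probing question
    gives_reason = "because" in text  # student reasoning
    score = 1.0
    if asks_why:
        score += 2.0
    if gives_reason:
        score += 1.0
    if asks_why and gives_reason:
        score += 1.0  # interaction: the probe elicited reasoning
    return score


segment = [
    "Can you explain why you chose that strategy?",
    "I used it because the numbers were even.",
    "Okay, let's move on to the next problem.",
]
phi = shapley_values(segment, toy_feedback_score)
print([round(v, 3) for v in phi])  # [2.5, 1.5, 0.0]
```

The attributions sum to the full-segment score minus the empty-segment score (5.0 − 1.0 = 4.0). The probing question receives the largest share, the student's reasoning gets partial credit for its interaction with the question, and the off-task sentence gets zero.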
Why does this matter? If AI scoring models are to be trusted in high-stakes educational settings, their interpretability isn't just a nice-to-have; it's essential. If these models can't explain their scores in a way that's useful and reliable, what's the point?
Cross-Model Insights
Further analysis showed that SHAP attributions held up across different model architectures. In contrast, LLM rationales were inconsistent and had limited influence on the predictions they claimed to explain. This matters because it suggests SHAP could be the key to more reliable, model-agnostic explanations in rubric-based scoring.
So, if you're designing AI for educational assessments, consider this: bolting a model onto a rented GPU isn't a strategy. You need something more substantive, something that can withstand scrutiny. SHAP offers a framework that not only explains predictions but also transfers well across models. The intersection of AI and assessment is real. Ninety percent of the projects aren't.
Why Educators Should Care
For educators, this isn't just technical mumbo-jumbo. It's about trust and transparency. Can these AI models provide explanations that teachers can actually use to improve educational outcomes? The proposed framework suggests they can, but only if we choose the right tools.
In the end, it's all about inference costs and how we interpret them. Show me the inference costs. Then we'll talk. Education isn't just a market; it's a trust-based system. If AI wants a seat at this table, it had better bring some clarity.