Decoding AI Scoring Models: Why Shapley Values Matter

AI has stepped into classrooms, scoring complex language performances with increasing frequency. But here's the catch: these models often leave educators in the dark about why a particular score was produced. Enter a newly proposed framework that combines Shapley-value attributions with large language models (LLMs) to shed light on this opaque process.

The Framework in Action

The framework was tested on the Quality of Feedback dimension of the CLASS framework, using 6,000 annotated transcript segments from the NCTE corpus. The results? Fine-tuned pretrained language models (PLMs) delivered the best prediction accuracy but tended to compress scores toward the middle of the scale. Meanwhile, SHAP (SHapley Additive exPlanations) attributions identified the sentences that drive model predictions more reliably than LLM-generated justifications did.
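To make the idea of sentence-level attribution concrete, here is a minimal sketch of exact Shapley values computed over the sentences of a transcript segment. It is not the study's pipeline: `toy_score` is a hypothetical stand-in for a fine-tuned PLM (a simple cue-word counter), and treating a sentence as "absent" by deleting it is only one masking choice among several. Exact enumeration grows exponentially with the number of sentences, which is why practical tools approximate it by sampling.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def toy_score(sentences: Sequence[str]) -> float:
    """Hypothetical stand-in for a trained scorer: counts feedback cue phrases."""
    cues = ("why", "because", "explain", "good thinking")
    text = " ".join(sentences).lower()
    return float(sum(text.count(cue) for cue in cues))


def sentence_shapley(sentences: List[str],
                     score: Callable[[Sequence[str]], float]) -> List[float]:
    """Exact Shapley value of each sentence; a sentence is 'absent' when deleted."""
    n = len(sentences)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [sentences[j] for j in sorted(coalition)]
                with_i = [sentences[j] for j in sorted(coalition + (i,))]
                # Marginal contribution of sentence i to this coalition
                values[i] += weight * (score(with_i) - score(without_i))
    return values


# Usage on a hypothetical three-sentence teacher turn
segment = [
    "Okay, let's look at problem three.",
    "Why do you think the answer is twelve?",
    "Good thinking, explain how you got there.",
]
phi = sentence_shapley(segment, toy_score)  # ≈ [0.0, 1.0, 2.0]
```

Because Shapley values are efficient, the attributions sum to the full-segment score minus the empty-segment score, which gives a quick sanity check on any implementation.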

Why does this matter? If AI scoring models are to be trusted in high-stakes educational settings, their interpretability isn't just a nice-to-have; it's essential. If these models can't explain their scores in a way that's useful and reliable, what's the point?

Cross-Model Insights

Further analysis showed that SHAP attributions held up across different model architectures. LLM rationales, by contrast, were inconsistent and of limited influence. This matters because it suggests SHAP could provide more reliable explanations for rubric-based scoring, ones that carry over from one model to the next.
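Part of why Shapley-based explanations can travel across architectures is that they treat the scorer as a black box: all they need is a function from input to score. The sketch below illustrates this under stated assumptions. It computes exact Shapley values through the equivalent permutation definition (average marginal gain over all sentence orderings), runs the same code on two hypothetical scorers with different internals, and compares their top-ranked sentences with a Jaccard overlap. The scorers, the segment, and the overlap metric are illustrative choices, not the study's models or its cross-model measure.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence, Set

Scorer = Callable[[Sequence[str]], float]


def shapley_by_permutation(sentences: List[str], score: Scorer) -> List[float]:
    """Exact Shapley values: average marginal gain of each sentence over all orderings."""
    n = len(sentences)
    totals = [0.0] * n
    for order in permutations(range(n)):
        present: List[int] = []
        before = score([])
        for i in order:
            present.append(i)
            after = score([sentences[j] for j in sorted(present)])
            totals[i] += after - before  # gain from adding sentence i at this point
            before = after
    return [t / factorial(n) for t in totals]


def top_k(values: List[float], k: int) -> Set[int]:
    """Indices of the k sentences with the largest attributions."""
    return set(sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k])


# Two hypothetical scorers with different internals (stand-ins for different architectures)
def keyword_scorer(sentences: Sequence[str]) -> float:
    text = " ".join(sentences).lower()
    return float(sum(text.count(c) for c in ("why", "explain", "because")))


def question_length_scorer(sentences: Sequence[str]) -> float:
    # Rewards questions, plus a small bonus for longer sentences
    return sum((2.0 * s.count("?") + 0.05 * len(s.split()) for s in sentences), 0.0)


segment = [
    "Okay, open your books.",
    "Why does this pattern hold?",
    "Can you explain your reasoning because I want to follow it?",
    "Next slide.",
]
keyword_phi = shapley_by_permutation(segment, keyword_scorer)
question_phi = shapley_by_permutation(segment, question_length_scorer)
a, b = top_k(keyword_phi, 2), top_k(question_phi, 2)
agreement = len(a & b) / len(a | b)  # Jaccard overlap of the top-2 sentences
```

The explainer never inspects either scorer's internals, so swapping in a different architecture only means passing a different function.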

So, if you're designing AI for educational assessments, consider this: bolting a model onto rented GPUs isn't a strategy. You need explanations substantive enough to withstand scrutiny. SHAP offers an approach that not only explains predictions but also transfers well across models. The opportunity is real. Many projects, however, fall short of it.

Why Educators Should Care

For educators, this isn't just technical mumbo-jumbo. It's about trust and transparency. Can these AI models provide explanations that teachers can actually use to improve educational outcomes? The proposed framework suggests they can, but only if we choose the right tools.

In the end, it comes down to what these models produce and whether we can interpret it. Show me the explanations. Then we'll talk. Education isn't just a market; it's a trust-based system. If AI wants a seat at this table, it had better bring some clarity.