The Hidden Costs of Per-Token Billing in AI Models

By Rina Shimizu, May 29, 2026

Per-token billing for large language models is difficult to audit, leading to potential overcharges. Companies must find ways to ensure accurate billing.

Per-token billing has become the standard for pricing commercial large language models (LLMs), but it is fraught with transparency problems. Auditing the honesty of token counts is inherently difficult, because providers conceal the model, tokenizer, and execution details to safeguard their intellectual property. This creates a trust paradox: audits rely on the very data that providers are motivated to manipulate.

The Trust Paradox

Auditing frameworks aimed at verifying token counts often fall short. What the English-language press has missed is that the audit boils down to checking the internal consistency of the provider's own reports, which the provider can easily skew. Recent studies indicate that providers with standard commercial capabilities can inflate billed token counts significantly. In some scenarios, hidden reasoning usage can be inflated by 1,469%, turning a legitimate $100 bill into approximately $1,569 for the same query.
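The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch, assuming the whole bill scales linearly with the reported token count; the function name `inflated_bill` is hypothetical and not taken from any provider's API.

```python
def inflated_bill(honest_bill: float, inflation_pct: float) -> float:
    """Return the bill after reported usage is inflated by inflation_pct percent.

    Assumption: the charge is directly proportional to the reported token count.
    """
    return honest_bill * (1 + inflation_pct / 100)


# A 1,469% increase means the customer pays the original amount plus 14.69 times more.
print(round(inflated_bill(100.0, 1469.0), 2))
```

In other words, a 1,469% inflation multiplies the honest charge by 15.69, which is how $100 becomes roughly $1,569.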

The Ambiguity of Tokenization

Even when providers disclose the full reasoning string, tokenization ambiguity remains a concern. Reports indicate an average over-reporting of 50.85% that slips past detection thresholds. This is not a weakness of one particular auditor but a systemic problem with any audit that depends on evidence controlled by the provider. So how do we ensure honest billing?
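To see why a disclosed string does not pin down a single token count, consider a toy example. The sketch below assumes a tiny made-up vocabulary (`TOY_VOCAB`) and a hypothetical helper `segmentation_counts`; real tokenizers are far larger and usually have one canonical encoding, but many vocabularies still admit several valid token sequences that decode to the same text.

```python
def segmentation_counts(text: str, vocab: set[str]) -> tuple[int, int]:
    """Return (fewest, most) tokens over all ways to split text into vocab pieces."""
    inf = float("inf")
    n = len(text)
    fewest = [inf] * (n + 1)   # fewest[i]: minimum tokens covering text[:i]
    most = [-inf] * (n + 1)    # most[i]: maximum tokens covering text[:i]
    fewest[0] = most[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            # Extend any reachable prefix with one vocabulary piece.
            if fewest[start] != inf and text[start:end] in vocab:
                fewest[end] = min(fewest[end], fewest[start] + 1)
                most[end] = max(most[end], most[start] + 1)
    if fewest[n] == inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    return fewest[n], most[n]


# Toy vocabulary (an assumption for illustration only).
TOY_VOCAB = {"r", "e", "a", "s", "o", "n", "i", "g", "reason", "ing", "reasoning"}

low, high = segmentation_counts("reasoning", TOY_VOCAB)
print(low, high)  # 1 9
```

Under this toy vocabulary the same nine-character string could be billed as anywhere from one to nine tokens, and every one of those counts decodes to identical text. An auditor who sees only the string cannot tell which segmentation was actually used.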

Potential Solutions

Restoring transparent billing will necessitate more rigorous verification methods. Solutions could include trusted execution attestation, cryptographic proofs of inference, or independent third-party re-execution. These would tether reported token counts to evidence outside the provider's control.

The reported figures point in one direction. With token counts that can be inflated undetected, consumers may be paying exorbitant fees for what are essentially phantom tokens. It's time for the industry to address these auditing flaws and build trust with users. For companies relying on these models, the question isn't just about cost, but about fairness and transparency in AI transactions.
