Rethinking Majority Voting in LLMs: Why Propagational Proxy Voting Matters
Propagational Proxy Voting (PPV) challenges traditional majority voting in large language models, offering a consensus rule that outperforms it on MMLU-Pro. By drawing on signals that majority voting ignores, PPV provides a novel approach to unsupervised inference.
If you've ever sampled a model several times and kept the most common answer, you know that majority voting has been the go-to method for aggregating answers in large language model (LLM) inference. But what if I told you there's a smarter way to reach consensus, still without any supervision? Enter Propagational Proxy Voting (PPV), a new approach shaking up the scene.
The Nuts and Bolts of PPV
Think of it this way: traditional majority voting leaves valuable information on the table. PPV, on the other hand, taps into two signals that every sample carries: within-group letter entropy and between-group reasoning geometry. By doing so, PPV beats majority voting by 1.5 percentage points overall on the MMLU-Pro benchmark.
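To make the first of those signals concrete, here is a minimal sketch of the majority-vote baseline next to a letter-entropy measure. It assumes "letter entropy" means the Shannon entropy of the answer letters within one group of samples; the function names are hypothetical and are not taken from the PPV work.

```python
import math
from collections import Counter

def majority_vote(letters):
    """Baseline: return the most frequent answer letter.

    Ties go to the letter that appeared first in the input.
    """
    return Counter(letters).most_common(1)[0][0]

def letter_entropy(letters):
    """Shannon entropy (in bits) of the answer-letter distribution.

    Assumption: this stands in for the 'within-group letter entropy'
    signal. 0 means every sample agrees; higher means more disagreement.
    """
    counts = Counter(letters)
    total = len(letters)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two groups with the same majority letter but very different certainty.
confident = ["B", "B", "B", "B"]
uncertain = ["B", "B", "A", "C"]
print(majority_vote(confident), letter_entropy(confident))
print(majority_vote(uncertain), letter_entropy(uncertain))
```

Majority voting treats both groups identically, since each votes "B". The entropy value is the extra information it throws away.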
How does it work? PPV uses two levers per voter: 'WHEN' and 'WHOM.' 'WHEN' dictates how much weight a voter keeps on its own pick, and it is driven by letter entropy. 'WHOM' decides how the remaining weight gets split among peers, and it is driven by the cosine similarity between embeddings that have been centered per question. It's a dance of delegation that requires no gold labels or auxiliary training.
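The two levers can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the published PPV rule: the linear entropy-to-weight map, the `floor` on self-weight, the choice to delegate only to positively aligned peers, and the fixed number of propagation steps are all hypothetical choices made for this example, as is the name `ppv_sketch`.

```python
import math

def _centered_unit(vectors):
    """Subtract the per-question mean, then scale each vector to unit length."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    out = []
    for v in vectors:
        c = [v[d] - mean[d] for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in c)) or 1.0
        out.append([x / norm for x in c])
    return out

def ppv_sketch(picks, entropies, embeddings, max_entropy, floor=0.05, steps=50):
    """Toy delegation vote; returns (winning letter, weight per letter)."""
    n = len(picks)
    # WHEN (assumed linear map): low entropy -> keep more weight on own pick.
    keep = [min(1.0, max(floor, 1.0 - h / max_entropy)) for h in entropies]

    # WHOM (assumed): share the rest among peers in proportion to positive
    # cosine similarity of per-question-centered embeddings.
    unit = _centered_unit(embeddings)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [0.0 if j == i else
                max(0.0, sum(a * b for a, b in zip(unit[i], unit[j])))
                for j in range(n)]
        total = sum(sims)
        if total == 0.0:
            keep[i] = 1.0  # no aligned peer: the voter keeps everything
        else:
            W[i] = [s / total for s in sims]

    # Propagation: delegated weight can be delegated again by its receiver.
    mass, settled = [1.0] * n, [0.0] * n
    for _ in range(steps):
        passed = [0.0] * n
        for i in range(n):
            settled[i] += mass[i] * keep[i]
            outgoing = mass[i] * (1.0 - keep[i])
            for j in range(n):
                passed[j] += outgoing * W[i][j]
        mass = passed
    # Whatever is still in flight stays with its current holder.
    settled = [s + m for s, m in zip(settled, mass)]

    totals = {}
    for letter, w in zip(picks, settled):
        totals[letter] = totals.get(letter, 0.0) + w
    return max(totals, key=totals.get), totals

# Toy question: "A" leads 3-2, but its voters are unsure and scattered,
# and one of them sits right next to the confident "B" voters.
picks = ["B", "B", "A", "A", "A"]
entropies = [0.0, 0.0, 2.0, 2.0, 2.0]
embeddings = [(3.0, 0.0), (3.0, 0.0), (1.0, 0.0), (-3.5, 3.0), (-3.5, -3.0)]
winner, totals = ppv_sketch(picks, entropies, embeddings, max_entropy=2.0)
print(winner, totals)
```

In this toy input the uncertain "A" voter nearest the "B" cluster hands most of its weight to the two confident "B" voters, so "B" ends with roughly 2.95 of the 5 units of weight and wins despite being outnumbered. No labels are used anywhere, only the picks, the entropies, and the embeddings.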
Why Should You Care?
Here's why this matters for everyone, not just researchers. In a world drowning in data, finding better ways to aggregate information is essential. PPV overturns wrong majority decisions by recognizing geometrically coherent minority clusters. In one example, a 10-6 majority for the wrong letter got overturned because the minority was tighter, with a mean within-cluster cosine of +0.26 versus the majority's -0.02.
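The "tighter minority" comparison boils down to a mean pairwise cosine within each answer cluster. Below is a minimal sketch of that measurement, assuming embeddings are centered across all samples for the question before similarities are taken; `cluster_cohesion` is a hypothetical helper name, and the numbers in the usage lines are toy values, not the ones reported above.

```python
import math
from itertools import combinations

def cluster_cohesion(picks, embeddings):
    """Mean pairwise cosine within each answer-letter cluster.

    Embeddings are first centered on the per-question mean (assumption).
    Clusters with a single member have no pairs and map to None.
    """
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(v[d] for v in embeddings) / n for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in embeddings]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    # Group the centered vectors by the letter each sample picked.
    groups = {}
    for letter, vec in zip(picks, centered):
        groups.setdefault(letter, []).append(vec)

    result = {}
    for letter, vecs in groups.items():
        pairs = list(combinations(vecs, 2))
        result[letter] = (
            sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else None
        )
    return result

picks = ["B", "B", "A", "A", "A"]
embeddings = [(3.0, 0.0), (3.0, 0.0), (1.0, 0.0), (-3.5, 3.0), (-3.5, -3.0)]
print(cluster_cohesion(picks, embeddings))
```

A cluster whose score is clearly positive is pointing in a shared direction, while a score near zero or below suggests voters who landed on the same letter for unrelated reasons.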
The analogy I keep coming back to is trying to understand a symphony by listening to just one instrument. With PPV, you're not just getting the loudest voice but the whole orchestra's consensus.
Exploring the Limits
Of course, PPV isn't without its constraints. The researchers also explored delegation strategies that didn't pan out, narrowing the design space for unsupervised LLM aggregation. No ensemble of confidence modes within questions managed to close the oracle gap, a reminder that there are still boundaries to this innovation.
So, what does this mean for the future of AI? Honestly, it's a call to rethink how we approach consensus in unsupervised settings. Are we ready to let go of the majority rule and embrace a more nuanced understanding?