Decoding the Misleading Signals of Predictive Coding Networks
Predictive coding networks (PCNs) may not live up to their promise as richer energy-based measures. Recent findings indicate that their settled-energy margins offer no advantage over traditional softmax margins.
Predictive coding networks (PCNs) have long promised a richer understanding of energy dynamics compared to traditional softmax functions. Yet, recent analysis suggests this perceived advantage may be nothing more than an illusion.
The Reality Behind PCNs
PCNs employ a K-way energy probe where each class is fixed as a target, running inference until it settles. The settled energies are then compared across hypotheses. The allure of PCNs is that they seem to draw from a deeper well of information, as their energy readings factor in the entire generative chain.
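The K-way probe described above can be sketched on a toy two-layer network. This is a minimal illustration, not the paper's implementation: the weights, layer sizes, and relaxation schedule are all hypothetical, and the energy is a simple sum of squared prediction errors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network (hypothetical weights, purely for illustration).
W1 = rng.normal(scale=0.1, size=(16, 8))   # input -> hidden
W2 = rng.normal(scale=0.1, size=(8, 4))    # hidden -> output (K = 4 classes)

def energy(x, z, y):
    """Sum of squared prediction errors across both layers."""
    e1 = z - np.tanh(x @ W1)      # hidden-layer error
    e2 = y - np.tanh(z @ W2)      # output-layer error
    return 0.5 * (e1 @ e1 + e2 @ e2)

def settle(x, y, steps=200, lr=0.1):
    """Relax the latent z by gradient descent on the energy, with y clamped."""
    z = np.tanh(x @ W1)           # feedforward initialisation
    for _ in range(steps):
        e1 = z - np.tanh(x @ W1)
        pre = z @ W2
        e2 = y - np.tanh(pre)
        # dE/dz: local error minus back-propagated output error
        grad = e1 - (e2 * (1 - np.tanh(pre) ** 2)) @ W2.T
        z -= lr * grad
    return energy(x, z, y)

def kway_probe(x, K=4):
    """Clamp each class in turn; the lowest settled energy wins."""
    energies = [settle(x, np.eye(K)[k]) for k in range(K)]
    return int(np.argmin(energies)), energies

pred, energies = kway_probe(rng.normal(size=16))
```

The prediction is simply the hypothesis whose clamped inference settles at the lowest energy; the energy margin between the top two hypotheses is what the probe reads as confidence.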
However, that's where the catch lies. Under Pinchetti-style discriminative formulations, the supposed depth of the PCN's energy readings may be misleading. An approximate reduction shows that the K-way energy margin essentially boils down to a monotonic function of the log-softmax margin, plus a residual that isn't trained to track correctness.
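The reduction can be illustrated numerically. Assuming (as a simplification of the approximate reduction described above) that the settled energy for a clamped class k is the negative log-softmax of that class plus a class-independent residual, the energy margin between the top two hypotheses collapses to exactly the log-softmax margin. The logits and residual below are made-up numbers.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

# Hypothetical logits from a discriminative PCN's feedforward pass.
logits = np.array([2.0, 0.5, -1.0, 0.2])
ls = log_softmax(logits)

# Simplified reduction: with a one-hot target clamped on class k, the
# settled energy is E_k ~= -log softmax(logits)[k] + r, where r is a
# class-independent residual (placeholder value; it cancels in margins).
r = 0.37
E = -ls + r

# The K-way energy margin between the top two hypotheses equals the
# log-softmax margin: a monotonic (here, affine) transform of it.
top2 = np.argsort(E)[:2]
energy_margin = E[top2[1]] - E[top2[0]]
logsoftmax_margin = ls[top2[0]] - ls[top2[1]]
assert np.isclose(energy_margin, logsoftmax_margin)
```

Since the residual cancels when comparing hypotheses, the probe's ranking and margins carry no information beyond what the softmax output already provides.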
The Empirical Findings
Testing across six scenarios on CIFAR-10, using a 2.1M-parameter network and 1,280 test images, consistently showed PCN energy readings sitting below softmax. Whether through extended deterministic training, direct measurement of latent movement, or trajectory-integrated training, the gap held steady: AUROC_2 values varied by less than 10^-3 in deterministic evaluations.
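An AUROC-style comparison of this kind can be sketched as follows. The data here is entirely synthetic (the error labels and margin scores are sampled, not taken from the paper); the point is only to show the shape of the evaluation: score each test image by its margin, then ask how well a low margin flags misclassifications.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC: probability a positive outranks a negative."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels.astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(1)
# Synthetic stand-ins: 1 = misclassified image, 0 = correct.
errors = rng.integers(0, 2, size=1280)
# Hypothetical margins: misclassified items get lower margins on average,
# with the energy probe assumed slightly noisier than softmax.
softmax_margin = rng.normal(loc=2.0 - errors, scale=1.0, size=1280)
energy_margin = rng.normal(loc=2.0 - errors, scale=1.4, size=1280)

a_soft = auroc(-softmax_margin, errors)   # low margin should flag errors
a_energy = auroc(-energy_margin, errors)
```

Under this setup, both margins detect errors better than chance, and comparing the two AUROC values is exactly the kind of head-to-head the study ran at scale.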
This isn't about discarding PCNs altogether, but it does raise a critical question: Are we overestimating their role in energy measurement? PCNs, as they currently stand, might not be the silver bullet some hoped for.
Where Do We Go from Here?
The findings invite replication and further exploration. The decomposition doesn't apply in cases like bidirectional PC or non-cross-entropy energy formulations. There's room for productive structural probing that goes beyond what's been tested so far.
In AI research, it's easy to chase after the next big thing. But it's essential to question whether these new approaches genuinely offer improvement. The real value might lie in refining existing models, rather than wholly reinventing the wheel.
Key Terms Explained
Inference: Running a trained model to make predictions on new data.
Parameter: A value the model learns during training — specifically, the weights and biases in neural network layers.
Softmax: A function that converts a vector of numbers into a probability distribution — all values between 0 and 1 that sum to 1.
Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.