The True Expressive Power of Floating-Point Neural Networks

Most theories of neural network expressivity cling to the notion of exact real arithmetic. In practice, networks run on floating-point arithmetic, and it is essential to understand how this affects a network's expressivity. Recent research addresses this gap, moving beyond restricted activation functions and idealized conditions to account for real-world execution semantics.

Rethinking Expressivity

Traditional theories often assume a fixed order of operations or perfectly rounded activations, but these assumptions rarely hold true in practice. By studying the expressive capabilities of neural networks under more generalized floating-point semantics, including variable reduction orders and bounded errors in activations, we can unveil a more accurate picture.
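To see why reduction order matters, here is a minimal sketch in plain Python (IEEE 754 double precision). It is an illustration only, not taken from the research: the helper names and the four input terms are chosen for the example. It sums the same terms in every possible order and collects the distinct results.

```python
import itertools

def sum_in_order(values):
    """Accumulate left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

def distinct_sums(values):
    """Every result reachable by permuting the accumulation order."""
    return {sum_in_order(p) for p in itertools.permutations(values)}

# Illustrative terms: the exact real-number sum is 2.0.
terms = [1e16, 1.0, -1e16, 1.0]
results = distinct_sums(terms)
print(sorted(results))  # more than one value: the order changes the answer
```

Because floating-point addition is not associative, a neuron's pre-activation can depend on the order in which the hardware or library reduces the dot product, which is why a theory tied to one fixed order says little about other execution models.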

The question at the heart of this research is simple: when can a floating-point neural network represent any function between floating-point domains? The research finds that the first layer must be able to distinguish every pair of distinct inputs. This insight challenges existing theories and extends beyond isolated counterexamples previously noted as non-universal, such as the correctly rounded cosine activation.
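The distinguishability idea can be sketched with a toy first layer on scalar inputs. This is a hedged illustration under simple assumptions (double precision, ReLU, hand-picked weights and biases); the function names are hypothetical and not from the research. If the first layer maps two distinct inputs to the same outputs, no later layer can tell them apart, so the network cannot represent any function that assigns them different values.

```python
def relu(x):
    return max(x, 0.0)

def first_layer(x, weights, biases):
    """One hidden layer on a scalar input; w * x + b is rounded to double."""
    return tuple(relu(w * x + b) for w, b in zip(weights, biases))

def distinguishes(x1, x2, weights, biases):
    """True if at least one neuron gives different outputs for x1 and x2."""
    return first_layer(x1, weights, biases) != first_layer(x2, weights, biases)

x1 = 1.0
x2 = 1.0 + 2.0 ** -52  # the next double after 1.0

# Adding a bias of 1.0 rounds both pre-activations to 2.0: the inputs collapse.
collapsed = not distinguishes(x1, x2, weights=[1.0], biases=[1.0])

# With a zero bias the two inputs stay distinct after the first layer.
separated = distinguishes(x1, x2, weights=[1.0], biases=[0.0])
print(collapsed, separated)
```

The collapse in the first case comes purely from rounding in the affine step, not from the activation, which is the kind of effect that exact-real-arithmetic theories cannot see.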

A Practical Framework for Universal Representability

Introducing a general framework of distinguishability offers a dual advantage. It's not just necessary for universal representation, but under certain conditions, it's sufficient too. This paves the way for universal representability across a range of activation functions that are practically implemented, from Sigmoid to ReLU, under more realistic execution models.

Why does this matter? Because the results cover the activation functions people actually use. Models with practical activations such as ELU, SeLU, GeLU, Swish, and Mish are shown to achieve universal representability. This could mark a turning point in how neural networks are designed and implemented, moving away from unrealistic assumptions that hinder practical applications.
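For reference, the activations named above can be written down using their standard textbook definitions. The sketch below uses only Python's math module; it is not the implementation studied in the research, and real libraries may use different approximations (which is where a bounded-error assumption on activations would come in). The SELU constants are the commonly published values.

```python
import math

def sigmoid(x):
    # Split on sign so math.exp never overflows.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def softplus(x):
    # Overflow-safe form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def elu(x, alpha=1.0):
    return x if x > 0.0 else alpha * math.expm1(x)

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * elu(x, SELU_ALPHA)

def gelu(x):
    # Exact (erf-based) form, not the tanh approximation.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x):
    return x * sigmoid(x)

def mish(x):
    return x * math.tanh(softplus(x))

# In double precision the sigmoid saturates: distinct inputs, identical outputs.
saturated = sigmoid(40.0) == sigmoid(50.0)
print(saturated)
```

The saturation check at the end shows that even a mathematically injective function can merge distinct floating-point inputs once it is evaluated in finite precision, so claims about these activations have to be made at the level of their floating-point behavior.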

Implications for the Industry

The significance of this research extends beyond academia. For developers and machine learning engineers, the findings highlight the necessity of considering floating-point semantics in model design. As the field progresses, it becomes increasingly evident that ignoring these details could lead to suboptimal architectures.

Western coverage has largely overlooked this, perhaps due to the technical complexity involved. Yet in a world where AI models are growing exponentially in parameter count, understanding the nuances of floating-point arithmetic isn't just a technical detail; it's a necessity. How many more innovations are slipping through the cracks while the focus remains on idealized models?
