Cracking the Code: Why Sparse Computing Strategies Are Failing on GPUs
Sparse computation strategies for spiking neural networks fall short on GPUs. Temporal Aggregated Convolution (TAC) offers a solution, but it's complex. Does it hold the key to efficient AI?
Sparse computing strategies have been heralded as the secret weapon for spiking neural network (SNN) efficiency on GPUs, yet reality proves otherwise. Across five different sparse computation strategies tested on an Apple M3 Max GPU, none outperformed traditional dense convolution. Why? These GPU architectures, with their SIMD execution model, struggle with the fine-grained, unstructured sparsity inherent in i.i.d. binary spikes.
The TAC Innovation
Enter Temporal Aggregated Convolution (TAC), a new strategy that leverages the linearity of convolution. By aggregating $K$ spike frames before a single convolution call, TAC reduces the number of convolution calls from $T$ to $T/K$. On rate-coded datasets like MNIST and Fashion-MNIST, TAC isn't just faster, it's more accurate, delivering a 13.8x speedup with a 1.6% accuracy boost on MNIST and a 5.4% increase on Fashion-MNIST.
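The core trick rests on linearity: convolving the sum of $K$ binary spike frames gives the same result as summing $K$ separate convolutions. A minimal 1-D sketch in pure Python, where the `conv1d` helper, the kernel, and the toy spike data are illustrative assumptions rather than the authors' implementation:

```python
# Sketch of Temporal Aggregated Convolution (TAC) on 1-D spike trains.
import random

def conv1d(x, w):
    """Valid-mode 1-D convolution of signal x with kernel w."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]

random.seed(0)
T, K = 8, 4                        # T spike frames, aggregated in groups of K
kernel = [0.5, -1.0, 0.25]
frames = [[random.randint(0, 1) for _ in range(10)] for _ in range(T)]

# Dense baseline: one convolution call per frame (T calls in total).
dense = [conv1d(f, kernel) for f in frames]

# TAC: sum each group of K binary frames first, then convolve once,
# cutting the call count from T down to T/K.
tac = []
for g in range(0, T, K):
    agg = [sum(col) for col in zip(*frames[g:g + K])]  # aggregate K frames
    tac.append(conv1d(agg, kernel))                    # single conv per group

# Linearity check: each group's TAC output equals the sum of that
# group's per-frame dense outputs.
for g in range(T // K):
    per_frame_sum = [sum(col) for col in zip(*dense[g * K:(g + 1) * K])]
    assert all(abs(a - b) < 1e-9 for a, b in zip(tac[g], per_frame_sum))
```

The equality holds exactly because convolution is linear; the speedup comes from replacing $T$ kernel launches with $T/K$, not from changing what is computed.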
The Event-Based Data Challenge
However, when we shift our gaze to event-based datasets where temporal information is essential, TAC's approach falters. The simplification of temporal data that works for rate-coded inputs becomes a liability. In response, TAC-TP (Temporal Preservation) was developed. It retains full temporal resolution by sharing each group's convolution output across K independent LIF steps. On the DVS128-Gesture task, TAC-TP reached 95.1% accuracy compared to the baseline 96.3%, with half the convolution calls. This is a significant improvement over standard TAC's drop to 91.3%.
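One way to picture TAC-TP: the convolution is still called once per group of $K$ frames, but its output feeds $K$ separate LIF updates, so the neuron dynamics keep all $T$ time steps. The sketch below is a hedged illustration; the LIF model, its constants, and the toy data are assumptions, not the paper's exact formulation:

```python
# Sketch of TAC-TP: one convolution call per group of K frames, with the
# shared output driving K LIF membrane updates (full temporal resolution).
import random

def conv1d(x, w):
    """Valid-mode 1-D convolution of signal x with kernel w."""
    n = len(x) - len(w) + 1
    return [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n)]

def lif_step(v, inp, tau=2.0, v_th=1.0):
    """One leaky integrate-and-fire update; returns (potential, spikes)."""
    v = [vi + (xi - vi) / tau for vi, xi in zip(v, inp)]
    spikes = [1.0 if vi >= v_th else 0.0 for vi in v]
    v = [0.0 if s else vi for vi, s in zip(v, spikes)]  # hard reset on spike
    return v, spikes

random.seed(1)
T, K = 8, 4
kernel = [0.5, 0.25, 0.25]
frames = [[random.randint(0, 1) for _ in range(10)] for _ in range(T)]

conv_calls = 0
v = [0.0] * (10 - len(kernel) + 1)
spike_record = []
for g in range(0, T, K):
    agg = [sum(col) / K for col in zip(*frames[g:g + K])]  # group average
    shared = conv1d(agg, kernel)   # one convolution for the whole group
    conv_calls += 1
    for _ in range(K):             # K LIF steps reuse the shared output
        v, s = lif_step(v, shared)
        spike_record.append(s)

assert conv_calls == T // K        # convolution calls reduced K-fold
assert len(spike_record) == T      # but all T temporal steps survive
```

The design choice is the trade: the convolution sees a temporally blurred input per group, but the spiking dynamics still unfold over every time step, which is what the motion-sensitive event data needs.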
Data Dependency: A Double-Edged Sword
The lesson? The optimal strategy depends heavily on the data type. Rate-coded data thrives with reduced temporal dimensions, which acts like noise suppression. Meanwhile, event data demands full temporal preservation to maintain vital motion cues. If the strategy isn't tailored to the data, performance suffers. It's time to abandon the one-size-fits-all mindset.
TAC's speedup isn't chained to a specific GPU architecture. Tests on an NVIDIA V100 demonstrated an 11x speedup, proving the approach travels across platforms. This isn't just a convenient optimization; it's a shift in how we match sparse spiking workloads to the compute resources we actually have.
Given these findings, a critical question emerges: how many more AI advancements remain trapped by unsuitable hardware strategies? The need for adaptable, data-sensitive solutions is evident. TAC's journey isn't just about faster computations; it's about aligning our technological tools with the demands of diverse, complex data types, and TAC is an early piece of that groundwork.