The Complex Art of Forgetting: AI Models and Their Memory Quirks
AI models forget different data points during fine-tuning, revealing architectural quirks. These findings could reshape how we approach training techniques.
In AI, understanding what models forget as they learn isn't just an academic exercise. It has real implications for how we design training regimens and optimize performance. Recent findings shed light on the complex forgetting patterns of AI models, and the results may surprise you.
Architectures That Forget Differently
When it comes to forgetting, not all AI models are created equal. Take ResNet-18 and DeiT-Small, for instance. These architectures process and, importantly, forget information in distinct ways. Research on their performance with a retinal OCT dataset and a bird species dataset (CUB-200-2011) has shown that the overlap of forgotten samples between these two architectures is quite low: Jaccard overlap scores were a mere 0.34 and 0.15 on the respective datasets. Clearly, different architectures forget fundamentally different samples.
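The Jaccard overlap used here is simply the size of the intersection of the two forgotten-sample sets divided by the size of their union. A minimal sketch (the sample IDs below are made up for illustration):

```python
def jaccard_overlap(forgotten_a: set, forgotten_b: set) -> float:
    """Jaccard similarity between two sets of forgotten sample IDs."""
    if not forgotten_a and not forgotten_b:
        return 1.0  # both empty: identical by convention
    return len(forgotten_a & forgotten_b) / len(forgotten_a | forgotten_b)

# Hypothetical forgotten-sample IDs from two architectures
resnet_forgot = {3, 7, 12, 19, 42, 57}
deit_forgot = {7, 19, 88, 101, 202, 330}

print(jaccard_overlap(resnet_forgot, deit_forgot))  # 2 shared / 10 unique -> 0.2
```

A score near 0 means the two models stumble on almost entirely different examples, which is exactly what the 0.15 result suggests.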
Structured Versus Stochastic Forgetting
It's intriguing to note that Vision Transformers (ViTs) like DeiT-Small forget in a more structured manner than CNNs like ResNet-18. With a mean R-squared value of 0.74 for ViTs versus 0.52 for CNNs, the predictability of forgetting in ViTs is notably higher. But here's the kicker: across training runs with different random seeds, per-sample forgetting is largely stochastic. The correlation between which samples are forgotten in different runs is almost nonexistent, with Spearman's rho hovering around 0.01.
The Nature of Sample Difficulty
We often assume that if a model repeatedly forgets specific samples, those samples must be inherently 'difficult'. Yet, the stochastic nature of forgetting challenges this assumption. If sample difficulty isn't intrinsic, what factors are truly at play? Could it be the dataset balance, the architecture, or something more elusive?
Implications for Curriculum Design
Forgetfulness patterns extend beyond individual samples to class-level data. Visually similar species are consistently forgotten more than distinctive ones, suggesting a semantic dimension to forgetting. This insight could guide curriculum design or data pruning, though there's a catch. Even when samples were ordered by difficulty using their loss after initial training, the fitted decay constants of their retention offered little predictive power. Static scheduling methods, like spaced repetition based solely on these constants, fail to outperform random sampling.
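To make the decay-constant idea concrete: one common way to get such a constant is to model a sample's retention as exponential decay, R(t) ≈ exp(-t / tau), and fit tau by log-linear least squares. This is an illustrative sketch of that modeling assumption, not the specific procedure from the research:

```python
import math

def fit_decay_constant(epochs, retention):
    """Fit tau in R(t) ~= exp(-t / tau) via least squares on log R.

    Assumes retention values lie in (0, 1]; log R is then linear in t
    with slope -1/tau.
    """
    ys = [math.log(r) for r in retention]
    n = len(epochs)
    mx, my = sum(epochs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(epochs, ys)) / sum(
        (x - mx) ** 2 for x in epochs
    )
    return -1.0 / slope

# Synthetic retention curve with a true tau of 5 epochs
ts = [1, 2, 3, 4, 5, 6]
rs = [math.exp(-t / 5) for t in ts]
print(fit_decay_constant(ts, rs))  # recovers ~5.0
```

The finding above is the sobering part: scheduling reviews of samples purely from constants like these, fixed after initial training, did no better than sampling at random.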
Why It Matters
If different architectures forget largely disjoint sets of samples, ensembles that combine them can cover each other's blind spots. Understanding the quirks of forgetting lets us exploit that architectural diversity, and might just lead to more reliable AI systems. But here's the big question: are we at a point where training strategies need a significant rethink? The data suggests that sticking with static methods may be limiting our potential.
As AI continues its relentless march forward, understanding how these models forget might just be the key to unlocking their full potential.
Key Terms Explained
Compute: The processing power needed to train and run AI models.
Fine-tuning: The process of taking a pre-trained model and continuing to train it on a smaller, specific dataset to adapt it for a particular task or domain.
Sampling: The process of selecting the next token from the model's predicted probability distribution during text generation.
Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.