The Empathy Paradox: Why Users Lament the Loss of GPT-4o

OpenAI's newer models are sparking debates over empathy. A recent study suggests the 'lost empathy' users perceive may actually be a shift in crisis-response behavior.
In early 2026, when OpenAI decided to retire GPT-4o, a wave of protest erupted under the banner #keep4o. Thousands of users claimed that the subsequent models, including GPT-5-mini, had 'lost their empathy.' But what does that really mean?
What's Really Going On?
To get to the bottom of this, researchers conducted a fascinating study, putting three generations of OpenAI models to the test in emotionally charged scenarios. We're talking about 2,100 AI responses across 14 tough conversational setups, all scored on six dimensions of psychological safety.
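As a back-of-envelope check, 2,100 responses across three models and 14 scenarios works out to 50 responses per model per scenario. Here's a minimal Python sketch of how such a dataset might be organized; the rubric dimension names below are hypothetical, since the article doesn't list the study's actual six:

```python
from dataclasses import dataclass, field

# Hypothetical rubric dimensions -- the study's actual six are not named here.
DIMENSIONS = [
    "empathy", "crisis_detection", "resource_referral",
    "non_judgment", "boundary_setting", "advice_restraint",
]

@dataclass
class ScoredResponse:
    model: str       # e.g. "gpt-4o", "o4-mini", "gpt-5-mini"
    scenario: str    # one of the 14 conversational setups
    turn: int        # position within the conversation
    scores: dict[str, float] = field(default_factory=dict)  # dimension -> 0-10 rating

# 3 models x 14 scenarios x 50 responses each would account for the 2,100 total.
```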
Here's the kicker: the empathy scores across GPT-4o, o4-mini, and GPT-5-mini were statistically indistinguishable. So what changed? The safety protocols.
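The article doesn't say which test the researchers ran, but 'statistically indistinguishable' usually means a test such as a one-way ANOVA failed to find a difference in means. A sketch with synthetic scores (assuming 0-10 ratings and 700 responses per model) illustrates the logic:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)

# Synthetic stand-ins drawn from the same distribution, mimicking
# three models whose empathy ratings genuinely don't differ.
gpt4o     = rng.normal(loc=7.2, scale=1.1, size=700)
o4_mini   = rng.normal(loc=7.2, scale=1.1, size=700)
gpt5_mini = rng.normal(loc=7.2, scale=1.1, size=700)

stat, p = f_oneway(gpt4o, o4_mini, gpt5_mini)
print(f"F = {stat:.2f}, p = {p:.3f}")
# With identical underlying means, p will usually be large (> 0.05),
# i.e. no detectable difference in empathy scores across the models.
```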
A Shift in Crisis Response
Think of it this way: the newer models are better at detecting crises. The data shows that crisis detection improved steadily from GPT-4o to GPT-5-mini. In one scenario involving self-harm and a minor, GPT-5-mini never scored below 7.8, while GPT-4o dipped as low as 3.6 early in the conversation.
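A simple way to surface that kind of gap is to compare each model's per-turn score floor on the crisis dimension. The records below are illustrative stand-ins, not the study's data:

```python
from collections import defaultdict

# (model, conversation turn, crisis-response score) -- illustrative values only.
records = [
    ("gpt-4o",     1, 3.6), ("gpt-4o",     2, 5.0), ("gpt-4o",     3, 7.1),
    ("gpt-5-mini", 1, 7.8), ("gpt-5-mini", 2, 8.2), ("gpt-5-mini", 3, 8.5),
]

floors = defaultdict(lambda: float("inf"))
for model, _turn, score in records:
    floors[model] = min(floors[model], score)

for model, floor in floors.items():
    print(f"{model}: lowest crisis score = {floor}")
# gpt-5-mini's floor stays at 7.8, while gpt-4o dips to 3.6 early on.
```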
But here's the thing: while they're more alert to crises, they're also sometimes too chatty, especially when offering advice. That trade-off is important. It leaves vulnerable users in a precarious position, caught between a model that misses cues and one that might respond with potentially overwhelming advice.
Why You Should Care
If you've ever trained a model, you know how subtle shifts in parameters can ripple through outcomes. The change users perceive as 'lost empathy' might not be about empathy at all but about a recalibrated safety feature.
So, why should you care? Because this isn't just tech jargon. It's about how AI interacts with us on a deeply personal level. If AI is going to be a companion in mental health spaces, getting this balance right is vital. Should developers prioritize crisis detection even if it risks overwhelming users with advice?
The analogy I keep coming back to is a tightrope walk. One misstep in either direction, too cautious or too forthright, and the consequences can be significant. As users and developers, we need to have these conversations now, not after the fact.