Why Human Data is the Key to Personalizing AI
AI personalization has been evaluated using synthetic data, but human data tells a different story. Discover the real-world limitations and potential solutions.
In the race to personalize AI, the reliance on synthetic data is leading many researchers astray. A recent study built on 550 real human conversations exposes stark differences between how AI systems perform on synthetic data and how they perform on human data. Western coverage has largely overlooked the study, yet its details are essential for understanding what AI personalization can and cannot do today.
The Problem with Synthetic Data
Many evaluations of large language models (LLMs) measure personalization on synthetic datasets. That approach may not show how well current systems serve actual users. The study examined three stages of personalization: extracting user attributes from past conversations, pairing those attributes with new prompts, and generating a personalized response. It collected 5,949 judgments on attribute extraction, 11,919 on pairing, and 1,101 on response quality.
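To make the three stages concrete, here is a toy sketch of how data might flow through such a pipeline. Everything in it is hypothetical: the article does not describe the study's implementation, the cue tables `ATTRIBUTE_CUES` and `RELEVANCE` are invented for illustration, and a real system would call an LLM at each stage rather than match keywords.

```python
from dataclasses import dataclass, field

# Hypothetical toy pipeline; keyword rules stand in for LLM calls.

@dataclass
class UserProfile:
    attributes: list[str] = field(default_factory=list)

# Hypothetical cue table: phrase in a user turn -> attribute it implies.
ATTRIBUTE_CUES = {
    "vegetarian": "diet: vegetarian",
    "my kids": "has children",
    "tokyo": "location: Tokyo",
}

# Hypothetical relevance map: prompt keyword -> attribute prefixes that matter.
RELEVANCE = {
    "recipe": ("diet:",),
    "weekend": ("location:", "has children"),
}

def extract_attributes(conversation: list[str]) -> UserProfile:
    """Stage 1 (attribute extraction): collect attributes from past turns."""
    found: list[str] = []
    for turn in conversation:
        lowered = turn.lower()
        for cue, attribute in ATTRIBUTE_CUES.items():
            if cue in lowered and attribute not in found:
                found.append(attribute)
    return UserProfile(found)

def pair_attributes(profile: UserProfile, prompt: str) -> list[str]:
    """Stage 2 (pairing): keep only attributes relevant to the new prompt."""
    lowered = prompt.lower()
    prefixes = tuple(
        p for keyword, group in RELEVANCE.items() if keyword in lowered for p in group
    )
    # str.startswith with an empty tuple returns False, so no match -> no attributes.
    return [a for a in profile.attributes if a.startswith(prefixes)]

def generate_response(prompt: str, attributes: list[str]) -> str:
    """Stage 3 (response generation): condition the answer on paired attributes."""
    if not attributes:
        return f"Generic answer to: {prompt}"
    return f"Answer to: {prompt} (tailored for {', '.join(attributes)})"

# Usage with a made-up conversation history.
history = ["I'm vegetarian these days.", "We live in Tokyo with my kids."]
profile = extract_attributes(history)
prompt = "Suggest a quick dinner recipe"
attrs = pair_attributes(profile, prompt)
print(generate_response(prompt, attrs))
```

Each stage produces an output a human can judge on its own, which is why the study could collect separate judgments for extraction, pairing, and response quality.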
The paper, published in Japanese, reveals a significant performance gap when models move from synthetic to human data. In particular, models struggle to extract relevant user attributes from human conversations, and their judgments frequently disagree with those of human annotators. This mismatch suggests a fundamental flaw in how current AI systems are trained and evaluated.
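The article does not say which agreement statistic the study reports. As an illustration only, the sketch below uses Cohen's kappa, a common chance-corrected agreement measure, to compare an LLM judge with human annotators on yes/no labels (for example, "is this extracted attribute correct?"). The labels are made up.

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labeled at random with their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa is undefined
    return (observed - expected) / (1 - expected)

# Hypothetical labels for ten extracted attributes.
human_labels = ["yes", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"]
llm_labels = ["yes", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes", "no"]

raw = sum(h == m for h, m in zip(human_labels, llm_labels)) / len(human_labels)
print(f"raw agreement: {raw:.2f}")
print(f"Cohen's kappa: {cohens_kappa(human_labels, llm_labels):.2f}")
```

In this made-up example the raters agree on 70% of items, but kappa is only about 0.44: the hypothetical LLM judge says "yes" far more often than the humans do, so much of the raw agreement is what chance alone would produce.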
Learning from Real Conversations
Human conversations carry much richer context and complexity than synthetic data. When asked to generate personalized responses, LLMs often produce outputs that humans judge to be no better than generic ones. Surprisingly, LLMs themselves rate these personalized responses as superior. The disconnect points to one conclusion: AI's current notion of personalization is not aligned with how humans perceive it.
The study introduced two lightweight interventions to narrow this gap, bringing automated evaluation closer to human judgments for the first two stages, attribute extraction and pairing. The third stage, response generation, remains problematic: learned reward models show only a modest correlation with human ratings, so aligning automated evaluation with human judgment is still an open challenge.
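The article reports only a "modest correlation" between learned reward models and human ratings, without naming the statistic. A common way to compare a model's scores with ordinal human ratings is Spearman's rank correlation, sketched below in dependency-free Python under that assumption; the ratings and scores are hypothetical.

```python
import math

def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least two items")
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: 1-5 human ratings and a reward model's scores for six responses.
human_ratings = [1, 2, 2, 3, 4, 5]
reward_scores = [0.1, 0.5, 0.2, 0.3, 0.9, 0.4]
print(f"Spearman rho: {spearman(reward_scores, human_ratings):.2f}")
```

Rank correlation only asks whether the reward model orders responses the way humans do, which suits coarse rating scales; a value well below 1 means the model often ranks responses differently from human raters.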
The Path Forward
This study's data lays the groundwork for better AI personalization models. It's a call to action for researchers to focus more on human data for training and evaluation. Personalized AI should reflect actual user experiences, not just synthetic benchmarks.
So, what does this mean for the future of AI? If we continue to rely on synthetic data, we're setting AI systems up to fail real users. The industry needs to pivot towards using human data, despite the challenges it brings. This shift could usher in an era where AI truly understands and responds to human nuances. What the English-language press missed is the urgency of this transition. If AI is to meet its potential, personalization must be based on how people actually interact, not how we expect them to.
Key Terms Explained
Benchmark: A standardized test used to measure and compare AI model performance.
Evaluation: The process of measuring how well an AI model performs on its intended task.
Synthetic data: Artificially generated data used for training or evaluating AI models.
Training: The process of teaching an AI model by exposing it to data and adjusting its parameters to minimize errors.