LLMs Stumble in Office Automation: Are We Expecting Too Much?
Large Language Models aren't ready for the big leagues in office software automation. Despite some progress, they can't match human precision.
Large Language Models (LLMs) are speeding up automation tasks everywhere, but when it comes to navigating complex office software, these AI agents are still learning to crawl. If you thought your AI assistant was ready to handle your Excel and PowerPoint tasks, think again.
Office Automation: The Ultimate Test?
Office automation is more than just typing faster. It's about planning over the long haul, making precise tweaks, and getting different apps to talk to each other. It’s a playground where LLMs could shine, or trip up. To put these models through their paces, researchers rolled out an evaluation based on China's National Computer Rank Examination (NCRE). That's 200 practical tasks across Word, Excel, and PowerPoint scored on a 100-point scale. For context, they used 7,118 machine-gradable criteria. That’s a lot of hoops to jump through.
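To make "machine-gradable criteria" concrete, here is a minimal sketch of how one task might be scored against a checklist and normalized to a 100-point scale. It is an illustration only, not the researchers' grader: the `Criterion` class, the `score_task` function, the point weights, and the dictionary standing in for a finished document are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Criterion:
    """One machine-gradable check (hypothetical structure)."""
    description: str
    points: float
    check: Callable[[Dict[str, object]], bool]


def score_task(document: Dict[str, object], criteria: List[Criterion]) -> float:
    """Return the share of points earned, scaled to 0-100."""
    total = sum(c.points for c in criteria)
    if total == 0:
        return 0.0
    earned = sum(c.points for c in criteria if c.check(document))
    return 100.0 * earned / total


# A toy stand-in for the state of a finished Word file (assumed fields).
document = {"title_font_size": 24, "has_table": True, "row_count": 5}

criteria = [
    Criterion("Title is 24 pt", 40, lambda d: d["title_font_size"] == 24),
    Criterion("A table exists", 30, lambda d: d["has_table"] is True),
    Criterion("Table has 6 rows", 30, lambda d: d["row_count"] == 6),
]

task_score = score_task(document, criteria)
print(task_score)  # 70.0: two of three checks pass, worth 70 of 100 points
```

The point of the sketch is that a near-miss still costs points: the table exists, but one wrong row count drops the task to 70.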
LLMs: Struggling with the Basics
Seven leading-edge LLMs were tested. The best of the single-turn models scored just 36.6%. That’s not even close to making the honor roll. Give these models some smarts like execution feedback and iterative repair, and they climb to 68.8%. Unfortunately, that is still well short of the 95.5% that would match community standards. Why should you care? Because these numbers show how far we are from letting AI handle professional office work.
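What do "execution feedback and iterative repair" look like in practice? Roughly: run what the model wrote, hand any error message back, and let it try again. The sketch below shows that loop under stated assumptions. The names `run_with_repair`, `execute`, and `fake_model` are hypothetical, the "model" is a stub that fixes its mistake once it sees an error, and none of this is the evaluation's actual harness.

```python
from typing import Callable, Optional, Tuple


def execute(script: str) -> Tuple[bool, str]:
    """Run a Python snippet and report success or the error text."""
    try:
        exec(script, {})
        return True, ""
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def run_with_repair(
    generate: Callable[[Optional[str]], str],
    run: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for a script, run it, and feed errors back until it works."""
    feedback: Optional[str] = None
    for attempt in range(1, max_rounds + 1):
        script = generate(feedback)
        ok, message = run(script)
        if ok:
            return script, attempt
        feedback = message  # the next attempt sees what went wrong
    return None, max_rounds


def fake_model(feedback: Optional[str]) -> str:
    """Stub standing in for an LLM: wrong first, fixed after feedback."""
    if feedback is None:
        return "total = 1 / 0"
    return "total = 1 / 1"


script, attempts = run_with_repair(fake_model, execute)
print(script, attempts)  # total = 1 / 1 2
```

A single-turn setup is the same loop with `max_rounds=1`: the first broken attempt is the final answer, which is one plausible reason the single-turn scores sit so much lower.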
Are We Demanding the Impossible?
Let’s face it. We've made huge strides in code generation. But on fine-grained tasks like office automation, we’ve still got miles to go. Why is this hard? Office work requires context and nuance, things that AI's speed alone can't make up for yet. If you're holding out for an AI assistant to manage your office tasks without hiccups, don't hold your breath just yet.
This isn't to bash LLMs. They have their strengths. But if you’re thinking they can replace a trained professional in automating intricate office tasks, you’ve got some waiting to do. The dream is real, but the reality? Not quite there.