BeyondSWE: A New Benchmark for Code Agents
BeyondSWE is a new benchmark that challenges code agents with tasks beyond single-repository fixes. Current scores reveal significant gaps, a sign that agents still have a long way to go on this kind of work.