A new study reveals a subtle but consequential gap in how AI coding assistants work: they're good at navigating to the right file in a codebase, but they frequently miss the specific lines of code that actually need to be changed.

According to The Decoder, the research introduces SWE-Explore, described as the first benchmark designed to evaluate code search as a step separate from code repair. That distinction matters: previous evaluations typically judged AI agents on whether they produced a working fix, not on how well they located the problem in the first place.

According to The Decoder, tools like Claude Code and Codex perform reasonably well at file-level navigation: given a bug report or task description, they can usually identify which file is relevant. The trouble starts at a finer grain, when they have to pinpoint the exact lines within that file where the problem lives. The study found that agents miss most of those critical lines.
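The gap between file-level and line-level localization can be made concrete with a toy scoring sketch. The function names, the example file path, the line numbers, and both scoring rules below are hypothetical illustrations chosen for clarity. They are not SWE-Explore's published metrics, which the article does not describe.

```python
# Hypothetical sketch: scoring one localization attempt at two granularities.
# Names, data, and scoring rules are illustrative assumptions, not SWE-Explore's metrics.

# Each mapping is file path -> set of line numbers.
Locations = dict[str, set[int]]


def file_hit(predicted: Locations, gold: Locations) -> bool:
    """True if the agent flagged at least one file that contains a needed edit."""
    return any(path in gold for path in predicted)


def line_recall(predicted: Locations, gold: Locations) -> float:
    """Fraction of the gold (file, line) pairs that the agent also flagged."""
    gold_pairs = {(path, n) for path, lines in gold.items() for n in lines}
    pred_pairs = {(path, n) for path, lines in predicted.items() for n in lines}
    if not gold_pairs:
        return 1.0  # nothing needed changing, so nothing could be missed
    return len(gold_pairs & pred_pairs) / len(gold_pairs)


# Toy case: the right file, but only one of the three lines that need changing.
gold = {"src/parser.py": {120, 121, 135}}
predicted = {"src/parser.py": {40, 41, 120}}

print(file_hit(predicted, gold))               # True
print(round(line_recall(predicted, gold), 2))  # 0.33
```

Under this kind of scoring, the agent gets full credit at the file level while recovering only a third of the lines a fix would actually touch, which is the shape of the shortfall the study reports.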

This gap has a practical consequence, which the research makes explicit: without sufficient context about where the problem is, even a technically sound fix will fail. An agent might write perfectly reasonable repair code but apply it in the wrong place, producing a patch that doesn't actually solve anything.

The finding reframes where the real bottleneck in AI-assisted software development may lie. The conversation in the field has largely focused on whether AI can write good code. SWE-Explore suggests the harder, less-examined problem is whether AI can read code well enough to know where to intervene.

If coding agents are going to handle real-world bugs reliably, getting the search step right may matter just as much as getting the fix right.