One theme that became very clear from looking at some of the research on AI is that a decent chunk of it is trying to figure out how and why LLMs behave the way they do. This paper, while not exactly that, is at least in the same neighbourhood.
Providers recommend repository-level config files to improve code-gen capabilities. But this paper suggests that the approach proposed by some of the makers of these products may not, in fact, be optimal.
As it turns out, context files not only increase cost (by consuming tokens); the paper also demonstrates that LLM-generated context files actually degrade task success. What DOES improve the success rate is human-provided context.
There are a couple of implications here. The first is the continued importance of quality documentation - DORA research has found for years that documentation consistently leads to 25% higher performance. It benefits humans, so it's not entirely surprising that it also seems to benefit systems built on the outputs of humans.
The second is that the metadata that benefits LLMs is often biologically encoded - i.e. it's in people's heads. Documenting that metadata at the code level makes Gen AI more effective. Machine-generated metadata isn't as effective, which speaks to the importance of having a human in the lead.
My friend and colleague Geoffrey Underwood has observed similar patterns at the organizational level, and has done some excellent thinking on how to leverage and surface enterprise metadata for AI. At the engineering level, the answer seems to be the same as it ever was - quality documentation.
Originally published on LinkedIn.