Code generation status report

The AI code generation race is now in full swing. Fable and Mythos by Anthropic are suspended due to a US government export control directive.

I have been using Claude in my full-time job for 6-8 months now. At the beginning, the models did not feel very usable. They were often wrong and consistently needed manual intervention and heavy review. When I threw complex debugging tasks at them, they came up with plausible explanations but failed to connect the dots. These were mostly platform issues in distributed infrastructure, the kind that is difficult to get right even for humans.

Now I find myself burning through large numbers of tokens under an unclear seat-based pricing scheme. I just let the model explore: it reads CLAUDE.md, usually an artifact it generated itself during the session, and some clarifying input from me. It works really well when the task description is very detailed, but writing that description is most of the work anyway. I deliberately avoid guardrails, skills, and harness-style techniques that would make the task easier; I'd like to see what the raw model can do. Frontier LLM companies are already shipping features that target the same kind of optimization, such as thinking mode, which is essentially prompting the model verbosely with additional information.

With Opus 4.7 on the xhigh thinking setting, this has reached the point of full usability. Still, some tasks that feel trivial, like editing YAML CI/CD pipelines, sometimes need a lot of prompting to get right, and Opus struggles with them even when the context is small. If and when these inconsistencies are solved, you could get away with not typing code at all for certain jobs.

The overall resource consumption of training and inference looks high, and datacenters are struggling to supply enough power. Meta is already building tents and using off-grid power systems to run new GPU clusters.

The development speed gains have not yet translated into visible productivity. PR reviews are still the bottleneck; we are not shipping without traditional reviews. Some companies are already building fully automated loops that remove human input to varying degrees. I suspect this will amplify the problems that scaling companies already face. If there is no market for your product, you can rush out features, but sales will still struggle. This is already the case even without fully autonomous feature delivery.

For large corporations, it's mostly about bringing code in-house without respecting any kind of licensing. You can simply spend tokens and generate your own code from a model whose training data includes a lot of open-source-licensed software, without giving credit to anyone. On top of that, you get to cut the part of your workforce that was producing worse results than even a hallucinating model, whose output at least looks plausible.

I predict handwritten code will still exist, just as handwritten assembly does today. It will simply be pushed into a niche.