
When Machines Have Favourite Languages

A few days ago, I was talking with a customer whose development team was struggling to get useful help from generative AI tools - things like Cursor and ChatGPT - when working in Scala. They’d found these tools great in TypeScript or Python, but clumsy and inconsistent once they switched to a less mainstream language.

What struck me was this: if these models have such uneven competence across languages, what does that mean for the languages themselves?

After the call, I went looking to see if anyone had studied this properly, and sure enough, there's already emerging research showing that LLMs have measurable biases toward popular languages. A few recent papers (Joel, Wu & Fard 2024, Twist et al. 2025) point out that models perform far better in Python and JavaScript than in "low-resource" or domain-specific languages. It's a data problem, not a conspiracy: there's just more of some code than others.

The LLM advantage #

Historically, the success of a programming language has depended on very human factors - community, libraries, tooling, and the availability of people who can use it. Increasingly, though, another influence is emerging: the behaviour of the models we now build and rely on.

Large Language Models have become a fixture of modern development. They autocomplete, explain, refactor, and even design. They’ve become the quiet pair-programmer in every IDE.

Which raises a new and slightly unsettling question: What happens to new programming languages in a world where machines already have favourites?

Modern LLMs are astonishingly capable in the popular languages - Python, TypeScript, JavaScript, Java. They’ve been fed years of open-source projects, Stack Overflow answers, and documentation. They’ve learned not just syntax but idiom and convention. When you ask them to write React components or a Flask service, it feels almost magical (well, it used to).

Ask them to write Scala, Zig, or Crystal, though, and that magic fizzles out. The model hesitates, confuses syntax, or falls back on generic patterns. It's not that those languages are bad - they're simply under-represented in the model's training data. The model just hasn't seen enough of them.

The result is a new kind of bias - not a human one, but a statistical one - and it’s quietly shaping the way people write software.

Increasingly, for many developers, especially those starting their careers, an AI assistant is part of the expected workflow. It explains APIs, completes boilerplate, and catches mistakes. When that support disappears, learning a language suddenly feels harder.

This changes the equation for language adoption. The question “is this language productive?” now implicitly includes “does my LLM understand it?” If it doesn’t, the language feels alien and unhelpful - even if it’s elegant or fast or beautifully designed. The absence of AI support becomes friction.

That’s a meaningful shift. Language ergonomics now includes machine ergonomics.

Feedback loops: human and machine #

Programming language ecosystems have always had network effects. The more people who use a language, the more tutorials, libraries, Stack Overflow posts, and job ads appear - which attracts more people.

LLMs add a second-order network effect. The more code that exists for a language, the better the models get at generating it. The better the models get, the more developers use that language - and the more code those developers go on to produce. A virtuous (or vicious) circle, depending on where you stand.

The net effect? Entrenchment. Python, TypeScript, and Java - already dominant - become even more so. Languages that were struggling to gain traction may find it almost impossible to break through, because the machines that help us code don’t know how to help with them.

It reminded me of something I’ve sometimes seen in startups and, to a lesser degree, in individual teams within larger organisations - a kind of human version of the same pattern.

A team picks an obscure or experimental language because an early engineer loves it. At first it’s fine; the code works, the product ships. But later, as the company grows, the choice starts to bite. Hiring becomes harder. The talent pool is smaller. Fewer people understand the tooling or the quirks of the ecosystem. The decision that once felt interesting becomes a quiet drag on velocity.

Those are really the same phenomenon seen from two sides: one statistical, one organisational.

In both cases, popularity shapes capability. A well-used language benefits from more shared knowledge - whether that’s on Stack Overflow or in the training data of a model. Both humans and machines learn faster when there’s a rich supply of examples to learn from.

The difference now is that this bias is baked not only into our organisations, but into the tools themselves.

When innovation meets inertia #

This might sound abstract, but the implications are practical. Imagine a new language designed for safer concurrency or better data-parallelism. Even if it’s technically superior, early adopters will have to work without the LLM crutch they’ve come to rely on. Fewer examples exist. Documentation is thinner. The model can’t autocomplete or refactor sensibly.

That steepens the learning curve. And since most people are pragmatic, they'll stay with what their AI assistants already understand.

In other words, innovation now competes not only with human inertia, but also with AI inertia.

Reasons to be cheerful #

There is a hopeful flip-side. The same models that currently ignore new languages can, in theory, accelerate their adoption. Feed them high-quality examples. Seed open repositories. Publish clear documentation. Encourage developers to build small projects in public. Once enough examples exist, the models can learn.

An LLM can even act as a bridge - translating idioms from Python or Java into the new syntax, generating tutorials, or bootstrapping libraries. A forward-thinking language team could deliberately train open models on its codebase to jump-start ecosystem support.
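To make that concrete, here's a minimal sketch of what the "bridge" idea might look like in practice: a script that asks an LLM to translate small, well-understood Python snippets into a new language and draft tutorial text around them, so a language team can seed public examples. It assumes the `openai` Python client and an OpenAI-compatible chat API; the model name and the target language ("NewLang") are placeholders, not real projects.

```python
# A rough sketch: batch-translating small, well-understood Python snippets
# into a new language to seed its public corpus of examples and tutorials.
# Assumes the `openai` client and an OPENAI_API_KEY in the environment;
# "NewLang" and the model name are placeholders.
from openai import OpenAI

client = OpenAI()

PYTHON_SNIPPETS = [
    "def word_counts(text):\n"
    "    counts = {}\n"
    "    for w in text.split():\n"
    "        counts[w] = counts.get(w, 0) + 1\n"
    "    return counts",
]

PROMPT_TEMPLATE = (
    "You are helping bootstrap documentation for a new programming language, NewLang.\n"
    "Translate the following Python snippet into idiomatic NewLang, then add a short\n"
    "explanation suitable for a beginner tutorial.\n\n"
    "Python:\n{snippet}"
)

for snippet in PYTHON_SNIPPETS:
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(snippet=snippet)}],
    )
    # Review by hand before anything goes near official docs or training sets.
    print(response.choices[0].message.content)
```

The output would obviously need human review before publication - the point is simply that the models which currently ignore a language can be put to work generating the very material they'll later learn from.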

But that requires intention. It means treating “being learnable by an LLM” as a design goal from the start.

In this emerging landscape, a language’s success will hinge not only on human factors - performance, expressiveness, safety - but on its machine legibility. Languages that are well represented in the data economy of code will thrive. Those that aren’t may never reach critical mass.

That’s not a moral judgement; it’s just how probabilistic systems work. LLMs are pattern learners. They get better where the patterns are dense. Sparse data leads to fuzzy knowledge. And fuzzy knowledge feels unreliable to the developer depending on it at crunch time.

It may turn out that the golden age of new programming languages is slowing - not because people have run out of ideas, but because the ecosystem has acquired new gatekeepers: the models themselves. Or, more optimistically, perhaps we’ll see a wave of LLM-aware language design - languages conceived with AI tooling as a first-class citizen.

Either way, the dynamics that once allowed new languages to take off through sheer enthusiasm may have started to shift. The next great language will likely need both human advocates and machine tutors.

The future of programming languages may depend less on what they can express, and more on whether the machines that teach us to use them know how to speak them.

Further reading #

If you’re curious, a few of the papers and articles that explore this theme:

Thumbnail image by Ali Shah Lakhani on Unsplash