A Language For Agents: Why New Programming Languages Are Inevitable in the Age of AI

What makes a programming language “good” for AI agents, and why does that challenge everything we thought we knew about language design?

The rise of AI agents as primary code producers is forcing us to reconsider fundamental assumptions about programming languages. After a year of working with agents across different languages and observing their failure modes, I’ve come to believe that we are on the cusp of a new wave of language innovation—one driven not by human ergonomics alone, but by how machines comprehend, generate, and manipulate code.

This article explores why new languages will emerge despite the massive inertia of existing codebases, what characteristics make a language agent-friendly, and what principles should guide the next generation of language design.


Why Would Anyone Create a New Programming Language Now?

The core question: With millions of lines of existing code and mature ecosystems, why would developers adopt new languages instead of sticking with proven tools?

The Weight of Existing Code Is Overestimated

My initial assumption was that the sheer volume of pre-existing code would cement current languages in place indefinitely. That intuition was wrong. The cost of producing code is collapsing so dramatically that ecosystem breadth matters less than it once did.

Consider my recent experience building an Ethernet driver for our sandbox environment. Implementations existed in Rust, C, and Go, but I needed something pluggable in JavaScript. Rather than wrestling with native bindings and build systems, I had an agent reimplement the driver from scratch in JavaScript. The port was faster to produce than the integration work would have been.

This pattern repeats regularly. I now reach for TypeScript in places where I previously would have defaulted to Python—not because I prefer JavaScript’s ecosystem, but because agents perform demonstrably better with TypeScript’s explicit type system and tooling.

Missing Functionality Is Now a Solvable Problem

When a language lacks a library you need, the traditional response was to either switch languages or invest significant effort in bindings. Today, the calculation changes: pointing an agent at a library from another language and having it build a port is often the pragmatic path.

This doesn’t mean ecosystem size is irrelevant. It means the threshold for “good enough” has shifted. A new language can succeed with a narrower initial scope if its value proposition is strong and it evolves with awareness of how language models process code.

The Agent Performance Factor

An agent performs better on languages well-represented in its training weights—that much is obvious. But two less obvious factors matter enormously:

Tooling stability: Zig, for example, is underrepresented in model weights and changes rapidly. This combination creates friction—you can work around it with good documentation, but it’s suboptimal.

Build system complexity: Swift illustrates the opposite problem. It’s well-represented in weights, but the tooling around Mac and iOS development is so convoluted that agents struggle to navigate it effectively.

The lesson: representation in training data is necessary but not sufficient. The surrounding tooling ecosystem must also be navigable by agents operating with limited context.


The Case for Purpose-Built Agent Languages

The core question: If agents can work with existing languages, why invest in creating new ones?

Reconsidering Brevity vs. Clarity Trade-offs

Most modern languages were designed with the assumption that typing code is laborious. We traded verbosity for brevity through features like type inference. The downside: without an LSP (Language Server Protocol), understanding what types flow through complex expressions becomes difficult.

This creates two problems:

  1. Agent comprehension: Agents struggle with heavily inferred types just as humans do. When an agent reads code without full LSP support, it must guess at types that aren’t explicitly written.

  2. Code review friction: In pull request review, complex operations with heavy inference make it hard to determine actual types. The code is shorter but more ambiguous.

As code production becomes cheaper, we might actually prefer more explicit code if it reduces ambiguity during review and agent manipulation.
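
To make the trade-off concrete, here is a minimal TypeScript sketch; the User type and loadUsers() are hypothetical stand-ins for code defined elsewhere:

interface User {
  email: string;
  disabled: boolean;
}

declare function loadUsers(): Promise<User[]>;

async function report(): Promise<void> {
  // Heavily inferred: the types flowing through this chain are invisible
  // without an LSP or prior knowledge of what loadUsers() returns.
  const active = (await loadUsers()).filter((u) => !u.disabled).map((u) => u.email);

  // Explicit: every intermediate type is stated at the point of use, so an
  // agent reading only this fragment knows exactly what it is manipulating.
  const users: User[] = await loadUsers();
  const activeEmails: string[] = users
    .filter((u) => !u.disabled)
    .map((u) => u.email);

  console.log(active, activeEmails);
}

The second form is longer, but both a reviewer and an agent can verify it from the fragment alone.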

The Rise of Unseen Code

We are heading toward a world where some code is never seen by humans—only consumed by machines. Even in this scenario, we need to communicate intent to non-programmer users who must understand what the code will do without diving into implementation details.

This suggests a need for languages that optimize for:

  • Explicit intent signaling
  • Self-documenting structure
  • Clear behavioral contracts

Fundamental Assumptions Are Shifting

The core argument for new languages rests on changed fundamentals:

  • Old assumption: Human typing speed is a constraint. New reality: Agent generation speed is effectively unlimited. Design implication: verbosity is less costly; clarity is more valuable.
  • Old assumption: Humans read all code they work with. New reality: Agents read code in fragments with limited context. Design implication: local reasoning becomes more important than global elegance.
  • Old assumption: Ecosystem breadth determines viability. New reality: Targeted use cases with strong agent ergonomics can succeed. Design implication: narrow but deep languages become viable.
  • Old assumption: Code review happens by humans reading diffs. New reality: Agents may assist or lead review. Design implication: diff stability and explicitness matter more.

What Do Agents Actually Need From Languages?

The core question: What specific language characteristics improve agent performance, and how can we measure them?

Context Without LSP Dependency

The Language Server Protocol is excellent for IDEs but creates a specific cost for agents: it must be running. Agents often skip LSP setup when they can, for efficiency rather than because of any technical limitation.

Consider these scenarios:

  • An agent reads documentation snippets that aren’t complete compilable units
  • An agent pulls individual files from a GitHub repository without cloning the full project
  • An agent examines code in a transient environment where setting up an LSP is overhead

In all cases, the agent reads code without semantic tooling support. A language that doesn’t bifurcate into “with-LSP” and “without-LSP” experiences provides a unified working model across more situations.

Practical implication: Syntax and scoping rules should make type and binding information evident from local inspection, without cross-file analysis.

Structural Explicitness Over Whitespace

As a long-time Python developer, it pains me to write this: whitespace-based indentation is problematic for agents.

Getting whitespace exactly right is token-inefficient and error-prone. When making surgical changes without formatting tools, agents often:

  • Disregard indentation initially
  • Add markers to enable/disable code blocks
  • Rely on a formatter to clean up later

This works but introduces friction. Explicit delimiters (braces, brackets) provide clearer structural boundaries.

However, delimiter choice matters too. Dense runs of closing parentheses—common in Lisp-family languages—create tokenization challenges. Depending on the tokenizer, )))) may split unpredictably, and agents lose track of nesting depth just as humans historically have.

The sweet spot: Explicit delimiters with visual distinction between opening and closing, and syntax that makes nesting depth apparent without counting.
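
A small TypeScript sketch of the same density problem (items and flatten are hypothetical); the point is the shape of the closing delimiters, not the language:

type Row = number[];

declare const items: Row[];
declare function flatten<T>(xs: T[][]): T[];

// Dense closing run: the trailing "))));" is easy to miscount, both for a
// tokenizer and for an agent editing this line without a formatter.
const total = Math.max(...flatten(items.map((row) => row.map((n) => n * 2))));

// Intermediate bindings keep each closing delimiter next to the construct it
// closes, so nesting depth stays apparent without counting.
const doubled: Row[] = items.map((row) => row.map((n) => n * 2));
const total2 = Math.max(...flatten(doubled));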

Flow Context: Explicit but Automatic

I’m a strong advocate for async locals and flow execution context—the ability to carry data through call chains for concerns like observability, authentication, or timing. Working in observability has reinforced how critical this is.

The challenge: implicit flow can be left unconfigured. If a timer is implicitly passed to every function, what happens when no timer has been configured and a new dependency on it appears?

Explicitly passing all context is tedious and leads to shortcuts. One experimental approach I’ve explored: effect markers maintained by the formatter. Functions declare what they need (time, database, random number generator); if a function uses an effect it hasn’t declared, a linter warns, and auto-formatting resolves the warning by propagating the annotation.

Example of this pattern:

fn issue(sub: UserId, scopes: []Scope) -> Token
    needs { time, rng }
{
    return Token{
        sub,
        exp: time.now().add(24h),
        scopes,
    }
}

test "issue creates exp in the future" {
    using time = time.fixed("2026-02-06T23:00:00Z");
    using rng  = rng.deterministic(seed: 1);

    let t = issue(user("u1"), ["read"]);
    assert(t.exp > time.now());
}

The benefits for agent development are concrete:

  • When building tests, the agent can precisely mock declared dependencies
  • Error messages guide the agent to supply required contexts
  • The boundary between pure and effectful code is explicit

Results Over Exceptions

Agents struggle with exception-based error handling. They tend to catch excessively, log poorly, and recover awkwardly. This may improve with reinforcement learning training, but current behavior suggests a preference for explicit error paths.

Checked exceptions (as in Java) attempt to solve this but propagate too aggressively up call chains. Typed results (as in Rust’s Result<T, E>) are promising but require language-level support for ergonomic composition.

Operational example: When an agent encounters a file-not-found error in a result-based language, it must explicitly handle or propagate the error. The type system guides this decision. In exception-based languages, the agent may fail to anticipate the exception, leading to unhandled error paths that only surface in production.
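
A rough TypeScript sketch of the result-based style; the Result type and readFileOrNull() are illustrative, not a standard library API:

type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Hypothetical helper standing in for a filesystem read that signals absence
// with null instead of throwing.
declare function readFileOrNull(path: string): string | null;

function readConfig(path: string): Result<string, "not_found"> {
  const contents = readFileOrNull(path);
  if (contents === null) {
    // The failure is part of the return type, so every caller is forced by
    // the compiler to decide: handle it here or propagate it explicitly.
    return { ok: false, error: "not_found" };
  }
  return { ok: true, value: contents };
}

function loadOrDefault(path: string): string {
  const result = readConfig(path);
  // Narrowing on the discriminant makes the error path visible in the diff;
  // there is no invisible exception for an agent to fail to anticipate.
  return result.ok ? result.value : "{}";
}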

Diff Stability and Line-Based Operations

Agents typically read files line-by-line into context. This creates specific failure modes:

Multi-line string confusion: In a 2000-line file containing embedded code strings (common in code generators), agents sometimes edit within the string thinking it’s active code. Only Zig’s prefix-based multi-line string syntax (\\ continuation) provides a clean solution, though it’s unfamiliar to most developers.

Trailing comma sensitivity: Many languages and formats forbid trailing commas (JSON is the classic example) or don’t use them idiomatically. When adding or removing items, this causes line shifts that complicate diffs. A syntax that requires trailing commas, or is insensitive to their presence, improves stability.
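
A small TypeScript illustration of why this matters for diffs:

// Without a trailing comma, appending "delete" also touches the "update" line
// to add a comma, so the diff shows two changed lines instead of one.
const scopesNoTrailing = [
  "read",
  "update"
];

// With trailing commas permitted (or required), appending an item adds exactly
// one line and leaves the existing lines untouched.
const scopes = [
  "read",
  "update",
];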

Reformatting churn: If formatting changes move constructs across lines, agents lose orientation. Syntax that requires less reformatting and avoids multi-line constructs where possible aids stability.

Greppability as a Core Virtue

Go’s approach to imports—requiring package-qualified references (context.Context rather than bare Context)—dramatically aids agent comprehension. The agent immediately understands where a symbol originates without cross-file analysis.

This principle extends: code should be discoverable with basic text search. Search should work even on external files that aren’t indexed, and symbol names should produce few false positives for automated transformations with standard Unix tools (sed, perl, grep).

Application scenario: When an agent needs to rename a function across a codebase, qualified references allow simple text replacement. Unqualified imports with aliasing require full semantic analysis to avoid collateral damage.
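
A similar discipline can be approximated in TypeScript with namespace imports; the ./tokens module and its issue() function are hypothetical:

import * as tokens from "./tokens";
import { issue as mint } from "./tokens";

// Namespace-qualified use: every call site names its origin, so tokens.issue
// can be located or rewritten across a codebase with plain grep and sed.
const a = tokens.issue("u1", ["read"]);

// Aliased named import: "mint" reveals nothing about where the symbol lives,
// and renaming issue() now requires semantic analysis, not text replacement.
const b = mint("u1", ["read"]);

Go enforces the qualified style by default; elsewhere it is a convention a linter has to carry.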

Local Reasoning Over Global Knowledge

Much of the above distills to: agents prefer local reasoning. They work with partial file sets, lack spatial awareness of full codebases, and rely on external tools to find relationships. Anything requiring global knowledge or hiding information elsewhere creates friction.

Build System Predictability

Agent success correlates strongly with build tool quality. Languages where determining what needs rebuilding or retesting is difficult—due to circular dependencies, unclear package boundaries, or implicit cross-references—create failure modes.

Go succeeds here: it forbids circular package dependencies, enforces clear package layouts, and caches test results reliably. Agents can confidently make changes knowing the build system will correctly identify consequences.


What Frustrates Agents?

The core question: What language features or ecosystem patterns actively hinder agent performance?

Macros and Code Generation

Humans struggle with macros; agents struggle more. The historical argument for macros was code generation to reduce typing. With generation costs approaching zero, this justification weakens.

Generics and compile-time computation (Zig’s comptime) fare better because they generate predictable structural variations. The agent can understand the pattern being instantiated.
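
For contrast, a minimal TypeScript sketch of the generic case (Queue and Job are hypothetical):

// A generic is instantiated purely by substitution: Queue<Job> behaves exactly
// like Queue<T> with T replaced, so the expanded shape is predictable from the
// definition alone, unlike a macro that can rewrite arbitrary syntax.
class Queue<T> {
  private items: T[] = [];
  push(item: T): void {
    this.items.push(item);
  }
  pop(): T | undefined {
    return this.items.shift();
  }
}

interface Job {
  id: number;
}

const jobs = new Queue<Job>();
jobs.push({ id: 1 });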

Re-exports and Barrel Files

Barrel files (index files that re-export from submodules) decouple implementation location from import location. Agents struggle to trace where a function actually originates, leading to:

  • Imports from wrong locations
  • Missing dependencies
  • Wasted context reading unnecessary files

The ideal: one-to-one mapping between declaration location and import path. Go approaches this without being overly rigid—any file in a directory can define package members, but packages remain small enough to search effectively.

Worst case: Free re-exports across the codebase with aliasing, making reconstruction of the physical layout from imports impossible.
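
A hypothetical two-file TypeScript sketch of the problem, collapsed into one block with comments marking the file boundaries:

// auth/index.ts, a barrel file: the import path no longer says where the
// implementation lives, and the alias hides the original name entirely.
export { issue as createToken } from "./tokens";
export * from "./session";

// consumer.ts: to find createToken's implementation, an agent must open the
// barrel, resolve the alias, and only then read auth/tokens.ts.
import { createToken } from "./auth";

// Importing from the declaring module keeps a 1:1 mapping between import path
// and file, so a single grep answers "where is this defined?".
import { issue } from "./auth/tokens";

The deeper the re-export chain, the more files an agent must read before it can safely change anything.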

Aliasing and Name Obfuscation

Agents complain when refactoring code with heavy import aliasing. Ideally, languages encourage consistent naming and discourage import-time aliasing.

Flaky Tests and Environmental Divergence

Agents are particularly good at creating flaky tests—ironic given their aversion to them. The root cause: most languages make writing flaky tests easier than writing stable ones.

Agents favor mocking, but most languages don’t support mocking well. Tests end up:

  • Concurrency-unsafe due to shared mutable state
  • Dependent on development environment state that diverges in CI
  • Sensitive to execution order or timing

The languages and frameworks that succeed will make determinism the default, not the exception.
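
A minimal TypeScript sketch of determinism by default, echoing the earlier issue() example; Clock and issueToken() are hypothetical, and node:assert stands in for a test framework:

import assert from "node:assert";

// The clock is an explicit dependency instead of a hidden read of wall-clock
// time, so the test below controls it completely.
interface Clock {
  now(): Date;
}

function issueToken(clock: Clock): { exp: Date } {
  return { exp: new Date(clock.now().getTime() + 24 * 60 * 60 * 1000) };
}

// A fixed clock makes the test deterministic: same result on every machine,
// in every time zone, in any execution order.
const fixedClock: Clock = { now: () => new Date("2026-02-06T23:00:00Z") };
const token = issueToken(fixedClock);
assert.ok(token.exp.getTime() > fixedClock.now().getTime());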

Multiple Failure Modes

Ideally, an agent has one command that lints, compiles, and reports success. Reality is messier:

TypeScript example: Code can often run despite type check failures. This “gaslights” the agent—execution success doesn’t indicate type correctness.

Bundler divergence: Different bundler configurations cause local success but CI failure. The more uniform the tooling environment, the better.

Ideal state: Binary success/failure with mechanical fixes available for lint failures. The agent should not manually resolve style issues.


The Future: Will We Actually See New Languages?

The core question: Given all these principles, is the ecosystem actually ready for new programming languages?

The Volume Argument

We are writing more software than ever—more websites, more open source, more internal tools. Even if the proportion of new languages remains constant, the absolute number will increase. But I believe the proportion itself will grow.

The Infrastructure Barrier Has Fallen

For years, launching a language required massive infrastructure investment: package managers, build systems, editor support, documentation generators. With agents able to produce much of that scaffolding, the up-front cost has collapsed. Today, you can target a narrower use case: make agents productive first, then extend to human ergonomics.

Two Hopes for the Coming Wave

Outsider innovation: I hope to see people who haven’t built languages before attempt it. Fresh perspectives often reveal assumptions that experts take for granted. The “agent-native” constraint is new enough that established language designers have no inherent advantage.

Evidence-based design: We need deliberate documentation of what works and why. We’ve learned enormous amounts about scaling software engineering, but finding consolidated, consumable guidance on language design is surprisingly difficult. Too much has been shaped by opinion on trivial matters rather than measured outcomes.

The Measurement Opportunity

Agents offer something unprecedented: the ability to measure language effectiveness without human survey bias. No human wants to be subjected to controlled experiments in their daily work, but agents don’t care. We can A/B test syntax variations, measure iteration counts to correct solutions, and quantify the impact of specific features on success rates.

This shifts language design from aesthetic debate to empirical optimization.


Author’s Reflection: The Humbling Reality of Tool Design

Writing this analysis forced me to confront my own biases. As someone who invested years in Python’s ecosystem, acknowledging that whitespace significance is problematic for agents felt like betrayal. But the evidence is clear: agents working without formatters produce structurally broken code in whitespace-sensitive languages at higher rates.

More broadly, this exercise reinforced how deeply human-centered our current tools are. We designed languages assuming:

  • A human would read the full file before editing
  • An IDE would always be available with full project context
  • The cost of typing justified brevity at comprehension expense

These assumptions no longer hold. The agents we are building—and the agents we are becoming through augmentation—need different affordances.

The most exciting possibility isn’t incremental improvement to existing languages. It’s reconceiving what a programming language could be if designed from first principles for a world where:

  • Code is written by machines as often as humans
  • Comprehension happens in fragments, not holistically
  • Explicitness is cheaper than inference
  • Local reasoning is the only reliable reasoning

I don’t know what such languages will look like. But I’m increasingly convinced they will emerge, and that the teams building them will have advantages that incumbent language communities struggle to match.


Practical Implementation: Principles for Agent-Native Language Design

Action Checklist for Language Designers

  • [ ] Evaluate every syntax choice for “LSP-free comprehensibility”—can an agent understand this reading only the local file?
  • [ ] Prefer explicit delimiters over significant whitespace
  • [ ] Design effect systems that make dependencies explicit but propagation automatic
  • [ ] Choose error handling mechanisms that guide agents toward correct handling paths
  • [ ] Require trailing commas or be insensitive to their presence for diff stability
  • [ ] Enforce qualified references over implicit imports for greppability
  • [ ] Forbid circular dependencies and enforce clear package boundaries
  • [ ] Provide a single command that lints, compiles, and tests with binary success/failure
  • [ ] Minimize macro usage; prefer generics and compile-time computation with predictable expansion
  • [ ] Eliminate barrel files and re-exports; maintain 1:1 declaration-to-import mapping
  • [ ] Discourage aliasing; encourage consistent naming across codebase
  • [ ] Make deterministic testing the default; provide first-class mocking support

One-Page Overview

Core thesis: AI agents as primary code producers necessitate new programming languages optimized for machine comprehension rather than human typing ergonomics.

Key shifts:

  • Code production cost collapse makes ecosystem breadth less critical
  • Agents work without full project context, requiring local reasoning
  • Explicitness beats brevity when comprehension happens in fragments
  • Measurement via agent performance enables empirical language design

Agent-friendly characteristics:

  • LSP-free comprehensibility (types visible locally)
  • Explicit delimiters over significant whitespace
  • Effect systems for implicit context
  • Results over exceptions for error handling
  • Trailing comma tolerance for diff stability
  • Qualified references for greppability
  • Predictable builds with single success/failure signal

Agent-hostile patterns:

  • Heavy macro usage
  • Barrel files and re-exports
  • Import aliasing
  • Flaky-test-prone mocking
  • Multiple divergent failure modes

Prediction: We will see new languages emerge targeting agent productivity first, with human ergonomics secondary. The measurement infrastructure for evaluating these languages will be agent-based, enabling empirical optimization previously impossible.


Frequently Asked Questions

Q: Why can’t we just improve existing languages for agents rather than creating new ones?

A: We can and should improve existing languages, but fundamental design decisions (whitespace significance, type inference philosophy, error handling models) are baked into languages at deep levels. New languages allow clean-slate optimization for agent workflows without breaking changes to massive existing codebases.

Q: Doesn’t the vast amount of existing code prevent new language adoption?

A: The calculation changes when agents can port code across languages efficiently. A new language needs only sufficient agent productivity advantage to justify porting costs, not massive ecosystem parity from day one.

Q: Which existing languages are most agent-friendly today?

A: Based on the analysis, Go scores well on greppability, build predictability, and local reasoning. TypeScript performs well due to explicit typing and tooling maturity. Python and Swift face challenges from significant whitespace and complex tooling respectively, though both are usable with proper agent configuration.

Q: How do we measure if a language is “good for agents”?

A: Key metrics include: iterations required to correct solution, frequency of type-related errors without LSP, diff stability under modification, and test flakiness rates. These can be measured through controlled agent experiments without human survey bias.

Q: What’s the most important single feature for an agent-native language?

A: Local comprehensibility—the ability for an agent to correctly understand and modify code given only the immediate file context, without full project analysis or running language services.

Q: Will these languages be worse for human programmers?

A: Not necessarily. Many agent-friendly characteristics (explicitness, greppability, predictable builds) improve human experience too. The difference is prioritization: agent-native languages might accept verbosity that humans would traditionally resist, given agents don’t tire of typing.

Q: When will we see the first major agent-native language?

A: The infrastructure exists now. The first attempts are likely already underway in research and industrial labs. Widespread adoption depends on demonstrating sufficient productivity gains over incumbent languages in specific domains.

Q: How should teams evaluate whether to adopt an agent-native language?

A: Consider: (1) Is your domain greenfield enough to tolerate new language adoption? (2) Do you have agent-heavy workflows where productivity gains would compound? (3) Can you tolerate ecosystem immaturity in exchange for specific advantages? (4) Is your team willing to contribute to early ecosystem development?