
AI Coding Tools Move to the Terminal - A New Trend

July 15, 2025

The Evolving Landscape of AI-Powered Software Development

For some time, AI-assisted software development has relied largely on code editors such as Cursor, Windsurf, and GitHub Copilot. However, as agentic AI grows more sophisticated and 'vibe coding' takes hold, a notable change is under way in how AI systems interact with software environments.

Rather than working only on code, these systems increasingly operate directly in the shell of the machines they run on. The shift is easy to overlook, but it fundamentally changes how AI-driven software gets built and could have significant implications for where the field goes next.

The Enduring Power of the Terminal

The terminal, often visualized as the black-and-white screen from classic ’90s films, remains a potent method for executing programs and managing data. While it may lack the visual appeal of modern code editors, it provides an exceptionally powerful interface for those proficient in its use.

Code-based agents may excel at writing and debugging, but terminal tools are often essential for getting software from written code to a working, deployable state.

A Shift Confirmed by Leading AI Labs

The move toward terminal interaction is becoming increasingly evident in the actions of prominent AI research labs. Since February, Anthropic, DeepMind, and OpenAI have each launched command-line coding tools (Claude Code, Gemini CLI, and Codex CLI, respectively), which have quickly become some of their most popular products.

This transition has been somewhat subtle, often happening under the same branding as earlier coding tools. Nevertheless, how agents interact with both online and offline computer systems has changed significantly, and some experts believe this is only the beginning.

The Future of LLM Interaction

Mike Merrill, co-creator of the Terminal-Bench benchmark, posits a future where “95% of LLM-computer interaction is through a terminal-like interface.” This suggests a fundamental rethinking of how developers will engage with AI in the coming years.

Challenges to Traditional Code Editors

The rise of terminal-based tools coincides with growing uncertainty surrounding established code-based editors. Windsurf, a prominent example, has experienced internal disruption due to acquisitions and executive departures, casting doubt on its long-term viability.

Recent research also challenges the productivity gains attributed to conventional AI coding assistants. A METR study of Cursor Pro, Windsurf’s primary competitor, found that developers using it completed tasks nearly 20% more slowly, even though they estimated it had made them 20% to 30% faster.

Warp: A New Contender

This situation has created an opportunity for companies like Warp, currently leading on Terminal-Bench. Warp positions itself as an “agentic development environment,” bridging the gap between traditional IDEs and command-line tools like Claude Code.

Zach Lloyd, Warp’s founder, remains optimistic about the terminal’s potential, viewing it as a solution for problems beyond the scope of conventional code editors. He states, “The terminal occupies a very low level in the developer stack, so it’s the most versatile place to be running agents.”

Benchmarking the New Approach

Understanding how the new approach differs from the old one requires looking at the benchmarks used to evaluate them. Code-based tools have traditionally been measured on resolving real issues from platforms like GitHub, as in the SWE-Bench test.

These tools iteratively refine code until a functional solution is achieved. While integrated products like Cursor have introduced more sophisticated methods, the core principle remains: transforming broken code into working code.

A Holistic View of Software Environments

Terminal-based tools adopt a broader perspective, considering the entire environment in which a program operates. This encompasses coding, but also extends to DevOps tasks such as configuring Git servers and diagnosing script failures.

Terminal-Bench challenges agents with tasks such as reverse-engineering a compression algorithm from its decompression program and building the Linux kernel from source, tasks that require the agent to obtain the source code on its own and to solve problems robustly.

The Importance of Environmental Complexity

According to Alex Shaw, co-creator of Terminal-Bench, the difficulty of the benchmark stems not only from the questions themselves, but also from “the environments that we’re placing them in.”

This approach rewards step-by-step problem solving, a key strength of agentic AI. Even so, advanced models still struggle with the complexity of these environments: Warp, the current leader on Terminal-Bench, solves just over half of the problems, which shows both how hard the benchmark remains and how much room there is for improvement.

Reliable Handling of Non-Coding Tasks

Despite these challenges, Lloyd believes terminal-based tools are already capable of reliably handling a significant portion of a developer’s non-coding workload. He asserts, “If you think of the daily work of setting up a new project, figuring out the dependencies and getting it runnable, Warp can pretty much do that autonomously. And if it can’t do it, it will tell you why.”
