TL;DR: OpenAI Codex CLI is OpenAI's terminal-based agentic coding tool, similar to Claude Code but powered by GPT models. You install it via npm, point it at your project, and give it plain-English instructions. It can read files, write code, run terminal commands, and iterate on bugs autonomously. It is open source, uses your OpenAI API key, and works best inside a git repo where you can review and revert changes.
Why AI Coders Need to Know This
The agentic coding landscape is moving fast, and the tools you choose shape how you build. For the past year, Claude Code has been the dominant terminal-based coding agent. It earned that position — it is powerful, context-aware, and deeply integrated with Anthropic's Claude models.
But OpenAI was never going to sit that out. Codex CLI is their entry into the same space: a command-line tool that lets a GPT-powered agent read your project, write code across multiple files, execute shell commands, and self-correct when things go wrong.
Why does this matter to you? Because competition makes everything better. Having two serious terminal-based coding agents means better models, lower prices, and more options. Some tasks work better with Claude. Some work better with GPT. The smartest vibe coders in 2026 are not picking sides — they are using the right tool for the right job.
Codex CLI is also fully open source. The entire codebase is on GitHub under an Apache 2.0 license. You can read the code, fork it, extend it, or study how an agentic coding tool actually works under the hood. For a tool category that is defining the future of software development, that transparency matters.
Whether you end up using Codex CLI as your primary tool, your secondary tool, or just want to understand the competitive landscape, you need to know what it does and how it works.
What It Does
At its core, Codex CLI is an agentic coding tool that runs in your terminal. That means it does not just generate code snippets for you to copy-paste. It operates like a junior developer sitting at your machine — reading files, making edits, running commands, checking results, and fixing problems.
Here is what that looks like in practice:
Reads Your Codebase
When you give Codex CLI a task, it starts by understanding your project. It reads your file structure, examines relevant source files, checks configuration files like package.json or pyproject.toml, and builds context about what you are working with. This is not a blind code generator — it sees your actual project.
Writes and Edits Code
Based on your instructions and its understanding of the codebase, it creates new files and modifies existing ones. Need a new API endpoint? It will create the route file, update the router, add any necessary imports, and wire things together across multiple files.
Runs Terminal Commands
This is what separates agentic tools from chatbots. Codex CLI can execute shell commands — install dependencies, run tests, start dev servers, check build output. It reads the results and uses them to decide what to do next. If npm test fails, it reads the error output and fixes the code.
Self-Corrects
The agent loop is the key capability. Codex CLI does not just make one attempt and hand you the result. It runs the code, checks if it works, and if something breaks, it tries to fix it. This plan-execute-verify cycle is what makes it an agent rather than a generator.
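The plan-execute-verify cycle is simple enough to sketch. This is an illustrative model of how any such agent works, not Codex CLI's actual implementation; the task object and its runTests/applyFix hooks are hypothetical stand-ins:

```javascript
// Illustrative sketch of a plan-execute-verify agent loop.
// Not Codex CLI's real code: runTests and applyFix are hypothetical
// stand-ins for "run the verification step" and "re-plan from the error".
function agentLoop(task, { maxAttempts = 3 } = {}) {
  let code = task.initialCode;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = task.runTests(code);        // execute + verify
    if (result.passed) {
      return { code, attempts: attempt };      // done: verification succeeded
    }
    code = task.applyFix(code, result.error);  // re-plan using failure output
  }
  throw new Error('Gave up after ' + maxAttempts + ' attempts');
}
```

The loop terminates either because verification passes or because the attempt budget runs out — which is also roughly why a vague task in a real agent can burn tokens on repeated failed attempts.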
Works in a Sandbox
Codex CLI runs commands in a sandboxed environment by default. On macOS it uses Apple's Seatbelt sandbox, and on Linux it uses kernel-level isolation. This means the agent cannot accidentally reach out to the internet, delete files outside your project, or run destructive commands unless you explicitly allow it. The sandbox is not foolproof, but it adds a meaningful layer of protection.
How It Compares: Codex CLI vs Claude Code vs Cursor vs Windsurf
The agentic coding space has several serious players. Here is an honest comparison of where Codex CLI fits.
Codex CLI vs Claude Code
These are the two terminal-based agentic coding tools. They occupy the same niche and share the same fundamental approach: you type a prompt in your terminal, and an AI agent reads your codebase, writes code, runs commands, and iterates.
Where Claude Code is stronger:
- Context handling. Claude's 200K token context window gives it an edge on large codebases. It can hold more of your project in memory at once.
- CLAUDE.md files. The CLAUDE.md convention for project configuration is mature and well-documented. You can give Claude Code detailed instructions about your project that persist across sessions.
- Nuanced instruction following. Claude models tend to follow complex, multi-part instructions more reliably. When you need the agent to respect specific conventions or constraints, Claude Code has an edge.
- Ecosystem maturity. Claude Code has been in the market longer with more community knowledge, tutorials, and established workflows.
Where Codex CLI is stronger:
- Open source. The entire tool is open source. Claude Code is not. You can read, fork, and modify Codex CLI.
- Sandboxing. Codex CLI's default sandboxing is more locked-down. Commands run in an isolated environment that restricts network access and filesystem reach.
- OpenAI ecosystem. If you are already using GPT models, have an OpenAI API key, and are familiar with the platform, Codex CLI slots in naturally.
- Cost flexibility. You can choose different GPT models to balance cost vs. capability. The codex-mini model is optimized for fast, cheap coding tasks.
Codex CLI vs Cursor
Cursor is a visual code editor (a fork of VS Code) with AI built in. Its Composer mode is agentic — it can make multi-file changes from a prompt. But the experience is fundamentally different from Codex CLI.
Choose Cursor if: You want a visual editor with AI built in. You prefer seeing file tabs, inline diffs, and a graphical interface. You are coming from VS Code and want familiar territory.
Choose Codex CLI if: You prefer working in the terminal. You want an agent that runs commands and tests autonomously. You want the agent to do more of the work without you manually clicking through diffs.
Codex CLI vs Windsurf
Windsurf (formerly Codeium) is another AI-powered editor with its Cascade agentic feature. Like Cursor, it is editor-based rather than terminal-based.
The same principle applies: if you want a visual editor with AI, Windsurf or Cursor are your options. If you want a terminal agent, your choices are Codex CLI and Claude Code.
The honest take: Most experienced AI coders in 2026 use multiple tools. Claude Code for deep refactors and architecture work. Cursor or Windsurf for day-to-day editing. Codex CLI for quick one-shot tasks or when they want to stay in the OpenAI ecosystem. You do not have to pick one.
Getting Started
Setting up Codex CLI takes about five minutes. Here is what you need and how to install it.
Prerequisites
- Node.js 22 or newer. Codex CLI is a Node.js tool. Check your version with node --version. If you need to install or update Node, use nodejs.org or a version manager like nvm.
- An OpenAI API key. You need an account at platform.openai.com with API credits loaded. Codex CLI uses your API key to send requests to OpenAI's models.
- Git (strongly recommended). Codex CLI works best inside a git repository. It uses git to track changes and generate diffs. You can use it without git, but you lose the ability to easily review and revert agent changes.
Installation
npm install -g @openai/codex
That is it. One command, globally installed. Verify it worked:
codex --version
API Key Setup
Set your OpenAI API key as an environment variable:
export OPENAI_API_KEY="sk-your-key-here"
To make it permanent, add that line to your shell profile (~/.zshrc or ~/.bashrc).
Your First Command
Navigate to a project directory and try:
codex "Explain what this project does"
Codex CLI will read your project files and give you a summary. No files changed, no commands run — just the agent reading and responding. This is a safe way to test that everything is working.
Ready for something more substantial? Try:
codex "Add a health check endpoint to this Express app at GET /health that returns { status: 'ok' }"
In the default mode, Codex CLI will show you the proposed changes and ask for approval before applying them.
Modes: How Much Autonomy Do You Give It?
Codex CLI has several operating modes that control how much freedom the agent has. This is one of its most important design decisions — and the one that trips people up most often.
Suggest Mode (Default)
The agent can read files and propose changes, but it cannot write files or run commands without your approval. Every change is presented as a diff for you to review. This is the safest mode and where you should start.
codex "Refactor the auth middleware to use async/await"
You will see exactly what it wants to change. Approve or reject each change.
Auto-Edit Mode
The agent can read and write files automatically, but still needs your approval to run shell commands. This is a good middle ground — the agent can make code changes freely, but you stay in control of anything that executes on your system.
codex --auto-edit "Add input validation to all API endpoints"
Full-Auto Mode
The agent can read files, write files, and run shell commands — all without asking. This is the most powerful mode and the most dangerous. The agent operates fully autonomously within the sandbox.
codex --full-auto "Set up a complete test suite with Jest, write tests for all endpoints, and make sure they pass"
In full-auto mode, the agent will create test files, install dependencies, run the tests, read the failures, fix the code, and re-run the tests — all without stopping to ask you anything. It is impressive when it works. It can also burn through API credits fast and make unexpected changes.
About --yolo: You might see references to a --yolo flag online. This was an early community alias for full-auto mode with reduced safety checks. The naming was controversial, and the functionality has been folded into the standard --full-auto flag. The point stands: giving an AI agent unrestricted access to your terminal requires trust and git discipline.
Exec Mode (One-Shot)
Not a mode toggle, but a usage pattern. You can use codex as a one-shot command that runs a task and exits, rather than entering an interactive session:
codex exec "Generate a .gitignore file for a Node.js project with TypeScript"
This runs the prompt, outputs the result, and returns you to your shell. Useful for quick tasks and scripting.
What AI Gets Wrong About Codex CLI
Every AI coding tool has failure modes. Knowing them upfront saves you hours of frustration. Here is where Codex CLI stumbles — and these are not edge cases. These are things you will encounter in normal use.
Scope Creep in Full-Auto Mode
This is the biggest practical problem. You ask Codex CLI to "fix the login bug," and in full-auto mode it decides to also refactor the auth middleware, update the database schema, upgrade three dependencies, and rewrite the test suite. Every change might be technically reasonable in isolation, but together they turn a simple bug fix into a massive diff that is hard to review.
The fix: Be specific in your prompts. Instead of "fix the login," say "fix the login bug where users get a 401 error after token refresh — only modify src/auth/token.js." Constraints in your prompt are constraints on the agent.
It Needs a Git Repo (Seriously)
Codex CLI technically works outside a git repo, but you are flying without a safety net. Without git, you have no way to see a clean diff of what changed, no way to revert easily, and no baseline to compare against. The tool itself warns you when you are not in a repo.
The fix: Always git init and git commit your current state before letting Codex CLI work. Make it a habit: commit, then prompt.
Token Costs Add Up
Every time Codex CLI reads a file, generates code, or processes command output, it uses API tokens. In full-auto mode on a large codebase, a single session can consume tens of thousands of tokens. The codex-mini model is cheaper, but if you switch to GPT-4o or o3-mini for complex tasks, costs climb quickly.
The fix: Monitor your OpenAI usage dashboard. Set spending limits. Use codex-mini for routine tasks and save the more expensive models for when you actually need them. Keep your prompts focused — a vague prompt causes the agent to read more files and make more attempts.
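Back-of-the-envelope math helps here. API pricing is quoted per million tokens and changes often, so treat the rates below as placeholders you fill in from OpenAI's current price list, not real numbers:

```javascript
// Rough session cost estimate. Rates are dollars per 1M tokens; the
// values used in the example are hypothetical placeholders, not real prices.
function estimateSessionCost({ inputTokens, outputTokens, inputRatePerM, outputRatePerM }) {
  return (inputTokens / 1e6) * inputRatePerM + (outputTokens / 1e6) * outputRatePerM;
}

// Example: a session that read 80K tokens of code and produced 20K tokens
// of output, at made-up rates of $1.10 in / $4.40 out per million tokens.
const cost = estimateSessionCost({
  inputTokens: 80_000,
  outputTokens: 20_000,
  inputRatePerM: 1.10,
  outputRatePerM: 4.40,
});
// A focused prompt shrinks inputTokens (fewer files read) and outputTokens
// (fewer retries), which is why prompt discipline is also cost discipline.
```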
Hallucinated APIs and Dependencies
Like all LLM-based tools, Codex CLI can generate code that references packages, functions, or API endpoints that do not exist. It will confidently npm install a package with the wrong name, or import a function from a library that was renamed two versions ago.
The fix: Review dependency changes carefully. If the agent installs something you have never heard of, check it on the npm registry (npm view <package-name> prints the package's metadata, or fails if no such package exists). This applies to all AI coding tools, not just Codex CLI.
Sandbox Limitations
The sandbox restricts network access by default. This is a security feature, but it means tasks that require downloading packages, hitting APIs, or cloning repos will fail in sandboxed mode. You need to explicitly allow network access for those operations, which reduces the safety benefit.
The fix: Plan your workflow. Install dependencies first (npm install before starting the agent), or use a mode that allows the specific network access you need.
Best Practices for Using Codex CLI
After spending time with this tool, here are the practices that separate productive sessions from frustrating ones.
1. Always Work in a Git Repo with a Clean State
Before every Codex CLI session: git add -A && git commit -m "pre-codex checkpoint". This gives you a clean baseline. After the agent finishes, git diff shows you exactly what changed. Do not like the result? git checkout . and start over.
2. Start with Suggest Mode, Graduate to Auto
Begin every new type of task in suggest mode. Watch what the agent proposes. Once you understand its behavior patterns for that kind of work, move to auto-edit. Only use full-auto for tasks you understand well and can verify quickly.
3. Write Specific Prompts
Bad: "Fix the app"
Good: "Fix the 500 error in GET /api/users/:id when the user ID doesn't exist. Return a 404 with { error: 'User not found' }. Only modify src/routes/users.js."
The more specific your prompt, the more focused the agent stays. Include file paths, expected behavior, and constraints.
4. Use a codex.md or Instructions File
Codex CLI supports project-level instructions via markdown files (similar to CLAUDE.md files for Claude Code). Create a file that describes your tech stack, coding conventions, and rules. The agent reads it at the start of each session, which makes its output more consistent with your project's patterns.
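Here is a sketch of what such a file might contain. Check your version's docs for the exact filename it reads (this article uses codex.md; newer releases also document AGENTS.md); the stack, paths, and rules below are placeholders for your own project's details:

```markdown
# Project instructions for the coding agent

## Stack
- Node.js 22, Express, PostgreSQL
- TypeScript everywhere; no plain .js files

## Conventions
- Use async/await, never raw Promise chains
- All API routes live in src/routes/, one file per resource
- Run `npm test` after any change under src/

## Rules
- Never modify files in migrations/ or the .env file
- Do not add new dependencies without flagging them in your summary
```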
5. Combine with Claude Code for Different Strengths
This is not an either/or decision. Many builders use both tools:
- Codex CLI for quick, well-defined tasks: "Generate a migration," "Add this endpoint," "Write a script to process these CSV files."
- Claude Code for complex, multi-step work: large refactors, architecture decisions, tasks that require deep understanding of a big codebase.
Different models have different strengths. GPT models can be faster for straightforward generation. Claude models tend to be more careful with nuanced, multi-constraint tasks. Using both gives you access to both strengths.
6. Review Everything Before Pushing
No matter which mode you use, review the changes before you push to a shared branch. Run git diff. Read the tests. Check that the agent did not add unused imports, unnecessary dependencies, or commented-out debug code. AI agents are productive, not infallible.
A Real Workflow: Using Codex CLI on an Existing Project
Here is a practical example of how you might use Codex CLI on a real project — not a demo, not a greenfield app, but an existing codebase with existing code and existing bugs.
The situation: You have a Next.js app with an API route that sometimes returns stale data. Users are complaining. You need to find and fix the caching issue.
# Step 1: Make sure you have a clean git state
git status # Should be clean
git log --oneline -3 # Know where you are
# Step 2: Ask Codex CLI to investigate (suggest mode)
codex "Look at the API routes in src/app/api/ and find any caching issues.
Check the Cache-Control headers, any in-memory caches, and the
revalidation settings. Tell me what you find before making changes."
# Step 3: Review what it found, then ask for the fix
codex --auto-edit "Fix the caching issue in src/app/api/products/route.ts.
The response should have Cache-Control: no-store for authenticated requests
and Cache-Control: public, max-age=60 for anonymous requests.
Do not change any other files."
# Step 4: Review the diff
git diff
# Step 5: Test it
npm run dev # Manual smoke test
npm test # Run the test suite
# Step 6: Commit if it looks good
git add -A && git commit -m "Fix caching headers for product API route"
Notice the pattern: investigate first, fix second, review third. You are not letting the agent run wild. You are directing it, checking its work, and committing only what you approve.
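The caching policy that Step 3's prompt spells out is small enough to express directly. A sketch of the helper the fix might reduce to (the function name and the shape of the auth check are hypothetical, not the agent's actual output):

```javascript
// Hypothetical helper implementing the policy from the prompt above:
// authenticated responses must never be cached; anonymous responses may
// be cached publicly for 60 seconds.
function cacheControlFor(isAuthenticated) {
  return isAuthenticated ? 'no-store' : 'public, max-age=60';
}
```

Having the policy stated this precisely in the prompt is what lets you verify the agent's diff at a glance in Step 4.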
What to Learn Next
Codex CLI is one tool in a larger ecosystem. To get the most out of agentic coding, explore these related topics:
- Claude Code: The Beginner's Guide — The other major terminal-based coding agent. Understanding both tools lets you pick the right one for each task.
- What Is Agentic Coding? — The broader concept behind tools like Codex CLI. Understand the plan-execute-verify loop that all these tools share.
- What Is a CLAUDE.md File? — Project configuration files for AI agents. The same concept applies to Codex CLI's instruction files — giving the agent context about your project.
- How to Debug AI-Generated Code — Because every AI coding tool will produce bugs. Knowing how to debug AI output is a core skill.
- What Are Context Windows? — Understanding why the agent sometimes "forgets" things or misses files. Context windows are the fundamental constraint on all AI coding tools.
- Codex vs Claude Code: Which Wins for Complex Projects? — A head-to-head comparison of the two biggest terminal-based coding agents. When to use each, and what kind of projects favor which tool.
Frequently Asked Questions
Is Codex CLI the same as the original OpenAI Codex model?
No. The original Codex API (code-davinci-002) was a code completion model that OpenAI deprecated in 2023. Codex CLI is a completely different product — it is a terminal-based agentic coding tool released in 2025 that uses modern GPT models (like codex-mini) to read your codebase, write files, run commands, and iterate on code autonomously. Same name, entirely different tool.
Is Codex CLI free to use?
The Codex CLI tool itself is free and open source — you install it via npm at no cost. However, it requires an OpenAI API key and uses API credits for every request. Costs depend on which model you use and how many tokens your sessions consume. Simple tasks might cost a few cents; complex multi-file refactors with full-auto mode can add up quickly. Monitor your OpenAI usage dashboard.
Do I need to know how to code to use Codex CLI?
You can use it to generate code from plain English descriptions, but you will get better results if you understand the basics of what it produces. You need to be comfortable with the terminal (command line), and you should be able to review the changes the agent makes. Think of it like hiring a contractor — you do not need to swing the hammer, but you need to know if the wall is straight.
Is Codex CLI better than Claude Code?
It depends on your workflow and preferences. Claude Code tends to be stronger at understanding large codebases and following nuanced instructions via CLAUDE.md files. Codex CLI integrates naturally into the OpenAI ecosystem and works well if you are already using GPT models. Many experienced AI coders use both — Claude Code for deep refactors and architecture work, Codex CLI for quick one-shot tasks and scripting. Try both and see which fits your style.
What if the agent makes changes that break my code?
This is why Codex CLI strongly recommends working inside a git repository. If the agent makes changes you do not like or introduces bugs, you can run git diff to see exactly what changed and git checkout . to revert. In suggest mode (the default), it shows you proposed changes without applying them. In full-auto mode, it applies changes directly — which is why having a clean git state before starting is critical.