TL;DR: A context window is the AI's short-term memory — measured in tokens, not words. Everything you type, everything the AI replies, and every file it reads counts toward a hard limit. When the conversation exceeds that limit, the oldest parts get silently dropped. That is why your AI "forgets" what you said, contradicts earlier decisions, and starts hallucinating. The fix: start fresh conversations often, use instruction files, and break big tasks into small ones.
Why AI Coders Need to Know This
If you have spent more than a few hours building with AI coding tools, you have hit this wall. The AI was doing great work, then it started:
- Contradicting something it said ten messages ago
- Re-introducing code you explicitly told it to remove
- "Forgetting" your tech stack and suggesting the wrong framework
- Giving generic, vague answers instead of project-specific ones
- Making the same mistake you already corrected twice
It feels like the AI is gaslighting you. You know you told it to use PostgreSQL, not MongoDB. You know you agreed on a specific folder structure. But suddenly it is acting like that conversation never happened.
This is not a bug. It is not the AI being "dumb." It is a fundamental constraint called the context window, and once you understand it, half the frustration of AI-assisted coding evaporates. You stop fighting the tool and start working with its limitations.
This is arguably the single most important concept for anyone using AI to write code. More important than prompt engineering. More important than knowing which model is best. Because if you do not understand context windows, no amount of clever prompting will save you from the AI "forgetting" at the worst possible moment.
Real Scenario: The Conversation That Falls Apart
Let us walk through what actually happens. You open Cursor and start a new chat to build a dashboard feature for your SaaS app:
- Messages 1–5: You describe the project. Next.js 15, TypeScript, Prisma, PostgreSQL. The AI acknowledges everything perfectly.
- Messages 6–12: You build out the dashboard layout together. The AI creates a clean component structure, follows your naming conventions, uses the right patterns.
- Messages 13–20: You move to the API routes. The AI correctly references the Prisma schema you discussed earlier. It connects everything up. Still great.
- Messages 21–25: You ask the AI to add charts and data visualization. More code, more files, more back-and-forth. Each response is getting longer.
- Messages 26–30: Something shifts. You ask it to update the dashboard layout and it suggests a completely different component structure than what you agreed on in message 8. You correct it. It apologizes. Two messages later, it uses MongoDB syntax in a query even though you specified PostgreSQL at the very beginning.
What happened? Your conversation got too long. The earliest messages — the ones where you established your tech stack, your project structure, your conventions — got pushed out of the context window. The AI literally cannot see them anymore. It is not ignoring you. Those messages do not exist as far as the model is concerned.
What AI Generated: A Context Window Breakdown in Action
Here is a simplified version of what this looks like in practice. Imagine this conversation in any AI coding tool:
    // Message 3 (early in conversation):
    You: "Use Prisma with PostgreSQL for all database queries.
          Never use raw SQL. Always use the Prisma client."

    AI: "Got it! I'll use Prisma for all database operations."

    // ... 25 messages of back-and-forth coding ...

    // Message 28 (later in conversation):
    You: "Now add a function to get all users who signed up
          this month."

    AI: "Here's a function to query your users:

         const users = await db.query(
           'SELECT * FROM users WHERE created_at >= $1',
           [startOfMonth]
         );

         This uses a parameterized SQL query for safety."

    // You: "What happened to Prisma?!"
    // AI: "You're right, I apologize! Let me rewrite that
    //      with Prisma..."
That early message about using Prisma? It got pushed out of the context window. By message 28, the model has no idea you ever said it. It defaults to the most common pattern it knows — raw SQL — because it has lost the context that told it otherwise.
This is not a rare edge case. If you build anything beyond a trivial feature in a single conversation, you will hit this. Every single AI coding tool has this constraint. Some handle it better than others (more on that below), but none can escape it entirely.
Understanding Context Windows
Now that you have seen what happens, let us understand why it happens.
It's Tokens, Not Words
Context windows are measured in tokens, not words or characters. A token is a chunk of text — roughly 3 to 4 characters of English on average, though exact counts depend on the tokenizer. A common word is often a single token, while a longer word like "authentication" may be split into several. A short variable name like `id` is 1 token, and a line of code like `const user = await prisma.user.findUnique()` might be 12–15 tokens.
Here is a rough conversion to keep in your head:
- 1,000 tokens ≈ 750 words of English text
- 1,000 tokens ≈ 30–40 lines of code
- A typical AI response with a code block might be 500–2,000 tokens
This matters because code is token-expensive. A single file with 200 lines of code can eat 3,000–5,000 tokens. When your AI tool reads files into context, those tokens add up fast.
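You can get a feel for these numbers with a back-of-the-envelope estimator. This is a sketch using the roughly-4-characters-per-token rule of thumb, not a real tokenizer (real tokenizers like tiktoken split text into subword units, and code often tokenizes more densely):

```typescript
// Rough token estimator: a heuristic, not a real tokenizer.
// Assumes ~4 characters per token, which only holds on average
// for English text; real counts for code vary by tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 200-line file of typical code adds up fast:
const line = "const user = await prisma.user.findUnique({ where: { id } });\n";
const file = line.repeat(200);
console.log(estimateTokens(file), "tokens for one 200-line file");
```

Run it on any file in your project and you will see why letting a tool read a handful of files can consume thousands of tokens before you have typed a single message.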
Input vs. Output: It All Counts
Your messages count toward the context window. The AI's replies count too. File contents the tool reads in? Those count. System prompts the tool adds behind the scenes? Those count. Everything in the conversation is competing for space inside that window.
A typical long coding session might break down like this:
- System prompt (hidden, added by the tool): 2,000–5,000 tokens
- Your messages: 5,000–15,000 tokens
- AI responses: 15,000–40,000 tokens (AI writes a lot of code)
- Files read into context: 10,000–50,000 tokens
Add that up and you can blow through even a 200K token context window in a serious coding session.
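Summed up in code, using the high end of the ballpark ranges above (these figures are illustrative assumptions, not measurements from any particular tool):

```typescript
// High-end figures from the ranges above, in tokens.
const session = {
  systemPrompt: 5_000,   // hidden prompt added by the tool
  yourMessages: 15_000,
  aiResponses: 40_000,   // the AI writes a lot of code
  filesRead: 50_000,
};

const contextWindow = 200_000;
const used = Object.values(session).reduce((sum, n) => sum + n, 0);

console.log(`${used.toLocaleString()} of ${contextWindow.toLocaleString()} tokens used`);
// Over half the window is gone before the session even gets long.
```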
Different Models, Different Limits
As of March 2026, here are the context window sizes for the most common models used in AI coding tools:
| Model | Context Window | Roughly |
|---|---|---|
| Claude Opus 4 / Sonnet 4 | 200,000 tokens | ~150K words |
| GPT-4o | 128,000 tokens | ~96K words |
| GPT-4.5 | 128,000 tokens | ~96K words |
| Google Gemini 2.5 Pro | 1,000,000+ tokens | ~750K words |
| DeepSeek V3 | 128,000 tokens | ~96K words |
| Llama 3.1 (405B) | 128,000 tokens | ~96K words |
Bigger sounds better, right? Not always. Even Gemini's 1M token window has practical limits. Models tend to pay less attention to information in the middle of very long contexts — a well-documented phenomenon researchers call the "lost in the middle" problem. A 200K window where the model pays close attention to everything can outperform a 1M window where important details get buried.
Also, remember: a bigger context window does not mean the AI "remembers better." It just means more text fits before things get dropped. The model still processes the entire context from scratch every single time.
What AI Gets Wrong: The Illusion of Memory
Here is the most important thing to understand, and the thing most people get wrong:
AI does not remember anything.
When you are chatting with Claude in Cursor and it references something you said five messages ago, it is not "remembering" that message. The tool is sending your entire conversation history to the model every single time you hit enter. The model reads the whole thing, generates a response, and then immediately forgets everything. Next message? Same thing. Full conversation, fed in from scratch.
This has several implications that catch people off guard:
1. There is no persistent state
The AI does not have a "working memory" between messages. Every response is generated by reading the full conversation from the beginning. If message 3 is no longer in the context window, the AI has no other way to know what you said in message 3. It is gone.
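Here is a sketch of what a chat tool effectively does on every single turn: rebuild the context from the full history, keeping only what fits under the token limit. The token costs here are rough estimates; real tools use real tokenizers and vary in strategy:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// On every turn, the tool rebuilds the context from scratch.
// Messages are kept newest-first until the budget runs out;
// anything older is silently dropped and the model never sees it.
function buildContext(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = Math.ceil(history[i].content.length / 4); // rough estimate
    if (used + cost > maxTokens) break; // your early instructions fall off here
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```

Notice that nothing is "remembered" between calls: the model only ever sees whatever this kind of function returns for the current turn.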
2. Long conversations degrade in quality
Even before you hit the hard token limit, quality drops in long conversations. The model has to process more and more text, and attention gets spread thinner. Instructions from early messages carry less weight when buried under 50,000 tokens of subsequent conversation. This is why the AI starts giving "meh" responses deep into a session — not because it is tired, but because your critical context is drowning in noise.
3. The tool decides what gets dropped
When you exceed the context window, your AI coding tool has to decide what to cut. Different tools use different strategies — some drop the oldest messages, some try to summarize them, some selectively keep "important" parts. But you rarely get to control this process, and the tool's idea of what is important might not match yours.
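Two of those strategies, side by side. The `summarize` function below is a hypothetical placeholder; real tools typically ask the model itself to produce the summary, and details you cared about can get lost in that compression:

```typescript
type Msg = { role: string; content: string };

// Strategy 1: drop the oldest messages outright.
function dropOldest(history: Msg[], keep: number): Msg[] {
  return history.slice(-keep);
}

// Strategy 2: collapse older messages into a single summary message.
// `summarize` is a placeholder for whatever compression the tool uses.
function summarizeOldest(
  history: Msg[],
  keep: number,
  summarize: (msgs: Msg[]) => string
): Msg[] {
  if (history.length <= keep) return history;
  const older = history.slice(0, history.length - keep);
  const summary: Msg = { role: "system", content: summarize(older) };
  return [summary, ...history.slice(-keep)];
}
```

Either way, the decision about what survives is made by the tool, not by you, which is why front-loading and repeating critical instructions matters.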
4. "Let me re-read your requirements" is a red flag
When the AI says something like "Let me review what we've discussed so far" deep into a conversation, it might sound helpful. But what it actually means is: the model is struggling to find your original requirements in the context. It is a sign you are running into context limits and should probably start a new conversation.
Practical Tips: Working With Context Limits
You cannot make the context window bigger (that is up to the AI companies). But you can work smarter within it. Here is what actually helps:
1. Start Fresh Conversations — Often
This is the single most impactful habit you can build. Do not try to do your entire project in one conversation. When you finish a distinct piece of work — a feature, a bug fix, a refactor — start a new chat.
Think of it like clearing your desk between tasks. Each conversation should have a clear, focused scope:
- ✅ "Build the user authentication flow" (one conversation)
- ✅ "Add the dashboard charts" (new conversation)
- ✅ "Fix the API error handling" (new conversation)
- ❌ "Build my entire app from scratch" (one massive conversation that will fall apart)
2. Use CLAUDE.md / .cursorrules / .windsurfrules
These are files that persist your project context outside the conversation. Instead of spending your first 10 messages telling the AI about your project, you put that information in a file that loads automatically every session.
A good CLAUDE.md file includes your tech stack, coding conventions, project structure, and common commands. The AI reads it at the start of every conversation, so you never waste tokens re-explaining your setup. It is like giving the AI a cheat sheet before class starts.
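A minimal sketch of what such a file might look like; every project detail below is a placeholder to adapt, not a recommendation:

```markdown
# CLAUDE.md

## Tech stack
- Next.js 15, TypeScript (strict mode), Prisma, PostgreSQL

## Conventions
- All database access goes through the Prisma client (never raw SQL)
- Components live in src/components/, one component per file
- API routes return JSON

## Common commands
- npm run dev: start the dev server
- npx prisma migrate dev: run database migrations
```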
This is one of the highest-leverage things you can do as a vibe coder. If you are not using an instruction file, start today.
3. Break Big Tasks Into Smaller Ones
Instead of asking "build me a full e-commerce site," break it into pieces:
- Set up the project structure and database schema
- Build the product listing page
- Build the shopping cart
- Build the checkout flow
- Add authentication
- Add order history
Each piece gets its own conversation. Each conversation starts clean with full context. This is not just a context window trick — it produces better code too, because the AI can focus deeply on one thing instead of juggling an entire application in its head.
4. Front-Load Your Critical Context
The most important instructions should go at the beginning of your conversation (or in your instruction file). Models pay the most attention to the start and end of the context window. If you bury "always use TypeScript strict mode" in message 15 of a 40-message conversation, it is going to get lost.
When starting a new conversation for the next piece of work, begin with a brief summary of the relevant decisions from the previous session:
    "I'm continuing work on the dashboard. In the previous
    session we decided on:

    - Chart.js for data visualization
    - /api/metrics endpoint returns JSON
    - Dashboard component lives at src/components/Dashboard.tsx
    - Using server components for data fetching

    Now I need to add the date range filter."
5. Use Agentic Tools That Manage Context for You
Traditional chat-based AI tools stuff everything into the context window. Agentic coding tools are smarter — they read files on demand, run commands, and only pull in what is relevant for the current task.
Claude Code, for example, does not try to keep your entire codebase in context. It reads specific files when it needs them, runs searches to find relevant code, and manages its own context much more efficiently than a plain chat window. This means you can work on larger projects without hitting context limits as quickly.
Tool-Specific Tips: How Each Tool Handles Context
Not all AI coding tools manage context the same way. Here is how the major ones handle it differently — and what that means for your workflow.
Cursor
Cursor gives you the most visible control over context. You can see which files are included in the conversation, manually add or remove files with @file references, and the tool shows you a visual indicator of how much context you are using.
Tips for Cursor:
- Watch the context usage indicator — when it is getting full, start a new chat
- Use `@file` to include only the files relevant to your current task
- Use `.cursorrules` for persistent project context
- Composer mode uses context differently than inline chat — Composer sessions can get long fast
- If you are on a smaller model (GPT-4o-mini), context fills up much faster
Windsurf
Windsurf uses a feature called Cascade that is specifically designed to manage context intelligently. It maintains a memory layer that persists across conversations and automatically decides what context to include for each message.
Tips for Windsurf:
- Cascade's memory helps, but it is not perfect — still start new conversations for distinct tasks
- Use `.windsurfrules` for project-level instructions
- Windsurf's "Memories" feature stores key decisions between sessions — check it occasionally to make sure it captured the right things
- Cascade tends to include more context automatically, which means it can fill up faster on complex projects
Claude Code
Claude Code is an agentic tool that operates fundamentally differently. Instead of a chat window where you paste code, Claude Code reads and writes files directly, runs terminal commands, and manages its own context by pulling in only what it needs.
Tips for Claude Code:
- Use CLAUDE.md — it loads automatically and gives Claude persistent project context
- Claude Code tells you when context is getting long and suggests starting a new session — listen to it
- The `/compact` command compresses the conversation to free up context space
- Because Claude Code reads files on demand, it uses context more efficiently than chat-based tools
- For very large projects, Claude Code's ability to search and selectively read files is a major advantage over tools that try to fit everything in context at once
GitHub Copilot
Copilot's inline completions use a very small context window — just the current file and nearby files. Copilot Chat has a larger window but follows the same general rules as other chat tools.
Tips for Copilot:
- Keep the file you are editing focused and well-commented — Copilot uses the current file as its primary context
- Open related files in tabs — Copilot considers open files when generating suggestions
- For complex work, use Copilot Chat rather than relying on inline completions
The Bigger Picture: Context Is Your Most Precious Resource
Once you understand context windows, you start seeing your AI coding workflow differently. Every token you send to the AI is a resource. Every file you include, every long explanation, every back-and-forth correction — it all costs context.
The best vibe coders are not the ones who write the most clever prompts. They are the ones who manage context like a professional manages a budget:
- They invest context upfront in clear, specific instructions
- They avoid wasting context on vague requests that lead to back-and-forth corrections
- They know when to "cash out" and start a new conversation with fresh context
- They use external files (CLAUDE.md, .cursorrules) to store persistent context outside the window
Context window sizes will keep growing. Models will get better at handling long contexts. But the fundamental concept will not change anytime soon — the AI is always working with a limited window of text, and managing that window well is a core skill for building with AI.
What to Learn Next
Now that you understand context windows, here are the natural next steps to level up your AI coding workflow:
- What Is a CLAUDE.md File? — The single best way to give your AI persistent context across sessions. Learn how to write one that makes every conversation start strong.
- AI Prompting Guide for Coders — Now that you know about context limits, learn how to write prompts that use your context budget wisely. Clear, specific prompts waste fewer tokens on back-and-forth.
- What Is Agentic Coding? — Agentic tools like Claude Code manage context for you instead of dumping everything into a chat window. Learn how this new paradigm works and why it matters.
- Cursor Beginner's Guide — If you are using Cursor, learn all the features that help you manage context effectively.
- The Complete Guide to Vibe Coding — Context management is just one piece. Get the full picture of how to build real software with AI tools.
Frequently Asked Questions
What is a context window in AI?
A context window is the maximum amount of text (measured in tokens) that an AI model can process in a single conversation. Think of it as the AI's short-term memory. Everything you say, everything the AI replies, and any code or files included all count toward this limit. Once the conversation exceeds the window, the oldest parts get dropped and the AI literally cannot see them anymore.
Why does my AI coding tool forget what I said earlier?
Your AI is not actually remembering anything between messages. Each time you send a message, the entire conversation history is fed back into the model from scratch. When the conversation gets long enough to exceed the context window, the tool has to drop older messages to fit. That is why the AI seems to "forget" — those earlier messages are no longer being sent to the model at all.
How many tokens is a context window?
It varies by model. As of March 2026, Claude Opus 4 supports 200,000 tokens. GPT-4o supports 128,000 tokens. Google Gemini 2.5 Pro supports over 1 million tokens. One token is roughly 3–4 characters of English text, so 200K tokens is roughly 150,000 words — but remember, both your messages AND the AI's responses count toward that limit.
How do I know when I've hit the context window limit?
Common signs: the AI contradicts something it said earlier, it forgets a decision you agreed on, it re-introduces code you told it to remove, or it starts giving generic answers instead of project-specific ones. Some tools like Cursor show a visual indicator of context usage. Claude Code will explicitly warn you when context is getting long.
What is the best way to manage context window limits?
Start fresh conversations frequently — do not try to do your entire project in one chat. Use project instruction files like CLAUDE.md or .cursorrules to give the AI persistent context without wasting tokens. Break large tasks into smaller, focused conversations. And use agentic tools that can read files on demand instead of stuffing everything into the conversation upfront.