TL;DR: A token is a small chunk of text (roughly one word). AI models can only process a limited number of tokens at once — that limit is called the context window. Every message you send, every file you paste, and every response the AI gives all count against this limit. When a conversation fills up the context window, the AI literally can no longer see older messages — which is why it appears to "forget" what you told it. The fix: write tighter prompts, share only the code you need reviewed, and start fresh conversations for separate tasks.
What Is a Token? (The Non-Technical Explanation)
Before AI models can process text, they break it down into small pieces called tokens. Think of it like a butcher breaking down a whole animal into individual cuts before working with them. The AI doesn't see "Hello, how are you?" — it sees something more like [Hello][,][ how][ are][ you][?].
Tokens are not exactly words. They're roughly 3–4 characters each in English, which works out to about 0.75 words per token. Here are some real examples:
| Text | Approx. Tokens | Notes |
|---|---|---|
| "Hello, world!" | 4 tokens | Punctuation counts separately |
| 500-word blog post | ~650–700 tokens | Typical ratio: 1 word ≈ 1.3 tokens |
| 100-line JavaScript file | ~800–1,200 tokens | Code is usually token-dense |
| A full novel (80,000 words) | ~100,000+ tokens | Exceeds most tool limits |
| This entire article | ~4,000–4,500 tokens | Fits easily within a single exchange |
Why does this matter? Because tokens are the currency of AI. Every word you type, every line of code you paste, and every response the AI generates all cost tokens. And every AI tool has a limit on how many it can handle at once.
Quick Rule of Thumb
1,000 tokens ≈ 750 words ≈ roughly 3 pages of plain text. For code, assume tokens run higher — code has more special characters, indentation, and symbols per word compared to prose.
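The rule of thumb above can be turned into a quick back-of-envelope estimator. This is a hedged sketch, not a real tokenizer — production tools use subword tokenizers (byte-pair encoding and similar), so actual counts will differ, especially for code and non-English text:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule.

    A heuristic only: real tokenizers split on subwords, so the true
    count can vary noticeably, particularly for code.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, world!"))  # 13 chars -> 3 (a real tokenizer gives ~4)

essay = "word " * 750  # ~750 words of filler text
print(estimate_tokens(essay))  # roughly 940, in the ballpark of the 1,000-token rule
```

For anything where the count actually matters (API budgeting, trimming logic), use the tokenizer that matches your model rather than a character heuristic.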
What Is a Context Window?
Every AI model can only "see" a limited amount of text at any given moment. That limit is called the context window. Think of it like the AI's working memory — it can hold a certain amount of information in mind at once, and anything outside that window simply doesn't exist from its perspective.
The context window includes everything in the current conversation:
- Your messages (all of them, going back to the start)
- The AI's responses (all of them)
- Any files, code snippets, or documents you've pasted in
- Any system instructions the tool has set up behind the scenes
When you send a message, all of that goes in together. The AI generates its response with full awareness of everything in the window. But as the conversation grows, it creeps toward the limit — and that's when things get weird.
The Goldfish Memory Problem
AI doesn't have persistent memory between conversations (unless specifically designed to). Each new chat starts from zero. And within a long conversation, once old messages fall outside the context window, the AI can no longer access them. It's not being dumb — it literally cannot see them anymore.
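A minimal sketch makes the "falling out of the window" behavior concrete. This is a hypothetical simplification — real tools also protect system instructions and may summarize rather than drop — but the core mechanic is an oldest-first trim:

```python
def trim_to_budget(messages, budget, count_tokens):
    """Drop the oldest messages until the conversation fits the budget.

    messages:     list of (role, text) pairs, oldest first
    budget:       maximum tokens the model can see at once
    count_tokens: function returning a token count for a string
    """
    kept = list(messages)
    while kept and sum(count_tokens(text) for _, text in kept) > budget:
        kept.pop(0)  # the oldest message is the first to fall out of view
    return kept

# Toy example: count one "token" per word, allow only 6 in the window
history = [
    ("user", "use snake_case"),                # 2 tokens: your early rule
    ("assistant", "ok"),                       # 1 token
    ("user", "now build the dashboard page"),  # 5 tokens
]
visible = trim_to_budget(history, budget=6, count_tokens=lambda s: len(s.split()))
# The early snake_case instruction is exactly what gets dropped
```

Notice what survives: the most recent exchanges. Your earliest setup message is the first casualty, which is why front-loaded requirements are the first thing a long conversation loses.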
For a deeper look at how context windows are architected, see our dedicated guide: What Are Context Windows? A Plain-English Guide.
Why Your AI "Forgets" Mid-Conversation
Here's the scenario that frustrates everyone eventually:
- You start a coding session. You explain your project, your tech stack, your naming conventions.
- The AI follows your rules perfectly for the first dozen exchanges.
- An hour later, you notice it's using camelCase when you explicitly asked for snake_case. Or it's suggesting a library you said you weren't using. Or it keeps making a mistake you already corrected three times.
What happened? The conversation got long. Your early setup messages — the ones where you established your requirements — have been pushed out of the context window to make room for more recent exchanges. The AI isn't ignoring you. Those messages are gone from its view.
| Symptom | What's Actually Happening |
|---|---|
| AI ignores rules you set at the start | Those early messages are no longer in the context window |
| AI repeats a mistake you already fixed | The correction was pushed out of context |
| AI doesn't know what "the button" refers to | The code defining that button scrolled out of view |
| Responses get slower and less accurate | Long context is computationally expensive; quality degrades near limits |
| AI suddenly acts like it has no project context | Context window hit; many of your early messages are gone |
Different tools handle context limits differently. Claude.ai shows a visual warning when you're approaching the limit. ChatGPT silently drops old messages. Cursor manages context automatically but may silently exclude files it deems less relevant. The behavior varies, but the underlying constraint is the same.
Token Limits Across Major AI Coding Tools
Context windows have grown dramatically in the past two years. Here's where the major tools stand as of early 2026:
| Tool / Model | Context Window | Approx. Word Count | Notes |
|---|---|---|---|
| Claude (claude.ai / API) | 200,000 tokens | ~150,000 words | Largest in mainstream use; can hold an entire codebase |
| GPT-4o (ChatGPT) | 128,000 tokens | ~96,000 words | Strong for most projects; older GPT-4 had 8k–32k limits |
| Cursor (Agent/Composer) | Varies by model | Managed automatically | Cursor selects and trims context intelligently; uses Claude or GPT under the hood |
| GitHub Copilot Chat | ~10,000–16,000 tokens (effective) | ~7,500–12,000 words | Focuses on current file + immediate context; limited deep project awareness |
| Claude Code (CLI) | 200,000 tokens | ~150,000 words | Compresses context automatically; shows usage warning when approaching limit |
| Gemini 1.5 Pro | 1,000,000 tokens | ~750,000 words | Experimental; very long context; used in some coding tools via API |
These Numbers Change
Context window sizes are updated with new model releases. The figures above reflect early 2026 — check official documentation for current limits before making architecture decisions. Also note: a larger context window doesn't always mean better results. Quality can degrade at very long contexts even when technically supported.
To understand how context windows work at a deeper level — including how models handle very long inputs — see What Are Context Windows?
Bigger Context Window ≠ Better Results
More room sounds like it should always be better. In practice, there are real trade-offs:
Speed
Processing a long context takes more compute. A conversation with 100,000 tokens of history is noticeably slower than one with 5,000. If you're pasting huge files at the start of every session "just in case," you're paying a speed tax for context you may never use.
Attention Drift
AI models tend to pay more attention to the beginning and the end of their context window — a phenomenon sometimes called "lost in the middle." Critical instructions buried in the middle of a very long conversation may get relatively less weight than the same instruction at the top or bottom. This means a 200k context window doesn't guarantee the AI will remember message #47 as reliably as message #1.
Cost (for API users)
If you're building with the API rather than using a chat interface, token count directly drives cost. Input tokens and output tokens both have a price per thousand. A prompt that uses 50,000 tokens when 5,000 would have worked is ten times more expensive per call.
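The arithmetic is simple enough to sanity-check yourself. The per-token rates below are made up for illustration — check your provider's current pricing before budgeting:

```python
def api_cost(input_tokens, output_tokens, in_price_per_k, out_price_per_k):
    """Cost of one API call given per-1,000-token prices (illustrative)."""
    return (input_tokens / 1000) * in_price_per_k + (output_tokens / 1000) * out_price_per_k

# Hypothetical rates: $0.003 per 1k input tokens, $0.015 per 1k output tokens
bloated = api_cost(50_000, 1_000, 0.003, 0.015)  # dumps 50k tokens "just in case"
focused = api_cost(5_000, 1_000, 0.003, 0.015)   # shares only what's needed

# bloated ~= $0.165, focused ~= $0.030 per call:
# the input portion alone is 10x more expensive in the bloated prompt
```

Multiply that difference by thousands of calls per day and focused prompts stop being a style preference and start being a line item.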
Practical Strategies to Stay Under Token Limits
You don't need to obsess over token counts to work effectively with AI. These habits will serve you well across every tool:
1. One Task Per Conversation
The single most effective thing you can do. Don't build your entire app in one endless chat session. Use a new conversation for each meaningful task:
- New conversation to design the data model
- New conversation to build the authentication flow
- New conversation to debug the payment integration
This keeps each conversation focused, fast, and within safe context limits. It also makes it easier to re-run a task from scratch if the AI goes sideways.
2. Share Only What's Relevant
Resist the urge to paste your entire codebase. If you're asking about a bug in submitForm(), paste only that function — not the whole file. If the AI needs context about how a variable was defined, you can add that specific section. Most of the time, a focused snippet works better than a wall of code.
Wasteful (High Token Cost)
Here's my entire app.js file (400 lines). There's a bug
somewhere in the form submission. Can you find it?
[pastes 400 lines]
Efficient (Low Token Cost, Usually Better Results)
This function handles form submission in a React app.
When the user clicks submit, it throws:
"Cannot read properties of undefined (reading 'email')"
Here's the function:
[pastes 20-line function]
The form data comes from a controlled component using
useState. What's wrong?
3. Front-Load Critical Context
If you have requirements the AI must follow throughout a session — your tech stack, naming conventions, constraints — put them at the very top of your first message. Don't save them for message five. The start of the conversation is the safest place for important context.
Good Session Opener (Front-Loaded Context)
Project context for this session:
- Stack: Next.js 14, TypeScript, Prisma, PostgreSQL
- Styling: Tailwind CSS only (no inline styles)
- Naming: snake_case for DB columns, camelCase for JS
- No external libraries without asking me first
- All components should be server components unless
client interactivity is required
Task: Build a user profile page that shows their
recent orders. Here's the Prisma schema for orders:
[paste only the orders model, ~15 lines]
4. Summarize Instead of Continuing
When a conversation is getting long and you need to keep working on the same project, don't just keep going. Instead, ask the AI to summarize the decisions and code you've written so far, then start a new conversation and paste that summary at the top. You get a clean context window with all the important information preserved.
Prompt to Use Before Starting a New Session
Before we wrap this session: please write a concise
summary (bullet points) of:
1. What we built
2. The key decisions we made and why
3. Any constraints or conventions I established
4. What was left to do
I'll use this as context at the start of my next session.
5. Use Structured Prompts, Not Essays
A well-structured prompt uses fewer tokens and gets better results than the same information written as flowing paragraphs. Use bullet points for requirements, code blocks for code, and clear section headers when you have multiple pieces of context to share.
For more on how to write prompts that get better results and use fewer tokens, see the AI Prompting Guide for Coders.
When to Start a New Conversation
Knowing when to cut your losses and start fresh is a skill. Here are the clear signals:
- The AI forgets something you told it more than once. If you've re-stated the same requirement three times and it keeps violating it, the constraint has probably been pushed out of context. Start fresh and lead with that requirement.
- You're switching to a different part of the project. Finished the auth flow? Don't continue in that same conversation to build the dashboard. The old auth context is mostly noise now.
- Response quality noticeably drops. Long contexts sometimes produce more confused, hedging, or lower-quality responses. If the AI suddenly seems less capable, context bloat is a likely culprit.
- You pasted a lot of files early on. Large file dumps fill context fast. If you front-loaded 10 files and the conversation is now 30 exchanges deep, you're probably near the limit.
- The tool warns you. Claude shows explicit warnings. Cursor shows context usage indicators. If a tool is telling you you're running out of context, believe it and start fresh.
The 20-Message Rule of Thumb
For most vibe coding workflows, if a conversation has gone past 20–25 exchanges, consider whether it's time for a new session. This isn't a hard rule — but it's a useful signal to pause and ask: is the AI still performing at its best, or has context bloat crept in?
How Different Tools Handle Context
Each tool manages context a little differently. Understanding this saves frustration:
Claude (claude.ai)
Claude has the largest context window of the mainstream tools at 200,000 tokens — roughly the length of two novels. It shows you a visual indicator when you're approaching the limit. It handles very long conversations better than most, but quality still degrades as you approach the limit. Claude is a strong choice for tasks that genuinely require long context — like reviewing an entire codebase for consistency. For shorter tasks, a new conversation is still better. Learn more in the Claude Code Beginner's Guide.
ChatGPT / GPT-4o
GPT-4o supports 128,000 tokens. ChatGPT silently drops old messages when you hit the limit — it doesn't warn you. You'll notice when the AI starts acting like it forgot things. For coding workflows, this silent behavior is the most frustrating aspect of using ChatGPT for long sessions.
Cursor
Cursor is smart about context management. In Agent mode and Composer, it automatically selects the most relevant files from your codebase rather than dumping everything in. You can also explicitly add files to context using @filename syntax. This means you're less likely to hit raw token limits than in a raw chat interface — but you should still understand what Cursor is including. See the Cursor Beginner's Guide for how to manage context in Cursor specifically.
GitHub Copilot
Copilot's inline suggestions work on a narrow context — the current file, the current function, and sometimes files you have open. Copilot Chat has a larger context but still works best when you're focused on a specific file or feature. The @workspace command tells Copilot to search across your project, but it's more selective than tools with full codebase indexing. Learn more in our guide to what GitHub Copilot is and how it works.
How to Structure Prompts to Use Fewer Tokens
Every word in your prompt is a token being spent. Here are specific techniques that reduce token usage without losing information quality:
Cut the Preamble
Don't warm up your prompts. The AI doesn't need "I hope you can help me with this. I've been working on a project and I've run into a bit of an issue..." Just state the task.
Bloated Prompt (~80 tokens)
Hi! I was wondering if you could help me. I'm building
a web app and I'm not super experienced with JavaScript
but I've been learning and I have this function that
isn't working the way I expected. Could you take a look
at it and maybe explain what's going wrong?
Efficient Prompt (~20 tokens)
This JavaScript function isn't working as expected.
Explain the bug and fix it:
Use Code Comments as Your Spec
Instead of a long paragraph describing what you want, use a short comment block directly above your code. This is exactly how inline autocomplete tools like Copilot and Codeium are designed to work — and it uses far fewer tokens than a separate explanation.
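Here's what a spec-as-comment prompt might look like. The function is hypothetical — the point is that the compact comment block carries the entire requirement, no separate paragraph of explanation needed:

```python
# Validate a signup form dict: require "email" and "password" keys.
# Email must contain "@"; password must be at least 8 characters.
# Return a list of error strings (an empty list means valid).
def validate_signup(form: dict) -> list[str]:
    errors = []
    if "@" not in form.get("email", ""):
        errors.append("invalid email")
    if len(form.get("password", "")) < 8:
        errors.append("password too short")
    return errors
```

With an inline tool, you'd write only the comment and let the AI draft the body; the three comment lines cost far fewer tokens than a prose description of the same requirements.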
Avoid Repeating Yourself
If you've already established context in this session, don't re-state it in every message. A brief reference is enough: "continuing from above" or "using the same User model as before." Full re-statements eat tokens and dilute the AI's attention.
Prefer Bullet Points Over Paragraphs
Bullet points express requirements in fewer words than prose — and they're clearer for the AI to parse. "The form should validate email format, require all fields, and show inline error messages" works better as three bullets than one run-on sentence.
Ask for Shorter Responses When Appropriate
AI output also counts against your context window. If you're in a long session and want to preserve context space, you can ask: "Keep your response brief — just the code, no explanation." That 500-token explanation the AI would have given is 500 tokens not filling up your window.
For a comprehensive approach to writing better AI prompts, read the AI Prompting Guide for Coders — it covers token efficiency in the context of a full prompting strategy.
When Context Problems Look Like AI Incompetence
One of the most common frustrations with AI coding tools: "It keeps making the same mistake even after I correct it." Half the time, this isn't an AI quality issue — it's a context issue. The correction fell out of the window.
Before you conclude an AI tool is broken or bad at something, run through this checklist:
- Is this a long conversation? If you're 30+ messages in, start a new session and see if the issue persists.
- Did you paste a lot of code early? Large early pastes claim a big share of the window up front, so as the conversation continues, your earliest context gets pushed out sooner than it otherwise would.
- Is your requirement in the current context? In a new session, if you re-state the requirement clearly at the top, does the AI follow it? If yes, you had a context window problem, not an AI problem.
- Is the AI hallucinating or just missing context? These are different failure modes. Hallucination is making up false information confidently. Missing context is giving a correct answer that contradicts something it can no longer see. See our guide on how to debug AI-generated code for how to tell the difference.
What to Learn Next
Now that you understand tokens and context limits, here's where to go depending on what you're working on:
Context Windows
What Are Context Windows? Go deeper on how context windows work, why they vary between models, and how tools like Cursor and Claude manage them automatically.
Prompting
AI Prompting Guide for Coders: Write prompts that get better results, use fewer tokens, and don't waste your context window on preamble and filler.
Cursor
Cursor Beginner's Guide: Cursor's smart context management is one of its best features. Here's how to use @-mentions and Agent mode to control exactly what goes in your context.
Claude Code
Claude Code Beginner's Guide: Claude has the largest context window of any mainstream AI tool. Learn how to use it effectively for long coding sessions and whole-codebase tasks.
GitHub Copilot
What Is GitHub Copilot? Understand Copilot's context model — how it uses the current file, open tabs, and @workspace to build its answers.
Debugging
How to Debug AI-Generated Code: When AI keeps making the same mistake, learn to tell whether it's a context problem, a hallucination, or a real limitation of the model.
Fundamentals
What Is JavaScript? The language your AI writes most often — understand what it is so your prompts make more sense.
Tools
What Is Git? When a conversation fills your context window, your code history lives in Git — not in the AI's memory.
Frequently Asked Questions
What is a token?
A token is a small chunk of text — roughly 3–4 characters or about 0.75 words in English. AI models don't read word by word; they process these small chunks. "Hello, world!" is about 4 tokens. A 500-word article is roughly 650–700 tokens. Tokens cover both your input (what you send) and the AI's output (what it writes back). All of this counts toward your context window limit.
What is a context window?
A context window is the maximum amount of text an AI model can "see" at once — including your entire conversation history, any files you've attached, and the AI's previous responses. Think of it as the AI's working memory. Once a conversation fills up the context window, older messages start getting dropped, which is why the AI appears to "forget" earlier parts of your conversation. Context windows are measured in tokens.
Why does the AI forget things I told it earlier?
AI models have a fixed context window — a limit on how much text they can hold in "memory" at once. When a conversation gets long enough to fill that window, older messages are pushed out to make room for new ones. The AI isn't being forgetful — it literally can no longer see those earlier messages. The fix: start a new conversation and re-provide the critical context at the top, or keep your sessions focused so they don't balloon to the limit.
What are the token limits of the major AI coding tools?
As of early 2026: Claude supports a 200,000-token context window (roughly 150,000 words). GPT-4o supports up to 128,000 tokens. Cursor manages context automatically using the underlying model's window — it selects relevant files rather than dumping everything in. GitHub Copilot Chat's effective context is narrower, focused on the current file and immediate surroundings. Token limits change with model updates, so check official documentation for each tool.
How do I stay under token limits?
The most effective strategies: (1) Use one conversation per task instead of one endless thread. (2) Paste only the relevant function or section, not the entire file. (3) Front-load critical requirements at the very start of your conversation. (4) Use bullet points instead of paragraphs for requirements. (5) Cut the preamble — get straight to the ask. (6) Ask the AI to summarize decisions at the end of a session so you can carry them into a fresh context.
How do I know when I've hit the context limit?
Different tools handle this differently. Claude shows a warning when you're approaching the limit. ChatGPT silently drops the oldest messages. Cursor automatically manages context by prioritizing the most relevant files. Practically speaking: the AI starts ignoring requirements you set earlier, repeats mistakes you already corrected, gives answers that seem to contradict earlier decisions, or produces noticeably lower-quality responses. When you see these signs in a long session, start a new conversation.