TL;DR: Rate limiting restricts how many API requests a client can make in a time window — for example, 100 requests per minute per IP address. Requests over the limit get a 429 Too Many Requests response. It protects your API from abuse, bots, DDoS attacks, runaway code bugs, and users who try to extract your entire dataset. AI-generated APIs almost never include it — but they almost always need it.

Why AI Coders Need to Know This

Rate limiting is one of the most commonly missing features in AI-generated APIs. When you ask Cursor or Claude Code to build an API, you typically get clean route handlers, proper error handling, maybe authentication — but rarely rate limiting. The AI optimizes for "working code," not "production-hardened code."

The consequences of missing rate limiting are real:

  • Cost explosions: A misconfigured frontend making infinite retry loops, or a scraper hitting your API-backed database, can rack up thousands of dollars in compute and database costs overnight
  • Availability attacks: A competitor or bot can send enough requests to overwhelm your server, taking your app offline for legitimate users
  • Credential stuffing: Without rate limiting on login endpoints, attackers can try millions of password combinations automatically
  • Data harvesting: Without limits, anyone can call your user or product API in a loop and download your entire dataset

Rate limiting is not optional for any API exposed to the internet. Adding it takes about 10 minutes and prevents these scenarios entirely.

Real Scenario

You built a contact form that sends emails. A bot finds it and submits 10,000 forms in a minute. Your email provider charges per send — you get a $500 bill and your domain gets flagged as a spam source. This happens to vibe coders constantly.

Prompt I Would Type

Add rate limiting to my Express API:
- Global limit: 200 requests per 15 minutes per IP
- Stricter limit for auth endpoints (login, register): 10 requests per 15 minutes
- AI endpoint (expensive): 20 requests per hour per authenticated user
- Return proper 429 responses with Retry-After headers
- Use Redis for distributed rate limiting (multiple servers)

What AI Generated

// Express with express-rate-limit + Redis store
import rateLimit from 'express-rate-limit';
import { RedisStore } from 'rate-limit-redis';
import { redis } from './redis';  // Your ioredis client (redis.call matches ioredis's API)

// Global rate limiter — applies to all routes
export const globalLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 200,                    // requests per window
  standardHeaders: 'draft-7', // Return RateLimit headers
  legacyHeaders: false,
  store: new RedisStore({
    sendCommand: (...args) => redis.call(...args),
  }),
  handler: (req, res) => {
    res.status(429).json({
      error: 'Too many requests',
      retryAfter: Number(res.getHeader('Retry-After'))  // set by express-rate-limit, in seconds
    });
  }
});

// Strict limiter for auth endpoints
export const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 10,
  store: new RedisStore({ sendCommand: (...args) => redis.call(...args) }),
  handler: (req, res) => {
    res.status(429).json({
      error: 'Too many login attempts. Please try again in 15 minutes.'
    });
  }
});

// Per-user limiter for expensive endpoints
export const aiLimiter = rateLimit({
  windowMs: 60 * 60 * 1000,  // 1 hour
  max: 20,
  keyGenerator: (req) => req.user?.id || req.ip, // Per user, not per IP
  store: new RedisStore({ sendCommand: (...args) => redis.call(...args) }),
});

// Apply in your app
app.use(globalLimiter);               // All routes
app.post('/auth/login', authLimiter, loginHandler);
app.post('/api/ai/generate', authenticate, aiLimiter, aiHandler);

For Next.js with Upstash (serverless-compatible):

// middleware.ts — rate limit at the edge
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
import { NextRequest, NextResponse } from 'next/server';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, '1 m'), // 100 per minute
  analytics: true,
});

export async function middleware(request: NextRequest) {
  // request.ip is only populated on some hosts (and was removed in Next.js 15),
  // so fall back to the x-forwarded-for header
  const ip =
    request.ip ??
    request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ??
    '127.0.0.1';
  const { success, limit, remaining, reset } = await ratelimit.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: 'Too many requests' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
          'X-RateLimit-Reset': new Date(reset).toISOString(),
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString()
        }
      }
    );
  }

  return NextResponse.next();
}

export const config = {
  matcher: '/api/:path*', // Only rate limit API routes
};

Understanding Each Part

The rate limit window

A rate limit has two parameters: a count and a window. "100 requests per minute" means at most 100 requests in any 60-second rolling window. Two algorithmic approaches:

  • Fixed window: Counts reset at specific times (like the top of every minute). Simple but allows burst attacks at window boundaries.
  • Sliding window: Counts requests in the last N seconds relative to now. Smoother and fairer; this is Upstash's default.
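
To make the difference concrete, here is a minimal in-memory sketch of both algorithms (illustrative only; real deployments should use a library like express-rate-limit or Upstash with a shared store):

```typescript
// Fixed window: the counter resets at each window boundary.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
    const entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      this.counts.set(key, { windowStart, count: 1 }); // new window, first hit
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Sliding window (log variant): count hits in the last windowMs relative to now.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const recent = (this.hits.get(key) ?? []).filter(t => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent); // over the limit: record the pruned log, reject
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

The burst problem with fixed windows: a client can send the full limit just before a boundary and the full limit again just after it, doubling the effective rate. The sliding variant avoids this at the cost of more bookkeeping.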

The 429 status code and Retry-After

When a request exceeds the limit, return HTTP 429 with a Retry-After header. This tells the client when they can try again. Well-behaved clients (and most HTTP libraries) will back off and retry at that time. Without Retry-After, clients often retry immediately in a loop, compounding the problem.
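On the client side, honoring that header means parsing it first. Here is a hypothetical helper that converts a Retry-After value into a wait time; per RFC 9110 the header may be either a number of seconds or an HTTP date:

```typescript
// Hypothetical helper: turn a Retry-After header value into milliseconds to
// wait. Per RFC 9110 the value is either delay-seconds or an HTTP date.
function retryAfterMs(header: string, now: number = Date.now()): number {
  const seconds = Number(header);
  if (!Number.isNaN(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  if (!Number.isNaN(date)) return Math.max(0, date - now);
  return 1000; // unparseable header: default to a 1-second backoff
}
```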

Key generation — who to limit

  • By IP: Default and simplest. Blocks bots and attacks. Can accidentally limit users behind shared IPs (corporate NAT, dorms).
  • By user ID: Fairer for authenticated users. Does not protect unauthenticated endpoints.
  • By API key: Standard for developer APIs. Allows different limits per customer tier.
  • Combined: IP for unauthenticated, user ID for authenticated — the most robust approach.
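
The combined approach can be sketched as a tiny key generator (the `req` shape here is assumed for illustration: `user` set by your auth middleware, `ip` by the framework):

```typescript
// Sketch of a combined key generator: authenticated requests are limited
// per user, anonymous ones per IP.
function rateLimitKey(req: { ip: string; user?: { id: string } }): string {
  return req.user ? `user:${req.user.id}` : `ip:${req.ip}`;
}
```

Prefixing the key (`user:` vs `ip:`) keeps the two namespaces from colliding in a shared store.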

In-memory vs. Redis store

Default in-memory rate limiting only works if you have one server. If you have multiple instances (horizontal scaling, multiple Lambda invocations), each keeps its own counter — a user who round-robins across N instances can get N times the limit. Redis stores share the counter across all instances.
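
The pattern Redis-backed stores use under the hood is roughly one shared INCR per key per window. A sketch against a minimal `RedisLike` interface (in production this would be your ioredis client):

```typescript
// Fixed-window pattern most Redis-backed stores use: a single shared INCR
// per key per window, so every instance sees the same counter.
interface RedisLike {
  incr(key: string): Promise<number>;
  pexpire(key: string, ms: number): Promise<number>;
}

async function allowRequest(
  redis: RedisLike,
  key: string,
  limit: number,
  windowMs: number
): Promise<boolean> {
  const count = await redis.incr(key);                  // atomic across all instances
  if (count === 1) await redis.pexpire(key, windowMs);  // first hit starts the window
  return count <= limit;
}
```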

What AI Gets Wrong About Rate Limiting

Not adding it at all

The most common mistake: simply not including rate limiting. Always ask explicitly: "Add rate limiting to all API endpoints."

In-memory store for multi-instance deployments

AI uses the default in-memory store (which resets on restart and does not share state). For any deployed app, use a Redis-backed store.

Same limits for all endpoints

One global limit does not fit all endpoints. Login and registration need much stricter limits (10/15min) than a read-only product list endpoint (1000/hour). AI applies the same limit everywhere.

Not returning helpful error messages

AI returns just 429 with no body, or an unhelpful "rate limit exceeded." Include the window, the limit, when they can retry, and (for public APIs) a link to documentation. This is the difference between a frustrating DX and a professional one.
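
A sketch of what a helpful 429 body might look like (the field names and docs URL are illustrative, not a standard):

```typescript
// Hypothetical helper for a descriptive 429 response body.
function rateLimitBody(limit: number, windowSeconds: number, retryAfterSeconds: number) {
  return {
    error: 'rate_limit_exceeded',
    message: `You have exceeded the limit of ${limit} requests per ` +
             `${windowSeconds} seconds. Try again in ${retryAfterSeconds} seconds.`,
    retryAfterSeconds,
    docs: 'https://example.com/docs/rate-limits', // link to your API docs
  };
}
```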

What to Learn Next

Next Step

Add express-rate-limit or Upstash Ratelimit to your next API project before you ship it. Start with a simple global limit of 100 requests per 15 minutes. Then add a stricter 10/15min limit to any authentication endpoints. Two lines of middleware code, enormous protection.

FAQ

What is rate limiting?

Rate limiting controls how many requests a client can make to your API in a time window. Requests over the limit receive a 429 Too Many Requests response. It prevents abuse, protects against automated attacks, controls costs, and ensures fair access for all users.

What status code should a rate-limited response return?

429 Too Many Requests is the standard status code for rate-limited responses. Include a Retry-After header indicating when the client can try again, and an X-RateLimit-Remaining header showing how many requests remain in the current window.

Should I rate limit by IP or by user?

Both, for different purposes. IP-based limits catch unauthenticated abuse. User-based limits are fairer for authenticated users behind shared IPs. Best practice: strict IP limits for unauthenticated endpoints, more generous per-user limits for authenticated requests.

Can Cloudflare handle rate limiting for me?

Yes. Cloudflare's free tier includes basic rate limiting rules at the CDN layer, before requests reach your server. This provides a good first line of defense. Application-level rate limiting (in your Express/Next.js code) adds finer control and works even without Cloudflare.

What is the difference between rate limiting and throttling?

Rate limiting hard-rejects requests over the limit with a 429 response. Throttling slows requests down by adding delays rather than rejecting them. Rate limiting is the standard approach for web APIs; throttling is more common for background processing and bandwidth-intensive operations.
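
To make the distinction concrete, a throttler answers "how long should this request wait?" rather than "allow or reject?". A minimal sketch:

```typescript
// Throttling sketch: space requests at least minGapMs apart by computing a
// delay from the timestamp of the last allowed request.
function throttleDelayMs(lastRequestAt: number, minGapMs: number, now: number): number {
  return Math.max(0, lastRequestAt + minGapMs - now);
}
```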