TL;DR: Rate limiting controls how many requests a user can make to your API in a given time window. Without it, one bad actor (or one broken script) can crash your app, drain your API credits, or brute-force passwords. Implementation options: express-rate-limit for Express apps (5 lines of code), Upstash Ratelimit for serverless/Next.js (works with Vercel), or your hosting platform's built-in tools. Always rate limit auth endpoints, AI/LLM endpoints, and any endpoint that costs you money.

Why AI Coders Need This

You asked AI to build an API. It created beautiful endpoints — user registration, data fetching, maybe even an AI-powered feature that calls OpenAI. It all works. Ship it.

But your AI didn't add rate limiting. Here's what happens next:

  • A bot finds your login endpoint and tries 50,000 password combinations in an hour
  • Someone discovers your /api/generate endpoint and burns through $200 of OpenAI credits overnight
  • A user's broken frontend script sends the same request 100 times per second, crashing your database
  • A competitor's scraper downloads your entire content library in 10 minutes

All of these are real scenarios that happen to unprotected APIs every day. Rate limiting is the seatbelt of API development — you don't think about it until you need it, and by then it's too late.

🎯 Real Scenario

You tell your AI: "Add rate limiting to my Express API — 100 requests per minute per user, and 10 per minute for the login endpoint."

What AI Generated: Express Rate Limiting

Here's the typical output when you ask AI to add rate limiting to an Express app:

// middleware/rateLimiter.js
import rateLimit from 'express-rate-limit';

// General API rate limit: 100 requests per minute
export const apiLimiter = rateLimit({
  windowMs: 60 * 1000,       // 1 minute
  max: 100,                   // 100 requests per window
  standardHeaders: true,      // Send standard RateLimit-* headers
  legacyHeaders: false,       // Disable legacy X-RateLimit-* headers
  message: {
    error: 'Too many requests',
    message: 'Please try again in a minute',
    retryAfter: 60
  }
});

// Strict limit for auth endpoints: 10 per minute
export const authLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 10,
  standardHeaders: true,
  legacyHeaders: false,
  message: {
    error: 'Too many login attempts',
    message: 'Please try again in a minute',
    retryAfter: 60
  }
});

// Usage in your Express app:
// app.use('/api/', apiLimiter);
// app.use('/api/auth/', authLimiter);

This is actually pretty good. But let's understand each piece so you know what to change.

Understanding Each Part

windowMs — The Time Window

This is how long the counter runs before resetting. 60 * 1000 means 60,000 milliseconds = 1 minute. After each minute, everyone's counter resets to zero. Common windows:

  • 1 minute (60 * 1000) — most common for API endpoints
  • 15 minutes (15 * 60 * 1000) — common for auth endpoints
  • 1 hour (60 * 60 * 1000) — for expensive operations like AI generation

max — The Request Limit

How many requests one client can make during the window. When they hit this number, all subsequent requests get a 429 Too Many Requests response until the window resets.

standardHeaders — Tell Users Their Limits

When true, every response includes headers like:

RateLimit-Limit: 100
RateLimit-Remaining: 73
RateLimit-Reset: 27

This tells the client: "You have 100 requests per window, 73 remaining, and the window resets in 27 seconds." (express-rate-limit follows the IETF draft and sends a seconds countdown; some other APIs send a Unix timestamp here instead.) Good API citizenship.
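Those headers are useful on the client side too. As a sketch (the helper name `secondsUntilRetry` is hypothetical, not a library API), here is how a well-behaved client might decide how long to back off, handling both the seconds-countdown and Unix-timestamp conventions seen in the wild:

```javascript
// Given a 429 response's headers, work out how many seconds to wait
// before retrying. Prefers Retry-After, falls back to RateLimit-Reset.
function secondsUntilRetry(headers, nowMs = Date.now()) {
  const retryAfter = headers.get('Retry-After');
  if (retryAfter !== null) return Number(retryAfter);
  const reset = headers.get('RateLimit-Reset');
  if (reset !== null) {
    const n = Number(reset);
    // Heuristic: values above ~1e9 look like Unix timestamps (seconds);
    // smaller values are a seconds countdown per the IETF draft.
    return n > 1e9 ? Math.max(0, Math.ceil(n - nowMs / 1000)) : n;
  }
  return 60; // no hints from the server: assume one full window
}

// Usage with fetch (sketch):
// const res = await fetch('/api/data');
// if (res.status === 429) {
//   const wait = secondsUntilRetry(res.headers);
//   await new Promise(r => setTimeout(r, wait * 1000));
// }
```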

IP-Based vs User-Based Limiting

By default, express-rate-limit uses the IP address. This works for most cases, but has a catch: users behind the same corporate VPN or coffee shop WiFi share an IP. For authenticated endpoints, you want to limit by user ID instead:

const userLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  keyGenerator: (req) => {
    // Use user ID if authenticated, fall back to IP
    return req.user?.id || req.ip;
  }
});

Rate Limiting in Next.js and Serverless

express-rate-limit stores counters in memory — which breaks on serverless platforms like Vercel because each function invocation is a separate instance with no shared memory.

For serverless, use Upstash Ratelimit — it stores counters in Redis (Upstash provides a serverless Redis):

// lib/ratelimit.js
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, '1 m'),
  analytics: true,
  prefix: 'api',
});

// Usage in a Next.js API route:
// app/api/generate/route.js
import { ratelimit } from '@/lib/ratelimit';
import { headers } from 'next/headers';

export async function POST(request) {
  const headersList = headers(); // Next.js 15+: await headers()
  // x-forwarded-for may hold a comma-separated chain; the first entry is the client
  const ip = headersList.get('x-forwarded-for')?.split(',')[0]?.trim() ?? '127.0.0.1';

  const { success, limit, remaining, reset } = await ratelimit.limit(ip);

  if (!success) {
    return Response.json(
      { error: 'Rate limit exceeded' },
      {
        status: 429,
        headers: {
          'RateLimit-Limit': limit.toString(),
          'RateLimit-Remaining': remaining.toString(),
          'RateLimit-Reset': reset.toString(),
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
        }
      }
    );
  }

  // ... your actual API logic
}

💡 Sliding Window vs Fixed Window

Fixed window: Counter resets at exact intervals (e.g., every minute on the minute). Problem: someone can send 100 requests at 11:59:59 and 100 more at 12:00:01 — 200 requests in 2 seconds.

Sliding window: The window slides with each request. Much smoother — recommended for most use cases. Upstash uses sliding window by default.
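To make the difference concrete, here is a minimal sliding-window-log limiter in plain JavaScript. It is a sketch for illustration, not a production implementation (real libraries typically use memory-efficient approximations rather than a full timestamp log):

```javascript
// Sliding-window-log limiter: keep a timestamp log per key and count
// only requests that fall inside the last windowMs.
class SlidingWindowLimiter {
  constructor(max, windowMs) {
    this.max = max;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> array of request timestamps (ms)
  }

  // Returns true if the request is allowed, false if rate limited.
  allow(key, now = Date.now()) {
    const log = (this.hits.get(key) || []).filter(t => now - t < this.windowMs);
    if (log.length >= this.max) {
      this.hits.set(key, log);
      return false;
    }
    log.push(now);
    this.hits.set(key, log);
    return true;
  }
}

// The fixed-window burst problem from above: 100 requests at 11:59:59
// and 100 more at 12:00:01 both succeed under a fixed window, but a
// sliding window sees 200 requests inside one 60-second span and
// rejects the entire second burst.
const limiter = new SlidingWindowLimiter(100, 60_000);
const t = Date.parse('2024-01-01T11:59:59Z');
let firstBurst = 0, secondBurst = 0;
for (let i = 0; i < 100; i++) if (limiter.allow('ip-1', t)) firstBurst++;
for (let i = 0; i < 100; i++) if (limiter.allow('ip-1', t + 2000)) secondBurst++;
console.log(firstBurst, secondBurst); // 100 0
```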

What Endpoints to Protect (Priority Order)

| Endpoint Type | Why | Suggested Limit |
|---|---|---|
| Login / Signup | Brute force prevention | 5-10 / minute |
| Password Reset | Email bombing prevention | 3-5 / hour |
| AI/LLM Calls | Cost protection ($$$) | 10-20 / hour |
| Payment Endpoints | Fraud prevention | 5-10 / minute |
| File Uploads | Storage/bandwidth abuse | 10-20 / hour |
| Search / Query | Database protection | 30-60 / minute |
| General API | Overall protection | 100-200 / minute |

Start with auth and AI/LLM endpoints. Those are the highest-risk targets. A brute-forced login or an abused OpenAI endpoint can cost you money or compromise your users immediately.
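One way to wire this up is to give each class of endpoint its own limiter instance. The sketch below uses a tiny hand-rolled fixed-window middleware (the `makeLimiter` factory is hypothetical; with express-rate-limit you would simply create several `rateLimit(...)` instances and mount them the same way):

```javascript
// Per-route limits via a factory: each call returns an independent
// Express-style middleware with its own in-memory counters.
function makeLimiter(max, windowMs) {
  const counters = new Map(); // key -> { count, windowStart }
  return function limiter(req, res, next) {
    const key = req.ip;
    const now = Date.now();
    let entry = counters.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      entry = { count: 0, windowStart: now }; // new fixed window
      counters.set(key, entry);
    }
    entry.count++;
    if (entry.count > max) {
      res.statusCode = 429;
      res.setHeader('Retry-After', Math.ceil((entry.windowStart + windowMs - now) / 1000));
      return res.end(JSON.stringify({ error: 'Rate limit exceeded' }));
    }
    next();
  };
}

// Different endpoints, different limits (mirrors the table above):
// app.use('/api/auth/', makeLimiter(10, 60_000));        // brute force
// app.use('/api/generate', makeLimiter(20, 3_600_000));  // AI cost
// app.use('/api/', makeLimiter(100, 60_000));            // general
```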

What AI Gets Wrong About Rate Limiting

🚫 Mistake 1: In-Memory Storage on Serverless

AI defaults to express-rate-limit everywhere, including Vercel/Netlify deployments. In-memory stores don't work on serverless — each function invocation is a fresh instance. Use Upstash, Redis, or a database-backed store for serverless environments.

🚫 Mistake 2: Same Limits for Everything

AI often applies one rate limit globally: "100 requests per minute for all endpoints." This means your login endpoint (which should be 5-10/min) has the same limit as your read-heavy listing page (which could safely handle 200/min). Different endpoints need different limits.

🚫 Mistake 3: No Retry-After Header

AI returns a 429 status but doesn't include the Retry-After header telling the client when they can try again. Without it, clients just keep retrying immediately — making the problem worse. Always include Retry-After in your 429 responses.

🚫 Mistake 4: Trusting X-Forwarded-For

AI uses req.headers['x-forwarded-for'] to get the client IP, but this header can be spoofed. Behind a reverse proxy (nginx, Cloudflare), you need to configure which proxy headers to trust. Express has app.set('trust proxy', 1) for this — AI often forgets it.
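As a sketch of the settings involved (assuming an Express `app` created elsewhere; the hop counts here are illustrative):

```javascript
// Assumes an existing Express app. The number tells Express how many
// proxy hops to trust when reading the client IP from X-Forwarded-For.
//
// Behind one reverse proxy hop (nginx, a single load balancer):
// app.set('trust proxy', 1);
//
// Behind Cloudflare in front of nginx (two hops):
// app.set('trust proxy', 2);
//
// Avoid `app.set('trust proxy', true)` in production: it trusts every
// hop, which lets clients spoof X-Forwarded-For and dodge IP limits.
```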

🚫 Mistake 5: No Rate Limiting on AI/LLM Endpoints

AI builds an endpoint that calls OpenAI or Claude's API but doesn't add rate limiting to it. Each call costs real money. Without limits, one user (or one bot) can run up a $500 bill overnight. Always rate limit endpoints that cost you money per request.

How to Test Your Rate Limiting

Don't ship rate limiting without testing it. Here's a quick way to verify:

# Send 15 rapid requests to your auth endpoint
for i in $(seq 1 15); do
  echo "Request $i: $(curl -s -o /dev/null -w '%{http_code}' \
    -X POST http://localhost:3000/api/auth/login \
    -H 'Content-Type: application/json' \
    -d '{"email":"test@test.com","password":"test"}')"
done

# Expected output (with 10/min limit):
# Request 1-10: 200 (or 401)
# Request 11+: 429

If you see 429 responses after hitting your limit, it's working. If every request returns 200, your rate limiting isn't applied — check your middleware order.

Rate Limiting Tools Compared

| Tool | Best For | Storage | Free Tier |
|---|---|---|---|
| express-rate-limit | Express apps on traditional servers | In-memory (or Redis with adapter) | Open source |
| Upstash Ratelimit | Next.js, Vercel, serverless | Upstash Redis | 10K requests/day |
| Cloudflare Rate Limiting | Edge-level protection | Cloudflare edge | 1 rule free |
| Vercel Firewall | Vercel-hosted apps | Vercel edge | Pro plan |
| Redis + custom | Full control, any platform | Redis | Depends on Redis host |

Recommendation for most vibe coders: Use express-rate-limit for Express apps and Upstash for Next.js/serverless. Add Cloudflare on top for DDoS protection if your app gets real traffic.

Frequently Asked Questions

What limits should I set?

Common defaults: 60-100/min for general API endpoints, 5-10/min for auth endpoints, 10-20/hour for expensive operations (AI calls, file uploads). Start conservative — you can always increase later.

Do I need rate limiting if my hosting platform has DDoS protection?

Yes. Platform DDoS protection doesn't protect your application logic. Someone can still drain your OpenAI API credits, brute-force logins, or overload your database. You still need application-level rate limiting.

What's the difference between rate limiting and throttling?

Rate limiting blocks requests that exceed the limit (429 error). Throttling slows them down by adding delays. Most APIs use rate limiting because it's simpler and more predictable.
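For illustration, throttling can be sketched as a wrapper that spaces calls out instead of rejecting them (the `throttle` helper below is a hypothetical sketch, not a library API):

```javascript
// Wrap an async function so successive calls run at least intervalMs
// apart: excess calls are delayed, never rejected.
function throttle(fn, intervalMs) {
  let nextFree = 0; // earliest timestamp (ms) the next call may run
  return async (...args) => {
    const now = Date.now();
    const wait = Math.max(0, nextFree - now);
    nextFree = Math.max(now, nextFree) + intervalMs;
    if (wait > 0) await new Promise(r => setTimeout(r, wait));
    return fn(...args);
  };
}

// Usage (sketch): three immediate calls run at ~0ms, ~30ms, and ~60ms.
// const slowEcho = throttle(async (x) => x, 30);
// await Promise.all([slowEcho(1), slowEcho(2), slowEcho(3)]);
```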

Should I rate limit by IP address or by user?

Both. IP-based for unauthenticated endpoints (login, signup). User/key-based for authenticated endpoints. Users behind the same VPN share an IP, so user-based is fairer for authenticated routes.

What should my API return when a client hits the limit?

HTTP 429 (Too Many Requests) with a Retry-After header and a clear JSON error message. Never silently drop requests — that makes debugging impossible for legitimate users.