TL;DR: Rate limiting controls how many requests a user can make to your API in a given time window. Without it, one bad actor (or one broken script) can crash your app, drain your API credits, or brute-force passwords. Implementation options: express-rate-limit for Express apps (5 lines of code), Upstash Ratelimit for serverless/Next.js (works with Vercel), or your hosting platform's built-in tools. Always rate limit auth endpoints, AI/LLM endpoints, and any endpoint that costs you money.

Why AI Coders Need This

You asked AI to build an API. It created beautiful endpoints — user registration, data fetching, maybe even an AI-powered feature that calls OpenAI. It all works. Ship it.

But your AI didn't add rate limiting. Here's what happens next:

  • A bot finds your login endpoint and tries 50,000 password combinations in an hour
  • Someone discovers your /api/generate endpoint and burns through $200 of OpenAI credits overnight
  • A user's broken frontend script sends the same request 100 times per second, crashing your database
  • A competitor's scraper downloads your entire content library in 10 minutes

All of these are real scenarios that happen to unprotected APIs every day. Rate limiting is the seatbelt of API development — you don't think about it until you need it, and by then it's too late.

🎯 Real Scenario

You tell your AI: "Add rate limiting to my Express API — 100 requests per minute per user, and 10 per minute for the login endpoint."

What AI Generated: Express Rate Limiting

Here's the typical output when you ask AI to add rate limiting to an Express app:

// middleware/rateLimiter.js
import rateLimit from 'express-rate-limit';

// General API rate limit: 100 requests per minute
export const apiLimiter = rateLimit({
  windowMs: 60 * 1000,       // 1 minute
  max: 100,                   // 100 requests per window
  standardHeaders: true,      // Send standard RateLimit-* headers
  legacyHeaders: false,       // Disable legacy X-RateLimit-* headers
  message: {
    error: 'Too many requests',
    message: 'Please try again in a minute',
    retryAfter: 60
  }
});

// Strict limit for auth endpoints: 10 per minute
export const authLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 10,
  standardHeaders: true,
  legacyHeaders: false,
  message: {
    error: 'Too many login attempts',
    message: 'Please try again in a minute',
    retryAfter: 60
  }
});

// Usage in your Express app:
// app.use('/api/', apiLimiter);
// app.use('/api/auth/', authLimiter);

This is actually pretty good. But let's understand each piece so you know what to change.

Understanding Each Part

windowMs — The Time Window

This is how long the counter runs before resetting. 60 * 1000 means 60,000 milliseconds = 1 minute. After each minute, everyone's counter resets to zero. Common windows:

  • 1 minute (60 * 1000) — most common for API endpoints
  • 15 minutes (15 * 60 * 1000) — common for auth endpoints
  • 1 hour (60 * 60 * 1000) — for expensive operations like AI generation

max — The Request Limit

How many requests one client can make during the window. When they hit this number, all subsequent requests get a 429 Too Many Requests response until the window resets.

standardHeaders — Tell Users Their Limits

When true, every response includes headers like:

RateLimit-Limit: 100
RateLimit-Remaining: 73
RateLimit-Reset: 27

This tells the client: "You have 100 requests per window, 73 remaining, and the window resets in 27 seconds." (express-rate-limit follows the IETF draft and sends a seconds countdown; some other APIs send a Unix timestamp here instead.) Good API citizenship.
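Those headers are useful on the client side too. As a sketch (the helper name `secondsUntilRetry` is hypothetical, not a library API), here is how a well-behaved client might decide how long to back off, handling both the seconds-countdown and Unix-timestamp conventions seen in the wild:

```javascript
// Given a 429 response's headers, work out how many seconds to wait
// before retrying. Prefers Retry-After, falls back to RateLimit-Reset.
function secondsUntilRetry(headers, nowMs = Date.now()) {
  const retryAfter = headers.get('Retry-After');
  if (retryAfter !== null) return Number(retryAfter);
  const reset = headers.get('RateLimit-Reset');
  if (reset !== null) {
    const n = Number(reset);
    // Heuristic: values above ~1e9 look like Unix timestamps (seconds);
    // smaller values are a seconds countdown per the IETF draft.
    return n > 1e9 ? Math.max(0, Math.ceil(n - nowMs / 1000)) : n;
  }
  return 60; // no hints from the server: assume one full window
}

// Usage with fetch (sketch):
// const res = await fetch('/api/data');
// if (res.status === 429) {
//   const wait = secondsUntilRetry(res.headers);
//   await new Promise(r => setTimeout(r, wait * 1000));
// }
```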

IP-Based vs User-Based Limiting

By default, express-rate-limit uses the IP address. This works for most cases, but has a catch: users behind the same corporate VPN or coffee shop WiFi share an IP. For authenticated endpoints, you want to limit by user ID instead:

const userLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  keyGenerator: (req) => {
    // Use user ID if authenticated, fall back to IP
    return req.user?.id || req.ip;
  }
});

Rate Limiting in Next.js and Serverless

express-rate-limit stores counters in memory — which breaks on serverless platforms like Vercel because each function invocation is a separate instance with no shared memory.

For serverless, use Upstash Ratelimit — it stores counters in Redis (Upstash provides a serverless Redis):

// lib/ratelimit.js
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, '1 m'),
  analytics: true,
  prefix: 'api',
});

// Usage in a Next.js API route:
// app/api/generate/route.js
import { ratelimit } from '@/lib/ratelimit';
import { headers } from 'next/headers';

export async function POST(request) {
  const headersList = headers(); // Next.js 15+: await headers()
  // x-forwarded-for may hold a comma-separated chain; the first entry is the client
  const ip = headersList.get('x-forwarded-for')?.split(',')[0]?.trim() ?? '127.0.0.1';

  const { success, limit, remaining, reset } = await ratelimit.limit(ip);

  if (!success) {
    return Response.json(
      { error: 'Rate limit exceeded' },
      {
        status: 429,
        headers: {
          'RateLimit-Limit': limit.toString(),
          'RateLimit-Remaining': remaining.toString(),
          'RateLimit-Reset': reset.toString(),
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
        }
      }
    );
  }

  // ... your actual API logic
}

💡 Sliding Window vs Fixed Window

Fixed window: Counter resets at exact intervals (e.g., every minute on the minute). Problem: someone can send 100 requests at 11:59:59 and 100 more at 12:00:01 — 200 requests in 2 seconds.

Sliding window: The window slides with each request. Much smoother — recommended for most use cases. Upstash uses sliding window by default.
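To make the difference concrete, here is a minimal sliding-window-log limiter in plain JavaScript. It is a sketch for illustration, not a production implementation (real libraries typically use memory-efficient approximations rather than a full timestamp log):

```javascript
// Sliding-window-log limiter: keep a timestamp log per key and count
// only requests that fall inside the last windowMs.
class SlidingWindowLimiter {
  constructor(max, windowMs) {
    this.max = max;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> array of request timestamps (ms)
  }

  // Returns true if the request is allowed, false if rate limited.
  allow(key, now = Date.now()) {
    const log = (this.hits.get(key) || []).filter(t => now - t < this.windowMs);
    if (log.length >= this.max) {
      this.hits.set(key, log);
      return false;
    }
    log.push(now);
    this.hits.set(key, log);
    return true;
  }
}

// The fixed-window burst problem from above: 100 requests at 11:59:59
// and 100 more at 12:00:01 both succeed under a fixed window, but a
// sliding window sees 200 requests inside one 60-second span and
// rejects the entire second burst.
const limiter = new SlidingWindowLimiter(100, 60_000);
const t = Date.parse('2024-01-01T11:59:59Z');
let firstBurst = 0, secondBurst = 0;
for (let i = 0; i < 100; i++) if (limiter.allow('ip-1', t)) firstBurst++;
for (let i = 0; i < 100; i++) if (limiter.allow('ip-1', t + 2000)) secondBurst++;
console.log(firstBurst, secondBurst); // 100 0
```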

What Endpoints to Protect (Priority Order)

| Endpoint Type | Why | Suggested Limit |
|---|---|---|
| Login / Signup | Brute force prevention | 5-10 / minute |
| Password Reset | Email bombing prevention | 3-5 / hour |
| AI/LLM Calls | Cost protection ($$$) | 10-20 / hour |
| Payment Endpoints | Fraud prevention | 5-10 / minute |
| File Uploads | Storage/bandwidth abuse | 10-20 / hour |
| Search / Query | Database protection | 30-60 / minute |
| General API | Overall protection | 100-200 / minute |

Start with auth and AI/LLM endpoints. Those are the highest-risk targets. A brute-forced login or an abused OpenAI endpoint can cost you money or compromise your users immediately.
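One way to wire this up is to give each class of endpoint its own limiter instance. The sketch below uses a tiny hand-rolled fixed-window middleware (the `makeLimiter` factory is hypothetical; with express-rate-limit you would simply create several `rateLimit(...)` instances and mount them the same way):

```javascript
// Per-route limits via a factory: each call returns an independent
// Express-style middleware with its own in-memory counters.
function makeLimiter(max, windowMs) {
  const counters = new Map(); // key -> { count, windowStart }
  return function limiter(req, res, next) {
    const key = req.ip;
    const now = Date.now();
    let entry = counters.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      entry = { count: 0, windowStart: now }; // new fixed window
      counters.set(key, entry);
    }
    entry.count++;
    if (entry.count > max) {
      res.statusCode = 429;
      res.setHeader('Retry-After', Math.ceil((entry.windowStart + windowMs - now) / 1000));
      return res.end(JSON.stringify({ error: 'Rate limit exceeded' }));
    }
    next();
  };
}

// Different endpoints, different limits (mirrors the table above):
// app.use('/api/auth/', makeLimiter(10, 60_000));        // brute force
// app.use('/api/generate', makeLimiter(20, 3_600_000));  // AI cost
// app.use('/api/', makeLimiter(100, 60_000));            // general
```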

What AI Gets Wrong About Rate Limiting

🚫 Mistake 1: In-Memory Storage on Serverless

AI defaults to express-rate-limit everywhere, including Vercel/Netlify deployments. In-memory stores don't work on serverless — each function invocation is a fresh instance. Use Upstash, Redis, or a database-backed store for serverless environments.

🚫 Mistake 2: Same Limits for Everything

AI often applies one rate limit globally: "100 requests per minute for all endpoints." This means your login endpoint (which should be 5-10/min) has the same limit as your read-heavy listing page (which could safely handle 200/min). Different endpoints need different limits.

🚫 Mistake 3: No Retry-After Header

AI returns a 429 status but doesn't include the Retry-After header telling the client when they can try again. Without it, clients just keep retrying immediately — making the problem worse. Always include Retry-After in your 429 responses.

🚫 Mistake 4: Trusting X-Forwarded-For

AI uses req.headers['x-forwarded-for'] to get the client IP, but this header can be spoofed. Behind a reverse proxy (nginx, Cloudflare), you need to configure which proxy headers to trust. Express has app.set('trust proxy', 1) for this — AI often forgets it.
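As a sketch of the settings involved (assuming an Express `app` created elsewhere; the hop counts here are illustrative):

```javascript
// Assumes an existing Express app. The number tells Express how many
// proxy hops to trust when reading the client IP from X-Forwarded-For.
//
// Behind one reverse proxy hop (nginx, a single load balancer):
// app.set('trust proxy', 1);
//
// Behind Cloudflare in front of nginx (two hops):
// app.set('trust proxy', 2);
//
// Avoid `app.set('trust proxy', true)` in production: it trusts every
// hop, which lets clients spoof X-Forwarded-For and dodge IP limits.
```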

🚫 Mistake 5: No Rate Limiting on AI/LLM Endpoints

AI builds an endpoint that calls OpenAI or Claude's API but doesn't add rate limiting to it. Each call costs real money. Without limits, one user (or one bot) can run up a $500 bill overnight. Always rate limit endpoints that cost you money per request.

How to Test Your Rate Limiting

Don't ship rate limiting without testing it. Here's a quick way to verify:

# Send 15 rapid requests to your auth endpoint
for i in $(seq 1 15); do
  echo "Request $i: $(curl -s -o /dev/null -w '%{http_code}' \
    -X POST http://localhost:3000/api/auth/login \
    -H 'Content-Type: application/json' \
    -d '{"email":"test@test.com","password":"test"}')"
done

# Expected output (with 10/min limit):
# Request 1-10: 200 (or 401)
# Request 11+: 429

If you see 429 responses after hitting your limit, it's working. If every request returns 200, your rate limiting isn't applied — check your middleware order.

Rate Limiting Tools Compared

| Tool | Best For | Storage | Free Tier |
|---|---|---|---|
| express-rate-limit | Express apps on traditional servers | In-memory (or Redis with adapter) | Open source |
| Upstash Ratelimit | Next.js, Vercel, serverless | Upstash Redis | 10K requests/day |
| Cloudflare Rate Limiting | Edge-level protection | Cloudflare edge | 1 rule free |
| Vercel Firewall | Vercel-hosted apps | Vercel edge | Pro plan |
| Redis + custom | Full control, any platform | Redis | Depends on Redis host |

Recommendation for most vibe coders: Use express-rate-limit for Express apps and Upstash for Next.js/serverless. Add Cloudflare on top for DDoS protection if your app gets real traffic.

Frequently Asked Questions

What limits should I set?

Common defaults: 60-100/min for general API endpoints, 5-10/min for auth endpoints, 10-20/hour for expensive operations (AI calls, file uploads). Start conservative — you can always increase later.

Do I need rate limiting if my hosting platform has DDoS protection?

Yes. Platform DDoS protection doesn't protect your application logic. Someone can still drain your OpenAI API credits, brute-force logins, or overload your database. You still need application-level rate limiting.

What's the difference between rate limiting and throttling?

Rate limiting blocks requests that exceed the limit (429 error). Throttling slows them down by adding delays. Most APIs use rate limiting because it's simpler and more predictable.
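For illustration, throttling can be sketched as a wrapper that spaces calls out instead of rejecting them (the `throttle` helper below is a hypothetical sketch, not a library API):

```javascript
// Wrap an async function so successive calls run at least intervalMs
// apart: excess calls are delayed, never rejected.
function throttle(fn, intervalMs) {
  let nextFree = 0; // earliest timestamp (ms) the next call may run
  return async (...args) => {
    const now = Date.now();
    const wait = Math.max(0, nextFree - now);
    nextFree = Math.max(now, nextFree) + intervalMs;
    if (wait > 0) await new Promise(r => setTimeout(r, wait));
    return fn(...args);
  };
}

// Usage (sketch): three immediate calls run at ~0ms, ~30ms, and ~60ms.
// const slowEcho = throttle(async (x) => x, 30);
// await Promise.all([slowEcho(1), slowEcho(2), slowEcho(3)]);
```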

Should I rate limit by IP address or by user?

Both. IP-based for unauthenticated endpoints (login, signup). User/key-based for authenticated endpoints. Users behind the same VPN share an IP, so user-based is fairer for authenticated routes.

What should my API return when a client hits the limit?

HTTP 429 (Too Many Requests) with a Retry-After header and a clear JSON error message. Never silently drop requests — that makes debugging impossible for legitimate users.