TL;DR: Some tasks are too slow for a web request — sending emails, processing images, calling AI APIs, generating PDFs. If you do them inside your API request handler, the user waits and your server is blocked. Queues let you hand off slow work to a background process: accept the request instantly, do the heavy lifting separately, and notify the user when it's done. Think of it as a to-do list your server works through at its own pace.
Why AI Coders Need This
You asked AI to build a feature: "When users upload a profile photo, resize it to three sizes and store them in S3." AI writes the code. It works. But it does everything inside the upload endpoint. The user clicks upload, their browser spins for 12 seconds while the server resizes images, and if the connection drops halfway through, all of that work is wasted.
Or worse: you built an AI-powered feature. The user clicks "Summarize this document." Your app sends the document to OpenAI, waits 30 seconds for a response, processes the result, saves it to the database, and finally responds. For those 30 seconds, that request hangs open, tying up a connection while the user stares at a spinner. Ten users do this at once? You have ten long-running requests, each one a timeout or dropped connection waiting to happen.
This is the single biggest performance problem in AI-built apps: doing slow work inside HTTP request handlers. Queues solve it by separating "accepting the request" from "doing the work."
The user clicks a button. Your server says "Got it, we're working on it" in 200 milliseconds. A separate background worker picks up the job and processes it. When it's done, the user gets notified. The server stays fast, the user stays happy, and nothing blocks.
Real Scenario: "Process User Uploads in the Background"
You tell your AI: "When a user uploads an image, process it in the background — resize to thumbnail, medium, and large, upload all three to S3, and update the database with the URLs."
Without queues, AI writes something like this:
// ❌ Bad: Everything happens during the request
app.post('/upload', async (req, res) => {
const file = req.file;
// The resizing and uploading below take 8-15 seconds total
const thumbnail = await sharp(file.buffer).resize(150, 150).toBuffer();
const medium = await sharp(file.buffer).resize(600, 600).toBuffer();
const large = await sharp(file.buffer).resize(1200, 1200).toBuffer();
await s3.upload('thumbnails/' + file.originalname, thumbnail);
await s3.upload('medium/' + file.originalname, medium);
await s3.upload('large/' + file.originalname, large);
await db.query('UPDATE users SET avatar_urls = $1 WHERE id = $2',
[{ thumbnail: '...', medium: '...', large: '...' }, req.userId]);
res.json({ message: 'Upload complete' }); // User waited 15 seconds for this
});
The user stares at a loading spinner for 15 seconds. If their connection drops at second 12, they get an error even though the processing was almost done. If 50 users upload at the same time, your server grinds to a halt.
What AI Should Generate: BullMQ with Redis
Here's the same feature with a queue. The upload endpoint responds immediately, and a background worker handles the slow processing:
// Step 1: Install dependencies
// npm install bullmq ioredis
// Step 2: Create the queue (shared config)
// queue.js
import { Queue } from 'bullmq';
import IORedis from 'ioredis';
const connection = new IORedis({ host: '127.0.0.1', port: 6379 });
export const imageQueue = new Queue('image-processing', { connection });
// Step 3: The API endpoint — responds instantly
// routes/upload.js
import { imageQueue } from '../queue.js';
app.post('/upload', async (req, res) => {
const file = req.file;
// Save the raw file temporarily
const tempPath = await saveToTempStorage(file);
// Add a job to the queue — this takes milliseconds
await imageQueue.add('process-image', {
userId: req.userId,
fileName: file.originalname,
tempPath: tempPath,
}, {
attempts: 3, // Retry up to 3 times
backoff: { type: 'exponential', delay: 2000 }, // Wait longer between retries
removeOnComplete: 100, // Keep last 100 completed jobs
removeOnFail: 500, // Keep last 500 failed jobs for debugging
});
// Respond immediately — user isn't waiting
res.json({ message: 'Upload received! Processing in the background.' });
});
// Step 4: The worker — runs separately, processes jobs
// worker.js
import { Worker } from 'bullmq';
import IORedis from 'ioredis';
import sharp from 'sharp';
// BullMQ workers require maxRetriesPerRequest: null on their Redis connection
const connection = new IORedis({ host: '127.0.0.1', port: 6379, maxRetriesPerRequest: null });
// s3, db, readFromTempStorage, and deleteTempFile below are stand-ins for your own clients and helpers
const worker = new Worker('image-processing', async (job) => {
const { userId, fileName, tempPath } = job.data;
console.log(`Processing image for user ${userId}: ${fileName}`);
// Do all the slow work here — nobody is waiting
const fileBuffer = await readFromTempStorage(tempPath);
const thumbnail = await sharp(fileBuffer).resize(150, 150).toBuffer();
const medium = await sharp(fileBuffer).resize(600, 600).toBuffer();
const large = await sharp(fileBuffer).resize(1200, 1200).toBuffer();
const urls = {
thumbnail: await s3.upload('thumbnails/' + fileName, thumbnail),
medium: await s3.upload('medium/' + fileName, medium),
large: await s3.upload('large/' + fileName, large),
};
await db.query('UPDATE users SET avatar_urls = $1 WHERE id = $2',
[urls, userId]);
// Clean up temp file
await deleteTempFile(tempPath);
return urls; // Stored as the job result
}, { connection, concurrency: 5 }); // Process 5 jobs at once
worker.on('completed', (job) => {
console.log(`Job ${job.id} completed for user ${job.data.userId}`);
});
worker.on('failed', (job, err) => {
console.error(`Job ${job.id} failed: ${err.message}`);
});
The upload endpoint now responds in under 200 milliseconds. The worker runs as a separate process — it can even run on a different server. If the worker crashes, Redis still holds the job, and it gets processed when the worker restarts.
Understanding Each Part
Producer
The code that creates jobs. In the example above, the upload endpoint is the producer — it adds jobs to the queue. A producer doesn't do the heavy work. It says "here's a task that needs doing" and moves on. Your API endpoints are typically producers.
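For example, a signup endpoint that needs to send a welcome email is a producer. Here's a minimal sketch, assuming an emailQueue defined the same way as imageQueue above and your own db helper:

// routes/signup.js: the endpoint produces a job; a worker sends the email
import { emailQueue } from '../queue.js'; // hypothetical queue, defined like imageQueue
app.post('/signup', async (req, res) => {
const user = await db.createUser(req.body); // stand-in for your user creation logic
// Enqueue the slow part and move on: this takes milliseconds
await emailQueue.add('send-welcome-email', { userId: user.id, email: user.email });
res.json({ message: 'Account created!' });
});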
Queue
The queue itself is just an ordered list of jobs waiting to be processed. Think of a restaurant's order ticket rail — orders come in, get lined up, and the kitchen works through them in order. In BullMQ, the queue is backed by Redis, which means jobs survive server restarts.
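Because the queue lives in Redis, you can inspect it from anywhere. A sketch using BullMQ's built-in count helpers:

// How deep is the backlog? Handy for dashboards and capacity planning.
import { imageQueue } from './queue.js';
const waiting = await imageQueue.getWaitingCount(); // jobs still on the ticket rail
const active = await imageQueue.getActiveCount(); // jobs a worker is processing right now
const failed = await imageQueue.getFailedCount(); // jobs that exhausted their retries
console.log({ waiting, active, failed });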
Job
A single unit of work. Each job has a name (like 'process-image'), a data payload (the information the worker needs), and configuration (retry count, priority, delay). Jobs are serialized as JSON and stored in Redis until a worker picks them up.
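That configuration is worth a closer look. A sketch of a delayed, high-priority job (the job name and payload are made up for illustration):

// A job that becomes available in an hour and jumps ahead of lower-priority work
await imageQueue.add('send-reminder', { userId: 42 }, {
delay: 60 * 60 * 1000, // wait 1 hour (in milliseconds) before the job can run
priority: 1, // in BullMQ, lower numbers mean higher priority
});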
Worker (Consumer)
The code that processes jobs. Workers run separately from your API server — often as a completely different process or even on a different machine. They pull jobs from the queue, do the work, and report the result. You can run multiple workers to process jobs in parallel.
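One practical detail: since workers are long-lived processes, shut them down gracefully during deploys so an in-flight job isn't interrupted. A minimal sketch, extending worker.js above:

// worker.js (continued): let the current job finish before exiting
process.on('SIGTERM', async () => {
await worker.close(); // stops taking new jobs, waits for active ones to finish
await connection.quit();
process.exit(0);
});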
Retries and Backoff
Real-world operations fail. APIs time out. S3 has a hiccup. A retry strategy tells the worker: "If this fails, try again in 2 seconds. Then 4 seconds. Then 8 seconds." This is called exponential backoff — each retry waits longer, giving the external service time to recover instead of hammering it.
// Retry configuration in BullMQ
await imageQueue.add('process-image', jobData, {
attempts: 5, // Try 5 times total
backoff: { type: 'exponential', delay: 1000 }, // retries wait 1s, 2s, 4s, 8s
});
Dead Letter Queue (DLQ)
When a job fails all its retries, it has to go somewhere. A dead letter queue is where failed jobs land so you can inspect them, figure out what went wrong, and either fix the issue and replay them or discard them. Without a DLQ, failed jobs vanish into the void and you never know something broke.
In BullMQ, failed jobs are automatically kept (retention is controlled by removeOnFail) and you can inspect them through the Queue API or a dashboard like Bull Board. Some teams set up alerts that fire when the dead letter queue grows — that's a signal something is systematically wrong.
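A sketch of digging through the failed set and replaying jobs with BullMQ's Queue API:

// replay-failed.js: inspect the dead-letter pile, retry what's fixable
import { imageQueue } from './queue.js';
const failedJobs = await imageQueue.getFailed(0, 50); // fetch the first 50 failed jobs
for (const job of failedJobs) {
console.log(job.id, job.failedReason); // the error that killed it
await job.retry(); // put it back on the queue for another attempt
}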
Queue Tools Compared
Not every project needs the same queue setup. Here's an honest comparison of the most common tools AI might suggest:
| Tool | Serverless-Friendly | Free Tier | Best For |
|---|---|---|---|
| BullMQ | ❌ Needs a running worker | ✅ Open source (needs Redis) | Node.js apps with a server — the most popular choice for self-hosted queues |
| Inngest | ✅ Built for serverless | ✅ Generous free tier | Serverless apps on Vercel/Netlify — no worker management needed |
| Trigger.dev | ✅ Serverless-first | ✅ Free tier available | Complex background job workflows with observability built in |
| AWS SQS | ✅ Fully managed | ✅ 1M requests/month free | AWS-heavy stacks — pairs with Lambda for processing |
| Quirrel | ✅ Designed for Next.js | ✅ Open source / self-hostable | Next.js apps that need simple delayed/scheduled jobs |
Which one should you pick? If you have a Node.js server running 24/7 and already use Redis, go BullMQ. If you're on Vercel or Netlify and don't want to manage infrastructure, Inngest or Trigger.dev. If you're deep in AWS, SQS + Lambda. Don't overthink it — any queue is better than no queue.
What AI Gets Wrong About Queue Processing
AI generates the queue and worker but forgets to configure retries. The default in most libraries is zero retries — if the job fails once, it's gone. External APIs fail, databases have hiccups, networks blip. Fix: "Add retry configuration with exponential backoff — at least 3 attempts." Every job that calls an external service needs retries.
AI doesn't set up a dead letter queue or any monitoring for failed jobs. So when a job exhausts its retries, it silently disappears. You don't know users' uploads failed. You don't know emails weren't sent. Fix: "Configure failed jobs to be kept in a dead letter queue, and add logging or alerts when jobs fail permanently."
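In BullMQ you can tell a permanent failure apart from one that will still be retried by comparing the attempt count to the configured limit. A sketch that upgrades the simple 'failed' handler from worker.js (notifyTeam is a hypothetical alert helper):

worker.on('failed', async (job, err) => {
if (!job) return; // the job reference can occasionally be missing
const maxAttempts = job.opts.attempts ?? 1;
if (job.attemptsMade >= maxAttempts) {
// Out of retries: this failure is permanent
console.error(`Job ${job.id} permanently failed: ${err.message}`);
await notifyTeam(`Job ${job.id} permanently failed`); // hypothetical alert helper
}
});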
The most common mistake of all: no queue whatsoever. AI puts the slow work directly in the HTTP endpoint and calls it "done." The user waits, the server bogs down, and everything falls apart under load. Fix: when you see await slowOperation() inside an app.post() or app.get() handler, tell the AI: "Move this to a background worker with a queue. The endpoint should just add a job and respond immediately."
User double-clicks the button. Two identical jobs get added to the queue. Now the user gets two emails, two charges, or two processed images. AI almost never handles this. Fix: "Add a unique job ID based on the user ID and action, so duplicate submissions are ignored." BullMQ supports jobId for exactly this:
await imageQueue.add('process-image', jobData, {
jobId: `upload-${userId}-${fileName}`, // Same ID = same job, no duplicates
});
AI sets up the worker but doesn't add event listeners for failures. Jobs fail silently. No logging, no alerts, no dashboard. Fix: "Add worker event handlers for 'failed' and 'error' events with proper logging." At minimum, log failed jobs. Better: send alerts to Slack or email when the failure rate spikes.
When You Need a Queue vs. When You Don't
Not everything belongs in a queue. Here's an honest guide:
✅ You Need a Queue When:
- The task takes more than 2-3 seconds — AI API calls, image/video processing, PDF generation, large file uploads
- The user doesn't need the result immediately — "We'll email you when your report is ready" is perfectly acceptable
- The task can fail and needs retrying — sending emails, calling third-party APIs, webhook delivery
- You need to process things in order — financial transactions, sequential pipeline steps
- You need to limit how many things run at once — rate-limited APIs, expensive compute tasks (see the sketch after this list)
- Multiple users trigger the same expensive operation — batch processing, scheduled reports
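For that last rate-limiting case, BullMQ workers accept a limiter option. A sketch, assuming a connection and a processJob handler like the ones in worker.js above:

import { Worker } from 'bullmq';
// Cap throughput so a burst of jobs doesn't blow through an external API's rate limit
const aiWorker = new Worker('summarize', processJob, {
connection,
concurrency: 5, // at most 5 jobs in flight at once
limiter: { max: 10, duration: 1000 }, // start at most 10 jobs per second
});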
❌ You Don't Need a Queue When:
- The operation takes under a second — basic CRUD, simple database queries, reading from cache
- The user needs the result to continue — login authentication, checking permissions, loading page data
- You're the only user — personal projects or tools where blocking doesn't matter
- You don't have Redis or a queue service set up — for an MVP, it's OK to process inline and add queues when you hit scale problems
The pragmatic approach: Build it without queues first. When you notice requests taking too long or the server getting slow under load, identify the slow operations and move them to a queue. You don't need to architect for Netflix scale on day one.
How the Pieces Connect
Here's the mental model: your app has two sides now.
The fast side (API server): Receives requests, validates input, adds jobs to the queue, responds to the user. Every response takes under a second.
The slow side (workers): Pulls jobs from the queue, does the heavy lifting (AI calls, image processing, emails), records results. Takes as long as it needs — nobody is waiting.
They communicate through the queue, which is backed by Redis. The API server adds jobs. Workers consume jobs. Redis holds the jobs safely in between. If the worker crashes, the jobs wait in Redis until the worker comes back.
This separation is powerful. You can scale each side independently. Need to handle more users? Add more API servers. Need to process jobs faster? Add more workers. This is the foundation of every scalable production system.
Frequently Asked Questions
Do I need Redis to use a queue?
Not always. BullMQ requires Redis as its backing store, but serverless tools like Inngest and Trigger.dev handle the infrastructure for you. AWS SQS is fully managed by Amazon. If you already use Redis for caching, BullMQ is a natural fit. If you want zero infrastructure, go serverless.
What's the difference between a queue and a cron job?
A cron job runs on a schedule — every hour, every midnight. A queue processes jobs on demand — when something happens (user uploads a file, places an order). Use cron for scheduled tasks like daily reports. Use queues for event-driven tasks like processing uploads or sending transactional emails. Many background job libraries support both.
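BullMQ itself covers the scheduled side with repeatable jobs. A sketch (reportQueue and the job name are illustrative):

// A job that re-runs every 24 hours: cron-like behavior, queue-native
await reportQueue.add('daily-report', { type: 'summary' }, {
repeat: { every: 24 * 60 * 60 * 1000 }, // repeat interval in milliseconds
});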
Can I use queues in a serverless app?
Yes, but not BullMQ directly — serverless functions are short-lived and can't run long-polling workers. Use serverless-friendly tools like Inngest, Trigger.dev, or AWS SQS with Lambda. These are designed for the serverless model where you don't have a persistent server running.
What happens when a job fails?
A well-configured queue retries the job automatically — usually with exponential backoff (wait 1 second, then 2, then 4, then 8). After a maximum number of retries, the job moves to a dead letter queue where you can inspect it, fix the issue, and replay it. Without retry configuration, a failed job just disappears silently — which is why you should always configure retries.
How does the user find out when their job is done?
Common patterns: WebSockets or Server-Sent Events to push a real-time notification to the browser; polling, where the frontend checks a status endpoint every few seconds; an email or in-app notification when the job completes; a webhook if another service needs to know. The simplest approach for most apps is polling a job status endpoint.
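A minimal sketch of that status endpoint with BullMQ (the route shape is an assumption):

// GET /jobs/:id/status: the frontend polls this every few seconds
app.get('/jobs/:id/status', async (req, res) => {
const job = await imageQueue.getJob(req.params.id);
if (!job) return res.status(404).json({ error: 'Job not found' });
const state = await job.getState(); // 'waiting', 'active', 'completed', 'failed', ...
res.json({ state, result: job.returnvalue ?? null });
});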