TL;DR: A message queue is a to-do list for your server. Instead of making users wait while your app sends emails, processes images, or syncs data, you drop those tasks into a queue and a separate worker handles them in the background. Redis stores the queue. BullMQ manages it. Your users never notice.

Why AI Coders Need to Know This

Here's a pattern you'll see constantly: you ask your AI coding tool to build something that involves slow work — sending emails, generating reports, processing file uploads, hitting external APIs — and it doesn't just do the work inline. It creates a whole separate system with queues, workers, and job processors.

The first time this happens, it feels like overkill. You wanted one email sent. AI gave you an entire background job infrastructure.

But here's the thing: AI is right to do this.

Without a queue, your user clicks "Sign Up" and stares at a loading spinner for 3-8 seconds while your server talks to an email provider, waits for a response, maybe retries if it fails, and only then responds to the user. With a queue, the user clicks "Sign Up," your server drops "send welcome email" into the to-do list, and instantly responds with "You're in!" The email goes out a second later in the background. The user never noticed.

Message queues show up in almost every production app. If you're building with AI tools and deploying real projects, you're going to encounter them. Understanding what they do (not how they work under the hood) means you can:

  • Read AI-generated code without confusion when you see Queue, Worker, and Job classes
  • Debug intelligently when emails aren't sending or background tasks silently fail
  • Make smart decisions about when a queue is necessary and when it's overkill
  • Write better prompts that tell AI exactly what kind of background processing you need

The Mental Model

Imagine a busy restaurant kitchen.

The waiter (your API) takes orders from customers (users). Now, the waiter could walk each order to the kitchen, stand there watching the chef cook, wait for the plate to come out, and then deliver it — but that means no other customers get served while one meal is being prepared.

Instead, the waiter writes the order on a ticket and clips it to the order rail. That rail is the message queue. The waiter immediately goes back to take more orders. Meanwhile, the chef (the worker) pulls tickets off the rail one at a time and cooks each meal. If there's a rush, tickets stack up on the rail but nothing gets lost. If the chef needs a break, the tickets just wait — they're not going anywhere.

That's exactly what a message queue does for your app:

  • The order rail = the queue (stored in Redis)
  • The ticket = a job (the data describing what needs to happen)
  • The waiter = your API (adds jobs to the queue)
  • The chef = the worker (processes jobs from the queue)

The key insight: the waiter and the chef work independently. The waiter doesn't care how long the food takes. The chef doesn't care how many customers are waiting. The rail handles the coordination between them. That's what engineers mean when they say message queues "decouple" systems.

You don't need to memorize that word. Just remember: the thing that takes orders and the thing that does the work don't need to be connected to each other. The queue sits in the middle and handles the handoff.
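The restaurant analogy can be sketched in a few lines of plain JavaScript. This is a toy in-memory queue, not BullMQ — real queues survive restarts and run the worker in a separate process — but it shows that the producer and the worker only ever share the queue, never each other:

```javascript
// Toy in-memory queue — illustrates decoupling only. A real queue (BullMQ + Redis)
// persists jobs and runs the worker as a separate process.
const queue = [];                      // the order rail

function enqueue(job) {                // the waiter: drops a ticket and returns immediately
  queue.push(job);
}

async function runWorker(handlers) {   // the chef: pulls tickets off one at a time
  const results = [];
  while (queue.length > 0) {
    const job = queue.shift();
    results.push(await handlers[job.name](job.data));
  }
  return results;
}

// The producer never waits for the work itself
enqueue({ name: 'send-welcome-email', data: { email: 'ada@example.com' } });
enqueue({ name: 'notify-slack', data: { text: 'New signup' } });
```

Notice that `enqueue` knows nothing about how jobs get processed, and `runWorker` knows nothing about who added them. That's the decoupling.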

Real Scenario: You Need Background Email Processing

Let's say you're building a SaaS app. Users sign up, and you need to:

  1. Send a welcome email
  2. Create their account in Stripe for billing
  3. Notify your team in Slack
  4. Generate a personalized onboarding PDF

If you do all of that before responding to the user, signup takes 6-10 seconds. Some of those external services might be slow. Some might fail. Your user is sitting there wondering if your app is broken.

Here's what you'd tell your AI tool:

💬 What you'd type in Cursor / Claude Code:

"I need a background job system for my Express app. When a user signs up, add jobs to a queue for: sending a welcome email via Resend, creating a Stripe customer, posting to a Slack webhook, and generating a PDF. Use BullMQ with Redis. The signup endpoint should return immediately — don't make the user wait for any of this. Include retry logic for failed jobs."

That prompt gives AI everything it needs. You specified the library (BullMQ), the trigger (user signup), the tasks (email, Stripe, Slack, PDF), and the behavior (return immediately, retry on failure). Let's look at what AI would generate.

What AI Generated

Here's realistic code you'd get back from Claude, Cursor, or Windsurf when you ask for a BullMQ setup. This is a simplified version focusing on the core pattern:

File 1: queue.js — Setting up the queue

// queue.js — Create the queue and add jobs to it
import { Queue } from 'bullmq';

// Connect to Redis (where the queue lives)
const connection = {
  host: process.env.REDIS_HOST || 'localhost',
  port: parseInt(process.env.REDIS_PORT || '6379', 10),
};

// Create a named queue for signup-related tasks
export const signupQueue = new Queue('signup-tasks', {
  connection,
  defaultJobOptions: {
    attempts: 3,              // Retry failed jobs up to 3 times
    backoff: {
      type: 'exponential',    // Wait longer between each retry
      delay: 2000,            // Start with 2 seconds
    },
    removeOnComplete: {
      count: 1000,            // Keep last 1000 completed jobs for debugging
    },
    removeOnFail: {
      count: 5000,            // Keep last 5000 failed jobs so you can investigate
    },
  },
});

// Helper function to add a signup job
export async function queueSignupTasks(userId, email, name) {
  // Add multiple jobs at once — one for each task
  await signupQueue.addBulk([
    {
      name: 'send-welcome-email',
      data: { userId, email, name },
    },
    {
      name: 'create-stripe-customer',
      data: { userId, email, name },
    },
    {
      name: 'notify-slack',
      data: { userId, email, name },
    },
    {
      name: 'generate-onboarding-pdf',
      data: { userId, email, name },
      opts: {
        attempts: 5,          // PDF generation is flaky, give it more retries
      },
    },
  ]);
}

File 2: worker.js — Processing the jobs

// worker.js — The worker that actually does the work
import { Worker } from 'bullmq';
import { sendWelcomeEmail } from './services/email.js';
import { createStripeCustomer } from './services/stripe.js';
import { notifySlack } from './services/slack.js';
import { generateOnboardingPdf } from './services/pdf.js';

const connection = {
  host: process.env.REDIS_HOST || 'localhost',
  port: parseInt(process.env.REDIS_PORT || '6379', 10),
};

// Create a worker that listens to the 'signup-tasks' queue
const worker = new Worker(
  'signup-tasks',
  async (job) => {
    // Route each job to the right handler based on its name
    switch (job.name) {
      case 'send-welcome-email':
        await sendWelcomeEmail(job.data.email, job.data.name);
        console.log(`✅ Welcome email sent to ${job.data.email}`);
        break;

      case 'create-stripe-customer': {
        const customerId = await createStripeCustomer(job.data);
        console.log(`✅ Stripe customer created: ${customerId}`);
        return { customerId };  // Returned data gets stored with the job
      }

      case 'notify-slack':
        await notifySlack(`New signup: ${job.data.name} (${job.data.email})`);
        console.log(`✅ Slack notified`);
        break;

      case 'generate-onboarding-pdf': {
        const pdfUrl = await generateOnboardingPdf(job.data);
        console.log(`✅ PDF generated: ${pdfUrl}`);
        return { pdfUrl };
      }

      default:
        throw new Error(`Unknown job type: ${job.name}`);
    }
  },
  {
    connection,
    concurrency: 5,  // Process up to 5 jobs at the same time
  }
);

// Log when things go wrong (job can be undefined if it was lost entirely)
worker.on('failed', (job, err) => {
  console.error(`❌ Job ${job?.name ?? 'unknown'} failed (attempt ${job?.attemptsMade}/${job?.opts?.attempts}):`, err.message);
});

// Log when jobs complete
worker.on('completed', (job) => {
  console.log(`✅ Job ${job.name} completed for user ${job.data.userId}`);
});

// Graceful shutdown — finish current jobs before stopping
process.on('SIGTERM', async () => {
  console.log('Shutting down worker...');
  await worker.close();
  process.exit(0);
});

File 3: routes/signup.js — The API endpoint

// routes/signup.js — The signup endpoint that uses the queue
import express from 'express';
import { queueSignupTasks } from '../queue.js';

const router = express.Router();

router.post('/api/signup', async (req, res) => {
  try {
    const { email, password, name } = req.body;

    // 1. Create the user in your database (this is fast — do it directly)
    //    `db` and `hash` are stand-ins for your own database client and password hashing
    const user = await db.users.create({ email, password: hash(password), name });

    // 2. Queue all the slow stuff — don't make the user wait
    await queueSignupTasks(user.id, email, name);

    // 3. Respond immediately — the background jobs will handle the rest
    res.status(201).json({
      message: 'Account created! Check your email for a welcome message.',
      userId: user.id,
    });
  } catch (error) {
    console.error('Signup failed:', error);
    res.status(500).json({ error: 'Signup failed. Please try again.' });
  }
});

export default router;

Understanding Each Part

Let's break down what you just saw. You don't need to memorize any of this — you need to recognize the pattern when AI generates it.

The Queue (queue.js)

Think of this as creating the order rail. The new Queue('signup-tasks', ...) line says "create a named to-do list called signup-tasks and store it in Redis." The name matters because workers use it to know which queue to listen to.

The defaultJobOptions section is where the smart stuff lives:

  • attempts: 3 — If a job fails, try it again up to 3 times. This handles temporary glitches like a Stripe API timeout.
  • backoff: { type: 'exponential', delay: 2000 } — Wait 2 seconds before the first retry, 4 seconds before the second, 8 before the third. This prevents hammering a failing service.
  • removeOnComplete / removeOnFail — Keeps a history of recent jobs so you can debug, but doesn't let Redis fill up with old data forever.
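As a quick sanity check on that backoff schedule, exponential backoff doubles the wait before each retry. A sketch of the formula (not BullMQ's internal code):

```javascript
// Exponential backoff: the delay before retry N is baseDelay * 2^(N - 1)
function backoffDelay(baseDelayMs, retryNumber) {
  return baseDelayMs * 2 ** (retryNumber - 1);
}

// With delay: 2000, the three retries wait 2s, 4s, and 8s
console.log(backoffDelay(2000, 1)); // 2000
console.log(backoffDelay(2000, 2)); // 4000
console.log(backoffDelay(2000, 3)); // 8000
```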

The queueSignupTasks function is the waiter writing tickets. It uses addBulk to drop all four jobs onto the queue at once. Each job has a name (what kind of work) and data (the information needed to do it).

The Worker (worker.js)

This is the chef. The new Worker('signup-tasks', ...) line says "listen to the signup-tasks queue and process whatever comes in." The callback function receives each job and uses a switch statement to route it to the right handler.

Key things to notice:

  • concurrency: 5 — Process up to 5 jobs simultaneously. Like having 5 chefs in the kitchen instead of 1.
  • Event listeners — The worker.on('failed', ...) and worker.on('completed', ...) lines let you log what's happening. This is essential for debugging.
  • Graceful shutdown — The SIGTERM handler makes sure the worker finishes its current jobs before stopping. Without this, you'd lose work in progress when you deploy.

The API Route (routes/signup.js)

This is where the magic pays off. Look at the flow:

  1. Create the user in the database (fast — maybe 50ms)
  2. Queue the background jobs (fast — maybe 10ms to write to Redis)
  3. Respond to the user (total wait: ~60ms)

Without the queue, all four signup tasks would run before the response: sending an email (1-3 seconds), hitting Stripe's API (0.5-2 seconds), posting to Slack (0.5-1 second), and generating a PDF (2-5 seconds). That's potentially 11 seconds of waiting on a signup page. With the queue, it's under 100 milliseconds.

How These Files Work Together

This is the part that confuses most people: the worker runs as a separate process. Your API server (Express) runs as one process, and the worker runs as another. They don't talk to each other directly — they both talk to Redis.

In development, you'd typically run two terminal windows:

# Terminal 1 — Your API server
node server.js

# Terminal 2 — Your queue worker
node worker.js

In production, you'd run both as separate services. If you're on Railway, Render, or Fly.io, that means two separate deployments from the same codebase — one for the API, one for the worker.
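One common way to wire this up is with two npm scripts in package.json (the script names here are illustrative, not a convention your host requires):

```json
{
  "scripts": {
    "start": "node server.js",
    "start:worker": "node worker.js"
  }
}
```

On a platform like Railway or Render, you'd point the API service at `npm start` and the worker service at `npm run start:worker`, both deploying from the same repo.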

What AI Gets Wrong

AI will get you 85% of the way with message queues. Here's where it consistently stumbles:

1. Missing the Redis Connection Requirement

AI sometimes generates BullMQ code without mentioning that Redis needs to be running. You'll get an error like ECONNREFUSED 127.0.0.1:6379 and have no idea what it means. It means Redis isn't running. On Mac, it's brew install redis && brew services start redis. On Linux, sudo apt install redis-server. In production, use your cloud provider's managed Redis.

2. No Graceful Shutdown Handling

A lot of AI-generated worker code just creates the worker and calls it done. No SIGTERM handler, no graceful shutdown. This means when you deploy (which kills and restarts your processes), any job mid-execution gets lost or marked as stalled. The code above includes this, but if your AI skips it, tell it: "Add graceful shutdown handling to the worker so it finishes current jobs before stopping."

3. Forgetting That Workers Run Separately

AI will sometimes put the queue and worker in the same file as the API server. This defeats the entire purpose. If the worker is running inside your API process, a slow job still blocks your server's resources. Workers need to be separate processes. If AI puts everything in server.js, tell it: "Split the worker into its own file that runs as a separate process."

4. Over-Engineering Simple Tasks

Not everything needs a queue. If you're sending one email on signup and your entire app has 10 users, a queue is overkill. AI doesn't know your scale. If the task takes under a second and failure isn't critical, just do it inline and move on. You can always add a queue later. Tell AI: "Just send the email directly in the endpoint — I don't need a queue for this yet."
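For contrast, here's roughly what the inline version looks like. This is a sketch with a hypothetical `sendWelcomeEmail` helper (swap in Resend, Postmark, whatever you use), written as a plain function rather than a full Express route:

```javascript
// Inline version — no queue, no worker, no Redis. Fine for low traffic.
// `sendWelcomeEmail` is a hypothetical email helper passed in by the caller.
async function handleSignup(user, sendWelcomeEmail) {
  try {
    // The user's request waits on this — acceptable if it's usually fast
    await sendWelcomeEmail(user.email, user.name);
  } catch (err) {
    // No queue means no retries — log the failure instead of blocking signup
    console.error('Welcome email failed:', err.message);
  }
  return { message: 'Account created!', userId: user.id };
}
```

The tradeoff is visible in the catch block: without a queue, a failed email just gets logged. If that's acceptable at your scale, the inline version is simpler to ship and to reason about.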

How to Debug with AI

Message queues fail silently. The user clicks "Sign Up," gets a success message, and never receives their welcome email. No error on the frontend. No crash. The job just... failed somewhere in the background. Here's how to debug this with your AI tools.

In Cursor or Windsurf

Open the worker file and ask:

💬 Debug prompt:

"Jobs are being added to the queue but the worker isn't processing them. Add detailed logging to the worker so I can see when jobs are received, started, completed, and failed. Also add a health check endpoint that shows me queue stats — how many jobs are waiting, active, completed, and failed."

This gets you a /api/queue/health endpoint that shows you exactly what's happening inside the queue. BullMQ has built-in methods for this: queue.getWaitingCount(), queue.getActiveCount(), queue.getFailedCount().
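A sketch of what that health endpoint might produce. The Express route is shown in comments because it needs a live Redis connection; the summary helper is plain JavaScript, and the "healthy" thresholds are made-up examples you'd tune for your app:

```javascript
// Turn BullMQ job counts into a simple health summary.
// The thresholds here (failed === 0, waiting < 100) are arbitrary examples.
function summarizeQueueHealth(counts) {
  const { waiting = 0, active = 0, completed = 0, failed = 0 } = counts;
  return {
    waiting,
    active,
    completed,
    failed,
    healthy: failed === 0 && waiting < 100,
  };
}

// With a live queue, the route looks roughly like this:
// router.get('/api/queue/health', async (req, res) => {
//   const counts = await signupQueue.getJobCounts('waiting', 'active', 'completed', 'failed');
//   res.json(summarizeQueueHealth(counts));
// });
```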

In Claude Code (Terminal)

When debugging queue issues, give Claude the full picture:

💬 Debug prompt:

"My BullMQ worker isn't processing jobs. Here's what I know: Redis is running on localhost:6379. Jobs are being added (I can see them with bull-board). The worker starts without errors. But no jobs are being picked up. Check that the queue name matches between the producer and worker, check the Redis connection, and check for stalled jobs."

Common Queue Issues and What to Tell AI

| Symptom | Likely Cause | What to Tell AI |
| --- | --- | --- |
| Jobs added but never processed | Worker isn't running, or queue names don't match | "Check that my worker is listening to the same queue name as the producer" |
| ECONNREFUSED errors | Redis isn't running | "How do I start Redis on my machine and verify it's running?" |
| Jobs stuck in "active" forever | Worker crashed mid-job, no stall detection | "Add stalled job detection to my BullMQ worker with lockDuration" |
| Jobs failing silently | No error logging on the worker | "Add error event listeners to my worker and log the full error with stack trace" |
| Redis running out of memory | Completed/failed jobs never cleaned up | "Add removeOnComplete and removeOnFail options to keep Redis from filling up" |

The Bull Board Trick

One of the most useful things you can ask AI to add is Bull Board — a visual dashboard for your queues. Tell AI:

💬 Debug prompt:

"Add bull-board to my Express app so I can see a visual dashboard of all my queues at /admin/queues. Show waiting, active, completed, and failed jobs."

This gives you a web page where you can see every job, its status, its data, and any error messages. It's like having a security camera on the kitchen. When something goes wrong, you can see exactly which ticket got dropped.

When to Use a Queue (and When to Skip It)

Use a message queue when:

  • The task takes more than 1-2 seconds (sending emails, calling external APIs, generating files)
  • The user doesn't need the result immediately ("your report will be emailed to you")
  • You need retry logic for unreliable operations (payment processing, webhook delivery)
  • You want to control throughput ("only process 10 images at a time so we don't max out memory")
  • You need scheduled or delayed jobs ("send a follow-up email 24 hours after signup")
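That last bullet, delayed jobs, is a one-line option in BullMQ: jobs accept a `delay` in milliseconds. A sketch (the job name and payload are made up):

```javascript
// A job scheduled ~24 hours out via BullMQ's `delay` option (milliseconds)
const DAY_MS = 24 * 60 * 60 * 1000;

const followUpJob = {
  name: 'send-follow-up-email',   // hypothetical job name
  data: { userId: 'u_123' },      // hypothetical payload
  opts: { delay: DAY_MS },        // the worker won't see this job for ~24 hours
};

// With a real queue:
// await queue.add(followUpJob.name, followUpJob.data, followUpJob.opts);
console.log(followUpJob.opts.delay); // 86400000
```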

Skip the queue when:

  • The task is fast (under 500ms) and the user needs the result now
  • Your app has low traffic and the added complexity isn't worth it
  • You're just learning and need to ship something simple first
  • The task is truly fire-and-forget with no need for retries (basic logging, analytics events)

Remember: you can always add a queue later. Start simple. When you notice your API responses getting slow because of background work, that's when you tell your AI: "Extract the email sending into a BullMQ background job."

How Queues Fit Into the Bigger Picture

Message queues are often the first step toward microservices architecture — but you don't need microservices to use them. A simple Express app with a BullMQ worker is perfectly fine and is how most small-to-medium apps handle background work.

Here's how queues connect to other concepts you'll encounter:

  • Redis — The storage engine behind BullMQ. Redis keeps your queue data in memory, which is why it's so fast. You don't need to learn Redis commands — BullMQ handles all of that.
  • Middleware — Your Express middleware handles the incoming request. The queue handles what happens after. They work at different stages of the request lifecycle.
  • REST APIs — Your API endpoints add jobs to the queue. External APIs (Stripe, Resend, Slack) get called from the worker. Queues sit between your API and the slow external services.
  • Caching — Both caching and queues use Redis, but for different purposes. Caching stores results to avoid re-computing them. Queues store tasks to process them later. Same tool, different jobs.
  • Database-Backed Queues — Don't want to set up Redis? You can use your existing PostgreSQL database as a queue instead. Simpler setup, fewer moving parts — with tradeoffs at scale.

What to Learn Next

Now that you understand what message queues do and why AI reaches for them, here's where to go next:

  • What Is Redis? — Redis is the engine behind BullMQ. Understanding what Redis does (and what happens when it goes down) makes queue debugging much easier.
  • What Is Caching? — Another pattern that uses Redis. If your AI adds both caching and queues, this explains how they work together without stepping on each other.
  • What Are Microservices? — If your app keeps growing and you keep adding more workers, you might eventually need separate services. This article explains when that actually makes sense (spoiler: probably not yet).
  • What Is Middleware? — Understanding middleware helps you see where queues fit in the request flow. Middleware runs during the request. Queues run after it.
  • What Is a REST API? — Your API endpoints are where jobs get added to the queue. If API concepts are still fuzzy, start here.
  • What Are Database-Backed Queues? — If Redis feels like overkill, you can use PostgreSQL as your queue. This article explains when that's a smart tradeoff.

Frequently Asked Questions

What is a message queue in simple terms?

A message queue is a to-do list for your server. Instead of doing slow tasks immediately (like sending emails or processing images), your app drops the task into a queue and moves on. A separate worker process picks up those tasks one at a time and handles them in the background. The user never waits.

Do I need Redis to use message queues?

For BullMQ (the most common Node.js queue library), yes — Redis is required. Redis acts as the storage layer that holds the queue. But you don't need to understand Redis deeply. You just need it installed and running. If you're deploying to production, most cloud providers offer managed Redis (like AWS ElastiCache or Railway's Redis add-on) so you don't have to manage it yourself.

What happens if my queue worker crashes?

This is the beauty of message queues — the jobs don't disappear. Because they're stored in Redis, any job that was in progress when the worker crashed will be marked as "stalled" and can be automatically retried when the worker comes back online. Jobs waiting in the queue are completely safe. This is why queues are more reliable than just running tasks directly in your API.

When should I use a message queue instead of just doing the work directly?

Use a queue when the work takes more than a second or two, when the user doesn't need to see the result immediately, or when you need to limit how many of something happen at once. Common examples: sending emails, generating PDFs, processing uploaded images, syncing data with external APIs, or sending push notifications. If the work is fast and the user needs the result right now, skip the queue.

What is the difference between a message queue and a cron job?

A cron job runs on a schedule — like "every 5 minutes" or "every night at midnight." A message queue processes tasks as they arrive, one at a time. Use cron for scheduled, recurring work (like sending a daily report). Use a queue for event-driven work that happens whenever a user does something (like processing an upload or sending a welcome email after signup). BullMQ actually supports both patterns.