TL;DR: Database replication keeps copies of your data on multiple servers. One server (the primary) handles all writes. The copies (replicas) handle reads. This makes your app faster for read-heavy workloads and keeps it running if one server dies. When you ask Claude to "scale the database" or "add high availability," replication is usually what it sets up. The biggest gotcha: data written to the primary takes a moment to reach replicas, so a user might create something and not see it immediately. That's called replication lag, and it's the #1 source of confusing bugs in replicated systems.

Why AI Coders Need to Know This

You probably won't set up database replication yourself. Your AI will. Or a managed platform like Supabase or AWS RDS will handle it behind the scenes. So why do you need to understand it?

Because replication creates a category of bugs that look completely insane if you don't know what's happening.

Here's the classic scenario: A user signs up on your app. The app writes their account to the primary database. Then the app immediately tries to read their profile — but from a replica that hasn't received the data yet. The user sees a "Profile not found" error. They refresh the page. Now it works. They think your app is broken. You think the database is broken. Your AI can't reproduce the bug because the timing is different every time.

That's replication lag. And once you know what it is, the fix takes five minutes. Without that knowledge, you could spend days chasing ghosts.

Replication also shows up in conversations about cost. When your PostgreSQL database is handling both reads and writes, and you're running load tests that push it to 80% CPU, the first thing your AI will suggest is read replicas. Understanding what that means — and whether you actually need it — keeps you from spending money on infrastructure you don't need yet.

Real Scenario: "Make My App Handle More Users"

Let's say you built a recipe-sharing app. It's getting popular — a few thousand daily users. Most of them browse recipes (reads). A smaller number create and edit recipes (writes). Your database CPU is climbing during peak hours, and page loads are getting sluggish.

You tell your AI:

Prompt

My PostgreSQL database is running at 78% CPU during peak hours. Most of the load is from users browsing recipes — reads outnumber writes about 20 to 1. Page load times are up to 3 seconds. How can I scale the database to handle more traffic without changing my app architecture?

Your AI comes back with a plan that includes read replicas. Here's the kind of configuration it might generate:

What Your AI Generated

// database.js — Connection setup with read replicas
import { Pool } from 'pg';

// Primary database — handles all writes (INSERT, UPDATE, DELETE)
const primary = new Pool({
  host: 'db-primary.your-app.com',
  port: 5432,
  database: 'recipes_db',
  user: 'app_user',
  password: process.env.DB_PASSWORD,
  max: 20,  // max connections in pool
});

// Read replica — handles all reads (SELECT)
const replica = new Pool({
  host: 'db-replica-1.your-app.com',
  port: 5432,
  database: 'recipes_db',
  user: 'app_user',
  password: process.env.DB_PASSWORD,
  max: 40,  // more connections — reads are the bottleneck
});

// Route queries to the right database
export async function query(text, params, options = {}) {
  // Use primary for writes or when fresh data is critical
  if (options.write || options.freshRead) {
    return primary.query(text, params);
  }
  // Use replica for regular reads
  return replica.query(text, params);
}

// Usage in your app:

// Browsing recipes — reads from replica (fast, no load on primary)
const recipes = await query(
  'SELECT * FROM recipes WHERE published = true ORDER BY created_at DESC LIMIT 20',
  [],
  { write: false }
);

// Creating a recipe — writes to primary
await query(
  'INSERT INTO recipes (title, ingredients, steps, author_id) VALUES ($1, $2, $3, $4)',
  [title, ingredients, steps, userId],
  { write: true }
);

// Reading a recipe the user JUST created — reads from primary
// (replica might not have it yet due to replication lag)
const myNewRecipe = await query(
  'SELECT * FROM recipes WHERE id = $1',
  [newRecipeId],
  { freshRead: true }  // ← This flag forces reading from primary
);

Look at what the AI did here. It created two database connections: one to the primary (for writes) and one to the replica (for reads). Then it built a routing function that decides which one to use. The freshRead: true flag is critical — it tells the app "this data was just written, so read from the primary to avoid lag."

This is the fundamental pattern of database replication in action. Let's break down each part.

Understanding Each Part

Primary and Replica: The Boss and the Copies

In a replicated database setup, one server is the primary (also called the "leader" or "master" in older documentation). This is the single source of truth. Every write — every INSERT, UPDATE, DELETE — goes to the primary.

The other servers are replicas (also called "followers," "standbys," or "read replicas"). They receive a continuous stream of changes from the primary and apply them to their own copy of the data. Think of it like a shared Google Doc: one person is editing, and everyone else sees the changes appear on their screen. Except in this case, there might be a tiny delay before the changes show up.

Why not let replicas accept writes too? Because that creates conflicts. If two servers accept a write to the same row at the same time, which one wins? Resolving those conflicts is a hard computer science problem. The primary/replica pattern avoids it entirely: one server writes, everyone else reads. Simple. Reliable.
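The conflict problem fits in a few lines of plain JavaScript. This is a toy illustration (plain objects standing in for database rows, not a real database): two "servers" each accept a write to the same row at the same moment, and the copies diverge.

```javascript
// Toy illustration: two "servers" that both accept writes to the same row.
const serverA = { id: 7, title: 'Pasta', views: 100 };
const serverB = { ...serverA }; // replicated copy

// Two writes arrive at the same moment, one on each server:
serverA.views = serverA.views + 1;  // a view counted on server A
serverB.title = 'Pasta Carbonara';  // an edit accepted on server B

// The copies have diverged, and merging them requires a conflict rule:
console.log(serverA); // { id: 7, title: 'Pasta', views: 101 }
console.log(serverB); // { id: 7, title: 'Pasta Carbonara', views: 100 }
```

With a single writer, this situation can never arise: every change passes through the primary in one agreed order.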

💡 Terminology note

You'll see "master/slave" in older docs, Stack Overflow posts, and AI-generated code. The industry has moved to "primary/replica" or "leader/follower." If your AI generates master/slave terminology, it's pulling from outdated training data — the concepts are identical.

Synchronous vs. Asynchronous Replication

This is where things get interesting — and where most confusion lives.

Asynchronous replication (the default for most systems): The primary writes data, immediately tells your app "done!", and then sends the changes to replicas in the background. Fast for your app. But replicas might be a fraction of a second behind.

Synchronous replication: The primary writes data, waits for at least one replica to confirm it received the changes, then tells your app "done!" Slower for your app, but you're guaranteed the data exists on multiple servers before the write is considered complete.
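On self-hosted PostgreSQL, the switch between the two modes is a configuration setting on the primary. A minimal sketch (the standby name `replica1` is an assumption for illustration):

```ini
# postgresql.conf on the primary — illustrative values
# An empty synchronous_standby_names (the default) means asynchronous replication.
synchronous_standby_names = 'replica1'  # wait for this standby to confirm
synchronous_commit = on                 # each commit waits for that confirmation
```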

Here's how to think about the tradeoff:

| Factor | Async Replication | Sync Replication |
| --- | --- | --- |
| Write speed | Fast — doesn't wait for replicas | Slower — waits for confirmation |
| Data safety | Small window where data exists only on primary | Data confirmed on 2+ servers before success |
| Replication lag | Usually <1 second, can spike | Near zero — the confirming replica stays current |
| If a replica goes down | Primary keeps working normally | Primary stalls until replica recovers |
| Best for | Most web apps, read scaling | Financial systems, zero-data-loss requirements |

Most apps use asynchronous replication. The sub-second lag is acceptable for browsing recipes, reading blog posts, or loading product listings. Synchronous replication is for when losing even one transaction is unacceptable — think bank transfers or medical records.

When you ask your AI to set up replication and don't specify, it will almost always choose asynchronous. That's usually the right call.

Read Replicas: The Workhorses

A read replica is a replica specifically created to handle read traffic. That's its entire job. Your app sends SELECT queries to read replicas and INSERT/UPDATE/DELETE queries to the primary.

Why this matters for performance: In most web apps, reads vastly outnumber writes. A recipe app might have 100 people browsing for every 1 person uploading a recipe. A social media feed might have 1,000 views for every post. An e-commerce site might have 10,000 product page views for every order placed.

By offloading reads to replicas, you free up the primary to focus on writes. You can also add more replicas as traffic grows — two replicas, five replicas, ten replicas — spreading the read load across all of them.

// Example: Round-robin across multiple read replicas
const replicas = [
  new Pool({ host: 'replica-1.your-app.com', /* ... */ }),
  new Pool({ host: 'replica-2.your-app.com', /* ... */ }),
  new Pool({ host: 'replica-3.your-app.com', /* ... */ }),
];

let currentReplica = 0;

function getReadPool() {
  const pool = replicas[currentReplica];
  currentReplica = (currentReplica + 1) % replicas.length;
  return pool;
}

This pattern — called round-robin — distributes reads evenly across replicas. Managed services like AWS RDS and Supabase handle this routing for you, but if your AI is configuring a self-hosted setup, you'll see code like this.

Failover: What Happens When the Primary Dies

Here's where replication becomes about more than just performance — it's about survival. If your only database server crashes, your app goes down. Period. No database, no app.

With replication, you have copies of your data sitting on other servers. If the primary dies, you can promote a replica to become the new primary. This is called failover.

Managed services handle this automatically. AWS RDS will detect a primary failure and promote a replica within minutes. Supabase and Neon have their own failover mechanisms built in. If you're self-hosting, tools like Patroni (for PostgreSQL) automate this process.

Without automatic failover, you're manually promoting a replica at 3 AM while your users tweet about your outage. Configure it ahead of time.

💡 Replication ≠ Backup

A common mistake: treating replicas as backups. They're not. If someone accidentally runs DELETE FROM users on the primary, that delete replicates to every replica within seconds. Now all your copies are missing the same data. You still need separate backups — point-in-time recovery snapshots that let you restore to a specific moment before the mistake happened.

What AI Gets Wrong About Database Replication

AI is good at setting up the basic infrastructure for replication. It struggles with the subtle, real-world issues that make replicated systems tricky. Here's what to watch for:

1. Ignoring Replication Lag in Application Code

This is the big one. Your AI will happily set up a read replica and route all SELECT queries to it. But it often forgets to handle the case where a user writes data and immediately reads it back. The code works in development (where there's no replica) and fails unpredictably in production.

What to tell your AI: "After any write operation, the next read for that same user should come from the primary, not the replica. Implement a strategy to handle replication lag — either sticky reads to primary for 5 seconds after a write, or a freshRead flag."
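The sticky-read idea is simple enough to sketch in a few lines. This is a minimal in-process version with hypothetical helper names; production code would keep this state in Redis or the session so it works across multiple app instances:

```javascript
// Sticky reads: after a write, route that user's reads to the primary
// for a short window. In-memory Map for illustration only.
const STICKY_WINDOW_MS = 5000;
const lastWriteAt = new Map(); // userId -> timestamp of last write

function recordWrite(userId, now = Date.now()) {
  lastWriteAt.set(userId, now);
}

// Returns true if this user's reads should still go to the primary.
function shouldReadFromPrimary(userId, now = Date.now()) {
  const t = lastWriteAt.get(userId);
  return t !== undefined && now - t < STICKY_WINDOW_MS;
}
```

A router like the query() function shown earlier would call shouldReadFromPrimary(userId) before choosing a pool, instead of relying on each call site to pass the right flag.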

2. Suggesting Replication Too Early

Your AI might recommend read replicas when your database is barely breaking a sweat. A well-tuned PostgreSQL instance with proper indexes and connection pooling can handle enormous workloads on a single server. Adding replication before you've optimized queries and indexes is like buying a second car because your first one needs an oil change.

What to tell your AI: "Before suggesting read replicas, check if there are missing indexes, unoptimized queries, or connection pooling issues. Only recommend replication if we've exhausted single-server optimizations."

3. Not Configuring Connection Pooling for Replicas

Your AI might create direct connections to replicas without connection pooling. Each replica has a limited number of connections, and without pooling, your app can exhaust them quickly — especially during traffic spikes.

What to tell your AI: "Set up connection pooling (PgBouncer or built-in pool) for both the primary and all read replicas. Max connections should account for the number of app instances connecting."
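If your AI reaches for PgBouncer, the relevant knobs live in pgbouncer.ini. A sketch with illustrative values (the hostnames and pool sizes are assumptions, not recommendations):

```ini
; pgbouncer.ini — illustrative values only
[databases]
recipes_primary = host=db-primary.your-app.com port=5432 dbname=recipes_db
recipes_replica = host=db-replica-1.your-app.com port=5432 dbname=recipes_db

[pgbouncer]
pool_mode = transaction   ; return connections to the pool after each transaction
max_client_conn = 500     ; app-side connections PgBouncer will accept
default_pool_size = 20    ; actual PostgreSQL connections per database
```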

4. Mixing Up Replication and Sharding

Replication copies all your data to multiple servers. Sharding splits your data across multiple servers. They solve different problems. AI sometimes confuses them or suggests both when only one is needed. Replication helps with read scaling and availability. Sharding helps when your data is too large for one server. Most apps need replication long before they need sharding.
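The distinction fits in a few lines of plain JavaScript (toy data, no real database):

```javascript
const recipes = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }];

// Replication: every server holds the FULL dataset
const replicaA = [...recipes];
const replicaB = [...recipes];

// Sharding: each server holds a SLICE of the data, split by key
const shards = [[], []];
for (const r of recipes) {
  shards[r.id % 2].push(r); // even ids -> shard 0, odd ids -> shard 1
}
```

Replication means any server can answer any read; sharding means you must first work out which server holds the row you want.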

What to tell your AI: "I need read scaling and high availability, not data partitioning. Set up replication, not sharding."

How to Debug Replication Issues with AI

When something goes wrong with replication, the symptoms are often bizarre. Data appears and disappears. The same query returns different results depending on when you run it. Users see different versions of the same page. Here's how to debug the most common issues with your AI's help.

Debugging Replication Lag

Prompt

Users are reporting that after they post a comment, it doesn't appear for a few seconds. I'm using PostgreSQL with async read replicas. Write a query to check the current replication lag on each replica, and show me how to add read-after-write consistency for the user who just posted.

Your AI should give you something like:

-- Check replication lag on the primary
SELECT
  client_addr AS replica_ip,
  state,
  sent_lsn,
  replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS bytes_behind,
  replay_lag
FROM pg_stat_replication;

-- If replay_lag is consistently > 1 second, you have a problem.
-- Common causes:
--   1. Replica hardware is slower than primary
--   2. Heavy write load creating more WAL than replica can process
--   3. Network issues between primary and replica
--   4. Long-running queries on replica blocking replay

For the read-after-write fix, a practical pattern is session-based routing:

// Middleware: after any write, pin user to primary for reads
const STICKY_PRIMARY_MS = 5000; // 5 seconds

async function handleWrite(userId, queryText, params) {
  await primary.query(queryText, params);
  // Mark this user as "just wrote" — use Redis, session, or in-memory cache
  // (assumes an ioredis-style client; 'PX' sets a TTL in milliseconds)
  await redis.set(`sticky:${userId}`, '1', 'PX', STICKY_PRIMARY_MS);
}

async function handleRead(userId, queryText, params) {
  const isSticky = await redis.get(`sticky:${userId}`);
  const pool = isSticky ? primary : getReadPool();
  return pool.query(queryText, params);
}

Debugging "Replica Not Receiving Data"

Prompt

My PostgreSQL read replica stopped receiving updates. It's stuck with data from 2 hours ago. The primary is fine. What do I check, and how do I resync the replica without downtime?

Common causes your AI should walk you through:

  • WAL files deleted on primary — The primary purged transaction logs (WAL) before the replica consumed them. Fix: increase wal_keep_size or use replication slots.
  • Network partition — The replica can't reach the primary. Check connectivity, firewalls, security groups.
  • Replica disk full — The replica ran out of disk space and can't apply new changes. Free up space or expand the volume.
  • Replication slot bloat — If using replication slots and the replica was offline for too long, the primary may be holding onto massive amounts of WAL. Monitor slot lag.
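For that last cause, you can ask the primary how much WAL each replication slot is holding back. A diagnostic sketch for PostgreSQL 10+ (run on the primary):

```sql
-- How much WAL is each replication slot retaining? (PostgreSQL 10+)
SELECT
  slot_name,
  active,  -- false means the consumer is disconnected
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS retained_wal
FROM pg_replication_slots;
```

A large retained_wal on an inactive slot means the primary is hoarding transaction logs for a replica that isn't consuming them, which can eventually fill the primary's disk.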

Debugging "Writes Succeeding But Not Appearing"

If writes seem to succeed but data doesn't show up, check these in order:

  1. Are you reading from a replica? The data might be on the primary but not replicated yet. Query the primary directly to confirm.
  2. Is the write inside a transaction that hasn't committed? Uncommitted transactions aren't replicated.
  3. Are you checking the right database? If your AI set up multiple environments (dev, staging, production), you might be writing to one and reading from another.

Types of Replication You'll Actually See

Your AI might mention several types. Here's what they mean in practice:

Streaming Replication (PostgreSQL)

The primary streams write-ahead log (WAL) records to replicas in real-time. This is PostgreSQL's built-in replication method and what most managed services use. The replica is an exact byte-for-byte copy of the primary. Fast, reliable, and the default choice for PostgreSQL.
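Wiring this up by hand on PostgreSQL 12+ is mostly two pieces of configuration on the replica. A sketch, with an assumed replication user named `replicator`:

```ini
# On the replica: postgresql.conf — illustrative values
primary_conninfo = 'host=db-primary.your-app.com port=5432 user=replicator'
hot_standby = on   # allow read-only queries while the replica replays WAL
# Plus an empty file named standby.signal in the data directory,
# which tells PostgreSQL to start up as a standby rather than a primary.
```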

Logical Replication (PostgreSQL)

Instead of streaming raw WAL records, the primary sends logical changes (like "row X was inserted into table Y"). This allows you to replicate specific tables instead of the entire database, and replicas can have different indexes or even different schemas. More flexible but more complex to set up.

Statement-Based vs. Row-Based (MySQL)

MySQL can replicate the SQL statements themselves ("replay this INSERT on the replica") or the actual row changes ("this row was added"). Row-based is the modern default because statement-based can produce different results if the statement uses non-deterministic functions like UUID() or SYSDATE().
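A concrete case (using a hypothetical sessions table) shows why statement replay diverges:

```sql
-- Statement-based: the replica re-executes the statement text.
-- UUID() produces a different value on every call, so the primary
-- and the replica end up storing different ids for the "same" row:
INSERT INTO sessions (id, user_id) VALUES (UUID(), 42);

-- Row-based: the primary ships the actual row it wrote, UUID value
-- included, so every replica stores identical data.
```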

What managed services handle for you: If you're using Supabase, Neon, PlanetScale, or AWS RDS, the replication type and configuration are handled behind the scenes. You don't pick between streaming and logical — you click "Add read replica" and the platform makes the right choice. Understanding the types helps when debugging, not when setting up.

When You Actually Need Database Replication

Not every app needs replication. Here's a realistic guide:

You probably DON'T need it if:

  • You have fewer than 10,000 daily active users
  • Your database CPU stays under 60% during peak hours
  • You haven't optimized queries, added indexes, or set up connection pooling yet
  • Brief downtime (minutes) is acceptable for your use case
  • You're still building your MVP

You probably DO need it if:

  • Read traffic is crushing your primary despite optimization
  • You need high availability — your app must stay up even if a server fails
  • You're serving users in multiple geographic regions and need low-latency reads
  • You're running analytics queries that would slow down your production database
  • Your business loses significant money during any database downtime

The progression for most apps looks like this: Single database → Add connection pooling → Optimize queries and indexes → Add read replicas → Consider sharding (rare). Most vibe-coded apps live happily in the first three stages.

Frequently Asked Questions

Do I need database replication for a side project or MVP?

Probably not. A single PostgreSQL database can handle thousands of concurrent users and millions of rows without breaking a sweat. Replication adds operational complexity — more servers to monitor, replication lag to debug, connection routing to configure. Start with one database, add connection pooling when it feels slow, and only consider replication when you have clear evidence you need it: read-heavy traffic that maxes out your primary, or a hard uptime requirement where any downtime costs real money. If your AI suggests replication for a brand-new project, push back.

What is replication lag and why should I care?

Replication lag is the delay between when data is written to your primary database and when it appears on the replica. With asynchronous replication, this is typically under one second but can spike during heavy writes. It matters because if a user creates something (like posting a comment) and the app immediately reads from a replica that hasn't received that data yet, the user sees nothing — their comment appears to vanish. This is the single most common replication bug AI-generated code creates.

What is the difference between synchronous and asynchronous replication?

Synchronous replication waits for the replica to confirm it received the data before telling your app the write succeeded. Zero data loss, but slower writes because every write depends on network speed between servers. Asynchronous replication tells your app the write succeeded immediately and sends data to replicas in the background. Faster, but replicas can be slightly behind. Most production systems use async because the speed tradeoff is worth it — the lag is usually under a second.

Can I use database replication with Supabase or Neon?

Yes, but in different ways. Supabase Pro plans include read replicas you can spin up in different regions — your app connects to the nearest one for reads. Neon uses a branching model where you create read-only compute endpoints that share the same storage layer, giving you read scaling without traditional replication. Both platforms handle the replication infrastructure for you. If you're self-hosting PostgreSQL, you configure streaming replication or logical replication manually.

What happens to replicas if the primary database goes down?

Replicas continue serving read traffic but cannot accept writes. To restore full functionality, you promote a replica to become the new primary — this is called failover. Managed services like AWS RDS and Supabase handle automatic failover. If you're self-hosting, you need tools like Patroni or pg_auto_failover to automate promotion. Without automatic failover configured, your app loses write capability until you fix the primary or manually promote a replica.