What Is Database Seeding? Why AI Always Generates That Fake Data File

TL;DR: Database seeding is pre-populating your database with starter data — test users, sample products, default categories. AI generates seed files because an empty database is useless for development. You can't test a product listing page with zero products. Seed files fill your database with enough data to actually build and test your app. The important things to know: seeds add data, migrations change structure. Seeds should be safe to run multiple times (use upsert, not insert). And never, ever deploy test seed data to production.

The Scenario Every Vibe Coder Recognizes

You open Claude, Cursor, or ChatGPT. You type something like:

Prompt You Typed

Build me a full-stack e-commerce app with Next.js and Prisma.
Include products, users, and orders.
Use PostgreSQL for the database.

AI generates a whole project. Schema file, API routes, React components — the works. And buried in there, you notice a file you didn't ask for:

prisma/
  schema.prisma
  seed.ts          ← What is this?

You open seed.ts and find something like this:

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function main() {
  // Create test users
  await prisma.user.createMany({
    data: [
      { name: 'Alice Johnson', email: 'alice@example.com', role: 'ADMIN' },
      { name: 'Bob Smith', email: 'bob@example.com', role: 'USER' },
      { name: 'Carol Williams', email: 'carol@example.com', role: 'USER' },
    ],
  });

  // Create sample products
  await prisma.product.createMany({
    data: [
      { name: 'Wireless Headphones', price: 79.99, category: 'Electronics', inStock: true },
      { name: 'Leather Notebook', price: 24.99, category: 'Office', inStock: true },
      { name: 'Running Shoes', price: 129.99, category: 'Sports', inStock: false },
    ],
  });

  console.log('Database seeded successfully!');
}

main()
  .catch(console.error)
  .finally(() => prisma.$disconnect());

Three fake users. Three fake products. None of them real. And you're wondering: why did AI create a file full of made-up data?

What Database Seeding Actually Means

Seeding is pre-populating your database with initial data. Think of it like planting seeds in a garden — you're putting something in the ground so that something useful grows.

When you create a database and run your migrations, you get empty tables. The structure exists — users table, products table, orders table — but there's no data in any of them. Zero rows. Nothing.

An empty database is like a spreadsheet with headers but no rows. You can't test anything. You can't see what your product page looks like with actual products. You can't check if your login flow works because there are no users to log in as. You can't see if your dashboard calculates totals correctly because there are no orders to total.

A seed file solves this. It inserts a known set of data into your database so you have something to work with immediately. One command, and your empty tables are populated with users, products, categories — whatever your app needs to actually function during development.

Why AI Always Generates a Seed File

AI generates seed files for the same reason a good mentor would tell you to create one: an empty database is useless for development.

Think about what happens without seed data:

You build a product listing page. You open it in your browser. It's blank. You have to manually go into your database and insert products one by one before you can even see if your page layout works.
You build a user dashboard. You need to create a user, log in, create some orders, just to see your own dashboard. Every time you reset your database, you start over.
You build a search feature. You need data to search through. Without seed data, you're searching an empty void.

AI has been trained on thousands of codebases that include seed files. Seeding is a common convention across the major database toolkits: Prisma has a built-in seed command, Knex has a seed generator, and Drizzle projects commonly include a seed script. AI generates the seed file because it's part of a complete, functional project setup.

The AI is actually doing you a favor. Without that seed file, you'd spend your first 30 minutes of development manually inserting test data instead of actually building features.

Two Types of Seed Data (and They're Very Different)

Not all seed data is created equal. There are two fundamentally different kinds, and confusing them is one of the most common mistakes:

1. Development/Test Data

This is fake data that only exists so you can build and test your app. It should never appear in production.

// Development seed data — fake, disposable, for testing only
const testUsers = [
  { name: 'Alice Johnson', email: 'alice@example.com', role: 'USER' },
  { name: 'Bob Smith', email: 'bob@example.com', role: 'USER' },
  { name: 'Test Admin', email: 'admin@example.com', role: 'ADMIN' },
];

This data lets you click around your app, test different user roles, see what a page looks like with content. When you're done developing, this data gets thrown away.

2. Production Defaults

This is real data that your app needs to function — even in production. Things like:

// Production seed data — real defaults your app requires
const defaultRoles = [
  { name: 'ADMIN', description: 'Full system access' },
  { name: 'EDITOR', description: 'Can create and edit content' },
  { name: 'VIEWER', description: 'Read-only access' },
];

const defaultCategories = [
  { name: 'Electronics', slug: 'electronics' },
  { name: 'Clothing', slug: 'clothing' },
  { name: 'Home & Garden', slug: 'home-garden' },
];

Your app might have a dropdown menu that lists categories. If the categories table is empty, the dropdown is empty. These seeds are part of your app's functionality — they need to exist in every environment, including production.

The smart approach is to separate these into different files:

prisma/
  seeds/
    production.ts    ← roles, categories, default settings (runs everywhere)
    development.ts   ← fake users, sample products (only local/staging)
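
One way to wire the two files together is a small selector that decides which seed sets run in a given environment. This is only a sketch, not Prisma functionality: the `AppEnvironment` type and the `seedSetsFor` function are hypothetical names, and the set names simply mirror the two files above.

```typescript
// Hypothetical environment names; adapt to however your deployment labels itself.
type AppEnvironment = 'local' | 'staging' | 'production';

// Names match the files in prisma/seeds/ above.
type SeedSet = 'production' | 'development';

// Production defaults run everywhere; fake data runs only outside production.
function seedSetsFor(environment: AppEnvironment): SeedSet[] {
  const sets: SeedSet[] = ['production'];
  if (environment !== 'production') {
    sets.push('development');
  }
  return sets;
}
```

A seed entry point could loop over the returned names and import the matching file, so the fake data never loads when the environment is production.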

How Seed Files Work in Practice

Every database toolkit handles seeding slightly differently, but the concept is always the same: a script that connects to your database and inserts data. Here's how the major tools do it.

Prisma Seed

Prisma has built-in seeding support. You define a seed script in your package.json:

// package.json
{
  "prisma": {
    "seed": "ts-node prisma/seed.ts"
  }
}

// prisma/seed.ts
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();

async function main() {
  // Upsert ensures running twice won't create duplicates
  const admin = await prisma.user.upsert({
    where: { email: 'admin@yourapp.com' },
    update: {},  // if exists, do nothing
    create: {
      name: 'Admin User',
      email: 'admin@yourapp.com',
      role: 'ADMIN',
    },
  });

  // Create sample products
  const products = [
    { name: 'Wireless Headphones', price: 79.99, category: 'Electronics' },
    { name: 'Leather Notebook', price: 24.99, category: 'Office' },
    { name: 'Running Shoes', price: 129.99, category: 'Sports' },
  ];

  // upsert needs a unique field in `where`, so this assumes Product.name is @unique
  for (const product of products) {
    await prisma.product.upsert({
      where: { name: product.name },
      update: { price: product.price },
      create: product,
    });
  }

  console.log('Seeded:', { admin, productCount: products.length });
}

main()
  .catch(console.error)
  .finally(() => prisma.$disconnect());

Run it with:

npx prisma db seed

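Depending on your Prisma version, this also runs automatically after npx prisma migrate reset, which is handy: when you reset your database during development, the seed data comes right back. If the seed doesn't run after a reset, check the documentation for the version you have installed.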

Knex Seed

Knex has a dedicated seed system with its own directory and CLI commands:

# Generate a new seed file
npx knex seed:make 01_users

# Run all seeds
npx knex seed:run

// seeds/01_users.js
exports.seed = async function(knex) {
  // Clear existing data first (careful — development only!)
  await knex('users').del();

  // Insert seed data
  await knex('users').insert([
    { id: 1, name: 'Alice Johnson', email: 'alice@example.com', role: 'admin' },
    { id: 2, name: 'Bob Smith', email: 'bob@example.com', role: 'user' },
    { id: 3, name: 'Carol Williams', email: 'carol@example.com', role: 'user' },
  ]);
};

// seeds/02_products.js
exports.seed = async function(knex) {
  await knex('products').del();

  await knex('products').insert([
    { id: 1, name: 'Wireless Headphones', price: 79.99, category: 'Electronics' },
    { id: 2, name: 'Leather Notebook', price: 24.99, category: 'Office' },
  ]);
};

Knex runs seed files alphabetically, which is why they're usually prefixed with numbers (01_, 02_). This matters when tables have foreign key relationships — you need to seed the users table before the orders table, because orders reference users.
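
The reason for zero-padded prefixes shows up as soon as you have ten or more files. The sketch below uses a plain `Array.prototype.sort()` as a stand-in for "alphabetical order"; the assumption is that Knex's ordering behaves the same way for these names, and the file names themselves are made up.

```typescript
// Plain lexicographic sort, used here as a stand-in for "alphabetical order".
function runOrder(fileNames: string[]): string[] {
  return [...fileNames].sort();
}

const unpadded = runOrder(['2_products.js', '10_orders.js', '1_users.js']);
// unpadded is ['10_orders.js', '1_users.js', '2_products.js']: orders would run first

const padded = runOrder(['02_products.js', '10_orders.js', '01_users.js']);
// padded is ['01_users.js', '02_products.js', '10_orders.js']
```

With unpadded prefixes, the orders seed sorts ahead of the users it references, which is exactly the foreign key problem the numbering was supposed to prevent.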

Drizzle Seed

Drizzle doesn't have a built-in seed command, so most projects use a custom script:

// src/db/seed.ts
import { db } from './index';
import { users, products } from './schema';

async function seed() {
  console.log('Seeding database...');

  await db.insert(users).values([
    { name: 'Alice Johnson', email: 'alice@example.com', role: 'admin' },
    { name: 'Bob Smith', email: 'bob@example.com', role: 'user' },
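  ]).onConflictDoNothing();  // ← skips rows that collide with a unique constraint (e.g. on email)

  // Without a unique constraint on products (e.g. on name), nothing conflicts
  // and every run inserts these rows again.
  await db.insert(products).values([
    { name: 'Wireless Headphones', price: 79.99, category: 'Electronics' },
    { name: 'Leather Notebook', price: 24.99, category: 'Office' },
  ]).onConflictDoNothing();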

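  console.log('Seeding complete!');
}

seed()
  .then(() => process.exit(0))
  .catch((error) => {
    console.error(error);
    process.exit(1);  // non-zero exit so scripts and CI notice the failure
  });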

Then add a script to your package.json:

{
  "scripts": {
    "db:seed": "tsx src/db/seed.ts"
  }
}

And run it with npm run db:seed.

Raw SQL Seeds

You don't need a fancy ORM to seed a database. A plain SQL file works just as well:

-- seeds/seed.sql
INSERT INTO users (name, email, role) VALUES
  ('Alice Johnson', 'alice@example.com', 'admin'),
  ('Bob Smith', 'bob@example.com', 'user'),
  ('Carol Williams', 'carol@example.com', 'user')
ON CONFLICT (email) DO NOTHING;  -- PostgreSQL: skip rows whose email already exists (needs a unique constraint on email)

INSERT INTO products (name, price, category) VALUES
  ('Wireless Headphones', 79.99, 'Electronics'),
  ('Leather Notebook', 24.99, 'Office'),
  ('Running Shoes', 129.99, 'Sports')
ON CONFLICT (name) DO NOTHING;  -- needs a unique constraint on products.name, or PostgreSQL rejects the statement

# Run it directly
psql -d your_database -f seeds/seed.sql

Seed vs Migration: The Critical Difference

This trips up a lot of people. Migrations and seeds are both files that run against your database, but they do completely different things:

	Migrations	Seeds
What they change	Database structure (tables, columns, indexes)	Database data (rows, records)
Example	`CREATE TABLE users`, `ALTER TABLE ADD COLUMN`	`INSERT INTO users VALUES (...)`
Run order	Sequential — each builds on the last	Can usually re-run independently
Run in production?	Yes — every migration runs in every environment	Carefully — only production defaults, not test data
Reversible?	Usually has an "undo" (down migration)	Usually just delete/re-insert

Think of it this way: migrations build the house, seeds furnish it. You need the rooms (tables) to exist before you can put furniture (data) in them. That's why you always run migrations first, then seeds.

# The correct order — always
npx prisma migrate dev    # 1. Create/update tables (structure)
npx prisma db seed        # 2. Fill tables with data (content)

Faker.js: Why AI Generates Realistic-Looking Fake Data

Sometimes AI doesn't just hardcode names like "Alice" and "Bob." Instead, it pulls in a library called Faker.js (now @faker-js/faker) to generate realistic-looking data programmatically:

import { faker } from '@faker-js/faker';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function main() {
  // Generate 50 realistic fake users
  const users = Array.from({ length: 50 }, () => ({
    name: faker.person.fullName(),          // "Margaret O'Brien"
    email: faker.internet.email(),           // random address, not derived from the name above
    phone: faker.phone.number(),             // "(555) 123-4567"
    address: faker.location.streetAddress(), // "742 Evergreen Terrace"
    city: faker.location.city(),             // "Portland"
    state: faker.location.state(),           // "Oregon"
    bio: faker.lorem.paragraph(),            // Realistic paragraph of text
    avatarUrl: faker.image.avatar(),         // URL to a placeholder avatar
    createdAt: faker.date.past({ years: 2 }), // Random date in last 2 years
  }));

  await prisma.user.createMany({ data: users });
  console.log(`Seeded ${users.length} users`);

  // Generate 100 realistic products
  const products = Array.from({ length: 100 }, () => ({
    name: faker.commerce.productName(),       // "Handcrafted Wooden Table"
    price: parseFloat(faker.commerce.price()), // 42.99
    description: faker.commerce.productDescription(),
    category: faker.commerce.department(),     // "Electronics"
    sku: faker.string.alphanumeric(8).toUpperCase(), // "X7K2M9P1"
  }));

  await prisma.product.createMany({ data: products });
  console.log(`Seeded ${products.length} products`);
}

main()
  .catch(console.error)
  .finally(() => prisma.$disconnect());

Why use Faker instead of typing "Test User 1, Test User 2"?

Realistic UI testing. "Margaret O'Brien" shows you what your layout looks like with a real-length name. "T1" doesn't.
Edge cases surface naturally. Faker can generate names with apostrophes (O'Brien), accented characters (José, depending on the locale you pick), and long names that might overflow your UI.
Volume. You can generate 1,000 products with one loop. Try typing 1,000 product names by hand.
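Low personal data risk. Faker data looks real but is generated, so no customer records get copied into your test database. One caveat: faker.internet.email() uses real mail domains such as gmail.com, so if your app sends email during testing, prefer faker.internet.exampleEmail(), which sticks to reserved example domains.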

AI uses Faker because it's the standard library for this. If you see @faker-js/faker in your seed file, that's a sign the AI is doing it right.
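
One thing to keep in mind is that random data differs on every run unless the generator is given a fixed seed (Faker exposes `faker.seed()` for this). The sketch below illustrates the idea without Faker: it uses a small hand-rolled generator (mulberry32) and two made-up name lists, so `generateUsers`, `FIRST_NAMES`, and `LAST_NAMES` are all hypothetical.

```typescript
interface FakeUser {
  name: string;
  email: string;
}

// mulberry32: a small seeded pseudo-random generator returning values in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Made-up sample names, including an accent and an apostrophe.
const FIRST_NAMES = ['Alice', 'Bob', 'Carol', 'José', 'Margaret'];
const LAST_NAMES = ['Johnson', 'Smith', 'Williams', "O'Brien"];

function pick<T>(items: readonly T[], random: () => number): T {
  return items[Math.floor(random() * items.length)];
}

// The same (count, seed) pair always produces the same list of users.
function generateUsers(count: number, seed: number): FakeUser[] {
  const random = mulberry32(seed);
  return Array.from({ length: count }, (_, index) => ({
    name: `${pick(FIRST_NAMES, random)} ${pick(LAST_NAMES, random)}`,
    // The index keeps emails unique; example.com is reserved for documentation use.
    email: `user${index + 1}@example.com`,
  }));
}
```

Because the output is reproducible, a failing test that depends on seeded data can be rerun against the exact same rows.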

Making Seeds Idempotent (The Most Important Best Practice)

Here's the word that trips everyone up: idempotent. It means "safe to run multiple times with the same result." An idempotent seed file gives you the same database state whether you run it once, twice, or twenty times.

Why does this matter? Because during development, you'll run seeds a lot. You'll reset your database, tweak your schema, re-run migrations, and need seed data again. If your seed file isn't idempotent, running it twice creates duplicate data:

// ❌ NOT idempotent — running twice = duplicate users
await prisma.user.create({
  data: { name: 'Admin', email: 'admin@app.com', role: 'ADMIN' },
});
// First run: 1 admin user. Second run: ERROR (duplicate email) or 2 admin users.

// ✅ Idempotent — running twice = same result
await prisma.user.upsert({
  where: { email: 'admin@app.com' },
  update: {},  // exists? do nothing
  create: { name: 'Admin', email: 'admin@app.com', role: 'ADMIN' },
});
// First run: 1 admin user. Second run: still 1 admin user. Safe.

Different tools handle this differently:

// Prisma: upsert
await prisma.user.upsert({
  where: { email: 'admin@app.com' },
  update: {},
  create: { name: 'Admin', email: 'admin@app.com' },
});

// Drizzle: onConflictDoNothing
await db.insert(users).values({ name: 'Admin', email: 'admin@app.com' })
  .onConflictDoNothing();

// Raw SQL: ON CONFLICT
INSERT INTO users (name, email) VALUES ('Admin', 'admin@app.com')
  ON CONFLICT (email) DO NOTHING;

// Knex: typically delete-then-insert (less ideal)
await knex('users').del();
await knex('users').insert([...]);

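The Knex pattern of delete-everything-first works for development but is dangerous anywhere near production, because it wipes whatever rows are already there. Knex also provides .onConflict('email').ignore() for insert-or-skip on databases that support it. Prefer upsert-style seeds whenever possible.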
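
To see the difference without a database, the sketch below simulates running a seed twice against two in-memory stand-ins: an array for a table with no unique constraint, and a `Map` keyed by email for a table with one. The `plainInsert` and `upsert` helpers are hypothetical and only mimic the behavior described above.

```typescript
interface User {
  email: string;
  name: string;
}

// An array stands in for a table WITHOUT a unique constraint on email.
function plainInsert(table: User[], user: User): void {
  table.push(user);
}

// A Map keyed by email stands in for a table WITH a unique email column.
// Like an upsert with an empty update, it creates only when the key is missing.
function upsert(table: Map<string, User>, user: User): void {
  if (!table.has(user.email)) {
    table.set(user.email, user);
  }
}

const admin: User = { email: 'admin@app.com', name: 'Admin' };

const insertedTable: User[] = [];
const upsertedTable = new Map<string, User>();

// Simulate running the seed script twice.
for (let run = 0; run < 2; run++) {
  plainInsert(insertedTable, admin);
  upsert(upsertedTable, admin);
}
// insertedTable.length is 2 (a duplicate row); upsertedTable.size is 1
```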

What AI Gets Wrong About Database Seeding

⚠️ Seed mistakes can be costly. A bad seed file run against a production database can insert test data alongside real data, overwrite actual records, or expose hardcoded credentials. Always review what AI generates.

1. Hardcoding Passwords in Seed Files

This is the most dangerous mistake. AI frequently generates seed files with plaintext passwords:

// ❌ AI does this ALL THE TIME
await prisma.user.create({
  data: {
    name: 'Admin',
    email: 'admin@app.com',
    password: 'admin123',     // Plaintext password in source code!
    role: 'ADMIN',
  },
});

This causes two problems: the literal sits in your Git repository, visible to anyone with access, and the database stores the password unhashed. Hashing addresses the second problem, and it's worth doing even for development:

// ✅ Hash passwords, even in seed files
import bcrypt from 'bcrypt';

await prisma.user.create({
  data: {
    name: 'Admin',
    email: 'admin@app.com',
    password: await bcrypt.hash('admin123', 10),  // Store the hash, not the plaintext
    role: 'ADMIN',
  },
});

Better yet, read the seed password from an environment variable so the real value never appears in your code (the fallback below is meant for local development only):

password: await bcrypt.hash(process.env.SEED_ADMIN_PASSWORD || 'dev-only-password', 10),
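
A silent fallback is convenient locally but risky if the same script ever runs in production with the variable unset. One possible refinement, sketched below, is a helper that refuses to fall back in production. The `resolveSeedPassword` name is hypothetical, and `env` is passed in as a parameter (you would pass `process.env` in a real script) so the function is easy to test.

```typescript
// Hypothetical helper: returns the seed password, or throws in production
// when SEED_ADMIN_PASSWORD is missing.
function resolveSeedPassword(env: Record<string, string | undefined>): string {
  const fromEnv = env.SEED_ADMIN_PASSWORD;
  if (fromEnv) {
    return fromEnv;
  }
  if (env.NODE_ENV === 'production') {
    // Refuse to fall back to a password that is visible in the repository.
    throw new Error('SEED_ADMIN_PASSWORD must be set in production');
  }
  return 'dev-only-password';
}
```

The result would then be passed to bcrypt.hash exactly as in the line above.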

2. Not Making Seeds Idempotent

AI frequently uses create or createMany instead of upsert. The first time you run the seed, everything works. The second time, you either get errors (duplicate key violations) or duplicate data. Always check that AI used upsert, ON CONFLICT, onConflictDoNothing, or createMany with skipDuplicates: true (on databases where Prisma supports it).

3. No Separation Between Dev and Production Seeds

AI typically generates one seed file that mixes fake test data (Alice, Bob, sample products) with real defaults (admin roles, categories). If someone runs that seed in production, you get "Alice Johnson" and "Wireless Headphones" in your real database alongside actual customers and products.

Better Prompt

Create two separate seed files:
1. A production seed with only essential defaults (roles, categories, initial admin user)
2. A development seed with 50 fake users and 100 sample products using Faker.js
Make both idempotent using upsert.

4. Ignoring Foreign Key Order

If your orders table has a foreign key to users, you need to seed users first. AI sometimes generates seeds that try to create orders before creating the users they reference, causing foreign key constraint errors. The fix: seed tables in dependency order — parent tables first, then child tables.
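
Dependency order can be worked out mechanically once you write down which tables reference which. The sketch below is a minimal depth-first topological sort; the `seedOrder` function and the three-table schema are hypothetical and not tied to any ORM.

```typescript
// Maps each table to the tables it references through foreign keys.
type Dependencies = Record<string, string[]>;

function seedOrder(dependencies: Dependencies): string[] {
  const ordered: string[] = [];
  const visiting = new Set<string>();
  const done = new Set<string>();

  function visit(table: string): void {
    if (done.has(table)) return;
    if (visiting.has(table)) {
      throw new Error(`Circular foreign key dependency involving "${table}"`);
    }
    visiting.add(table);
    for (const parent of dependencies[table] ?? []) {
      visit(parent); // parents are emitted before the tables that reference them
    }
    visiting.delete(table);
    done.add(table);
    ordered.push(table);
  }

  for (const table of Object.keys(dependencies)) {
    visit(table);
  }
  return ordered;
}

// Hypothetical schema: orders reference users and products.
const tableOrder = seedOrder({
  orders: ['users', 'products'],
  users: [],
  products: [],
});
// tableOrder is ['users', 'products', 'orders']
```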

5. Seeding Too Little or Too Much

Three products don't show you how pagination works. Ten thousand products can make your seed slow enough that you avoid re-running it. AI usually errs on the side of too little: three users, three products, three orders. For meaningful development testing, you often want 20-50 records per table. That's enough to see pagination, test search, and verify sorting, but not so many that the seed takes forever.
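
A quick way to sanity-check your record counts is to work out how many pages they produce. The sketch below assumes a hypothetical page size of 10; `pageCount` is an illustrative helper, not part of any framework.

```typescript
// Number of pages needed to show `recordCount` rows at `pageSize` rows per page.
function pageCount(recordCount: number, pageSize: number): number {
  if (pageSize <= 0) {
    throw new Error('pageSize must be positive');
  }
  return Math.ceil(recordCount / pageSize);
}

const pagesWithThree = pageCount(3, 10);  // 1: pagination controls never appear
const pagesWithFifty = pageCount(50, 10); // 5: enough to click through several pages
```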

Best Practices for Database Seeds

Here's the checklist you should use every time AI generates a seed file — or when you write one yourself:

Make it idempotent. Use upsert, ON CONFLICT DO NOTHING, or onConflictDoNothing. Running seeds twice should produce the same database state, not duplicate data.
Separate dev and production seeds. Test users and sample products live in a dev-only seed file. Essential defaults (roles, categories, settings) go in a production-safe seed.
Hash passwords. Even in development seeds. Never store plaintext passwords in code that gets committed to Git.
Respect foreign key order. Seed parent tables (users, categories) before child tables (orders, products).
Use environment checks. Add a guard so dev seeds can't accidentally run in production:

// Guard against running dev seeds in production
if (process.env.NODE_ENV === 'production') {
  console.log('Skipping dev seeds in production');
  process.exit(0);
}

// ... your test data seeding below

Use Faker for volume testing. When you need 50+ records, use @faker-js/faker instead of manually typing fake data. It's faster and surfaces edge cases.
Keep seeds fast. A seed file that takes more than 10 seconds to run will annoy you every time you reset your database. If you need large volumes, make it an optional flag.
Log what you seed. Print a summary at the end so you know what was created: console.log('Seeded: 3 users, 10 products, 5 categories').
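
The last two items can be sketched together: an optional flag that switches to larger volumes, and a summary line to log at the end. Everything here is an assumption for illustration, including the `--large` flag name, the record counts, and the `resolveSeedCounts` and `formatSummary` helpers.

```typescript
// Hypothetical record counts; tune them to your own schema.
interface SeedCounts {
  users: number;
  products: number;
}

// `args` would typically be process.argv.slice(2); "--large" is a made-up flag name.
function resolveSeedCounts(args: string[]): SeedCounts {
  return args.includes('--large')
    ? { users: 1000, products: 5000 }
    : { users: 25, products: 50 };
}

// Builds the one-line summary to log when the seed finishes.
function formatSummary(counts: SeedCounts): string {
  return `Seeded: ${counts.users} users, ${counts.products} products`;
}
```

With an npm script named db:seed, running npm run db:seed -- --large would forward the flag to the script, while the default run stays fast.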

Cheat Sheet: Running Seeds in Every Tool

# Prisma
npx prisma db seed                    # Run seed script defined in package.json
npx prisma migrate reset              # Reset DB + auto-run seeds

# Knex
npx knex seed:run                     # Run all seed files in seeds/ directory
npx knex seed:make seed_name          # Generate a new seed file

# Drizzle (custom script)
npm run db:seed                        # Whatever you named it in package.json
npx tsx src/db/seed.ts                 # Run directly

# Raw SQL
psql -d your_database -f seeds/seed.sql

# Generic npm script
npm run seed                           # If defined in package.json scripts

Complete Real-World Example: SaaS App Seed

Let's put it all together. Here's what a well-structured seed file looks like for a real project — a SaaS app with users, teams, and projects:

// prisma/seed.ts — production-safe, idempotent seed
import { PrismaClient } from '@prisma/client';
import bcrypt from 'bcrypt';

const prisma = new PrismaClient();

async function seedProductionDefaults() {
  // These run in EVERY environment, including production

  // Default roles
  const roles = ['OWNER', 'ADMIN', 'MEMBER', 'VIEWER'];
  for (const role of roles) {
    await prisma.role.upsert({
      where: { name: role },
      update: {},
      create: { name: role, description: `${role} role` },
    });
  }

  // Default subscription plans
  await prisma.plan.upsert({
    where: { slug: 'free' },
    update: {},
    create: { name: 'Free', slug: 'free', price: 0, maxProjects: 3 },
  });
  await prisma.plan.upsert({
    where: { slug: 'pro' },
    update: {},
    create: { name: 'Pro', slug: 'pro', price: 29, maxProjects: 50 },
  });

  console.log('✅ Production defaults seeded');
}

async function seedDevData() {
  // These ONLY run in development
  if (process.env.NODE_ENV === 'production') return;

  // Create a test admin
  await prisma.user.upsert({
    where: { email: 'dev-admin@test.local' },
    update: {},
    create: {
      name: 'Dev Admin',
      email: 'dev-admin@test.local',
      password: await bcrypt.hash('dev-password-123', 10),
      role: 'ADMIN',
    },
  });

  // Create a test team with projects
  const team = await prisma.team.upsert({
    where: { slug: 'test-team' },
    update: {},
    create: { name: 'Test Team', slug: 'test-team' },
  });

  const projectNames = ['Marketing Site', 'Mobile App', 'API Backend'];
  for (const name of projectNames) {
    await prisma.project.upsert({
      where: { teamId_name: { teamId: team.id, name } },
      update: {},
      create: { name, teamId: team.id, status: 'ACTIVE' },
    });
  }

  console.log('✅ Development data seeded');
}

async function main() {
  await seedProductionDefaults();  // Always runs
  await seedDevData();             // Only in development
}

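main()
  .catch((error) => {
    console.error(error);
    process.exitCode = 1;  // report failure to the caller instead of exiting with 0
  })
  .finally(() => prisma.$disconnect());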

This seed file does everything right: it separates production defaults from dev data, uses upsert everywhere, hashes passwords, and has a production guard. When you see AI generate a seed file, use this as your benchmark for what it should look like.

What to Learn Next

Now that you understand seeding, these related concepts will make more sense:

Database Migrations
What Is Prisma?
What Is Drizzle ORM?
What Is SQL?
Environment Variables

Frequently Asked Questions

What is database seeding?

Database seeding is pre-populating your database with initial data before your app runs. This includes test users, sample products, default categories, or admin accounts. Instead of starting with an empty database and manually adding data, a seed file does it automatically with one command.

Why does AI generate seed files?

An empty database is useless for development. You can't test a product listing page with zero products or a user dashboard with no users. AI generates seed files because it knows you need data to actually see your app working. It's following a best practice: giving you something to work with immediately.

What is the difference between a seed and a migration?

Migrations change your database structure: creating tables, adding columns, modifying data types. Seeds add data to existing tables: inserting test users, sample products, default settings. Migrations define the shape of your database. Seeds fill it with content. You run migrations first, then seeds.

What happens if I run a seed file twice?

It depends on how the seed is written. A well-written seed uses upsert (insert, or update if the row exists), so running it twice just updates existing records. A poorly written seed uses plain insert, which either fails on a unique constraint or creates duplicate data on every run. Always check that your seed files are idempotent, meaning safe to run multiple times.

What is Faker.js?

Faker.js (now @faker-js/faker) is a library that generates realistic-looking fake data: names, emails, addresses, phone numbers, company names. AI uses it in seed files to create test data that looks real without using actual personal information. It's much better than hardcoding "Test User 1" and "test@test.com" because it surfaces edge cases like long names or special characters.