What Is a Vector Database? The AI Coder's Guide to Embeddings and Semantic Search
When you ask Claude to "search my docs for similar questions," it's using vectors under the hood. Here's what that actually means — and how to set it up in your own apps without a computer science degree.
TL;DR
A vector database stores data as lists of numbers (embeddings) that represent meaning — not just keywords. When you search, it finds results by how similar the meaning is, not whether the exact words match. This is how RAG works, how "search my codebase" features work, and how AI-powered apps find relevant results even when users phrase things differently. For most vibe coders, start with pgvector (a PostgreSQL extension) if you already have Postgres, or Chroma for quick local prototyping. Graduate to Pinecone or Weaviate when you hit millions of records.
Why AI Coders Need to Understand Vector Databases
You've probably already used a vector database without knowing it. Every time you:
- Ask an AI to "search my docs for the answer"
- Build a chatbot that knows about your specific product
- Create a "find similar items" feature
- Set up RAG (Retrieval Augmented Generation) so an AI can answer questions about your data
…there's a vector database doing the heavy lifting behind the scenes.
Here's the problem with regular databases: they search by exact matches. If your help docs contain "How to reset your password" and a user searches "I forgot my login credentials," a traditional WHERE title LIKE '%forgot login%' query returns nothing. Zero results. The words don't match, even though the meaning is identical.
A vector database understands that "forgot my login credentials" and "reset your password" mean the same thing. It searches by meaning, not by matching characters. That's the entire magic — and it's surprisingly straightforward once you see how it works.
If you're building anything with AI that needs to know about your own data — your docs, your codebase, your customer history — you need vectors. This is the foundational infrastructure behind every RAG system, every AI search feature, and every "chat with your data" app.
Real Scenario: Building a "Search My Docs" Feature
You're building an internal tool for your company. You have 500 help articles, and you want users to find answers by asking natural questions — not by guessing the exact keywords in the title.
"I have 500 help articles in a PostgreSQL database. I want users to search by asking natural questions like 'how do I change my billing plan' and get relevant results even if those exact words aren't in the article title. Set up semantic search using pgvector and OpenAI embeddings."
Without vectors, you're stuck with keyword search — LIKE '%billing%' — which misses anything phrased differently. With vectors, the search understands meaning. Let's see what AI builds for you.
What AI Generated
Step 1: Enable pgvector in PostgreSQL
-- Enable the pgvector extension (PostgreSQL 15+ recommended)
-- Tested with pgvector 0.8.0
CREATE EXTENSION IF NOT EXISTS vector;
Step 2: Create the Table with a Vector Column
-- Store articles with their embeddings
CREATE TABLE help_articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
category VARCHAR(100),
embedding vector(1536), -- OpenAI text-embedding-3-small outputs 1536 dimensions
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Create an index for fast similarity search
CREATE INDEX idx_articles_embedding ON help_articles
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 20);
-- Also index category for filtering
CREATE INDEX idx_articles_category ON help_articles(category);
Step 3: Generate Embeddings and Insert
// generate-embeddings.js
// Converts article text into vectors and stores them
import OpenAI from 'openai';
import pg from 'pg';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
async function embedArticle(article) {
// Convert text to a vector (list of 1536 numbers)
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: `${article.title}\n\n${article.content}`,
});
const embedding = response.data[0].embedding;
// Store the article with its embedding
await pool.query(
`INSERT INTO help_articles (title, content, category, embedding)
VALUES ($1, $2, $3, $4)`,
[article.title, article.content, article.category,
JSON.stringify(embedding)] // pgvector accepts the '[0.1,0.2,...]' string format, which JSON.stringify produces
);
}
// Process all articles (assumes the articles array is already loaded from your data source)
for (const article of articles) {
await embedArticle(article);
console.log(`Embedded: ${article.title}`);
}
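One note on that loop: it makes one API call per article. OpenAI's embeddings endpoint also accepts an array of inputs, so you can embed many articles per request. A minimal batched sketch, reusing the openai and pool clients above — the batch size of 100 is an illustrative choice, not an API limit:
// Embed articles in batches to cut API round-trips
async function embedArticlesBatch(articles, batchSize = 100) {
  for (let i = 0; i < articles.length; i += batchSize) {
    const batch = articles.slice(i, i + batchSize);
    // One call embeds the whole batch; results come back in input order
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch.map((a) => `${a.title}\n\n${a.content}`),
    });
    // Insert each article with its matching embedding
    for (let j = 0; j < batch.length; j++) {
      await pool.query(
        `INSERT INTO help_articles (title, content, category, embedding)
         VALUES ($1, $2, $3, $4)`,
        [batch[j].title, batch[j].content, batch[j].category,
         JSON.stringify(response.data[j].embedding)]
      );
    }
  }
}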
Step 4: Search by Meaning
// semantic-search.js
// Search articles by meaning, not keywords
// (reuses the openai and pool clients from Step 3)
async function searchArticles(userQuestion) {
// Convert the user's question to a vector
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: userQuestion,
});
const queryEmbedding = response.data[0].embedding;
// Find the 5 most similar articles (by cosine distance)
const results = await pool.query(
`SELECT id, title, content, category,
1 - (embedding <=> $1::vector) AS similarity
FROM help_articles
ORDER BY embedding <=> $1::vector
LIMIT 5`,
[JSON.stringify(queryEmbedding)]
);
return results.rows;
}
// Example usage:
const results = await searchArticles("how do I change my billing plan");
// Returns articles about billing, subscription changes, payment updates
// even if none of them contain the exact phrase "change my billing plan"
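One refinement worth considering: LIMIT 5 always returns five rows, even when nothing is genuinely relevant. You can cut weak matches with a similarity floor — a sketch where the 0.3 cutoff is an illustrative starting point to tune against your own data:
// Drop results below a minimum similarity score
const filtered = await pool.query(
  `SELECT id, title, category,
          1 - (embedding <=> $1::vector) AS similarity
   FROM help_articles
   WHERE 1 - (embedding <=> $1::vector) > 0.3
   ORDER BY embedding <=> $1::vector
   LIMIT 5`,
  [JSON.stringify(queryEmbedding)]
);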
Understanding Each Part
What Are Embeddings? (The Plain English Version)
An embedding is a list of numbers that represents what a piece of text means. Think of it like GPS coordinates for ideas.
In GPS, two places that are close together have similar coordinates. "123 Main Street" and "125 Main Street" have nearly identical latitude and longitude numbers. A place across the country has very different numbers.
Embeddings work the same way, but for meaning:
- "How do I reset my password" → [0.023, -0.041, 0.087, ... 1,533 more numbers]
- "I forgot my login credentials" → [0.025, -0.039, 0.091, ... 1,533 more numbers] ← very similar!
- "Best pizza restaurants in NYC" → [-0.156, 0.234, -0.012, ... 1,533 more numbers] ← completely different
The AI model (like OpenAI's text-embedding-3-small) reads your text and outputs 1,536 numbers that capture the concept. Similar concepts → similar numbers. Different concepts → different numbers. That's it. That's the whole idea.
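If you want to see "similar numbers" concretely, the measure behind all of this — cosine similarity — is a few lines of arithmetic. A toy sketch with made-up 3-dimensional vectors (real embeddings have 1,536):
// Cosine similarity: dot product divided by the product of the vector lengths
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" — these values are illustrative, not real model output
const resetPassword = [0.023, -0.041, 0.087];
const forgotLogin   = [0.025, -0.039, 0.091];
const pizzaNYC      = [-0.156, 0.234, -0.012];

console.log(cosineSimilarity(resetPassword, forgotLogin)); // ~0.999 — near-identical meaning
console.log(cosineSimilarity(resetPassword, pizzaNYC));    // ~-0.51 — unrelated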
What Is vector(1536)?
In the table definition, embedding vector(1536) means "a column that stores a list of exactly 1,536 numbers." The number 1,536 is called the dimension — it's determined by which embedding model you use:
| Embedding Model | Dimensions | Notes |
|---|---|---|
| OpenAI text-embedding-3-small | 1,536 | Best balance of cost and quality |
| OpenAI text-embedding-3-large | 3,072 | Higher quality, more expensive |
| Cohere embed-v3 | 1,024 | Good multilingual support |
| Voyage AI voyage-3 | 1,024 | Strong for code search |
| Local models (e.g., nomic-embed-text) | 768 | Free, runs on your machine |
Critical rule: The dimension in your table must exactly match the model's output. If you use text-embedding-3-small (1,536 dimensions) but create a vector(768) column, everything breaks.
What Does <=> Mean? (Cosine Distance)
The <=> operator in pgvector calculates cosine distance — how different two vectors are. Think of it like measuring the angle between two arrows:
- 0.0 = identical meaning (arrows point the same direction)
- 1.0 = unrelated (arrows at right angles)
- 2.0 = opposite meaning (arrows point in opposite directions — rare in practice for text)
When you write ORDER BY embedding <=> $1::vector, you're saying "sort by how similar the meaning is, closest first." The query returns the articles whose embeddings point in the most similar direction to the user's question.
You can also use <-> (L2/Euclidean distance) or <#> (negative inner product), but cosine distance is the standard for text search because it works regardless of vector length.
What's That Index Doing?
Without an index, searching vectors means comparing your query against every single embedding in the table. With 500 articles, that's fine. With 500,000? That takes seconds.
-- IVFFlat index: divides vectors into clusters for faster search
CREATE INDEX idx_articles_embedding ON help_articles
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 20);
-- "lists = 20" means: divide into 20 clusters
-- Rule of thumb: lists = sqrt(number_of_rows)
-- 500 articles → lists = ~22
-- 100,000 articles → lists = ~316
This is an approximate nearest neighbor (ANN) index. It's slightly less accurate than checking every row (might miss the #1 result occasionally), but it's 10–100x faster. For most applications, the speed trade-off is worth it. Just like regular database indexes, vector indexes trade some write speed for dramatically faster reads.
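One IVFFlat query-time knob worth knowing: by default it scans only one cluster per query (the probes setting), which can hurt recall. You can raise it per session — a sketch using the pool client from Step 3, where 10 is just a starting value:
// Scan more clusters per query: better recall, slower search
await pool.query('SET ivfflat.probes = 10');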
pgvector also supports HNSW indexes — newer, faster to query, and more accurate, but they use more memory and are slower to build:
-- HNSW index: faster queries, more memory, better accuracy
CREATE INDEX idx_articles_embedding_hnsw ON help_articles
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Use HNSW for production workloads. Use IVFFlat for prototyping.
The RAG Connection
Here's how this all fits together in a RAG (Retrieval Augmented Generation) system:
- User asks: "How do I upgrade my subscription?"
- Embed the question: Convert it to a vector of 1,536 numbers
- Search the vector database: Find the 5 most similar articles
- Pass to AI as context: "Here are the relevant help articles: [articles]. Now answer the user's question."
- AI responds: With an accurate, grounded answer based on your actual docs
Without the vector database, the AI just makes up an answer based on its training data. With it, the AI answers based on your real, up-to-date content. That's the difference between a generic chatbot and a useful one.
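Wired together, the whole loop is short. A minimal sketch reusing searchArticles from Step 4 — the model name and prompt wording here are illustrative choices, not requirements:
// rag-answer.js — minimal retrieval-augmented answer
async function answerWithDocs(userQuestion) {
  // Retrieve: the most relevant articles by meaning
  const articles = await searchArticles(userQuestion);
  const context = articles
    .map((a) => `${a.title}\n${a.content}`)
    .join('\n\n---\n\n');

  // Generate: an answer grounded in the retrieved articles
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // illustrative model choice
    messages: [
      { role: 'system',
        content: `Answer the user's question using only these help articles:\n\n${context}` },
      { role: 'user', content: userQuestion },
    ],
  });
  return completion.choices[0].message.content;
}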
Vector Database Providers: Which One Should You Use?
| Provider | Best For | Pricing | Setup Effort |
|---|---|---|---|
| pgvector (PostgreSQL extension) | Already using Postgres, <1M vectors | Free (part of your existing DB) | Low — just CREATE EXTENSION vector |
| Chroma | Local prototyping, Python projects | Free (open source, runs locally) | Very low — pip install chromadb |
| Pinecone | Production apps, managed service | Free tier, then ~$70/mo+ | Low — managed API, no infrastructure |
| Weaviate | Complex search with filters, multi-modal | Free self-hosted, managed from $25/mo | Medium — more config options |
| Qdrant | High-performance, Rust-based | Free self-hosted, managed from $25/mo | Medium — Docker or cloud |
| Supabase (pgvector) | Full-stack apps already on Supabase | Free tier, then $25/mo+ | Low — built into Supabase |
The Decision Flowchart
Here's how to pick:
- Already using PostgreSQL? → Start with pgvector. No new infrastructure, and it handles up to ~1 million vectors comfortably.
- Just prototyping locally? → Use Chroma. Install with pip, runs in-memory, zero config.
- Production app, don't want to manage infrastructure? → Pinecone. Fully managed, scales automatically, great SDK.
- Need advanced filtering + search combined? → Weaviate. Built-in hybrid search (vectors + keyword matching).
- Building with Supabase already? → Use their built-in pgvector. It's ready to go.
pgvector vs. Dedicated Vector DBs: The Real Trade-Off
This is the question every vibe coder asks: "Should I add pgvector to my existing PostgreSQL database, or use a separate, specialized vector database?"
Use pgvector when:
- You already have PostgreSQL (Supabase, Neon, Railway, local)
- Your dataset is under 1 million vectors
- You want one database to manage, not two
- You need to JOIN vector results with your regular data (users, orders, etc.)
- You're building a prototype or MVP
Use a dedicated vector DB (Pinecone, Weaviate, Qdrant) when:
- You're searching 10+ million vectors
- Query speed at scale is critical (sub-10ms at millions of records)
- You need advanced features: real-time index updates, hybrid search, multi-tenancy
- You want managed infrastructure with zero operational overhead
- Your vector workload would strain your primary database
The honest answer for most vibe coders: Start with pgvector. It's "good enough" for 95% of projects, and it means one less service to manage, pay for, and debug. You can always migrate later if you outgrow it — and you'll know because queries get slow.
What AI Gets Wrong About Vector Databases
1. Mismatched Dimensions
The #1 error. AI creates a vector(768) column but uses text-embedding-3-small, which outputs 1,536 dimensions. Or it switches embedding models mid-project without updating the table. Either way, inserts fail with a dimension-mismatch error.
-- ❌ This will error on INSERT:
CREATE TABLE docs (embedding vector(768));
-- Then inserting a 1536-dimension vector from OpenAI → ERROR
-- ✅ Match the model:
-- text-embedding-3-small → vector(1536)
-- text-embedding-3-large → vector(3072)
-- nomic-embed-text → vector(768)
Fix: Always check your embedding model's documentation for the output dimension, and make sure your table column matches exactly.
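A cheap safeguard is to assert the dimension in code before every insert — a small sketch, assuming the Step 2 schema:
// Fail fast if the model's output doesn't match the table's vector column
const EXPECTED_DIMS = 1536; // must match vector(1536) in the schema

function assertDimensions(embedding) {
  if (embedding.length !== EXPECTED_DIMS) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, expected ${EXPECTED_DIMS} — did the model change?`
    );
  }
  return embedding;
}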
2. No Vector Index
AI creates the table and column but forgets to create an index. With 100 test records, you won't notice. With 100,000 records, every search takes 2+ seconds because it's comparing against every single row.
Fix: After AI generates the schema, ask: "Did you add a vector index? Use HNSW for production."
3. Poor Chunking Strategy
This is the most impactful mistake, and AI almost always gets it wrong. When you embed documents, you need to split them into chunks first. AI tends to either:
- Store entire pages as one embedding → The vector represents an average of everything on the page, so it matches weakly with specific questions
- Split on arbitrary character counts → Cuts sentences in half, loses context
// ❌ AI's typical chunking (bad):
const chunks = text.match(/.{1,1000}/g); // Splits mid-sentence!
// ✅ Better: split by paragraphs, with overlap
function chunkText(text, maxChars = 800, overlap = 100) {
const paragraphs = text.split('\n\n');
const chunks = [];
let current = '';
for (const para of paragraphs) {
if ((current + para).length > maxChars && current) {
chunks.push(current.trim());
// Keep the last bit for context overlap
current = current.slice(-overlap) + '\n\n' + para;
} else {
current += (current ? '\n\n' : '') + para;
}
}
if (current.trim()) chunks.push(current.trim());
return chunks;
}
Rule of thumb: Chunks of roughly 200–800 tokens (a few paragraphs) work best. Each chunk should be able to stand alone as a meaningful piece of content. Overlap by ~10% so you don't lose context at boundaries.
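Putting it together: chunk first, then embed each chunk as its own row — a sketch building on the Step 3 setup, where the "(part N)" title suffix is just one illustrative way to keep chunks traceable:
// Embed each chunk separately instead of the whole document at once
const chunks = chunkText(article.content);
for (const [i, chunk] of chunks.entries()) {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: `${article.title}\n\n${chunk}`, // prepend the title for context
  });
  await pool.query(
    `INSERT INTO help_articles (title, content, category, embedding)
     VALUES ($1, $2, $3, $4)`,
    [`${article.title} (part ${i + 1})`, chunk, article.category,
     JSON.stringify(response.data[0].embedding)]
  );
}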
4. Hardcoded API Keys
AI puts your OpenAI API key right in the embedding code. This is a security issue with any API, not just vector databases. See our guide on API authentication for best practices.
// ❌ AI does this:
const openai = new OpenAI({ apiKey: 'sk-abc123...' });
// ✅ Use environment variables:
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
5. Wrong Distance Metric
AI sometimes creates the index with vector_l2_ops (Euclidean distance) instead of vector_cosine_ops (cosine distance). For text embeddings, cosine is almost always what you want: it compares direction only and ignores vector magnitude. (OpenAI's embeddings are normalized to unit length, so L2 happens to rank them identically — but the _ops class must still match the operator your queries use, or PostgreSQL won't use the index at all.)
-- ❌ Euclidean distance (usually wrong for text):
CREATE INDEX idx ON docs USING hnsw (embedding vector_l2_ops);
-- ✅ Cosine distance (standard for text embeddings):
CREATE INDEX idx ON docs USING hnsw (embedding vector_cosine_ops);
How to Debug Vector Search Issues
Problem: Search Returns Irrelevant Results
"My vector search returns irrelevant results. I'm using pgvector with OpenAI text-embedding-3-small. The user searches 'how to cancel my account' but gets articles about account creation. Here's my search query: [paste]. Here's how I generate embeddings: [paste]. What could be wrong? Check for dimension mismatches, wrong distance metrics, and chunking issues."
Common causes:
- Embeddings were generated with a different model than the query embedding
- Chunks are too large (entire pages instead of focused paragraphs)
- You're not embedding the right content (title only vs. title + content)
- Missing WHERE filters — returning results from all categories when you should filter first (see the sketch below)
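For that last cause: pgvector lets you combine an ordinary WHERE clause with vector ordering in a single query — a sketch building on the Step 4 search, where the 'billing' category value is illustrative:
// Filter by metadata first, then rank the survivors by similarity
const results = await pool.query(
  `SELECT id, title,
          1 - (embedding <=> $1::vector) AS similarity
   FROM help_articles
   WHERE category = $2
   ORDER BY embedding <=> $1::vector
   LIMIT 5`,
  [JSON.stringify(queryEmbedding), 'billing']
);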
Problem: Search Is Slow
"My pgvector similarity search takes 3 seconds on a table with 200,000 rows. Here's my query: [paste]. Here's my table definition: [paste]. Do I have the right index? What type should I use and what parameters?"
Common fixes:
- Add an HNSW or IVFFlat index if you haven't
- For IVFFlat, make sure lists ≈ sqrt(row_count)
- For HNSW, increase ef_search if accuracy matters more than speed: SET hnsw.ef_search = 100;
- Make sure you're using the matching _ops class (e.g., vector_cosine_ops with <=>)
Problem: "Different embedding dimensions" Error
-- Check what dimension your existing embeddings have:
SELECT vector_dims(embedding) FROM help_articles LIMIT 1;
-- Returns: 1536
-- Make sure your query embedding matches.
-- If you switched models, you need to re-embed everything.
There's no way to mix dimensions. If you change embedding models, you must regenerate every embedding in the table. Plan for this by storing the model name alongside the embedding.
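A sketch of what that looks like at insert time — this assumes you've added an embedding_model column (a suggested name, not part of the Step 2 schema) to the table:
// Record which model produced each embedding so stale rows are detectable
const EMBEDDING_MODEL = 'text-embedding-3-small';

await pool.query(
  `INSERT INTO help_articles (title, content, category, embedding, embedding_model)
   VALUES ($1, $2, $3, $4, $5)`,
  [article.title, article.content, article.category,
   JSON.stringify(embedding), EMBEDDING_MODEL]
);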
Frequently Asked Questions
What is a vector database in simple terms?
A vector database stores data as lists of numbers (called embeddings) that represent the meaning of text, images, or other content. Instead of searching by exact keywords like a regular database, it finds results by meaning — so searching for "how to fix a broken pipe" could match a document about "plumbing repair techniques" even though the words are completely different. Think of it as a search engine that understands concepts, not just characters.
What are embeddings and why do AI apps need them?
Embeddings are lists of numbers (vectors) that represent the meaning of text, images, or audio. An AI model like OpenAI's text-embedding-3-small converts "how do I deploy my app" into a list of 1,536 numbers that capture its meaning. Similar concepts get similar numbers. AI apps need them because traditional keyword search fails when users phrase things differently than your content — embeddings understand meaning, not just exact word matches. They're the bridge between human language and mathematical search.
Should I use pgvector or a dedicated vector database like Pinecone?
Use pgvector if you already have PostgreSQL, your dataset is under 1 million vectors, and you want to keep everything in one database. It's simpler, cheaper, and lets you JOIN vector results with your regular tables. Use a dedicated vector database like Pinecone or Weaviate if you need to search tens of millions of vectors, want managed infrastructure with zero ops, or need advanced features like real-time indexing and hybrid search. For most AI-enabled coders building their first RAG app, pgvector is the right starting point — you can always migrate later.
What is RAG and how do vector databases enable it?
RAG (Retrieval Augmented Generation) is a technique where you give an AI model relevant context from your own data before it answers a question. The vector database is the "retrieval" part — it finds the most relevant documents, code snippets, or knowledge base articles by meaning, then passes them to the AI as context. Without a vector database, the AI only knows what it was trained on. With RAG, it can answer questions about your specific codebase, docs, or business data — accurately and with up-to-date information.
What does AI get wrong when setting up vector search?
The most common mistakes: using wrong embedding dimensions (mixing 1,536 vs 768), forgetting to create vector indexes (making search slow at scale), using the wrong distance metric (L2 instead of cosine for text), poor document chunking (storing entire pages instead of paragraphs), and hardcoding API keys. The highest-impact mistake is usually poor chunking — if your chunks are too large, the AI gets diluted context; too small, and it misses important information. Aim for 200–800 words per chunk with ~10% overlap.