Understanding Embeddings & Vector Search: The Mathematics of Meaning

Embeddings are the foundation of modern semantic search, RAG, recommendation systems, and clustering. They're also one of the most elegant ideas in machine learning. This article explains what they are, why they work, and how to use them practically.

What is an Embedding?

An embedding is a fixed-size vector of numbers that represents a piece of data (text, image, audio) in a mathematical space where similar things are close together.

For text, a well-trained embedding model converts "The dog ran fast" into something like [0.12, -0.45, 0.87, ..., 0.33] — a list of 384, 768, or 1536 numbers, depending on the model. The magic: semantically similar sentences produce numerically similar vectors.

The key insight: distance in vector space tracks semantic distance. "Happy" and "joyful" are closer to each other than "happy" and "melancholy." This lets us find related content even when it shares no keywords with the query.
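To make the idea of "closer" concrete, here is a minimal sketch using hand-picked 3-dimensional vectors. The numbers are invented for illustration and do not come from any real model; only the comparison between the two distances matters.

```python
import numpy as np

# Hypothetical 3-dimensional vectors, invented for illustration only.
# A real model produces hundreds of dimensions with learned values.
toy_vectors = {
    "happy":      np.array([0.90, 0.80, 0.10]),
    "joyful":     np.array([0.85, 0.75, 0.20]),
    "melancholy": np.array([-0.70, -0.60, 0.30]),
}

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two vectors."""
    return float(np.linalg.norm(a - b))

d_joyful = euclidean_distance(toy_vectors["happy"], toy_vectors["joyful"])
d_melancholy = euclidean_distance(toy_vectors["happy"], toy_vectors["melancholy"])

print(f"happy vs joyful:     {d_joyful:.3f}")
print(f"happy vs melancholy: {d_melancholy:.3f}")
```

With these toy values the first distance is much smaller than the second, which is the pattern a trained model is expected to produce for real words.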

The Famous Word2Vec Analogy

Early embedding models demonstrated remarkable algebraic properties:

embedding("king") − embedding("man") + embedding("woman") ≈ embedding("queen")
embedding("Paris") − embedding("France") + embedding("Italy") ≈ embedding("Rome")
embedding("walking") − embedding("walk") ≈ embedding("swimming") − embedding("swim")

These relationships emerge from the statistical co-occurrence patterns in the training data — the model learns that "king" and "queen" appear in similar contexts, just with different gendered words nearby.
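The arithmetic above can be sketched with a toy vocabulary. The 2-dimensional vectors below are hand-made assumptions (one axis loosely standing for "royalty", the other for "gender"), not Word2Vec output, and the `nearest` helper is a hypothetical name introduced for this example. As in the usual analogy evaluation, the three input words are excluded from the candidates.

```python
import numpy as np

# Hand-made 2-d vectors: axis 0 ~ "royalty", axis 1 ~ "gender".
# These values are assumptions chosen so the analogy is easy to follow.
toy = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}

def nearest(target: np.ndarray, vocab: dict, exclude: set) -> str:
    """Return the vocabulary word whose vector is closest to target."""
    best_word, best_dist = "", float("inf")
    for word, vec in vocab.items():
        if word in exclude:
            continue
        dist = float(np.linalg.norm(target - vec))
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word

# king - man + woman
target = toy["king"] - toy["man"] + toy["woman"]
answer = nearest(target, toy, exclude={"king", "man", "woman"})
print(answer)  # queen
```

In real embedding spaces the result is only approximately equal to the target word's vector, which is why a nearest-neighbor lookup is used rather than an exact match.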

How Embedding Models Work

Modern text embedding models use Transformer encoders (like BERT). The model processes the input text and outputs a dense vector that captures its semantic meaning in a high-dimensional space.

from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained embedding model
model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # 33M params, free, fast

# Embed a sentence
sentence = "The quick brown fox jumps over the lazy dog"
embedding = model.encode(sentence)

print(f"Shape: {embedding.shape}")       # (384,) — 384 dimensions
print(f"Type: {embedding.dtype}")        # float32
print(f"First 5 values: {embedding[:5]}")  # e.g. [-0.12 0.45 0.03 ...] (illustrative values)

# Embed multiple sentences at once (batched for speed)
sentences = [
    "I love cats.",
    "My favorite animal is a cat.",
    "I'm learning Python programming.",
    "The stock market fell today.",
]
embeddings = model.encode(sentences, batch_size=32)
print(f"Shape: {embeddings.shape}")  # (4, 384)
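A Transformer encoder produces one vector per token, so some step has to collapse those into a single fixed-size vector. One common approach is pooling. The sketch below shows mean pooling with a padding mask on made-up numbers; it is an assumption for illustration, since the pooling strategy (mean, first token, or something else) differs from model to model, and `mean_pool` is a hypothetical helper, not a library function.

```python
import numpy as np

def mean_pool(token_vectors: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[:, None].astype(token_vectors.dtype)  # shape (tokens, 1)
    summed = (token_vectors * mask).sum(axis=0)
    count = max(float(mask.sum()), 1.0)  # guard against an all-padding input
    return summed / count

# Three token vectors of size 4; the last position is padding.
token_vectors = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, must not affect the result
])
attention_mask = np.array([1, 1, 0])

sentence_vector = mean_pool(token_vectors, attention_mask)
print(sentence_vector)  # [2. 2. 2. 2.]
```

Whatever the input length, the output has the same size as a single token vector, which is what makes the embedding "fixed-size".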

Measuring Similarity: Cosine Similarity

The most common similarity metric is cosine similarity: the cosine of the angle between two vectors, which ignores their magnitude. A score of 1 means the same direction (very similar meaning), 0 means orthogonal (unrelated), and -1 means opposite directions.

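from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Same model as in the previous example
model = SentenceTransformer("BAAI/bge-small-en-v1.5")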

sentences = [
    "I love cats.",                         # Reference
    "My favorite animal is a cat.",          # Should be high similarity
    "Dogs are great pets.",                  # Medium similarity
    "The economy is growing rapidly.",       # Should be low similarity
]

embeddings = model.encode(sentences)

# Compare all to the first sentence
reference = embeddings[0].reshape(1, -1)
similarities = cosine_similarity(reference, embeddings[1:])

for sent, score in zip(sentences[1:], similarities[0]):
    print(f"{score:.3f} | {sent}")

# Illustrative output (exact scores depend on the model and version; the ranking is what matters):
# 0.923 | My favorite animal is a cat.     ← Very similar
# 0.687 | Dogs are great pets.             ← Moderately similar
# 0.143 | The economy is growing rapidly.  ← Unrelated
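For intuition, the cosine formula can also be written by hand: the dot product divided by the product of the two vector lengths. This is a small sketch on 2-dimensional vectors chosen to hit the three reference values; `cosine_sim` is a name introduced here for illustration.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """cos(angle) = (a . b) / (|a| * |b|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])

same_direction = cosine_sim(a, np.array([3.0, 0.0]))   # longer, same direction
orthogonal = cosine_sim(a, np.array([0.0, 2.0]))       # at a right angle
opposite = cosine_sim(a, np.array([-5.0, 0.0]))        # pointing the other way

print(same_direction, orthogonal, opposite)  # 1.0 0.0 -1.0
```

Note that the vector lengths (3, 2, 5) have no effect on the scores, which is what "ignoring magnitude" means in practice.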

Building a Simple Semantic Search System

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Your knowledge base
documents = [
    "Return policy: Items can be returned within 30 days for a full refund.",
    "Shipping: We offer free shipping on orders over $50.",
    "Payment: We accept Visa, Mastercard, PayPal, and Apple Pay.",
    "Customer service is available Monday-Friday 9am-6pm EST.",
    "Our premium membership costs $9.99/month and includes free returns.",
]

# Embed all documents (do this once, store the result)
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def semantic_search(query: str, top_k: int = 3) -> list[dict]:
    """Find the most relevant documents for a query."""
    query_embedding = model.encode([query], normalize_embeddings=True)
    scores = (query_embedding @ doc_embeddings.T)[0]  # Dot product = cosine sim
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [
        {"document": documents[i], "score": float(scores[i])}
        for i in top_indices
    ]

# Test it
results = semantic_search("How do I send something back?")
for r in results:
    print(f"Score: {r['score']:.3f} | {r['document'][:80]}")
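The search function relies on the fact that, once vectors are scaled to unit length, a plain dot product gives the same number as cosine similarity. The sketch below checks this on random vectors that stand in for real embeddings; `l2_normalize` is a hypothetical helper doing by hand what `normalize_embeddings=True` is used for above.

```python
import numpy as np

rng = np.random.default_rng(42)
vectors = rng.normal(size=(4, 8))  # random stand-ins for 4 embeddings of size 8

def l2_normalize(matrix: np.ndarray) -> np.ndarray:
    """Scale every row to unit length."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / norms

unit = l2_normalize(vectors)

# Dot product of unit vectors against the first row
dot_scores = unit @ unit[0]

# Full cosine formula on the original, unnormalized vectors
cos_scores = np.array([
    np.dot(v, vectors[0]) / (np.linalg.norm(v) * np.linalg.norm(vectors[0]))
    for v in vectors
])

scores_match = bool(np.allclose(dot_scores, cos_scores))
print(scores_match)  # True
```

Normalizing once at indexing time moves the division out of the query path, so each search is a single matrix multiplication.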

Choosing an Embedding Model

BAAI/bge-small-en-v1.5

🟢 Free, 33M params, 384 dims. Fast and good quality. Best for local/offline use.

text-embedding-3-small

OpenAI. $0.02/1M tokens. 1536 dims. Excellent quality. Best balance of cost and performance.

text-embedding-3-large

OpenAI. $0.13/1M tokens. 3072 dims. Highest quality from OpenAI.

voyage-3

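Voyage AI (a separate company from Anthropic, whose documentation points to Voyage models for embeddings). Strong retrieval (RAG) results on benchmarks such as MTEB; rankings shift often, so check the current leaderboard before choosing.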
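Dimension count and price translate directly into storage and embedding cost, so a quick estimate can help when comparing the options above. The sketch below assumes float32 storage (4 bytes per value), a hypothetical corpus of one million documents at 200 tokens each, and the per-token prices quoted in this list; the local model is treated as free of API charges, with compute cost not counted.

```python
def index_size_mb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in megabytes (1 MB = 1,000,000 bytes), no index overhead."""
    return num_vectors * dims * bytes_per_value / 1_000_000

def embedding_cost_usd(num_tokens: int, usd_per_million_tokens: float) -> float:
    """API cost to embed num_tokens once."""
    return num_tokens / 1_000_000 * usd_per_million_tokens

# (model name, dimensions, USD per 1M tokens) taken from the list above
models = [
    ("BAAI/bge-small-en-v1.5", 384, 0.0),   # runs locally, no API charge
    ("text-embedding-3-small", 1536, 0.02),
    ("text-embedding-3-large", 3072, 0.13),
]

NUM_DOCS = 1_000_000     # hypothetical corpus size
TOKENS_PER_DOC = 200     # hypothetical average document length

for name, dims, price in models:
    size = index_size_mb(NUM_DOCS, dims)
    cost = embedding_cost_usd(NUM_DOCS * TOKENS_PER_DOC, price)
    print(f"{name:26s} {size:>9.0f} MB  ${cost:.2f}")
```

Under these assumptions the 384-dimension index needs about 1.5 GB of raw vector storage and the 3072-dimension one about 12.3 GB, so the larger model costs more in both API fees and memory.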

Embeddings Beyond Text

Images: CLIP embeddings allow text and image search in the same vector space
Code: CodeBERT, StarCoder embeddings for semantic code search
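Audio: transcribe with Whisper, then embed the transcript with a text embedding model for audio search
Multi-modal: joint embedding models such as CLIP map text and images into one shared space; generative vision-language models like GPT-4V accept text+image input but do not expose embeddings through their API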

Key Takeaways

Embeddings are fixed-size vectors where similar content = similar vectors
Cosine similarity measures semantic similarity between any two embeddings
Use normalize_embeddings=True — enables faster dot product instead of cosine calculation
BAAI/bge-small-en-v1.5 for free/local; OpenAI text-embedding-3-small for production quality
Always embed with the same model you indexed with — models are not interchangeable
Embeddings are the engine behind semantic search, RAG, recommendations, and clustering
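One way to enforce the "same model" rule in practice is to store the model name next to the saved vectors and check it on load. The following is a minimal sketch, assuming a NumPy `.npz` file as the store; `save_index` and `load_index` are hypothetical helpers, and random vectors stand in for real `model.encode` output.

```python
import os
import tempfile
import numpy as np

MODEL_NAME = "BAAI/bge-small-en-v1.5"

def save_index(path: str, embeddings: np.ndarray, model_name: str) -> None:
    """Store the vectors together with the name of the model that produced them."""
    np.savez(path, embeddings=embeddings, model_name=np.array(model_name))

def load_index(path: str, expected_model: str) -> np.ndarray:
    """Load the vectors, refusing to continue if they came from another model."""
    with np.load(path) as data:
        stored_model = data["model_name"].item()
        if stored_model != expected_model:
            raise ValueError(
                f"Index was built with {stored_model!r}, not {expected_model!r}"
            )
        return data["embeddings"]

# Random stand-ins for real document embeddings (5 documents, 384 dimensions)
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(5, 384)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "doc_index.npz")
    save_index(index_path, fake_embeddings, MODEL_NAME)
    restored = load_index(index_path, MODEL_NAME)

    try:
        load_index(index_path, "some-other-model")
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(restored.shape, mismatch_detected)  # (5, 384) True
```

Failing loudly on a mismatch is a deliberate choice here: vectors from two different models can have the same shape, so a silent mix would produce meaningless scores rather than an error.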
