Basic

Understanding Embeddings & Vector Search:
The Mathematics of Meaning

๐Ÿ—ƒ๏ธ Vector Databasesโฑ 10 min read๐Ÿ—“ May 2026

Embeddings are the foundation of modern semantic search, RAG, recommendation systems, and clustering. They're also one of the most elegant ideas in machine learning. This article explains what they are, why they work, and how to use them practically.

What is an Embedding?

An embedding is a fixed-size vector of numbers that represents a piece of data (text, image, audio) in a mathematical space where similar things are close together.

For text, a well-trained embedding model converts "The dog ran fast" into something like [0.12, -0.45, 0.87, ..., 0.33]: a list of 384, 768, or 1536 numbers, depending on the model. The magic: semantically similar sentences produce numerically similar vectors.

The key insight: Distance in vector space = semantic distance. "Happy" and "joyful" are closer to each other than "happy" and "melancholy." This lets us find semantically similar content without any keyword matching.

The Famous Word2Vec Analogy

Early embedding models demonstrated remarkable algebraic properties. The classic example: vector("king") - vector("man") + vector("woman") ≈ vector("queen").

These relationships emerge from the statistical co-occurrence patterns in the training data: the model learns that "king" and "queen" appear in similar contexts, just with different gendered words nearby.
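
The vector arithmetic behind such analogies can be sketched with hand-made toy vectors. This is only an illustration: the three-dimensional vectors and their axis meanings below are invented for readability and do not come from a trained Word2Vec model, and `analogy` is a hypothetical helper name.

```python
import math

# Hypothetical 3-d toy vectors (NOT from a trained model).
# Axes, loosely: [royalty, maleness, femaleness]
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

result = analogy("king", "man", "woman", toy_vectors)
print(result)  # queen
```

Excluding the three input words from the candidates mirrors how such analogies are usually evaluated; without that step, the nearest vector is often one of the inputs themselves.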

How Embedding Models Work

Modern text embedding models use Transformer encoders (like BERT). The model processes the input text and outputs a dense vector that captures its semantic meaning in a high-dimensional space.

from sentence_transformers import SentenceTransformer
import numpy as np

# Load a pre-trained embedding model
model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # 33M params, free, fast

# Embed a sentence
sentence = "The quick brown fox jumps over the lazy dog"
embedding = model.encode(sentence)

print(f"Shape: {embedding.shape}")       # (384,): 384 dimensions
print(f"Type: {embedding.dtype}")        # float32
print(f"First 5 values: {embedding[:5]}")  # [-0.12, 0.45, 0.03, ...]

# Embed multiple sentences at once (batched for speed)
sentences = [
    "I love cats.",
    "My favorite animal is a cat.",
    "I'm learning Python programming.",
    "The stock market fell today.",
]
embeddings = model.encode(sentences, batch_size=32)
print(f"Shape: {embeddings.shape}")  # (4, 384)
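
How an encoder turns many per-token vectors into one fixed-size sentence vector can be sketched as a pooling step. The snippet below shows mean pooling followed by L2 normalization, which is one common strategy; it is an assumption for illustration, not a description of what any particular model does internally (some models instead use the vector of a special token such as [CLS]). The token vectors are made up.

```python
import math

# Hypothetical per-token vectors for a 3-token input (4 dims for readability;
# real encoders emit hundreds of dimensions per token).
token_vectors = [
    [0.2, 0.4, 0.0, 0.6],
    [0.4, 0.0, 0.2, 0.2],
    [0.0, 0.2, 0.4, 0.4],
]

def mean_pool(vectors):
    """Average token vectors column-wise into one fixed-size vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

sentence_vector = l2_normalize(mean_pool(token_vectors))
print(len(sentence_vector))  # 4: same size regardless of how many tokens went in
```

The output length depends only on the model's hidden size, which is why a short sentence and a long paragraph both map to vectors of the same shape.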

Measuring Similarity: Cosine Similarity

The most common similarity metric is cosine similarity: the cosine of the angle between two vectors, which ignores their magnitude. A score of 1 means identical direction (same meaning), 0 means orthogonal (unrelated), and -1 means opposite direction.

from sklearn.metrics.pairwise import cosine_similarity

# Reuses `model` loaded in the previous example

sentences = [
    "I love cats.",                         # Reference
    "My favorite animal is a cat.",          # Should be high similarity
    "Dogs are great pets.",                  # Medium similarity
    "The economy is growing rapidly.",       # Should be low similarity
]

embeddings = model.encode(sentences)

# Compare all to the first sentence
reference = embeddings[0].reshape(1, -1)
similarities = cosine_similarity(reference, embeddings[1:])

for sent, score in zip(sentences[1:], similarities[0]):
    print(f"{score:.3f} | {sent}")

# Example output (exact values depend on the model and library version):
# 0.923 | My favorite animal is a cat.     ← Very similar
# 0.687 | Dogs are great pets.             ← Moderately similar
# 0.143 | The economy is growing rapidly.  ← Unrelated
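
To make the metric itself concrete, cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = (a · b) / (|a| |b|). A minimal from-scratch sketch using only the standard library follows; `cosine_similarity_manual` is a hypothetical name chosen to avoid clashing with the scikit-learn function.

```python
import math

def cosine_similarity_manual(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: magnitude is ignored
same_direction = cosine_similarity_manual([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors
orthogonal = cosine_similarity_manual([1.0, 0.0], [0.0, 3.0])
# Exactly opposite direction
opposite = cosine_similarity_manual([1.0, 2.0], [-1.0, -2.0])

print(round(same_direction, 6))  # 1.0
print(round(orthogonal, 6))      # 0.0
print(round(opposite, 6))        # -1.0
```

The rounding only hides floating-point noise in the last digits; the three cases correspond to the 1, 0, and -1 scores described above.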

Building a Simple Semantic Search System

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Your knowledge base
documents = [
    "Return policy: Items can be returned within 30 days for a full refund.",
    "Shipping: We offer free shipping on orders over $50.",
    "Payment: We accept Visa, Mastercard, PayPal, and Apple Pay.",
    "Customer service is available Monday-Friday 9am-6pm EST.",
    "Our premium membership costs $9.99/month and includes free returns.",
]

# Embed all documents (do this once, store the result)
doc_embeddings = model.encode(documents, normalize_embeddings=True)

def semantic_search(query: str, top_k: int = 3) -> list[dict]:
    """Find the most relevant documents for a query."""
    query_embedding = model.encode([query], normalize_embeddings=True)
    scores = (query_embedding @ doc_embeddings.T)[0]  # Dot product = cosine sim
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [
        {"document": documents[i], "score": float(scores[i])}
        for i in top_indices
    ]

# Test it
results = semantic_search("How do I send something back?")
for r in results:
    print(f"Score: {r['score']:.3f} | {r['document'][:80]}")
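
The search function relies on the fact that, once vectors are scaled to unit length, a plain dot product equals cosine similarity. A small sketch with made-up two-dimensional vectors shows the equivalence; the helper names are hypothetical and the numbers are chosen so the arithmetic is easy to check by hand.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length (what normalize_embeddings=True asks for)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

query = [3.0, 4.0]  # length 5
doc = [4.0, 3.0]    # length 5

# Full cosine formula on the raw vectors: 24 / (5 * 5)
cosine = dot(query, doc) / (math.sqrt(dot(query, query)) * math.sqrt(dot(doc, doc)))

# Plain dot product on the unit-length versions: 0.6*0.8 + 0.8*0.6
unit_dot = dot(normalize(query), normalize(doc))

print(math.isclose(cosine, unit_dot))  # True
```

Normalizing once at indexing time means each query costs one matrix multiplication, with no per-document division.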

Choosing an Embedding Model

BAAI/bge-small-en-v1.5

Free, 33M params, 384 dims. Fast and good quality. Best for local/offline use.

text-embedding-3-small

OpenAI. $0.02/1M tokens. 1536 dims. Excellent quality. Best balance of cost and performance.

text-embedding-3-large

OpenAI. $0.13/1M tokens. 3072 dims. Highest quality from OpenAI.

voyage-3

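Voyage AI (a separate company from Anthropic, whose documentation recommends it for embeddings). Reported as a top performer for RAG-style retrieval on benchmarks such as MTEB; rankings shift often, so check the current leaderboard before choosing.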
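
Dimension count and per-token price translate directly into storage and cost, which a back-of-the-envelope calculation can make visible. The sketch below assumes float32 vectors (4 bytes per value) and ignores any index overhead a vector database adds; the corpus size and tokens-per-document figures are hypothetical, and the prices are the ones quoted above, which may change.

```python
BYTES_PER_FLOAT32 = 4

def index_size_gb(num_vectors, dims):
    """Raw storage for float32 vectors, in gigabytes (no index overhead)."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / 1e9

def embedding_cost_usd(total_tokens, price_per_million_tokens):
    """One-off API cost to embed a corpus."""
    return total_tokens / 1_000_000 * price_per_million_tokens

num_docs = 1_000_000   # hypothetical corpus size
tokens_per_doc = 200   # hypothetical average document length
total_tokens = num_docs * tokens_per_doc

# (model name, dimensions, price per 1M tokens in USD; 0.0 for the local model)
models = [
    ("bge-small-en-v1.5", 384, 0.0),
    ("text-embedding-3-small", 1536, 0.02),
    ("text-embedding-3-large", 3072, 0.13),
]

for name, dims, price in models:
    size = index_size_gb(num_docs, dims)
    cost = embedding_cost_usd(total_tokens, price)
    print(f"{name}: {size:.2f} GB, ${cost:.2f} to embed")
```

Under these assumptions, one million documents take roughly 1.5 GB at 384 dimensions, 6.1 GB at 1536, and 12.3 GB at 3072, so storage grows linearly with vector size even when the embedding cost itself is small.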

Embeddings Beyond Text

Key Takeaways