Modern search has evolved from keyword matching to "understanding" meaning. This transformation is powered by high-dimensional vector math. In this post, we’ll explore the equations that make neural search possible.
1. Vector Embeddings
At its core, an embedding is a function that maps a piece of text to a vector in a high-dimensional space. In this space, distance correlates with semantic similarity: texts with similar meanings land near each other.
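To make the text-to-vector mapping concrete, here is a deliberately toy sketch: a hashed bag-of-words "embedding". Real systems use learned neural encoders; the `toy_embed` function and its 8-dimensional output are illustrative assumptions, not a production technique.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy embedding: hash each token into a bucket of a fixed-size vector.
    Illustrates the text -> vector mapping only; real embeddings are learned."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    # Normalize to unit length so comparisons depend on direction, not word count.
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

v = toy_embed("neural search understands meaning")
print(v.shape)  # (8,)
```

The key property to notice is the shape of the output: any string, regardless of length, becomes a fixed-size vector that can be compared with any other.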
2. Measuring Similarity
To find the most relevant documents, we calculate the cosine similarity between the query vector $\mathbf{q}$ and a document vector $\mathbf{d}$.

The similarity is defined as the cosine of the angle between them:

$$\text{sim}(\mathbf{q}, \mathbf{d}) = \cos\theta = \frac{\mathbf{q} \cdot \mathbf{d}}{\|\mathbf{q}\|\,\|\mathbf{d}\|}$$

A similarity of $1$ indicates identical meaning, while $0$ indicates orthogonality (no relation).
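The formula above translates directly into a few lines of NumPy. The example vectors are made up to exhibit the two boundary cases:

```python
import numpy as np

def cosine_similarity(q: np.ndarray, d: np.ndarray) -> float:
    """Cosine of the angle between query vector q and document vector d."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

q = np.array([1.0, 2.0, 3.0])
d_same = np.array([2.0, 4.0, 6.0])   # same direction: similarity 1
d_orth = np.array([-2.0, 1.0, 0.0])  # dot product is 0: similarity 0

print(cosine_similarity(q, d_same))  # → 1.0 (up to floating-point error)
print(cosine_similarity(q, d_orth))  # → 0.0
```

Because cosine similarity ignores vector magnitude, a short document and a long document about the same topic score alike, which is usually what search wants.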
3. The RAG Flow
Retrieval-Augmented Generation (RAG) uses this math to ground AI responses in factual data. The orchestration involves several moving parts: embed the user's query, retrieve the most similar documents by cosine similarity, and inject those passages into the prompt so the language model generates an answer grounded in them.
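The retrieval and prompt-building steps can be sketched as follows. This is a minimal illustration under stated assumptions: the tiny 2-dimensional vectors, the `retrieve` and `build_prompt` helpers, and the prompt wording are all invented for the example, and the final generation call to a language model is omitted.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, doc_vecs: np.ndarray,
             docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by cosine similarity to the query; return the top k."""
    sims = (doc_vecs @ query_vec) / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [docs[i] for i in np.argsort(-sims)[:k]]

def build_prompt(question: str, passages: list[str]) -> str:
    """Ground the generator by injecting retrieved passages as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["Cats are mammals.", "Paris is in France.", "Vectors have direction."]
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # toy embeddings
query_vec = np.array([0.0, 0.9])  # toy query vector, nearest to docs[1]

print(build_prompt("Where is Paris?", retrieve(query_vec, doc_vecs, docs, k=1)))
```

In a real pipeline, the toy vectors would come from an embedding model and the prompt would be sent to a generator, but the ranking math is exactly the cosine similarity defined earlier.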
4. Dimensionality Reduction
Visualizing these spaces requires projecting $n$ dimensions (often hundreds or more) down to $2$ or $3$. This is typically done using algorithms like t-SNE or UMAP.
The optimization objective for many of these algorithms involves minimizing the Kullback-Leibler divergence:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} p_i \log \frac{p_i}{q_i}$$
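A direct implementation of the divergence shows its two defining behaviors: it is zero when the distributions match and grows as they diverge. The example distributions are made up for illustration.

```python
import numpy as np

def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    mask = p > 0  # terms with p_i = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

print(kl_divergence(p, p))  # → 0.0: identical distributions
print(kl_divergence(p, q))  # positive: the distributions differ
```

Note that $D_{\mathrm{KL}}$ is not symmetric: $D_{\mathrm{KL}}(P \,\|\, Q) \neq D_{\mathrm{KL}}(Q \,\|\, P)$ in general, which is why t-SNE's choice of direction matters for how it penalizes placement errors.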
Summary
Neural search is more than just "AI magic"—it is a rigorous application of linear algebra and probability theory. By mastering these foundations, we can build more reliable and transparent agentic systems.