How Does Vector Search Work?

A part 2 to my post on How Does Search Work?

Vector Search as a concept is nothing new. We have been projecting data onto planes and querying the distances and relationships in these vectors for decades.
But cheap(er) accessability to LLMs has lead to the rapid rising popularity of Semantic Document Search with Vectors.

Similar to how Lexucal BM25 search works, Vector Search is also surprisingly easy to grock.

Embeddings

The heart of leveraging Vector Search technologies for Semantic Search. The Embedding

In this context, we are generating embeddings for both documents and for queries to find those documents.
It is important that the embedding model used to generate these are compatible. In this semenatic vector world view, when we index a document for Search, we do so by

Generating an Embedding for an incoming document
Storing that Embedding in a vector database for later retrieval

And similarly when querying

Generate the Embedding for the Query
Use a Vector Search algorithm to find documents which are Semantically close / related to the semantic meaning of the search.

The Vector Search algorithm is traditionally kNN, but in a live system where performance is matter we usually us ANN. More on this later.

You should already see a difference here to the BM25 algorithm that we ran through earlier. We are no longer performing searches on keywords, but on semantic concepts. This is exciting and lets us find related documents even if they might use different words.
It also helps us solve tricky multi-lingual problems. Particularly for non-english speaking countries you will often find a mix of languages that people are working in, this is a problem close to my heart.

Generating Embeddings

To avoid actually calling out to an LLM, let’s just define an interface

type Embedder interface {
    Embed(text string) ([]float32, error)
}

In the real world, your implementation may call out to an API like OpenAI’s Vector API.

documents := []string{
    "I love riding my bike",
    "Cycling is my favourite sport",
    "Pizza is delicious",
}

for _, doc := range documents {
    vector, _ := embedder.Embed(doc)

    fmt.Printf(
        "%q -> dimensions=%d\n",
        doc,
        len(vector),
    )
}

During indexing we use our embedder to generate a vector for each document looking something like this.

"I love riding my bike"

[-0.12, 0.44, -0.01, 0.88, ...]

Just a bunch of numbers. Numbers which represent the semantic meaning of our document which we can put into a Vector Database for querying.

Indexing Vectors

To actually make this jumble of numbers useful for lookup, we need a database capable of efficiently storing and retrieving documents based on these Vectors.

type Document struct {
    ID     string
    Vector []float32
}

type VectorIndex struct {
    docs []Document
}

func (i *VectorIndex) Add(doc Document) {
    i.docs = append(i.docs, doc)
}

So to put it together it would look like

func IndexDocument(
	index *VectorIndex,
	embedder Embedder,
	doc SourceDocument,
) error {

	vector, err := embedder.Embed(doc.Text)
	if err != nil {
		return err
	}

	index.Add(Document{
		ID:     doc.ID,
		Vector: vector,
	})

	return nil
}

There is no inverted index, we just store all of the vectors. However, precisely searching across every documents vectors becomes expensive. So how do we query it?

Query Embeddings

The first step of being able to query our vector database, is that the incoming query must also be Embedded using the same (or a compatible) model.

query := "good road cycling training"

queryVector, _ := embedder.Embed(query)

Which gives us another vector

"good road cycling training"
↓
[-0.08, 0.17, ...]

Finding Similar Vectors

This is where the complexity lies. There are many possible techniques, calculating distances and similarities between the vectors.

The basic standard similarity here is cosine similarity.
It’s the Dot product (direction and magnitude), normalised.

cosine(a, b) = (a · b) / (|a| |b|)

Applied to our example this becomes

func CosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64

	for i := range a {
		dot += float64(a[i] * b[i])
		normA += float64(a[i] * a[i])
		normB += float64(b[i] * b[i])
	}

	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

The higher the value returned, the more similar the two vectors are.

This means that we take our query, generate the embeddings for it, and itterate over each vector in our DB and sort them in order of the most similar to our query, returning the top-k results. Leading to kNN vector search.

Our search winds up looking like

type Result struct {
	ID    string
	Score float64
}

func (i *VectorIndex) Search(queryVector []float32) []Result {
	results := []Result{}

	for _, doc := range i.docs {
		score := CosineSimilarity(queryVector, doc.Vector)

		results = append(results, Result{
			ID:    doc.ID,
			Score: score,
		})
	}

	sort.Slice(results, func(a, b int) bool {
		return results[a].Score > results[b].Score
	})

	return results
}

You will notice this is quite inefficient. While BM25 is O(k + Σ postings(term_i)), or simply O(relevant documents only),
our naieve lookup is O(N × d), with query complexity increasing both on the number of documents and the complexity of the vector itself. Do we really have to visit every single document? With BM25 we only looked at documents that contained the keywords we were searching for, drastically reducing our initial candidate search size, and a lot of the data was already pre-computed for us.

Making Vector Search scale.

To trade off accuracy for speed, we use an ANN algorithm, to find the Aproximate Nearest Neighbour. Let’s start with a greedy graph traversal.
We will need to modify our implementation at index time, to build up the graph, and then change our search to use graph traversal.

Building The Graph

Similar to BM25, the key here is that we are doing a lot of incremental work up-front at index time as each new document comes in.

After generating our Embeddings, instead of just shoving it in a big bag we build a Graph.

type Document struct {
	ID     string
	Vector []float32
}

type Node struct {
	Document Document
	Neighbors []*Node
}

type VectorGraph struct {
	Nodes []*Node
	Entry *Node //important for search later
	Sim   SimilarityFunc
}

func (g *VectorGraph) Add(doc Document, k int) {
	node := &Node{
		Document: doc,
	}

	// first node becomes entry point
	if len(g.Nodes) == 0 {
		g.Nodes = append(g.Nodes, node)
		g.Entry = node
		return
	}

	type candidate struct {
		node  *Node
		score float64
	}

	candidates := make([]candidate, 0, len(g.Nodes))

	// 1. score against all existing nodes
	for _, existing := range g.Nodes {
		score := g.Sim(node.Document.Vector, existing.Document.Vector)

		candidates = append(candidates, candidate{
			node:  existing,
			score: score,
		})
	}

	// 2. sort candidates by similarity descending
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].score > candidates[j].score
	})

	// 3. connect to top-K most similar nodes
	limit := k
	if len(candidates) < k {
		limit = len(candidates)
	}

	for i := 0; i < limit; i++ {
		neighbor := candidates[i].node

		node.Neighbors = append(node.Neighbors, neighbor)
		neighbor.Neighbors = append(neighbor.Neighbors, node)
	}

	// 4. add to graph
	g.Nodes = append(g.Nodes, node)
}

We just moved the similarity calculation from Query time to Index time.

Querying the Graph

Our Query is now a greedy graph traversal. Where we start at the Entry point, and we walk the graph in whatever direction increases similarity to our query.

type Result struct {
	ID    string
	Score float64
}

func (g *VectorGraph) Search(query []float32) Result {
	if g.Entry == nil {
		return Result{}
	}

	current := g.Entry
	currentScore := g.Sim(query, current.Document.Vector)

	visited := map[*Node]bool{
		current: true,
	}

	for {
		best := current
		bestScore := currentScore

		// explore neighbors
		for _, n := range current.Neighbors {
			if visited[n] {
				continue
			}

			visited[n] = true

			score := g.Sim(query, n.Document.Vector)

			if score > bestScore {
				best = n
				bestScore = score
			}
		}

		// if no improvement, stop
		if best == current {
			return Result{
				ID:    current.Document.ID,
				Score: currentScore,
			}
		}

		// move to better node
		current = best
		currentScore = bestScore
	}
}

Finding a global optima

The next optimisation we need to bridge to to understand how real systems work, is that we need to break out of local optimas and find the global optima.
Our greedy graph search can easily get stuck in a local optima, and not find the best match!

The main algorithm used at the moment here is the Hierarchical navigable small world.

The HNSW algorithm introduces layers. It has a sparcely populated layer at the entry point, which you walk to find an optimal area to query deeper, and then you go to the next layer and continue. Until eventually you reach the bottom, the fully populated dense graph to generate your candidate set.

This is getting a bit long, so I will save implementing the full HNSW algorithm for another follow-up.

Is Semantic Search the future? Does this replace BM25?

I hearby invoke Betteridge’s law of headlines.
No. It does not replace BM25.

At least at the moment, raw Semantic Vector Search performs worse than BM25. But Hybrid Search performs better than either technique does alone.

The most influential paper here will be Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., & Gurevych, I. (2021). Beir: A heterogenous benchmark for zero-shot evaluation of information retrieval models. arXiv preprint arXiv:2104.08663..
Which showed that BM25 won in some examples, Vectors in others, and Hybrid had consistently strong results.

So it’s important to keep in mind that we shouldn’t use this new technique to replace our old systems. But rather, to augment them.

Embeddings#

Generating Embeddings#

Indexing Vectors#

Query Embeddings#

Finding Similar Vectors#

Making Vector Search scale.#

Building The Graph#

Querying the Graph#

Finding a global optima#

Is Semantic Search the future? Does this replace BM25?#