What is indexing?

Indexing as a general concept is used to enhance speed of search and retrieval (ie, lookup) at the expense of storage space. And an index is a data structure that allows us to do this.A

We could scan all our data every time. An index exists so we don’t have to.

Its used in various places like,

What does indexing mean in different settings?

So overall with indexing we are trying to do search and retrieval but what that means changes in different settings, as we see below.

For traditional databases, when we do select * from employees where email = "rob@email.com", the point of indexing would be to get those rows of the DB with email as rob@email.com quickly without scanning all rows of the DB. This is similar to the way books are stored in a library. Instead of looking through the entire library, you go directly to a specific section where the required book is placed.

On the other hand, vector indexes answer: “Which points are closest to this point in high-dimensional space?”

This is called Nearest Neighbor Search (NN) or Approximate Nearest Neighbor (ANN) in practice.

Vector index and vector embedding

Vector indexes are used to store and retrieve vector embeddings. Vector embeddings are a numeric vector associated with an image or text which captures its semantics or characteristics.

Since vectors are mathematical in nature, we can use that property to cluster and store similar vectors in the vector index. When we need to retrieve a vector, we figure out which cluster our search vector could be part of and search for it in that cluster rather than going through all the vectors.

Common indexing techniques

These indexing techinques are used to store similar vectors together. This makes it faster for lookup.

Clustering Based - Inverted File (IVF)

Graph Based - Hierarchical Navigable Small World (HNSW) Algorithm


This is a good intro to the topic of vector indexing.