Question:

What is an embedding?

An embedding is a way of representing something (a word, a sentence, an image, a piece of code) as a list of numbers that captures its meaning. It’s how AI systems understand and compare things semantically rather than just matching exact text.

The idea is that similar things should produce similar numbers. If you embed the words “dog” and “puppy”, the resulting vectors will be close together. “Dog” and “rocket” will be far apart. Closeness here is something you can compute, typically with a score such as cosine similarity or a distance between the two vectors. This lets a computer measure how related two pieces of content are without needing to understand language the way humans do.
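To make “close together” concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-number vectors below are made up by hand for illustration; real embeddings come from a model and are much longer.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy vectors (an assumption for this sketch, not model output).
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "rocket": [0.1, 0.05, 0.95],
}

dog_puppy = cosine_similarity(toy_embeddings["dog"], toy_embeddings["puppy"])
dog_rocket = cosine_similarity(toy_embeddings["dog"], toy_embeddings["rocket"])

print(f"dog vs puppy:  {dog_puppy:.3f}")
print(f"dog vs rocket: {dog_rocket:.3f}")
```

A score near 1 means the vectors point in almost the same direction, and a score near 0 means they have little in common. With these toy numbers, “dog” and “puppy” score far higher than “dog” and “rocket”.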

Embeddings are what make semantic search possible. A traditional search engine looks for exact keyword matches. A search powered by embeddings can find results that are conceptually related even if they use completely different words. Search for “how do I fix a flat tire” and it can return results about “repairing a punctured bicycle wheel” because the embeddings are close in meaning.
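Here is a rough sketch of that difference. The document titles, the stop-word list, and every vector are hand-written assumptions for illustration (in practice a model would produce the vectors), but the contrast between the two search styles is the point.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors written by hand; the dimensions loosely mean
# [repair, wheels/bikes, cooking]. Not output from a real model.
documents = {
    "Repairing a punctured bicycle wheel": [0.9, 0.8, 0.0],
    "Baking sourdough bread at home": [0.0, 0.1, 0.9],
    "Choosing a road bike frame size": [0.1, 0.9, 0.0],
}
query_text = "how do I fix a flat tire"
query_vector = [0.8, 0.9, 0.1]  # also hand-written

STOP_WORDS = {"how", "do", "i", "a"}

def keyword_search(query, titles):
    """Return titles that share at least one non-stop-word with the query."""
    keywords = set(query.lower().split()) - STOP_WORDS
    return [t for t in titles if keywords & set(t.lower().split())]

def semantic_search(query_vec, docs):
    """Return titles ordered from most to least similar to the query vector."""
    return sorted(
        docs,
        key=lambda title: cosine_similarity(query_vec, docs[title]),
        reverse=True,
    )

keyword_hits = keyword_search(query_text, documents)
ranked = semantic_search(query_vector, documents)

print("Keyword hits:", keyword_hits)
print("Best semantic match:", ranked[0])
```

The keyword search comes back empty because none of the titles contain “fix”, “flat”, or “tire”. The vector comparison still ranks the bicycle repair title first.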

They’re also the foundation of RAG (retrieval-augmented generation). When you want an AI to search your documents for relevant context, you first split your documents into chunks, convert each chunk into an embedding, and store the results. At query time, you embed the user’s question and find the chunks whose embeddings are closest. That’s how the system knows which chunks to pull into the prompt.
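The flow above can be sketched in a few lines. Everything named here is hypothetical: `toy_embed` is a crude stand-in that counts words from a hand-picked vocabulary, where a real pipeline would call an embedding model, and the “index” is a plain list rather than a vector database.

```python
import math

# Hand-picked vocabulary for the toy embedder (an assumption for this sketch).
CONCEPTS = {
    "refund": {"refund", "refunds", "money", "reimburse"},
    "shipping": {"shipping", "delivery", "ship", "arrive"},
    "account": {"password", "login", "account", "reset"},
}

def toy_embed(text):
    """Stand-in for a real embedding model: one word count per concept."""
    cleaned = text.lower()
    for mark in "?.,":
        cleaned = cleaned.replace(mark, " ")
    words = cleaned.split()
    return [float(sum(w in vocab for w in words)) for vocab in CONCEPTS.values()]

def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Step 1 (done once, ahead of time): embed every chunk and store it.
chunks = [
    "Refunds are issued within 14 days of a return request.",
    "Standard delivery takes 3 to 5 business days to arrive.",
    "Reset your password from the login page.",
]
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

def retrieve(question, index, k=1):
    """Step 2 (at query time): embed the question, return the k closest chunks."""
    question_vector = toy_embed(question)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(question_vector, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]

# Step 3: put the retrieved chunks into the prompt ahead of the question.
question = "How do I get my money back?"
top_chunks = retrieve(question, index, k=1)
prompt = "Context:\n" + "\n".join(top_chunks) + "\n\nQuestion: " + question

print(prompt)
```

The question never uses the word “refund”, but the toy embedder maps “money” to the same concept, so the refund chunk is the one that lands in the prompt.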

Embeddings are generated by AI models and are usually hundreds or thousands of numbers long. The individual numbers aren’t meaningful to read, so you rarely work with them directly. Instead, you store them in a vector database and let the database handle the similarity search.

My main hands-on experience with embeddings has been through RAG. If you’re rolling your own RAG pipeline rather than using a managed service, you have to run all your documents through an embedding service to convert them into vectors before you can store and search them. It’s one of those steps that’s easy to overlook until you actually go to build one.
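For a sense of what that step involves, here is a minimal sketch of the ingest side: split each document into overlapping chunks, send the chunks to an embedding function in batches, and keep each vector alongside its source text. The function names, chunk sizes, and record fields are all assumptions, and `fake_embed_batch` is only a placeholder for whatever embedding service you actually call.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

def embed_documents(documents, embed_batch, batch_size=2):
    """Chunk every document and embed the chunks in batches.

    `embed_batch` is any callable that takes a list of strings and
    returns one vector per string, in the same order.
    """
    records = []
    for doc_id, text in documents.items():
        chunks = chunk_text(text)
        for start in range(0, len(chunks), batch_size):
            batch = chunks[start:start + batch_size]
            vectors = embed_batch(batch)
            for offset, (chunk, vector) in enumerate(zip(batch, vectors)):
                records.append({
                    "doc_id": doc_id,
                    "chunk_index": start + offset,
                    "text": chunk,
                    "vector": vector,
                })
    return records

def fake_embed_batch(texts):
    """Placeholder only: NOT a real embedding, just text length and space count."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]

documents = {
    "handbook": "word " * 100,  # 500 characters, so it needs several chunks
    "faq": "short answer",
}
records = embed_documents(documents, fake_embed_batch)

for record in records:
    print(record["doc_id"], record["chunk_index"], len(record["text"]))
```

The records are what you would write to a vector database. Keeping the original chunk text next to each vector matters, because the text is what ends up in the prompt once a search finds a match.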
