Question:

What is an embedding?

An embedding is a way of representing something (a word, a sentence, an image, a piece of code) as a list of numbers that captures its meaning. It’s how AI systems understand and compare things semantically rather than just matching exact text.

The idea is that similar things should produce similar numbers. If you embed the words “dog” and “puppy”, the resulting vectors will be close together, while “dog” and “rocket” will be far apart. Closeness is usually scored with a metric such as cosine similarity, which lets a computer measure how related two pieces of content are without needing to understand language the way humans do.
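To make "close together" concrete, here is a minimal sketch of one common closeness measure, cosine similarity. The three vectors are hand-made toy values I picked for illustration, not output from a real embedding model, and real embeddings have far more than three numbers.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors chosen by hand (an assumption for illustration only).
dog = [0.9, 0.8, 0.1]
puppy = [0.85, 0.9, 0.15]
rocket = [0.1, 0.05, 0.95]

dog_puppy = cosine_similarity(dog, puppy)
dog_rocket = cosine_similarity(dog, rocket)

print(f"dog vs puppy:  {dog_puppy:.3f}")
print(f"dog vs rocket: {dog_rocket:.3f}")
```

A score near 1 means the vectors point in almost the same direction, and a score near 0 means they are mostly unrelated. With these toy values, the dog and puppy pair scores much higher than the dog and rocket pair.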

Embeddings are what make semantic search possible. A traditional search engine looks for exact keyword matches. A search powered by embeddings can find results that are conceptually related even if they use completely different words. Search for “how do I fix a flat tire” and it can return results about “repairing a punctured bicycle wheel” because the embeddings of the two phrases are close together.

They’re also the foundation of RAG. When you want an AI to search your documents for relevant context, you first convert all your documents into embeddings and store them. At query time, you embed the user’s question and find the documents whose embeddings are closest. That’s how the system knows which chunks to pull into the prompt.
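The flow above can be sketched in a few lines. This is a rough illustration under stated assumptions, not a production pipeline: `toy_embed` is a hypothetical stand-in that just counts words from a tiny vocabulary, so it only captures word overlap rather than meaning. A real system would call an embedding model at that point. The chunk texts and the prompt layout are made up as well.

```python
import math
import re

# Hypothetical stand-in for a real embedding model: counts how often each
# word from a tiny fixed vocabulary appears in the text.
VOCAB = ["refunds", "shipping", "password", "reset", "order", "days"]

def toy_embed(text):
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # all-zero vectors match nothing

# Made-up document chunks for the example.
chunks = [
    "Refunds are issued within 5 days of receiving the returned order.",
    "To reset your password, open Settings and choose Reset Password.",
    "Standard shipping takes 3 to 7 days.",
]

# Indexing step: embed every chunk once and keep the vector next to its text.
index = [(chunk, toy_embed(chunk)) for chunk in chunks]

def retrieve(question, k=1):
    """Embed the question and return the k chunks with the closest embeddings."""
    q_vec = toy_embed(question)
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(q_vec, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]

# Query step: find the closest chunk and place it in the prompt.
question = "How do I reset my password?"
context = retrieve(question, k=1)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

The shape is the part that carries over to a real setup: embed the documents once up front, embed the question at query time, rank by closeness, and paste the winners into the prompt.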

Embeddings are generated by AI models and are usually hundreds or thousands of numbers long. You typically don’t work with the raw numbers directly. Instead, you store them in a vector database and let the database handle the similarity search.

My main hands-on experience with embeddings has been through RAG. If you’re rolling your own RAG pipeline rather than using a managed service, you have to run all your documents through an embedding service to convert them into vectors before you can store and search them. It’s one of those steps that’s easy to overlook until you actually go to build one.
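Here is a rough sketch of what that ingestion step can look like, assuming you split documents into overlapping word-based chunks and send them to the embedding service in batches. The chunk sizes, the `fake_embed_batch` function, and the record layout are all hypothetical placeholders so the example runs offline. In a real pipeline that function would be a call to whatever embedding service you use, and the records would go into your vector database.

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into word-based chunks that overlap by a few words."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks

def in_batches(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def fake_embed_batch(texts):
    """Hypothetical placeholder for a call to an embedding service."""
    # Returns a made-up 2-number vector per text; real vectors are much longer.
    return [[float(len(t)), float(len(t.split()))] for t in texts]

# Made-up documents for the example.
documents = {
    "doc-a": "word " * 120,
    "doc-b": "short document about embeddings",
}

records = []
for doc_id, text in documents.items():
    for batch in in_batches(chunk_text(text), size=2):
        vectors = fake_embed_batch(batch)
        for chunk, vector in zip(batch, vectors):
            # Each record is what you would write to the vector store.
            records.append({"doc_id": doc_id, "text": chunk, "vector": vector})

print(f"{len(records)} chunks embedded")
```

Keeping the original text and a document id next to each vector matters, because the similarity search only returns vectors and you still need the text to put into the prompt.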

#facts #ai

answered by me
