What is a context window?
A context window is the amount of text an AI model can read and hold in mind at once. Think of it as the model’s working memory. Everything within the window is available to the model when it generates a response. Anything outside it is invisible.
The context window includes everything in the conversation: your instructions, the conversation history, any documents you’ve pasted in, and the model’s own responses. All of it counts. Once a conversation grows past the limit, the oldest material typically gets dropped or summarized by the tool you’re using, and the model can no longer “see” it, which is why long chats can start to feel like the AI is forgetting things.
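To make the “forgetting” concrete, here is a minimal sketch of one way a chat tool might decide what stays visible: keep the newest messages and drop the oldest once a size limit is reached. Everything in it is an assumption for illustration. The names `size_of` and `visible_messages` are hypothetical, size is counted in words as a crude stand-in for how models really measure text, and real tools may use very different strategies.

```python
from typing import List


def size_of(message: str) -> int:
    """Crude size measure: whitespace-separated words (a stand-in, not a real tokenizer)."""
    return len(message.split())


def visible_messages(history: List[str], limit: int) -> List[str]:
    """Return the most recent messages whose combined size fits within `limit`."""
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest message; stop at the first one that does not fit.
    for message in reversed(history):
        cost = size_of(message)
        if used + cost > limit:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Sam and I prefer tabs over spaces.",
    "Noted, I will use tabs.",
    "Please write a function that reverses a string.",
    "Here is a function that reverses a string.",
]

# With a limit of 20 words, only the last two messages fit.
window = visible_messages(history, limit=20)
```

In this toy run the first two messages fall outside the window, so the preference for tabs is simply gone as far as the model is concerned. That is the same effect you notice in a long chat.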
Context windows are measured in tokens, the chunks of text (whole words or pieces of words) that a model actually reads. A small model might have a context window of 8,000 tokens. Modern frontier models like Claude and the later GPT-4 models support 100,000 tokens or more, with some reaching a million. Larger windows let you work with bigger documents, longer conversations, and more complex instructions.
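If you want a rough feel for whether something will fit, you can estimate its token count. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than a real tokenizer, so treat the numbers as ballpark figures. The function names and the `reserved_for_reply` parameter are hypothetical, and the 200,000-token window is just an illustrative size.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers differ between models, so this is only an estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_window(text: str, window_tokens: int, reserved_for_reply: int = 0) -> bool:
    """Check whether `text` plus room for the model's reply fits in the window."""
    return estimate_tokens(text) + reserved_for_reply <= window_tokens


document = "word " * 10_000  # 50,000 characters of filler text

estimated = estimate_tokens(document)              # 12500
fits_small = fits_in_window(document, 8_000)       # False
fits_large = fits_in_window(document, 200_000)     # True
```

Leaving room for the reply matters because the model’s own output counts against the same window as everything you send it.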
The size of the context window has been one of the biggest areas of improvement in AI over the past few years. Early models had tiny windows that made sustained conversation difficult. Today’s models can hold small-to-medium codebases or book-length documents in context, which has opened up entirely new use cases.
My most common run-in with context windows is in Claude Code. When you work in a long session, you eventually hit the dreaded “Compacting conversation” message, which means the context window is full and the tool is summarizing earlier parts of the conversation to make room. It’s not the end of the world, but you can end up re-explaining things you covered an hour ago. The longer the session, the more likely you are to feel it.
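I don’t know how Claude Code implements compaction internally, but the general idea can be sketched in a few lines: when the history gets too big, replace the older messages with a single summary and keep the most recent ones as they are. Everything here is hypothetical, including the function names, the four-characters-per-token estimate, and the `placeholder_summary` function, which stands in for what would normally be a call to a model.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about 4 characters per token (not a real tokenizer)."""
    return (len(text) + 3) // 4


def placeholder_summary(messages: List[str]) -> str:
    """Stand-in for a real summarizer: keeps only the first 30 characters of each message."""
    return "Summary of earlier conversation: " + " | ".join(m[:30] for m in messages)


def compact(
    history: List[str],
    budget: int,
    keep_recent: int = 2,
    summarize: Callable[[List[str]], str] = placeholder_summary,
) -> List[str]:
    """Replace older messages with one summary once the history exceeds `budget` tokens."""
    total = sum(estimate_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)  # nothing to compact
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


history = ["x" * 400, "y" * 400, "z" * 40, "w" * 40]  # about 220 estimated tokens
compacted = compact(history, budget=100)
```

The summary is much shorter than what it replaces, which frees up room, but it also throws detail away. That lost detail is exactly what you end up re-explaining later in the session.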