In this lesson, we will explore the fundamental concept of Context Windows and how to manage conversation history to give your AI tools the illusion of memory. You will learn how to structure data to ensure your chatbot remains coherent, context-aware, and efficient during long interactions.
Large Language Models (LLMs) are stateless by default. This means that each request you send is treated as a completely new event, with no knowledge of what happened seconds or even minutes prior. To make a chatbot feel interactive, we must manually bundle the conversation history into each new request.
Think of the Context Window as the model's "working memory": the total amount of text (tokens) the model can consider at once. If you send a message, it is not just the question that counts; it is the question plus every previous message in the interaction. When the total number of tokens exceeds the model's limit, the conversation "breaks," leading to errors or the AI losing track of the topic. Therefore, developers must implement a rolling strategy to prune older exchanges while retaining the most relevant information.
To facilitate memory, we typically organize chat data into a list of objects, often referred to as a Message History. Each entry in this list usually contains a role and content.
- system: Sets the behavior or personality of the AI.
- user: The end-user's input.
- assistant: The AI's generated response.

By maintaining this chronological structure, we preserve the flow of the conversation. When generating a new prediction, we send this entire list to the API.
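Concretely, a Message History is usually just a list of role/content dictionaries. The sketch below uses the OpenAI-style chat format; field names may differ by provider:

```python
# A conversation stored as a list of role/content dictionaries
# (OpenAI-style chat format; adapt if your provider differs).
messages = [
    {"role": "system", "content": "You are a concise travel assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

def build_request(messages, new_user_input):
    """Append the new user turn and return the full payload to send.

    The ENTIRE list is sent on every turn so the model can resolve
    references like "there" against earlier messages.
    """
    return messages + [{"role": "user", "content": new_user_input}]
```

Note that `build_request` returns a new list rather than mutating the history in place, so the stored history is only updated once the API call succeeds.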
Since the Context Window has a maximum capacity (measured in tokens), we must implement a buffer mechanism to prevent overflow. A common approach is the "Sliding Window." As the conversation grows, we programmatically remove the oldest user-assistant message pairs from the list.
If we define the token count of message \(i\) as \(t_i\), the total context size is \(T = \sum_{i=1}^{n} t_i\). We must ensure this sum remains below the model's limit \(L_{\max}\):

\[
T = \sum_{i=1}^{n} t_i < L_{\max}
\]

When \(T\) exceeds \(L_{\max}\), we remove entries from the beginning of the list (keeping the initial system prompt intact) until the count is safe.
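A minimal sketch of this sliding-window pruning, assuming an OpenAI-style message list whose first entry is the system prompt. The default token counter is a rough four-characters-per-token heuristic; a real implementation would use the provider's tokenizer (e.g. tiktoken for OpenAI models):

```python
def prune_history(messages, max_tokens,
                  count_tokens=lambda m: len(m["content"]) // 4):
    """Drop the oldest user/assistant pairs until the history fits.

    The system prompt (messages[0]) is always kept. count_tokens is a
    rough ~4-characters-per-token heuristic; swap in a real tokenizer
    for production use.
    """
    system, rest = messages[0], messages[1:]
    total = count_tokens(system) + sum(count_tokens(m) for m in rest)
    while rest and total > max_tokens:
        # Drop the oldest user/assistant pair together so the history
        # never starts with an orphaned assistant reply.
        for _ in range(min(2, len(rest))):
            total -= count_tokens(rest.pop(0))
    return [system] + rest
```

Removing messages in pairs keeps the transcript well-formed: the pruned history never opens with an assistant reply whose triggering question was deleted.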
Beyond memory constraints, there is a financial cost factor. Because the entire history is re-sent on every turn, the cumulative token cost of a conversation grows roughly quadratically with the number of turns. To optimize, you can implement Summarization of older messages.
Note: For very long conversations, you might keep the last 5 messages in raw format, while condensing everything that came before into a single "summary" message provided by the AI itself. This maintains the "gist" of the conversation history without bloating the token count.
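A sketch of that hybrid approach. Here `summarize` stands in for a call to the model itself (e.g. a "Summarize this conversation so far" prompt); the `KEEP_RECENT` value of 5 and the use of a system-role summary message are illustrative assumptions, not fixed requirements:

```python
KEEP_RECENT = 5  # how many raw messages to retain (illustrative choice)

def compress_history(messages, summarize):
    """Condense everything older than the last KEEP_RECENT messages.

    `summarize` is a stand-in for a call to the model itself. The
    initial system prompt (messages[0]) is preserved unchanged, and the
    condensed gist is injected as a single system-role message.
    """
    system, rest = messages[0], messages[1:]
    if len(rest) <= KEEP_RECENT:
        return messages  # nothing to compress yet
    old, recent = rest[:-KEEP_RECENT], rest[-KEEP_RECENT:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(transcript),
    }
    return [system, summary] + recent
```

Each time the history grows past the threshold, the previous summary message is itself folded into the next summary, so the token count stays roughly constant no matter how long the conversation runs.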
Key concepts from this lesson:

- The system, user, and assistant roles.
- The Context Window and its token limit.