In this lesson, we will bridge the gap between static Python AI logic and interactive web applications using Streamlit. By the end, you will understand how to transform a command-line script into a fully functional, browser-based chatbot that users can interact with in real-time.
A typical AI-powered web application is composed of two main layers: the backend logic (OpenAI's API) and the frontend interface (Streamlit). When a user types a prompt into your web app, the Streamlit frontend captures that input, sends it as a structured request to OpenAI's chat completions API, and then renders the response back to the user.
To manage conversational state, we use Streamlit's Session State, a special dictionary that persists values across the script reruns Streamlit triggers on every user interaction. Without it, your chatbot would "forget" previous messages every time a user clicked "send," because the entire script re-executes from top to bottom on each interaction.
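Conceptually, Session State is just a dictionary that outlives each rerun of the script. The pure-Python sketch below illustrates the idea; `session_state` and `rerun_script` here are stand-ins for demonstration, not Streamlit APIs:

```python
# Stand-in for st.session_state: a dict that survives each "rerun".
session_state = {}

def rerun_script(user_input):
    """Simulates one top-to-bottom execution of a Streamlit script."""
    # Local variables are recreated from scratch on every rerun...
    local_history = []
    # ...but session_state persists, so we initialize it only once.
    if "messages" not in session_state:
        session_state["messages"] = []
    session_state["messages"].append({"role": "user", "content": user_input})
    return len(session_state["messages"]), len(local_history)

rerun_script("Hello")
persisted, local = rerun_script("How are you?")
# The persisted history grows across reruns; local state is always empty.
```

In a real Streamlit app, the same guard (`if "messages" not in st.session_state:`) is what prevents the history from being wiped out on every interaction.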
Note: Always store your API keys as environment variables rather than hardcoding them into your script. This protects your credentials from being accidentally exposed in version control systems like GitHub.
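A small helper makes this pattern explicit and fails loudly when the key is missing. `OPENAI_API_KEY` is the variable name the OpenAI client library reads by default:

```python
import os

def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if missing."""
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell "
            "rather than hardcoding it in the script."
        )
    return key
```

Failing at startup with a clear message is preferable to a cryptic authentication error deep inside an API call.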
Once your environment is set up, the next step is building the completion loop. You use the OpenAI client library to send a request containing a list of message objects representing the conversation history. Each object has a role (usually 'system', 'user', or 'assistant') and its content.
The logic follows this pattern: first, append the user input to the message history; second, stream the response from the model; and third, update the history with the assistant's reply.
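The three-step pattern above can be sketched as a plain function. Since calling the real API requires a key and a network connection, the model call here is stubbed out as a hypothetical `fake_model_reply`; a real app would call something like `client.chat.completions.create(model=..., messages=history)` instead:

```python
def fake_model_reply(messages):
    # Placeholder for the real OpenAI call; it just echoes the last message.
    return f"Echo: {messages[-1]['content']}"

def chat_turn(history, user_input):
    # 1) Append the user's message to the history.
    history.append({"role": "user", "content": user_input})
    # 2) Get (or stream) the model's response for the full history.
    reply = fake_model_reply(history)
    # 3) Append the assistant's reply so the next turn has full context.
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a helpful assistant."}]
chat_turn(history, "Hi there!")
```

Because the assistant's reply is appended back into `history`, every subsequent request carries the full conversation, which is what gives the model its "memory."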
The power of Streamlit lies in its simplicity. You don't need HTML, CSS, or JavaScript. Instead, you use functions like st.chat_input() for user entries and st.chat_message() for individual message bubbles.
When designing your UI, consider the lifecycle of an interaction. First, you iterate through the existing messages in your Session State to recreate the chat history on the screen. Then, you display the input area at the bottom. This ensures that even after a user sends a message, they can immediately scroll up to see their previous conversation, creating a seamless experience.
As your bot grows from a basic question-answering tool to an expert system, you will eventually reach the limits of the context window: the maximum number of tokens the model can process in a single request. In other words, the prompt tokens (your entire message history) plus the maximum response tokens must stay within the context window.
If the conversation history exceeds this limit, the API will return an error. To prevent this, you should implement a truncation strategy or a rolling window approach, where you only send the most recent messages to the API. Another best practice is to define a system prompt that enforces strict behavior, such as limiting the response length or requiring a specific output format like JSON, which helps keep your application's behavior predictable.
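A simple rolling-window sketch is shown below. It caps the history by message count and always preserves the system prompt; the `max_messages` cap is illustrative, and a production app would count tokens (using a tokenizer) rather than messages:

```python
def truncate_history(messages, max_messages=10):
    """Keep the system prompt plus only the most recent messages.

    `max_messages` is an illustrative cap; counting actual tokens
    would give finer control over the context window budget.
    """
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"]
    return system + recent[-max_messages:]

# Build a long conversation to truncate.
history = [{"role": "system", "content": "Reply in at most two sentences."}]
for i in range(20):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = truncate_history(history, max_messages=6)
# trimmed holds the system prompt plus the 6 most recent messages.
```

Keeping the system prompt out of the window being trimmed matters: if it were dropped along with old messages, the behavioral constraints it enforces would silently disappear mid-conversation.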