AI
AI Chat Loop
A turn-by-turn conversation with an AI, where each user message and AI response is added to a growing history that the AI sees on the next turn.
When to use this
When you need a back-and-forth conversation (customer support, tutoring, ideation). Not for one-shot tasks (use a single API call instead).
What I assumed
I made these guesses to fill gaps. Let me know if any are wrong.
Flow diagram
Step-by-step recipe
Copy this and paste into Cursor, Claude Code, or v0.
PATTERN: AI Chat Loop
INPUT: user_message, conversation_history
OUTPUT: ai_response_message (streamed)
STEPS:
1. User opens chat interface (history loaded if existing conversation)
2. User types and sends a message
3. Append user message to local history immediately (optimistic UI)
4. Show "AI is thinking..." indicator
5. Send full conversation history + new message to LLM API
6. Stream response tokens back to UI as they arrive
7. IF stream completes successfully โ finalize and append AI message to history
8. Persist updated conversation to DB (debounced)
9. Reset input, ready for next user message
10. Loop to step 2
ERROR_HANDLING:
- LLM API timeout โ show "Taking longer than usual, retry?" button
- LLM rate limit hit โ show "Hold on a moment" and auto-retry with backoff
- Stream interrupted mid-response โ save partial, mark as incomplete, offer "Continue"
- Token context limit exceeded โ summarize older messages or start new conversation
- User sends empty message โ ignore silently or disable send button
EXTENSION_POINTS:
- Function/tool calling within the loop (composable_with: ["tool-calling"])
- Document retrieval before each call (composable_with: ["rag-retrieval"])
- Hand off to human when AI struggles (composable_with: ["human-escalation"])
- Voice input/output (composable_with: ["voice-chat"])
States โ how things change
| State | Description | Transitions |
|---|---|---|
| Awaiting user input | Chat is idle, user hasn't typed yet |
|
| AI generating response | Streaming tokens from LLM API |
|
| Error recovery | Showing retry option after API failure |
|
| Context overflow | Conversation too long for LLM token limit |
|
Easy-to-miss situations
The kinds of edge cases that break demos.
What if the conversation gets too long for the AI's memory?
highAll LLMs have a token limit. Past it, you can't send the full history.
Suggested handling: Summarize older messages with a smaller LLM call ("rolling summary"), or offer "Start new chat" with summary brought over. Show a soft warning at 75% capacity.
What if the user sends a message while AI is still responding?
mediumMost chat UIs allow only one in-flight call. Need to handle queuing or interruption.
Suggested handling: Queue the new message and send after current response finishes. Show "Sending after current response..." hint. Or disable input until done.
What if the AI gives a wrong or harmful answer?
highLLMs hallucinate and can produce unsafe content. User trust depends on handling this gracefully.
Suggested handling: Add ๐ thumbs-down on every AI message. Log + review. Use system prompt safety instructions. For high-stakes domains (medical, legal), add disclaimer + human escalation path.
What if the streaming connection drops mid-response?
mediumUser sees half a response and confusion.
Suggested handling: Save partial response, mark with "Connection lost, [Continue]" button. On continue, send "Please complete your previous response" as a hidden continuation prompt.
What if the AI service costs spike unexpectedly?
highLong conversations or abusive users can drain budget fast.
Suggested handling: Per-user daily token budget. Cache common responses (e.g., greetings) at gateway level. Use cheaper model for first-pass classification, escalate to expensive model only when needed.
Composes well with
Combine these patterns when you need a richer flow.