AI Chat Loop

A turn-by-turn conversation with an AI, where each user message and AI response is added to a growing history that the AI sees on the next turn.

🌿

When to use this

When you need a back-and-forth conversation (customer support, tutoring, ideation). Not for one-shot tasks (use a single API call instead).

aichatconversationllmstreaminghistory

✨ Built using these library patterns:

chat-loop

What I assumed

I made these guesses to fill gaps. Let me know if any are wrong.

Flow diagram

Step-by-step recipe

Copy this and paste into Cursor, Claude Code, or v0.

PATTERN: AI Chat Loop
INPUT: user_message, conversation_history
OUTPUT: ai_response_message (streamed)

STEPS:
  1. User opens chat interface (history loaded if existing conversation)
  2. User types and sends a message
  3. Append user message to local history immediately (optimistic UI)
  4. Show "AI is thinking..." indicator
  5. Send full conversation history + new message to LLM API
  6. Stream response tokens back to UI as they arrive
  7. IF stream completes successfully → finalize and append AI message to history
  8. Persist updated conversation to DB (debounced)
  9. Reset input, ready for next user message
  10. Loop to step 2

ERROR_HANDLING:
  - LLM API timeout → show "Taking longer than usual, retry?" button
  - LLM rate limit hit → show "Hold on a moment" and auto-retry with backoff
  - Stream interrupted mid-response → save partial, mark as incomplete, offer "Continue"
  - Token context limit exceeded → summarize older messages or start new conversation
  - User sends empty message → ignore silently or disable send button

EXTENSION_POINTS:
  - Function/tool calling within the loop (composable_with: ["tool-calling"])
  - Document retrieval before each call (composable_with: ["rag-retrieval"])
  - Hand off to human when AI struggles (composable_with: ["human-escalation"])
  - Voice input/output (composable_with: ["voice-chat"])

States — how things change

State	Description	Transitions
Awaiting user input	Chat is idle, user hasn't typed yet	Message sent→AI generating response
AI generating response	Streaming tokens from LLM API	Stream complete→Awaiting user input Stream failed→Error recovery Token limit hit→Context overflow
Error recovery	Showing retry option after API failure	Retry→AI generating response Give up→Awaiting user input
Context overflow	Conversation too long for LLM token limit	Summarized→AI generating response New chat→Awaiting user input

Easy-to-miss situations

The kinds of edge cases that break demos.

What if the conversation gets too long for the AI's memory?
high
All LLMs have a token limit. Past it, you can't send the full history.
Suggested handling: Summarize older messages with a smaller LLM call ("rolling summary"), or offer "Start new chat" with summary brought over. Show a soft warning at 75% capacity.
What if the user sends a message while AI is still responding?
medium
Most chat UIs allow only one in-flight call. Need to handle queuing or interruption.
Suggested handling: Queue the new message and send after current response finishes. Show "Sending after current response..." hint. Or disable input until done.
What if the AI gives a wrong or harmful answer?
high
LLMs hallucinate and can produce unsafe content. User trust depends on handling this gracefully.
Suggested handling: Add 👎 thumbs-down on every AI message. Log + review. Use system prompt safety instructions. For high-stakes domains (medical, legal), add disclaimer + human escalation path.
What if the streaming connection drops mid-response?
medium
User sees half a response and confusion.
Suggested handling: Save partial response, mark with "Connection lost, [Continue]" button. On continue, send "Please complete your previous response" as a hidden continuation prompt.
What if the AI service costs spike unexpectedly?
high
Long conversations or abusive users can drain budget fast.
Suggested handling: Per-user daily token budget. Cache common responses (e.g., greetings) at gateway level. Use cheaper model for first-pass classification, escalate to expensive model only when needed.

Composes well with

Combine these patterns when you need a richer flow.

tool-calling rag-retrieval human-escalation streaming-response

Build a flow starting from this pattern →

AI Chat Loop

What if the conversation gets too long for the AI's memory?

What if the user sends a message while AI is still responding?

What if the AI gives a wrong or harmful answer?

What if the streaming connection drops mid-response?

What if the AI service costs spike unexpectedly?

Composes well with