โ† Pattern library

AI

AI Chat Loop

A turn-by-turn conversation with an AI, where each user message and AI response is added to a growing history that the AI sees on the next turn.

๐ŸŒฟ

When to use this

When you need a back-and-forth conversation (customer support, tutoring, ideation). Not for one-shot tasks (use a single API call instead).

aichatconversationllmstreaminghistory
โœจ Built using these library patterns:
chat-loop

What I assumed

I made these guesses to fill gaps. Let me know if any are wrong.

    Flow diagram

    Step-by-step recipe

    Copy this and paste into Cursor, Claude Code, or v0.

    PATTERN: AI Chat Loop
    INPUT: user_message, conversation_history
    OUTPUT: ai_response_message (streamed)
    
    STEPS:
      1. User opens chat interface (history loaded if existing conversation)
      2. User types and sends a message
      3. Append user message to local history immediately (optimistic UI)
      4. Show "AI is thinking..." indicator
      5. Send full conversation history + new message to LLM API
      6. Stream response tokens back to UI as they arrive
      7. IF stream completes successfully โ†’ finalize and append AI message to history
      8. Persist updated conversation to DB (debounced)
      9. Reset input, ready for next user message
      10. Loop to step 2
    
    ERROR_HANDLING:
      - LLM API timeout โ†’ show "Taking longer than usual, retry?" button
      - LLM rate limit hit โ†’ show "Hold on a moment" and auto-retry with backoff
      - Stream interrupted mid-response โ†’ save partial, mark as incomplete, offer "Continue"
      - Token context limit exceeded โ†’ summarize older messages or start new conversation
      - User sends empty message โ†’ ignore silently or disable send button
    
    EXTENSION_POINTS:
      - Function/tool calling within the loop (composable_with: ["tool-calling"])
      - Document retrieval before each call (composable_with: ["rag-retrieval"])
      - Hand off to human when AI struggles (composable_with: ["human-escalation"])
      - Voice input/output (composable_with: ["voice-chat"])
    

    States โ€” how things change

    StateDescriptionTransitions
    Awaiting user inputChat is idle, user hasn't typed yet
    • Message sentโ†’AI generating response
    AI generating responseStreaming tokens from LLM API
    • Stream completeโ†’Awaiting user input
    • Stream failedโ†’Error recovery
    • Token limit hitโ†’Context overflow
    Error recoveryShowing retry option after API failure
    • Retryโ†’AI generating response
    • Give upโ†’Awaiting user input
    Context overflowConversation too long for LLM token limit
    • Summarizedโ†’AI generating response
    • New chatโ†’Awaiting user input

    Easy-to-miss situations

    The kinds of edge cases that break demos.

    • What if the conversation gets too long for the AI's memory?

      high

      All LLMs have a token limit. Past it, you can't send the full history.

      Suggested handling: Summarize older messages with a smaller LLM call ("rolling summary"), or offer "Start new chat" with summary brought over. Show a soft warning at 75% capacity.

    • What if the user sends a message while AI is still responding?

      medium

      Most chat UIs allow only one in-flight call. Need to handle queuing or interruption.

      Suggested handling: Queue the new message and send after current response finishes. Show "Sending after current response..." hint. Or disable input until done.

    • What if the AI gives a wrong or harmful answer?

      high

      LLMs hallucinate and can produce unsafe content. User trust depends on handling this gracefully.

      Suggested handling: Add ๐Ÿ‘Ž thumbs-down on every AI message. Log + review. Use system prompt safety instructions. For high-stakes domains (medical, legal), add disclaimer + human escalation path.

    • What if the streaming connection drops mid-response?

      medium

      User sees half a response and confusion.

      Suggested handling: Save partial response, mark with "Connection lost, [Continue]" button. On continue, send "Please complete your previous response" as a hidden continuation prompt.

    • What if the AI service costs spike unexpectedly?

      high

      Long conversations or abusive users can drain budget fast.

      Suggested handling: Per-user daily token budget. Cache common responses (e.g., greetings) at gateway level. Use cheaper model for first-pass classification, escalate to expensive model only when needed.

    Composes well with

    Combine these patterns when you need a richer flow.

    Build a flow starting from this pattern โ†’