This document contains sample multi-turn conversations to test whether the ACE chatbot maintains context across messages. Each conversation is designed to verify that the bot remembers previous exchanges.
Purpose: Test if bot remembers numerical results from previous calculations
- User: "What is 15 + 27?"
  - Expected: Bot calculates and returns 42
- User: "Now multiply that number by 3"
  - Expected: Bot should remember 42 and return 126
- User: "What's half of the previous result?"
  - Expected: Bot should remember 126 and return 63
- User: "Add 37 to it"
  - Expected: Bot should remember 63 and return 100
✅ Pass Criteria: Bot correctly references previous results without the user having to repeat the numbers
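The expected values in this conversation form a simple arithmetic chain, which can be double-checked with a short sketch. The function name `expected_chain` is hypothetical and exists only for illustration; it is not part of the ACE demo.

```python
def expected_chain():
    """Return the value the bot should report after each of the four turns."""
    results = []
    value = 15 + 27      # turn 1: "What is 15 + 27?"
    results.append(value)
    value = value * 3    # turn 2: "Now multiply that number by 3"
    results.append(value)
    value = value // 2   # turn 3: "What's half of the previous result?"
    results.append(value)
    value = value + 37   # turn 4: "Add 37 to it"
    results.append(value)
    return results


print(expected_chain())  # [42, 126, 63, 100]
```

Each turn depends only on the previous result, so a wrong answer at any turn usually indicates where context was lost.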
Purpose: Test if bot remembers information provided earlier in conversation
- User: "My name is Alice and I work as a software engineer in Seattle"
  - Expected: Bot acknowledges the information
- User: "What's my name?"
  - Expected: Bot should respond "Alice"
- User: "What do I do for work?"
  - Expected: Bot should respond "You're a software engineer"
- User: "Where do I work?"
  - Expected: Bot should respond "Seattle"
✅ Pass Criteria: Bot accurately recalls all three pieces of information without confusion
Purpose: Test if bot can build upon lists created in previous messages
- User: "Let's create a shopping list. Add milk and eggs"
  - Expected: Bot creates a list with milk and eggs
- User: "Add bread and butter to the list"
  - Expected: Bot adds to the existing list (now has 4 items)
- User: "Remove eggs from the list"
  - Expected: Bot removes eggs (now has milk, bread, butter)
- User: "How many items are on the list now?"
  - Expected: Bot should answer "3 items"
- User: "What's on the list?"
  - Expected: Bot should list milk, bread, and butter
✅ Pass Criteria: Bot maintains the list state across all modifications
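The list state the bot is expected to track can be reproduced with a few list operations. This is only a sketch of the expected state after each modification; `apply_list_turns` is a hypothetical helper and says nothing about how the bot stores the list internally.

```python
def apply_list_turns():
    """Replay the shopping-list modifications and return the final list."""
    items = []
    items += ["milk", "eggs"]      # turn 1: create the list
    items += ["bread", "butter"]   # turn 2: list now has 4 items
    items.remove("eggs")           # turn 3: list now has 3 items
    return items


final_items = apply_list_turns()
print(len(final_items), final_items)  # 3 ['milk', 'bread', 'butter']
```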
Purpose: Test if bot can continue a narrative coherently
- User: "Let's write a story. Start with: 'Once upon a time, there was a brave knight named Sir Roland'"
  - Expected: Bot starts the story
- User: "What happens next? Add a dragon to the story"
  - Expected: Bot continues the story about Sir Roland and introduces a dragon
- User: "How does Sir Roland defeat the dragon?"
  - Expected: Bot continues the same story with a resolution
- User: "What was the knight's name again?"
  - Expected: Bot should remember "Sir Roland"
✅ Pass Criteria: Bot maintains story coherence and character consistency
Purpose: Test if bot remembers multiple items for comparison
- User: "Tell me about Python programming language"
  - Expected: Bot provides information about Python
- User: "Now tell me about JavaScript"
  - Expected: Bot provides information about JavaScript
- User: "Which one is better for web development?"
  - Expected: Bot compares both languages mentioned earlier
- User: "What about the first language we discussed - is it object-oriented?"
  - Expected: Bot refers back to Python and discusses its OOP features
✅ Pass Criteria: Bot maintains context of both programming languages and can compare them
Purpose: Test if bot can follow multi-step instructions that depend on previous steps
- User: "Create a workout plan for Monday: 30 minutes running"
  - Expected: Bot creates Monday's plan
- User: "For Tuesday, add weightlifting for 45 minutes"
  - Expected: Bot adds Tuesday to the existing plan
- User: "What's the total workout time for the week so far?"
  - Expected: Bot should calculate 30 + 45 = 75 minutes
- User: "Add Wednesday: 20 minutes yoga"
  - Expected: Bot adds Wednesday
- User: "Show me the complete weekly plan"
  - Expected: Bot lists Monday (running), Tuesday (weightlifting), Wednesday (yoga)
✅ Pass Criteria: Bot maintains the entire workout plan across all additions
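The running total and final plan can be checked with a small sketch. The dictionary layout and the name `build_workout_plan` are assumptions made for illustration, not the bot's actual data model.

```python
def build_workout_plan():
    """Replay the plan additions; return the total after Tuesday and the final plan."""
    plan = {}
    plan["Monday"] = ("running", 30)
    plan["Tuesday"] = ("weightlifting", 45)
    # Turn 3 asks for the total so far, before Wednesday is added.
    total_after_tuesday = sum(minutes for _, minutes in plan.values())
    plan["Wednesday"] = ("yoga", 20)
    return total_after_tuesday, plan


total_so_far, weekly_plan = build_workout_plan()
print(total_so_far)        # 75
print(list(weekly_plan))   # ['Monday', 'Tuesday', 'Wednesday']
```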
Purpose: Test if bot remembers user preferences stated earlier
- User: "I'm allergic to peanuts and I don't like spicy food"
  - Expected: Bot acknowledges the preferences
- User: "Suggest a meal for dinner"
  - Expected: Bot suggests something without peanuts and not spicy
- User: "What about a dessert?"
  - Expected: Bot suggests a dessert without peanuts
- User: "What were my dietary restrictions again?"
  - Expected: Bot recalls peanut allergy and dislike of spicy food
✅ Pass Criteria: Bot consistently remembers and applies user preferences
Purpose: Test if bot maintains context during problem-solving
- User: "I need to plan a road trip from New York to Los Angeles"
  - Expected: Bot acknowledges the trip details
- User: "The trip is 2,800 miles. If I drive 400 miles per day, how many days will it take?"
  - Expected: Bot calculates 7 days
- User: "If I want to complete it in 5 days instead, how many miles per day?"
  - Expected: Bot remembers the 2,800 miles and calculates 560 miles/day
- User: "Where am I traveling from and to?"
  - Expected: Bot should remember New York to Los Angeles
✅ Pass Criteria: Bot maintains trip details and calculations throughout
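The two expected figures follow directly from the stated distance. A minimal sketch of the arithmetic, with the hypothetical helper name `trip_figures`:

```python
import math


def trip_figures(total_miles=2800):
    """Return (days at 400 miles/day, miles per day for a 5-day trip)."""
    # Round up so a partial final day would still count as a day.
    days_at_400 = math.ceil(total_miles / 400)
    miles_per_day_for_5 = total_miles / 5
    return days_at_400, miles_per_day_for_5


print(trip_figures())  # (7, 560.0)
```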
Purpose: Test if bot can handle topic switches while maintaining context
- User: "Tell me about the solar system"
  - Expected: Bot provides information about the solar system
- User: "Actually, let's talk about cooking instead. What's a good recipe for pasta?"
  - Expected: Bot switches to cooking topic
- User: "How many planets did we discuss earlier?"
  - Expected: Bot should return to solar system context and mention 8 planets
- User: "And what were we discussing about food?"
  - Expected: Bot should remember the pasta recipe discussion
✅ Pass Criteria: Bot maintains both conversation threads and can switch between them
Purpose: Test how playbook bullets accumulate and are used in context
- User: "Plan a birthday party for 20 people with a budget of $500"
  - Expected: Bot creates a plan and extracts bullets about party planning
- User: "Now plan a wedding for 100 people with a $10,000 budget"
  - Expected: Bot should use learnings from the first event and mention the differences in scale
- User: "What did we plan first - remind me of the budget?"
  - Expected: Bot remembers the birthday party with $500 budget
- User: "What strategies would work for both events?"
  - Expected: Bot uses accumulated playbook bullets to suggest common strategies
✅ Pass Criteria: Bot maintains context AND applies learned playbook bullets across conversations
Purpose: Test if bot correctly resolves pronouns based on context
- User: "Tell me about Albert Einstein"
  - Expected: Bot provides information about Einstein
- User: "What was his most famous theory?"
  - Expected: Bot understands "his" refers to Einstein, mentions relativity
- User: "When did he publish it?"
  - Expected: Bot remembers both Einstein and the theory, provides the date (1905 for special relativity, 1915 for general relativity)
- User: "Where was he born?"
  - Expected: Bot resolves "he" to Einstein and answers Germany
✅ Pass Criteria: Bot correctly resolves all pronouns to Einstein
Purpose: Test if bot remembers conditional statements
- User: "If it rains tomorrow, I'll stay home and read. If it's sunny, I'll go hiking"
  - Expected: Bot acknowledges both conditions
- User: "What will I do if it rains?"
  - Expected: Bot recalls "stay home and read"
- User: "What about if the weather is nice?"
  - Expected: Bot recalls "go hiking"
- User: "What were the two options I mentioned?"
  - Expected: Bot recalls both conditions and activities
✅ Pass Criteria: Bot remembers both conditional branches accurately
- Run tests in sequence: Copy each user message in order and paste into the chat
- Clear chat between tests: Use the "Clear Chat History" button after each test (#1-12) so context from one test cannot leak into the next
- Check responses: Verify bot responses match expected behavior
- Note failures: Document any instances where context is lost
- Check playbook: After several tests, verify playbook bullets are being created and used
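The manual procedure above could also be scripted. The sketch below assumes a hypothetical `send` callable that takes a user message and returns the bot's reply as a string; the ACE demo is not known to expose such a function, so `run_conversation` and the stand-in `NameOnlyBot` are illustrations only. The substring check is deliberately crude and does not replace reading the responses.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user message, substring expected in the reply)


def run_conversation(send: Callable[[str], str], turns: List[Turn]) -> List[str]:
    """Send each message in order; return a note for every turn that lost context."""
    failures = []
    for number, (message, expected) in enumerate(turns, start=1):
        reply = send(message)
        if expected.lower() not in reply.lower():
            failures.append(
                f"turn {number}: expected {expected!r} in reply to {message!r}"
            )
    return failures


class NameOnlyBot:
    """Stand-in bot for demonstration; it remembers a name and nothing else."""

    def __init__(self):
        self.name = None

    def __call__(self, message: str) -> str:
        if message.startswith("My name is "):
            self.name = message.split()[3]  # fourth word is the name
            return f"Nice to meet you, {self.name}."
        if message == "What's my name?":
            return f"Your name is {self.name}."
        return "I'm not sure."


turns = [
    ("My name is Alice and I work as a software engineer in Seattle", "Alice"),
    ("What's my name?", "Alice"),
    ("Where do I work?", "Seattle"),
]
failures = run_conversation(NameOnlyBot(), turns)
print(failures)  # one failure, for turn 3
```

Creating a fresh bot object per test plays the same role as clearing the chat history between tests.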
- Context Retention: Bot maintains information for 4+ turns
- Accurate Recall: Bot retrieves correct information when asked
- Pronoun Resolution: Bot correctly identifies referents
- Playbook Integration: Bot learns from conversations and applies bullets
- No Hallucination: Bot doesn't invent information not provided
- Very long conversations: Test with 10+ turns to see whether the context window fills up and earlier turns are dropped
- Complex nested context: Multiple topics interleaved
- Ambiguous references: Test with unclear pronouns
- Contradictory information: Provide conflicting info to see how bot handles it
- "Remember this sequence: 2, 4, 8, 16"
- "What comes next in the sequence?"
- "What was the first number?"
- "How many numbers did I give you?"
- "I have a meeting at 2 PM today"
- "I also have dinner plans at 7 PM"
- "What's my first commitment?"
- "How much time do I have between my two events?"
- "John is 25 years old and works as a teacher"
- "Mary is 30 years old and works as a doctor"
- "How old is John?"
- "What does Mary do?"
- "Who is older?"
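The quick tests above do not state expected answers. The sketch below derives them under two assumptions: the sequence is read as doubling, and both events fall on the same day (the calendar date used is an arbitrary placeholder). `quick_test_answers` is a hypothetical helper.

```python
from datetime import datetime


def quick_test_answers():
    """Compute the answers a context-aware bot would be expected to give."""
    sequence = [2, 4, 8, 16]
    meeting = datetime(2025, 1, 1, 14, 0)  # 2 PM, placeholder date
    dinner = datetime(2025, 1, 1, 19, 0)   # 7 PM, same placeholder date
    ages = {"John": 25, "Mary": 30}
    return {
        "next_in_sequence": sequence[-1] * 2,  # assumes each term doubles
        "first_number": sequence[0],
        "count": len(sequence),
        "first_commitment": "meeting at 2 PM",
        "hours_between": (dinner - meeting).total_seconds() / 3600,
        "older": max(ages, key=ages.get),
    }


answers = quick_test_answers()
print(answers["next_in_sequence"], answers["hours_between"], answers["older"])
# 32 5.0 Mary
```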
Last Updated: October 11, 2025
Purpose: Validate conversation context maintenance in the ACE Streamlit Demo