Sample Conversations for Context Testing

This document contains sample multi-turn conversations to test whether the ACE chatbot maintains context across messages. Each conversation is designed to verify that the bot remembers previous exchanges.

Test 1: Basic Math Context

Purpose: Test if bot remembers numerical results from previous calculations

User: "What is 15 + 27?"
- Expected: Bot calculates and returns 42
User: "Now multiply that number by 3"
- Expected: Bot should remember 42 and return 126
User: "What's half of the previous result?"
- Expected: Bot should remember 126 and return 63
User: "Add 37 to it"
- Expected: Bot should remember 63 and return 100

✅ Pass Criteria: Bot correctly references previous calculations without the user having to repeat the numbers
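The expected values in Test 1 can be checked offline before running the conversation. The following is a minimal sketch in Python; the `expected_chain` helper is hypothetical and is not part of the ACE demo.

```python
def expected_chain(start, steps):
    """Apply each step to the running result and collect every intermediate value."""
    results = [start]
    for _label, fn in steps:
        results.append(fn(results[-1]))
    return results

# One entry per follow-up message in Test 1.
steps = [
    ("multiply by 3", lambda x: x * 3),
    ("halve", lambda x: x // 2),
    ("add 37", lambda x: x + 37),
]

chain = expected_chain(15 + 27, steps)
print(chain)  # [42, 126, 63, 100]
```

Each element of `chain` is the value the bot should return at the corresponding turn.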

Test 2: Information Recall

Purpose: Test if bot remembers information provided earlier in conversation

User: "My name is Alice and I work as a software engineer in Seattle"
- Expected: Bot acknowledges the information
User: "What's my name?"
- Expected: Bot should respond "Alice"
User: "What do I do for work?"
- Expected: Bot should respond "You're a software engineer"
User: "Where do I work?"
- Expected: Bot should respond "Seattle"

✅ Pass Criteria: Bot accurately recalls all three pieces of information without confusion

Test 3: List Building

Purpose: Test if bot can build upon lists created in previous messages

User: "Let's create a shopping list. Add milk and eggs"
- Expected: Bot creates a list with milk and eggs
User: "Add bread and butter to the list"
- Expected: Bot adds to the existing list (now has 4 items)
User: "Remove eggs from the list"
- Expected: Bot removes eggs (now has milk, bread, butter)
User: "How many items are on the list now?"
- Expected: Bot should answer "3 items"
User: "What's on the list?"
- Expected: Bot should list milk, bread, and butter

✅ Pass Criteria: Bot maintains the list state across all modifications
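The list state the bot is expected to hold after each turn of Test 3 can be traced with a plain Python list. This is only an illustrative sketch of the expected state, not a description of how the bot stores it.

```python
shopping = []
shopping += ["milk", "eggs"]      # turn 1
shopping += ["bread", "butter"]   # turn 2
count_after_adds = len(shopping)  # 4 items at this point
shopping.remove("eggs")           # turn 3

print(count_after_adds)  # 4
print(len(shopping))     # 3
print(shopping)          # ['milk', 'bread', 'butter']
```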

Test 4: Story Continuation

Purpose: Test if bot can continue a narrative coherently

User: "Let's write a story. Start with: 'Once upon a time, there was a brave knight named Sir Roland'"
- Expected: Bot starts the story
User: "What happens next? Add a dragon to the story"
- Expected: Bot continues the story about Sir Roland and introduces a dragon
User: "How does Sir Roland defeat the dragon?"
- Expected: Bot continues the same story with a resolution
User: "What was the knight's name again?"
- Expected: Bot should remember "Sir Roland"

✅ Pass Criteria: Bot maintains story coherence and character consistency

Test 5: Comparative Analysis

Purpose: Test if bot remembers multiple items for comparison

User: "Tell me about Python programming language"
- Expected: Bot provides information about Python
User: "Now tell me about JavaScript"
- Expected: Bot provides information about JavaScript
User: "Which one is better for web development?"
- Expected: Bot compares both languages mentioned earlier
User: "What about the first language we discussed - is it object-oriented?"
- Expected: Bot refers back to Python and discusses its OOP features

✅ Pass Criteria: Bot maintains context of both programming languages and can compare them

Test 6: Sequential Instructions

Purpose: Test if bot can follow multi-step instructions that depend on previous steps

User: "Create a workout plan for Monday: 30 minutes running"
- Expected: Bot creates Monday's plan
User: "For Tuesday, add weightlifting for 45 minutes"
- Expected: Bot adds Tuesday to the existing plan
User: "What's the total workout time for the week so far?"
- Expected: Bot should calculate 30 + 45 = 75 minutes
User: "Add Wednesday: 20 minutes yoga"
- Expected: Bot adds Wednesday
User: "Show me the complete weekly plan"
- Expected: Bot lists Monday (running), Tuesday (weightlifting), Wednesday (yoga)

✅ Pass Criteria: Bot maintains the entire workout plan across all additions
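The running total in Test 6 is taken before Wednesday is added, which is easy to get wrong when checking responses by hand. A small sketch of the expected state, using a hypothetical `plan` dictionary:

```python
# Day -> (activity, minutes), in the order the user supplies them.
plan = {"Monday": ("running", 30)}
plan["Tuesday"] = ("weightlifting", 45)

# The "total so far" question is asked here, before Wednesday exists.
total_so_far = sum(minutes for _activity, minutes in plan.values())

plan["Wednesday"] = ("yoga", 20)

print(total_so_far)  # 75
print(list(plan))    # ['Monday', 'Tuesday', 'Wednesday']
```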

Test 7: Preference Memory

Purpose: Test if bot remembers user preferences stated earlier

User: "I'm allergic to peanuts and I don't like spicy food"
- Expected: Bot acknowledges the preferences
User: "Suggest a meal for dinner"
- Expected: Bot suggests something without peanuts and not spicy
User: "What about a dessert?"
- Expected: Bot suggests a dessert without peanuts
User: "What were my dietary restrictions again?"
- Expected: Bot recalls peanut allergy and dislike of spicy food

✅ Pass Criteria: Bot consistently remembers and applies user preferences

Test 8: Problem-Solving Context

Purpose: Test if bot maintains context during problem-solving

User: "I need to plan a road trip from New York to Los Angeles"
- Expected: Bot acknowledges the trip details
User: "The trip is 2,800 miles. If I drive 400 miles per day, how many days will it take?"
- Expected: Bot calculates 7 days
User: "If I want to complete it in 5 days instead, how many miles per day?"
- Expected: Bot remembers the 2,800 miles and calculates 560 miles/day
User: "Where am I traveling from and to?"
- Expected: Bot should remember New York to Los Angeles

✅ Pass Criteria: Bot maintains trip details and calculations throughout
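The two expected figures in Test 8 follow directly from the 2,800-mile distance. A short sketch of the arithmetic, with variable names chosen for illustration:

```python
import math

distance_miles = 2800

# Days needed at 400 miles per day, rounded up to whole days.
days_at_400 = math.ceil(distance_miles / 400)

# Daily mileage needed to finish in 5 days.
miles_per_day_for_5 = distance_miles / 5

print(days_at_400)          # 7
print(miles_per_day_for_5)  # 560.0
```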

Test 9: Conversation Topic Switch and Return

Purpose: Test if bot can handle topic switches while maintaining context

User: "Tell me about the solar system"
- Expected: Bot provides information about the solar system
User: "Actually, let's talk about cooking instead. What's a good recipe for pasta?"
- Expected: Bot switches to cooking topic
User: "How many planets did we discuss earlier?"
- Expected: Bot should return to solar system context and mention 8 planets
User: "And what were we discussing about food?"
- Expected: Bot should remember the pasta recipe discussion

✅ Pass Criteria: Bot maintains both conversation threads and can switch between them

Test 10: Complex Playbook Building

Purpose: Test how playbook bullets accumulate and are used in context

User: "Plan a birthday party for 20 people with a budget of $500"
- Expected: Bot creates a plan and extracts playbook bullets about party planning
User: "Now plan a wedding for 100 people with a $10,000 budget"
- Expected: Bot should apply learnings from the first event and mention the differences in scale
User: "What did we plan first - remind me of the budget?"
- Expected: Bot remembers the birthday party with $500 budget
User: "What strategies would work for both events?"
- Expected: Bot uses accumulated playbook bullets to suggest common strategies

✅ Pass Criteria: Bot maintains context AND applies learned playbook bullets across conversations

Test 11: Pronoun Resolution

Purpose: Test if bot correctly resolves pronouns based on context

User: "Tell me about Albert Einstein"
- Expected: Bot provides information about Einstein
User: "What was his most famous theory?"
- Expected: Bot understands "his" refers to Einstein, mentions relativity
User: "When did he publish it?"
- Expected: Bot remembers both Einstein and the theory, and provides the publication date (1905 for special relativity, 1915 for general relativity)
User: "Where was he born?"
- Expected: Bot understands "he" still refers to Einstein and answers Germany

✅ Pass Criteria: Bot correctly resolves all pronouns to Einstein

Test 12: Conditional Logic Memory

Purpose: Test if bot remembers conditional statements

User: "If it rains tomorrow, I'll stay home and read. If it's sunny, I'll go hiking"
- Expected: Bot acknowledges both conditions
User: "What will I do if it rains?"
- Expected: Bot recalls "stay home and read"
User: "What about if the weather is nice?"
- Expected: Bot recalls "go hiking"
User: "What were the two options I mentioned?"
- Expected: Bot recalls both conditions and activities

✅ Pass Criteria: Bot remembers both conditional branches accurately

How to Use These Tests

Run tests in sequence: Copy each user message in order and paste into the chat
Clear chat between tests: Use the "Clear Chat History" button between tests (#1-12) so that each test starts from an empty conversation
Check responses: Verify bot responses match expected behavior
Note failures: Document any instances where context is lost
Check playbook: After several tests, verify playbook bullets are being created and used
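The steps above describe a manual process through the chat UI. If the bot can be called from code, the same checks could be scripted. The sketch below assumes a hypothetical `send_message` callable that takes a user message and returns the bot's reply as a string; the ACE demo is not known to expose such a function, and `FakeBot` is a stand-in used only to make the example runnable.

```python
from typing import Callable, List, Tuple

def run_script(send_message: Callable[[str], str],
               turns: List[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Send each message in order and return (turn, message, reply)
    for every turn whose reply lacks the expected substring."""
    failures = []
    for i, (message, expected) in enumerate(turns, start=1):
        reply = send_message(message)
        if expected.lower() not in reply.lower():
            failures.append((i, message, reply))
    return failures

class FakeBot:
    """Stand-in bot for illustration: it remembers only the user's name."""
    def __init__(self):
        self.name = None

    def __call__(self, message: str) -> str:
        if message.startswith("My name is "):
            self.name = message.split()[3]
            return f"Nice to meet you, {self.name}."
        if message == "What's my name?":
            return f"Your name is {self.name}."
        return "I'm not sure."

# Part of Test 2 as (user message, substring expected in the reply).
turns = [
    ("My name is Alice and I work as a software engineer in Seattle", "Alice"),
    ("What's my name?", "Alice"),
    ("Where do I work?", "Seattle"),
]

failures = run_script(FakeBot(), turns)
print(failures)  # [(3, 'Where do I work?', "I'm not sure.")]
```

Substring matching is a deliberately loose check. It catches lost context, such as the forgotten city here, but a reply still needs a human read to judge coherence or hallucination.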

Success Metrics

Context Retention: Bot maintains information for 4+ turns
Accurate Recall: Bot retrieves correct information when asked
Pronoun Resolution: Bot correctly identifies referents
Playbook Integration: Bot learns from conversations and applies bullets
No Hallucination: Bot doesn't invent information not provided

Known Limitations to Test

Very long conversations: Test with 10+ turns to see whether the context window fills up
Complex nested context: Multiple topics interleaved
Ambiguous references: Test with unclear pronouns
Contradictory information: Provide conflicting info to see how bot handles it

Additional Edge Cases

Test 13: Number Sequences

"Remember this sequence: 2, 4, 8, 16"
"What comes next in the sequence?"
"What was the first number?"
"How many numbers did I give you?"
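Test 13 lists no expected answers. Assuming the intended pattern is doubling, which is the most natural reading of 2, 4, 8, 16 but is not stated in the test, the expected replies can be derived as follows:

```python
sequence = [2, 4, 8, 16]

# Assumption: the sequence is geometric; check that before extrapolating.
ratio = sequence[1] // sequence[0]
is_geometric = all(b == a * ratio for a, b in zip(sequence, sequence[1:]))
next_value = sequence[-1] * ratio if is_geometric else None

print(next_value)     # 32
print(sequence[0])    # 2
print(len(sequence))  # 4
```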

Test 14: Time-Based Context

"I have a meeting at 2 PM today"
"I also have dinner plans at 7 PM"
"What's my first commitment?"
"How much time do I have between my two events?"
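Test 14 lists no expected answers. They can be worked out from the two times given; the sketch below uses an arbitrary calendar date, since the test only says "today".

```python
from datetime import datetime

day = datetime(2025, 1, 1)  # arbitrary date, used only to build the times
events = {
    "meeting": day.replace(hour=14),  # 2 PM
    "dinner": day.replace(hour=19),   # 7 PM
}

first = min(events, key=events.get)
gap = events["dinner"] - events["meeting"]

print(first)  # meeting
print(gap)    # 5:00:00
```

The bot should therefore name the 2 PM meeting as the first commitment and report a five-hour gap.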

Test 15: Multiple Entity Tracking

"John is 25 years old and works as a teacher"
"Mary is 30 years old and works as a doctor"
"How old is John?"
"What does Mary do?"
"Who is older?"
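Test 15 lists no expected answers. A sketch of the facts the bot needs to keep separate for each person, with the replies they imply:

```python
people = {
    "John": {"age": 25, "job": "teacher"},
    "Mary": {"age": 30, "job": "doctor"},
}

# The person with the largest age value.
older = max(people, key=lambda name: people[name]["age"])

print(people["John"]["age"])  # 25
print(people["Mary"]["job"])  # doctor
print(older)                  # Mary
```

A failure here usually shows up as attributes crossing over between entities, for example the bot reporting that John is a doctor.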

Last Updated: October 11, 2025
Purpose: Validate conversation context maintenance in ACE Streamlit Demo
