xmtp/xmtp-qa-tools

QA Tools

This monorepo contains a comprehensive collection of tools for testing and monitoring the XMTP protocol and its implementations.

Automated workflows

| Test suite | Performance | Resources | Run frequency | Networks |
| --- | --- | --- | --- | --- |
| Functional | Functional | Workflow / Test code | Every 3 hours | dev, production |
| Regression | Regression | Workflow / Test code | Every 6 hours | dev, production |
| Performance | Performance | Workflow / Test code | Every 30 min | dev, production |
| Delivery | Performance | Workflow / Test code | Every 30 min | dev, production |
| Groups | Performance | Workflow / Test code | Every 2 hours | dev, production |
| Agents | Performance | Workflow / Test code | Every 15 min | dev, production |
| Browser | Browser | Workflow / Test code | Every 30 min | dev, production |

Architecture

This flowchart illustrates the XMTP protocol's layered architecture and testing scope:

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#0D1117', 'primaryTextColor': '#c9d1d9', 'primaryBorderColor': '#30363d', 'lineColor': '#8b949e', 'secondaryColor': '#161b22', 'tertiaryColor': '#161b22' }}}%%

flowchart LR
  %% Core components and bindings
  subgraph Bindings["Bindings"]
    wasm["WASM"]
    ffi["FFI"]
    napi["Napi"]
  end

  subgraph SDKs["SDKs"]
    browserSDK["Browser SDK"]
    swiftSDK["Swift SDK"]
    kotlinSDK["Kotlin SDK"]
    reactNativeSDK["React Native SDK"]
    nodesdk["Node SDK"]
  end

  subgraph Applications["Applications"]
    webApps["xmtp.chat"]
    mobileApps["Native Apps"]
    crossPlatformApps["Cross-platform Apps"]
    messagingApps["Convos"]
    botAgents["Bots & Agents"]
    backendServices["Backend Services"]
  end

  decentralNode["Decentralized<br>Nodes"] --> libxmtp["LibXMTP<br>(openmls)<br>(diesel)"]
  libxmtp --- wasm
  libxmtp --- ffi
  kotlinSDK --- mobileApps
  libxmtp --- napi

  wasm --- browserSDK
  ffi --- swiftSDK
  ffi --- kotlinSDK

  swiftSDK --- reactNativeSDK
  kotlinSDK --- reactNativeSDK

  browserSDK --- webApps

  swiftSDK --- mobileApps

  napi --- nodesdk
  nodesdk --- botAgents
  nodesdk --- backendServices

  reactNativeSDK --- messagingApps
  napi -.- reactNativeSDK

  linkStyle 0,4,12,13 stroke:#f66,stroke-width:4px,stroke-dasharray: 5,5;
  classDef highlightStroke stroke:#f66,color:#c9d1d9,stroke-width:4px;
  class decentralNode,libxmtp,webApps,messagingApps,botAgents highlightStroke;

The highlighted path (red dashed line) in the architecture diagram shows our main testing focus.

LibXMTP is a shared library built in Rust and compiled to WASM, Napi, and FFI bindings. It encapsulates the core cryptographic functions of the XMTP messaging protocol and uses openmls as the underlying cryptographic library. Because the protocol is complex, it is important to test how these bindings perform in their own language environments.

We can test all XMTP bindings using three main applications. We use xmtp.chat to test the Browser SDK's Wasm binding in actual web environments. We use Convos to test the React Native SDK, which uses both Swift and Kotlin FFI bindings for mobile devices. We use agents to test the Node SDK's Napi binding for server functions. This testing method checks the entire protocol across all binding types, making sure different clients work together, messages are saved, and users have the same experience across the XMTP system.
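
The mapping described above can be written down as data and checked for gaps. This is only an illustrative sketch: the `Coverage` type, the `coverage` list, and `uncoveredBindings` are hypothetical names and are not part of this repository.

```typescript
// Hypothetical sketch: which test application exercises which binding.
type Binding = "WASM" | "FFI" | "Napi";

interface Coverage {
  app: string;
  sdk: string;
  bindings: Binding[];
}

// Taken from the paragraph above: one application per binding type.
const coverage: Coverage[] = [
  { app: "xmtp.chat", sdk: "Browser SDK", bindings: ["WASM"] },
  { app: "Convos", sdk: "React Native SDK", bindings: ["FFI"] },
  { app: "Agents", sdk: "Node SDK", bindings: ["Napi"] },
];

// Returns the bindings that no listed application exercises.
function uncoveredBindings(entries: Coverage[]): Binding[] {
  const all: Binding[] = ["WASM", "FFI", "Napi"];
  return all.filter(
    (binding) => !entries.some((entry) => entry.bindings.indexOf(binding) !== -1),
  );
}

console.log(uncoveredBindings(coverage)); // empty array: every binding is covered
```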

Operation performance

Core SDK Operations Performance

| Operation | Description | Avg (ms) | Target (ms) | Performance |
| --- | --- | --- | --- | --- |
| clientCreate | Creating a client | 588 | <350 | Concern |
| inboxState | Checking inbox state | 41 | <350 | On Target |
| newDm | Creating a direct message conversation | 258 | <350 | On Target |
| newDmWithIdentifiers | Creating a dm by address | 294 | <350 | On Target |
| sendGM | Sending a group message | 126 | <200 | On Target |
| receiveGM | Receiving a group message | 87 | <200 | On Target |
| createGroup | Creating a group | 315 | <350 | On Target |
| createGroupByIdentifiers | Creating a group by address | 313 | <350 | On Target |
| syncGroup | Syncing group state | 76 | <200 | On Target |
| updateGroupName | Updating group metadata | 129 | <200 | On Target |
| removeMembers | Removing participants from a group | 127 | <250 | On Target |
| sendGroupMessage | Sending a group message | 85 | <200 | On Target |
| receiveGroupMessage | Processing group message stream | 124 | <200 | On Target |
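
The status column in this table is consistent with a simple threshold check: an operation is "On Target" when its average is below the target and a "Concern" otherwise. A minimal sketch of that rule follows; the `OperationResult` type and `classify` function are hypothetical names used for illustration, and the strict `<` comparison is an assumption, not the repository's actual logic.

```typescript
// Hypothetical helper: names and the strict "<" comparison are assumptions.
type Status = "On Target" | "Concern";

interface OperationResult {
  operation: string;
  avgMs: number; // measured average in milliseconds
  targetMs: number; // upper bound from the "Target" column
}

function classify(result: OperationResult): Status {
  return result.avgMs < result.targetMs ? "On Target" : "Concern";
}

// Three rows copied from the table above.
const samples: OperationResult[] = [
  { operation: "clientCreate", avgMs: 588, targetMs: 350 },
  { operation: "newDm", avgMs: 258, targetMs: 350 },
  { operation: "sendGM", avgMs: 126, targetMs: 200 },
];

for (const sample of samples) {
  console.log(`${sample.operation}: ${classify(sample)}`);
}
```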

Group operations performance

Sender-Side average performance

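| Size | Send message | Update name | Remove members | Create | Performance |
| --- | --- | --- | --- | --- | --- |
| 50 | 86 | 135 | 139 | 1329 | On Target |
| 100 | 88 | 145 | 157 | 1522 | On Target |
| 150 | 95 | 203 | 190 | 2306 | On Target |
| 200 | 93 | 193 | 205 | 3344 | On Target |
| 250 | 108 | 219 | 237 | 4276 | On Target |
| 300 | 97 | 244 | 247 | 5463 | On Target |
| 350 | 101 | 264 | 308 | 6641 | On Target |
| 400 | 111 | 280 | 320 | 7641 | On Target |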

Note: These measurements are taken only from the sender side and after the group is created.

Receiver-Side stream performance

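| Group Size | New conversation | Metadata | Messages | Add Members | Performance |
| --- | --- | --- | --- | --- | --- |
| 50 | 687 | 141 | 131 | 401 | On Target |
| 100 | 746 | 155 | 117 | 420 | On Target |
| 150 | 833 | 163 | 147 | 435 | On Target |
| 200 | 953 | 179 | 173 | 499 | On Target |
| 250 | 1007 | 187 | 161 | 526 | Concern |
| 300 | 1040 | 195 | 167 | 543 | Concern |
| 350 | 1042 | 198 | 178 | 581 | Concern |
| 400 | 1192 | 214 | 173 | 609 | Concern |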

Note: These measurements are taken only from the receiver side and after the group is created.

Receiver-Side sync performance

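| Size | syncAll | | sync | | Performance |
| --- | --- | --- | --- | --- | --- |
| 50 | 366 | ... | 291 | ... | On Target |
| 100 | 503 | 521 | 424 | 372 | On Target |
| 150 | 665 | 727 | 522 | 622 | On Target |
| 200 | 854 | 1066 | 653 | 936 | On Target |
| 250 | 966 | 1582 | 768 | 1148 | Concern |
| 300 | 1225 | 1619 | 861 | 1362 | Concern |
| 350 | 1322 | 1846 | 1218 | 2017 | Concern |
| 400 | 1292 | 2082 | 1325 | 1792 | Concern |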

Note: syncAll is measured only on the first cold start of the client (fresh inbox). Cumulative sync is measured the first time all groups are synced.

Networks performance

Network performance

| Performance Metric | Average (ms) | Target (ms) | Performance |
| --- | --- | --- | --- |
| DNS Lookup | 13 | <50 | On Target |
| TCP Connection | 48 | <70 | On Target |
| TLS Handshake | 124 | <150 | On Target |
| Processing | 35 | <100 | On Target |
| Server Call | 159 | <250 | On Target |

Regional Network Performance

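| Region | Server Call | TLS | Difference vs. us-east | Performance |
| --- | --- | --- | --- | --- |
| us-east | 140 | 123 | Baseline | On Target |
| us-west | 151 | 118 | <20% | On Target |
| europe | 230 | 180 | <40% | On Target |
| asia | 450 | 350 | >100% | Concern |
| south-america | 734 | 573 | >200% | Concern |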

Note: Baseline is us-east region and production network.

Note: Production network consistently shows better network performance across all regions, with improvements ranging from 5.5% to 9.1%.

Message reliability

Message delivery testing

| Test Area | Average | Target | Performance |
| --- | --- | --- | --- |
| Stream Delivery Rate | 100% successful | 99.9% minimum | On Target |
| Poll Delivery Rate | 100% successful | 99.9% minimum | On Target |
| Recovery Rate | 100% successful | 99.9% minimum | On Target |
| Stream Order | 100% in order | 99.9% in order | On Target |
| Poll Order | 100% in order | 99.9% in order | On Target |
| Recovery Order | 100% in order | 99.9% in order | On Target |

Note: These tests run regularly in groups of 40 active members listening to one user who sends 100 messages.
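
To make the two metrics concrete, here is one way a delivery rate and an order rate could be computed for a single receiver. This is a sketch under assumptions: messages are represented by sequence numbers `0..sentCount-1`, "in order" means each received number is larger than the one before it, and `measureDelivery` is a hypothetical name rather than a function from this repository.

```typescript
// Hypothetical sketch: sequence numbers stand in for message content.
interface DeliveryStats {
  deliveryRate: number; // percent of sent messages that arrived at least once
  orderRate: number; // percent of received messages that arrived after their predecessor
}

function measureDelivery(sentCount: number, received: number[]): DeliveryStats {
  // Count each valid sequence number once, ignoring duplicates.
  const unique = new Set(received.filter((seq) => seq >= 0 && seq < sentCount));

  // A message counts as "in order" if it is greater than the previous one.
  let inOrder = 0;
  let previous = Number.NEGATIVE_INFINITY;
  for (const seq of received) {
    if (seq > previous) inOrder++;
    previous = seq;
  }

  return {
    deliveryRate: sentCount === 0 ? 0 : (unique.size / sentCount) * 100,
    orderRate: received.length === 0 ? 0 : (inOrder / received.length) * 100,
  };
}

// One receiver that got messages 0, 2, 1, 3 out of 4 sent.
console.log(measureDelivery(4, [0, 2, 1, 3]));
```

In a run like the one described in the note, this would be evaluated once per receiving member and the results averaged.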

Storage

Storage by Group Size

| Group Size | Groups | Sender storage | Avg Group Size | Receiver storage | Efficiency Gain |
| --- | --- | --- | --- | --- | --- |
| 2 members | 261 | 5.1 MB | 0.020 MB | 1.617 MB | baseline |
| 10 members | 114 | 5.1 MB | 0.044 MB | 3.133 MB | 2.2× better |
| 50 members | 31 | 5.3 MB | 0.169 MB | 3.625 MB | 2.9× better |
| 100 members | 19 | 5.6 MB | 0.292 MB | 5.566 MB | 3.3× better |
| 150 members | 12 | 5.6 MB | 0.465 MB | 6.797 MB | 3.2× better |
| 200 members | 10 | 6.2 MB | 0.618 MB | 8.090 MB | 3.2× better |

Large Inbox Sync Performance Summary

| Inbox Size | Sync Time (ms) | DB Size (MB) | Existing Groups | queryGroupMessages |
| --- | --- | --- | --- | --- |
| Small | 335 | 20 | 5 | 17 |
| Medium | 364 | 107 | 17 | 53 |
| Large | 365 | 208 | 31 | 95 |
| XL | 376 | 410 | 59 | 179 |

Success criteria summary

| Metric | Current Performance | Target | Performance |
| --- | --- | --- | --- |
| Core SDK Operations | All within targets | Meet defined targets | On Target |
| Small Group Operations | ≤300 | ≤300 for <50 members | On Target |
| Medium Group Operations | ≤1000 | ≤1000 for <400 members | Concern |
| Network Performance | All metrics within target | Meet defined targets | On Target |
| Message Delivery | 100% | 99.9% minimum | On Target |
| Stream Message Loss | 100% | 99.9% minimum | On Target |
| Poll Message Loss | 100% | 99.9% minimum | On Target |
| Message Order | 100% | 100% in order | On Target |
| South America & Asia | more than 40% | <20% difference | Concern |
| US & Europe | less than 20% variance | <20% difference | On Target |
| Dev vs Production | Production 4.5-16.1% better | Production ≥ Dev | On Target |

Tools & utilities

Testing Summary

Test coverage

  • Functional: Core protocol (DMs, groups, streams, sync, consent, codecs, installations)
  • Metrics: Performance benchmarking, delivery reliability, large-scale testing (up to 400 members)
  • Regression: Backward compatibility testing for the last 3 versions
  • NetworkChaos: Partition tolerance, duplicate prevention, reconciliation, key rotation
  • Browser: Cross-browser compatibility via Playwright automation
  • Agents: Live production bot health monitoring
  • Mobile: Cross-platform performance testing
  • Bugs: Historical issue reproduction and regression prevention
  • Other: Security, spam detection, rate limiting, storage efficiency

Testing framework

  • Multi-version SDKs: Compatibility testing across versions 0.0.47 → 2.2.0+
  • Stream verification: Message delivery, conversation streams, metadata updates
  • Performance monitoring: Datadog metrics collection
  • Browser automation: Playwright-based web app testing
  • CI automation: Automated testing with logging and alerting
  • Alerting: Slack notifications with error pattern filtering
  • Log analysis: Automated error detection and deduplication
  • Dashboard: Datadog integration tracking delivery rates, response times, geographic performance
  • CLI Tools: Test execution, version management, key generation
  • Slack Bot: AI-powered responses, history fetching, log management
  • Geographic testing: Multi-region performance across US, Europe, Asia, South America

Metrics tracked

  • Delivery: 100% success rate (target: 99.9%)
  • Performance: <350ms core operations, <200ms messaging, <150ms TLS
  • Scale: Groups up to 400 members, high-volume message testing
  • Network: DNS, TCP, TLS timing across 5 global regions
  • Agent health: Live production bot response time monitoring

Development

Prerequisites

  • Node.js (>20.18.0)
  • Yarn 4.6.0

Installation

# Shallow clone for a faster download of just the latest code
git clone --depth=1 https://github.com/xmtp/xmtp-qa-tools
cd xmtp-qa-tools
yarn install

Environment variables

XMTP_ENV="dev" #  environment (dev, production, local, multinode)
LOGGING_LEVEL="error" # Rust library logs
LOG_LEVEL="debug" # JS logs level
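
A test script could validate these variables before running anything. The sketch below is an assumption about how that might look: `loadConfig` and `QaConfig` are hypothetical names, and the fallback defaults simply mirror the example values above rather than any documented behavior.

```typescript
// Hypothetical sketch: validates the three variables listed above.
const XMTP_ENVS = ["dev", "production", "local", "multinode"] as const;
type XmtpEnv = (typeof XMTP_ENVS)[number];

interface QaConfig {
  xmtpEnv: XmtpEnv;
  loggingLevel: string; // Rust library logs
  logLevel: string; // JS logs
}

function loadConfig(env: Record<string, string | undefined>): QaConfig {
  // Assumed default: fall back to "dev" when XMTP_ENV is unset.
  const raw = env["XMTP_ENV"] ?? "dev";
  if ((XMTP_ENVS as readonly string[]).indexOf(raw) === -1) {
    throw new Error(`XMTP_ENV must be one of ${XMTP_ENVS.join(", ")}; got "${raw}"`);
  }
  return {
    xmtpEnv: raw as XmtpEnv,
    loggingLevel: env["LOGGING_LEVEL"] ?? "error",
    logLevel: env["LOG_LEVEL"] ?? "debug",
  };
}

// Example with explicit values; in Node.js you would pass process.env instead.
console.log(loadConfig({ XMTP_ENV: "production", LOG_LEVEL: "info" }));
```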

Running tests

To get started, set the environment variables and run the tests with:

# Simple dms test
yarn test dms
# Full functional test
yarn test functional
# Performance test example
yarn test performance

Debug mode

yarn test functional --debug

This saves logs to the logs/ directory instead of printing them to the terminal.

Rate limits

  • Read operations: 20,000 requests per 5-minute window
  • Write operations: 3,000 messages published per 5-minute window
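
High-volume tests can stay under these limits with a client-side guard. The following is a minimal sketch, assuming a fixed 5-minute window (the source does not say whether the network uses fixed or sliding windows); `WindowLimiter` is a hypothetical class, not part of this repository or the XMTP SDKs.

```typescript
// Hypothetical client-side guard using a fixed-window counter.
class WindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      // The previous window has elapsed: start a new one.
      this.windowStart = nowMs;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
const writeLimiter = new WindowLimiter(3000, FIVE_MINUTES_MS);
const readLimiter = new WindowLimiter(20000, FIVE_MINUTES_MS);

if (writeLimiter.tryAcquire(Date.now())) {
  console.log("ok to publish a message");
}
if (readLimiter.tryAcquire(Date.now())) {
  console.log("ok to issue a read request");
}
```

Passing the clock value in as an argument keeps the limiter easy to test without waiting for real time to pass.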
