@alexnorell (Contributor) commented Nov 24, 2025

Summary

  • Add OPCUAConnectionManager singleton for connection pooling
  • Implement circuit breaker pattern to fail fast when servers are unreachable
  • Default to max_retries=1, retry_backoff=0 (fail fast, don't block pipeline)
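To illustrate what a process-wide connection pool looks like, here is a minimal sketch assuming a synchronous `factory(url)` callable that opens a client. `ConnectionManagerSketch`, `get_client`, and `factory` are hypothetical names; the real `OPCUAConnectionManager` wraps asyncua clients and is not reproduced here.

```python
import threading
from typing import Any, Callable


class ConnectionManagerSketch:
    """Hypothetical process-wide pool: one shared client per server URL."""

    _instance = None
    _instance_lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so concurrent callers still get one instance.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._clients = {}
                    instance._lock = threading.Lock()
                    cls._instance = instance
        return cls._instance

    def get_client(self, url: str, factory: Callable[[str], Any]) -> Any:
        # Reuse the existing client for this URL instead of reconnecting per write.
        with self._lock:
            client = self._clients.get(url)
            if client is None:
                client = factory(url)
                self._clients[url] = client
            return client
```

Every block writing to the same server then shares one client, so a pipeline does not open a fresh connection per frame.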

Circuit Breaker Behavior

When a connection fails:

  1. First attempt: tries to connect with timeout (default 2s)
  2. On failure: records failure timestamp and fails immediately
  3. Subsequent attempts within 30s: fail instantly (no blocking, no memory retention)
  4. After 30s: allows one retry attempt

This prevents:

  • Thread pool exhaustion from blocked connection attempts
  • Memory retention from queued closures holding frame data
  • Pipeline slowdown when OPC server is unreachable
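The four circuit breaker steps above can be sketched as follows. This is a minimal, single-threaded illustration assuming a blocking `connect(url)` callable that enforces its own connect timeout; `CircuitBreakerSketch`, `CircuitOpenError`, and the injectable `clock` are hypothetical names, not the PR's actual code.

```python
import time
from typing import Callable, Dict, Optional


class CircuitOpenError(Exception):
    """Raised instead of attempting a connection while the breaker is open."""


class CircuitBreakerSketch:
    def __init__(self, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        # server URL -> timestamp of the most recent failed connection attempt
        self._last_failure: Dict[str, float] = {}

    def call(self, url: str, connect: Callable[[str], object]) -> object:
        failed_at: Optional[float] = self._last_failure.get(url)
        if failed_at is not None and self._clock() - failed_at < self.cooldown_seconds:
            # Step 3: inside the cooldown window, fail instantly without blocking.
            raise CircuitOpenError(url)
        try:
            # Steps 1 and 4: the first attempt, or the single retry after cooldown.
            result = connect(url)
        except Exception:
            # Step 2: record when the failure happened, then fail immediately.
            self._last_failure[url] = self._clock()
            raise
        # A successful connection closes the breaker for this URL.
        self._last_failure.pop(url, None)
        return result
```

Because a rejected call raises before `connect` runs, nothing is queued while the circuit is open, which is what avoids holding frame data in pending closures.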

Test plan

  • All 31 OPC UA writer tests pass
  • Tested with real OPC UA server in production environment

@alexnorell force-pushed the fix/opcua branch 3 times, most recently from 5f61bc6 to 984b942 (November 25, 2025 03:14)
- Add OPCUAConnectionManager singleton for connection pooling
- Implement circuit breaker pattern to fail fast when servers are unreachable
  - After connection failure, skip attempts for 30s (no blocking, no memory retention)
- Default to max_retries=1, retry_backoff=0 (fail fast, don't block pipeline)
- Use asyncua exception types (CONNECTION_ERROR_TYPES) instead of string matching
- Add support for all OPC UA numeric types
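Classifying failures by exception type rather than by message text can be sketched like this. The tuple contents below are an assumption built from standard-library exceptions only; the PR's real `CONNECTION_ERROR_TYPES` is built from asyncua exception classes that are not listed here, and `is_connection_error` is a hypothetical helper.

```python
import asyncio
from typing import Tuple, Type

# Hypothetical contents: OSError covers ConnectionError and its subclasses
# (e.g. ConnectionRefusedError); asyncio.TimeoutError covers connect timeouts.
CONNECTION_ERROR_TYPES: Tuple[Type[BaseException], ...] = (
    OSError,
    asyncio.TimeoutError,
)


def is_connection_error(exc: BaseException) -> bool:
    # isinstance matches subclasses and ignores the message, so an unrelated
    # error whose text happens to contain "connection" is not misclassified.
    return isinstance(exc, CONNECTION_ERROR_TYPES)
```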
alexnorell and others added 7 commits November 24, 2025 21:37
- Circuit breaker timeout: 30s → 2s (faster retry after failure)
- Default max_retries: 1 → 3 (more resilient)
- Default retry_backoff_seconds: 0 → 0.015 (15ms exponential backoff)

The new behavior is:
- Connection fails → wait 15ms → retry
- Second failure → wait 30ms → retry
- Third failure → circuit opens for 2 seconds (fail fast)
- After 2 seconds → retry cycle starts again
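The retry schedule described in this commit can be sketched as a loop that doubles the wait after each failure. This assumes a blocking `connect()` callable and catches only `ConnectionError` for simplicity (the PR uses `CONNECTION_ERROR_TYPES`); `connect_with_retries` and the injectable `sleep` parameter are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    max_retries: int = 3,
    retry_backoff_seconds: float = 0.015,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try up to max_retries times, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_retries - 1:
                # Final failure: the caller would now open the circuit
                # (for 2 seconds under the new defaults) and fail fast.
                raise
            # 15 ms after the first failure, 30 ms after the second, ...
            sleep(retry_backoff_seconds * (2 ** attempt))
    raise ValueError("max_retries must be at least 1")
```

With the new defaults, a fully unreachable server costs 15 ms + 30 ms = 45 ms of backoff (plus the time each attempt itself takes) before the circuit opens.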
@sberan sberan marked this pull request as ready for review November 26, 2025 19:22
@grzegorz-roboflow (Collaborator)

Integration tests are timing out. I increased the timeout from 15 to 20 minutes, but the problem persists. Can you verify that the OPC tests are not hanging?
