Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Delay after publishing entity creation message #3446

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

albinsuresh
Copy link
Contributor

@albinsuresh albinsuresh commented Mar 5, 2025

Proposed changes

Introduce a delay after the entity registration is converted and published to C8Y, so that the telemetry data published afterwards doesn't race with it, leading to the auto-creation of a managed object with same ID.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#3439

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Introduce a delay after the entity registration is converted and published to C8Y,
to allow it to be processed by the cloud before sending any data messages.
This reduces the probability of the data message racing with the registration message
and auto-creating the target entity before the registration is processed.
Copy link

codecov bot commented Mar 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

github-actions bot commented Mar 5, 2025

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
581 2 3 583 99.66 1h40m58.087957s

Failed Tests

Name Message ⏱️ Duration Suite
smartrest template custom operation successful Matching messages on topic 'c8y/s/dc/TST_handle_large_gaffer' is greater than maximum. wanted: 1 got: 3 messages: ['dm101,TST_handle_large_gaffer,Factory Wifi,factory-onboarding-wifi,WPA3-Personal', 'dm101,TST_handle_large_gaffer,Factory Wifi,factory-onboarding-wifi,WPA3-Personal', 'dm101,TST_handle_large_gaffer,Factory Wifi,factory-onboarding-wifi,WPA3-Personal'] 57.428 s Smartrest Template
Run custom operation with workflow execution Matching messages on topic 'te/device/main///cmd/take_picture/#' is greater than maximum. wanted: 1 got: 2 messages: ['{"@Version":"bfad95f8b7125200d66dd0f0ef4d619e5c65c0a1a8b29101563d2a8019f48f7d","c8y-mapper":{"on_fragment":"c8y_TakePicture","output":null},"duration":"5s","logPath":"/var/log/tedge/agent/workflow-take_picture-c8y-mapper-16623807.log","quality":"HD","status":"executing"}', '{"@Version":"bfad95f8b7125200d66dd0f0ef4d619e5c65c0a1a8b29101563d2a8019f48f7d","c8y-mapper":{"on_fragment":"c8y_TakePicture","output":null},"duration":"5s","logPath":"/var/log/tedge/agent/workflow-take_picture-c8y-mapper-16623807.log","quality":"HD","status":"executing"}'] 32.276 s Custom Operation Workflow

@@ -181,6 +181,9 @@ impl C8yMapperActor {
self.process_registration_message(pending_entity.reg_message)
.await?;

// Wait for a while to allow the registration message to be processed by the cloud before sending the data messages
tokio::time::sleep(Duration::from_secs(1)).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it's going to break if the bridge loses connection. Is there a particular reason why we can't do what we do for the main device, i.e. publish the registration message multiple times and wait until we receive a suitable 41 response to confirm the device definitely exists?

Copy link
Contributor Author

@albinsuresh albinsuresh Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a known limitation. We're doing this low cost fix to just reduce the probability of this flakiness in 80-90% of the cases when everything is working fine. The plan was to try this fix first and see if it reduces the flakiness considerably.

The other two approaches considered were:

  1. Waiting for the duplicate device error over MQTT as done for the main device
  2. Performing the entity creation over HTTP

Since both approaches fundamentally change the control flow within the mapper, with huge impact on all existing tests(esp unit tests) as well, we just decided to do this minimal fix for now. But, we can revisit these approaches later once we achieve some stability.

@reubenmiller reubenmiller added theme:c8y Theme: Cumulocity related topics theme:mqtt Theme: mqtt and mosquitto related topics labels Mar 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme:c8y Theme: Cumulocity related topics theme:mqtt Theme: mqtt and mosquitto related topics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants