Skip to content

Conversation

@allisonschiang
Copy link
Member

@allisonschiang allisonschiang commented Nov 10, 2025

Whenever a module fails to startup, add it to the failing modules map

When a module is failing, it doesn't get stored in mgr.modules, so when you compare config diffs it won't find the difference between the previous config and the newConfig (where you removed a failing module). So to be able to tell when the failing module has been removed, I compare my failing modules list directly to the newConfig and remove any difference in the failing modules list.

When modules are removed for other reasons, it also removes them from the failing modules.

TLDR: testing fixing a failing local module, non-local module, and deleting modules

Testing:

Adding failing module

  1. Original state: No failing modules and some working modules:
    11/10/2025, 1:02:53 PM error rdk.resource_manager.rdk:component:board/board-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:component:board with model s:s:s not registered; There may be no module in config that provides this model   resource rdk:component:board/board-1  model s:s:s

  2. Adding a failing local module (called local-module-1):
    11/14/2025, 2:55:12 PM error rdk.resource_manager.rdk:component:board/board-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:component:board with model s:s:s not registered; May be in failing module: [local-module-1]; There may be no module in config that provides this model   resource rdk:component:board/board-1  model s:s:s

  3. Add a failing module using hot reload
    11/14/2025, 2:56:20 PM error rdk.resource_manager.rdk:component:board/board-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:component:board with model s:s:s not registered; May be in failing module: [local-module-1 allisonorg_hot-reload-module_from_reload]; There may be no module in config that provides this model   resource rdk:component:board/board-1  model s:s:s

Fixing failing module
4. Fix the failing local module
11/14/2025, 2:57:57 PM error rdk.resource_manager.rdk:component:board/board-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:component:board with model s:s:s not registered; May be in failing module: [allisonorg_hot-reload-module_from_reload]; There may be no module in config that provides this model   resource rdk:component:board/board-1  model s:s:s

  1. Fix the failing hot reload module
    11/14/2025, 2:59:04 PM error rdk.resource_manager.rdk:component:board/board-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:component:board with model s:s:s not registered; There may be no module in config that provides this model   resource rdk:component:board/board-1  model s:s:s

Deleting Failing Module
6. Re-add Failing module
11/14/2025, 3:00:37 PM error rdk.resource_manager.rdk:component:board/board-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:component:board with model s:s:s not registered; May be in failing module: [local-module-1]; There may be no module in config that provides this model   resource rdk:component:board/board-1  model s:s:s

  1. Delete failing module
    11/14/2025, 3:01:27 PM error 'rdk.resource_manager.rdk:component:board/board-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:component:board with model s:s:s not registered; There may be no module in config that provides this model   resource rdk:component:board/board-1  model s:s:s

Make a working module fails
7. reload a working module with a broken link
11/19/2025, 1:38:14 PM error rdk.resource_manager.rdk:service:discovery/discovery-1   resource/graph_node.go:308   resource build error: unknown resource type: API rdk:service:discovery with model viam:find-webcams:webcam-discovery not registered; May be in failing module: [workingmodule]; There may be no module in config that provides this model   resource rdk:service:discovery/discovery-1  model viam:find-webcams:webcam-discovery

@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Nov 10, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 10, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 14, 2025
moduleLogger.CErrorw(ctx, "Error adding module", "module", conf.Name, "error", err)
mgr.muFailedModules.Lock()
mgr.failedModules[conf.Name] = true
mgr.muFailedModules.Unlock()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add modules to failedModules if failed to add

errs[i] = err
return
}
mgr.muFailedModules.Lock()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a module gets successfully added, check to see if its on list and delete it from there if is

@allisonschiang allisonschiang requested a review from a team November 14, 2025 20:05
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 14, 2025
@cheukt
Copy link
Member

cheukt commented Nov 14, 2025

this looks good - please add some regression tests for each of the cases you tested manually!

@benjirewis
Copy link
Member

Agree with Cheuk, and I do see some nil pointer exception in the test failures.

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 19, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 19, 2025
@viambot viambot removed the safe to test This pull request is marked safe to test from a trusted zone label Nov 19, 2025
Copy link
Member

@cheukt cheukt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new tests is a great start, can we also add some integration tests? in particular, annotating at least some of the tests in module_lifecycle_test.go (the ones with crashes in them in particular) with checks on failed modules would be good. You can get the local robot in those tests with r.(*localRobot)

@allisonschiang
Copy link
Member Author

allisonschiang commented Nov 20, 2025

the new tests is a great start, can we also add some integration tests? in particular, annotating at least some of the tests in module_lifecycle_test.go (the ones with crashes in them in particular) with checks on failed modules would be good. You can get the local robot in those tests with r.(*localRobot)

Good point, going to move all my tests into module_lifecycle_test.go so can have access to the localRobot to better replicate real conditions rather than calling isolated functions like Add() and UpdateFailedModules(). Will only test my Add/Remove functions

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 20, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 21, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 21, 2025
// to reconfigure.
if err := mod.Validate(""); err != nil {
manager.logger.CErrorw(ctx, "module config validation error; skipping", "module", mod.Name, "error", err)
manager.moduleManager.AddToFailedModules(mod.Name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you update the module with in invalid exec, it won't run reconfigure so manually add it

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 21, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 21, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 21, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 21, 2025
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test This pull request is marked safe to test from a trusted zone

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants