Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat): add an event exposing source routing information #1264

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

deviantintegral
Copy link
Contributor

This is the first step in Koenkk/zigbee2mqtt#21859.

@@ -15,6 +15,7 @@ interface AdapterEventMap {
zdoResponse: [clusterId: Zdo.ClusterId, response: ZdoTypes.GenericZdoResponse];
disconnected: [];
deviceLeave: [payload: AdapterEvents.DeviceLeavePayload];
sourceRoute: [payload: any];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be typed too? When I tried I was getting typing errors in the emit line in the zstack adapter class, but I didn't get much deeper than that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI check would indicate that yes, I should fix this!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely needs typing, and definitely needs to be a generic form that works with all adapters (not the payload of zstack). Although, it's unlikely some adapters will even be able to support this.

else if (object.command.name === 'srcRtgInd') {
this.emit('sourceRoute', object.payload);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this from the debugger. Unfortunately, I don't have any other adapter types at hand to see if they support this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closest "equivalent" in ember would probably be this:

/**
* Callback
* Reports the arrival of a route record command frame.
* @param NodeId The source of the route record.
* @param EUI64 The EUI64 of the source.
* @param lastHopLqi uint8_t The link quality from the node that last relayed the route record.
* @param lastHopRssi int8_t The energy level (in units of dBm) observed during the reception.
* @param uint8_t The number of relays in relayList.
* @param relayList uint8_t * The route record. Each relay in the list is an uint16_t node ID.
* The list is passed as uint8_t * to avoid alignment problems.
*/
ezspIncomingRouteRecordHandler(
source: NodeId,
sourceEui: EUI64,
lastHopLqi: number,
lastHopRssi: number,
relayCount: number,
relayList: number[],
): void {
logger.debug(
`ezspIncomingRouteRecordHandler(): callback called with: [source=${source}], [sourceEui=${sourceEui}], [lastHopLqi=${lastHopLqi}], [lastHopRssi=${lastHopRssi}], [relayCount=${relayCount}], [relayList=${relayList}]`,
NS,
);
// XXX: could at least trigger a `Events.lastSeenChanged` but this is not currently being listened to at the adapter level
}

@deviantintegral deviantintegral changed the title Add an event exposing source routing information (feat) add an event exposing source routing information Dec 14, 2024
@deviantintegral deviantintegral changed the title (feat) add an event exposing source routing information (feat): add an event exposing source routing information Dec 14, 2024
@Nerivec
Copy link
Collaborator

Nerivec commented Dec 14, 2024

I see a few problems with this;

  • Probably only ember & zstack can support this (has implications on consumers -like frontend, HA...-, since other adapters won't have this information)
  • I can think of several ways where this information could become out of sync and irrelevant (changes while offline, route propagated but not reported, etc.). Same problems with caching the network map, which was also requested a while back.
  • Since it adds an event that will publish, it could stress Node/MQTT further, I've seen cases with poorly designed devices/networks where this is triggered a lot.

@deviantintegral
Copy link
Contributor Author

deviantintegral commented Dec 15, 2024

Probably only ember & zstack can support this (has implications on consumers -like frontend, HA...-, since other adapters won't have this information)

If I remember right, the conbee II / raspbee II support source routing, but I'm also not sure if they pass an event up for it. But if they or other adapters don't, could we omit the data entirely from the frontend?

I can think of several ways where this information could become out of sync and irrelevant (changes while offline, route propagated but not reported, etc.).

I see what you're saying, but this also seems like it's a common issue in the space. I've had devices where "last seen" gets updated but the device is not responsive and even attributes aren't updating correctly. Or, something as simple as a contact sensor opening while zigbee2mqtt is offline. I think with "last seen" that it's clear this could be stale, since that gets updated when source route packets are sent. Are there other ways we could make that aspect clearer?

Same problems with caching the network map, which was also requested a while back.

I did notice that ZHA does this (or something like it). Putting on my "as a user" hat, it's really nice to get some information instead of none, and have the choice to trigger a rebuild (which also doesn't wipe the existing map away while it runs).

Since it adds an event that will publish, it could stress Node/MQTT further, I've seen cases with poorly designed devices/networks where this is triggered a lot.

Interesting, where they are specifically spamming source route updates? Or updates in general?

I've run this set of PRs on my home network (~115 devices) for around 5 hours now. It's averaged ~18 updates a minute, which is in no way a lot but I'd guess most of those "updates" are the same routes. If we debounced this to only trigger if the routes change, would that be enough to address this?

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 15, 2024

omit the data entirely from the frontend?

That's what I meant, you end up with blanks in frontend at best, or would have to implement adapter-specific features (which none of the stack, z2m & consumers, support).

really nice to get some information instead of none

Problem is, you have to assume this is outdated the instant past the reception of the network map (since technically, it can be, in reality, it will probably take at least a couple minutes, even for bad networks -hopefully-). And then it becomes about how fast you should refresh this, which turns into how much hammering of the network do you really want, to get this information properly?
This kind of information is better as a one-off, in my opinion, that's better than misinformation.
I've tried implementing something similar with parent before, but the same problems surface.

source route updates? Or updates in general?

Route updates because of poor network quality, bad routers, etc. (I've seen cases where you get a couple dozen within a couple seconds...).

It's averaged ~18 updates a minute

That is too much already in my opinion for not-even-tertiary data. It will still crowd the event emitter even if deduped+debounced for publish to some degree.
I pulled some quick references from my set of archived logs (from various users feedback):

  • Good network, ~100 devices:
    • ~13 per minute (~20000 per day)
  • Bad network, ~150 devices:
    • ~152 per minute (>200000 per day)

Overall, I don't think the information provided is worth the potential overhead, it's not something we really need in real time, and the cost could be quite significant for some users. And since you can't easily disable it from Z2M settings (tied to the low levels of ZH, at least for the emitter), this would become a constant cost. I think for "already bad" networks, low hardware setups, this might just finish them off...


I was thinking about looking into a way to filter out events based on settings, so as to not emit anything if not desired (some users, myself included, always like more statistics, and the lower levels of ZH could offer a lot, but the overall cost has to be considered 😅). That is quite a lot of work though.

In the meantime, I'm thinking most of this could be retrieved by trigger instead?
Something like a button on the Device page, "Retrieve network info", that would trigger a request that executes lqi() for just that device. Feedback should be pretty quick, and ephemeral (as it should be), with zero constant-impact on performance. As for the code, I think it should be pretty much just adding the MQTT endpoint and add frontend support. This would also be supported by all adapters (already implemented).
About caching this (and the network map), I'd say records would have to be timestamped, stored outside the database (that will make a mess, and grow its size too much, best as separate .json), settings-based, with a rotated array, so you can keep "X snapshots" for comparison (that'd be the most useful I think). A bit more implementation required though, especially in frontend.

@Koenkk what do you think?

@Koenkk
Copy link
Owner

Koenkk commented Dec 15, 2024

I think it would be best to cache this information in zh, and then expose it to z2m on a networkmap scan (as this is typically used to debug problems like Koenkk/zigbee2mqtt#21859)

@deviantintegral
Copy link
Contributor Author

deviantintegral commented Dec 16, 2024

A cache and retrieval operation sounds good to me. I’ll hopefully work on that later this week, and think about how we can present the data in the front end.

I had the “fortune” of having another issue with one of my Enbrighten switches this weekend. It was responding to the coordinator fine, but button presses weren’t reporting correctly. The map shows the switch as not connected directly to the coordinator. The coordinator is at a max of 32 directly connected devices. However, over the past two days there hasn’t been a single source route reported. In other words, it seems like the switch is acting like it’s directly connected, since other switches without direct connections are reporting source routes.

Is this situation something that would make sense to show as an error or an unhealthy network?

Edit: I just power cycled the switch and it immediately sent a source route going through two other routers. And to clarify, it’s not that the switch was sending an empty source routes. It wasn’t sending source route packets at all. I think we’d need to keep track of the timestamp of the last source route to see if they are stale. So, I’m imagining z2m being able to highlight devices with these sort of inconsistencies since it required quite a bit of technical background to figure this out.

@Nerivec
Copy link
Collaborator

Nerivec commented Dec 16, 2024

Far too many possible reasons/stack implementations to be able to do this without introducing massive confusion. (And old timestamp wouldn't necessarily mean bad route.)
Also will trigger like crazy for any network where the user often takes down routers (not supposed to, but...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants