Skip to content

Packet parsing reliability improvements#145

Open
Kinachi249 wants to merge 2 commits intonikshriv:mainfrom
Kinachi249:improve-status-stability
Open

Packet parsing reliability improvements#145
Kinachi249 wants to merge 2 commits intonikshriv:mainfrom
Kinachi249:improve-status-stability

Conversation

@Kinachi249
Copy link
Copy Markdown

Hey there, I was planning on tackling thermostat integration for this component as I'd love to have my GE thermostat in HA.

I found some bugs in the component related to just the presence of a thermostat on my mesh. I thought I'd make a PR to address some of these before I start on any actual thermostat changes.

  • Increased TCP read buffer size to 1500 bytes. Thermostat and sensor data is sent as JSON in the packet, and can exceed 1000 bytes. This would cause other incorrectly offset packets later on.
  • Moved the offset in the initial state packet right by one. It was always reading zero, instead of the proper mesh ID. This caused device states not to be accurately reflected on restart of HA
  • Add check to ensure switch ID exists in the HA setup, as the thermostat ID was being received in packets and causing a KeyError when looking it up.
  • In datapoint packets, the size of each chunk of the packet is defined by the 12 least significant bits of bytes 2-3 of the chunk, updated the loop to utilize this size instead of the hard coded 19 bytes
  • Added packet type 0xA8 to acceptable types for "probe" responses
  • Fix room update

Thanks, and feel free to let me know if there are questions.

@johnrosshunt
Copy link
Copy Markdown
Contributor

Hi, I gave this patch a try and received this in my HA logs:

Screenshot 2025-07-08 at 12 39 03 PM

Device states were not restored on startup. However, toggling switches manually brought them back in sync. Otherwise, everything seems to be working ok. I can provide more information if needed. Just let me know.

@Kinachi249
Copy link
Copy Markdown
Author

Thanks for the feedback! I did some additional research, and found that for the pipe packets (type 0x7), the inner frames (which have their boundary denoted by byte 0x7e) will have any actual usages of the byte 0x7e encoded to the bytes 0x7d5e instead, so the parser doesn't confuse the data 7e with the frame boundary 7e.

It just so happened that my Cync packets had a length of exactly 126 bytes (0x7e), so the length byte had the serialization applied to it, which caused my offset to be one-off.

I added some additional code to deserialize instances where this can happen, so that all packet lengths should be brought in line. I also adjusted the type 0x4 packet processing to stop processing if the remaining length is less than 4 bytes, as that is what the app seems to do in the decompiled code.

@johnrosshunt
Copy link
Copy Markdown
Contributor

johnrosshunt commented Jul 11, 2025

Git complained about illegal whitespace when applying the patch. Otherwise it merged correctly.

git_error

The fix works great. HA starts up fine, the devices come online, and their previous states are restored correctly. However, I did get a "list index out of range" error whenever I used the Cync app on my Android phone. I can confirm that every time I use the Cync app the error count increments. This only seems to happen when I use the app. Otherwise, the error doesn't occur at all.

EDIT: I think I've narrowed down the problem. The errors increment whenever I use the Cync app and switch to a new "Home" (i.e. Mesh). If I enter the app and exit, errors don't occur.

I have about 30 Cync devices (21 are wired) in an old building with very thick walls. In order to prevent devices from dropping off the mesh, I created 3 separate zones ("Homes") to isolate devices from those it had trouble reaching. This finally fixed all my connectivity problems. I can turn off the WiFi router and verify in the Cync app that all my devices can communicate via Bluetooth Mesh without dropping off 👍🏻. That's the reason I have multiple "Homes" in my home -- and apologies if I strayed a bit off-topic.

@johnrosshunt
Copy link
Copy Markdown
Contributor

Error log entry:

ha_log

@Kinachi249
Copy link
Copy Markdown
Author

I appreciate the info! I only have one configured "home" in my app, so I'll move some of my lights around today to a second home and do some investigating.

I do know that if you open the Cync app, the Cync server disconnects the Home Assistant connection in favor of the app's connection (which the integration re-establishes after the 15 second wait). Hopefully that's not what is at play here though, I'll look into it some more.

@Kinachi249
Copy link
Copy Markdown
Author

So I've spent a few hours trying to reproduce the list index out of range error, however I'm having a bit of a hard time.

I created a second home in the Cync app with a couple of my bulbs and got them showing in HA. However, whenever I open the Cync app and change homes, I get the standard "Connection Reset" error in HA, meaning the Cync server remotely closed the HA connection. This can't really be avoided since it's the server that's severing the connection, all the integration can do is re-establish the connection again in 15 seconds.

I am curious about the index out of range error though. Does it appear the instant you switch homes in the Cync app, or are there other actions performed in the app before it manifests? I use the iPhone app, so I also wonder if there's some sort of difference in how it handles other connections on the same account compared to Android.

@johnrosshunt
Copy link
Copy Markdown
Contributor

johnrosshunt commented Jul 12, 2025

I'm using Android. And yes, the counter increments whenever I switch homes.

Before using the Cync app:
Screenshot 2025-07-12 at 1 01 25 PM

After using the Cync app:
Screenshot 2025-07-12 at 1 02 40 PM

Here's what I know: The device closest to the router with the strongest signal is typically designated the hub, which acts as a gateway between the Bluetooth Mesh & WiFi network.

device 1 device 2 device 3 device 4

So far, so good. But what I've discovered is that some of the newer devices (in my case Reveal Color bulbs) take precedence over older devices when a hub is designated. So, I get this instead:

after reveal

No matter where the Reveal bulbs are located, they always take precedence over other devices, which is not optimal if the lights are far away from the router or other devices in the mesh are out of reach. Also, if you have multiple Reveal lights, the hub seems to rotate between these devices on a round robin basis. This is why I decided to isolate these lights. The Mesh network as a whole was unstable and this caused problems for HA as well as Google Home.

So, what I think is happening is that switching homes in the Cync app triggers something that causes the hub to rotate to the next device. I think hub rotation happens independent of the Cync app as well. I should reiterate that this sort of thing doesn't happen when a mesh is made up of older devices. My main mesh is comprised of Soft White Direct Connect bulbs, wired switches, and a Smart Plug located next to the router. The Smart Plug is always the hub. Always. It never changes unless a newer device like a Reveal bulb is added to the mesh.

Sorry if all of this was overkill. I didn't know how to explain things succinctly 😎.

EDIT: My theory is that these resets occur whenever the hub rotates to a new device (I should have led with this 🤦🏻‍♂️). With my configuration, this happens quite frequently.

@Kinachi249
Copy link
Copy Markdown
Author

Interesting, good to know!

Out of curiosity, have you either added a new device, removed a device, or moved a device between homes in the Cync app since setting up the integration?

I was testing some more this afternoon, and I moved one of my bulbs back to my original home in the Cync app. Then when I rebooted HA, I started seeing the index out of range errors that you had. For me, it was happening because the integration doesn't automatically update devices and their locations/homes when they're modified in the Cync app, so it was still using the old mesh IDs it had saved before the Cync change was made. This caused the IDs to be wrong and it was trying to access an ID that no longer existed.

When I went to the "integration entries" page for the integration, and clicked "Configure", it had me sign back in and it re-fetched the up to date device info. After that the error stopped appearing, so I wonder if it may help your case as well. Ideally it would update automatically, but I don't believe that would be caused by my change if that's the case.

As for the leader, this is something I've noticed too. The integration will always just pick the first device within a home to "ping" as the leader, and if that device ends up being a bluetooth only device, it won't get a response. Would be another good feature for the integration, seeing which wi-fi device is the current designated "leader" of each mesh and only pinging that one.

@johnrosshunt
Copy link
Copy Markdown
Contributor

No, I haven't made any changes lately. I always reload the integration after making changes.

Settings->Devices & Services->Cync Lights->Gear Icon->Reload Cync Configuration

Perhaps a notification to reload the integration could be triggered if an invalid configuration is detected? Most users probably don't know that it's necessary after changing things.

@Kinachi249
Copy link
Copy Markdown
Author

Gotcha, so I know on my iOS version at least, I need to tap this "Configure" option to fully re-fetch the up to date devices and locations from the Cync server. When debugging on my computer, if I just tap the 3 dots and then "Reload", it just reloads the HA configuration from the HA instance, but it doesn't make the call to the Cync API to re-fetch the device list as it currently exists on the server. Until that API call is made, the integration will continue to use the device list from either initial integration setup, or the last time "Configure" was tapped.
screenshot_cync

I agree though, I think remote device changes could be handled better. Also, just to clarify, were you also experiencing this prior to putting my code change in as a patch? Just wondering if this was related to my changes, or if this is more of an overall suggestion for the integration in its current state.

@johnrosshunt
Copy link
Copy Markdown
Contributor

Right. Reload restarts the integration and Configure does a full reauthorization and download. As of 2025.7.1 the Configure option is now a gear.

ios app

Since commit d53f550 I haven't noticed any "list index out of range" errors except when I switch Homes in the Cync app. Since this bug is unchanged with your patch applied I think it's fair to conclude the problem lies elsewhere. So, it's understandable if you wish to move on and complete your thermostat support. I'll continue to research the problem because it's a bit of a show stopper. Every time the error occurs device states get out of sync and switches need to be manually toggled to get back in sync. Either that, or the integration needs to be restarted.

@Kinachi249
Copy link
Copy Markdown
Author

Gotcha! I suppose I should rebase my HA branch from develop 😅

And no worries at all, I mainly ask so I know if I should look in the area of my code change or elsewhere in the code. I'm actually working on a reusable python library at the moment for Cync API controls, as I've been going through the decompiled Android app code and trying to make some code that's a bit less opinionated toward just lights and uses the more generic structures that the app uses. This'll be a case that I'll try and add to my code, though, so that multi-home setups hopefully won't run into this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants