While troubleshooting a connectivity issue with a local DMR master, I discovered a timing mismatch between MMDVMHost and HBLink that causes the master to drop the connection.
- MMDVMHost has a hard-coded 10-second timer for RPTPING keepalives and connection status changes.
- HBLink has a configurable ping timer, defaulting to 5 seconds with a maximum of 3 missed pings, so it drops a client after roughly 15 seconds without a ping.
With HBLink's default configuration, this means a single lost packet is enough to get the connection dropped. For example:
- 00:00 MMDVMHost sends RPTPING and the HBLink master responds with MSTPONG. The maintenance timer starts.
- 00:10 MMDVMHost sends RPTPING, but the packet is lost in transit. The client cannot detect this immediately because the transport is UDP, not TCP.
- 00:15 The master has received no ping from the host for 15 seconds and drops the client.
- 00:20 MMDVMHost sends RPTPING; the master responds with MSTNAK because it considers the connection dead.
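The timeline above can be reproduced with a small model. This is a sketch under simplifying assumptions, not MMDVMHost or HBLink code: time advances in whole seconds, the host pings on every interval boundary starting at t=0, and the master drops the client as soon as its timeout (5 s × 3 misses = 15 s with HBLink's defaults) has elapsed since the last received ping. Real HBLink only evaluates this on its periodic maintenance loop, so actual drop times may lag slightly. The function name `simulateDrop` is hypothetical.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of the keepalive race, not taken from either project.
// Returns the second at which the master drops the client, or -1 if the
// client is still connected at the end of the simulated window.
// Assumptions: the host pings every `pingInterval` seconds starting at t=0,
// pings whose send time is in `lostPings` never arrive, and the master drops
// the client once `masterTimeout` seconds pass without a received ping.
int simulateDrop(int pingInterval, int masterTimeout,
                 const std::set<int>& lostPings, int duration) {
    assert(pingInterval > 0 && masterTimeout > 0);
    int lastReceived = 0; // the login/first ping at t=0 is assumed to arrive
    for (int t = 0; t <= duration; ++t) {
        // Master-side timeout check, evaluated before any ping in the same second.
        if (t - lastReceived >= masterTimeout) {
            return t;
        }
        // Host-side: a ping is sent on every interval boundary unless it is lost.
        if (t % pingInterval == 0 && lostPings.count(t) == 0) {
            lastReceived = t;
        }
    }
    return -1;
}
```

Under these assumptions, `simulateDrop(10, 15, {10}, 60)` returns 15, matching the 00:15 drop above, while `simulateDrop(5, 15, {5}, 60)` returns -1: at a 5-second interval the next ping arrives at 00:10, well inside the 15-second window, so a single lost ping is absorbed.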
I have verified that changing the hard-coded value to 5 seconds resolves the problem and stops the master from dropping my session.
UDP-based protocols have to handle fault tolerance themselves. Ideally the control channel would move to TCP while data stays on UDP, but that is unlikely to happen. The interval should also be negotiated between the repeater and the master, but again that is a protocol-level change and equally unlikely. As a viable alternative, I would suggest a two-part change:
- Make the ping interval configurable. I would suggest adding a PingInterval item to the DMR Network section.
- If no MSTPONG is received, retry with a backoff policy, e.g. at PingInterval * 0.1, 0.3, 0.75 and 1.0, until the timeout is reached.
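The retry schedule from the second point could look roughly like the sketch below. The factors 0.1, 0.3, 0.75 and 1.0 come from the proposal; everything else is an assumption: each factor is treated as the delay before the next retry (not a cumulative offset), the last factor is reused once the list is exhausted, and `retrySchedule` and its `timeoutMs` parameter are hypothetical names rather than existing MMDVMHost code. `pingIntervalMs` stands in for the proposed (not yet existing) `PingInterval` setting, e.g. `PingInterval=5` under `[DMR Network]`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not MMDVMHost code: returns the times (ms after an
// unanswered RPTPING) at which the ping would be resent.
// Assumptions: each factor is the delay before the next retry, expressed in
// thousandths of the ping interval; the last factor repeats until the
// timeout is reached.
std::vector<std::uint32_t> retrySchedule(std::uint32_t pingIntervalMs,
                                         std::uint32_t timeoutMs) {
    assert(pingIntervalMs > 0U);
    // 0.1, 0.3, 0.75, 1.0 x PingInterval as integer per-mille factors,
    // which avoids floating-point rounding in the schedule.
    static const std::array<std::uint32_t, 4> factors = {100U, 300U, 750U, 1000U};

    std::vector<std::uint32_t> times;
    std::uint32_t elapsed = 0U;
    for (std::size_t i = 0U; ; ++i) {
        const std::uint32_t factor = factors[i < factors.size() ? i : factors.size() - 1U];
        const std::uint32_t delay = pingIntervalMs * factor / 1000U;
        if (delay == 0U) {
            break; // a tiny interval rounds to zero; stop rather than loop forever
        }
        elapsed += delay;
        if (elapsed >= timeoutMs) {
            break; // give up and let the normal timeout handling take over
        }
        times.push_back(elapsed);
    }
    return times;
}
```

With a 5-second interval and a hypothetical 15-second timeout, `retrySchedule(5000, 15000)` yields retries at 500, 2000, 5750 and 10750 ms, so the first resend reaches HBLink long before its 15-second window expires.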
I am willing to implement at least the first part, possibly both, but I don't want to do so without knowing whether it would be accepted, and I have no way to test scenarios beyond my own.