bug: replication hangs indefinitely after brief network cutoffs #89

roeeklinger · 2025-02-06T12:23:57Z

I have been working on a custom sink, and I have noticed that when the network goes down for more than a few seconds, pg_replicate hangs indefinitely instead of resuming where it left off once the connection is reestablished. Is this normal/expected behaviour? How should such cases be handled?

imor · 2025-02-07T06:30:54Z

Can you share a minimal reproducible example? Without code it's hard to say what's going on, although if I had to guess, it could be because pg_replicate doesn't handle disconnections (see this issue). The current workaround for disconnections is to restart the process.

Also, I'm curious: which sink are you working on?

roeeklinger · 2025-02-08T14:46:20Z

Sure!

1. Run the default stdout example outlined in the README: cargo run -p pg_replicate --example stdout --features="stdout" -- --db-host localhost --db-port 5432 --db-name postgres --db-username postgres --db-password password cdc my_publication stdout_slot
2. Wait for the initial connection to be established successfully, then disable networking on the machine running this code.
3. Optionally, make some changes to the table you are replicating, so it is easier to confirm that these changes are not printed to stdout.
4. Wait for a while, then re-enable networking.

From what I observed, short disconnections (in the 5-10 s range) are handled just fine, while disconnections longer than that make the whole pipeline hang indefinitely.

I will implement the workaround you suggested as a quick fix. Hopefully I will have some free time in the future to contribute back and implement a fix.
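For anyone else landing here, this is a rough sketch of what the restart workaround could look like as an external watchdog. It is not part of pg_replicate; `RunEnd`, `run_until_stalled`, and `supervise` are hypothetical names. It assumes the replicator prints a line to stdout whenever a change arrives and that the table changes often enough for prolonged silence to mean a hang. An idle stream is silent too, so the stall timeout has to be longer than the longest expected quiet period. It also assumes you launch the compiled example binary directly rather than through cargo run, since killing cargo may leave the real process running.

```rust
use std::io::{self, BufRead, BufReader};
use std::process::{Command, Stdio};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Why one supervised run of the child process ended (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RunEnd {
    /// The child closed its stdout, i.e. it exited or crashed.
    Exited,
    /// No stdout line arrived within the stall timeout, so the child was killed.
    Stalled,
}

/// Spawns `cmd`, echoes its stdout, and kills it if it stays silent
/// for longer than `stall_timeout`.
pub fn run_until_stalled(cmd: &mut Command, stall_timeout: Duration) -> io::Result<RunEnd> {
    let mut child = cmd.stdout(Stdio::piped()).spawn()?;
    let stdout = child.stdout.take().expect("stdout was configured as piped");

    // Read lines on a separate thread so the main thread can wait with a timeout.
    let (tx, rx) = mpsc::channel::<String>();
    thread::spawn(move || {
        for line in BufReader::new(stdout).lines().map_while(Result::ok) {
            if tx.send(line).is_err() {
                break; // The receiver is gone; stop reading.
            }
        }
    });

    let end = loop {
        match rx.recv_timeout(stall_timeout) {
            // Any output counts as progress; pass it through.
            Ok(line) => println!("{line}"),
            // Silence for too long: assume a hang and kill the child.
            Err(RecvTimeoutError::Timeout) => {
                child.kill()?;
                break RunEnd::Stalled;
            }
            // The reader thread finished, so the child closed stdout.
            Err(RecvTimeoutError::Disconnected) => break RunEnd::Exited,
        }
    };
    child.wait()?; // Reap the child so it does not linger as a zombie.
    Ok(end)
}

/// Restarts the command forever, pausing for `backoff` between runs.
pub fn supervise(
    make_cmd: impl Fn() -> Command,
    stall_timeout: Duration,
    backoff: Duration,
) -> io::Result<()> {
    loop {
        let end = run_until_stalled(&mut make_cmd(), stall_timeout)?;
        eprintln!("replicator ended ({end:?}); restarting in {backoff:?}");
        thread::sleep(backoff);
    }
}
```

Calling `supervise` with a closure that builds the `Command` for the example binary (same arguments as in the repro steps) keeps it running across hangs. Whether the restarted process picks up exactly where the slot left off depends on what the sink has acknowledged, so treat this as a stopgap rather than a fix.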

The sink I am working on is an in-memory Rust sink: my Rust application needs to hold an entire, up-to-date table in memory. In MySQL I did this by reading the binlog, so when migrating to Supabase it felt natural to choose this library, as it is the PG equivalent and is Rust-based, which I already use. I also tried using Supabase Realtime for this use case but had to abandon that idea, since Realtime doesn't have any reliability guarantees (e.g., if even a single request doesn't make it back, the event will just be ignored and skipped, resulting in a data mismatch). Replication and pg_replicate seem much more robust and a better fit in that case: they will not let me skip events or process new ones until the previous one is processed and acknowledged.
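To make the idea concrete, here is a minimal sketch of the in-memory mirror, stripped of everything pg_replicate-specific. `Change` and `TableMirror` are hypothetical names, the rows are plain strings keyed by an integer primary key, and the real CDC event types in pg_replicate look different; this only shows the apply-in-order shape.

```rust
use std::collections::HashMap;

/// Hypothetical change event; pg_replicate's real CDC event types differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    /// An insert or update: store the latest version of the row.
    Upsert { key: i64, row: String },
    /// A delete: drop the row if it exists.
    Delete { key: i64 },
}

/// An in-memory copy of one table, keyed by primary key.
#[derive(Debug, Default)]
pub struct TableMirror {
    rows: HashMap<i64, String>,
    applied: u64,
}

impl TableMirror {
    /// Applies one change and returns how many changes have been applied so far.
    /// Changes must be applied in the order the replication stream delivers them.
    pub fn apply(&mut self, change: Change) -> u64 {
        match change {
            Change::Upsert { key, row } => {
                self.rows.insert(key, row);
            }
            Change::Delete { key } => {
                self.rows.remove(&key);
            }
        }
        self.applied += 1;
        self.applied
    }

    /// Looks up the current version of a row.
    pub fn get(&self, key: i64) -> Option<&str> {
        self.rows.get(&key).map(String::as_str)
    }

    /// Number of rows currently held in memory.
    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}
```

The mirror is only correct if no event is skipped, which is why the acknowledge-before-continuing behaviour matters more here than latency.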

Thank you for your time and response, it helped me a lot.

roeeklinger added the "bug" (Something isn't working) label on Feb 6, 2025
