bug: replication hangs indefinitely after brief network cutoffs #89

Open
roeeklinger opened this issue Feb 6, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@roeeklinger

I have been working on a custom sink, and I have noticed that when the network goes down for more than a few seconds, pg_replicate hangs indefinitely instead of resuming where it left off once the connection is reestablished. Is this normal/expected behaviour? How should such cases be handled?

@roeeklinger roeeklinger added the bug Something isn't working label Feb 6, 2025
@imor
Contributor

imor commented Feb 7, 2025

Can you share a minimal reproducible example? Without code it's hard to say what's going on, although if I had to guess, it could be because pg_replicate doesn't handle disconnections (see this issue). The current workaround for disconnections is to restart the process.
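Restarting can be automated with a small supervisor loop around the pipeline. The following is a minimal sketch using only the Rust standard library, not pg_replicate's API: `run_with_restarts`, `backoff_delay`, and the `run_once` closure are hypothetical names, where `run_once` stands in for whatever code connects to Postgres and runs the pipeline until it returns an error. The backoff values are arbitrary assumptions. Note that this only helps when a failure surfaces as an error; a pipeline that hangs without returning never reaches the retry branch.

```rust
use std::thread;
use std::time::Duration;

/// Delay before retrying after failed attempt number `attempt` (0-based):
/// base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow / checked_mul guard against overflow for large attempt counts;
    // on overflow we simply fall back to the cap.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls `run_once` until it returns Ok, or until `max_attempts` runs have failed.
/// Hypothetical: `run_once` represents "connect and run the replication pipeline".
fn run_with_restarts<F>(
    mut run_once: F,
    max_attempts: u32,
    base: Duration,
    max: Duration,
) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut last_err = String::from("no attempts made");
    for attempt in 0..max_attempts {
        match run_once() {
            Ok(()) => return Ok(()),
            Err(e) => {
                eprintln!("pipeline attempt {} failed: {}", attempt + 1, e);
                last_err = e;
                // Only sleep if another attempt will follow.
                if attempt + 1 < max_attempts {
                    thread::sleep(backoff_delay(attempt, base, max));
                }
            }
        }
    }
    Err(last_err)
}
```

For example, `run_with_restarts(|| run_pipeline(), 10, Duration::from_millis(500), Duration::from_secs(30))`, where `run_pipeline` is a hypothetical function of your own. Capping the exponential backoff keeps retries frequent enough to recover quickly without hammering the database while the network is still down.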

I'm also curious: which sink are you working on?

@roeeklinger
Author

Sure!

  1. Use the default stdout example outlined in the README:
     cargo run -p pg_replicate --example stdout --features="stdout" -- --db-host localhost --db-port 5432 --db-name postgres --db-username postgres --db-password password cdc my_publication stdout_slot
  2. Wait for the initial connection to be established successfully, then disable networking on the machine running this code.
  3. Optionally, make some changes to the table you are replicating, so it's easy to confirm that these changes are not printed to stdout.
  4. Wait a while, then re-enable networking.

What I observed, at least, is that short disconnections (roughly 5 to 10 seconds) are handled just fine, while longer disconnections make the whole pipeline hang indefinitely.
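Because a long outage shows up as a hang rather than an error, one hedged option is a watchdog that treats prolonged silence on the replication stream as a failure, so the process can be restarted. The sketch below is a hypothetical `StallWatchdog` type (not part of pg_replicate); it assumes your code can call `record_activity` whenever any message arrives from the server and checks `is_stalled` periodically. Times are passed in explicitly so the logic is easy to test. On an idle database the stream can legitimately be quiet for a while, and whether and how often the server sends keepalives depends on settings such as `wal_sender_timeout` and on how often the client sends status updates, so the timeout should be generous and any concrete value is an assumption to tune.

```rust
use std::time::{Duration, Instant};

/// Hypothetical watchdog: remembers when the replication stream last delivered
/// anything and reports a stall once the silence exceeds `timeout`.
struct StallWatchdog {
    timeout: Duration,
    last_activity: Instant,
}

impl StallWatchdog {
    fn new(timeout: Duration, now: Instant) -> Self {
        StallWatchdog { timeout, last_activity: now }
    }

    /// Call whenever any message (data change or keepalive) arrives.
    fn record_activity(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// True once the silence has lasted strictly longer than `timeout`.
    /// `saturating_duration_since` returns zero if `now` is earlier than the last activity.
    fn is_stalled(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) > self.timeout
    }
}
```

When `is_stalled` returns true, the caller can abort the current connection and let a restart mechanism take over. In async code, wrapping each read from the stream in `tokio::time::timeout` is a comparable approach.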

I will implement the workaround you suggested as a quick fix. Hopefully, at some point in the future, I will have the free time to contribute back and implement a proper fix.

The sink I am working on is an in-memory Rust sink: my Rust application needs an entire, up-to-date table in memory. In MySQL I did this by reading the binlog, so when migrating to Supabase it felt natural to choose this library, since it's the PG equivalent and Rust-based, which I already use. I also tried using Supabase Realtime for this use case but had to abandon the idea, since Realtime doesn't have any reliability guarantees (e.g. if even a single request doesn't make it back, the event is simply ignored and skipped, resulting in a data mismatch). Replication with pg_replicate seems much more robust and a better fit here: it won't let me skip events or process new ones until the previous one has been processed and acknowledged.
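For an in-memory table, one way to make the sink tolerate redelivered events after a reconnect is to apply each committed transaction as a unit to a map keyed by primary key and remember the highest commit LSN applied, skipping anything at or below it. This is a hypothetical sketch: `Change`, `TableMirror`, and the single `i64` key with a `String` row are simplifying assumptions, not pg_replicate's event types. Because the state lives only in memory, this deduplication helps when the connection is re-established within the same process; after a full process restart the map is empty, so the table would have to be reloaded (for example through a fresh initial copy) before streamed changes are applied on top.

```rust
use std::collections::HashMap;

/// Simplified change event (hypothetical; real CDC events carry much more).
#[derive(Debug, Clone, PartialEq)]
enum Change {
    Insert { key: i64, row: String },
    Update { key: i64, row: String },
    Delete { key: i64 },
}

/// In-memory mirror of one table, keyed by primary key.
#[derive(Default)]
struct TableMirror {
    rows: HashMap<i64, String>,
    /// Highest commit LSN applied so far (0 means nothing applied yet).
    last_applied_lsn: u64,
}

impl TableMirror {
    /// Applies all changes of one committed transaction. Returns false and leaves
    /// the table untouched if this commit LSN was already applied (a redelivery).
    fn apply_transaction(&mut self, commit_lsn: u64, changes: &[Change]) -> bool {
        if commit_lsn <= self.last_applied_lsn {
            return false;
        }
        for change in changes {
            match change {
                Change::Insert { key, row } | Change::Update { key, row } => {
                    self.rows.insert(*key, row.clone());
                }
                Change::Delete { key } => {
                    self.rows.remove(key);
                }
            }
        }
        self.last_applied_lsn = commit_lsn;
        true
    }

    /// Position that is safe to acknowledge to the server as processed.
    fn confirmed_lsn(&self) -> u64 {
        self.last_applied_lsn
    }
}
```

Acknowledging only `confirmed_lsn()` after `apply_transaction` returns preserves the "don't move on until the previous event is processed and acknowledged" property that makes replication attractive for this use case.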

Thank you for your time and response, it helped me a lot.
