Deadlocked/hung pool when connected to Postgres through Supavisor #970

Open
nikhilro opened this issue Oct 27, 2024 · 2 comments

Comments


nikhilro commented Oct 27, 2024

Hey @porsager, thanks for the package.

Creating this issue mostly in hopes that someone else has run into this.

We use Supabase for our Postgres DB and connect through their pooling service Supavisor in transaction mode; Supavisor is an alternative to pgBouncer (link). On postgres.js side, we set prepare: false.
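
For reference, the connection setup is roughly this (a minimal sketch; the connection string below is a placeholder):

import postgres from 'postgres';

// Minimal sketch of that setup; the connection string is a placeholder for the
// Supavisor (transaction-mode) pooler endpoint. prepare: false is the postgres.js
// option that disables prepared statements, which transaction-mode poolers
// generally don't support.
const db = postgres('postgres://user:password@your-project.pooler.supabase.com:6543/postgres', {
  prepare: false,
});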

What we're seeing is that the postgres.js pooler will start swallowing queries at some point. Imagine a function like:

async function userGet(id: string) {
  console.log("trying to get the user");
  const users = await db<User[]>`SELECT * FROM user WHERE id=${id}`;
  console.log("got the user!");
  return users;
}

After a while, this will print "trying to get the user" but not "got the user!". Does this ring any bells?

P.S. I've been trying to find an isolated script to reproduce this, but have been unsuccessful so far; the only reproduction is live production traffic.


nikhilro commented Oct 27, 2024

I'm now more confident this is an issue with Supavisor under load. One thing that would help is a statement-timeout option in postgres.js itself.

I know we have:

   connection: {
     statement_timeout: 1000 * 60 * 0.5, // 30 seconds, pg expects milliseconds
   },

But that isn't useful when connecting through a transaction-mode pooler.

Is there an easy way to "abort" the query if it's taking too long?
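
On the abort question: postgres.js does let you cancel an in-flight query from the client via .execute() / .cancel(), which doesn't depend on the server-side statement_timeout that isn't useful through a transaction-mode pooler. A rough sketch (the helper name and the 30-second value are mine, reusing the db/User names from the snippet above):

// Client-side timeout built on postgres.js's query.cancel():
// start the query explicitly, arm a timer, and cancel if it fires.
async function userGetWithTimeout(id: string) {
  const query = db<User[]>`SELECT * FROM user WHERE id=${id}`.execute();
  const timer = setTimeout(() => query.cancel(), 30_000);
  try {
    return await query; // rejects if the query was cancelled
  } finally {
    clearTimeout(timer);
  }
}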

@lllleonnnn

Please forgive me for not answering your exact question, but we have run into very similar issues, and it boiled down to our queries blocking and locking. We've used this query (modify as needed) to track down issues in our Supabase PG instance:

SELECT blocked_locks.pid          AS blocked_pid,
       blocked_activity.usename   AS blocked_user,
       blocking_locks.pid         AS blocking_pid,
       blocking_activity.usename  AS blocking_user,
       blocked_activity.query     AS blocked_statement,
       blocking_activity.query    AS current_statement_in_blocking_process
  FROM pg_catalog.pg_locks         blocked_locks
  JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
  JOIN pg_catalog.pg_locks         blocking_locks
       ON blocking_locks.locktype = blocked_locks.locktype
      AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
      AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
      AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
      AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
      AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
      AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
      AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
      AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
      AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
      AND blocking_locks.pid != blocked_locks.pid
  JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
 WHERE NOT blocked_locks.granted;

Smashing this query over and over when the problem is happening may give some insight as well:

SELECT
	query,
	avg(now() - query_start) AS average_duration,
	usename,
	pid
FROM
	pg_stat_activity
WHERE
	state = 'active'
	AND query <> '<IDLE>'
	AND query NOT LIKE '%pg_stat_activity%'
GROUP BY
	query,
	usename,
	pid
ORDER BY
	average_duration DESC
LIMIT 10000;

If your logs aren't helpful, ye olde ALTER SYSTEM SET log_lock_waits TO on; (followed by a config reload) may yield more useful info in the Postgres and/or pooler logging views, as may tuning log_statement_sample_rate and log_min_duration_sample.

re: your actual question - I did implement something hideous with await Promise.race(...), which did not help. We did not have any specific issues with Supavisor, but we switched to running our own pgbouncers.
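
For reference, the shape of the Promise.race wrapper meant above looks something like this (raceTimeout is my own name); the reason it doesn't help is that it only stops waiting for the result, while the query keeps running on the server and the pooled connection stays occupied:

// Rejects after `ms`, but does NOT cancel the underlying query or free the connection.
function raceTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
    ),
  ]);
}

// e.g. await raceTimeout(db<User[]>`SELECT * FROM user WHERE id=${id}`, 30_000);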

Also, shout out to @porsager for making an amazing library that lets me avoid the hell of ORMs.
