Concurrent connection problems with use of poll #1248
Comments
What are you using |
Yeah. The thread that uses the connection.poll is for database notifications with the LISTEN command. |
If you are receiving SSL connection errors I think it means you are using I think your application is polling indiscriminately other connections instead of just the one used for notifications? I don't think there is any change related to the connection procedure between the versions you mention. However the libssl used has probably been upgraded: you should check the NEWS file to know the version change. Can you do just one of the two things and work out better if it's a connection error or if it's about your use of |
We have 2 versions of our program, both with the poll and threads. The only difference between the 2 is the version of psycopg2. We have been narrowing down to find the specific issue. So we cut a lot of code and eventually narrowed it down to the use of poll and the version of psycopg2. So if we use the same script in python with psycopg2: 2.7.3.2, it works. When we use it in Python with 2.8.2, it breaks. If it helps I could post our test script. |
Yes please: being able to reproduce the problem is useful |
import select
import time
from threading import Thread

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker


class PollerThread(Thread):
    def __init__(self, connection):
        Thread.__init__(self)
        self.setDaemon(1)
        self.connection = connection

    def run(self):
        while True:
            # Wait up to 3 seconds for the connection socket to become readable.
            result = select.select([self.connection], [], [], 3.0)
            if result != ([], [], []):
                self.connection.poll()


if __name__ == "__main__":
    dbUser = ""
    dbPassword = ""
    dbServer = ""
    dbName = ""
    fullDatabaseUrl = str("postgresql+psycopg2://%s:%s@%s/%s" % (dbUser, dbPassword, dbServer, dbName))
    Session = sessionmaker(autocommit=True)
    engine = create_engine(fullDatabaseUrl)
    Session.configure(bind=engine)
    session = Session()
    connection = session.connection().connection.connection  # <- psycopg2 connection object
    # The poller thread shares the connection that the main loop queries on.
    dbEventListener = PollerThread(connection)
    dbEventListener.start()
    while True:
        result = session.execute("SELECT * FROM table1")
        data = result.fetchall()
        time.sleep(0.1)
|
Also, thank you for your lightning-quick responses. |
Does the problem also manifest without using sqlalchemy? Could you provide a use case which doesn't use it? |
Yes. Just tried it with psycopg2 only. The 2.7 works, but the 2.8 version crashes.

import select
from threading import Thread
import psycopg2
import time


class PollerThread(Thread):
    def __init__(self, connection):
        Thread.__init__(self)
        self.setDaemon(1)
        self.connection = connection

    def run(self):
        while True:
            result = select.select([self.connection], [], [], 3.0)
            if result != ([], [], []):
                self.connection.poll()


if __name__ == "__main__":
    dbUser = ""
    dbPassword = ""
    dbServer = ""
    dbName = ""
    connection = psycopg2.connect(dbname=dbName, user=dbUser, password=dbPassword, host=dbServer)
    dbEventListener = PollerThread(connection)
    dbEventListener.start()
    while True:
        cursor = connection.cursor()
        result = cursor.execute("SELECT * FROM table1")
        data = cursor.fetchall()
        time.sleep(0.1)
|
You cannot use the same connection both for normal querying and waiting for notifications. |
Okay. But.. It is working in a previous version.. So that is a bug? |
I have to verify that: querying and polling should indeed be protected from each other. Testing your script with the latest 2.8 gets stuck somewhere. I only have Python 3.8 handy here, so I cannot install < 2.8.4. Will try with Python 3.6 as soon as I can. However, in your program, do use a separate connection for polling. |
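A minimal sketch of the "separate connection for polling" advice, under some assumptions: `connect` is any zero-argument callable that returns a new psycopg2 connection, the channel name in the usage note is hypothetical, and the helper names `listen_loop` and `start_listener` are made up for illustration. The listener gets its own connection, so `poll()` never touches the connection used for ordinary queries.

```python
import select
import threading


def listen_loop(listen_conn, on_notify, stop_event, timeout=3.0):
    """Poll a connection that is reserved for notifications only."""
    while not stop_event.is_set():
        readable, _, _ = select.select([listen_conn], [], [], timeout)
        if not readable:
            continue  # timed out: loop again so the stop flag is re-checked
        listen_conn.poll()
        # poll() appends any pending notifications to listen_conn.notifies.
        while listen_conn.notifies:
            on_notify(listen_conn.notifies.pop(0))


def start_listener(connect, listen_sql, on_notify, timeout=3.0):
    """Open a dedicated connection, issue LISTEN on it, and poll it in a thread."""
    listen_conn = connect()          # never handed to the querying code
    listen_conn.autocommit = True    # so LISTEN takes effect immediately
    cur = listen_conn.cursor()
    cur.execute(listen_sql)
    cur.close()

    stop_event = threading.Event()
    thread = threading.Thread(
        target=listen_loop, args=(listen_conn, on_notify, stop_event, timeout)
    )
    thread.daemon = True
    thread.start()
    return thread, stop_event
```

With psycopg2 this might be called as `start_listener(lambda: psycopg2.connect(dbname=dbName, user=dbUser, password=dbPassword, host=dbServer), "LISTEN table1_changes;", print)`, where `table1_changes` is a placeholder channel, while the main loop keeps running its queries on a different connection.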
I can confirm the bug. Running the script (slightly modified, using |
Likely culprit is the refactoring happened in |
Hi, I hit the same bug. After some digging, I found out that pq_get_result_async() is called without proper locking, although such is required (The function should be called with the lock and holding the GIL.). The following patch fixed the issue for me:
|
Hi @stratsimir thank you very much: that indeed seems to fix the problem. Will include your patch in the upcoming release. |
Oops, too bad it segfaults in test suite... https://github.com/psycopg/psycopg2/runs/2732488602?check_suite_focus=true |
Looking better at the patch above. |
hi @dvarrazzo This is using psycopg2 2.9.3
Interestingly, we captured the data on the wire via tcpdump, and there is a response available for the request. I believe it's stuck waiting for read input. This is what led me to this issue: possibly some contention on the read status. I did try the fix above but unfortunately it crashes. Going to dig in more, but open to suggestions on how to proceed, or to hear whether another potential fix is in progress. Thanks. |
@mfmarche "pool" or "poll"? Do you also use the same connection to simultaneously query and get notifications? No, there is no fix in progress. I am not sure the problem is the same here, maybe you can try downgrading to verify it. |
I did mean pool, but that was a sqlalchemy term which is really one connection. I verified above using the example test that it fails in the same way. Your comment that a simultaneous query + poll is not allowed makes sense to me, but with green threads I would assume that needs to be protected by the calls themselves; it shouldn't be up to the callers to provide this protection. I'm starting to look at _conn_poll_advance_read, trying to see about fitting in some mutex there, but not yet sure how/where. Somehow a connection in progress must be known, and I would assume the poll should bail out without touching the cursor/connection and causing the breakage. |
I modified the gevent_wait_callback slightly, so that it always retries on a timeout. I introduced a small timeout (10 ms). The code would look like:
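For illustration only, this is roughly what caller-side protection could look like: a wrapper that takes one lock around both a query and a `poll()`, so the two never overlap on the shared connection. The class name `LockedConnection` and its methods are hypothetical, this is not the patch discussed above, and it has not been verified against the affected psycopg2 versions; using a separate connection for polling remains the maintainers' recommendation.

```python
import threading


class LockedConnection(object):
    """Serialize queries and poll() calls on one shared connection (sketch)."""

    def __init__(self, connection):
        self._conn = connection
        self._lock = threading.Lock()

    def fileno(self):
        # Lets select.select() wait on the wrapper directly.
        return self._conn.fileno()

    def query(self, sql, params=None):
        # The lock is held for the whole execute + fetch, not just execute.
        with self._lock:
            cur = self._conn.cursor()
            try:
                cur.execute(sql, params)
                return cur.fetchall()
            finally:
                cur.close()

    def poll_notifications(self):
        # poll() only runs while no query is in flight on this connection.
        with self._lock:
            self._conn.poll()
            pending = list(self._conn.notifies)
            del self._conn.notifies[:]
            return pending
```

The poller thread would still wait in `select.select([wrapper], [], [], 3.0)` without holding the lock and call `poll_notifications()` only once the socket is readable.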
With this change, no more stalls/lockups are seen, suggesting that the data was ready and received before it decided to call wait_read. The system is comprised of these versions: cockroachdb v20.2.18 Although not ideal, this workaround appears to do the trick. Suggestions on what the proper fix is? |
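The retry-on-timeout idea described above can be sketched as follows. This is an approximation rather than the commenter's code: it uses the standard library's `select.select` in place of gevent's `wait_read`/`wait_write`, the name `wait_with_retry` is made up, and the three constants are written out so the snippet runs on its own (they mirror `psycopg2.extensions.POLL_OK`, `POLL_READ` and `POLL_WRITE`).

```python
import select

# Mirrors psycopg2.extensions.POLL_OK / POLL_READ / POLL_WRITE.
POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2


def wait_with_retry(conn, timeout=0.01):
    """Wait callback that re-polls after every short timeout (10 ms by default)."""
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        elif state == POLL_READ:
            # If the wait times out, fall through and call poll() again
            # instead of blocking indefinitely on data that already arrived.
            select.select([conn.fileno()], [], [], timeout)
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [], timeout)
        else:
            raise RuntimeError("unexpected poll state: %r" % (state,))
```

A callback of this shape can be installed with `psycopg2.extensions.set_wait_callback(wait_with_retry)`. The cost is a wake-up every 10 ms while a query is pending, which is why it is a workaround rather than a fix.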
Python: Windows 32-bit 2.7.15 & 3.7.9
Psycopg2: 2.7.3.2 upgraded to 2.8.2
Description:
We have a program with multiple threads. We use SQLAlchemy to set up the connections and build the queries.
We have 1 connection pool for all the threads. This used to work fine when we were using psycopg2 2.7.3.2.
Now we are upgrading all our dependencies, which includes psycopg2, and are noticing some weird behavior.
We have narrowed it down to 2 things.
If we use connection.poll, it breaks with weird errors such as an SSL verification error, an SSL wrong version number error, a "file descriptor cannot be a negative integer" error, or even deadlocks.
We don't seem to have these problems in threads where we only run plain queries.
It also breaks the connection not only in the thread that is executing the poll, but in other threads as well.