Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception ignored message when creating failed + valid session #691

Open
qubixes opened this issue Mar 13, 2025 · 29 comments
Open

Exception ignored message when creating failed + valid session #691

qubixes opened this issue Mar 13, 2025 · 29 comments
Assignees
Milestone

Comments

@qubixes
Copy link
Collaborator

qubixes commented Mar 13, 2025

Hi!

During work on one of my PRs we ran into some exception ignored messages such as:

Exception ignored in: <function Connection.__del__ at 0x7af078ff67a0>
Traceback (most recent call last):
  File "/home/qubix/.pyenv/versions/3.11.6/lib/python3.11/site-packages/irods/connection.py", line 132, in __del__
    self.disconnect()
  File "/home/qubix/.pyenv/versions/3.11.6/lib/python3.11/site-packages/irods/connection.py", line 361, in disconnect
    self.send(disconnect_msg)
  File "/home/qubix/.pyenv/versions/3.11.6/lib/python3.11/site-packages/irods/connection.py", line 148, in send
    raise NetworkException("Unable to send message")
irods.exception.NetworkException: Unable to send message

Even though the functionality in iBridges still works fine, this is of course somewhat confusing to the user/us. We first thought it was caused by the new PR, but it seems to originate from the PRC, since installing an old version of PRC removes the message. For reproducing I have found this works relatively consistently:

import sys
from irods.session import iRODSSession

if __name__  == "__main__":
    env_file = sys.argv[1]
    try:
        session = iRODSSession(password="bad_pw", irods_env_file=env_file)
        print(session.server_version)
    except Exception:
        pass
    session = iRODSSession(password="good_pw", irods_env_file=env_file)
    print(session.server_version)

If you can't reproduce it the first time, it might reproduce the second time in my experience.

Looking through the PRC release notes, if found that f2332b3 might be related to this, since I think the issue appears in the cleanup phase.

Let me know if you need more information (I get 4.2.12 for the server version)).

@d-w-moore
Copy link
Collaborator

@qubixes What's the scheme used by your client env file ? With native authentication, I wasn't able to reproduce the issue.

@d-w-moore
Copy link
Collaborator

Also (again to @qubixes ) - assuming I'm able to reproduce at some point - which older version of PRC corrects the problem?

@d-w-moore
Copy link
Collaborator

Will try with SSL enabled. Sometimes that makes a difference.

@qubixes Since the error is incurred while trying to send a disconnect message to the server, on a socket the iRODS server has already given up on, I am pretty sure functionality won't be threatened by the issue underlying this error message.

Perhaps it would be an option to turn this into an INFO level of logged message, instead of an actual exception. Will discuss with the rest of the iRODS Team.

@qubixes
Copy link
Collaborator Author

qubixes commented Mar 14, 2025

@d-w-moore I'll just paste my irods_environment.json for completeness:

{
    "irods_host": "geo.data.uu.nl",
    "irods_port": 1247,
    "irods_home": "/nluu11p/home",
    "irods_user_name": "<snip>",
    "irods_default_resource": "irodsResc",
    "irods_zone_name": "nluu11p",
    "irods_authentication_scheme": "pam",
    "irods_encryption_algorithm": "AES-256-CBC",
    "irods_encryption_key_size": 32,
    "irods_encryption_num_hash_rounds": 16,
    "irods_encryption_salt_size": 8,
    "irods_client_server_policy": "CS_NEG_REQUIRE",
    "irods_client_server_negotiation": "request_server_negotiation"
}

@qubixes
Copy link
Collaborator Author

qubixes commented Mar 14, 2025

Will try with SSL enabled. Sometimes that makes a difference.

@qubixes Since the error is incurred while trying to send a disconnect message to the server, on a socket the iRODS server has already given up on, I am pretty sure functionality won't be threatened by the issue underlying this error message.

Perhaps it would be an option to turn this into an INFO level of logged message, instead of an actual exception. Will discuss with the rest of the iRODS Team.

Yes, functionally there are no problems, it's just the messaging to the user.

@qubixes
Copy link
Collaborator Author

qubixes commented Mar 14, 2025

Also (again to @qubixes ) - assuming I'm able to reproduce at some point - which older version of PRC corrects the problem?

I did a little bit of a manual bisect, and I get the problem with PRC==2.2.0, but not with PRC==2.1.0. Let me know if you need me to bisect on commits.

@trel
Copy link
Member

trel commented Mar 14, 2025

not too many to eyeball there...

v2.1.0...v2.2.0

@d-w-moore
Copy link
Collaborator

not too many to eyeball there...

v2.1.0...v2.2.0

No, but I see 2 or 3 that are potentially influencing changes.

@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 17, 2025

Since unhandled exceptions aren't good in finalizers, this loose end should be tied up neatly and logged. But the logging shouldn't be seen at the default level of WARNING, so let's do an INFO level message that simply displays the object representation of the Exception.

@d-w-moore
Copy link
Collaborator

It's still elusive to try and reproduce this error, so although I have what I think is a fix, I can't really test it.

@korydraughn
Copy link
Contributor

For now, keep your solution to this in its own branch until we can reproduce it. That keeps other PRs from being blocked by this.

@qubixes
Copy link
Collaborator Author

qubixes commented Mar 18, 2025

It's still elusive to try and reproduce this error, so although I have what I think is a fix, I can't really test it.

I could test it if you want!

@d-w-moore
Copy link
Collaborator

It's still elusive to try and reproduce this error, so although I have what I think is a fix, I can't really test it.

I could test it if you want!

Yes, it's certainly a help to find conditions that reproduce it; could help us to get a fix in sooner.

@alanking
Copy link
Contributor

@d-w-moore - Please share the branch and/or diff with the experimental fix and perhaps @qubixes can apply it and try it out?

@alanking alanking modified the milestones: 3.1.0, 3.2.0 Mar 19, 2025
@d-w-moore
Copy link
Collaborator

@d-w-moore - Please share the branch and/or diff with the experimental fix and perhaps @qubixes can apply it and try it out?

Good idea, will do.

@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 19, 2025

@qubixes Please see dev branch https://github.com/d-w-moore/python-irodsclient/tree/691.m , the very latest commit 3239d54, as a basis for how to test for whether my existing patch helps in this issue

@qubixes
Copy link
Collaborator Author

qubixes commented Mar 20, 2025

@qubixes Please see dev branch https://github.com/d-w-moore/python-irodsclient/tree/691.m , the very latest commit 3239d54, as a basis for how to test for whether my existing patch helps in this issue

Yes, it does help the issue, I don't get a message about the connection anymore with the default logging level.

I can still reproduce the error message for the latest version on PyPI (3.0.0).

@korydraughn
Copy link
Contributor

Good.

@d-w-moore Does @qubixes' recent success with the change help us understand the problem?

@alanking alanking added the bug label Mar 20, 2025
@alanking alanking modified the milestones: 3.2.0, 3.1.1 Mar 20, 2025
@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 21, 2025

Good.

@d-w-moore Does @qubixes' recent success with the change help us understand the problem?

Not on any grand level. But it does suppress a message that was potentially not useful for the application user (ie noise control) -- provided the default WARNING log level isn't decreased for the log message in question. It also means we've put a bandage on what was a potential troublemaker, since exceptions should never allowed to be unhandled within the scope of the __del__ method of an object.

@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 25, 2025

@qubixes - I tried again with iRODS 4.2.12, PRC v2.2.0, with PAM & SSL login, using your script above almost exactly, and am still not able to reproduce, which means I can't really write a test.

Can you think of any other special conditions (or modifications relative to a "fresh" install) that apply to your system? Or perhaps a set of steps you go through that is guaranteed to produce the error?

@qubixes
Copy link
Collaborator Author

qubixes commented Mar 25, 2025

@d-w-moore I have asked @stsnel, since he knows a lot more about the system than I do. For now we have the server logs from the server side from when I run the script:


Mar 25 09:17:19 pid:6687 NOTICE: writeLine: inString = {<email>#nluu12p} Agent process started from <ip1>
Mar 25 09:17:19 pid:6637 remote addresses: <ip2>, <ip1> ERROR: [-]     /irods_source/server/core/src/rsApiHandler.cpp:542:int readAndProcClientM
sg(rsComm_t *, int) :  status [SYS_HEADER_READ_LEN_ERR]  errno [] -- message [failed to call 'read header']
        [-]     /irods_source/lib/core/src/sockComm.cpp:198:irods::error readMsgHeader(irods::network_object_ptr, msgHeader_t *, struct timeval *) :  status [SYS
_HEADER_READ_LEN_ERR]  errno [] -- message [failed to call 'read header']
                [-]     /irods_source/plugins/network/ssl/libssl.cpp:574:irods::error ssl_read_msg_header(irods::plugin_context &, void *, struct timeval *) :  s
tatus [SYS_HEADER_READ_LEN_ERR]  errno [] -- message [read 0 expected 4]

Mar 25 09:17:19 pid:6637 remote addresses: <ip2>, <ip1> ERROR: [-]     /irods_source/server/core/src/rodsAgent.cpp:577:int runIrodsAgentFactory(
sockaddr_un) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [failed to call 'agent stop']
        [-]     /irods_source/lib/core/src/sockComm.cpp:160:irods::error sockAgentStop(irods::network_object_ptr) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [failed to call 'agent stop']
                [-]     /irods_source/plugins/network/ssl/libssl.cpp:954:irods::error ssl_agent_stop(irods::plugin_context &) :  status [SSL_SHUTDOWN_ERROR]  errno [] -- message [error completing shutdown of SSL connection]

@stsnel
Copy link
Collaborator

stsnel commented Mar 25, 2025

Preliminary results for UU Yoda 1.9.x environment (based on iRODS 4.2.12) with Python 3.12.3 client:

  • Python-irodsclient 2.0.0: issue does not occur ✅
  • Python-irodsclient 2.1.0: issue does not occur ✅
  • Python-irodsclient 2.2.0: issue occurs reliably 🔴
  • Python-irodsclient 3.0.0: issue does not occur ✅
  • Python-irodsclient 3.1.0: issue occurs reliably 🔴

If I run a bisect between 3.0.0 and 3.1.0, I get:

bee82948ba15227b7cca990b133f8d2b829533cd is the first bad commit

Link to this commit: bee8294

I'll check with @qubixes to see if we can compare notes and gather more information.

@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 26, 2025

Checking into this again. It could be sensitive to the slightest things, and I'm beginning to think the Python version makes the difference (It wouldn't be the first time, esp where interpreter exit / GC issues are concerned). I had not been testing with anything as recent as Python 3.11 or 3.12....

@stsnel , I was wondering what you use in your bisect to determine success/failure. Do you have a ready-made script or were you evaluating the condition manually?

@stsnel
Copy link
Collaborator

stsnel commented Mar 26, 2025

@stsnel , I was wondering what you use in your bisect to determine success/failure. Do you have a ready-made script or were you evaluating the condition manually?

I was using the script provided by Qubixes at #691 (comment) . I invoked it three times per PRC version to determine if the failure occurred (sort of) consistently.

@stsnel
Copy link
Collaborator

stsnel commented Mar 26, 2025

Checking into this again. It could be sensitive to the slightest things, and I'm beginning to think the Python version makes the difference (It wouldn't be the first time, esp where interpreter exit / GC issues are concerned). I had not been testing with anything as recent as Python 3.11 or 3.12....

Right, that makes sense, I also see that I get different results with Python 3.9.20 on my system:

  • Python-irodsclient 2.0.0: issue does not occur ✅
  • Python-irodsclient 2.1.0: issue does not occur ✅
  • Python-irodsclient 2.2.0: issue occurs reliably 🔴
  • Python-irodsclient 3.0.0: issue occurs reliably 🔴
  • Python-irodsclient 3.1.0: issue occurs reliably 🔴

@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 27, 2025

Unfortunately I'm not getting consistent results with yours , @stsnel and @qubixes . Still unable to reproduce the error dump:

(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ ./pamss691.py  ~/.irods//irods_environment.json 
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Mar 27 2025, 12:30:19) [GCC 7.5.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
(py312) testuser@ac99544526dd:/tmp$ 

Using this script and alice as the PAM login with password 'apass'

(py312) testuser@ac99544526dd:/tmp$ more pamss691.py 
#!/usr/bin/env python3
import sys
from irods.session import iRODSSession

if __name__  == "__main__":
    env_file = sys.argv[1]
    try:
        session = iRODSSession(password="apass1", irods_env_file=env_file)
        print(session.server_version)
    except Exception:
        pass
    session = iRODSSession(password="apass", irods_env_file=env_file)
    print(f'{session.server_version = }')
    print(f'{session.pool.account.authentication_scheme = }')
    print(f'{sys.version = }')
    print(f'{type(list(session.pool.idle|session.pool.active)[0].socket) = }')

and in the same terminal session

(py312) testuser@ac99544526dd:/tmp$ python -c 'import irods;print(irods.__version__)'
2.2.0
(py312) testuser@ac99544526dd:/tmp$ 

@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 27, 2025

@stsnel - Any changes to the irods.client_configuration settings and/or environment variables that are in force (related to those settings or perhaps to Yoda) may be influencing the results. Also, which OS are you running on?

@stsnel
Copy link
Collaborator

stsnel commented Mar 28, 2025

Hi @d-w-moore,

I've performed these tests on a laptop running Ubuntu 24.04 LTS.

I don't see any relevant environment variables that could influence this behaviour. Please find the settings in the irods_environment file on the client below:

$ grep -vE "(irods_home|irods_user_name|irods_zone_name|irods_host)" ~/.irods/irods_environment.json
{
    "irods_authentication_scheme": "pam",
    "irods_client_server_negotiation": "request_server_negotiation",
    "irods_client_server_policy": "CS_NEG_REQUIRE",
    "irods_port": 1247,
    "irods_default_resource": "irodsResc",
    "irods_encryption_algorithm": "AES-256-CBC",
    "irods_encryption_key_size": 32,
    "irods_encryption_num_hash_rounds": 16,
    "irods_encryption_salt_size": 8,
}

Settings on the provider that we're connecting to:

$ grep -E "(irods_client_server|irods_encryption)" .irods/irods_environment.json
    "irods_client_server_negotiation": "request_server_negotiation", 
    "irods_client_server_policy": "CS_NEG_REQUIRE", 
    "irods_encryption_algorithm": "AES-256-CBC", 
    "irods_encryption_key_size": 32, 
    "irods_encryption_num_hash_rounds": 16, 
    "irods_encryption_salt_size": 8, 

I can reproduce the issue with your script in combination with Python 3.12.3 and PRC 3.1.0:

$ ./test-daniel.py ~/.irods/irods_environment.json
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
Exception ignored in: <function Connection.__del__ at 0x7e6ae95fc900>
Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 147, in __del__
    self.disconnect()
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 376, in disconnect
    self.send(disconnect_msg)
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 163, in send
    raise NetworkException("Unable to send message")
irods.exception.NetworkException: Unable to send message

If I print a traceback for the exception that is raised when creating the first session, it seems to complain about a missing import:

Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 416, in __server_version
    conn = next(iter(self.pool.active))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/pool.py", line 93, in get_connection
    conn = self.idle.pop()
           ^^^^^^^^^^^^^^^
KeyError: 'pop from an empty set'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 600, in _login_pam
    output_message = self.recv()
                     ^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 196, in recv
    raise exc
irods.exception.PAM_AUTH_PASSWORD_FAILED: None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/some-directory/./test-daniel.py", line 11, in <module>
    print(session.server_version)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 379, in server_version
    return self._server_version()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 412, in _server_version
    return self.__server_version() if version_func is None else version_func(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/session.py", line 419, in __server_version
    conn = self.pool.get_connection()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/pool.py", line 18, in method_
    ret = method(self, *s, **kw)
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/pool.py", line 117, in get_connection
    conn = Connection(self, self.account)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 123, in __init__
    self._login_pam()
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 601, in _login_pam
    except irods.exception.PAM_AUTH_PASSWORD_INVALID_TTL as exc:
           ^^^^^
NameError: name 'irods' is not defined

So it looks like connection.py needs an additional import here. However, adding the import does not resolve the issue, so this does not look related. Have submitted a separate PR for this: #705

@d-w-moore
Copy link
Collaborator

d-w-moore commented Mar 28, 2025

I can reproduce the issue with your script in combination with Python 3.12.3 and PRC 3.1.0:

$ ./test-daniel.py ~/.irods/irods_environment.json
session.server_version = (4, 2, 12)
session.pool.account.authentication_scheme = 'pam'
sys.version = '3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0]'
type(list(session.pool.idle|session.pool.active)[0].socket) = <class 'ssl.SSLSocket'>
Exception ignored in: <function Connection.__del__ at 0x7e6ae95fc900>
Traceback (most recent call last):
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 147, in __del__
    self.disconnect()
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 376, in disconnect
    self.send(disconnect_msg)
  File "/some-directory/venv/lib/python3.12/site-packages/irods/connection.py", line 163, in send
    raise NetworkException("Unable to send message")
irods.exception.NetworkException: Unable to send message

hi @stsnel ,
Indeed - I was able to see the error message myself with these exact versions, except using Ubuntu 18.04. Presumably the OS doesn't matter as much. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

6 participants