Skip to content

weird hang when execute TWO tests one-by-one (both use trace API) #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AlexPeshkoff opened this issue Oct 26, 2022 · 8 comments
Closed
Assignees

Comments

@AlexPeshkoff
Copy link
Member

AlexPeshkoff commented Oct 26, 2022

Pavel, I've used to fix two bugs in firebird (FirebirdSQL/firebird#7359 & FirebirdSQL/firebird#7361) related with subj. Currently I can say for sure - used in trace-related tests method to stop service via fixed session ID is not reliable. Nobody can guarantee that after session with ID == N will take place N+1, it may reset to 1, and this is normal behavior, specially for CS.

From what Pavel Zotov says mention method is built into test suite. Can you confirm this? If yes, a way to make behavior stable should be found - for example check for correct session name in sessions list.

@pavel-zotov
Copy link
Contributor

check for correct session name in sessions list.

We have to use following syntax to STOP trace session:

fbtracemgr -SE service_mgr -STOP -ID <NNN>

(i.e. only using its ID rather than name).
One need to use cumbersome (and fragile) manipulations to obtain session ID via known trace session name:

$ /opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas ********* -start -name trace_5441 -config /var/tmp/trc_5441.cfg
$ /opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas ********* -start -name trace_5470 -config /var/tmp/trc_5470.cfg
$ /opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas masterkey -list

Session ID: 2
  name:  trace_5441
  user:  SYSDBA
  date:  2022-10-26 17:13:58
  flags: active, trace

Session ID: 3
  name:  trace_5470
  user:  SYSDBA
  date:  2022-10-26 17:14:14
  flags: active, trace

# get trace ID for name = 'trace_5470':
[ibase@glb-srv-ik-test firebird-qa]$ (/opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas masterkey -list | grep -B1 "  name:  trace_5470" ) | grep "Session ID:  | awk '{print $3}'"
3

This way seems to me poor because it strongly depends on FORMAT of 'fbtracemgr -list' output.
Why we can not stop trace just using its name, without preliminary search of its ID ?

PS.
Yes, i remember that we have discussed this privately a few years ago; but may be it is good to repeat for all involved persons.

@AlexPeshkoff
Copy link
Member Author

In script - yes, need to parse (though in 1 line using shell and Unix utilities). But in python program one gets session name in service reply in separate clumplet, i.e. finding it is trivial.

@pcisar
Copy link
Contributor

pcisar commented Oct 27, 2022

The qa plugin does not use arbitrary or fixed session ID, it uses the session ID reported back by service when session is started. Well, the firebird-driver have to parse the text output to get it, but anyway. The qa plugin also checks that trace was properly stopped, as the service response to stop request is normally "Trace session ID N stopped". This check is actually the reason why this Firebird issue was discovered.
So, your statement that "method to stop service via fixed session ID is not reliable" for qa suite AFAIK does not apply here.

@AlexPeshkoff
Copy link
Member Author

Sorry - you are absolutely right, problem is badly written test.

@pavel-zotov
Copy link
Contributor

I still have a question about why pytest HANGED after show final message about failed test (i.e. when bug was not yet fixed).
Consider following test:

[ibase@glb-srv-ik-test bugs]$ cat -n /home/ibase/firebird-qa/tests/bugs/core_hang_test.py
     1  #coding:utf-8
     2
     3  import time
     4  import pytest
     5  from firebird.qa import *
     6
     7  db = db_factory()
     8  act = python_act('db')
     9
    10  trace_5441 = [
    11      'log_statement_finish = true',
    12  ]
    13  #-------------------------------------------
    14  trace_5470 = [
    15      'log_statement_finish = true',
    16      'include_filter = "%foobar%"',
    17  ]
    18      #'exclude_filter = "%table%"',
    19  #-------------------------------------------
    20  @pytest.mark.version('>=3.0')
    21  def test_1(act: Action):
    22      with act.trace(db_events = trace_5441) as ts:
    23          # def trace() -> TraceSession
    24          if ts.output:
    25              # see core.py (firebird-driver),
    26              # class ServerTraceServices(ServerServiceProvider),
    27              #   def start(self, *, config: str, name: str=None) -> int
    28              #       ...
    29              #       response = self._srv()._fetch_line()
    30              #       return int(response.split()[3])
    31              session = str(ts.output[0])
    32          else:
    33              session = '[no output]'
    34          with open('/var/tmp/trace_5441.session.tmp', 'w') as f:
    35              f.write(f'trace_5441: session = {session}\n')
    36          #pass
    37      assert True
    38
    39  #----------------------------------------------
    40  @pytest.mark.version('>=3.0')
    41  def test_2(act: Action):
    42      #time.sleep(0.5) # ----------------------------------- [ NB ]
    43      with act.trace(db_events = trace_5470) as ts:
    44          # def trace() -> TraceSession
    45          if ts.output:
    46              session = str(ts.output[0])
    47          else:
    48              session = '[no output]'
    49          with open('/var/tmp/trace_5470.session.tmp', 'w') as f:
    50              # if we take delay for at least 0.5 second then session will be 1
    51              # otherwise it will be 2 and pytest hangs infinitely at the final point.
    52              f.write(f'trace_5470: session = {session}\n')
    53          #pass
    54      assert True

Pay note at line #42:
** #time.sleep(0.5) # ----------------------------------- [ NB ]**

If we comment it out then running this test on v4.0.3.2861, Classic, lead to following console output:

[ibase@glb-srv-ik-test firebird-qa]$ cd ~/firebird-qa ; /usr/local/bin/pytest --cache-clear --disable-warnings --basetemp=/var/tmp/fbqa -vv --tb=long --server fb401 ./tests
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.9.9, pytest-7.0.1, pluggy-1.0.0 -- /usr/local/bin/python3.9
cachedir: .pytest_cache
System:
  encodings: sys:utf-8 locale:UTF-8 filesystem:utf-8
Firebird:
  server: fb401 [v4.0.3.2861, Classic, Firebird/Linux/AMD/Intel/x64]
  home: /opt/fb40
  bin: /opt/fb40/bin
  client library: /opt/fb40/lib/libfbclient.so
rootdir: /home/ibase/firebird-qa, configfile: pytest.ini
plugins: firebird-qa-0.17.0, timestamper-0.0.9
collected 2 items

[2022-10-28 13:01:03] tests/bugs/core_hang_test.py::test_1 PASSED                                                                                                                                    [1/2]
[2022-10-28 13:01:06] tests/bugs/core_hang_test.py::test_2 FAILED                                                                                                                                    [2/2]

================================================================================================ FAILURES =================================================================================================
_________________________________________________________________________________________________ test_2 __________________________________________________________________________________________________

act = <firebird.qa.plugin.Action object at 0x7efe0f047cd0>

    @pytest.mark.version('>=3.0')
    def test_2(act: Action):
        #time.sleep(0.5) # ----------------------------------- [ NB ]
        with act.trace(db_events = trace_5470) as ts:
            # def trace() -> TraceSession
            if ts.output:
                session = str(ts.output[0])
            else:
                session = '[no output]'
            with open('/var/tmp/trace_5470.session.tmp', 'w') as f:
                # if we take delay for at least 0.5 second then session will be 1
                # otherwise it will be 2 and pytest hangs infinitely at the final point.
>               f.write(f'trace_5470: session = {session}\n')

tests/bugs/core_hang_test.py:52:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <firebird.qa.plugin.TraceSession object at 0x7efe0f047940>, exc_type = None, exc_value = None, traceback = None

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        time.sleep(2)
        session = self.output.pop(0)
        with self.act.connect_server() as srv:
>           srv.trace.stop(session_id=session)

firebird/qa/plugin.py:1155:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <firebird.driver.core.ServerTraceServices object at 0x7efe1405fa00>

    def stop(self, *, session_id: int) -> str:
        """Stop trace session.

        Arguments:
            session_id: Trace session ID.

        Returns:
           Text message 'Trace session ID <x> stopped'.
        """
        #print('Intro trace stop(), session_id=',session_id)
>       return self.__action(ServerAction.TRACE_STOP, 'stopped', session_id)

../.local/lib/python3.9/site-packages/firebird/driver/core.py:5205:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <firebird.driver.core.ServerTraceServices object at 0x7efe1405fa00>, action = <ServerAction.TRACE_STOP: 23>, label = 'stopped', session_id = 2

    def __action(self, action: ServerAction, label: str, session_id: int) -> str:
        self._srv()._reset_output()
        with a.get_api().util.get_xpb_builder(XpbKind.SPB_START) as spb:
            spb.insert_tag(action)
            spb.insert_int(SrvTraceOption.ID, session_id)
            self._srv()._svc.start(spb.get_buffer())
        response = self._srv()._fetch_line()

        # 14.10.2022, temporary(?)
        #if action == ServerAction.TRACE_STOP:
        #    print('@@@ trace stopping: ', action.name, response)
        #else:
        #    print('@@@ other trace action: ', action.name)

        if not response.startswith(f"Trace session ID {session_id} {label}"):  # pragma: no cover
            # response should contain the error message
            #print('@@@ bad response for action ', action.name, ' : ', response)
>           raise DatabaseError(response)
E           firebird.driver.types.DatabaseError: Trace session ID 2 not found

../.local/lib/python3.9/site-packages/firebird/driver/core.py:5168: DatabaseError
------------------------------------------------------------------------------------------ Captured stdout setup ------------------------------------------------------------------------------------------
Creating db: localhost:/var/tmp/fbqa/test_20/test.fdb [page_size=None, sql_dialect=None, charset='NONE', user=SYSDBA, password=masterkey]
========================================================================================= short test summary info =========================================================================================
FAILED tests/bugs/core_hang_test.py::test_2 - firebird.driver.types.DatabaseError: Trace session ID 2 not found
================================================================================= 1 failed, 1 passed, 1 warning in 5.86s ==================================================================================

And pytest will infinitely HANG, w/o returning control to OS until we press Ctrl-C.
But why it happens ?

Pressing Ctrl-C leads to following additional output:

Traceback (most recent call last):
  File "/home/ibase/firebird-qa/firebird/qa/plugin.py", line 1088, in trace_thread
    for line in srv:
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5301, in __next__
    if (line := self.readline()) is not None:
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5404, in readline
    self._read_output()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5350, in _read_output
    data = self._query_output(timeout)
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5344, in _query_output
    self._svc.query(self._make_request(timeout), bytes([self.mode]), self.response.raw)
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 1039, in query
    self._check()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 105, in _check
    raise self.__report(DatabaseError, self.status.get_errors())
firebird.driver.types.DatabaseError: Error reading data from the connection.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ibase/firebird-qa/firebird/qa/plugin.py", line 1090, in trace_thread
    output.append(line)
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5295, in __exit__
    self.close()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5447, in close
    self._svc.detach()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 1031, in detach
    self._check()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 105, in _check
    raise self.__report(DatabaseError, self.status.get_errors())
firebird.driver.types.DatabaseError: connection shutdown
Exception ignored in: <module 'threading' from '/usr/local/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 1470, in _shutdown
    lock.acquire()
KeyboardInterrupt:

And only after that control will returned to OS.
Why Ctrl-C was needed to "finalize" execution ?

PS. firebird.conf:

RemoteServicePort = 34099
ServerMode = Classic

@pcisar
Copy link
Contributor

pcisar commented Oct 29, 2022 via email

@AlexPeshkoff
Copy link
Member Author

AlexPeshkoff commented Oct 31, 2022 via email

@pcisar
Copy link
Contributor

pcisar commented Oct 31, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants