Skip to content

Remove the check in loadInventoryPage #8542

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Zhdanov0
Copy link
Contributor

An error related to the difference between the transaction states on disk and in the cache has occurred.

Because the tip cache is not loaded and a new block in the tip cache is initialized with CN_ACTIVE, the following can happen:

A record was added to a table in transaction 1. It is read in transaction 3. TipCache::snapshotState() calls TipCache::setState(), which sees the value CN_ACTIVE in the cache. Re-checks through the loc-manager and reads tra_committed from disk. A new commit number is generated, which is 3. As a result, stateCn = 3 is returned from TipCache::snapshotState(). This occurs when transaction 3 reads the RDB$TABLESPACES table (Calls TRA_snapshot_state). Then a check is triggered if (state == tra_committed && stateCn > trans->tra_snapshot_number) , since stateCn = 3 and trans->tra_snapshot_number = 2. And it ends up returning tra_active.
Although it should be tra_commited.
I can describe in more detail if it's not clear. I described it briefly.

There is also a problem:
The transaction_start function should move the OIT based on the check (--oldest > dbb->dbb->dbb_oldest_transaction)
Without a commit, the variable oldest, which participates in the check within the test, always remains at value “1”.
With commit, the variable oldest changes from “1” to another value after calling TPC_find_states .
It's not entirely clear to me what this method does, but from what I can see it's based on cache only and doesn't do any retests.
I went through it in the debugger and saw that without a commit for transaction “1” the state is considered “0”, that is, the transaction is active.
Then the variable oldest is updated by the condition of transaction completion. Since our transaction is considered active in the cache, the oldest variable does not move.
As a consequence, does not move and dbb_oldest_transaction.

Broken two tests that are actually fixed, I think.
core_4382_test.test_2 and gh_7208_test.test_1

  1. With commit:
    When reading a table in a select * from g_test query , in VIO_chase_record_version the code reaches purge which will eventually call BTR_remove .
    Without commit:
    When reading a table in a select * from g_test query , in VIO_chase_record_version notify_garbage_collector is called to record .
    So the difference is who deletes the old record, either immediately or in GC.

This difference happened because of the situation I described above with the retarded promotion of the oldest. The difference is obtained in transaction->tra_oldest_active . With check the value = 1, and without = 9. This affects the condition rpb->rpb_transaction_nr >= oldest_snapshot

  1. Ends up with an error because the tabular statistics are slightly different. Basically the differences are that a value is expected in IMGC and it is missing and is added to PURGES.
    The problem is that OAT is not updated correctly and stays at “1” for a while, even though the transaction has already been completed. Because of this it is considered still active and IMGC is called instead of PURGES.

Copy link
Member

@hvlad hvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description is too hard to read and understand, sorry.
We already discussed the issue with @dyemanov privately and it was much more clear.
I see no problem accepting this change if:

  • it was carefully tested, and
  • description should be improved and simplified, it is important for documentation purposes.

@Zhdanov0
Copy link
Contributor Author

Zhdanov0 commented Jun 6, 2025

Ok, I'll describe more briefly. I'm not sure this will give a good understanding of the problem, but if more detail is needed, you can read the PR description.

The error is that there is an invalid state for the transaction in the cache.
This may be happening because of the check I removed in this PR. Due to the check, some transaction states are not loaded into the tip cache.
Because of this, when accessing TipCache::cacheState, it is possible to get a CN_ACITVE state (block in cache not initialized) for a transaction with tra_committed state.
After rechecking through lock manager (after which tra_committed state is read from disk) a new commit number is generated. But it can happen already at the moment of reading
a record in a new transaction, which has a smaller snapshot_request->req_snapshot.m_number and therefore the final status is tra_active. Which is an incorrect state.

Because of the same problem, another situation occurs: TPC_find_states analyzes the tip cache and does not recheck, because of this it can get an incorrect state
for transactions. As a consequence, dbb_oldest_transaction is not updated as fast as it should be due to the fact that the old transaction is still considered active.

How did this affect the tests:

  1. tests.bugs.core_4382_test.test_2
    The test is fixed here, because the value of nodes was originally expected to be 1.
    In the updated code, BTR_remove is called. In the old code, notify_garbage_collector was called.
    That is, the difference is who deletes the oldest record, either immediately or in GC. The reason is described above in the paragraph about TPC_find_states and the oldest counter.

  2. tests.bugs.gh_7208_test.test_1
    The difference is in different table statistics. In addition to the described difference in test №1, another one was added: a value is expected in IMGC, but it is missing and is added in PURGES.
    The problem is that the OAT was not updating correctly and stayed at the old value for a while, even though the transaction was already completed. Because of this, it was considered active and IMGC was called instead of PURGES.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants