-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-33966: buf_page_make_young() is a contention point #3522
Open
dr-m
wants to merge
3
commits into
10.6
Choose a base branch
from
10.6-MDEV-33966
base: 10.6
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
|
dr-m
force-pushed
the
10.6-MDEV-33966
branch
8 times, most recently
from
September 24, 2024 14:34
e5d11b3
to
f14f819
Compare
Instead of repurposing buf_page_t::access_time for state()==MEMORY blocks that are part of recv_sys.pages, let us define an anonymous union around buf_page_t::hash. In this way, we will be able to declare access_time private.
buf_block_alloc(): Define as an alias in buf0lru.h, which defines the underlying buf_LRU_get_free_block(). buf_block_free(): Define as an alias of the non-inline function buf_pool.free_block(block).
The buf_pool.LRU list needs to reasonably accurately reflect recently accessed blocks, so that they will not be evicted prematurely. Because the list is protected by buf_pool.mutex, it is not a good idea to maintain the position on every page access. Instead of maintaining the LRU position on each access, we will set a "recently accessed" flag in the block descriptor. The buf_flush_page_cleaner() thread as well as some traversal of the buf_pool.LRU list will test and reset this flag on each visited block. If the flag was set, the position in the buf_pool.LRU list may be adjusted. buf_pool_t::freed_page_clock, buf_page_t::freed_page_clock: Remove. This is no longer meaningful in the revised design. page_zip_des_t::state: An atomic 16-bit field that will include the "accessed" and "old" flags that would more logically belong to buf_page_t. We maintain them here (along with some ROW_FORMAT=COMPRESSED specific state that is protected by page latches) in order to avoid race conditions and unnecessary memory overhead. buf_block_t::init(): Replaces buf_block_init(). buf_page_t::invalidate(): Replaces buf_block_modify_clock_inc(). Instead of maintaining a 64-bit counter, we will maintain one comprising 32+16=48 bits, in modify_clock_low,modify_clock_high. Worst case there will be exactly n<<48 calls to buf_page_t::invalidate() before some operation such as btr_pcur_t::restore_position() is executed. Such a count should be extremely unlikely but not completely impossible. It is worth noting that the DB_TRX_ID is only 48 bits, and each transaction start and commit/rollback will consume an identifier. buf_page_t::modify_clock(): Replaces the read access of buf_page_t::modify_clock. Assert that the caller is holding a page latch. Note: because invalidate() and modify_clock() are protected with buf_pool.mutex or the buf_page_t::lock, there can be no issue with regard to the atomicity of accessing the 48-bit field. buf_page_t::access_time: Store uint16_t(time(nullptr)). Yes, it will wrap around every 18.2 hours, and in the worst case, a rather recently accessed block may end up being evicted as "least recently used". But in this way we will avoid any alignment loss: the adjacent fields modify_clock_low, modify_clock_high, access_time of 32+16+16 bits will nicely add up to 64 bits. buf_page_t::make_young(): Clear the "recently accessed" flag of a block and move the block to the "not recently used" end of buf_pool.LRU if it qualifies for that. When we make a block young, we zero out access_time so that the comparison is more likely to hold on a subsequent invocation, or so that flag_accessed() will update the current access time, which due to wrap-around could look like less than the previously assigned access_time. buf_LRU_scan_and_free_block(): Declare static. buf_pool_invalidate(): Define in the same compilation unit with buf_LRU_scan_and_free_block(). buf_pool.LRU_old_time_threshold: Replaces buf_LRU_old_threshold_ms. PageConverter::run(): Renamed from fil_iterate(). In debug builds, acquire a dummy exclusive latch on the block, so that the assertion in buf_block_t::modify() will be satisfied. ut_time_ms(): Remove. Invoking my_hrtime_coarse() is sufficient for the remaining purposes.
dr-m
force-pushed
the
10.6-MDEV-33966
branch
from
September 25, 2024 09:38
9245f12
to
c8f4726
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
The
buf_pool.LRU
list needs to reasonably accurately reflect recently accessed blocks, so that they will not be evicted prematurely. Because the list is protected bybuf_pool.mutex
, it is not a good idea to maintain the position on every page access.Instead of maintaining the LRU position on each access, we will set a "recently accessed" flag in the block descriptor. The
buf_flush_page_cleaner()
thread as well as some traversal of thebuf_pool.LRU
list will test and reset this flag on each visited block. If the flag was set, the position in thebuf_pool.LRU
list may be adjusted.As a result of this, the fields
buf_pool_t::freed_page_clock
andbuf_page_t::freed_page_clock
will lose their purpose and will be removed.The field
buf_page_t::access_time
will storeuint16_t(time(nullptr))
instead of storinguint32_t(ut_time_ms())
. So, instead of wrapping around in 49.7 days, the counter will wrap around in 18.2 hours. In the worst case, the wrap-around could cause a rather recently used block to be wrongly evicted, but there are some remedies against this inbuf_page_t::make_young()
.Also the 64-bit field
buf_block_t::modify_clock
will be moved to 32+16 bits inbuf_page_t
so that it can be stored within the same 64 bits withbuf_page_t::access_time
. This the removal offreed_page_clock
lead to a reduction ofsizeof(buf_page_t)
.Release Notes
freed_page_clock
will always be reported as 0 in theINFORMATION_SCHEMA.INNODB_
tables.The granularity of
innodb_old_blocks_time
will be reduced from milliseconds to seconds.How can this PR be tested?
With a rather extreme case (Sysbench
oltp_update_index
with 1 thread, 1 table, 100k rows, 5MiB buffer pool, 240 seconds run time), I was able to observe a significant improvement:In some multi-threaded runs, I observed a slight to significant performance regression. So, more testing will be needed to determine if this is a good or bad thing. It is possible that the approximated LRU replacement policy is now totally incorrect. If that turns out to be the case based on more testing, we’d need to find a small and short test workload that shows a significant regression.
I checked that on my GNU/Linux system,
time(nullptr)
will read the time stamp from a memory location, not invoke any system call.Basing the PR against the correct MariaDB version
main
branch.This bug affects 10.5 as well, but there are significant performance differences between 10.5 and 10.6. Therefore we target 10.6.
PR quality check