Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce db stress noise #13447

Closed

Conversation

mszeszko-meta
Copy link
Contributor

@mszeszko-meta mszeszko-meta commented Mar 7, 2025

This PR is a followup to #13408. Thick bandaid of ignoring all injected read errors in context of periodic iterator auto refreshes in db stress proved to be 'effective'. We confirmed our theory that errors are not really a consequence / defect related to this new feature but rather due to subtle ways in which downstream code paths handle their respective non-critical IO failures. In this change we're replacing a thick 'ignore all IO read errors' bandaid in no_batched_ops_stress with a much smaller, targeted patches in obsolete files purge / delete codepaths, table block cache reader and table cache lookup to make sure we don't miss signal and ensure there's a single mechanism for ignoring error injection in db stress tests.

Test Plan

[WIP] Expect all manually triggered sandcastle runs containing this change to succeed.

https://www.internalfb.com/sandcastle/workflow/171136785846515059
https://www.internalfb.com/sandcastle/workflow/711568741130906631
https://www.internalfb.com/sandcastle/workflow/2787728169348838431
https://www.internalfb.com/sandcastle/workflow/67553994429269742
https://www.internalfb.com/sandcastle/workflow/1585267068840817353
https://www.internalfb.com/sandcastle/workflow/626000348211043289
https://www.internalfb.com/sandcastle/workflow/1432144681510210764
https://www.internalfb.com/sandcastle/workflow/1459166279274495529
https://www.internalfb.com/sandcastle/workflow/4224376450479893049
https://www.internalfb.com/sandcastle/workflow/671036344484747841

@mszeszko-meta mszeszko-meta changed the title [Reduce db stress noise] Ignore status if error in delete/purge [Experimental] Reduce db stress noise Mar 7, 2025
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@hx235 hx235 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change lgtm - if the intense rehearsal CI shows to be stable, feel free to merge this

@mszeszko-meta
Copy link
Contributor Author

mszeszko-meta commented Mar 7, 2025

Change lgtm - if the intense rehearsal CI shows to be stable, feel free to merge this

Few of the scheduled jobs failed [1]. It might be a bit of whack-a-mole game to catch all the relevant callsites. I'll continue iterating on the fix-test until satisfaction is reached OR we derive a conclusion that ROI doesn't justify the effort : ).

[1]
https://www.internalfb.com/sandcastle/workflow/2990390152580439340/artifact/actionlog.2990390152695226293
https://www.internalfb.com/sandcastle/workflow/4562146422532671924/artifact/actionlog.4562146422646836732
https://www.internalfb.com/sandcastle/workflow/4003700068738720899/artifact/actionlog.4003700068852839372
https://www.internalfb.com/sandcastle/workflow/4003700068738720899/artifact/actionlog.4003700068852839372
https://www.internalfb.com/sandcastle/workflow/2990390152580439340/artifact/actionlog.2990390152695226293

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@mszeszko-meta
Copy link
Contributor Author

mszeszko-meta commented Mar 10, 2025

Stress test jobs [1] for the 2nd ("patched") run indicate that despite calling to ignore errors in PurgeObsoleteWALFiles, those errors are not properly corrected in shared state (ignore_read_errors still set to false). It could either be 1) timing issue or 2) something with a shared state / local thread storage. While I do not see anything obvious pointing to the clue that there could be some async activity in purging obsolete WAL files as a part of the iterator cleanup, the sleep_for commit should help us verify that (no. 1).

[1]

https://www.internalfb.com/sandcastle/workflow/477381560507616525
https://www.internalfb.com/sandcastle/workflow/3152519739165702497
https://www.internalfb.com/sandcastle/workflow/1585267068840779345
https://www.internalfb.com/sandcastle/workflow/2837267765249794731

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mszeszko-meta mszeszko-meta force-pushed the ignore_errors_in_deletion branch from a37471b to 7f4303a Compare March 10, 2025 22:00
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mszeszko-meta mszeszko-meta force-pushed the ignore_errors_in_deletion branch from 7f4303a to 7202178 Compare March 10, 2025 23:17
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@mszeszko-meta mszeszko-meta force-pushed the ignore_errors_in_deletion branch from 7202178 to a19a972 Compare March 10, 2025 23:18
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mszeszko-meta mszeszko-meta force-pushed the ignore_errors_in_deletion branch from a19a972 to 818b919 Compare March 10, 2025 23:46
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@mszeszko-meta mszeszko-meta force-pushed the ignore_errors_in_deletion branch from 818b919 to 3d10bc7 Compare March 10, 2025 23:48
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@mszeszko-meta mszeszko-meta force-pushed the ignore_errors_in_deletion branch from 3d10bc7 to 0057b30 Compare March 10, 2025 23:48
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235
Copy link
Contributor

hx235 commented Mar 11, 2025

Stress test jobs [1] for the 2nd ("patched") run indicate that despite calling to ignore errors in PurgeObsoleteWALFiles, those errors are 'not properly' corrected in shared state (ignore_read_errors still set to false). It could either be 1) timing issue or 2) something with a shared state / local thread storage. While I do not see anything obvious pointing to the clue that there could be some async activity in purging obsolete WAL files as a part of the iterator cleanup, the sleep_for commit should help us verify that (no. 1).

[1]

https://www.internalfb.com/sandcastle/workflow/477381560507616525
https://www.internalfb.com/sandcastle/workflow/3152519739165702497
https://www.internalfb.com/sandcastle/workflow/1585267068840779345
https://www.internalfb.com/sandcastle/workflow/2837267765249794731

@mszeszko-meta Are you able to repro these locally fast by increasing the frequency of doing TestPrefixScan + refresh with snapshot?? This should speed up your debugging cycle. One of the last command of these failed runs seem to consistently repro within a decent amount of time < 1min

/db_stress --WAL_size_limit_MB=0 --WAL_ttl_seconds=60 --acquire_snapshot_one_in=100 --adaptive_readahead=1 --adm_policy=0 --advise_random_on_open=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_fallocate=1 --allow_unprepared_value=0 --async_io=1 --auto_readahead_size=0 --auto_refresh_iterator_with_snapshot=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=8 --bgerror_resume_retry_interval=1000000 --block_align=1 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_before_level=7 --bloom_bits=62.60756464965925 --bottommost_compression_type=none --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=8388608 --cache_type=tiered_lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=1 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_style=0 --compaction_ttl=100 --compress_format_version=1 --compressed_secondary_cache_ratio=0.6666666666666666 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=4 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox --db_write_buffer_size=0 --decouple_partitioned_filters=1 --default_temperature=kHot --default_write_temperature=kHot --delete_obsolete_files_period_micros=30000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_file_deletions_one_in=10000 --disable_manual_compaction_one_in=1000000 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=1 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_remote_compaction=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=0 --enable_write_thread_adaptive_yield=0 --error_recovery_with_no_fault_injection=0 --exclude_wal_from_write_fault_injection=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=none --file_temperature_age_thresholds= --fill_cache=0 --flush_one_in=1000 --format_version=4 --get_all_column_family_metadata_one_in=10000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=1000000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0.5 --index_block_restart_interval=5 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=0 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kWarm --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log_file_time_to_roll=60 --log_readahead_size=0 --long_running_snapshots=1 --low_pri_pool_ratio=0 --lowest_used_cache_tier=1 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=8 --max_total_wal_size=67108864 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=32 --metadata_write_fault_one_in=128 --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=2 --open_files=-1 --open_metadata_read_fault_one_in=8 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_hits=1 --optimize_filters_for_memory=0 --optimize_multiget_for_io=1 --paranoid_file_checks=1 --paranoid_memory_checks=0 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --preclude_last_level_data_seconds=0 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=10000 --sample_for_compression=0 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --set_options_one_in=0 --skip_stats_update_on_db_open=1 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=foo --sqfc_version=1 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=4 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --test_ingest_standalone_range_deletion_one_in=0 --test_secondary=1 --top_level_index_pinning=0 --track_and_verify_wals=0 --uncache_aggressiveness=9587 --universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=1 --use_attribute_group=1 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_cf_iterator=1 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=1 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=1 --write_fault_one_in=0 --write_identity_file=0 --writepercent=35

Can we do the error ignore in ~DBIter() instead of lower-level places like how we deal with ThreadStatus::OperationType?

@hx235
Copy link
Contributor

hx235 commented Mar 11, 2025

@mszeszko-meta from my debugger on the repro command above, it seems that IGNORE_STATUS_IF_ERROR_DEBUG() was never called into the callback we want when the code got to IGNORE_STATUS_IF_ERROR_DEBUG() after GetFileModificationTime(). This is weird - not sure if it's some library or sync point set up problem. I changed IgnoreReadErrorCallback_DEBUG as below so not to confuse with the another printing statement of "DBStress thread id:". This "IgnoreReadErrorCallback_DEBUG DBStress thread id" was never get printed.

  static void IgnoreReadErrorCallback_DEBUG([[maybe_unused]] void*) {
    fprintf(stdout, "IgnoreReadErrorCallback_DEBUG DBStress thread id: %lu\n",
            (unsigned long)pthread_self());
    ignore_read_error = true;
  }

Found it - the sync point set up only happen FLAGS_read_fault_one_in. Should also happen FLAGS_read_fault_one_in or FLAGS_metadata_read_fault_one_in. Right now we don't ignore write error I believe.

if (FLAGS_read_fault_one_in) {
#ifdef NDEBUG
      // Unsupported in release mode because it relies on
      // `IGNORE_STATUS_IF_ERROR` to distinguish faults not expected to lead to
      // failure.
      fprintf(stderr,
              "Cannot set nonzero value for --read_fault_one_in in "
              "release mode.");
      exit(1);
#else   // NDEBUG
      SyncPoint::GetInstance()->SetCallBack("FaultInjectionIgnoreError",
                                            IgnoreReadErrorCallback);
      SyncPoint::GetInstance()->SetCallBack("FaultInjectionIgnoreError_DEBUG",
                                            IgnoreReadErrorCallback_DEBUG);
      SyncPoint::GetInstance()->EnableProcessing();
#endif  // NDEBUG
    }

@mszeszko-meta
Copy link
Contributor Author

mszeszko-meta commented Mar 11, 2025

Found it - the sync point set up only happen FLAGS_read_fault_one_in. Should also happen FLAGS_read_fault_one_in or FLAGS_metadata_read_fault_one_in. Right now we don't ignore write error I believe.

Good catch, @hx235 ! Yes, that'd explain it. I've been looking at the recent failed runs and observed the same. The confusing part was the message Injected read error type = 0 interleaved with Injected metadata read error type X. Now that I look at the parametrization in failed jobs like [1], those are mostly dominated by cases where only metadata_read_fault_one_in > 0 and every other read fault type is set to 0. I think your proposal of extending the scope to FLAGS_metadata_read_fault_one_in make sense (is there any historical reason we didn't have that in first place?). Let me schedule a fresh run off of that change and see if we get better results.

Can we do the error ignore in ~DBIter() instead of lower-level places like how we deal with ThreadStatus::OperationType

Do you suggest something like IGNORE_STATUS_IF_ERROR(Status::IOError()) in ~DBIter()?

[1]

https://www.internalfb.com/sandcastle/workflow/4237887249362004362/artifact/actionlog.4237887249476705476.stdout.1?selectedLines=244039-244039-1-7016

@hx235
Copy link
Contributor

hx235 commented Mar 11, 2025

Found it - the sync point set up only happen FLAGS_read_fault_one_in. Should also happen FLAGS_read_fault_one_in or FLAGS_metadata_read_fault_one_in. Right now we don't ignore write error I believe.

Good catch, @hx235 ! Yes, that'd explain it. I've been looking at the recent failed runs and observed the same. The confusing part was the message Injected read error type = 0 interleaved with Injected metadata read error type X. Now that I look at the parametrization in failed jobs like [1], those are mostly dominated by cases where only metadata_read_fault_one_in > 0 and every other read fault type is set to 0. I think your proposal of extending the scope to FLAGS_metadata_read_fault_one_in make sense (is there any historical reason we didn't have that in first place?). Let me schedule a fresh run off of that change and see if we get better results.

Can we do the error ignore in ~DBIter() instead of lower-level places like how we deal with ThreadStatus::OperationType

Do you suggest something like IGNORE_STATUS_IF_ERROR(Status::IOError()) in ~DBIter()?

[1]

https://www.internalfb.com/sandcastle/workflow/4237887249362004362/artifact/actionlog.4237887249476705476.stdout.1?selectedLines=244039-244039-1-7016

is there any historical reason we didn't have that in first place?

metadata read injection was added later by me and I wasn't aware of the fact that this was missed in setting up the sync point so the reason is I made a mistake ..... lol lol lol

Do you suggest something like IGNORE_STATUS_IF_ERROR(Status::IOError()) in ~DBIter()?

Yeah - that was my suggestion before thinking we were in a whack-a-mole situation cuz there were too many places to patch. But if we fix it with sync point set up, I'm good with the current PR.

@mszeszko-meta
Copy link
Contributor Author

mszeszko-meta commented Mar 11, 2025

Found it - the sync point set up only happen FLAGS_read_fault_one_in. Should also happen FLAGS_read_fault_one_in or FLAGS_metadata_read_fault_one_in. Right now we don't ignore write error I believe.

Good catch, @hx235 ! Yes, that'd explain it. I've been looking at the recent failed runs and observed the same. The confusing part was the message Injected read error type = 0 interleaved with Injected metadata read error type X. Now that I look at the parametrization in failed jobs like [1], those are mostly dominated by cases where only metadata_read_fault_one_in > 0 and every other read fault type is set to 0. I think your proposal of extending the scope to FLAGS_metadata_read_fault_one_in make sense (is there any historical reason we didn't have that in first place?). Let me schedule a fresh run off of that change and see if we get better results.

Can we do the error ignore in ~DBIter() instead of lower-level places like how we deal with ThreadStatus::OperationType

Do you suggest something like IGNORE_STATUS_IF_ERROR(Status::IOError()) in ~DBIter()?
[1]
https://www.internalfb.com/sandcastle/workflow/4237887249362004362/artifact/actionlog.4237887249476705476.stdout.1?selectedLines=244039-244039-1-7016

is there any historical reason we didn't have that in first place?

metadata read injection was added later by me and I wasn't aware of the fact that this was missed in setting up the sync point so the reason is I made a mistake ..... lol lol lol

Do you suggest something like IGNORE_STATUS_IF_ERROR(Status::IOError()) in ~DBIter()?

Yeah - that was my suggestion before thinking we were in a whack-a-mole situation cuz there were too many places to patch. But if we fix it with sync point set up, I'm good with the current PR.

I like your approach. It has its pros (avoid whack-a-mole now) and cons (codepaths that don't originate in ~DBIter() [but invoke its' respective downstream callbacks] will encounter the same problems so someone eventually will either have to whack-a-mole on their own OR do the top level patching like we do). I'll try to find the right balance.

@mszeszko-meta mszeszko-meta force-pushed the ignore_errors_in_deletion branch from 049be10 to 3f4e1c6 Compare March 11, 2025 18:28
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has updated the pull request. You must reimport the pull request before landing.

@mszeszko-meta mszeszko-meta marked this pull request as ready for review March 11, 2025 18:28
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mszeszko-meta mszeszko-meta changed the title [Experimental] Reduce db stress noise Reduce db stress noise Mar 11, 2025
@facebook-github-bot
Copy link
Contributor

@mszeszko-meta merged this pull request in 8e16f8f.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants