Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close cursor if it's opened by different tx #12546

Merged
merged 30 commits into from
Dec 17, 2024
Merged

Close cursor if it's opened by different tx #12546

merged 30 commits into from
Dec 17, 2024

Conversation

awskii
Copy link
Member

@awskii awskii commented Oct 30, 2024

So we keep cursor interface opened inside DomainRoTx. If we then create new tx and try to read with it from DomainRoTx, value will be fetched via previously opened cursor which is not expected behaviour.

I set stupid fix for that but i assume there could be better solutions.

@awskii awskii requested a review from AskAlexSharov October 30, 2024 10:55
@AskAlexSharov
Copy link
Collaborator

AskAlexSharov commented Oct 31, 2024

@awskii maybe we must panic in this case? just to enforce next invariant:

  • temporal.Tx guarantee that lifetime of dbtx is longer than lifteme of aggtx
  • not allowed to move between goroutines: tx, aggtx, doms

@awskii
Copy link
Member Author

awskii commented Oct 31, 2024

@AskAlexSharov from one side yes, it would be better to panic with info, but i had a small intention to reuse sharedDomains without closing and reopening (requires only to set up new aggtx) but this could end up badly.

@AskAlexSharov
Copy link
Collaborator

sharedDomains without closing and reopening (requires only to set up new aggtx) where do you need it?

@awskii
Copy link
Member Author

awskii commented Nov 1, 2024

i found it useful while did commitment rebuilding. I am going to add panic as you said.

@awskii awskii changed the title Close cursor if it's opened by different tx [wip] Close cursor if it's opened by different tx Nov 1, 2024
@awskii awskii added this to the 3.0.0-beta1 milestone Nov 26, 2024
@yperbasis yperbasis removed this from the 3.0.0-beta1 milestone Nov 29, 2024
@AskAlexSharov
Copy link
Collaborator

AskAlexSharov commented Dec 7, 2024

this branch, ethmainnet at startup got:

[INFO] [12-07|12:49:31.149] [1/6 OtterSync] Requesting downloads
[DBUG] [12-07|12:49:31.199] [snapshots.webseed] get from HTTP provider manifest-len=496 url=https://erigon2-v1-snapshots-mainnet.erigon.network
[DBUG] [12-07|12:49:31.399] [snapshots.webseed] /manifest.txt retrieval failed, no downloads from this webseed webseed=https://erigon2-v2-snapshots-mainnet.erigon.network status="404 Not Found"
[DBUG] [12-07|12:49:31.399] [snapshots.webseed] get from HTTP provider err="webseed.http: status=404, url=https://erigon2-v2-snapshots-mainnet.erigon.network/manifest.txt" url=https://erigon2-v2-snapshots-mainnet.erigon.network
[INFO] [12-07|12:49:31.427] [snapshots:download] Stat                blocks=21.23M indices=21.23M alloc=5.9GB sys=6.0GB
[INFO] [12-07|12:49:31.434] [snapshots:history] Stat                 blocks=21.23M txs=2.62B txNum2blockNum="1024=14760K,1536=19814K,1664=21052K,1680=21204K,1682=21222K,1683=21231K" first_history_idx_in_db=0 last_comitment_block=21231632 last_comitment_tx_num=2629687500 alloc=5.9GB sys=6.0GB
[DBUG] [12-07|12:49:31.434] [1/6 OtterSync] DONE                     in=487.960382ms
[DBUG] [12-07|12:49:31.434] [2/6 BlockHashes] DONE                   in=109.587µs
[DBUG] [12-07|12:49:31.435] [3/6 Senders] DONE                       in=547.795µs
set e1=71d64df015d8 e2=c17b728ac0 d=commitment
[INFO] [12-07|12:49:31.438] [4/6 Execution] starting                 from=21231632 to=21236999 fromTxNum=2629687353 offsetFromBlockBeginning=146 initialCycle=true useExternalTx=false inMem=false
[INFO] [12-07|12:49:31.453] [4/6 Execution] Done                     blk=21231799 blks=168 blk/s=11080.0 txs=1 tx/s=65 gas/s=0 buf=940B/512.0MB stepsInDB=0.00 step=1683.0 inMem=false alloc=5.9GB sys=6.0GB
panic: tx passed into ShredDomains is immutable: cursor parent tx c17ad67360; current tx c17b728ac0

goroutine 20132 [running]:
github.com/erigontech/erigon-lib/state.(*DomainRoTx).valsCursor(0xc17b3b0700, {0x71d64df015d8, 0xc17b728ac0})
	github.com/erigontech/[email protected]/state/domain.go:1731 +0x3c5
github.com/erigontech/erigon-lib/state.(*DomainRoTx).getLatestFromDb(0xc17b3b0700, {0x5b7f831, 0x1, 0x1}, {0x71d64df015d8?, 0xc17b728ac0?})
	github.com/erigontech/[email protected]/state/domain.go:1751 +0x4d
github.com/erigontech/erigon-lib/state.(*DomainRoTx).GetLatest(0xc17b3b0700, {0x5b7f831, 0x1, 0x1}, {0x71d64df015d8?, 0xc17b728ac0?})
	github.com/erigontech/[email protected]/state/domain.go:1808 +0x15e
github.com/erigontech/erigon-lib/state.(*AggregatorRoTx).GetLatest(0x44?, 0xf7c0?, {0x5b7f831?, 0x1c0?, 0x1?}, {0x71d64df015d8?, 0xc17b728ac0?})
	github.com/erigontech/[email protected]/state/aggregator.go:1857 +0x45
github.com/erigontech/erigon-lib/state.(*SharedDomains).GetLatest(0xc171f567e0, 0x4, {0x5b7f831, 0x1, 0x1})
	github.com/erigontech/[email protected]/state/domain_shared.go:972 +0x92
github.com/erigontech/erigon-lib/state.(*SharedDomains).DomainPut(0xc171f567e0, 0x4, {0x5b7f831, 0x1, 0x1}, {0x0, 0x0, 0x0}, {0xc17c51db40, 0x4, ...}, ...)
	github.com/erigontech/[email protected]/state/domain_shared.go:1009 +0x9f
github.com/erigontech/erigon/core/rawdb/rawtemporaldb.AppendReceipt({0x33562b0, 0xc171f567e0}, 0x71d6395c7200?, 0x0)
	github.com/erigontech/erigon/core/rawdb/rawtemporaldb/accessors_receipt.go:85 +0x103
github.com/erigontech/erigon/eth/stagedsync.(*serialExecutor).execute(0xc17b5f6908, {0x335e198, 0xc0012d9d10}, {0xc17b70a488, 0x51, 0x0?})
	github.com/erigontech/erigon/eth/stagedsync/exec3_serial.go:136 +0x152
github.com/erigontech/erigon/eth/stagedsync.ExecV3({_, _}, _, {_, _}, _, {{0x3375810, 0xc000be8150}, 0x20000000, {0x1, ...}, ...}, ...)
	github.com/erigontech/erigon/eth/stagedsync/exec3.go:590 +0x22fa
github.com/erigontech/erigon/eth/stagedsync.ExecBlockV3(_, {_, _}, {{_, _}, {_, _}, _}, _, {0x335e198, ...}, ...)
	github.com/erigontech/erigon/eth/stagedsync/stage_execute.go:162 +0x20f
github.com/erigontech/erigon/eth/stagedsync.SpawnExecuteBlocksStage(_, {_, _}, {{_, _}, {_, _}, _}, _, {0x335e198, ...}, ...)
	github.com/erigontech/erigon/eth/stagedsync/stage_execute.go:256 +0x10e
github.com/erigontech/erigon/eth/stagedsync.PipelineStages.func10(0x9?, 0x0?, {0x3357ca0?, 0xc174583ce0?}, {{0x0, 0x0}, {0x0, 0x0}, 0x0}, {0x3374208, ...})
	github.com/erigontech/erigon/eth/stagedsync/default_stages.go:238 +0xf0
github.com/erigontech/erigon/eth/stagedsync.(*Sync).runStage(0xc174583ce0, 0xc1749875e0, {0x3375810, 0xc000be8150}, {{0x0, 0x0}, {0x0, 0x0}, 0x0}, 0x1, ...)
	github.com/erigontech/erigon/eth/stagedsync/sync.go:528 +0x190
github.com/erigontech/erigon/eth/stagedsync.(*Sync).Run(0xc174583ce0, {0x3375810, 0xc000be8150}, {{0x0, 0x0}, {0x0, 0x0}, 0x0}, 0x0?, 0x1)
	github.com/erigontech/erigon/eth/stagedsync/sync.go:412 +0x2ad
github.com/erigontech/erigon/turbo/stages.ProcessFrozenBlocks({0x335e198, 0xc0012d9d10}, {0x3375810, 0xc000be8150}, {0x339b648, 0xc001600570}, 0xc174583ce0, 0x0)
	github.com/erigontech/erigon/turbo/stages/stageloop.go:153 +0x15b
github.com/erigontech/erigon/turbo/execution/eth1.(*EthereumExecutionModule).Start(0xc17495bcc0, {0x335e198, 0xc0012d9d10})
	github.com/erigontech/erigon/turbo/execution/eth1/ethereum_execution.go:320 +0x165
created by github.com/erigontech/erigon/eth.(*Ethereum).Start in goroutine 1
	github.com/erigontech/erigon/eth/backend.go:1584 +0xdff
exit status 2

@awskii
Copy link
Member Author

awskii commented Dec 12, 2024

Cursor keeps instance of tx inside of it. When tx is Committed/Rollback-ed, all bounded Cursors are invalidated. But existing kv.Cursor pointer (like one in DomainRoTx) does not know underlying tx has been expired. So regardless to check of ViewID we get NPE during mdbx.(*Cursor).getVal because mdbxCurosr (c._c) is invalid.
Added checks with panic, running on mainnet to verify.

@awskii
Copy link
Member Author

awskii commented Dec 12, 2024

Previous panic you had was realted to issue that sometimes we use tx, sometimes pass SharedDomains as kv.Tx implementation. Previous idea was to check pointers to given tx implementations and if they're mismatching, panic. So they did: we queried AggregatorRoTx with both tx and SharedDomains causing ptr mismatch => panic

@awskii awskii changed the title [wip] Close cursor if it's opened by different tx Close cursor if it's opened by different tx Dec 17, 2024
@awskii
Copy link
Member Author

awskii commented Dec 17, 2024

For the check being active, hid it behind "ERIGON_AGG_ASSERTS=true" env

@awskii awskii merged commit 8c12f51 into main Dec 17, 2024
13 checks passed
@awskii awskii deleted the badcursor branch December 17, 2024 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants