Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add FromAddres dimension & take it from cli parameter, filter txs by from_address & to_address #111

Merged
merged 7 commits into from
Nov 12, 2023

Conversation

cool-mestorf
Copy link
Contributor

Motivation

Partially solves #97. for erc*_transfers and trace-related calls, comments needed.

Solution

  • Take from_address from cli parameter to query & request params
    • Request params in collect() may need separation of filters that are supported natively by RPC calls vs manually supported by cryo.
    • Some non-exhaustive branches makes adding new cli param into filter query & debugging difficult. Noted by comments, and may be improved using macros.
  • Implement txs dataset to filter result in extract() layer

Request for comments

  • Is using from_address and to_address dimension to represent contract interaction appropriate? I think this is not a big problem because these datasets are derived from logs but there may be a case that causes name collision between tx's from/to. Alternatives like sender/recipient may be considered?
  • trace related datasets only carry tx hash and will require additional transaction by hash / full block call to fetch from / to address. This may be implemented just like txs dataset that optionally calls receipt only if exclude_failed option is present or gas_used parameter is required.
  • Any tests to add to assert filtering behavior?

PR Checklist

  • Added Tests
  • Added Documentation
  • Breaking changes

None
};
Ok((block, receipt, query.exclude_failed))
let receipts: Vec<Option<_>> =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the receipts were collected after filtering, this would require collecting less data and would simply the filtering code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that will be much more efficient, but it will require get_tx_receipts_in_block function to take additional transactions: Vec<H160> parameter if eth_getBlockReceipts fails because currently the function reuses block.transactions as fallback... :D I will refactor it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied in 27181e4

@@ -388,6 +404,7 @@ impl Partition {
}

/// return Vec of dimensions defined in partitions
// NOTE: this function is not exhaustive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is meant by these notes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented points are branches that take or return Dim but does not emit type error even all of its variants are properly handled. For example, adding a variant for Dim does not guarantee Dim::all_dims() function to return every variant of the enum because the programmer would have to manually write the additional variant. These codes - for me - felt error-prone because a new commiter is hard to know every part of the codebase that requires additional code for added variant. I meant this for not exhaustive.

@sslivkoff
Copy link
Member

hi. this is looking good. left 2 comments

to your Q's:

  • I think collisions are actually good as long as they don't restrict functionality. they allow a simpler interface and reuse of underlying machinery. but we cannot use too few parameters either. as they would make the interface harder to understand and it would be problematic for datasets that need multiple dimensions of control. check out the dataset list here...there are examples where we need multiple parameters per dataset, so we cannot collapse those parameters into one
  • for the traces dataset, I was thinking it would not be filtering on the to_address of the transaction, but the to_address of what contract is being called by each trace. in the trace objects there are values analogous to to/from on the 4 action types
  • the testing framework is still a major WIP as automating data collection is an extra level of complexity. the right way to do this is with a mockprovider that returns predefined outputs and we test that things get filtered appropriately. but examples of this in the codebase are limited. instead, you can post some example cli commands that demonstrate the functionality. I have been gathering these types of commands and will eventually automate them once a better testing story is in place

@sslivkoff
Copy link
Member

I think this works as a self contained pr or if you wanted to add the other datasets that would also work

Copy link
Contributor Author

@cool-mestorf cool-mestorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to take this PR as self contained because I need this feature more urgent for txs than other datasets... :) I will come with additional PRs for erc transfers and trace datasets.

@@ -388,6 +404,7 @@ impl Partition {
}

/// return Vec of dimensions defined in partitions
// NOTE: this function is not exhaustive
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented points are branches that take or return Dim but does not emit type error even all of its variants are properly handled. For example, adding a variant for Dim does not guarantee Dim::all_dims() function to return every variant of the enum because the programmer would have to manually write the additional variant. These codes - for me - felt error-prone because a new commiter is hard to know every part of the codebase that requires additional code for added variant. I meant this for not exhaustive.

None
};
Ok((block, receipt, query.exclude_failed))
let receipts: Vec<Option<_>> =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that will be much more efficient, but it will require get_tx_receipts_in_block function to take additional transactions: Vec<H160> parameter if eth_getBlockReceipts fails because currently the function reuses block.transactions as fallback... :D I will refactor it in this PR.

@sslivkoff
Copy link
Member

oh I see the issue with the ordering now. thats a bit tricky

maybe only use eth_getBlockReceipts if filters are NOT being used. because filters probably mean a tiny subset of the data is relevant, and the benefits of eth_getBlockReceipts disappear

@sslivkoff
Copy link
Member

basically use get_tx_receipts_in_block if no filters

and if there are filters, use the fallback code inside of get_tx_receipts_in_block:

        // fallback to `eth_getTransactionReceipt`
        let mut tasks = Vec::new();
        for tx in &block.transactions {
            let tx_hash = tx.hash;
            let fetcher = self.fetcher.clone();
            let task = task::spawn(async move {
                match fetcher.get_transaction_receipt(tx_hash).await? {
                    Some(receipt) => Ok(receipt),
                    None => {
                        Err(CollectError::CollectError("could not find tx receipt".to_string()))
                    }
                }
            });
            tasks.push(task);
        }
        let mut receipts = Vec::new();
        for task in tasks {
            match task.await {
                Ok(receipt) => receipts.push(receipt?),
                Err(e) => return Err(CollectError::TaskFailed(e)),
            }
        }

        Ok(receipts)

(could pull this out of get_tx_receipts_in_block into a separate new function like get_tx_receipts())

everything else looks good. and keeping it self contained sgtm

@sslivkoff sslivkoff merged commit 7da85e8 into paradigmxyz:main Nov 12, 2023
3 checks passed
@sslivkoff
Copy link
Member

awesome. great feature to have

@cool-mestorf cool-mestorf deleted the feat/filter-by-from-to-addr branch November 13, 2023 01:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants