Update the Etherscan scraper by IvanIvanoff · Pull Request #4789 · santiment/sanbase2

IvanIvanoff · 2025-08-21T09:37:52Z

Changes

Ticket

Checklist:

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have tried to find a clearer solution before commenting hard-to-understand parts of the code
I have added tests that prove my fix is effective or that my feature works

Copilot

Pull Request Overview

This PR updates the Etherscan scraper to adapt to changes in the Etherscan website structure by modifying HTML parsing selectors and extraction methods.

Removes website_link field from project information extraction
Updates HTML selectors to work with the current Etherscan website structure
Refactors the official_link function to handle multiple social media platforms with specific URL patterns

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.

File	Description
lib/sanbase/external_services/etherscan/scraper.ex	Updates HTML parsing logic with new selectors and removes website_link extraction
test/sanbase/external_services/etherscan/scraper_test.exs	Removes website_link assertion from test expectations

Copilot · 2025-08-21T09:39:02Z

lib/sanbase/external_services/etherscan/scraper.ex

+        |> case do
+          nil -> nil
+          link -> Floki.attribute(link, "href") |> List.first()
+        end


Floki.find/2 returns a list of elements, and the lambda passed to Enum.find/2 extracts the 'href' attribute from each 'link', which might not be in the expected format. Consider calling Floki.attribute/2 on the entire list first, then filtering the resulting URLs.

Suggested change

-        end
+        |> Floki.attribute("href")
+        |> Enum.find(fn href ->
+          href && !String.contains?(href, "etherscan-blog")
+        end)

Copilot · 2025-08-21T09:39:02Z

lib/sanbase/external_services/etherscan/scraper.ex

-    |> String.split()
-    |> Enum.find(fn x -> String.starts_with?(x, "Supply") end)
-    |> (fn supply -> String.trim(supply, "Supply:") end).()
    |> Decimal.new()


This change removes the logic that previously stripped the 'Supply:' prefix from the total supply string but retains the binary guard. This could cause parsing issues if the input still contains a text prefix that must be stripped before conversion to Decimal.
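
A minimal sketch of defensive parsing for this case, assuming the input may still carry a "Supply:" prefix or thousands separators. `SupplyParser` and `numeric_part/1` are hypothetical names and not part of the scraper; the returned string would still need to be passed to Decimal.new/1 as in the diff above.

```elixir
defmodule SupplyParser do
  # Hypothetical helper, not part of the PR: pulls the first numeric token
  # out of a total-supply string, tolerating a leading "Supply:" prefix and
  # comma thousands separators. Returns nil when no digits are found.
  @spec numeric_part(term()) :: String.t() | nil
  def numeric_part(text) when is_binary(text) do
    case Regex.run(~r/\d[\d,]*(?:\.\d+)?/, text) do
      [match | _] -> String.replace(match, ",", "")
      nil -> nil
    end
  end

  # Non-binary input (for example nil from a failed selector) yields nil.
  def numeric_part(_), do: nil
end

IO.inspect(SupplyParser.numeric_part("Supply:1,000,000.5"))
# => "1000000.5"
```

Returning nil instead of raising lets the caller fall back to `project_info.total_supply`, which matches the `||` pattern used elsewhere in the scraper.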

Copilot · 2025-08-21T09:39:03Z

lib/sanbase/external_services/etherscan/scraper.ex

+        nil
+
+      h4 ->
+        Floki.find(h4, "b")


Using Floki.find/2 on an h4 element to find nested 'b' tags may not work as expected. The h4 variable contains a single HTML element, but Floki.find/2 typically expects an HTML document or fragment. Consider using a different approach to extract the bold text from within the h4 element.

Suggested change

-        Floki.find(h4, "b")
+        Floki.find([h4], "b")
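
One way to check this locally is to run both forms against a small fragment and compare the results. This is a sketch under assumptions: the markup is a hypothetical stand-in for the Etherscan page, the Floki version requirement is not taken from the PR, and Mix.install/1 needs network access to fetch the dependency.

```elixir
# Assumption: any reasonably recent Floki release; the version is illustrative.
Mix.install([{:floki, "~> 0.36"}])

# Hypothetical markup standing in for the relevant part of the page.
html = "<div><h4><b>1,000</b> tokens</h4></div>"

{:ok, document} = Floki.parse_document(html)

# Floki.find/2 returns a list, so take the first h4 as a single node.
h4 = document |> Floki.find("h4") |> List.first()

# Query the same node both ways: bare, and wrapped in a list.
from_node = Floki.find(h4, "b")
from_list = Floki.find([h4], "b")

IO.inspect(from_node, label: "single node")
IO.inspect(from_list, label: "wrapped in list")
IO.inspect(Floki.text(from_list), label: "bold text")
```

If the two results are identical on the Floki version in use, the wrapping is a stylistic choice rather than a fix.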

IvanIvanoff · 2025-08-27T08:42:21Z

lib/sanbase/external_services/etherscan/scraper.ex

      | total_supply: total_supply(html) || project_info.total_supply,
        main_contract_address: project_info.main_contract_address || main_contract_address(html),
        token_decimals: project_info.token_decimals || token_decimals(html),
-        website_link: project_info.website_link || website_link(html),


This is removed because I couldn't find a reliable way to extract the website URL from the HTML.

Update the Etherscan scraper

2a859ab

IvanIvanoff requested a review from Copilot August 21, 2025 09:38

IvanIvanoff wants to merge 1 commit into master from update-etherscan-scraper
