Document Data Loading Validation Logic #11

Open
valentinedwv opened this issue Jul 9, 2022 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@valentinedwv
Contributor

valentinedwv commented Jul 9, 2022

ADD TO THIS DOCUMENT: https://github.com/earthcube/geocodes_documentation/wiki/DataLoadingValidationStory#what-do-we-need-do-to-setup-testing

Document the testing that will be needed to validate the data loading.
To repeat: document the steps we will need to implement to validate the data loading.

This is not asking for the implementation of the tests. It is asking for the testing plan, with the steps that we will need to implement.

Places to possibly look:

Tasks:

[x] Spec: Summon working
[x] Spec: expected JSON-LD
[x] Spec: JSON-LD data load to triple store
[ ] Spec: JSON-LD renders in UI
[ ] Spec: Tool linkage
[ ] Spec: org provenance information
[ ] Spec: automate use of Geocode_Metadata_Approval tests as part of a CI workflow
[ ] Spec: SHACL and other validation of full source/repository data

Examples

Spec: Summon working
[x] we have earthcube reports in the scheduler, and tests in the geocodes metadata

  • count of records in a sitemap matches the count of records that made it into the bucket
  • possible tool: sitemap assay
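
The count comparison above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the Python standard library rather than any existing sitemap assay tool, and the function names and the `bucket_keys` list (object keys already listed from the bucket by some other means) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <url><loc> entries in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc")
    # Ignore empty <loc> elements so they do not inflate the count.
    return sum(1 for loc in locs if (loc.text or "").strip())


def summon_counts_match(sitemap_xml: str, bucket_keys: list[str]) -> bool:
    """True when every sitemap record has a counterpart in the bucket (by count)."""
    return count_sitemap_urls(sitemap_xml) == len(bucket_keys)
```

A count match does not prove the same records arrived, only that none are obviously missing, so it works best as a first, cheap check.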

Spec: expected JSON-LD
[x] approval tests in the geocodes metadata
possible tooling: https://github.com/gleanerio/notebooks/tree/master/notebooks/validation

  • validate jsonld using ?
  • does it have xxxx
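
A "does it have ..." check might look something like the sketch below. The list of required properties is still open in this issue, so `REQUIRED_PROPERTIES` is a placeholder assumption, and `missing_properties` is a hypothetical helper, not part of the approval tests or the gleanerio notebooks.

```python
import json

# Placeholder list: the properties that must be present are not decided yet.
REQUIRED_PROPERTIES = ("@context", "@type", "name")


def missing_properties(jsonld_text: str, required=REQUIRED_PROPERTIES) -> list[str]:
    """Return the required top-level properties absent from a JSON-LD document."""
    doc = json.loads(jsonld_text)
    if not isinstance(doc, dict):
        # A top-level array or scalar has none of the expected properties.
        return list(required)
    return [prop for prop in required if prop not in doc]
```

This only checks for presence at the top level; it does not validate the JSON-LD against a schema or shape.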

Spec: JSON-LD data load to triple store
[x] we have earthcube reports in the scheduler

  • count of files in s3 bucket (matches) count of graphs for an organization/record
  • For a sample of records (a % of the data load, or all of them if < 100 records)
    • retrieve JSONLD
      • urn
      • name
    • retrieve a graph by a urn from the service api
      • did we get a record
      • does information (name, urn, [other properties]) match
    • retrieve from triplestore using user interface query with name as
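
The sampling rule and the property comparison in the list above could be sketched as below. This is a minimal sketch with assumptions: the 10% default fraction is made up, the records are plain dicts already parsed from the JSON-LD and from the service API response, and both function names are hypothetical.

```python
import random


def select_sample(records, fraction=0.1, small_load=100, seed=None):
    """Return every record for small loads, otherwise a random fraction of them."""
    records = list(records)
    if len(records) < small_load:
        return records
    size = max(1, round(len(records) * fraction))
    # A seeded generator makes a failing sample reproducible.
    return random.Random(seed).sample(records, size)


def mismatched_properties(expected: dict, actual: dict, properties=("urn", "name")):
    """List the properties whose values differ between the source JSON-LD and the retrieved graph."""
    return [p for p in properties if expected.get(p) != actual.get(p)]
```

An empty list from `mismatched_properties` means the record retrieved by URN agrees with the source on the checked properties; other properties can be added to the tuple as they are agreed on.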

Spec: JSON-LD renders in UI

  • for a random sample of JSON-LD records in an org:
    • create a set of curl URLs
    • open in a web test suite (selenium, etc) and see that they at least partially render to expectations
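
Building the set of URLs for a random sample might look like this. The URL shape (a base URL followed by a percent-encoded URN) is purely an assumption for illustration, as is the `sample_ui_urls` name; the real UI route would need to be substituted.

```python
import random
from urllib.parse import quote


def sample_ui_urls(urns, base_url, sample_size, seed=None):
    """Pick a random subset of URNs and turn each one into a URL to fetch or render."""
    urns = list(urns)
    chosen = random.Random(seed).sample(urns, min(sample_size, len(urns)))
    # Percent-encode the URN so characters such as ':' are safe in a path segment.
    return [f"{base_url.rstrip('/')}/{quote(urn, safe='')}" for urn in chosen]
```

The resulting list can be fed to curl for a basic status check, or to a browser test suite for the rendering check.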

Spec: Tool linkage

  • for a set of known files, do we match the expected linkages?
    • This probably needs to be an approval test.

Spec: org provenance information

Spec: automate use of Geocode_Metadata_Approval tests as part of a CI workflow

Spec: SHACL and other validation of full source/repository data

@MBcode
Contributor

MBcode commented Jul 13, 2022

Working this up here: http://earthcube.ddns.net/ec/test. It will go into a gDoc, then into the evolving notebook.

@MBcode
Contributor

MBcode commented Jul 14, 2022

Maybe the new documentation repo could have a test directory, somewhat like https://github.com/MBcode/ec/tree/master/test, with a sitemap-like CSV that has expected values.

DV: Also, geocodes-metadata can generate a sitemap; maybe individual and org sitemaps (based on records).
I've also been able to access secondary metadata that isn't crawled, so we could also check on this.

@valentinedwv
Contributor Author

@MBcode
Contributor

MBcode commented Jul 25, 2022

The wiki story above expands upon the notebook towards use of gleaner, which had less depth there because we have to start dealing with gleaner's difficulty in looking up ld-cache datasets by download URL (even though it is difficult to run part of the workflow, we can just refer back to saved items).
As a plan is fleshed out for that part, the 2 docs can become more integrated.

@MBcode
Contributor

MBcode commented Aug 17, 2022

Have most of the code for CI debug/spot testing of what might go wrong with the ingest workflow. Breaking the .md version of the big ipynb out to focus on this; in ec/test it is now just testing.md. It will get ‘On a branch in Geocodes documentation’ soon.
Other parts: the original counts.md, to be run each crawl, and sample.md, which created test/standard.
Will get even more text around it, and get the ec/test/standard test sets into a branch in GeoCODES-Metadata.

@valentinedwv valentinedwv moved this to Todo in Decoder Dec 29, 2022
@valentinedwv valentinedwv removed the status in Decoder May 5, 2023
@valentinedwv valentinedwv added the documentation Improvements or additions to documentation label May 5, 2023
@ylyangtw ylyangtw moved this to Todo in Decoder Jan 3, 2024
Projects
Status: Todo
Development

No branches or pull requests

2 participants