When looking for data in ESGF, a common workflow is to search on a few facets first and then use the unique column values to refine the search iteratively. This is currently very slow for initial queries that return many records. Here is what we currently do when you call `search()`:
1. Query each index node with your search and build a pandas dataframe from the responses.
2. Merge this information into a single dataframe, removing older versions and populating lists of `dataset_id`s which contain location information.
3. Call `df.unique()` to get the unique facet columns and return them as the `__repr__` of the catalog.
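The steps above can be sketched roughly as follows. This is only an illustration with plain pandas, not the package's actual code: the function names `merge_responses` and `unique_facets`, the column names, and the sample rows are all hypothetical.

```python
import pandas as pd

def merge_responses(frames, facets):
    """Combine per-index dataframes, keep the latest version, collect ids."""
    df = pd.concat(frames, ignore_index=True)
    # Keep only the rows that carry the latest version of each dataset.
    latest = df.groupby(facets)["version"].transform("max")
    df = df[df["version"] == latest]
    # One row per dataset, with a list of ids (one per location).
    return df.groupby(facets + ["version"], as_index=False).agg(id=("id", list))

def unique_facets(df, facets):
    """Unique values per facet column, as shown in the catalog summary."""
    return {f: sorted(df[f].unique()) for f in facets}

# Hypothetical responses from two index nodes.
node_a = pd.DataFrame({
    "source_id": ["CESM2", "CESM2", "CESM2"],
    "variable_id": ["tas", "tas", "pr"],
    "version": [20190101, 20200101, 20190101],
    "id": ["a0", "a1", "a2"],
})
node_b = pd.DataFrame({
    "source_id": ["CESM2"],
    "variable_id": ["tas"],
    "version": [20200101],
    "id": ["b1"],
})

facets = ["source_id", "variable_id"]
merged = merge_responses([node_a, node_b], facets)
summary = unique_facets(merged, facets)
```

Every row has to be downloaded and merged before the summary can be computed, which is where the cost comes from.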
The Solr indices take a long time to return the complete response, and even if Globus is faster, the full query consumes a lot of resources for information we don't really need in the early stages of a search.
Instead, we could have `search()` perform what I will call a shallow query: we request 0 records, but ask the index for the unique facet values that match the search. We would use this response to build up the unique facet columns manually, while the underlying dataframe remains empty initially.
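A minimal sketch of the shallow query for the Solr-backed indices, assuming the ESGF search API's `limit`, `facets`, and `format` parameters and Solr's default flat `[value, count, value, count, ...]` layout for `facet_fields`. The helper names and the canned response are hypothetical, and a Globus index would need its own equivalent.

```python
def shallow_params(facets, **search):
    """Query parameters asking for facet values only, with zero records."""
    params = {
        "format": "application/solr+json",
        "limit": 0,  # return no records
        "facets": ",".join(facets),  # but do return facet counts
    }
    params.update(search)
    return params

def parse_facet_fields(response):
    """Turn Solr's flat [value, count, ...] lists into unique facet values."""
    out = {}
    for name, flat in response["facet_counts"]["facet_fields"].items():
        values, counts = flat[0::2], flat[1::2]
        out[name] = sorted(v for v, c in zip(values, counts) if c > 0)
    return out

params = shallow_params(["variable_id", "source_id"], experiment_id="historical")

# Canned response standing in for what an index node might return.
response = {
    "response": {"numFound": 22, "docs": []},
    "facet_counts": {
        "facet_fields": {
            "variable_id": ["tas", 12, "pr", 10],
            "source_id": ["CESM2", 22],
        }
    },
}
unique = parse_facet_fields(response)
```

The parsed dictionary has the same shape as the unique-column summary, so it could feed the catalog's `__repr__` without any records being transferred.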
When the user references `cat.df` (either directly or indirectly by calling something that uses it, such as `to_dataset_dict()`), we then pay the price of the full search, hoping that by this point they have a better idea of what they need.
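One way to defer the full search is a lazily evaluated property. The sketch below is an assumption about how this could look, not the existing catalog class: `LazyCatalog` and the `full_search` callable are hypothetical stand-ins.

```python
import pandas as pd

class LazyCatalog:
    """Hypothetical catalog that runs the full search on first use of .df."""

    def __init__(self, full_search):
        self._full_search = full_search  # callable returning a DataFrame
        self._df = None  # stays empty after the shallow query

    @property
    def df(self):
        # Pay the price of the full search once, then reuse the result.
        if self._df is None:
            self._df = self._full_search()
        return self._df

calls = []

def fake_full_search():
    # Stand-in for querying every index node and merging the results.
    calls.append("full")
    return pd.DataFrame({"variable_id": ["tas"], "id": [["a1", "b1"]]})

cat = LazyCatalog(fake_full_search)
calls_before_access = len(calls)
first = cat.df
second = cat.df
```

Anything that reads `df` internally would trigger the same one-time load, so callers would not need to know whether the full search has already run.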