Consider clustering `all.requests` table by `page` or `rank`
#263
Comments
Or maybe we should have |
Considering there is currently only 1 column to replace, I'd better go with a more granular I think there is a unique |
Here are queries for two sampled tables clustered by Here I added a I believe the cluster column query limitation doesn't allow to use @tunetheweb did you have another case in mind for |
Ah that's disappointing. There are other benefits to clustering on page for select pages (e.g. get me all the requests for |
One thing I find really handy for the `all.pages` table is setting `rank = 1000` as a quick way to get results and save costs but still see real data (often the more interesting data too, to be honest!).

We can't do that with the `all.requests` table. We also can't quickly look up the data for a simple site so can't do this via the `all.pages` table either. It would be handy to be able to do either of these by clustering the `all.requests` table by `page` or `rank`.

Now there are a max of 4 clustering columns and we're already using 4 for `all.requests`:

- `client`
- `is_root_page`
- `is_main_document`
- `type`

These are all useful so we'd need to drop one if we wanted to add a new column.
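
The four-column constraint above can be sketched as a small check. This is only an illustration, not code from the HTTP Archive pipeline: the helper names `validate_clustering`, `swap_cluster_column` and `cluster_by_clause` are hypothetical, and replacing a column in place (rather than reordering the list) is an assumption made to keep the example short.

```python
MAX_CLUSTER_COLUMNS = 4  # BigQuery's limit on clustering columns per table

# Clustering columns currently used by all.requests, as listed above.
CURRENT_CLUSTERING = ["client", "is_root_page", "is_main_document", "type"]


def validate_clustering(columns):
    """Raise ValueError if the list breaks the four-column limit or repeats a column."""
    if len(columns) > MAX_CLUSTER_COLUMNS:
        raise ValueError(
            f"{len(columns)} clustering columns requested, limit is {MAX_CLUSTER_COLUMNS}"
        )
    if len(set(columns)) != len(columns):
        raise ValueError("clustering columns must be unique")
    return list(columns)


def swap_cluster_column(columns, drop, add):
    """Return a new list with `drop` replaced in place by `add` (hypothetical helper)."""
    if drop not in columns:
        raise ValueError(f"{drop!r} is not a current clustering column")
    return validate_clustering([add if col == drop else col for col in columns])


def cluster_by_clause(columns):
    """Render the list as a BigQuery CLUSTER BY clause."""
    return "CLUSTER BY " + ", ".join(validate_clustering(columns))
```

Appending a fifth column, for example `validate_clustering(CURRENT_CLUSTERING + ["page"])`, raises `ValueError`, whereas swapping one column out keeps the list at four. Column order also affects how well BigQuery can prune blocks, so where a new column sits in the list is a separate decision that this sketch does not address.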

I think `is_main_document` is useful, but can mostly be repeated by `type='html' AND is_main_document` (not entirely, but 99.8% of cases and the most useful ones!), so I'd prefer to replace that with either `page` or `rank`. I'm thinking `page`, as we can use that to get rank, but open to ideas.
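
As a rough sketch of the two lookups discussed in this issue, the snippet below builds the query text and BigQuery named parameters in Python without executing anything. The fully qualified table names `httparchive.all.pages` and `httparchive.all.requests`, the `date` partition filter, the `url` column, and the helper names are assumptions for illustration. Until `all.requests` is actually clustered by `page`, the second query would not get cheaper.

```python
from datetime import date


def top_ranked_pages(crawl_date: date, client: str, rank: int = 1000):
    """Sampled all.pages query using the rank filter described in the issue."""
    sql = (
        "SELECT page, rank\n"
        "FROM `httparchive.all.pages`  -- assumed table path\n"
        "WHERE date = @crawl_date      -- assumed partition column\n"
        "  AND client = @client\n"
        "  AND is_root_page\n"
        "  AND rank = @rank"
    )
    params = {"crawl_date": crawl_date.isoformat(), "client": client, "rank": rank}
    return sql, params


def requests_for_page(crawl_date: date, client: str, page: str):
    """Every request made by one page; benefits only if `page` is a clustering column."""
    sql = (
        "SELECT url, type              -- `url` is an assumed column\n"
        "FROM `httparchive.all.requests`  -- assumed table path\n"
        "WHERE date = @crawl_date\n"
        "  AND client = @client\n"
        "  AND page = @page"
    )
    params = {"crawl_date": crawl_date.isoformat(), "client": client, "page": page}
    return sql, params


if __name__ == "__main__":
    # Example date and URL are placeholders, not real crawl values.
    query, query_params = requests_for_page(date(2024, 6, 1), "mobile", "https://example.com/")
    print(query)
    print(query_params)
```

Passing the page URL as a named parameter (`@page`) rather than interpolating it into the string keeps the query safe to reuse with arbitrary URLs.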