Account for "dateless" datasets #3623

Merged
merged 1 commit into from
Jun 13, 2024

dbutenhof

It's been a bit annoying that `--statistics=creation` reports a total count
that's less than the actual number of datasets. This is because when we report
by creation date, we rely on a `JOIN` between `Dataset` and `Metadata`, where
the creation date comes from the `metadata.log` `pbench.date` field, which is
missing from some datasets (a bit over 4,000 on the production server).

This PR changes `--statistics=creation` to count all rows returned by the SQL
query, but to separately report the number of empty rows where the metadata
was missing: e.g.,

```console
[pbench@n002 /]$ pbench-report-generator --statistics=creation
Dataset statistics by creation date:
 154,737 from 2012-04-13 19:21 to 2024-06-13 11:26
  (count includes 4,019 datasets without a date)
```
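
The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not the code from this PR: the names `CreationSummary`, `summarize_by_creation`, and `format_summary` are hypothetical, and it assumes each query row has already been reduced to an ISO-format date string, or `None`/empty when the `pbench.date` metadata is missing.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class CreationSummary(NamedTuple):
    """Hypothetical result type: totals plus the observed date range."""

    total: int
    dateless: int
    first: Optional[datetime]
    last: Optional[datetime]


def summarize_by_creation(dates: Iterable[Optional[str]]) -> CreationSummary:
    """Count every row, tallying rows without a date separately."""
    total = dateless = 0
    first: Optional[datetime] = None
    last: Optional[datetime] = None
    for raw in dates:
        total += 1  # every row counts toward the total
        if not raw:
            dateless += 1  # missing metadata: counted, but no date to compare
            continue
        when = datetime.fromisoformat(raw)
        if first is None or when < first:
            first = when
        if last is None or when > last:
            last = when
    return CreationSummary(total, dateless, first, last)


def format_summary(s: CreationSummary) -> str:
    """Render a report shaped like the console output above."""
    fmt = "%Y-%m-%d %H:%M"
    lines = ["Dataset statistics by creation date:"]
    if s.first is not None and s.last is not None:
        lines.append(
            f" {s.total:,d} from {s.first.strftime(fmt)} to {s.last.strftime(fmt)}"
        )
    else:
        lines.append(f" {s.total:,d} datasets")
    if s.dateless:
        lines.append(f"  (count includes {s.dateless:,d} datasets without a date)")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = ["2012-04-13T19:21:00", None, "2024-06-13T11:26:00", ""]
    print(format_summary(summarize_by_creation(rows)))
```

The point of the sketch is that dateless rows still increment the total; they are simply excluded from the first/last date comparison and reported on their own line.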
@dbutenhof dbutenhof added the Server and Operations labels Jun 13, 2024
@dbutenhof dbutenhof requested a review from webbnh June 13, 2024 13:13
@dbutenhof dbutenhof self-assigned this Jun 13, 2024
@webbnh webbnh left a comment

😎

@dbutenhof dbutenhof merged commit a7419ce into distributed-system-analysis:main Jun 13, 2024
4 checks passed
@dbutenhof dbutenhof deleted the stattot branch June 13, 2024 19:47