Account for "dateless" datasets (#3623)
It's been a bit annoying that `--statistics=creation` reports a total count
that's less than the actual number of datasets. This is because reporting by
creation date relies on a `JOIN` between `Dataset` and `Metadata`, where the
creation date comes from the `pbench.date` field of `metadata.log`. That field
is missing from some datasets (a bit over 4,000 on the production server), and
rows for those datasets were skipped instead of counted.
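The same "count everything, but note what's missing" accounting can also be expressed directly in SQL, which may help show why the two numbers differ. This is a minimal sketch under stated assumptions, not pbench's schema or query: it uses Python's built-in `sqlite3` module and a hypothetical two-table schema (`dataset`, plus `metadata` with a nullable `date` column). A `LEFT OUTER JOIN` is assumed so that datasets without a date still appear as rows with a `NULL` date. `COUNT(*)` counts every joined row, while `COUNT(m.date)` skips `NULL`s, so their difference is the number of dateless datasets.

```python
import sqlite3

# Hypothetical, simplified schema; pbench's real tables and query differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE metadata (dataset_id INTEGER, date TEXT);
    INSERT INTO dataset (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- Dataset 3 has no metadata row, so it has no creation date.
    INSERT INTO metadata (dataset_id, date) VALUES
        (1, '2012-04-13 19:21'),
        (2, '2024-06-13 11:26');
    """
)

# LEFT OUTER JOIN keeps every dataset; a missing date comes back as NULL.
# COUNT(*) counts all joined rows; COUNT(m.date) counts only non-NULL dates.
total, dated = conn.execute(
    """
    SELECT COUNT(*), COUNT(m.date)
    FROM dataset AS d
    LEFT OUTER JOIN metadata AS m ON m.dataset_id = d.id
    """
).fetchone()
conn.close()

dateless = total - dated
print(f"{total} datasets, including {dateless} without a date")
```

Running it prints `3 datasets, including 1 without a date`. The commit itself does this tally in Python while streaming rows, rather than with SQL aggregates.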

This PR changes `--statistics=creation` to count all rows returned by the SQL
query, and to separately report how many of those rows have no date because
the metadata was missing. For example:

```console
[pbench@n002 /]$ pbench-report-generator --statistics=creation
Dataset statistics by creation date:
 154,737 from 2012-04-13 19:21 to 2024-06-13 11:26
  (count includes 4,019 datasets without a date)
```
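The two summary lines in that sample rely on Python's format-spec mini-language: `,d` adds thousands separators to an integer, and a `datetime` accepts `strftime` codes such as `%Y-%m-%d %H:%M` as its format spec. The sketch below reproduces the sample output; `summary_lines` is a hypothetical helper that returns strings instead of calling `click.echo`, and its leading spaces are copied from the console sample rather than from the source.

```python
import datetime


def summary_lines(
    count: int,
    dateless: int,
    first: datetime.datetime,
    last: datetime.datetime,
) -> list[str]:
    """Hypothetical helper: build the creation-date summary lines."""
    # ",d" inserts thousands separators; a datetime format spec takes strftime codes.
    lines = [f" {count:,d} from {first:%Y-%m-%d %H:%M} to {last:%Y-%m-%d %H:%M}"]
    # Mention dateless datasets only when there are some.
    if dateless:
        lines.append(f"  (count includes {dateless:,d} datasets without a date)")
    return lines


for line in summary_lines(
    154_737,
    4_019,
    datetime.datetime(2012, 4, 13, 19, 21),
    datetime.datetime(2024, 6, 13, 11, 26),
):
    print(line)
```

With the production numbers, this prints the same two lines as the console sample. When `dateless` is zero, only the first line is emitted.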
dbutenhof authored Jun 13, 2024
1 parent aa325ce commit a7419ce
Showing 1 changed file (`lib/pbench/cli/server/report.py`) with 7 additions and 4 deletions.
```diff
@@ -416,7 +416,8 @@ def summarize_dates(base_query: Query, options: dict[str, Any]):
     this_month = 0
     this_week = 0
     this_day = 0
-    in_range = 0
+    count = 0
+    dateless = 0

     filters = []

@@ -437,14 +438,14 @@ def summarize_dates(base_query: Query, options: dict[str, Any]):
     rows = query.execution_options(stream_results=True).yield_per(SQL_CHUNK)

     for row in rows:
+        count += 1
         date: datetime.datetime = row[0]
         if not isinstance(date, datetime.datetime):
-            detailer.message(f"Got non-datetime row {row}")
+            dateless += 1
             continue
         if not first:
             first = date
         last = date
-        in_range += 1
         by_year[date.year] += 1
         by_month[date.month] += 1
         by_day[date.day] += 1
@@ -466,7 +467,9 @@ def summarize_dates(base_query: Query, options: dict[str, Any]):
         )
         return

-    click.echo(f" {in_range:,d} from {first:%Y-%m-%d %H:%M} to {last:%Y-%m-%d %H:%M}")
+    click.echo(f" {count:,d} from {first:%Y-%m-%d %H:%M} to {last:%Y-%m-%d %H:%M}")
+    if dateless:
+        click.echo(f" (count includes {dateless:,d} datasets without a date)")

     if start < year:
         click.echo(f" {this_year:,d} in year {year:%Y}")
```
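Stripped of SQLAlchemy streaming and `click` output, the post-change loop boils down to: count every row, count and skip rows whose date isn't a `datetime`, and gather date statistics only from dated rows. The sketch below is a minimal, self-contained version under stated assumptions: `DateTally` and `tally_dates` are hypothetical names, only the per-year counter is kept, and rows are assumed to arrive sorted by date (which the original `first`/`last` logic implies).

```python
import datetime
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass
class DateTally:
    """Hypothetical container for the report's counters."""

    count: int = 0  # every row returned by the query
    dateless: int = 0  # rows whose date is missing (not a datetime)
    first: Optional[datetime.datetime] = None
    last: Optional[datetime.datetime] = None
    by_year: Counter = field(default_factory=Counter)


def tally_dates(rows: Iterable[tuple[Any, ...]]) -> DateTally:
    """Count all rows, but keep date statistics only for rows with a date.

    Assumes rows are sorted by date, so the first dated row is the earliest
    and the last dated row is the latest.
    """
    tally = DateTally()
    for row in rows:
        tally.count += 1  # counted whether or not it has a date
        date = row[0]
        if not isinstance(date, datetime.datetime):
            tally.dateless += 1  # missing metadata: counted, but no date stats
            continue
        if tally.first is None:
            tally.first = date
        tally.last = date
        tally.by_year[date.year] += 1
    return tally


rows = [
    (datetime.datetime(2012, 4, 13, 19, 21),),
    (None,),  # a dataset with no creation date
    (datetime.datetime(2024, 6, 13, 11, 26),),
]
tally = tally_dates(rows)
print(tally.count, tally.dateless)
```

Here `count` is 3 and `dateless` is 1, so the total now includes the dateless row instead of silently dropping it, matching the behavior the commit describes.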
