Standard names: *_threshold, allow for percentile based thresholds #19
Pinging participants of the CF 2021 Conventions Discussion who expressed interest: @bzah, @jesusff, @japamment, @larsbarring.
Dear @zklaus and others,
Thanks for this. It seems fine to me to allow a threshold to be specified as a percentile, but I wonder whether there should be a different standard name
Best wishes
Jonathan
Dear @JonathanGregory,
Dear all,
First, with reference back to the conversation during the CF2021 workshop breakout discussion, I would like to clarify that with
Now over to @JonathanGregory's comment: There is nothing at all specific about a
Finally, with reference to your third point, indeed there are use cases for having multiple percentiles associated with different variables. Sometimes (often, in fact) it is the same percentile value (e.g. 25 or 75), in which case one percentile variable will be enough (often actually meaningful in its context, or else at least succinct). If different percentile values are needed, two auxiliary coordinates are required, each with its own [well chosen] variable name. But they can still have the same standard name
Dear @larsbarring,
Regarding:
I think there would be a risk of confusion. Variable names are arbitrary and meaningless in CF; some of the ways in which CF data are stored do not preserve variable names. I think that if you present a program with two coordinate variables that have the same standard name and units, it will cause problems; disambiguating them would rely on some other non-standardised attribute like
In your use cases, the percentile is a coordinate variable, but it might become a data variable. It is conceivable that you could have a latitude-longitude field of temperature percentile corresponding to a specified temperature value, for example. I think you would want to identify the field as
It's true that percentiles are just numbers between 0 and 100. But sea ice fraction and cloud fraction are just numbers between 0 and 1, yet they are not the same quantity.
Best wishes
Jonathan
Dear @JonathanGregory,
Regarding your first point, I both agree and don't. I agree that variable names are arbitrary and meaningless in CF, and that relying on
Regarding your second point, it is a different use for this standard name compared to what we are now discussing, and I do not know of such a use case. But I take your point and agree that in principle this could happen and be of interest. On your third point, I think that it depends on what you mean by quantity. One way to look at it is that the quantity is the fraction of the area covered by X, and what X is is another matter. But, finally, having now aired some arguments against
Kind regards,
I have a use case for percentiles as a data variable! We are currently working on a project looking at fire indexes calculated from climate model outputs, and we have found that looking at the percentiles of the fire indexes as a spatial field is useful. If we want to publish that data in CF-compliant format, we will need a
(I do think it would be valuable to consider a mechanism for defining standard names that adhere to a formula like X_of_Y_in_Z in an automatic and implicit way rather than doing it explicitly, but that's a separate topic.)
@sethmcg Thanks for your use case. Then
Dear @larsbarring,
Thanks for your flexibility. You are right that the main point is that I support percentiles! I also agree with you that the standard name is not always sufficient metadata, but I think it helps to provide whatever metadata we conveniently can, within the framework of the conventions.
Best wishes
Jonathan
OK, I will open a new issue to discuss the new standard names in more detail.
Dear all,
I think we are introducing some confusion here due to the use of the word "percentile" to refer to the probability associated with the percentile (@larsbarring mentioned this already but the discussion went on; we also discussed this during the CF workshop BOG). The percentile_of_X would have the units of X. I'm afraid the use case put forward by @sethmcg refers to quantities such as FWI90 (the 90th percentile of the Fire Weather Index), which is an FWI value.
Here, a value of temperature would be fixed (say 20 degC) and the field would be the percentile probabilities corresponding to that value at each point.
Dear @jesusff,
Oh yes, you're right. Sorry I didn't notice that I had slipped into the confusion. I agree it's confusing that
In that case, I think Lars's percentile should be called
Best wishes
Jonathan
@jesusff, indeed, thank you. @JonathanGregory I like the first part of the standard name you suggest,
But to complicate things, and to build on @sethmcg's use case, I can imagine a situation where one would like to start with a cumulative probability value (percentile probability), calculate the corresponding percentile of a variable for some reference period, and then see how the percentile probability of that particular value changes in some other period. While these two percentile probabilities are in principle similar, they are used quite differently: the first is simply prescribed, the second is calculated from data. I think that it might be useful to distinguish between these two uses. Could the prescribed one have standard name
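The distinction drawn in the last few comments can be made concrete with a small sketch. It assumes made-up temperature samples and uses a nearest-rank percentile and an empirical cumulative distribution, which are only two of several possible estimators (CF does not prescribe one); the function and variable names are hypothetical. A percentile of X is a value of X and carries its units, while a cumulative probability is dimensionless. The first case fixes X = 20 degC and maps its cumulative probability per point; the second prescribes a probability, derives a threshold in a reference period, and then calculates the probability of that threshold in another period.

```python
import math

def percentile_value(sample, prob_percent):
    """Nearest-rank percentile: returns a value of X, so it has the units of X."""
    ordered = sorted(sample)
    rank = max(1, math.ceil(prob_percent / 100 * len(ordered)))
    return ordered[rank - 1]

def cumulative_probability(sample, value):
    """Empirical CDF in percent: share of the sample <= value (dimensionless)."""
    return 100 * sum(1 for x in sample if x <= value) / len(sample)

# Hypothetical daily temperatures (degC) at two grid points for one period
grid = {
    "point_a": [14.0, 16.5, 18.0, 19.5, 20.0, 21.0, 22.5, 24.0, 25.5, 27.0],
    "point_b": [18.0, 19.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 28.0, 30.0],
}

# Case 1: fix X = 20 degC; the field holds a cumulative probability per point
prob_field = {name: cumulative_probability(s, 20.0) for name, s in grid.items()}

# Case 2: a prescribed probability gives a threshold (units of X) in a
# reference period; its cumulative probability is then calculated in a later period
reference = grid["point_a"]
later = [16.0, 18.5, 20.0, 21.5, 23.0, 24.5, 25.5, 26.0, 28.0, 31.0]
prescribed = 90                                      # prescribed, not derived
threshold = percentile_value(reference, prescribed)  # 25.5 degC
calculated = cumulative_probability(later, threshold)  # 70.0 %, derived from data
```

Here prob_field is {"point_a": 50.0, "point_b": 20.0}: the same dimensionless quantity that, in case 2, moves from a prescribed 90 % to a calculated 70 % once the period changes.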
Dear @larsbarring,
I agree with you that it's debatable whether we should refer to a coordinate variable (for a data variable of frequency of extremes, for instance) as
In fact most of those are arguments that we do have uses for
Best wishes
Jonathan
Dear @JonathanGregory,
I think that my point of view could be described as focussing on the 'fundamental nature' of the entity at hand,
Kind regards,
Dear @larsbarring,
Thanks for the discussion. It is a good exercise to work out the reasons. We shouldn't make things any more complicated than is useful. Regarding your interest in "fundamental nature", it has been commented before (not by me) that CF is all about "the essence of things". :-)
Best wishes
Jonathan
If I may follow up on @larsbarring's and @sethmcg's example:
It seems very common to compute the percentile values over a reference period instead of the whole period. How can we link this reference period to
To illustrate with an example, let's say:
The output would be something like this (I'm borrowing Klaus's example):
But the reference period is missing. Thus the user cannot understand that the threshold used to retrieve
Can we use something like
It could look like:
Hello Abel @bzah,
I think that we need to distinguish between the two use cases:
With respect to your illustrative example, I guess that you intend the standard name for the
I think we have reached consensus on the use of specific
I have also added more examples that @larsbarring and I have been developing independently of @bzah, but coincidentally very much in the same spirit. Following those examples, particularly Example 3, I would like to turn the discussion to the encoding of the reference period that is often used for deriving thresholds from time series and given cumulative probability values.
Dear @zklaus,
I agree with your intentions and values for the variables in Example 3, thanks. I have a reservation about the status of the
Also
Best wishes
Jonathan
Dear @JonathanGregory,
Thanks for your comments. I agree that it makes a lot of sense to treat the
On cell methods, I also completely agree. Let's side-step the issue here by adopting the 50th percentile and going with median in the example, and discuss the possible addition of a more complete set of cell methods in a separate issue.
Cheers
This issue has had no activity in the last 30 days. This is a reminder to please comment on standard name requests to assist with agreement and acceptance. Standard name moderators are also reminded to review @feggleton @japamment
Before moving these suggested standard names towards acceptance, I would like to refer to the problems of having canonical units
Thanks, @larsbarring, that makes sense.
This issue has had no activity in the last 30 days. Accordingly:
Standard name moderators are also reminded to review @feggleton @japamment @efisher008
Introduction
This issue describes a proposed change to the description text of existing, threshold-based standard names.
It is the result of a number of discussions, most recently in the climate index breakout group of the 2021 CF Conventions workshop.
To keep the discussion concrete, the proposed change is first presented through a specific example. As such, it is based on the following current definition.
Changelog
This changelog is intended to help readers catch up quickly. If you are new to the issue or are coming back to it after some time, this summary should give you the most important information, and you only need to read the comments that follow the last one mentioned in the following table.
Please let me know if you feel the table does not reflect the consensus appropriately!
and including
percentile(_of_X) with cumulative_probability_of_X
Current Definition
number_of_days_with_air_temperature_above_threshold
Air temperature is the bulk temperature of the air, not the surface (skin) temperature. A variable whose standard name has the form number_of_days_with_X_below|above_threshold is a count of the number of days on which the condition X_below|above_threshold is satisfied. It must have a coordinate variable or scalar coordinate variable with the standard name of X to supply the threshold(s). It must have a climatological time variable, and a cell_methods entry for within days which describes the processing of quantity X before the threshold is applied. A number_of_days is an extensive quantity in time, and the cell_methods entry for over days should be "sum".
Proposed Definition
In the following proposed definition, the first paragraph is unchanged except for the removal of the sentence about the threshold coordinate variable, which appears in modified form in the second paragraph.
number_of_days_with_air_temperature_above_threshold
Air temperature is the bulk temperature of the air, not the surface (skin) temperature. A variable whose standard name has the form number_of_days_with_X_below|above_threshold is a count of the number of days on which the condition X_below|above_threshold is satisfied. It must have a climatological time variable, and a cell_methods entry for within days which describes the processing of quantity X before the threshold is applied. A number_of_days is an extensive quantity in time, and the cell_methods entry for over days should be "sum".
It must give information about the threshold in one or both of the following two ways:
- with an explicit threshold in a coordinate variable or scalar coordinate variable with the standard name of X, or
- with a percentile threshold given in a scalar coordinate variable with the standard name cumulative_probability_of_X.
Implied Changes
The proposed definition given above requires the addition of new standard names of the form cumulative_probability_of_X. The proposed standard name for this example is:
- Term: cumulative_probability_of_air_temperature
- Description: A probability percentile.
- Units: % (canonical units: 1)
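A program reading data under the proposed definition has to cope with either way of supplying the threshold, and with cumulative probabilities stored in % although the canonical units are 1. The following is a hypothetical sketch of how that might look: the function names, the choice to use the explicit value when both are present, and the nearest-rank percentile estimator are all assumptions made for illustration, not part of the proposal.

```python
import math

def to_canonical(value, units):
    """Convert a cumulative probability to its canonical units of 1."""
    if units == "%":
        return value / 100
    if units == "1":
        return value
    raise ValueError(f"unsupported units: {units!r}")

def resolve_threshold(sample, explicit=None, cumulative_probability=None, units="%"):
    """Return the threshold used in the X_above_threshold test (hypothetical helper).

    An explicit threshold (coordinate with the standard name of X) is used as-is.
    Otherwise the threshold is derived from the scalar cumulative_probability_of_X
    coordinate with a nearest-rank percentile of `sample`.
    """
    if explicit is not None:
        return explicit
    if cumulative_probability is None:
        raise ValueError("at least one way of giving the threshold is required")
    p = to_canonical(cumulative_probability, units)
    ordered = sorted(sample)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

# Hypothetical daily maximum air temperatures (degC)
temps = [14.0, 16.5, 18.0, 19.5, 20.0, 21.0, 22.5, 24.0, 25.5, 27.0]
from_percent = resolve_threshold(temps, cumulative_probability=90)            # 25.5
from_fraction = resolve_threshold(temps, cumulative_probability=0.9, units="1")  # 25.5
```

Storing 90 % or 0.9 with units of 1 yields the same threshold, which is the point of declaring canonical units of 1 alongside the conventional %.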
Examples
Example 1: Only percentile threshold
This example aims to be as close to CF Conventions 1.9, Example 7.12 as possible, while still introducing the concept of percentile threshold.
It differs in the following ways:
- n2 (spell length) has been removed for simplification.
- ..._below_... has been changed to ..._above_... to follow this issue.
Example 2: Only percentile threshold, timeseries
This example follows directly from Example 1. The only change is that it covers a longer time series, giving the number of days above a threshold per year for several consecutive years.
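The per-year counts in Example 2 could be computed roughly as in the following minimal sketch. It assumes the daily values have already been reduced by the within-days cell method (e.g. daily maximum), that "above" means strictly greater than the threshold, and that the function name is hypothetical; none of this is prescribed by the proposal.

```python
from datetime import date, timedelta

def days_above_threshold_per_year(start, daily_values, threshold):
    """Count, per calendar year, the days whose value is strictly above `threshold`.

    `daily_values` holds one value per consecutive day starting at `start`,
    already reduced within each day (e.g. daily maximum). Years in which no
    day exceeds the threshold still appear, with a count of 0.
    """
    counts = {}
    for offset, value in enumerate(daily_values):
        year = (start + timedelta(days=offset)).year
        # A day count is extensive in time: sum the per-day indicators
        counts[year] = counts.get(year, 0) + (1 if value > threshold else 0)
    return counts

# Hypothetical daily maxima (degC) spanning a year boundary
per_year = days_above_threshold_per_year(date(2000, 12, 30), [30, 10, 31, 32, 5], 25)
# {2000: 1, 2001: 2}
```

Summing per-day indicators is what the "sum" cell_methods entry over days expresses.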
Example 3: Percentile and numerical threshold
The following example contains data that has been computed for a threshold derived from a percentile of a climatology.
- n1 contains the number of days per year above that threshold. Note that the time coordinate is a dimensional coordinate and not a climatological one.
- percentile_threshold is the scalar coordinate that gives the percentile underpinning the threshold.
- threshold is the field of thresholds over space and reference_time, the latter meaning essentially day-of-year (see reference_time below for details).
- reference_time gives the reference period that was used for the calculation of the threshold from the percentile. In this case, the threshold is derived from a 5-day window centered on each day of year over a 30-year climatology.
Date: 2021-09-23
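The threshold derivation described for reference_time in Example 3 (a 5-day window centered on each day of year, pooled over a 30-year climatology) can be sketched as follows. This is an illustration under stated assumptions, not the method used to produce the example data: it assumes a 365-day (noleap) calendar, wraps the window around the year end within the same year, uses a nearest-rank percentile, and the function names and synthetic climatology are hypothetical.

```python
import math

N_DAYS = 365  # assumption: 365-day (noleap) calendar, so Feb 29 is ignored

def nearest_rank(sample, prob_percent):
    """Nearest-rank percentile of a list of numbers (one of several estimators)."""
    ordered = sorted(sample)
    rank = max(1, math.ceil(prob_percent / 100 * len(ordered)))
    return ordered[rank - 1]

def day_of_year_thresholds(years, prob_percent, half_window=2):
    """Return one threshold per day of year.

    `years` is a list of per-year lists of N_DAYS daily values. For each day of
    year, values from a (2 * half_window + 1)-day window centered on that day
    are pooled over all years before taking the percentile. The window wraps
    around the year end within the same year (a simplification).
    """
    if any(len(year) != N_DAYS for year in years):
        raise ValueError(f"every year must have {N_DAYS} daily values")
    thresholds = []
    for doy in range(N_DAYS):
        pool = [
            year[(doy + k) % N_DAYS]
            for year in years
            for k in range(-half_window, half_window + 1)
        ]
        thresholds.append(nearest_rank(pool, prob_percent))
    return thresholds

# Hypothetical 30-year climatology with a seasonal cycle and a slow trend
climatology = [
    [10 + 8 * math.sin(2 * math.pi * d / N_DAYS) + 0.05 * y for d in range(N_DAYS)]
    for y in range(30)
]
thresholds = day_of_year_thresholds(climatology, 90)  # one value per day of year
```

With half_window=2 and 30 years, each day-of-year percentile is taken over 150 pooled values, which is why the threshold varies along reference_time rather than being a single scalar.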