More clarity about expected lunisolar calendar behavior for large dates #2869

Open
Manishearth opened this issue May 30, 2024 · 27 comments

Comments

@Manishearth

Manishearth commented May 30, 2024

Prior context: unicode-org/icu4x#4917 in ICU4X, as well as unicode-org/icu4x#4713, unicode-org/icu4x#4904, and some others.

Temporal.PlainDate has a validity range of roughly ±100,000,000 days around the Unix epoch (about ±270,000 years). This is quite a large range, but it makes perfect sense for working with mathematically defined calendars like the Gregorian calendar: "which Gregorian day is 200,000 years in the future?" is a question with a reasonable answer.
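For reference, the ±100,000,000-day span is the same one ECMAScript's Date uses, so the calendar bounds can be printed with Date alone, no Temporal needed. This is a small sketch; the exact edge behavior of individual Temporal types is outside its scope.

```javascript
// ECMAScript Date limit: ±100,000,000 days around 1970-01-01T00:00Z.
const MS_PER_DAY = 86_400_000;
const LIMIT_DAYS = 100_000_000;

const minIso = new Date(-LIMIT_DAYS * MS_PER_DAY).toISOString();
const maxIso = new Date(LIMIT_DAYS * MS_PER_DAY).toISOString();

// Express the day span in mean Gregorian years (365.2425 days each).
const approxYears = LIMIT_DAYS / 365.2425;

console.log(minIso); // prints -271821-04-20T00:00:00.000Z
console.log(maxIso); // prints +275760-09-13T00:00:00.000Z
console.log(Math.round(approxYears)); // prints 273791
```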

However, when it comes to lunisolar calendars that depend on astronomical events (see footnote 1), and to some extent even solar calendars like the Persian calendar, answering the question "what is $date in $calendar?" becomes far murkier. For such calendars, there are three potential sources of answers:

 - "the ground truth": what people actually believe the details of the calendar to be. This is what is printed in almanacs, and it generally extends at most 100 years into the future. When there are potential ambiguities, for example when moonrise occurs extremely close to sunrise, the user community tends to make a call one way or the other.
 - "the space truth": what is actually going on in space, plugged into the definition of the calendar. This can be affected by higher-order characteristics of celestial orbits, as well as by some genuinely unpredictable effects over very long timescales.
 - "the math truth": what the algorithms say, and what computers produce when they run the algorithms. This is what's actually implementable, but it will diverge from the space truth due to celestial approximations, floating-point error, and unpredictable higher-order factors.

The long-term intangibility of ground truth means that there is no right answer for the behavior of such a calendar beyond maybe 100 years into the future. You can make informed guesses, but their accuracy dwindles quickly as time passes. Of course, the usefulness of the question also dwindles over time: the precise Chinese calendar date exactly 10,000 years from now is not really useful for anything other than idle curiosity.

(Similar considerations apply to the far past: there's little point debating the accuracy of a calendrical calculation for dates before the calendar came into existence.)

Given that Temporal expects implementations to support dates in a very large range, it is probably useful to provide guidance and invariants that implementations should follow when dealing with these issues.

Some questions that could be answered:

  • Should such dates be accepted by calendared Temporal.PlainDate in the first place?
  • What are the date ranges we strongly care about for different calendars? That is, ranges where we really do want agreement with ground truth, and ranges where, absent any known ground truth, we want relatively predictable behavior?
  • Would it be acceptable for such dates to "fall back" to showing Gregorian when "out of range", similar to how the modern Japanese calendar falls back to Gregorian for pre-Meiji eras?
  • Would it be acceptable for such dates to go through simplified arithmetical calculations that are known not to match the calendar definition but will mostly be fine anyway? (E.g. the Chinese calendar could be approximated as following a Metonic cycle with some method for determining which month is the leap month, or even by fixing the leap month to one specific month.)
  • How important is ISO roundtripping for these dates? (Probably extremely important.)
  • How important are calendar internal invariants for such dates? These are invariants that are part of the definition of the calendar. For example:
    • Is it acceptable for an Islamic or Chinese month in the far future to have a number of days other than 29 or 30?
    • Is it acceptable to have an Islamic year with a number of days other than 354 or 355?
  • How important are general calendrical invariants for such dates? These are invariants that deal with generic expectations of how dates and calendars work. For example, should:
    • adding one day always produce the next day in the month (or the first day of the next month)?
    • adding and then subtracting a duration always roundtrip?
    • (@sffc please add others here if you think them important)

(We found that "calendar internal" invariants and "general calendrical" invariants are often in tension when attempting to patch up algorithms to behave nicely for such dates)
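As a concrete picture of the Metonic-cycle approximation floated in the list above, here is a hypothetical leap-year rule that borrows the Hebrew calendar's fixed 19-year pattern (7 leap years, so 235 months per cycle). For the Chinese calendar this would only be an approximation, and it says nothing about which month gets doubled; the function names are illustrative only.

```javascript
// Floor-style modulo that also works for negative years.
const mod = (a, n) => ((a % n) + n) % n;

// Hebrew-style 19-year rule: years 3, 6, 8, 11, 14, 17 and 19 of each cycle have 13 months.
function isMetonicLeapYear(year) {
  return mod(7 * year + 1, 19) < 7;
}

function monthsInYear(year) {
  return isMetonicLeapYear(year) ? 13 : 12;
}

// Total months in the 19 consecutive years starting at startYear.
function monthsInCycle(startYear) {
  let total = 0;
  for (let y = startYear; y < startYear + 19; y++) total += monthsInYear(y);
  return total;
}

console.log(monthsInCycle(1), monthsInCycle(-1000), monthsInCycle(5784)); // 235 235 235
```

Because 7 and 19 are coprime, any 19 consecutive years contain exactly 7 leap years, so the month count per cycle never varies; that fixed rhythm is precisely what an astronomically defined calendar lacks.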

cc @hsivonen @anba

Footnotes

  1. All of them except tabular Islamic and Hebrew. The former follows a fixed arithmetic cycle of short and long years, and the latter is currently considered to follow a purely arithmetical system in which the lunation length is a known approximation expressed as an integer number of ḥalakim. This is a case where the ground truth is deliberately defined to ignore the space truth. It means the Hebrew calendar will slowly desynchronize from the lunar cycle, but that is ultimately expected and okay. Of course, future adjustments could still happen.
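The ḥalakim arithmetic in the footnote is easy to make concrete: an hour has 1080 ḥalakim, and the Hebrew mean lunation is 29 days, 12 hours, 793 ḥalakim. The sketch below compares it with an assumed modern mean synodic month of 29.530588853 days to estimate how slowly the desynchronization accumulates; that comparison value is an approximation and not part of the Hebrew calendar's definition.

```javascript
// Hebrew mean lunation ("molad interval"), counted in halakim.
const HALAKIM_PER_HOUR = 1080;
const HALAKIM_PER_DAY = 24 * HALAKIM_PER_HOUR; // 25,920
const LUNATION_HALAKIM = 29 * HALAKIM_PER_DAY + 12 * HALAKIM_PER_HOUR + 793;
const hebrewLunationDays = LUNATION_HALAKIM / HALAKIM_PER_DAY;

// Assumed approximate modern mean synodic month (it changes slowly over time).
const MEAN_SYNODIC_MONTH = 29.530588853;

// Drift of the fixed lunation against the assumed astronomical value.
const driftPerLunation = hebrewLunationDays - MEAN_SYNODIC_MONTH; // days
const lunationsPerMillennium = (1000 * 365.2425) / MEAN_SYNODIC_MONTH;
const driftDaysPerMillennium = driftPerLunation * lunationsPerMillennium;

console.log(LUNATION_HALAKIM, hebrewLunationDays.toFixed(7), driftDaysPerMillennium.toFixed(3));
// 765433 29.5305941 0.065
```

Under these assumptions the molad runs ahead by well under a tenth of a day per thousand years, which is why the drift is "expected and okay" on human timescales but not over Temporal's full range.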

@sffc
Collaborator

sffc commented May 30, 2024

Temporal gives 3 ways of representing a particular PlainDate:

  1. ISO: isoYear, isoMonth, and isoDay with a calendar system (this is the one we use internally in the spec)
  2. Codes: era, eraYear, monthCode, and day
  3. Scalars: year, month, and day (this is what ICU4X uses internally)

Being able to convert between all three representations without ambiguity is, I think, the most important invariant. I will call this the equivalence relation.
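For a lunisolar year with a leap month, the Codes-to-Scalars part of that equivalence can be sketched as below. It assumes Temporal-style month codes ("M01" to "M12", with a trailing "L" for a leap month, as in "M05L") and a hypothetical leapAfter parameter naming the regular month the leap month follows (null if the year has none); the helper names are illustrative.

```javascript
const pad2 = (n) => String(n).padStart(2, "0");

// Scalar ordinal month (1-based) -> month code, for one year.
function ordinalToMonthCode(month, leapAfter) {
  if (leapAfter === null || month <= leapAfter) return `M${pad2(month)}`;
  if (month === leapAfter + 1) return `M${pad2(leapAfter)}L`;
  return `M${pad2(month - 1)}`; // months after the leap month shift by one
}

// Month code -> scalar ordinal month, rejecting leap codes absent from this year.
function monthCodeToOrdinal(code, leapAfter) {
  const match = /^M(\d{2})(L?)$/.exec(code);
  if (!match) throw new RangeError(`malformed monthCode ${code}`);
  const num = Number(match[1]);
  if (match[2] === "L") {
    if (num !== leapAfter) throw new RangeError(`${code} does not occur in this year`);
    return num + 1;
  }
  return leapAfter !== null && num > leapAfter ? num + 1 : num;
}

console.log([5, 6, 7, 13].map((m) => ordinalToMonthCode(m, 5)).join(" ")); // M05 M05L M06 M12
```

The two functions are inverses for every month of a given year, which is the unambiguous conversion the equivalence relation asks for.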

Temporal also defines the following invariants for the scalar properties:

  • year is a signed integer representing the number of years relative to a calendar-specific epoch
  • The first month in every year has month equal to 1. The last month of every year has month equal to the monthsInYear property.
  • day is a positive integer representing the day of the month.

Following from these definitions and the equivalence relation are the following arithmetic invariants, which I coded into ICU4X in unicode-org/icu4x#4904: adding or subtracting 1 day in ISO, adding or subtracting 1 day in Codes, and adding or subtracting 1 day in Scalars must all be equivalent operations. One can write a proof that these invariants must be true for the above definitions to hold.
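Because the tabular Islamic calendar mentioned in the footnote is purely arithmetic, these invariants hold for it by construction and can be checked mechanically. The sketch below uses the standard arithmetic formulas from Reingold and Dershowitz's Calendrical Calculations (civil epoch, leap years where (14 + 11 * year) mod 30 < 11), with integer "fixed" day numbers standing in for ISO dates; the function names are illustrative and are not ICU4X's API. An astronomical calendar would plug different conversion functions into the same kind of check, and that is where the failures discussed in this thread appear.

```javascript
const ISLAMIC_EPOCH = 227015; // fixed day (R.D.) of 1 Muharram AH 1, civil epoch
const floorDiv = (a, b) => Math.floor(a / b);

function fixedFromIslamic(year, month, day) {
  return (
    day + 29 * (month - 1) + floorDiv(6 * month - 1, 11) +
    (year - 1) * 354 + floorDiv(3 + 11 * year, 30) + ISLAMIC_EPOCH - 1
  );
}

function islamicFromFixed(date) {
  const year = floorDiv(30 * (date - ISLAMIC_EPOCH) + 10646, 10631);
  const priorDays = date - fixedFromIslamic(year, 1, 1);
  const month = floorDiv(11 * priorDays + 330, 325);
  const day = date - fixedFromIslamic(year, month, 1) + 1;
  return { year, month, day };
}

function monthLength(year, month) {
  const nextStart =
    month === 12 ? fixedFromIslamic(year + 1, 1, 1) : fixedFromIslamic(year, month + 1, 1);
  return nextStart - fixedFromIslamic(year, month, 1);
}

// "Add one day" computed on the calendar fields alone (the Scalars view).
function addOneDay({ year, month, day }) {
  if (day < monthLength(year, month)) return { year, month, day: day + 1 };
  if (month < 12) return { year, month: month + 1, day: 1 };
  return { year: year + 1, month: 1, day: 1 };
}

const same = (a, b) => a.year === b.year && a.month === b.month && a.day === b.day;

// Checks roundtripping, 29/30-day months, and add-one-day equivalence over a day range.
function checkInvariants(startFixed, count) {
  for (let d = startFixed; d < startFixed + count; d++) {
    const date = islamicFromFixed(d);
    if (fixedFromIslamic(date.year, date.month, date.day) !== d) return `roundtrip failed at ${d}`;
    const len = monthLength(date.year, date.month);
    if (len !== 29 && len !== 30) return `month of ${len} days at ${d}`;
    if (!same(addOneDay(date), islamicFromFixed(d + 1))) return `add-one-day mismatch at ${d}`;
  }
  return "ok";
}

console.log(checkInvariants(ISLAMIC_EPOCH - 50_000, 200_000)); // prints ok
```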

Calendars that seem like they don't obey these invariants should be modified to do so. For example, juligreg skips 10 days in October 1582, violating the definition of day. To fix this, that particular month should be shortened and the day field should be adjusted to fill the gap. The offset can and should be handled during formatting.
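A minimal sketch of that fix for the original papal switch, in which October 4, 1582 was followed by October 15, 1582: the scalar day runs 1 to 21 with no gap, and the ten-day offset only appears when converting to and from displayed day numbers. The constants and helper names are hypothetical, and since many regions switched later, a real juligreg calendar would take the cutover as a parameter.

```javascript
const LAST_DAY_BEFORE_GAP = 4; // October 4, 1582 (Julian)
const SKIPPED = 10; // October 5-14, 1582 do not exist
const DAYS_IN_OCTOBER_1582 = 31 - SKIPPED; // 21 scalar days

// Scalar day (1..21) -> day number shown to users (1..4, then 15..31).
function displayDay(scalarDay) {
  if (scalarDay < 1 || scalarDay > DAYS_IN_OCTOBER_1582) throw new RangeError(`no day ${scalarDay}`);
  return scalarDay <= LAST_DAY_BEFORE_GAP ? scalarDay : scalarDay + SKIPPED;
}

// Displayed day -> scalar day; the skipped dates are rejected.
function scalarDay(display) {
  if (display > LAST_DAY_BEFORE_GAP && display <= LAST_DAY_BEFORE_GAP + SKIPPED) {
    throw new RangeError(`October ${display}, 1582 was skipped`);
  }
  const day = display <= LAST_DAY_BEFORE_GAP ? display : display - SKIPPED;
  if (day < 1 || day > DAYS_IN_OCTOBER_1582) throw new RangeError(`no October ${display}, 1582`);
  return day;
}

console.log(displayDay(4), displayDay(5), displayDay(21)); // 4 15 31
```

With this split, "add one day" on the scalar fields never has to know about the gap.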

For reasons the champions have discussed previously, I think it is wise for Temporal to enforce these invariants. It allows careful developers to craft calendar-independent logic: no matter which calendar is in use, there are certain operations that are always sound, operations derived from the above invariants.

@khawarizmus
Contributor

@Manishearth

How important are calendar internal invariants for such dates? These are invariants that are part of the definition of the calendar. For example:
Is it acceptable for an Islamic or Chinese month in the far future to have a number of days other than 29 or 30?
Is it acceptable to have an Islamic year with a number of days other than 354 or 355?

We have finalized a proposal named the Hijri week calendar (HWC), a counterpart of the ISO calendar for Hijri calendars, and we have a working Temporal implementation of it.

While doing so, we realized that some Hijri calendars, like islamic and islamic-rgsa, don't follow the invariants of the Hijri calendar you mentioned. We consider that a bug that could be fixed in the ICU implementation.

I am mentioning this so that a fix making these calendars compatible with the HWC can be considered, as we are exploring porting the HWC to CLDR.

@ptomato
Collaborator

ptomato commented Sep 6, 2024

I'll try to answer the above questions to the best of my knowledge, others please chime in if you feel that I missed the mark:

Should such dates be accepted by calendared Temporal.PlainDate in the first place?

Yes. plainDate.withCalendar() should not throw if you give it a valid calendar ID.

What are the date ranges we strongly care about for different calendars? Ones where we really do want accuracy with ground truth and relatively predictable behavior where there is no directly known ground truth?

I don't have any reason to pick a particular range so I'll arbitrarily say "1000 years before and after the present." That said, we assume the Gregorian calendar is proleptic (extended arbitrarily far into the past and the future), so we should do so for other calendars.

Note, we also assume that existing time zone DST rules continue arbitrarily far into the future, until defined to be otherwise; so Temporal.ZonedDateTime.from('+010000-01-01[America/New_York]').getTimeZoneTransition('next') gives 3 AM on March 12, 10000 CE even though I'd be willing to wager money that in 8000 years, New York will not be using the same DST rules as today.
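The "rules continue indefinitely" assumption amounts to evaluating today's rule for any year. As a sketch, the current US rule (DST starts on the second Sunday in March) can be extended to year 10000 with the built-in Date object, without Temporal; secondSundayOfMarch is a hypothetical helper.

```javascript
// Day of March on which the second Sunday falls, computed proleptically in UTC.
function secondSundayOfMarch(year) {
  const d = new Date(0);
  d.setUTCFullYear(year, 2, 1); // March 1 (month index 2); avoids the 0-99 two-digit-year mapping
  const weekday = d.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const firstSunday = 1 + ((7 - weekday) % 7);
  return firstSunday + 7;
}

console.log(secondSundayOfMarch(2024), secondSundayOfMarch(10000)); // 10 12
```

For year 10000 it returns 12, consistent with the March 12 transition mentioned above; the time zone case is easy precisely because the rule is a simple weekday pattern.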

That said, I think this is always going to be a best-effort thing, at least to some degree. I don't have any good ideas for making sure that best effort is consistent across browsers, especially when it's sometimes not even clear what past dates in a calendar were.

Would it be acceptable for such dates to "fall back" to showing Gregorian when "out of range", similar to how the modern Japanese calendar falls back to Gregorian for pre-Meiji eras?

I'd say that depends on the calendar and the cultural expectations of its users.

Would it be acceptable for such dates to go through simplified arithmetical calculations that are known to not match the calendar definition but will mostly be fine anyway? (E.g. the Chinese calendar could be approximated to following a Metonic cycle with some method for determining which month is the leap month, or even just fixing the leap month to one specific month)

I'd say no, we should not recommend this if we wouldn't recommend it for the Gregorian calendar.

How important is ISO roundtripping for these dates (probably extremely important)

Extremely important, otherwise one of our fundamental assumptions breaks down.

How important are calendar internal invariants for such dates? These are invariants that are part of the definition of the calendar. For example:

  • Is it acceptable for an Islamic or Chinese month in the far future to have a number of days other than 29 or 30?
  • Is it acceptable to have an Islamic year with a number of days other than 354 or 355?

Here also I'd say we should not recommend breaking these invariants if we wouldn't recommend it for the Gregorian calendar. E.g. we would not fix all far-future years to have 365 days and all far-future Februaries to have 28 days.

How important are general calendrical invariants for such dates? These are invariants that deal with generic expectations on how dates and calendars work.

These I'd put in the same category as ISO roundtripping: extremely important.

@Manishearth
Author

These answers are incompatible with each other: something has to give here. We have found examples of dates where the invariants start falling apart for both the islamic and chinese calendars.

Out of the following three things, one must go:

  • Stay faithful to the "formula"
  • Stay faithful to general calendrical invariants
  • ISO roundtripping (without it you can fudge the calendar to match invariants when the formula breaks them)

And, stepping back, I don't even think it is accurate to describe these calendars as having a "formula" in the first place; they don't work that way, as I sketched out in the issue.

That said, we assume the Gregorian calendar is proleptic, so we should do so for other calendars.

I don't think that assumption is as easy to make for lunisolar calendars, since the Gregorian calendar is largely mathematical and they are not (except for Hebrew). For the Gregorian calendar it actually makes some sense to have a "proleptic" version; for these calendars there is no clear answer as to what that means. In the issue above I gave three differing answers as to what a proleptic version might mean; only one of them is actually implementable on computers, and even then there can be multiple equally valid formulae that give different answers.

Part of the point of this issue is to tease out what we mean when we are trying to construct a proleptic version of these calendars.

I'd say no, we should not recommend this if we wouldn't recommend it for the Gregorian calendar.

I don't think that justification holds. "Fall back to a simplified mathematical model after X years" works perfectly well for the Gregorian calendar, it is a simplified mathematical model.

Note, we also assume that existing time zone DST rules continue arbitrarily far into the future, until defined to be otherwise; so Temporal.ZonedDateTime.from('+010000-01-01[America/New_York]').getTimeZoneTransition('next') gives 3 AM on March 12, 10000 CE even though I'd be willing to wager money that in 8000 years, New York will not be using the same DST rules as today.

This isn't the same thing: the time zone is defined in a way that lets us very easily make a proleptic version. That is not true for lunisolar calendars.

@Manishearth
Author

Also, stepping back a bit, it appears that some of the justifications here are because we don't want Gregorian to be "special", which is a reasonable concern in an i18n context. But I'd argue that forcing other calendars to conform to the expectations of the Gregorian calendar — that they be well defined for arbitrary periods of time — is treating Gregorian as special in a far worse way.

@justingrant
Collaborator

Out of the following three things, one must go:

  • Stay faithful to the "formula"
  • Stay faithful to general calendrical invariants
  • ISO roundtripping (without it you can fudge the calendar to match invariants when the formula breaks them)

If we must prioritize, then I'd suggest that ISO roundtripping is the most important, because all of Temporal stores dates using the ISO calendar. You'd end up with really unexpected behavior if you couldn't guarantee that the input to APIs like Temporal.PlainDate.from matches the output of properties like year or day. This seems like it'd break many apps in fairly fundamental and obvious ways.

For the next priority, I think it's "Stay faithful to general calendrical invariants" because if you can't depend on behavior like "adding one day always produce the next day in the month (or the first day in the next month)", then it will also break apps. This is the kind of issue that would, for example, break fuzz-testing of apps that expect calendrical invariants to hold, although the breaks wouldn't be nearly as common or obvious as breaks in ISO roundtripping.

So I think that leaves faithfulness to the formula as the one that gets the short straw. That said, I don't exactly know what this means. Does it mean that you define a new formula that matches the old one for near dates but diverges somewhat for dates far from today, in order to retain calendar invariants? If so, honestly that sounds fine, because formulas (like all calendar calculations since the dawn of time!) have always been approximations of celestial behavior, and those approximations have a habit of being revised and improved from time to time. Perfection is not a reasonable expectation for the far past or far future.

I definitely agree with @ptomato that falling back to Gregorian would be unexpected and confusing for users, so I'd suggest an imprecise formula is better than precise (but clearly wrong culturally) Gregorian dates.

  • adding and then subtracting a duration always roundtrip?

Note that this is *not* an invariant, even in Gregorian, if the duration contains months or years, because of the complexity of handling arithmetic around variable-length months (and, in lunisolar calendars, variable-length years). The only cross-calendar invariant is that adding and then subtracting should roundtrip for durations made only of fixed-length date units like days or weeks. (And even weeks might get a bit dicey when rounding is involved, although I'd have to think it through more to know if it's OK or not.)
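The Gregorian case can be shown with plain ISO arithmetic. This sketch assumes Temporal's default overflow: "constrain" behavior (clamp the day to the end of the target month) and does not use Temporal itself; addMonths is an illustrative helper.

```javascript
function isLeapYear(y) {
  return (y % 4 === 0 && y % 100 !== 0) || y % 400 === 0;
}

function daysInMonth(y, m) {
  return [31, isLeapYear(y) ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31][m - 1];
}

// Add n months, clamping the day to the length of the resulting month ("constrain").
function addMonths({ year, month, day }, n) {
  const total = year * 12 + (month - 1) + n; // months counted from year 0, month 1
  const y = Math.floor(total / 12);
  const m = total - y * 12 + 1;
  return { year: y, month: m, day: Math.min(day, daysInMonth(y, m)) };
}

const start = { year: 2023, month: 1, day: 31 };
const there = addMonths(start, 1); // { year: 2023, month: 2, day: 28 }
const back = addMonths(there, -1); // { year: 2023, month: 1, day: 28 }, not January 31
```

Clamping loses information (which day of January we started on), so no choice of subtraction can undo it; a duration of days carries no such clamping step.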

@sffc
Copy link
Collaborator

sffc commented Sep 6, 2024

Calendars which are subject to rounding and observational errors involving the moon, sun, stars, and planets (such as Chinese and Islamic) are only well-defined from the point at which they were created to a point perhaps a few decades into the future, or as long as published almanacs reach.

Since Temporal requires all calendars to be well-defined over the entire Temporal range, which is far greater than these calendars are actually well-defined, my preference is that these calendars fall back to a somewhat-reasonable formula outside of a certain range, a formula which obeys the roundtrip and arithmetic invariants and keeps the year lengths roughly consistent over the whole Temporal range.

I am also okay with a solution where out-of-range dates fall back to the proleptic Gregorian calendar.

@ptomato
Collaborator

ptomato commented Sep 7, 2024

We have found examples of dates where the invariants start falling apart for both the islamic and chinese calendars.

This is helpful information, thanks! I didn't get that out of the original post. I was assuming "proleptic" meant "the math truth", as you phrased it, since that is the only computable one. I didn't realize you were saying that the math truth already leads to contradictions.

Out of curiosity, what happens in these cases? Is this what you were talking about with Islamic years and Chinese months with the wrong number of days?

@sffc
Collaborator

sffc commented Sep 7, 2024

In Islamic, the number of days isn't always 29 or 30.

In Chinese, the new year drifts around. The Chinese new year is supposed to fall between January 20 and February 21, I think, but according to the formulas we're using, it starts drifting to January 19, January 18, and so on, and eventually even into December or earlier for dates about 10,000 years in the future, which is within the Temporal range. I also discovered that Pope Gregory's calendar still has a drift, just a much slower one (it is off by a day roughly every ~3,000 years, compared with roughly every ~128 years for the Julian calendar it replaced), so I'm not sure what is mainly at fault: Gregorian drift, imprecise solstice calculations, or sidereal drift (movement of the north star, which impacts the solstices).
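For comparison, how fast a calendar drifts against the seasons follows from its mean year length. The sketch below assumes an approximate mean tropical year of 365.24219 days, a value that itself changes slowly over time; yearsPerDayOfDrift is a hypothetical helper.

```javascript
const TROPICAL_YEAR = 365.24219; // days, assumed approximate modern value
const JULIAN_YEAR = 365 + 1 / 4; // one leap day every 4 years
const GREGORIAN_YEAR = 365 + 97 / 400; // 97 leap days every 400 years

// Years needed for the calendar to be off from the seasons by one full day.
function yearsPerDayOfDrift(meanCalendarYear) {
  return 1 / Math.abs(meanCalendarYear - TROPICAL_YEAR);
}

console.log(Math.round(yearsPerDayOfDrift(JULIAN_YEAR))); // 128
console.log(Math.round(yearsPerDayOfDrift(GREGORIAN_YEAR))); // 3226
```

Over the roughly 10,000 years mentioned above, that amounts to about three days of Gregorian drift, so it is one plausible contributor to the new-year shift but probably not the whole story.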

@Manishearth
Author

For the next priority, I think it's "Stay faithful to general calendrical invariants" because if you can't depend on behavior like "adding one day always produce the next day in the month (or the first day in the next month)", then it will also break apps.

I guess I misspoke here: there are two kinds of calendar invariants at play. I talk about both in the issue but didn't list both here. There are "general" calendrical invariants, like "adding a day gives the next day in the month or the first day in the next month", and "calendar internal" invariants, like "Islamic months are always 29 or 30 days". Do we prioritize these differently?

Generally I think that we're going to have to sacrifice faithfulness anyway so maintaining some of these invariants seems to not be too costly.


It is worth noting that "adding a day gives you the next day in the month" isn't quite a general calendrical invariant: The Hindu calendar follows a lunar notion of "day" that for everyday use is mapped to the solar day. That mapping is not one-to-one, in common reckoning you can have "merged" days and "double" days, which work about how you'd expect for religious observances.

There are some details on the mapping at https://books.google.com/books?id=Fb9Zc0yPVUUC&pg=PA20#v=onepage&q&f=false. Basically, the solar day takes its name from the most recent lunar day that started before its sunrise. Because the length of a lunar day varies (it is defined in terms of angular displacement, but the moon's orbit is elliptical), a lunar day can be either shorter or longer than a sunrise-to-sunrise solar day. So you can have sunrise-to-sunrise periods in which no new lunar day starts (leading to a double/extra/leap day), and periods in which two lunar days start (leading to a deleted/merged day). Festivities seem more commonly to follow a similar rule, but keyed to noon instead. It's complicated.
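The naming rule can be sketched as follows, assuming the standard definition of a tithi (lunar day) as 12 degrees of moon-sun elongation, 30 per lunation. The elongation values are made-up inputs; a real implementation would compute them from an astronomical model for a given location and sunrise time, which is exactly where the "math truth" concerns above come in. The function names are hypothetical.

```javascript
// Floor-style modulo that also works for negative inputs.
const mod = (a, n) => ((a % n) + n) % n;

// Tithi number (1..30) for a moon-sun elongation in degrees.
function tithiAt(elongationDeg) {
  return Math.floor(mod(elongationDeg, 360) / 12) + 1;
}

// Names each solar day after the tithi in effect at its sunrise, and labels it:
// "normal" (advanced by one tithi), "double" (no new tithi began since the previous
// sunrise), or "merged" (a tithi began and ended between sunrises and names no day).
function nameSolarDays(elongationsAtSunrise) {
  const days = [];
  for (const elongation of elongationsAtSunrise) {
    const tithi = tithiAt(elongation);
    let kind = "normal";
    if (days.length > 0) {
      const step = mod(tithi - days[days.length - 1].tithi, 30);
      if (step === 0) kind = "double";
      else if (step > 1) kind = "merged";
    }
    days.push({ tithi, kind });
  }
  return days;
}

const sample = nameSolarDays([0.5, 12.5, 24.0, 35.9, 49.0, 61.0]);
console.log(sample.map((d) => `${d.tithi}:${d.kind}`).join(" "));
// 1:normal 2:normal 3:normal 3:double 5:merged 6:normal
```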

@dminor

dminor commented Sep 10, 2024

This issue is a potential blocker for us enabling Temporal in Nightly builds of Firefox; as it stands, the ICU4X code we use hits debug assertions with large dates, which might prevent us from fuzz testing our implementation properly. We can work around this, but having a resolution here would be preferable :)

@sffc
Collaborator

sffc commented Sep 11, 2024

About calendars with lunar days (my favorite is the Hawaiian calendar but it sounds like Hindu does this too?): in my opinion we let the ship sail when we said in Temporal that there is a 1-to-1 mapping to ISO-8601. So, a calendar with lunar days should use solar days for arithmetic purposes. However, a calendar with lunar days can absolutely add a new field to access this information, such as lunarDays that returns an array with 1 or 2 day numbers.

@Manishearth
Author

About calendars with lunar days (my favorite is the Hawaiian calendar but it sounds like Hindu does this too?): in my opinion we let the ship sail when we said in Temporal that there is a 1-to-1 mapping to ISO-8601. So, a calendar with lunar days should use solar days for arithmetic purposes. However, a calendar with lunar days can absolutely add a new field to access this information, such as lunarDays that returns an array with 1 or 2 day numbers.

Yeah, the main problem is that this affects formatting: the name of the solar day derives from the lunar day(s) it ties to.

Which isn't a huge deal for Temporal, but may be annoying for ICU4X. I do think lunarDay is an elegant solution for this.

Anyway, not exactly the point of this discussion.

@Manishearth
Author

Let me see if I can put forth a reasonable conclusion here:

We have the following inviolable invariants:

  • ISO roundtripping must work
  • General calendrical invariants must always hold, e.g. "adding a day gives you the next day or the first day in the next month".
  • Dates must be supported in the full Temporal range.

Calendar implementations, in decreasing order of priority:

  • SHOULD follow known ground-truth/almanac dates where possible (typically in dates near-present). This may involve deviation from published algorithms or other calendrical understandings
  • SHOULD follow published algorithms or other calendrical understandings for near-modern dates (±1000 years)
  • MAY use approximations outside of that
  • MAY deviate from calendar-specific invariants if needed
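The priority order above could be wired into an implementation roughly as follows. This is a hypothetical sketch: the strategy names and the 1900 to 2100 almanac window in the usage line are placeholders, and the ±1000-year span comes from the second SHOULD.

```javascript
// Picks which source of truth to use for a given ISO year (placeholder bounds).
function chooseStrategy(isoYear, { almanacStart, almanacEnd, presentYear, algorithmSpan = 1000 }) {
  if (isoYear >= almanacStart && isoYear <= almanacEnd) return "almanac-data";
  if (Math.abs(isoYear - presentYear) <= algorithmSpan) return "published-algorithm";
  return "approximation";
}

const opts = { almanacStart: 1900, almanacEnd: 2100, presentYear: 2024 };
console.log(chooseStrategy(2050, opts), chooseStrategy(1066, opts), chooseStrategy(10000, opts));
// almanac-data published-algorithm approximation
```

Whichever strategy is selected, the inviolable invariants still apply, so adjacent strategies have to hand off without skipping or duplicating days; otherwise ISO roundtripping would break right at the seam.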

@ptomato
Collaborator

ptomato commented Sep 11, 2024

If we need to choose, I'd like to think about deprioritizing "Dates must be supported in the full Temporal range." It seems less important to me than either of the SHOULDs in that list.

@Manishearth
Author

Personally that was my hope but it doesn't seem to be the way this discussion has gone so far.

@sffc
Collaborator

sffc commented Sep 12, 2024

If we need to choose, I'd like to think about deprioritizing "Dates must be supported in the full Temporal range." It seems less important to me than either of the SHOULDs in that list.

@ptomato, how do you propose handling dates that are outside of a calendar's supported range?

What I don't think we should do is make it a data-driven exception. The following code should not throw an exception for some users but not others:

```javascript
let calendar = new Intl.DateTimeFormat().resolvedOptions().calendar;
myTemporalDate.withCalendar(calendar);
```

So, if we made this change, Temporal should normatively specify the range of dates that ICU4X needs to support. It already does this, but the current range is so large that we need to have these discussions.

@ptomato
Collaborator

ptomato commented Sep 12, 2024

If I had to choose, I'd prefer a data-driven exception above giving dates contrary to the known ground-truth in the near-present. (I'm not sure if those things are directly in opposition to each other, anyway.)

@sffc
Collaborator

sffc commented Sep 12, 2024

If I had to choose, I'd prefer a data-driven exception above giving dates contrary to the known ground-truth in the near-present. (I'm not sure if those things are directly in opposition to each other, anyway.)

Hm? ICU4X would be buggy if it returned "dates contrary to the known ground-truth in the near-present". This discussion is about dates far away from the present.

@ptomato
Collaborator

ptomato commented Sep 12, 2024

I'm talking about

  • SHOULD follow known ground-truth/almanac dates where possible (typically in dates near-present). This may involve deviation from published algorithms or other calendrical understandings
  • SHOULD follow published algorithms or other calendrical understandings for near-modern dates (±1000 years)

Both of those things seem higher priority to me than supporting the entire half-million-year range. If I have to choose between getting a wrong date for 1066-09-20 or getting a data-driven exception for -271821-04-20, I'd choose the latter.

@Manishearth
Copy link
Author

Personally, I'm very happy with deprioritizing support for the "whole Temporal range". I think it is better UX to give users exceptions than to pretend to give meaningful answers. It appeared that people in prior discussions considered the range a sacred invariant, so I didn't want to poke at it too much.

I do think, as Shane says, we should spec what the validity range is per-calendar rather than based on what implementations decide.

@sffc
Collaborator

sffc commented Sep 12, 2024

I'm talking about

  • SHOULD follow known ground-truth/almanac dates where possible (typically in dates near-present). This may involve deviation from published algorithms or other calendrical understandings
  • SHOULD follow published algorithms or other calendrical understandings for near-modern dates (±1000 years)

Both of those things seem higher priority to me than supporting the entire half-million-year range. If I have to choose between getting a wrong date for 1066-09-20 or getting a data-driven exception for -271821-04-20, I'd choose the latter.

I don't think anyone is proposing that we give the wrong date for 1066-09-20.

I'm okay with the following three proposals, in order of preference:

  1. Use ground truth data for near-modern dates and switch to algorithmic approximations which obey the arithmetic invariants for far-away dates
  2. Use ground truth data for near-modern dates and switch to Gregorian years and eras for far-away dates, similar to Japanese
  3. Specify in the Temporal proposal a smaller range of dates that are valid across the board in non-ISO calendars

Things I don't currently support:

  1. Data-driven exceptions where implementations decide what range of dates to support for a calendar
  2. Specify a different range of dates in each calendar system that are valid
  3. Using an algorithmic approximation for near-modern dates if the algorithmic approximation doesn't work

@Manishearth
Author

I am also in support of all three options there. My strong preference is having some consensus here rather than leaving it up to implementors. My weak preference is probably 3 > 1 > 2. My current proposal above is Shane Proposal 1.

I also agree with Shane's points 1 and 3 for "things I don't currently support". I'm okay with the date ranges being per-calendar, but I think doing across-calendar ranges is reasonably doable.

My caveat for Shane Proposal 3 is that implementations should still have the freedom to switch to algorithmic approximations if they have trouble covering the smaller range of dates, though the proposal changes the priority order in a way that makes that less necessary.

@justingrant
Collaborator

I agree with @ptomato that it's OK if the full Temporal range is not supported for some calendars.

I also think that @sffc's suggestion is reasonable that we should specify a range that does *not* throw for all built-in calendars. If we do this, then I'd suggest that a reasonable range would be -10_000 to 10_000 which I expect would cover all calendars' recorded history.

2. Use ground truth data for near-modern dates and switch to Gregorian years and eras for far-away dates, similar to Japanese

I'm not a fan of this option for most calendars, which have only one or two eras, so adding another would be confusing for users. Doing this in Japanese doesn't seem as bad, because Japanese users already expect a large number of eras, so adding another seems less disruptive.

@Manishearth
Author

I think that's fine, though for a range that large some calendars will still have to fall back to arithmetical approximations; I think some of the Islamic and Chinese oddities appear not that far in the future.

@sffc
Collaborator

sffc commented Sep 13, 2024

The oldest epoch of a CLDR calendar is 5492 BCE, Ethiopian Amete Alem's creation of the world. So that seems like a reasonable start point. For the end point, maybe something around 5000 CE.
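Note that ISO 8601 and Temporal use astronomical year numbering, in which 1 BCE is year 0, so a bound stated as "5492 BCE" corresponds to ISO year -5491. The sketch below takes the suggested endpoints at face value; the helper names and exact bounds are hypothetical.

```javascript
// Astronomical year numbering: 1 BCE -> 0, 2 BCE -> -1, and so on.
function isoYearFromBCE(bceYear) {
  return 1 - bceYear;
}

// Hypothetical bounds using the endpoints suggested above.
const MIN_ISO_YEAR = isoYearFromBCE(5492); // -5491
const MAX_ISO_YEAR = 5000;

function inProposedRange(isoYear) {
  return isoYear >= MIN_ISO_YEAR && isoYear <= MAX_ISO_YEAR;
}

console.log(MIN_ISO_YEAR, inProposedRange(-5491), inProposedRange(-5492)); // -5491 true false
```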

This will still get into ranges where we have to implement some workarounds in Chinese and Islamic, though. The calendrical calculations fail as soon as a few hundred years into the future, although I have workarounds implemented in ICU4X that make them work a little bit longer than that.

@ptomato
Collaborator

ptomato commented Sep 19, 2024

@Manishearth We discussed this in today's Temporal meeting. Everyone is happy with your proposal on the table in #2869 (comment). Those of us who advocated throwing on extreme dates have been convinced otherwise, because it'd cause unexpected (data-driven) exceptions in user applications.
