
RFC: Change value type backing to 128-bit integers #2084

Open
cjdsellers opened this issue Nov 30, 2024 · 6 comments
Labels
enhancement New feature or request RFC A request for comment

Comments

@cjdsellers
Member

cjdsellers commented Nov 30, 2024

The Nautilus value types Price, Money, and Quantity are currently backed by 64-bit integer raw values, with a maximum precision of 9 decimals enforced through validation (see table below). This design provides memory efficiency and high performance, using simple fixed-point arithmetic to represent actual values. (Integers are more CPU-native than string or decimal types, and are highly efficient for comparisons.)
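The fixed-point scheme described above can be sketched as follows. This is an illustrative example, not the actual nautilus_trader API; the function names are hypothetical, but the representation (raw = value × 10^precision) matches the description.

```rust
// Hypothetical sketch of fixed-point backing at the current max precision of 9.
// A value is stored as raw = value * 10^precision, so 1.5 at precision 9
// is the integer 1_500_000_000.

const FIXED_PRECISION: u32 = 9;
const FIXED_SCALAR: i64 = 10i64.pow(FIXED_PRECISION); // 1_000_000_000

fn raw_from_f64(value: f64) -> i64 {
    (value * FIXED_SCALAR as f64).round() as i64
}

fn raw_to_f64(raw: i64) -> f64 {
    raw as f64 / FIXED_SCALAR as f64
}

fn main() {
    let a = raw_from_f64(1.5);
    let b = raw_from_f64(0.25);
    // Arithmetic and comparisons operate directly on plain integers.
    assert_eq!(a + b, raw_from_f64(1.75));
    assert!(a > b);
    println!("raw(1.5) = {a}");
}
```

Because all values with the same precision share one scalar, addition and ordering need no normalization step, which is what makes the integer backing cheap.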

Limitations of the current 64-bit specification:

  • 9 decimals of precision cannot fully represent all fractional units for certain digital assets (cryptocurrencies).
  • The allowable minimum and maximum value range is limiting for some use cases.
  • For Quantity, larger time frame bar volumes cannot be adequately represented.

To address these issues, increasing the raw integer width from 64 bits to 128 bits will significantly expand the precision and allowable range.

Current specification (64-bit integer backing):

| Type | Raw backing | Max precision | Min value | Max value |
|---|---|---|---|---|
| Price | i64 | 9 | -9,223,372,036 | 9,223,372,036 |
| Money | i64 | 9 | -9,223,372,036 | 9,223,372,036 |
| Quantity | u64 | 9 | 0 | 18,446,744,073 |

Proposed specification (128-bit integer backing):

| Type | Raw backing | Max precision | Min value | Max value |
|---|---|---|---|---|
| Price | i128 | 18 | -170,141,183,460 | 170,141,183,460 |
| Money | i128 | 18 | -170,141,183,460 | 170,141,183,460 |
| Quantity | u128 | 18 | 0 | 340,282,366,920 |

Pros of 128-bit backing:

  • 18 decimals of precision can represent practically all fractional units, making it suitable for most cryptocurrencies.
  • Increased headroom resolves limitations in representing large bar volumes and extreme price values.

Cons of 128-bit backing:

  • Increased in-memory footprint (nearly doubling for some types such as QuoteTick).
  • While Parquet efficiently compresses data, there will still be a modest increase in storage size.
  • Breaking change to the specification and API, requiring catalogs to be rewritten when upgrading to the latest version.

Other solutions, such as introducing an additional scaling factor field and using arithmetic for more flexible precisions, were considered. However, these approaches significantly increased complexity and the likelihood of bugs, as raw values are directly accessed in many parts of the codebase.
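To make the rejected alternative concrete, a per-value scaling factor would look roughly like the sketch below (all names hypothetical). Every arithmetic or comparison site would first have to normalize the two scales, and that normalization step at every raw-value access is where the added complexity and bug surface come from.

```rust
// Hypothetical sketch of the rejected per-value scaling-factor design.
// Each value carries its own decimal exponent, so any two values must be
// rescaled to a common exponent before they can be compared or added.

#[derive(Debug, Clone, Copy)]
struct ScaledValue {
    raw: i64,
    scale: u32, // number of decimal places encoded in `raw`
}

fn normalize(a: ScaledValue, b: ScaledValue) -> (i64, i64, u32) {
    let scale = a.scale.max(b.scale);
    let ra = a.raw * 10i64.pow(scale - a.scale);
    let rb = b.raw * 10i64.pow(scale - b.scale);
    (ra, rb, scale)
}

fn add(a: ScaledValue, b: ScaledValue) -> ScaledValue {
    let (ra, rb, scale) = normalize(a, b);
    ScaledValue { raw: ra + rb, scale }
}

fn main() {
    let a = ScaledValue { raw: 1_500, scale: 3 }; // 1.500
    let b = ScaledValue { raw: 25, scale: 2 };    // 0.25
    let sum = add(a, b);                          // 1.750 at scale 3
    println!("{sum:?}");
}
```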

Mitigations

For the in-memory cache, the number of 128-bit fields across all objects is relatively small, and for data it is further constrained by deques with a defined maxlen.

For the in-memory footprint of data during large backtests, this is mostly alleviated by streaming from the Parquet data catalog where only a limited amount of data is held in memory at a time. Users who are using the BacktestEngine directly will be most affected.

The breaking change to catalogs is a temporary inconvenience. Users can decide whether to upgrade to nautilus_trader versions with 128-bit backing or choose the timing of their upgrade to align with their operational needs. Alternatively, they can compile from source with the existing 64-bit backing.

Implementation

To provide flexibility, @twitu is implementing a solution where a high-precision feature flag will control whether these values are backed by 64-bit or 128-bit integers. This allows users to:

  • Retain the current 64-bit specification for reduced memory usage.
  • Opt-in to 128-bit backing to overcome precision and range limitations.
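A compile-time switch like this is typically expressed with conditional type aliases behind a Cargo feature; a minimal sketch of the idea (the alias and constant names here are illustrative, not necessarily those used in nautilus_trader):

```rust
// Illustrative sketch of selecting the raw backing type at compile time
// via a Cargo feature flag named "high-precision".

#[cfg(feature = "high-precision")]
pub type PriceRaw = i128;
#[cfg(not(feature = "high-precision"))]
pub type PriceRaw = i64;

#[cfg(feature = "high-precision")]
pub const FIXED_PRECISION: u8 = 18;
#[cfg(not(feature = "high-precision"))]
pub const FIXED_PRECISION: u8 = 9;

pub struct Price {
    pub raw: PriceRaw,
    pub precision: u8,
}

fn main() {
    let p = Price { raw: 1_500_000_000, precision: FIXED_PRECISION };
    println!("raw width: {} bytes", std::mem::size_of::<PriceRaw>());
    let _ = p.precision;
}
```

Code that only goes through the type alias compiles unchanged under either feature setting; only serialization formats that encode the raw width directly (such as Parquet catalogs) observe the difference.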

#2072

Final comments

While the increased precision and range may be unnecessary or even excessive for traditional financial assets, the limitations are far more severe for crypto users. Traditional users may appreciate the lower memory footprint as a "nice to have," but crypto users are constrained by the 9-decimal precision cap and a more restricted range.

It's expected that high-precision will quickly become the default, as the advantages outweigh the trade-offs for most users.

We're open to feedback and suggestions on the above.

@cjdsellers cjdsellers added enhancement New feature or request RFC A request for comment labels Nov 30, 2024
@davidsblom
Collaborator

Quick question on the fact that existing catalogs need to be upgraded. Will there be a convenience tool to convert an existing catalog from 64-bit to 128-bit?

@twitu
Collaborator

twitu commented Dec 8, 2024

#2072 feature-flags the type backing the struct. It also refactors the necessary functions to make them compatible with both i128 and i64. This allows switching between the implementations by toggling the feature flag and recompiling.

> Quick question on the fact that existing catalogs need to be upgraded. Will there be a convenience tool to convert an existing catalog from 64-bit to 128-bit?

However, any Nautilus data models persisted to the catalog (or any other file) with one precision mode will not work with the other, because the backing types differ and deserialization will fail.

To make it easy to switch a catalog from one implementation to the other, we will use a JSON/CSV file to store the data temporarily. The steps become:

  • Compile with high-precision turned off to use i64
  • Write catalog to csv/json files
  • Compile with high-precision turned on to use i128
  • Read catalog from csv/json files
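The text round-trip works because a decimal string like "1.5" is precision-agnostic, whereas the binary raw values are not; conceptually, widening a 64-bit raw value into the 128-bit representation is a single multiply. A sketch (helper name hypothetical):

```rust
// Sketch of widening a 64-bit raw value (precision 9) into the 128-bit
// representation (precision 18): one widening multiply by 10^(18 - 9).
// A csv/json round-trip achieves the same thing via the decimal string.

fn widen_raw(raw_i64: i64, old_precision: u32, new_precision: u32) -> i128 {
    (raw_i64 as i128) * 10i128.pow(new_precision - old_precision)
}

fn main() {
    // 1.5 at precision 9 (i64 backing) ...
    let old_raw: i64 = 1_500_000_000;
    // ... becomes 1.5 at precision 18 (i128 backing).
    let new_raw = widen_raw(old_raw, 9, 18);
    println!("{old_raw} -> {new_raw}");
}
```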

Alternatively, this round-trip can be avoided entirely by reloading the catalog from the original data source after turning high-precision on.

@twitu
Collaborator

twitu commented Dec 21, 2024

I have some numbers from comparison tests on i128- and i64-backed values:

| Test | i64 | i128 |
|---|---|---|
| Test data storage (9600 quote ticks, 128 unique prices) | 132 KB | 132 KB |
| Perf data storage (10M quote ticks, 49 unique prices) | 140 MB | 172 MB (22% bigger) |
| Perf data benchmark (10M quote ticks, 49 unique prices) | 1.38 secs | 1.85 secs (34% slower) |

Higher-precision values need more storage and are slightly slower to backtest. However, the storage difference is relatively small, and at roughly 5M ticks per second it's likely that CPU processing will be the bottleneck for backtesting. Higher-precision values can therefore be made the default.

@davidsblom
Collaborator

Thanks for the extensive benchmarks!

Another question is which changes would be needed on the adapter side. The instrument provider needs to be aware of the precision I suppose. Do we also need to use pyo3 objects in the adapter? Or will this change be merged only when the rust core is "feature complete"?

@cjdsellers
Member Author

> Thanks for the extensive benchmarks!
>
> Another question is which changes would be needed on the adapter side. The instrument provider needs to be aware of the precision I suppose. Do we also need to use pyo3 objects in the adapter? Or will this change be merged only when the rust core is "feature complete"?

Hey @davidsblom

This change should actually be fairly transparent. Up in the Python layer, the only changes are that the maximum precision increases to 18 and a greater value range becomes available.

@davidsblom
Collaborator

davidsblom commented Dec 28, 2024

Yeah, agreed. How I read it, both low precision and high precision are supported. I was thinking about how an InstrumentProvider knows that.
