Please refer to LICENSE.txt and DATA_DISCLAIMER.txt. This software is supplied as-is; use at your own risk. Information obtained from using this software does NOT constitute financial advice.
Library
```xml
<dependency>
    <groupId>com.limemojito.oss.trading.trading-data-stream</groupId>
    <artifactId>model</artifactId>
    <version>3.0.0</version>
</dependency>
```

Check out the source to see a working example in example-cli (a Spring Boot command line application).
There is an example Spring configuration in TradingDataStreamConfiguration suitable for @Import. We suggest the following dependencies for Spring Boot:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-json</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-validation</artifactId>
</dependency>
<!-- For CSV usage -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
</dependency>
<!-- For S3 caching -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
</dependency>
```

This example uses the standalone configuration, which is suitable for testing. For Spring container usage, please refer to TradingDataStreamConfiguration; note that the standalone setup should not be called in a Spring container.
```java
TradingSearch search = TradingDataStreamConfiguration.standaloneSetup();
try (TradingInputStream<Tick> ticks = search.search("EURUSD",
                                                    "2020-01-02T00:00:00Z",
                                                    "2020-01-02T00:59:59Z")) {
    ticks.stream()
         .forEach(t -> log.info("{} {} bid: {}", t.getMillisecondsUtc(), t.getSymbol(), t.getBid()));
}
```

Further examples at https://limemojito.com/reading-dukascopy-bi5-tick-history-with-the-tradingdata-stream-library-for-java/
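For container usage, a minimal sketch of importing the supplied configuration into your own Spring Boot application via @Import (the application class name here is hypothetical):

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Import;

// Hypothetical application class; import TradingDataStreamConfiguration
// from this library's package.
@SpringBootApplication
@Import(TradingDataStreamConfiguration.class)
public class TickExportApplication {
    public static void main(String[] args) {
        SpringApplication.run(TickExportApplication.class, args);
    }
}
```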
```bash
mvn clean install
```

This produces the model jar, the example-cli, and a cache-primer application.
Note that files are cached locally in ~/.dukascopy-cache. See LocalDukascopyCache.java for details.
```bash
java -jar example-cli/target/example-cli-3.0.0.jar --symbol=NZDUSD --period=M5 \
     --start=2018-01-02T00:00:00Z --end=2018-01-02T00:59:59Z --output=test-nz.csv
```

Note that this application's cache chain is local <- s3 <- direct, i.e. the S3 cache is only used if the data is not already cached locally (~/.dukascopy-cache). See S3DukascopyCache.java and the chain configuration in DataStreamCli.java for details.
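A hedged sketch of that chain, assuming decorator-style cache constructors where each cache falls through to the next (class names follow the files mentioned above, but the exact signatures are assumptions; see DataStreamCli.java for the real wiring):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// local <- s3 <- direct: consult the local cache first, then the S3 bucket,
// then Dukascopy itself. Constructor shapes are assumptions.
AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
DukascopyCache chain = new LocalDukascopyCache(
        new S3DukascopyCache(s3, "test-tick-bucket",
                             new DirectDukascopyNoCache()));
```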
```bash
aws s3 mb s3://test-tick-bucket
java -jar example-cli/target/example-cli-3.0.0.jar --spring.profiles.active=s3 \
     --bucket-name=test-tick-bucket --symbol=AUDUSD --period=M5 --start=2018-01-02T00:00:00Z \
     --end=2018-01-02T00:59:59Z --output=test-au.csv
```

```bash
java -jar cache-primer/target/cache-primer-3.0.0.jar --symbol=AUDUSD --symbol=EURUSD \
     --start=2018-01-01T00:00:00Z --end=2018-03-01T00:59:59Z
```

Note that this application's cache chain is s3 <- local <- direct. See S3DukascopyCache.java and the chain configuration in CachePrimer.java for details.
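Under the same assumptions as the sketch above, the primer ordering simply nests the caches the other way, so fetched files land in S3 ahead of the local cache (see CachePrimer.java for the real configuration):

```java
// s3 <- local <- direct: S3 is consulted (and populated) before the
// local cache. Constructor shapes are assumptions, as above.
DukascopyCache primerChain = new S3DukascopyCache(s3, "test-tick-bucket",
        new LocalDukascopyCache(new DirectDukascopyNoCache()));
```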
```bash
aws s3 mb s3://test-tick-bucket
java -jar cache-primer/target/cache-primer-3.0.0.jar --spring.profiles.active=s3 \
     --bucket-name=test-tick-bucket --symbol=AUDUSD --symbol=EURUSD \
     --start=2018-01-01T00:00:00Z --end=2018-03-01T00:59:59Z
```

Times are supplied in UTC as this matches the Dukascopy epoch data.
The Dukascopy file format was reverse-engineered using internet forums, the Wayback Machine, and bit twiddling.
Model classes have annotations to support @Value configuration. See DataStreamCli for an example of configuring with a Spring Boot application.
Our models derive from StreamData, an abstraction for physical streams of these data records that may exist together on a common transport. You may not need this approach for what you are building.
Prices represent the Bid side.
See BarTickStreamAggregator.java for details.
Run on an M1 Max with a 100MB internet connection, retrieving 559 M10 bars. Your performance may vary.

Converting 559 bars to CSV shows a 240X improvement on repeated bar aggregations, versus a 3.6% worst-case slowdown on an empty cache. The worst case depends on how many days of H1 tick files are required to answer the query.
```bash
java -jar example-cli/target/example-cli-3.0.0.jar --symbol=EURUSD --period=M10 \
     --start="2019-05-07T00:00:00Z" --end="2019-05-11T00:00:00Z" --output=./test.csv
```

| Version | Empty Cache Query Aggregation Time | Repeat Query Aggregation Time |
|---|---|---|
| 1.x | 55s | 12s |
| 2.0 | 57s | 0.05s |
Inspiration from this C++ library:
https://github.com/ninety47/dukascopy
Note that Dukascopy uses a UTC+0 offset, so no time adjustment is necessary.

Dukascopy historical data:
The files I downloaded are named something like '00h_ticks.bi5'. These 'bi5' files are LZMA-compressed binary data files. The binary data files are formatted into 20-byte rows.
- 32-bit integer: milliseconds since epoch
- 32-bit float: Ask price
- 32-bit float: Bid price
- 32-bit float: Ask volume
- 32-bit float: Bid volume
The ask and bid prices need to be multiplied by the point value for the symbol/currency pair. The epoch is extracted from the URL (and the folder structure I've used to store the files on disk). It represents the point in time that the file starts from, e.g. 2013/01/14/00h_ticks.bi5 has the epoch of midnight on 14 January 2013. For an example using C++ to work with the file format, including the computation of the "epoch time", see the ninety47/dukascopy repository linked above.
LZMA compression/decompression can be done with Apache Commons Compress:
https://commons.apache.org/proper/commons-compress/
This format was validated by experimentation.
```
[ TIME ]    [ ASKP ]    [ BIDP ]    [ ASKV ]    [ BIDV ]
[0000 0800] [0002 2f51] [0002 2f47] [4096 6666] [4013 3333]
```
- TIME is a 32-bit big-endian integer representing the number of milliseconds that have passed since the beginning of this hour.
- ASKP is a 32-bit big-endian integer representing the asking price of the pair, multiplied by 100,000.
- BIDP is a 32-bit big-endian integer representing the bidding price of the pair, multiplied by 100,000.
- ASKV is a 32-bit big-endian floating point number representing the asking volume, divided by 1,000,000.
- BIDV is a 32-bit big-endian floating point number representing the bidding volume, divided by 1,000,000.
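A hedged sketch of decoding one hour file with Apache Commons Compress, following the row layout above (the file path, symbol, and the 1e-5 point scale for EURUSD are illustrative assumptions):

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream;

public class Bi5Decode {
    public static void main(String[] args) throws Exception {
        // Illustrative path; the epoch is the start of the hour encoded in the path.
        Path file = Path.of("2013/01/14/00h_ticks.bi5");
        Instant fileEpoch = Instant.parse("2013-01-14T00:00:00Z");
        byte[] raw;
        try (InputStream in = new LZMACompressorInputStream(Files.newInputStream(file))) {
            raw = in.readAllBytes();
        }
        ByteBuffer rows = ByteBuffer.wrap(raw);     // ByteBuffer defaults to big-endian
        while (rows.remaining() >= 20) {            // 20-byte rows
            int millisSinceHour = rows.getInt();    // TIME
            int askPoints = rows.getInt();          // ASKP: price * 100,000 for EURUSD
            int bidPoints = rows.getInt();          // BIDP
            float askVolume = rows.getFloat();      // ASKV: stored as volume / 1,000,000
            float bidVolume = rows.getFloat();      // BIDV
            Instant tickTime = fileEpoch.plusMillis(millisSinceHour);
            System.out.printf("%s ask=%.5f bid=%.5f askV=%s bidV=%s%n",
                              tickTime, askPoints * 1e-5, bidPoints * 1e-5,
                              askVolume, bidVolume);
        }
    }
}
```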
Note that epoch milliseconds is relative to the UTC timezone. The source field is either live or historical. For example:
```json
{
  "epochMilliseconds": 94875945798,
  "symbol": "EURUSD",
  "bid": 134567,
  "ask": 134520,
  "source": "live",
  "streamId": "00000000-0000-0000-0000-000000000000"
}
```

- Java 17 as a minimum requirement. Spring support library upgrades.
- TradingInputJsonStreams now supports a visitor when loading a stream from json data.
- Added an example spring configuration.
- Library updates.
- Moved generic stream classes into trading.model.streams package.
- There are some minor API changes, such as adding IOException to close methods where it had been missed.
- Reworked stream model and produced a set of generic input stream utilities in trading.model.stream.
- Added a stream Collection method to TradingJsonStreams; prefer the InputStream version for efficiency.
- trading.model.stream.TradingInputStreamMapper produces generic streams and transforms for any MODEL.
- trading.model.stream.TradingInputStreamMapper has overloads for onClose Runnable for cleanup operations.
- Fixed hasNext in TradingInputJsonStreams; fix ported from closed source.
- Corrected an aggregation bug in count-before and count-after bar searches. A duplicate bar may have been included in streams because some end searches were inclusive.
- Corrected a write bug in JSON stream output that failed for large files with an early close, due to Jackson auto-close.
- Support for streaming JSON file formats (as arrays) using TradingInputJsonStreams. Relies on Jackson.
- Added locking around local and s3 cache usage for multithreaded scenarios.
- Added bar caching, which alters the cache storage and cache classes. Storage is backwards compatible; however, the construction of the caches has changed. Bar caching made repeated aggregations 240X faster in the example in the technical notes.
- Due to bar caching, tick visitors are no longer supported on aggregation queries.
- Tick visitors are still supported for tick search functions.
- Added a read-only JSON property to models with the model version (1.0).
- Split DukascopySearch implementation into Tick and Bar searches (backward compatible).
- Local cache can have the cache directory set via an alternate constructor.
- trading-data-stream development no longer deletes the default cache location.
- Improved CSV model by introducing a base class that can be extended for other projects.
- Stop searches at the Beginning of Time, arbitrarily defaulted to 2010-01-01T00:00:00Z.
- Allow configuration of the Beginning of Time.
- Aggregate for a number of bars before a given instant.
- Aggregate for a number of bars after a given instant.
- Added delayed n-retry with exponential backoff on Dukascopy IO failure. Defaults to 3 retries with a 2-second delay.
- Upped the rate limit after discovering 500s are occasional errors on empty paths.
- Added a cache primer application to test load limits (concurrent requests).
- Initial release to open source.
- Search ticks by symbol and time range.
- Aggregate bars by time range.
- Convert models (Tick, Bar) to CSV.
- Cache Dukascopy files via Direct (no cache), Local Cache, and S3 Cache.
- Combine caches into cache chains such as Local -> S3 -> Direct.
- Example CLI