
Releases: thanos-io/thanos

v0.8.0

10 Oct 20:37
7b60b21

Lots of improvements this release! Outstanding items:

  • First Katacoda tutorial! 🐱
  • Fixed the deletion order that caused Compactor to produce unneeded 👻 blocks with random files missing.
  • Store GW memory improvements (more to come!).
  • Querier allows multiple deduplication labels.
  • Both Compactor and Store Gateway can be sharded within the same bucket using relabelling!
  • Data exposed by Sidecar from Prometheus can now be limited to a given min-time (e.g. 3h only).
  • Numerous Thanos Receive improvements.

Make sure you check out Prometheus 2.13.0 as well. The new release drastically improves usage and resource consumption of both Prometheus and the sidecar with Thanos: https://prometheus.io/blog/2019/10/10/remote-read-meets-streaming/

Added

  • #1619 Thanos sidecar allows limiting the min time range for data it exposes from Prometheus.
  • #1583 Thanos sharding:
    • Add relabel config (--selector.relabel-config-file and --selector.relabel-config) into Thanos Store and Compact components.
      Selecting blocks to serve depends on the result of block labels relabeling.
    • For store gateway, advertise labels from "approved" blocks.
  • #1540 Thanos Downsample added /-/ready and /-/healthy endpoints.
  • #1538 Thanos Rule added /-/ready and /-/healthy endpoints.
  • #1537 Thanos Receive added /-/ready and /-/healthy endpoints.
  • #1460 Thanos Store added /-/ready and /-/healthy endpoints.
  • #1534 Thanos Query added /-/ready and /-/healthy endpoints.
  • #1533 Thanos inspect now supports the timeout flag.
  • #1496 Thanos Receive now supports setting block duration.
  • #1362 Optional replicaLabels param for /query and
    /query_range querier endpoints. When provided, it overrides the query.replica-label CLI flags.
  • #1482 Thanos now supports Elastic APM as tracing provider.
  • #1612 Thanos Rule added resendDelay flag.
  • #1480 Thanos Receive flushes storage on hashring change.
  • #1613 Thanos Receive now traces forwarded requests.
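
The sharding entry above (#1583) says that block selection depends on the result of relabelling the block labels. As a rough illustration only, the sketch below models Prometheus-style keep and drop actions applied to a block's external labels. The block names, label names and the rule set are hypothetical, and the function is a simplified model, not the Thanos implementation.

```python
import re


def keep_block(block_labels, rules):
    """Return True if a block survives every rule (only keep/drop are modelled)."""
    for rule in rules:
        # Prometheus-style relabelling joins the source label values with ';'
        # and matches the regex against the whole joined string.
        value = ";".join(block_labels.get(name, "") for name in rule["source_labels"])
        matched = re.fullmatch(rule["regex"], value) is not None
        if rule["action"] == "keep" and not matched:
            return False
        if rule["action"] == "drop" and matched:
            return False
    return True


# Hypothetical external labels of three blocks stored in the same bucket.
blocks = {
    "block-a": {"cluster": "eu-1", "replica": "0"},
    "block-b": {"cluster": "us-1", "replica": "0"},
    "block-c": {"cluster": "eu-2", "replica": "1"},
}

# One shard keeps only blocks whose cluster label starts with "eu-".
eu_shard_rules = [{"action": "keep", "source_labels": ["cluster"], "regex": "eu-.*"}]

served = sorted(name for name, labels in blocks.items() if keep_block(labels, eu_shard_rules))
print(served)
```

A second Store Gateway or Compactor pointed at the same bucket with a complementary rule (for example, a drop action with the same regex) would then handle the remaining blocks.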

Changed

  • #1362 query.replica-label configuration can be provided more than
    once for multiple deduplication labels like: --query.replica-label=prometheus_replica --query.replica-label=service.
  • #1581 Thanos Store now can use smaller buffer sizes for Bytes pool; reducing memory for some requests.
  • #1622 & #1590 Updated to Go 1.13.1
  • #1498 Thanos Receive changed the flag labels to label to be consistent with other commands.
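
To illustrate what several deduplication labels mean in practice (#1362), here is a minimal sketch that groups series by their label set with the replica labels ignored. The series and label values are made up, and the real Querier also merges the samples of each group, which this sketch leaves out.

```python
def dedup_key(labels, replica_labels):
    """Identity of a series once the replica labels are ignored."""
    return tuple(sorted((k, v) for k, v in labels.items() if k not in replica_labels))


def group_replicas(series, replica_labels):
    """Group series that differ only in the given replica labels."""
    groups = {}
    for labels in series:
        groups.setdefault(dedup_key(labels, replica_labels), []).append(labels)
    return groups


# Hypothetical series returned by different stores.
series = [
    {"__name__": "up", "job": "api", "prometheus_replica": "a", "service": "s1"},
    {"__name__": "up", "job": "api", "prometheus_replica": "b", "service": "s1"},
    {"__name__": "up", "job": "api", "prometheus_replica": "a", "service": "s2"},
]

one_label = group_replicas(series, {"prometheus_replica"})
two_labels = group_replicas(series, {"prometheus_replica", "service"})
print(len(one_label), len(two_labels))
```

With only prometheus_replica treated as a replica label, the third series stays separate because its service label differs. Adding service as a second replica label collapses all three into one group.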

Fixed

  • #1525 Thanos now deletes a block's files in the correct order, allowing partial blocks to be detected without problems.
  • #1505 Thanos Store now removes invalid local cache blocks.
  • #1587 Thanos Sidecar cleans up all cache dirs after each compaction run.
  • #1582 Thanos Rule correctly parses Alertmanager URL if there is more than one + in it.
  • #1544 Iterating over object store is resilient to the edge case for some providers.
  • #1469 Fixed potential Azure failures (EOF) when requesting more data than the blob has.
  • #1512 Thanos Store fixed memory leak for chunk pool.
  • #1488 Thanos Rule now correctly links to query URL from rules and alerts.
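
The deletion-order fix (#1525) can be pictured with a small simulation. It assumes that meta.json acts as the marker of a complete block and is removed first. The entry above only says "correct order", so that ordering, the file names and the helper functions here are illustrative assumptions rather than the actual code.

```python
META = "meta.json"  # assumed marker file of a complete block


def deletion_order(files):
    """Delete the marker first, then everything else in a stable order."""
    rest = sorted(f for f in files if f != META)
    return ([META] if META in files else []) + rest


def is_partial(files):
    """A block without its marker file is treated as partial."""
    return META not in files


def simulate_interrupted_delete(files, steps):
    """Return the files left over if deletion stops after `steps` removals."""
    remaining = set(files)
    for name in deletion_order(remaining)[:steps]:
        remaining.discard(name)
    return remaining


# Hypothetical contents of one block directory.
block_files = {"meta.json", "index", "chunks/000001"}

for steps in range(len(block_files) + 1):
    left = simulate_interrupted_delete(block_files, steps)
    print(steps, sorted(left), is_partial(left))
```

Because the marker goes first, any interruption after the first removal leaves a directory that is recognisable as partial, instead of a block that looks complete but has files missing.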

See full CHANGELOG here

v0.7.0

02 Sep 15:47
c6eaf68

Accepted into CNCF:

Added

  • #1378 Thanos Receive now exposes thanos_receive_config_hash, thanos_receive_config_last_reload_successful and thanos_receive_config_last_reload_success_timestamp_seconds metrics to track latest configuration change
  • #1268 Thanos Sidecar added support for the newest Prometheus streaming remote read. This massively improves the memory required by a single
    request for both Prometheus and sidecar. Single requests should now take a constant amount of memory on sidecar, so resource consumption prediction is now straightforward. This will be used if you have Prometheus 2.13 or 2.12-master.
  • #1358 Added part_size configuration option for HTTP multipart requests minimum part size for S3 storage type
  • #1363 Thanos Receive now exposes thanos_receive_hashring_nodes and thanos_receive_hashring_tenants metrics to monitor status of hash-rings
  • #1395 Thanos Sidecar added /-/ready and /-/healthy endpoints to Thanos sidecar.
  • #1297 Thanos Compact added /-/ready and /-/healthy endpoints to Thanos compact.
  • #1431 Thanos Query added hidden flag to allow the use of downsampled resolution data for instant queries.
  • #1408 Thanos Store Gateway can now be restricted to the time ranges it will serve (time sharding). Flags: min-time & max-time
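
As a sketch of the time sharding idea in #1408, the snippet below selects the blocks whose time range overlaps the window a Store Gateway is configured to serve. Timestamps are plain integers (think hours), the block names are hypothetical, and the half-open boundary handling is an assumption, since the entry above does not specify it.

```python
def block_in_range(block_min, block_max, shard_min, shard_max):
    """True if [block_min, block_max) overlaps [shard_min, shard_max)."""
    return block_min < shard_max and block_max > shard_min


# Hypothetical blocks as (name, min_time, max_time).
blocks = [("b1", 0, 2), ("b2", 2, 4), ("b3", 4, 6)]

# A gateway configured to serve only the window [1, 4).
shard_min, shard_max = 1, 4

served = [name for name, lo, hi in blocks if block_in_range(lo, hi, shard_min, shard_max)]
print(served)
```

Running several gateways with adjacent windows splits one bucket by time, in the same way that relabelling splits it by labels.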

Changed

  • #1414 Upgraded important dependencies: Prometheus to 2.12-rc.0. TSDB is now part of Prometheus.
  • #1380 Upgraded important dependencies: Prometheus to 2.11.1 and TSDB to 0.9.1. Some changes affecting Querier:
    • [ENHANCEMENT] Query performance improvement: Efficient iteration and search in HashForLabels and HashWithoutLabels. #5707
    • [ENHANCEMENT] Optimize queries using regexp for set lookups. tsdb#602
    • [BUGFIX] prometheus_tsdb_compactions_failed_total is now incremented on any compaction failure. tsdb#613
    • [BUGFIX] PromQL: Correctly display {name="a"}.
  • #1338 Thanos Query still warns on store API duplicate, but allows a single one from duplicated set. This is to gracefully warn about the problematic logic rather than disrupt immediately.
  • #1385 Thanos Compact exposes the flag downsampling.disable to disable downsampling.

Fixed

  • #1327 Thanos Query /series API end-point now properly returns an empty array just like Prometheus if there are no results
  • #1302 Thanos now efficiently reuses HTTP keep-alive connections
  • #1371 Thanos Receive fixed race condition in hashring
  • #1430 Thanos fixed value of GOMAXPROCS inside container.

Deprecated

  • #1458 Thanos Query and Receive now use common instrumentation middleware. As a result, in favor of http_requests_total and http_request_duration_seconds_bucket, Thanos Query no longer exposes the thanos_query_api_instant_query_duration_seconds and thanos_query_api_range_query_duration_second metrics, and Thanos Receive no longer exposes thanos_http_request_duration_seconds, thanos_http_requests_total and thanos_http_response_size_bytes.
  • #1423 Thanos Bench deprecated.

v0.7.0-rc.0

28 Aug 12:27
Pre-release

TL;DR: Moved to CNCF, added streaming between Prometheus and Sidecar, allowed time sharding on Store Gateway, and many bug fixes.

More detailed information on the release can be found here https://github.com/thanos-io/thanos/blob/master/CHANGELOG.md

v0.6.1

14 Aug 10:33
acb1cb0

v0.6.0

18 Jul 11:22
c70b80e

TL;DR: Jaeger tracing support (tracing flag changed), various observability improvements, Thanos receiver improvements, improved external label propagation, including federated Queriers (!) and other fixes.

NOTE: Thanks to improved external label propagation, if you have duplicate Queriers in your Querier configuration with hierarchical federation of multiple Queriers, Thanos will now detect this case and block all duplicates. New releases (potentially in v0.6.1) will just warn and block all but one.

Added

  • #1097 Added thanos check rules linter for Thanos rule rules files.

  • #1253 Add support for specifying a maximum amount of retries when using Azure Blob storage (default: no retries).

  • #1244 Thanos Compact now exposes new metrics thanos_compact_downsample_total and thanos_compact_downsample_failures_total which are useful to catch when errors happen

  • #1260 Thanos Query/Rule now exposes metrics thanos_querier_store_apis_dns_provider_results and thanos_ruler_query_apis_dns_provider_results which tell how many addresses were configured and how many were actually discovered respectively

  • #1248 Add a web UI to show the state of remote storage.

  • #1217 Thanos Receive gained basic hashring support

  • #1262 Thanos Receive got a new metric thanos_http_requests_total which shows how many requests were handled by it

  • #1243 Thanos Receive gained the ability to forward time series data between nodes. Now you can pass the hashring configuration via --receive.hashrings-file; the refresh interval via --receive.hashrings-file-refresh-interval; the local node's name via --receive.local-endpoint; and finally the name of the header used to determine the tenant via --receive.tenant-header.

  • #1147 Support for the Jaeger tracer has been added!

breaking New common flags were added for configuring tracing: --tracing.config-file and --tracing.config. You can either pass a file to Thanos with the tracing configuration or pass it in the command line itself. Old --gcloudtrace.* flags were removed ⚠️

To migrate over the old --gcloudtrace.* configuration, your tracing configuration should look like this:

---
type: STACKDRIVER
config:
- service_name: 'foo'
  project_id: '123'
  sample_factor: 123

The other type you can use is JAEGER now. The config keys and values are Jaeger specific and you can find all of the information here.

Changed

  • #1284 Add support for multiple label-sets in Info gRPC service. This deprecates the single Labels slice of the InfoResponse. In a future release, backward compatible handling for the single set of Labels will be removed. Upgrading to v0.6.0 or higher is advised.

  • #1314 Removes http_request_duration_microseconds (Summary) and adds http_request_duration_seconds (Histogram) from http server instrumentation used in Thanos APIs and UIs.

  • #1287 Sidecar now waits on Prometheus' external labels before starting the uploading process

  • #1261 Thanos Receive now exposes metrics thanos_http_request_duration_seconds and thanos_http_response_size_bytes properly of each handler

  • #1274 Iteration limit has been lifted from the LRU cache so there should be no more spam of error messages as they were harmless

  • #1321 Thanos Query now fails early on a query which only uses external labels - this improves clarity in certain situations

Fixed

  • #1227 Some context handling issues were fixed in Thanos Compact; some unnecessary memory allocations were removed in the hot path of Thanos Store.

  • #1183 Compactor now correctly propagates retriable/haltable errors, which means that it will not unnecessarily restart if such an error occurs

  • #1231 Receive now correctly handles SIGINT and closes without deadlocking

  • #1278 Fixed inflated values problem with sum() on Thanos Query

  • #1280 Fixed a problem with concurrent writes to a map in Thanos Query while rendering the UI

  • #1311 Fixed occasional panics in Compact and Store when using Azure Blob cloud storage caused by lack of error checking in client library.

  • #1322 Removed duplicated closing of the gRPC listener - this gets rid of harmless messages like store gRPC listener: close tcp 0.0.0.0:10901: use of closed network connection when those programs are being closed

Deprecated

  • #1216 the old "Command-line flags" has been removed from Thanos Query UI since it was not populated and because we are striving for consistency

v0.6.0-rc.0

12 Jul 11:31
7f22009

v0.5.0

06 Jun 11:16
72820b3

TL;DR: Store LRU cache is no longer leaking, Upgraded Thanos UI to Prometheus 2.9, Fixed auto-downsampling, Moved to Go 1.12.5 and more.

This version moved tarballs to Golang 1.12.5 from 1.11 as well, so the same warning applies if you use container_memory_usage_bytes from cadvisor. Use container_memory_working_set_bytes instead.

breaking As announced a couple of times, this release also removes gossip along with all its configuration flags (--cluster.*).

Fixed

  • #1142 fixed major leak on store LRU cache for index items (postings and series).
  • #1163 sidecar is no longer blocking for custom Prometheus versions/builds. It only checks if flags return non 404, then it performs optional checks.
  • #1146 store/bucket: make getFor() work with interleaved resolutions.
  • #1157 querier correctly handles duplicated stores when some store changes external labels in place.

Added

  • #1094 Allow configuring the response header timeout for the S3 client.

Changed

  • #1118 breaking swift: Added support for cross-domain authentication by introducing userDomainID, userDomainName, projectDomainID, projectDomainName.
    The outdated terms tenantID, tenantName are deprecated and have been replaced by projectID, projectName.

  • #1066 Upgrade Thanos UI to Prometheus v2.9.1.

    Changes from the upstream:

    • query:
      • [ENHANCEMENT] Update moment.js and moment-timezone.js PR #4679
      • [ENHANCEMENT] Support to query elements by a specific time PR #4764
      • [ENHANCEMENT] Update to Bootstrap 4.1.3 PR #5192
      • [BUGFIX] Limit number of metrics in prometheus UI PR #5139
      • [BUGFIX] Web interface Quality of Life improvements PR #5201
    • rule:
      • [ENHANCEMENT] Improve rule views by wrapping lines PR #4702
      • [ENHANCEMENT] Show rule evaluation errors on rules page PR #4457
  • #1156 Moved CI and docker multistage to Golang 1.12.5 for latest mem alloc improvements.

  • #1103 Updated go-cos deps. (COS bucket client).

  • #1149 Updated google Golang API deps (GCS bucket client).

  • #1190 Updated minio deps (S3 bucket client). This fixes minio retries.

  • #1133 Use prometheus v2.9.2, common v0.4.0 & tsdb v0.8.0.

    Changes from the upstreams:

    • store gateway:
      • [ENHANCEMENT] Fast path for EmptyPostings cases in Merge, Intersect and Without.
    • store gateway & compactor:
      • [BUGFIX] Fix fd and vm_area leak on error path in chunks.NewDirReader.
      • [BUGFIX] Fix fd and vm_area leak on error path in index.NewFileReader.
    • query:
      • [BUGFIX] Make sure subquery range is taken into account for selection #5467
      • [ENHANCEMENT] Check for cancellation on every step of a range evaluation. #5131
      • [BUGFIX] Exponentiation operator to drop metric name in result of operation. #5329
      • [BUGFIX] Fix output sample values for scalar-to-vector comparison operations. #5454
    • rule:
      • [BUGFIX] Reload rules: copy state on both name and labels. #5368

Deprecated

  • #1008 breaking Removed Gossip implementation. All --cluster.* flags removed and Thanos will error out if any is provided.

See full CHANGELOG here

v0.5.0-rc.0

31 May 17:32

v0.4.0

04 May 09:36
a676095

⚠️ IMPORTANT ⚠️ This is the last release that supports gossip. From Thanos v0.5.0, gossip will be completely removed.

Major improvements:

  • This release also disables gossip mode by default for all components.
    See this for more details.
  • Store Gateway startup process is massively improved in both efficiency and memory consumption
  • Remote receiver component was added.
  • StoreUI works now beautifully 🌷
  • Timeout improvements for Querier
  • Control of concurrency and sample limits on Store Gateway gRPC API
  • Graceful handling and deletion of partial uploads made by Compactor.

Added

  • thanos.io website & automation 🎉
  • #1053 compactor: Compactor & store gateway now handles incomplete uploads gracefully. Added hard limit on how long block upload can take (30m).
  • #811 Remote write receiver component ❤️ ❤️ thanks to RedHat (@brancz) contribution.
  • #910 Query's stores UI page is now sorted by type and old DNS or File SD stores are removed after 5 minutes (configurable via the new --store.unhealthy-timeout=5m flag).
  • #905 Thanos support for Query API: /api/v1/labels. Notice that the API was added in Prometheus v2.6.
  • #798 Ability to limit the maximum number of concurrent Series() calls in Thanos Store and the maximum amount of samples we handle.
  • #1060 Allow specifying region attribute in S3 storage configuration

⚠️ WARNING ⚠️ #798 adds a new default limit to Thanos Store: --store.grpc.series-max-concurrency. Most likely you will want to make it the same as --query.max-concurrent on Thanos Query.

New options:

New Store flags:

* `--store.grpc.series-sample-limit` limits the amount of samples that might be retrieved on a single Series() call. By default it is 0. Consider enabling it by setting it to more than 0 if you are running on limited resources.
* `--store.grpc.series-max-concurrency` limits the number of concurrent Series() calls in Thanos Store. By default it is 20. Consider making it lower or higher depending on the scale of your deployment.

New Store metrics:

* `thanos_bucket_store_queries_dropped_total` shows how many queries were dropped due to the samples limit;
* `thanos_bucket_store_queries_concurrent_max` is a constant metric which shows how many Series() calls can concurrently be executed by Thanos Store;
* `thanos_bucket_store_queries_in_flight` shows how many queries are currently "in flight" i.e. they are being executed;
* `thanos_bucket_store_gate_duration_seconds` shows how many seconds it took for queries to pass through the gate in both cases - when that fails and when it does not.

New Store tracing span:
* `store_query_gate_ismyturn` shows how long it took for a query to pass (or not) through the gate.
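
The two new Store limits can be sketched as a small gate: a bounded semaphore caps the number of concurrent Series() calls, and a sample limit rejects calls that return too much data. The class and exception names are hypothetical, and the sketch checks the limit after fetching for simplicity. Only the defaults (20 and 0) and the meaning of the flags come from the notes above.

```python
import threading


class SampleLimitExceeded(Exception):
    """Raised when a single call would return more samples than allowed."""


class SeriesGate:
    def __init__(self, max_concurrency=20, sample_limit=0):
        # Models --store.grpc.series-max-concurrency (default 20).
        self._slots = threading.BoundedSemaphore(max_concurrency)
        # Models --store.grpc.series-sample-limit (default 0, meaning no limit).
        self.sample_limit = sample_limit
        self.dropped = 0  # loosely mirrors a "queries dropped" counter

    def run(self, fetch_samples):
        # Blocks while max_concurrency calls are already in flight;
        # the slot is released even if the call fails.
        with self._slots:
            samples = fetch_samples()
            if self.sample_limit and len(samples) > self.sample_limit:
                self.dropped += 1
                raise SampleLimitExceeded(
                    f"{len(samples)} samples exceed the limit of {self.sample_limit}"
                )
            return samples


gate = SeriesGate(max_concurrency=2, sample_limit=3)
print(gate.run(lambda: [1, 2, 3]))
```

Time spent waiting for a slot is what a gate-duration metric would measure, which is why the concurrency limit is usually matched to the Querier's own concurrency setting, as the warning above suggests.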

  • #1016 Added option for another DNS resolver (miekg/dns client).
    Note that this is required to have SRV resolution working on Golang 1.11+ with KubeDNS below v1.14

    New Querier and Ruler flag: --store.sd-dns-resolver, which allows specifying the resolver to use: either golang or miekgdns.

  • #986 Saves some startup & sync time in the store gateway, as it no longer needs to compute the index-cache from the block index on its own for larger blocks.
    The store gateway can still do it, but it first checks the bucket for an already uploaded index-cache.
    At the same time, the compactor precomputes the index cache file on every compaction.

    New Compactor flag: --index.generate-missing-cache-file was added to allow quicker addition of index cache files. If enabled it precomputes missing files on compactor startup. Note that it will take time and it's only one-off step per bucket.

  • #887 Compact: Added new --block-sync-concurrency flag, which allows you to configure number of goroutines to use when syncing block metadata from object storage.

  • #928 Query: Added --store.response-timeout flag. If a Store doesn't send any data in this specified duration then a Store will be ignored and partial data will be returned if it's enabled. 0 disables timeout.

  • #893 S3 storage backend has graduated to stable maturity level.

  • #936 Azure storage backend has graduated to stable maturity level.

  • #937 S3: added trace functionality. You can add trace.enable: true to enable the minio client's verbose logging.

  • #953 Compact: now has a hidden flag --debug.accept-malformed-index. Compaction index verification will ignore out of order label names.

  • #963 GCS: added possibility to inline ServiceAccount into GCS config.

  • #1010 Compact: added new flag --compact.concurrency. Number of goroutines to use when compacting groups.

  • #1028 Query: added --query.default-evaluation-interval, which sets default evaluation interval for sub queries.

  • #980 Ability to override Azure storage endpoint for other regions (China)

  • #1021 Query API series now supports POST method.

  • #939 Query API query_range now supports POST method.

Changed

  • #970 Deprecated partial_response_disabled proto field. Added partial_response_strategy instead. Both in gRPC and Query API.
    No PartialResponseStrategy field for RuleGroups by default means abort strategy (old PartialResponse disabled) as this is recommended option for Rules and alerts.

    Metrics:

    • Added thanos_rule_evaluation_with_warnings_total to Ruler.
    • DNS thanos_ruler_query_apis* are now thanos_ruler_query_apis_* for consistency.
    • DNS thanos_querier_store_apis* are now thanos_querier_store_apis_* for consistency.
    • Query Gate thanos_bucket_store_series* are now thanos_bucket_store_series_* for consistency.
    • Most of the Thanos ruler metrics related to the rule manager have a strategy label.

    Ruler tracing spans:

    • /rule_instant_query HTTP[client] is now /rule_instant_query_part_resp_abort HTTP[client] if request is for abort strategy.
  • #1009: Upgraded Prometheus (~v2.7.0-rc.0 to v2.8.1) and TSDB (v0.4.0 to v0.6.1) deps.

    Changes that affect Thanos:

    • query:
      • [ENHANCEMENT] In histogram_quantile merge buckets with equivalent le values. #5158.
      • [ENHANCEMENT] Show list of offending labels in the error message in many-to-many scenarios. #5189
      • [BUGFIX] Fix panic when aggregator param is not a literal. #5290
    • ruler:
      • [ENHANCEMENT] Reduce time that Alertmanagers are in flux when reloaded. #5126
      • [BUGFIX] prometheus_rule_group_last_evaluation_timestamp_seconds is now a unix timestamp. #5186
      • [BUGFIX] prometheus_rule_group_last_duration_seconds now reports seconds instead of nanoseconds. Fixes our issue #1027
      • [BUGFIX] Fix sorting of rule groups. #5260
    • store: [ENHANCEMENT] Fast path for EmptyPostings cases in Merge, Intersect and Without.
    • tooling: [FEATURE] New dump command to tsdb tool to dump all samples.
    • compactor: [ENHANCEMENT] When closing the db any running compaction will be cancelled so it doesn't block.
    • [CHANGE] Renamed flag --sync-delay to --consistency-delay #1053

    For ruler, essentially the whole TSDB CHANGELOG applies between v0.4.0-v0.6.1: https://github.com/prometheus/tsdb/blob/master/CHANGELOG.md

    Note that this was added on TSDB and Prometheus: [FEATURE] Time-overlapping blocks are now allowed. #370
    However, due to the nature of Thanos compaction (distributed systems), this is disabled for the Thanos compactor for now for safety reasons.

  • #868 Go has been updated to 1.12.

  • #1055 Gossip flags are now disabled by default and deprecated.

  • #964 repair: Repair process now sorts the series and labels within block.

  • #1073 Store: index cache for requests. It now calculates the size properly (includes slice header), has anti-deadlock safeguard and reports more metrics.

Fixed

  • #921 thanos_objstore_bucket_last_successful_upload_time now does not appear when no blocks have been uploaded so far.
  • #966 Bucket: verify no longer warns about overlapping blocks that overlap by 0s.
  • #848 Compact: now correctly works with time series with duplicate labels.
  • [#894](https://github.com/improbable-eng/thanos...

v0.4.0-rc.1

26 Apr 17:51
e229b06