Store Kinesis checkpoints in Elasticsearch #22

Open
BenFradet opened this issue Jul 3, 2017 · 1 comment

Comments

@BenFradet
Contributor

from snowplow/snowplow#2456:

The idea here is:
When writing data to ES, we also store the Kinesis shard checkpoints alongside the data
These checkpoints will be backed up alongside the event data each night
If we need to do a restore, we will copy the checkpoints from ES back to DynamoDB before restarting the ES Sink
Doing this should mean we can recover our ES and restart drip feeding without data loss/duplication.
Open question: how transactional is the ES backup? Is there a risk of drift between the data loaded and the checkpoints stored while the S3 backup is being taken?

Note: this idea is borrowed from the Kafka guys, who suggest co-locating checkpoints alongside data in a storage target
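
A rough sketch of the flow described above, in Python for brevity. Everything here is an assumption for illustration: `FakeEs`, `write_batch` and `restore_checkpoints` are hypothetical names, the "cluster" and the lease table are plain in-memory structures rather than real Elasticsearch or DynamoDB clients, and sequence numbers are modelled as small integers. It only shows the shape of the idea: the checkpoint is written in the same place as the events, so a backup captures both, and a restore can seed the lease table from whatever the backup contains.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeEs:
    """In-memory stand-in for the ES cluster (hypothetical, not a real client)."""
    events: List[dict] = field(default_factory=list)
    # shard id -> highest sequence number whose events are stored in `events`
    checkpoints: Dict[str, int] = field(default_factory=dict)


def write_batch(es: FakeEs, shard_id: str, records: List[Tuple[int, dict]]) -> None:
    """Store a batch of (sequence_number, event) pairs and the shard checkpoint together."""
    if not records:
        return
    for _, event in records:
        es.events.append(event)
    last_seq = max(seq for seq, _ in records)
    # Never move a checkpoint backwards if an older batch is replayed.
    es.checkpoints[shard_id] = max(es.checkpoints.get(shard_id, -1), last_seq)


def restore_checkpoints(es: FakeEs, lease_table: Dict[str, int]) -> None:
    """Copy checkpoints from the restored store into the (stand-in) lease table."""
    for shard_id, seq in es.checkpoints.items():
        lease_table[shard_id] = seq


if __name__ == "__main__":
    es = FakeEs()
    write_batch(es, "shardId-000", [(1, {"id": "a"}), (2, {"id": "b"})])
    nightly_backup = deepcopy(es)  # stands in for the nightly backup
    write_batch(es, "shardId-000", [(3, {"id": "c"})])  # written after the backup

    lease_table: Dict[str, int] = {"shardId-000": 3}  # live checkpoint is ahead
    restore_checkpoints(nightly_backup, lease_table)
    print(lease_table, len(nightly_backup.events))
```

In this toy version the backup is a single atomic copy, so data and checkpoints cannot drift. Whether a real ES backup gives the same guarantee is exactly the open question above.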

@alexanderdean
Member

Even better would be if we could move the master copies of the checkpoints to Elasticsearch, but this would be more difficult for our internal monitoring and not supported by the KCL...
