
do not wait timeout for no changed in record_version for expiry, trust the ttl_attribute #366


Draft · wants to merge 1 commit into base: master
Conversation

shlomimatichin

Hello,

First of all, thank you for maintaining this library; it's great and fits our requirements neatly.

We found a use case that didn't work for us, and we are sharing the patch that resolved it. We are not sure whether it is merge-worthy, something you may want to consider as an opt-in feature, or whether we simply didn't follow the library's logic. Anyway, this is what didn't work for us:

  1. We have a periodic AWS Lambda function that runs every minute, called the health check (pretty easy to guess what it does ;-). Health checks cannot run in parallel, so we need to lock against multiple invocations (which can happen spontaneously when using Lambda).
  2. The health check can time out; usually this means a bug. A Lambda timeout is equivalent to kill -9, meaning the library never gets a chance to release the lock.
  3. After that, our wrapper of the library (configuration attached below) never recovers: it keeps throwing ACQUIRE_TIMEOUT. We would like the health-check Lambda to be able to resume working within 2-4 minutes in this case.
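To make the failure mode concrete, here is a minimal sketch using the values from our wrapper configuration below (the variable names are illustrative, not the library's internals):

```python
from datetime import timedelta

# Values taken from our wrapper configuration in this thread.
expiry_period = timedelta(seconds=30)   # how long a lock may go without a heartbeat
retry_timeout = timedelta(seconds=10)   # how long acquire_lock keeps retrying

# The acquirer must observe an unchanged record_version_number for a full
# expiry_period before it may steal a stale lock, but acquire_lock gives up
# after retry_timeout -- so a lock orphaned by a killed Lambda is never stolen.
can_recover = retry_timeout >= expiry_period
print(can_recover)  # False: every attempt ends in ACQUIRE_TIMEOUT
```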

We found that this happens because the current implementation waits a full expiry period for no change in the record_version_number, so quick re-acquisition isn't possible. We changed the logic to rely on the ttl_attribute instead (which means the record_version_number no longer has a real purpose?).
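The gist of our change looks roughly like this (a simplified sketch, not the library's actual internals; `is_lock_expired` and the `expires` key are illustrative names, with `expires` standing in for our `ttl_attribute_name`):

```python
import time

def is_lock_expired(lock_record, now=None):
    """Treat the stored TTL timestamp as the single source of truth for expiry.

    'expires' is an epoch timestamp kept fresh by the lock holder's heartbeats
    (ttl_attribute_name='expires' in our configuration).
    """
    now = time.time() if now is None else now
    return lock_record.get('expires', 0) <= now

# With this check, an acquirer can overwrite a stale record as soon as its TTL
# has passed, instead of first watching record_version_number stay unchanged
# for a full expiry_period.
```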

our wrapper:

import datetime
import os
import logging
import contextlib
import boto3
from python_dynamodb_lock import python_dynamodb_lock


@contextlib.contextmanager
def immediate_lock(name):
    """Acquire the named lock, failing fast (~10s) rather than blocking indefinitely."""
    lock = _lock_client().acquire_lock(
        name,
        retry_period=datetime.timedelta(seconds=1),
        retry_timeout=datetime.timedelta(seconds=10))
    logging.info("Acquired lock %s", name)
    try:
        yield lock
    finally:
        lock.release()
        logging.info("Released lock %s", name)


_CACHE = {}  # clients cached across warm Lambda invocations


def _dynamodb():
    if 'dynamodb_client' not in _CACHE:
        _CACHE['dynamodb_client'] = boto3.resource('dynamodb')
    return _CACHE['dynamodb_client']


def _lock_client():
    if 'lock_client' not in _CACHE:
        _CACHE['lock_client'] = python_dynamodb_lock.DynamoDBLockClient(
            _dynamodb(),
            table_name=os.environ['DYNAMODB_LOCK_TABLE'],
            partition_key_name='Id',
            sort_key_name='SortKey',
            ttl_attribute_name='expires',
            heartbeat_period=datetime.timedelta(seconds=20),
            expiry_period=datetime.timedelta(seconds=30))
    return _CACHE['lock_client']

So what would you suggest we do? Is this merge material, or is there a way to configure the library to handle this case?

Thank you,
Team sensibo
