
do not wait timeout for no changed in record_version for expiry, trust the ttl_attribute #366


Draft · wants to merge 1 commit into base: master
Conversation

shlomimatichin

Hello,

First of all, thank you for maintaining this library; it's great and fits our requirements neatly.

We found a use case that didn't work for us, and we are sharing the patch that resolved it. We are not sure whether it is merge-worthy, something you may want to consider as an opt-in feature, or whether we simply didn't follow the library's logic. Anyway, this is what didn't work for us:

  1. We have a periodic AWS Lambda function that runs every minute, called the health check (pretty easy to guess what it does ;-). Health checks cannot run in parallel, so we need to lock against multiple invocations (which can happen spontaneously when using Lambda).
  2. The health check can time out; usually this means a bug. A Lambda timeout is equivalent to kill -9, meaning the library never gets a chance to release the lock.
  3. After that, our wrapper of the library (configuration attached below) never recovers: it keeps throwing ACQUIRE_TIMEOUT. We would like the health-check Lambda to be able to resume working within 2-4 minutes in this case.
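To make the failure mode concrete, here is a minimal sketch using the values from our wrapper configuration below (the variable names are illustrative, not the library's internals):

```python
from datetime import timedelta

# Values taken from our wrapper configuration in this thread.
expiry_period = timedelta(seconds=30)   # how long a lock may go without a heartbeat
retry_timeout = timedelta(seconds=10)   # how long acquire_lock keeps retrying

# The acquirer must observe an unchanged record_version_number for a full
# expiry_period before it may steal a stale lock, but acquire_lock gives up
# after retry_timeout -- so a lock orphaned by a killed Lambda is never stolen.
can_recover = retry_timeout >= expiry_period
print(can_recover)  # False: every attempt ends in ACQUIRE_TIMEOUT
```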

We found that this happens because the current implementation waits a full expiry period for no change in the record_version_number, so quick re-acquisition isn't possible. We changed the logic to rely on the ttl_attribute instead (which means the record_version_number no longer has a real purpose?).
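The gist of our change looks roughly like this (a simplified sketch, not the library's actual internals; `is_lock_expired` and the `expires` key are illustrative names, with `expires` standing in for our `ttl_attribute_name`):

```python
import time

def is_lock_expired(lock_record, now=None):
    """Treat the stored TTL timestamp as the single source of truth for expiry.

    'expires' is an epoch timestamp kept fresh by the lock holder's heartbeats
    (ttl_attribute_name='expires' in our configuration).
    """
    now = time.time() if now is None else now
    return lock_record.get('expires', 0) <= now

# With this check, an acquirer can overwrite a stale record as soon as its TTL
# has passed, instead of first watching record_version_number stay unchanged
# for a full expiry_period.
```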

our wrapper:

import datetime
import os
import logging
import contextlib
import boto3
from python_dynamodb_lock import python_dynamodb_lock


@contextlib.contextmanager
def immediate_lock(name):
    """Acquire the named lock, failing fast (~10s) rather than blocking indefinitely."""
    lock = _lock_client().acquire_lock(
        name,
        retry_period=datetime.timedelta(seconds=1),
        retry_timeout=datetime.timedelta(seconds=10))
    logging.info("Acquired lock %s", name)
    try:
        yield lock
    finally:
        lock.release()
        logging.info("Released lock %s", name)


_CACHE = {}  # clients cached across warm Lambda invocations


def _dynamodb():
    if 'dynamodb_client' not in _CACHE:
        _CACHE['dynamodb_client'] = boto3.resource('dynamodb')
    return _CACHE['dynamodb_client']


def _lock_client():
    if 'lock_client' not in _CACHE:
        _CACHE['lock_client'] = python_dynamodb_lock.DynamoDBLockClient(
            _dynamodb(),
            table_name=os.environ['DYNAMODB_LOCK_TABLE'],
            partition_key_name='Id',
            sort_key_name='SortKey',
            ttl_attribute_name='expires',
            heartbeat_period=datetime.timedelta(seconds=20),
            expiry_period=datetime.timedelta(seconds=30))
    return _CACHE['lock_client']

So what would you suggest we do? Is this merge material, or is there a way to configure the library to handle this case?

Thank you,
Team sensibo
