reproduce worker termination race condition #73
@e-shubin Can you explain what you expect the behavior to be?
@devinus I would like poolboy not to check out a "dead" worker. In the first assert I check out a worker and execute an operation that makes it terminate. Then I try another operation, but the terminated process is checked out again. To me this looks like a race condition.
It is impossible to guarantee that a "dead" worker is not checked out, because a worker could die after being checked out but before the caller has handled poolboy's reply containing the worker. If you want to try to ensure a "dead" worker is not checked out, the worker must exit and not be checked in (or be checked in only after it exits, in the case of a transaction). Therefore the race condition is in the test, not in poolboy. A cast is sent to the worker telling it to die, but the client checks the worker in without waiting for it to exit. This means the next client might get a worker that is about to die.
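A minimal sketch of the "wait for the worker to exit before checking it in" idea described above. It is an illustration only, not code from poolboy or from this PR: the module name `worker_stop`, the function `stop_and_wait/3`, and the caller-supplied `StopFun` are all hypothetical.

```erlang
-module(worker_stop).
-export([stop_and_wait/3]).

%% Monitor the worker, ask it to stop, and block until it has really exited.
%% StopFun is supplied by the caller (hypothetical), for example
%% fun(W) -> gen_server:cast(W, die) end, where `die` is whatever message
%% makes your worker terminate.
-spec stop_and_wait(pid(), fun((pid()) -> any()), timeout()) ->
          ok | {error, timeout}.
stop_and_wait(Worker, StopFun, Timeout) ->
    %% Set up the monitor first so the 'DOWN' message cannot be missed.
    %% If the worker is already dead, 'DOWN' is delivered immediately.
    Ref = erlang:monitor(process, Worker),
    _ = StopFun(Worker),
    receive
        {'DOWN', Ref, process, Worker, _Reason} ->
            ok
    after Timeout ->
        %% Worker did not exit in time: drop the monitor and any late 'DOWN'.
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

A client would call this between `poolboy:checkout/1` and `poolboy:checkin/2` (or inside the fun passed to `poolboy:transaction/2`), so the worker has already exited by the time it is handed back. This narrows the window in the test; it does not remove the general race described above, since a worker can still die at any moment after checkout.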
I've been thinking about this issue for the past few days and I also agree with @fishcakez. You're telling the worker to die, and the pool has no idea the worker is dead yet because the pool's cleanup facilities haven't been triggered. It's not until after the worker is checked out again that the worker is able to shut down properly. @fishcakez Do you know of any reliable way to test that a worker is not being
@devinus do you mean something like
@fishcakez No, I was thinking of something like checking the process's state or something weird like that.
That does check its
There is a race condition between checking out a worker from the pool and the worker process's termination. In the test case provided, the worker is terminated via gen_server:cast to reproduce the problem more reliably, but it also occurs with gen_server:call.
Is it a known issue? Is there any recommendation for how to avoid it?