Use of Threading #56

Open
shikharsngh opened this issue May 31, 2019 · 1 comment

Comments

@shikharsngh

Dumping the key-value pairs from the dictionary to disk is done by creating a new thread from the main thread. However, since the process runs on a single core and only one thread can run at a time, this is no better than dumping to disk from the main thread.

If, however, the new thread is replaced by a new process created with fork, this might help when writing to disk. The new (child) process would have the task of dumping the key-value pairs to disk while the parent process continues to run alongside it. This would ensure consistency: even if the parent process crashes due to some error, the child process would still finish writing the key-value pairs, so the data on disk stays consistent.
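A minimal sketch of what I mean, assuming a Unix-like system where `os.fork` is available. The function name `dump_in_child`, the JSON format, and the temporary-file rename are my own illustrative choices, not the project's current code:

```python
import json
import os


def dump_in_child(db, path):
    """Fork a child that writes a snapshot of `db` to `path`; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child process: it sees a copy of `db` as it was at the moment of the fork.
        status = 1
        try:
            tmp = path + ".tmp"  # hypothetical temp-file name
            with open(tmp, "w") as f:
                json.dump(db, f)
                f.flush()
                os.fsync(f.fileno())
            # Rename over the target so readers never see a half-written file.
            os.replace(tmp, path)
            status = 0
        finally:
            # Exit without running the parent's cleanup handlers in the child.
            os._exit(status)
    # Parent process: carries on immediately; the caller may wait on `pid` later.
    return pid
```

The parent should eventually call `os.waitpid(pid, 0)` (or otherwise reap the child) so that finished children do not linger as zombies.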

Multi-threading does not take care of this consistency issue: if the process crashes due to some error, all of its threads are killed and the key-value pairs might not be completely written to disk, leaving the data on disk inconsistent.

@patx please let me know your thoughts.

@sammck

sammck commented Jul 26, 2021

Just read this. It's not entirely accurate, though. While Python does have a global interpreter lock (GIL), which prevents Python code from executing in more than one thread at a time, this lock is released during most calls into native code that block, including I/O (e.g., reading from or writing to disk), allowing other Python threads to run Python code. For threads that are mostly I/O bound, Python multithreading works well and the single core never needs to sit idle waiting for I/O.
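A rough way to see this for yourself, assuming CPython: the helper names `write_blob` and `count_while_writing` below are made up for this illustration. A background thread does a blocking write while the main thread keeps running pure-Python code:

```python
import os
import threading


def write_blob(path, data):
    """Blocking disk write; CPython releases the GIL inside write() and fsync()."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def count_while_writing(path, data):
    """Run write_blob in a thread and count main-thread loop iterations meanwhile."""
    writer = threading.Thread(target=write_blob, args=(path, data))
    writer.start()
    iterations = 0
    while writer.is_alive():
        iterations += 1  # pure-Python work done while the writer is blocked on I/O
    writer.join()
    return iterations
```

With a reasonably large `data` buffer the returned count is typically well above zero, which suggests the main thread was not stalled for the duration of the write. The exact number depends on the machine and the disk.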

It is true that the code that does the pickling/serialization prior to writing to disk can become CPU bound. Attempting to do that part after forking might be risky in a process that has multiple threads running, since the forked process will only have one thread, and any locks held by other threads at the time of fork will never be released in the forked process, which can lead to deadlocks if you are not careful.
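Here is a small sketch of that hazard, assuming CPython on a Unix-like system. The function name `lock_state_in_child` is hypothetical. It forks while a second thread holds a lock, and the child does a non-blocking acquire so the demo reports the problem instead of hanging:

```python
import os
import threading


def lock_state_in_child():
    """Fork while another thread holds a lock; return True if the child could acquire it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            held.set()
            release.wait()  # keep holding the lock until the parent says otherwise

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # make sure the lock is really held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nobody can release `lock`.
        got = lock.acquire(blocking=False)
        os._exit(0 if got else 1)

    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WEXITSTATUS(status) == 0
```

In the child, a blocking `lock.acquire()` would wait forever, because the thread that holds the lock was not carried over by the fork. Recent Python versions also emit a `DeprecationWarning` when `os.fork` is called from a multi-threaded process, for this kind of reason.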
