-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread splitting is expensive due to OT extension setup time #40
Comments
COT is worth re-enabling, but let's dig into history to see how it got lost. Atomic increments can be slow: at the end of the day it's a point of serialization. You can still try it to check performance, since I believe it is a very small change. It would be much nicer if we could minimize the use of shared structures as much as possible. This would mean somehow a centralized structure is being used to partition the counter space among all interested threads. Hence my prior suggestion of alternating rows, but as you pointed out that can get out of hand as well. I think it is worth reconsidering the What do you think? |
RE: COT: OK. I probably should read that paper in full before I look too closely at the code, but if I find some time, I'll take a look (and if you get to it first, that's OK too).
In principle, yes; I think the trick here would be to allocate indices to threads in blocks of maybe 2^10 or so, rather than involving the multithreading primitives on each OT. Also, IIRC intel has single-instruction atomic add-and-fetch instructions, which are slower than the nonatomic equivalents, but much faster than locks. Of course, this doesn't help on other architectures, but I'm hoping that maybe GCC's builtin is smart enough to use the appropriate equivalent. On that note: thoughts on GCC builtins? I can imagine that maybe we want to retain the option to use compcert or clang in the future, but I also sort of think we should worry about crossing those bridges when we come to them.
I agree with you, but I'm not quite sold on your proposed alternative. I think it matches the way we ideally want protocols to be used pretty well (i.e. mad recursive splitting should probably be frowned upon), and it's easy to reason about, which is really nice, but in practice, it may prevent libraries from working. Consider my case: I am splitting the protocol inside an ORAM library. If an application programmer decides to initialize an ORAM inside a protocol that's already been forked, then it will fail, and it doesn't seem to have any mechanism to "do the right thing" other than to fall back to a single protocol. For now, I'll do a performance check on the shared-OT approach and report back. |
OK, I have a candidate implementation over here. Note: so far it doesn't actually split the OT object when it splits threads, but it does have most of the necessary functionality: it uses atomics, does batched nonce allocations, reference-counts the master nonce, etc. I cannot detect a performance difference between this code and the original on my laptop. Although I haven't actually added a function call to split the OT object, I do foresee a problem: synchronizing nonce allotments between threads. There are two obvious solutions AFAICT:
Thoughts? |
I started another issue to deal with correlated OT: #43. Note that the implementation referenced by that issue depends upon the code I've written for this one. |
Under the current codebase, if you split the YGC protocol, you have to wait something like 300ms for OT extensions to be set up, which is a very long time, comparatively speaking. I am opening this thread to discuss solutions before actually implementing anything. @samee previously suggested that it might be possible to make the threads share a single OT setup, but alternate rows. It seems to me that repeated splitting could cause problems there. Admittedly, it shouldn't be a common case, but it still seems like one we should handle gracefully. The alternative AFAICT is to have the threads share a single OT setup and use an atomic increment (GCC has a builtin for this) to ensure two threads don't end up using the same row. Thoughts or alternatives?
On a semi-related note, COT was lost somewhere along the way. Is it worth re-enabling?
The text was updated successfully, but these errors were encountered: