Created metrication for inter-proclet communication#1
Bokai-Bi wants to merge 31 commits into Nu-NSDI23:main
Bokai-Bi commented on Jun 29, 2023
- Added instrumentation for inter-proclet communication. For local communication, each proclet logs the total number of function calls. For remote communication, each proclet logs the total number of calls and the total size of data transferred on a per-target-machine basis. Local-communication data is stored in a Counter in the caller's header; remote-communication data is stored in a std::unordered_map, also in the caller's header, and synchronized by the header's spin_lock. Nothing is logged on the callee's side.
- Added a new benchmark, bench/bench_proclet_logging, which measures the performance of remote proclet communication. It only works when the main server and the remote server are started with the specific IPs hard-coded in the benchmark.
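The bookkeeping described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `NodeIP` is modeled as a plain integer, `std::mutex` stands in for the header's `spin_lock`, an atomic integer stands in for the `Counter`, and the names `CallerStats`, `RemoteCallStats`, `record_local_call`, and `record_remote_call` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for Nu's NodeIP type.
using NodeIP = std::uint32_t;

// Per-target-machine totals for remote calls.
struct RemoteCallStats {
  std::uint64_t num_calls = 0;
  std::uint64_t bytes = 0;
};

// Statistics kept on the caller's side only; the callee logs nothing.
struct CallerStats {
  // Stand-in for the Counter that tracks local calls.
  std::atomic<std::uint64_t> local_calls{0};
  // Stand-in for the spin lock in the caller's header.
  std::mutex lock;
  std::unordered_map<NodeIP, RemoteCallStats> remote_call_map;

  void record_local_call() {
    local_calls.fetch_add(1, std::memory_order_relaxed);
  }

  void record_remote_call(NodeIP target_ip, std::uint64_t bytes) {
    std::lock_guard<std::mutex> guard(lock);
    // operator[] value-initializes the entry on first use.
    RemoteCallStats &entry = remote_call_map[target_ip];
    ++entry.num_calls;
    entry.bytes += bytes;
  }
};

// Usage sketch: two calls to the same machine share one map entry.
inline void caller_stats_example() {
  CallerStats stats;
  stats.record_remote_call(/*target_ip=*/0x0A000002, /*bytes=*/100);
  stats.record_remote_call(/*target_ip=*/0x0A000002, /*bytes=*/50);
  assert(stats.remote_call_map.at(0x0A000002).bytes == 150);
}
```

Every remote call takes the lock in this sketch, which is the cost the review comments further down are concerned with.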
inc/nu/impl/proclet.ipp
Outdated
    caller_header->spin_lock.lock();
    auto target_kvpair = caller_header->remote_call_map.find(target_ip);
Why is the key the "target_ip" rather than simply the "callee_id"?
For locality improvements, we are interested in the total aggregate communication from a proclet to every other machine; using the IP of the callee lets us collect data on a per-target-machine basis. Using callee_id as the key would not be optimal, since proclets can migrate.
I actually feel that using the callee_id might be better, as it tells us whether we should colocate certain proclet pairs to improve locality. The per-machine metric hides all these details.
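One way to picture the trade-off in this thread: if statistics are keyed by callee_id, a per-machine view can still be derived by resolving each callee's location when the data is read. The sketch below is an assumption-laden illustration, not something proposed in the PR. `ProcletID`, `NodeIP`, `CallStats`, and `aggregate_by_machine` are hypothetical names, and the resolver callback merely stands in for a lookup such as get_ip_by_proclet_id.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-ins for Nu's identifier types.
using ProcletID = std::uint64_t;
using NodeIP = std::uint32_t;

struct CallStats {
  std::uint64_t num_calls = 0;
  std::uint64_t bytes = 0;
};

using PerCalleeStats = std::unordered_map<ProcletID, CallStats>;
using PerMachineStats = std::unordered_map<NodeIP, CallStats>;

// Folds per-callee statistics into per-machine totals. The resolver maps a
// callee to the machine it lives on at the time of aggregation.
PerMachineStats aggregate_by_machine(
    const PerCalleeStats &per_callee,
    const std::function<NodeIP(ProcletID)> &resolve_ip) {
  assert(static_cast<bool>(resolve_ip));  // the resolver must be callable
  PerMachineStats per_machine;
  for (const auto &kv : per_callee) {
    CallStats &agg = per_machine[resolve_ip(kv.first)];
    agg.num_calls += kv.second.num_calls;
    agg.bytes += kv.second.bytes;
  }
  return per_machine;
}
```

Note the caveat raised above still applies: because proclets can migrate, resolving locations at read time attributes all past traffic to each callee's current machine, which may differ from where the calls were actually sent.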
    ProcletSlabGuard slab_guard(&caller_header->slab);
    NodeIP target_ip = get_runtime()->rpc_client_mgr()->get_ip_by_proclet_id(id);
    caller_header->spin_lock.lock();
This will limit the invocation throughput of the caller proclet to roughly 1 MOPS.
Is this because of the call to get_ip_by_proclet_id or the locking? The locking can theoretically be removed by replacing the map with an array that can be unsafely modified.
I mean the locking. Yeah, making it lockless would be better.
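A lockless variant along the lines discussed here might look like the sketch below. It is an illustration only and makes several assumptions: machines are identified by a small dense index rather than by IP, the upper bound `kMaxNodes` is made up, and relaxed atomic increments are used instead of the unsynchronized writes mentioned above so that concurrent updates are not lost. `LocklessCallerStats` and `RemoteCounters` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical upper bound on the number of machines in the cluster.
constexpr std::size_t kMaxNodes = 64;

// Per-target-machine counters, updated without taking a lock.
struct RemoteCounters {
  std::atomic<std::uint64_t> num_calls{0};
  std::atomic<std::uint64_t> bytes{0};
};

struct LocklessCallerStats {
  std::array<RemoteCounters, kMaxNodes> per_node;

  // node_idx is an assumed dense machine index in [0, kMaxNodes).
  void record_remote_call(std::size_t node_idx, std::uint64_t bytes) {
    assert(node_idx < kMaxNodes);
    RemoteCounters &c = per_node[node_idx];
    // Relaxed ordering is enough for statistics: each counter stays
    // consistent on its own, but a reader may briefly observe the call
    // count updated before the byte count.
    c.num_calls.fetch_add(1, std::memory_order_relaxed);
    c.bytes.fetch_add(bytes, std::memory_order_relaxed);
  }
};
```

Atomic increments on a shared cache line still cost something under contention, so per-core counters that are summed at read time would be another option to weigh.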
zainryan left a comment
Thanks Bokai, it looks functionally correct! I have some comments about the metric and the synchronization; please see the embedded comments.
…d proclet logging benchmark