
block I/O tracing #196

Closed
cvonelm opened this issue Nov 4, 2021 · 4 comments · Fixed by #197
cvonelm (Member) commented Nov 4, 2021

I'm separating this from #194 so we can have a nice high-level discussion there and get into the nitty-gritty details of the implementation here.

event reading

By design there is one perf event stream per CPU, and we read each stream separately. This is problematic here, because one thread on one CPU can issue a block I/O request while a completely different thread (usually a kernel thread), possibly on a completely different CPU, receives the completion event.

A small BPF Python hack shows that differing issue/complete CPUs aren't an edge case: for the majority of events, the issuing and completing CPUs differ, so simply discarding those events isn't an option.

So instead we probably need to cache the issue and complete events at measurement time and construct a coherent view from the local per-CPU observations afterwards.
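One way to build such a coherent view: merge the cached per-CPU streams by timestamp, then pair issue and complete events by some request key. A minimal Python sketch (the event tuple format, kind strings, and the key are illustrative assumptions, not the lo2s implementation):

```python
def merge_streams(per_cpu_events):
    """Merge per-CPU event streams into one timeline and pair events.

    per_cpu_events: {cpu: [(timestamp, kind, key), ...]}, each list sorted
    by timestamp. kind is "issue" or "complete"; key identifies the
    request (e.g. device + sector).
    """
    # Flatten all per-CPU streams and sort globally by timestamp.
    merged = sorted(
        (ev for events in per_cpu_events.values() for ev in events),
        key=lambda ev: ev[0],
    )
    pending = {}   # key -> issue timestamp
    matched = []   # (key, issue_ts, complete_ts)
    for ts, kind, key in merged:
        if kind == "issue":
            pending[key] = ts
        elif key in pending:
            # Completion may arrive from a different CPU's stream;
            # after merging, the key alone pairs it with its issue.
            matched.append((key, pending.pop(key), ts))
    return matched
```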

cvonelm self-assigned this Nov 4, 2021
cvonelm (Member, Author) commented Nov 5, 2021

I/O Handle in Otf2xx

In OTF2 there is the concept of an I/O handle, which you need to create before you can write I/O operations and destroy afterwards. It corresponds to the classical notion of opening and closing a file or a network connection. Block I/O, however, is "stateless", so there is no direct equivalent. Do we assign one handle to every block device, or one handle to every block?

bmario (Member) commented Nov 5, 2021

For the record:

  • Block I/O will only be available in system monitoring mode
  • We try to implement block I/O using OTF2 I/O records
    • one IoHandle per block device
    • the request issued and request completed tracepoints will be mapped to IoOperationBegin/IoOperationIssued and IoOperationComplete, respectively
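The mapping above can be written down as a small table. A sketch (the kernel tracepoint names `block:block_rq_issue`/`block:block_rq_complete` are my assumption for "request issued"/"request completed"; the record names follow OTF2's I/O record naming, but this dict is illustrative, not the otf2xx API):

```python
# Hypothetical tracepoint -> OTF2 I/O record mapping, per the plan above.
TRACEPOINT_TO_RECORDS = {
    "block:block_rq_issue":    ["IoOperationBegin", "IoOperationIssued"],
    "block:block_rq_complete": ["IoOperationComplete"],
}

def records_for(tracepoint):
    """Return the OTF2 record sequence to emit for a kernel tracepoint."""
    return TRACEPOINT_TO_RECORDS.get(tracepoint, [])
```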

cvonelm (Member, Author) commented Dec 2, 2021

To measure latency you have to match up each queue-insert event with the corresponding queue-remove (completion) event. However, the events do not carry a simple ID that allows us to match them up. I've now tested three different matching strategies:

Replay the kernel FIFO based on the events we have

This assumes that the queue for every block device is a FIFO. To match inserts to completes, simply replay the behaviour of the kernel FIFO in lo2s, based on the events and timestamps we have.

Pro

  • If the underlying queue really is a single FIFO, we can replicate the kernel's behaviour exactly and get perfectly correct latency values.

Con

  • Completely useless if the underlying data structure is not a FIFO
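The FIFO replay could be sketched like this (a minimal sketch; the event format and the per-device queue are assumptions, not the lo2s code):

```python
from collections import deque, defaultdict

def fifo_match(events):
    """Match inserts to completes assuming each device queue is a FIFO.

    events: time-sorted (timestamp, kind, device) tuples, with kind
    "insert" or "complete". Returns (matches, unmatched_completes),
    where matches are (device, insert_ts, complete_ts) tuples.
    """
    queues = defaultdict(deque)   # device -> pending insert timestamps
    matches, unmatched = [], 0
    for ts, kind, dev in events:
        if kind == "insert":
            queues[dev].append(ts)
        elif queues[dev]:
            # FIFO assumption: the oldest pending insert completes first.
            matches.append((dev, queues[dev].popleft(), ts))
        else:
            unmatched += 1
    return matches, unmatched
```

If the device queue is not actually a FIFO, `popleft()` pairs the wrong events, which is exactly the failure mode measured below.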

Match based on sector number.

This basically assumes that the sector being written or read is unique at any point in time and can therefore be used as a key to match inserts with completes.

Pro

  • probably the closest thing to a real unique ID that we can get with tracepoints

Con

  • due to caching, the same sector shouldn't often be read or written in overlapping requests, but it is not a true unique ID.
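Sector-based matching could look roughly like this, with the latency falling out of the timestamp difference (a sketch; the `(device, sector)` key and event format are assumptions for illustration, not the lo2s code):

```python
def sector_match(events):
    """Match inserts to completes using (device, sector) as the key.

    events: time-sorted (timestamp, kind, device, sector) tuples.
    Returns (matches, lost), where matches are (key, latency) pairs and
    lost counts completes without a pending insert for their sector.
    """
    pending, matches, lost = {}, [], 0
    for ts, kind, dev, sector in events:
        key = (dev, sector)
        if kind == "insert":
            # Overlapping I/O on one sector would silently overwrite
            # the pending entry here -- the "not a true unique ID" con.
            pending[key] = ts
        elif key in pending:
            matches.append((key, ts - pending.pop(key)))
        else:
            lost += 1
    return matches, lost
```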

Match based on the address of the struct request

This is what biosnoop does: use the memory address of the struct request, which represents the block I/O request in the kernel, as a key. This address doesn't change, as the struct stays alive throughout the whole request.

Pro

  • Pretty much a unique id

Con

  • requires BPF

Event losses

Percentage of complete events for which no matching insert could be found:

  • replay FIFO: 43% (so sadly, no FIFO behaviour here)
  • match based on sector: 0.3%
  • match based on address of struct request: 0.2%

Latency Histograms

address of struct request as a key:
[latency histogram: bpf]

sector as a key:
[latency histogram: best_effort]

One could now test further whether the two strategies match up the same insert and complete events, but the latency histograms are similar enough that we can simply use the sector as the unique ID.

cvonelm (Member, Author) commented Dec 17, 2021

This is how it looks in Vampir for a simple test trace:

[Screenshot: Shared_Resource_Timeline_lo2s_trace_2021-12-14T15-24-21]
