
Commit 1e99a69

1st draft tutorial for tuning ReadSize
Also:
- Add references to this tutorial throughout other articles
- Add workflows for the LoadTester operators
1 parent 6d07a7d commit 1e99a69

File tree

18 files changed: +692 −8 lines changed
articles/getting-started/onix-configuration.md

Lines changed: 2 additions & 1 deletion
@@ -54,7 +54,8 @@ The data acquisition process is started when ContextTask passes through
 <xref:OpenEphys.Onix1.StartAcquisition>. StartAcquisition allows the user to set parameters that are
 related to data acquisition such as ReadSize and WriteSize. Setting the ReadSize property for a
 particular workflow is a balancing act of minimizing latency of data transfers from the ONIX
-system and avoiding data accumulation in the ONIX system's hardware buffer.
+system and avoiding data accumulation in the ONIX system's hardware buffer. To learn about the
+process of tuning ReadSize, check out the <xref:tune-readsize> tutorial.

 ::: workflow
 ![/workflows/getting-started/start-acquisition.bonsai workflow](../../workflows/getting-started/start-acquisition.bonsai)

articles/tutorials/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
 items:
 - href: ephys-processing-listening.md
 - href: ephys-socket.md
+- href: tune-readsize.md
articles/tutorials/tune-readsize.md

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
---
uid: tune-readsize
title: Tune ReadSize
---

This tutorial shows how to tune <xref:OpenEphys.Onix1.StartAcquisition>'s
<xref:OpenEphys.Onix1.StartAcquisition.ReadSize> property to avoid buffer overflow errors, which
prematurely terminate the acquisition session, and to minimize the latency of data transfer
between the ONIX system and the computer for low-latency closed-loop feedback.

## ONIX Hardware Buffer and ReadSize

An important concept for understanding the effect of ReadSize is the hardware buffer. The hardware
buffer is a temporary storage area that facilitates data transfer between the ONIX system and the
PC. When the hardware buffer accumulates an amount of data that exceeds a threshold, this chunk of
data is read by the PC and removed from the buffer. This threshold is determined by the value of
ReadSize, a property of the StartAcquisition operator, which is required in every workflow that
uses <xref:OpenEphys.Onix1> to acquire data from ONIX.

### ReadSize Tradeoffs

There is a tradeoff between two primary considerations when selecting the optimal ReadSize value:
latency and the risk of hardware buffer overflow. Let's take a look at how those are affected by
tuning ReadSize lower or higher for a given data rate.

As ReadSize decreases, less data in the buffer is required for the computer to be able to read from
the buffer. This means less time needs to pass before that data becomes accessible. Therefore,
setting ReadSize lower provides lower-latency data (i.e. allows access to data closer in time to
the physical creation of that data). However, there is a limit to this. Each call to the function
that reads data from the hardware buffer requires resources from the computer (i.e. memory and CPU
cycles). If ReadSize is so low that the read function is called too rapidly in succession, your
computer won't be able to keep up with the number of read function calls required to read all the
data from the buffer. This can lead to an over-accumulation of data in the hardware buffer and an
error that terminates the acquisition session when the buffer reaches maximum capacity.

As ReadSize increases, more data in the buffer is required for the computer to be able to read from
the buffer. This means more time needs to pass before that data becomes accessible. Therefore,
setting ReadSize higher reduces the frequency of calls to the read function, thereby reducing the
risk that your computer is overwhelmed by the number of read function calls required to clear the
buffer. As you might have surmised already, this increases the latency between the creation of data
and the retrieval of that data by your computer.
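
To make this tradeoff concrete, the expected read interval and the required rate of read calls can
be estimated from ReadSize and the data rate. Here is a back-of-the-envelope sketch (plain Python;
the 47.04 MB/s rate matches the load-tester payload used later in this tutorial, and the estimate
ignores read-call overhead):

```python
# First-order estimate: time to fill the read threshold and the read-call
# rate needed to keep up, at a fixed input data rate.
DATA_RATE = 47.04e6  # bytes per second (assumed payload, see below)

for read_size in (1024, 2048, 16384):  # bytes
    fill_time_us = read_size / DATA_RATE * 1e6  # time to accumulate ReadSize bytes
    reads_per_s = DATA_RATE / read_size         # reads required to clear the buffer
    print(f"ReadSize {read_size:>5}: ~{fill_time_us:5.0f} us/read, ~{reads_per_s:5.0f} reads/s")

# ReadSize  1024: ~   22 us/read, ~45938 reads/s
# ReadSize  2048: ~   44 us/read, ~22969 reads/s
# ReadSize 16384: ~  348 us/read, ~ 2871 reads/s
```

The same three ReadSize values, and the 348 μs figure in particular, reappear in the measurements
below.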

## Tuning ReadSize

### Setup

Follow the [Getting Started](xref:getting-started) guide to set up your Bonsai environment and
familiarize yourself with using OpenEphys.Onix1 to acquire data from ONIX. Copy the following
workflow by hovering over the workflow image and clicking on the clipboard icon that appears.

::: workflow
![SVG of load tester workflow](../../workflows/tutorials/tune-readsize/tune-readsize.bonsai)
:::

Open Bonsai and paste this workflow by clicking the Bonsai workflow editor pane and hitting
<kbd>Ctrl+V</kbd>.

### Workflow Description

::: workflow
![SVG of load tester workflow configuration chain](../../workflows/tutorials/tune-readsize/configuration.bonsai)
:::

The top-level configuration chain includes a <xref:OpenEphys.Onix1.ConfigureLoadTester>. The load
tester device allows us to emulate different rates of data production and measure latency between
data reads and data writes. For example, enabling two Neuropixels 2.0 probes will produce about 47
MB/s (≈ ((8\*2 + 384\*2) \* 30000 \* 2) / 1e6). In this example, we'll use `ConfigureLoadTester` to
emulate this payload. To do so, `ConfigureLoadTester`'s ReceivedWords and FramesPerSecond properties
are set to 392 and 60,000, respectively. This parallels the rate of data being produced by two
probes: one <xref:OpenEphys.Onix1.NeuropixelsV2eDataFrame> at 60 kHz. The Enable property is set to
True to enable the LoadTester device. Its DeviceName is set to "Load Tester" so that it has a
straightforward name with which to link the <xref:OpenEphys.Onix1.LoadTesterData> and
<xref:OpenEphys.Onix1.LoadTesterLoopback> operators. The DeviceAddress property is set to 11 because
that's how this device is indexed in the ONIX system.
<!-- TransmittedWords property -->
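
As a sanity check on those numbers, the arithmetic behind the 47 MB/s figure and its equivalence to
the ReceivedWords and FramesPerSecond settings can be worked out directly (a small Python sketch;
the per-probe word counts are taken from the formula quoted above):

```python
# Two Neuropixels 2.0 probes: (8 + 384) words per probe per sample
# (per the formula above), 30 kHz sample rate, 2 bytes per word.
probe_rate = (8 * 2 + 384 * 2) * 30_000 * 2  # bytes per second
print(probe_rate / 1e6)  # 47.04 MB/s

# Equivalent load-tester settings: 392 received words per frame at
# 60,000 frames per second, 2 bytes per word.
load_tester_rate = 392 * 60_000 * 2  # bytes per second
assert load_tester_rate == probe_rate  # same 47.04 MB/s payload
```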

All of the <xref:OpenEphys.Onix1.ConfigureBreakoutBoard>'s devices except the MemoryMonitor are
disabled.

<xref:OpenEphys.Onix1.StartAcquisition>'s <xref:OpenEphys.Onix1.StartAcquisition.WriteSize> is set
to 16384. This defines a readily-available pool of memory for the creation of output data frames. A
larger size will reduce the frequency of dynamic memory allocation system calls but increase the
expense of each of those calls. The effect on real-time performance is typically not as large as
that of the ReadSize property because WriteSize does not determine when data is written to hardware.
Data is written to hardware as soon as an output frame has been created. In contrast, data is read
from hardware whenever more than ReadSize bytes have accumulated in the input buffer. The ReadSize
property is also set to 16384. We'll take a closer look at that value, and experiment with it, in
the next section.

::: workflow
![SVG of load tester workflow loadtester branch](../../workflows/tutorials/tune-readsize/loadtester.bonsai)
:::

LoadTesterData produces a sequence of
[LoadTesterDataFrames](xref:OpenEphys.Onix1.LoadTesterDataFrame). The
<xref:OpenEphys.Onix1.DataFrame.HubClock> member and the
<xref:OpenEphys.Onix1.LoadTesterDataFrame.HubClockDelta> member are each selected from the
LoadTesterDataFrame with their own <xref:Bonsai.Expressions.MemberSelectorBuilder>.

The HubClock member indicates the value of the hub's clock when that LoadTesterDataFrame was
produced. A hub is a piece of hardware that coordinates a group of devices, e.g. the breakout board
is a hub that coordinates DigitalInput, DigitalOutput, AnalogIO, MemoryMonitor, etc. EveryNth is a
<xref:Bonsai.Reactive.Condition> that only allows every Nth element in the observable sequence
through. You can inspect its logic by double-clicking the node when the workflow is not running. In
this case, the N property is set to 100, so every 100th sample is allowed through the EveryNth
operator and sent to LoadTesterLoopback. LoadTesterLoopback is a *sink* operator that writes the
HubClock member that was passed through the EveryNth operator back to the load tester device. The
load tester device will then update the HubClockDelta members in all subsequent
LoadTesterDataFrames.
<!-- why is everyNth necessary? -->
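
For reference, the filtering logic that EveryNth implements is equivalent to the following (a
minimal Python sketch of the behavior, not the Bonsai node's actual internals):

```python
def every_nth(source, n=100):
    """Yield every nth element of a sequence, starting with the first.

    Mirrors the EveryNth Condition in the workflow: with n=100, one of
    every 100 LoadTesterDataFrames passes through to LoadTesterLoopback.
    """
    for index, element in enumerate(source):
        if index % n == 0:
            yield element

# With n=100, only frames 0, 100, 200, ... pass through:
print(len(list(every_nth(range(1000)))))  # 10
```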

The HubClockDelta member indicates the difference between the HubClock value sent to the
LoadTesterLoopback operator and the load tester's hub clock value when that HubClock value was
received by the hardware. <xref:Bonsai.Reactive.DistinctUntilChanged> filters out repeated elements,
which is necessary because HubClockDelta is only updated every 100th LoadTesterDataFrame. The next
operator, <xref:Bonsai.Scripting.Expressions.ExpressionTransform>, converts the HubClockDelta from
units of hub clock cycles to units of microseconds. This data gets sent to
<xref:Bonsai.Dsp.Histogram1D> to help visualize the distribution of closed-loop latencies.
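
The conversion itself is a single division. Here is a sketch of the equivalent arithmetic (plain
Python; the 250 MHz hub clock frequency is an assumed example value, so substitute your hub's
actual rate):

```python
HUB_CLOCK_HZ = 250e6  # assumed example value; depends on your hardware

def delta_to_microseconds(hub_clock_delta):
    """Convert a HubClockDelta from hub clock cycles to microseconds."""
    return hub_clock_delta / HUB_CLOCK_HZ * 1e6

# e.g. a delta of 87,000 cycles at 250 MHz corresponds to 348 us:
print(delta_to_microseconds(87_000))  # 348.0
```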

::: workflow
![SVG of load tester workflow memorymonitor branch](../../workflows/tutorials/tune-readsize/memory-monitor.bonsai)
:::

To learn about the <xref:OpenEphys.Onix1.MemoryMonitorData> branch, visit the [Breakout Board Memory
Monitor](xref:breakout_memory-monitor) page (or the equivalent page for any of our other hardware).

### Measuring Latency at Different ReadSize Values

#### ReadSize = 16384

With ReadSize set to 16384, start the workflow, and [open the visualizers](xref:visualize-data) for
the PercentUsed and Histogram1D nodes:

![screenshot of Histogram1D visualizers with ReadSize 16384](../../images/tutorials/tune-readsize/histogram1d_16384.webp)
![screenshot of PercentUsed visualizers with ReadSize 16384](../../images/tutorials/tune-readsize/percent-used_16384.webp)

Average latency appears to be about 300 μs (in this plot, 1000 corresponds to 1 ms). This
approximately comports with expectations: if data is being produced at about 47 MB/s, it takes about
348 μs to accumulate 16384 bytes. This isn't a perfect estimate because there are other devices
producing data (e.g. the MemoryMonitor and Heartbeat), though the data rate of those devices is
completely dwarfed by the data rate of the load tester. The most likely source of the discrepancy is
that the computer is not 100% available to perform the read operation, which causes some reads to be
delayed and some reads to happen sooner. The calculation that leads to the 348 μs figure can serve
as a first-order estimate of latency when determining an optimal ReadSize, but it's not perfect. The
load tester provides empirical measurements of latency and memory usage.

The hardware buffer also doesn't seem to be over-accumulating data, i.e. the MemoryMonitor
PercentUsed visualizer shows that the percentage of the buffer being used remains close to zero.

For many experiments, the above latency is totally acceptable. In any case, let's see how much lower
we can get the latency for more demanding closed-loop experiments.

#### ReadSize = 2048

Set ReadSize to 2048 and restart the workflow (ReadSize is a
[<button class="badge oe-badge-border oe-badge-yellow" data-bs-toggle="tooltip" title="Configuration properties have an effect on hardware when a workflow is started and are used to initialize the hardware state. If they are changed while a workflow is running, they will not have an effect until the workflow is restarted."> Configuration</button>](xref:OpenEphys.Onix1#configuration)
property, so it only updates when a workflow starts), and open the same visualizers:

![screenshot of Histogram1D visualizers with ReadSize 2048](../../images/tutorials/tune-readsize/histogram1d_2048.webp)
![screenshot of PercentUsed visualizers with ReadSize 2048](../../images/tutorials/tune-readsize/percent-used_2048.webp)

The closed-loop latencies now average about 80 μs. The hardware buffer still seems pretty stable
around zero, even after letting some time pass. Let's see if we can decrease latency even further
without overflowing the buffer.

#### ReadSize = 1024

Set ReadSize to 1024, restart the workflow, and open the same visualizers.

![screenshot of Histogram1D visualizers with ReadSize 1024](../../images/tutorials/tune-readsize/histogram1d_1024.webp)
![screenshot of PercentUsed visualizers with ReadSize 1024](../../images/tutorials/tune-readsize/percent-used_1024.webp)

The Histogram1D visualizer appears to be empty. This is because the latency immediately exceeds the
x-axis upper limit of 1 ms. You can see this by inspecting the visualizer for the node prior to
Histogram1D. Because the computer cannot keep up with the number of read operations necessary to
clear the buffer, a backlog of data builds up in the queue that needs to be read before the most
recent data frames are accessible. Therefore, by the time a particular data frame gets read from the
buffer, considerable time has already passed since that frame was generated. This means it takes
longer for a given HubClock value to reach the LoadTesterLoopback operator, which in turn means
increased latencies.

Because the amount of data in the hardware buffer is rising (which can be seen by looking at the
MemoryMonitor PercentUsed visualizer), the acquisition session will eventually terminate with an
error when the MemoryMonitor PercentUsed reaches 100% and the hardware buffer overflows.

> [!NOTE]
> The point at which your computer can no longer keep up with the number of reads required to keep
> the buffer clear depends on your computer's capabilities and might differ from what is
> demonstrated here. The computer used to create this tutorial has the following specs:
> - CPU: Intel i9-12900K
> - RAM: 64 GB
> - GPU: NVIDIA GTX 1070 8GB
> - OS: Windows 11

#### Summary

The results of our experimentation are as follows:

| ReadSize | Latency        | Buffer Usage    | Notes                                                                                              |
|----------|----------------|-----------------|----------------------------------------------------------------------------------------------------|
| 16384    | ~300 μs        | Stable at 0%    | Perfectly fine if there aren't any strict low-latency requirements; lowest risk of buffer overflow |
| 2048     | ~80 μs         | Stable near 0%  | Balances latency requirements with low risk of buffer overflow                                     |
| 1024     | Rises steadily | Rises untenably | Certain buffer overflow error                                                                      |

These results may differ for your experimental system. For example, your system might have different
bandwidth requirements (if you are using different devices, data is produced at a different rate) or
use a computer with different performance capabilities (which changes how quickly it can perform
read operations).
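
As a starting point for tuning your own system, the first-order relationship used throughout this
tutorial can be inverted: multiply your aggregate data rate by a target read latency to get a
candidate ReadSize (a Python sketch; the example numbers are illustrative, and the result should
still be validated empirically with the load tester as shown above):

```python
def candidate_read_size(data_rate_bytes_per_s, target_latency_s):
    """First-order ReadSize estimate: bytes accumulated in one target period.

    Ignores read-call overhead and competing system load, so treat the
    result as a starting point for empirical tuning, not a guarantee.
    """
    return int(data_rate_bytes_per_s * target_latency_s)

# Illustrative example: a 47.04 MB/s payload with a 100 us latency target.
print(candidate_read_size(47.04e6, 100e-6))  # 4704
```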

<!-- ## Tuning ReadSize with Real-Time Processing -->
