---
uid: tune-readsize
title: Tune ReadSize
---

This tutorial shows how to tune <xref:OpenEphys.Onix1.StartAcquisition>'s
<xref:OpenEphys.Onix1.StartAcquisition.ReadSize> property to avoid buffer overflow errors, which
prematurely terminate the acquisition session, and to minimize the latency of data transfer between
the ONIX system and the computer for low-latency closed-loop feedback.

## ONIX Hardware Buffer and ReadSize

An important concept for understanding the effect of ReadSize is the hardware buffer. The hardware
buffer is a temporary storage area that facilitates data transfer between the ONIX system and the
PC. When the hardware buffer accumulates an amount of data that exceeds a threshold, that chunk of
data is read by the PC and removed from the buffer. This threshold is determined by the value of
ReadSize, a property of the StartAcquisition operator, which is required in every workflow that
uses <xref:OpenEphys.Onix1> to acquire data from ONIX.

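The read behavior described above can be summarized as a simple threshold model. The following is a
minimal conceptual sketch, not the actual driver logic; `read_loop`, `incoming_bytes`, and the
numbers in the example are hypothetical and chosen only for illustration.

```python
# Conceptual model of the hardware buffer: data accumulates until at least
# read_size bytes are available, at which point the host performs one read.
def read_loop(incoming_bytes, read_size):
    """incoming_bytes: byte counts produced by the hardware, one per time step."""
    buffered = 0
    for produced in incoming_bytes:
        buffered += produced           # hardware keeps filling the buffer
        while buffered >= read_size:   # host reads only once the threshold is met
            buffered -= read_size      # one read call removes ~read_size bytes
            yield read_size

# Example: 4704 bytes arrive every 100 microsecond step (about 47 MB/s);
# with read_size=16384 a read completes roughly every 3-4 steps (~350 us).
print(sum(1 for _ in read_loop([4704] * 100, 16384)))  # number of reads performed
```
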
### ReadSize Tradeoffs

There are two primary concerns when selecting an optimal ReadSize value: latency and the risk of
hardware buffer overflow. Let's take a look at how each is affected by tuning ReadSize lower or
higher for a given data rate.

As ReadSize decreases, less data needs to accumulate in the buffer before the computer can read
from it. This means less time needs to pass before that data is accessible. Therefore, setting
ReadSize lower provides lower-latency data (i.e. allows access to data closer in time to the
physical creation of that data). However, there is a limit to this. Each call to the function that
reads data from the hardware buffer requires resources from the computer (i.e. memory and CPU
cycles). If ReadSize is so low that the read function must be called in very rapid succession, your
computer won't be able to keep up with the number of read calls required to read all the data from
the buffer. This can lead to an over-accumulation of data in the hardware buffer and, when the
buffer reaches maximum capacity, an error that terminates the acquisition session.

As ReadSize increases, more data needs to accumulate in the buffer before the computer can read
from it. This means more time needs to pass before that data is accessible. Therefore, setting
ReadSize higher reduces the frequency of calls to the read function, thereby reducing the risk that
your computer is overwhelmed by the number of read calls required to clear the buffer. As you might
have surmised already, this increases the latency between the creation of data and the retrieval of
that data by your computer.

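Both effects follow from simple arithmetic. The sketch below defines the two quantities for an
arbitrary data rate; the 10 MB/s stream and 8192-byte block in the example are hypothetical values
used only to illustrate the relationship, not properties of any particular ONIX device.

```python
def block_fill_time_us(read_size, data_rate):
    """Time (in microseconds) for the buffer to accumulate one ReadSize block.
    This is roughly the minimum latency added by buffering."""
    return read_size / data_rate * 1e6

def required_reads_per_second(read_size, data_rate):
    """Read calls per second the host must sustain to keep the buffer from growing."""
    return data_rate / read_size

# Hypothetical example: a 10 MB/s stream read in 8192-byte blocks.
rate = 10e6  # bytes per second
print(block_fill_time_us(8192, rate))         # ~819 us of buffering latency
print(required_reads_per_second(8192, rate))  # ~1221 reads per second
```
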
## Tuning ReadSize

### Setup

Follow the [Getting Started](xref:getting-started) guide to set up your Bonsai environment and
familiarize yourself with using OpenEphys.Onix1 to acquire data from ONIX. Copy the following
workflow into the Bonsai workflow editor by hovering over the workflow image and clicking the
clipboard icon that appears.

::: workflow

:::

Open Bonsai and paste this workflow by clicking the Bonsai workflow editor pane and hitting
<kbd>Ctrl+V</kbd>.

### Workflow Description

::: workflow

:::

The top-level configuration chain includes a <xref:OpenEphys.Onix1.ConfigureLoadTester>. The load
tester device allows us to emulate different rates of data production and measure the latency
between data reads and data writes. For example, enabling two Neuropixels 2.0 probes produces about
47 MB/s (≈ ((8\*2 + 384\*2) \* 30,000 \* 2) / 1e6). In this example, we'll use `ConfigureLoadTester`
to emulate this payload. To do so, `ConfigureLoadTester`'s ReceivedWords and FramesPerSecond
properties are set to 392 and 60,000, respectively. This parallels the rate of data produced by two
probes: one <xref:OpenEphys.Onix1.NeuropixelsV2eDataFrame> at 60 kHz. The Enable property is set to
True to enable the load tester device. Its DeviceName is set to "Load Tester" so that it has a
straightforward name to use to link the <xref:OpenEphys.Onix1.LoadTesterData> and
<xref:OpenEphys.Onix1.LoadTesterLoopback> operators. The DeviceAddress property is set to 11 because
that is how this device is indexed in the ONIX system.
<!-- TransmittedWords property -->

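For reference, the 47 MB/s figure and the load tester settings can be reproduced with the
arithmetic below. The split into 384 channel words plus 8 additional words per probe is taken from
the formula quoted above; treat it as an illustration of the bookkeeping rather than an exact
description of the Neuropixels 2.0 data format.

```python
channels_per_probe = 384   # electrode channel words per probe per sample
extra_words_per_probe = 8  # additional words per probe per sample, per the formula above
probes = 2
sample_rate_hz = 30_000    # per-channel sample rate
bytes_per_word = 2         # 16-bit words

data_rate = ((extra_words_per_probe + channels_per_probe) * probes
             * sample_rate_hz * bytes_per_word)
print(f"{data_rate / 1e6:.2f} MB/s")  # ~47.04 MB/s

# Equivalent load tester settings: one 392-word frame per probe per sample,
# i.e. 2 probes * 30 kHz = 60,000 frames per second.
received_words = channels_per_probe + extra_words_per_probe  # 392
frames_per_second = probes * sample_rate_hz                  # 60,000
print(f"{received_words * bytes_per_word * frames_per_second / 1e6:.2f} MB/s")
```
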
All of the <xref:OpenEphys.Onix1.ConfigureBreakoutBoard>'s devices except the MemoryMonitor are
disabled.

<xref:OpenEphys.Onix1.StartAcquisition>'s <xref:OpenEphys.Onix1.StartAcquisition.WriteSize> is set
to 16384. This defines a readily-available pool of memory for the creation of output data frames. A
larger size reduces the frequency of dynamic memory allocation system calls but increases the
expense of each of those calls. The effect on real-time performance is typically not as large as
that of the ReadSize property because WriteSize does not determine when data is written to
hardware: data is written to hardware as soon as an output frame has been created. In contrast,
data is read from hardware whenever more than ReadSize bytes have accumulated in the input buffer.
The ReadSize property is also set to 16384. We'll take a closer look at that value and experiment
with it in the next section.

::: workflow

:::

LoadTesterData produces a sequence of
[LoadTesterDataFrames](xref:OpenEphys.Onix1.LoadTesterDataFrame). The
<xref:OpenEphys.Onix1.DataFrame.HubClock> member and the
<xref:OpenEphys.Onix1.LoadTesterDataFrame.HubClockDelta> member are each selected from the
LoadTesterDataFrame with their own <xref:Bonsai.Expressions.MemberSelectorBuilder>.

The HubClock member indicates the value of the hub's clock when that LoadTesterDataFrame was
produced. A hub is a piece of hardware that coordinates a group of devices, e.g. the breakout board
is a hub that coordinates DigitalInput, DigitalOutput, AnalogIO, MemoryMonitor, etc. EveryNth is a
<xref:Bonsai.Reactive.Condition> which only allows through every Nth element in the observable
sequence. You can inspect its logic by double-clicking the node when the workflow is not running. In
this case, the N property is set to 100, so every 100th sample is allowed through the EveryNth
operator and sent to LoadTesterLoopback. This operator is a *sink* operator which writes the
HubClock member that was passed through the EveryNth operator back to the load tester device. The
load tester device then updates the HubClockDelta members in all subsequent LoadTesterDataFrames.
<!-- why is everyNth necessary? -->

The HubClockDelta member indicates the difference between the HubClock value sent to the
LoadTesterLoopback operator and the load tester's hub clock value when that HubClock value was
received by the hardware. <xref:Bonsai.Reactive.DistinctUntilChanged> filters out repeated
elements, which is necessary because HubClockDelta is only updated every 100th LoadTesterDataFrame.
The next operator, <xref:Bonsai.Scripting.Expressions.ExpressionTransform>, converts the
HubClockDelta from units of hub clock cycles to units of microseconds. This data gets sent to
<xref:Bonsai.Dsp.Histogram1D> to help visualize the distribution of closed-loop latencies.

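The conversion performed by ExpressionTransform is a single scaling step. The sketch below shows
the equivalent arithmetic; the 250 MHz hub clock frequency is an assumed value used for
illustration, so substitute the actual clock rate of the hub you are measuring against.

```python
HUB_CLOCK_HZ = 250e6  # assumed hub clock frequency; check your hardware's actual value

def delta_to_microseconds(hub_clock_delta):
    """Convert a HubClockDelta expressed in hub clock ticks to microseconds."""
    return hub_clock_delta / HUB_CLOCK_HZ * 1e6

# A delta of 20,000 ticks at an assumed 250 MHz corresponds to 80 microseconds.
print(delta_to_microseconds(20_000))
```
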
::: workflow

:::

To learn about the <xref:OpenEphys.Onix1.MemoryMonitorData> branch, visit the [Breakout Board
Memory Monitor](xref:breakout_memory-monitor) page (or the equivalent page for any of our other
hardware).

### Measuring Latency at Different ReadSize Values

#### ReadSize = 16384

With ReadSize set to 16384, start the workflow, and [open the visualizers](xref:visualize-data) for
the PercentUsed and Histogram1D nodes:

Average latency appears to be about 300 μs (in this plot, 1000 corresponds to 1 ms). This
approximately comports with expectations: if data is being produced at about 47 MB/s, it takes
about 348 μs to accumulate 16384 bytes. This isn't a perfect estimate because there are other
devices producing data (e.g. the Memory Monitor and Heartbeat), though their data rate is
completely dwarfed by that of the load tester. The most likely source of the discrepancy is that
the computer is not 100% available to perform the read operation, which causes some reads to be
delayed and others to happen sooner. The calculation that leads to the 348 μs figure can serve as a
first-order estimate of latency when determining an optimal ReadSize, but it's not perfect. The
load tester provides empirical measurements of latency and memory usage.

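To reproduce that first-order estimate, divide ReadSize by the data rate; this sketch assumes the
load tester's ~47 MB/s payload described earlier and ignores the small contribution from other
devices.

```python
read_size = 16384    # bytes per read
data_rate = 47.04e6  # bytes per second from the load tester

fill_time_us = read_size / data_rate * 1e6
print(f"{fill_time_us:.0f} us")  # ~348 us to accumulate one ReadSize block
```
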
The hardware buffer also doesn't seem to be over-accumulating data, i.e. the MemoryMonitor
PercentUsed visualizer shows that the percentage of the buffer being used remains close to zero.

For many experiments, the above latency is perfectly acceptable. In any case, let's see how much
lower we can get the latency for more demanding closed-loop experiments.

#### ReadSize = 2048

Set ReadSize to 2048 and restart the workflow (ReadSize is a
[<button class="badge oe-badge-border oe-badge-yellow" data-bs-toggle="tooltip" title="Configuration properties have an effect on hardware when a workflow is started and are used to initialize the hardware state. If they are changed while a workflow is running, they will not have an effect until the workflow is restarted."> Configuration</button>](xref:OpenEphys.Onix1#configuration)
property so it only updates when a workflow starts), and open the same visualizers:

The closed-loop latencies now average about 80 μs. The hardware buffer still seems pretty stable
around zero even after letting some time pass. Let's see if we can decrease latency even further
without overflowing the buffer.

#### ReadSize = 1024

Set ReadSize to 1024, restart the workflow, and open the same visualizers.

The Histogram1D visualizer appears to be empty. This is because the latency immediately exceeds the
upper x-axis limit of 1 ms. You can see this by inspecting the visualizer for the node prior to
Histogram1D. Because the computer cannot keep up with the number of read operations necessary to
clear the buffer, a backlog of data builds up that must be read before the most recent data frames
become accessible. Therefore, by the time a particular data frame gets read from the buffer,
considerable time has already passed since that frame was generated. This means it takes longer for
a given HubClock value to reach the LoadTesterLoopback operator, which in turn means increased
latencies.

Because the amount of data in the hardware buffer is rising (which can be seen by looking at the
MemoryMonitor PercentUsed visualizer), the acquisition session will eventually terminate in an
error when the MemoryMonitor PercentUsed reaches 100% and the hardware buffer overflows.

> [!NOTE]
> The point at which your computer can no longer keep up with the number of reads required to keep
> the buffer clear, as demonstrated here, depends on your computer's capabilities and might differ
> from ours. The computer used to create this tutorial has the following specs:
> - CPU: Intel i9-12900K
> - RAM: 64 GB
> - GPU: NVIDIA GTX 1070 8GB
> - OS: Windows 11

#### Summary

The results of our experimentation are as follows:

| ReadSize | Latency        | Buffer Usage    | Notes                                                                                             |
|----------|----------------|-----------------|---------------------------------------------------------------------------------------------------|
| 16384    | ~300 μs        | Stable at 0%    | Perfectly fine if there aren't any strict low-latency requirements; lowest risk of buffer overflow |
| 2048     | ~80 μs         | Stable near 0%  | Balances latency requirements with low risk of buffer overflow                                     |
| 1024     | Rises steadily | Rises untenably | Certain buffer overflow error                                                                       |

These results may differ for your experimental system. For example, your system might have
different bandwidth requirements (different devices produce data at different rates) or use a
computer with different performance capabilities (which changes how quickly it can perform read
operations).

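One way to carry these results over to a different system is to start from your own data rate and
latency budget and then confirm the choice empirically with the load tester and memory monitor, as
shown above. The helper below is a hypothetical sketch; `suggest_read_size` and its 25% safety
margin are illustrative choices, not part of the OpenEphys.Onix1 API.

```python
def suggest_read_size(data_rate, target_latency_us, margin=0.25):
    """Pick a ReadSize that should fill within the target latency, minus a safety margin."""
    budget_s = target_latency_us * 1e-6 * (1 - margin)
    read_size = int(data_rate * budget_s)
    # Round down to a power of two, matching the values used in this tutorial.
    return 1 << (read_size.bit_length() - 1) if read_size > 1 else 1

# Example: ~47 MB/s with a 200 us latency budget suggests a ReadSize around 4096.
print(suggest_read_size(47.04e6, 200))
```
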
<!-- ## Tuning ReadSize with Real-Time Processing -->