Implement thermal monitoring and control with a state machine and PID-based fan speed regulation. The module is self-contained and has no dependency on board or ASIC code. It provides ThermalConfig for thresholds and operating frequency, ThermalState with hysteresis to avoid oscillation on noisy readings, ThermalController for temperature and fan or frequency commands, FanPIDController for fan speed control, and TemperatureFilter for smoothing sensor readings. Tests cover state transitions, PID behavior, and filter semantics. Integration with the Bitaxe board and BM13xx thread follows in the next commit.
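As a rough illustration of the filtering and hysteresis ideas named in this commit message, here is a minimal sketch. It is not the code from this PR: `EmaFilter`, `Hysteresis`, `State`, and the threshold values are hypothetical names chosen for the example, and the exponential moving average is only one possible smoothing choice.

```rust
/// Hypothetical exponential-moving-average filter for noisy temperature samples.
#[derive(Debug, Clone)]
pub struct EmaFilter {
    alpha: f32,         // smoothing factor in [0, 1]; higher reacts faster
    value: Option<f32>, // None until the first sample arrives
}

impl EmaFilter {
    pub fn new(alpha: f32) -> Self {
        Self { alpha: alpha.clamp(0.0, 1.0), value: None }
    }

    /// Feed one raw sample and return the smoothed value.
    pub fn update(&mut self, sample: f32) -> f32 {
        let next = match self.value {
            Some(prev) => prev + self.alpha * (sample - prev),
            None => sample, // seed the filter with the first reading
        };
        self.value = Some(next);
        next
    }
}

/// Hypothetical two-state thermal state, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Normal,
    Throttling,
}

/// Two thresholds: enter throttling at `enter_c`, leave only at `exit_c`.
/// The gap between them keeps a noisy reading from flipping the state.
pub struct Hysteresis {
    enter_c: f32,
    exit_c: f32,
    state: State,
}

impl Hysteresis {
    pub fn new(enter_c: f32, exit_c: f32) -> Self {
        assert!(exit_c < enter_c, "exit threshold must be below enter threshold");
        Self { enter_c, exit_c, state: State::Normal }
    }

    pub fn update(&mut self, temp_c: f32) -> State {
        self.state = match self.state {
            State::Normal if temp_c >= self.enter_c => State::Throttling,
            State::Throttling if temp_c <= self.exit_c => State::Normal,
            unchanged => unchanged,
        };
        self.state
    }
}
```

With thresholds of 75 and 70, a reading of 72 keeps whichever state is already active, which is the oscillation-avoiding behavior the commit message describes.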
Connect the thermal controller to the Bitaxe board lifecycle and the BM13xx thread so that temperature drives fan speed and frequency throttling at runtime. Serializing EMC2101 access avoids I2C bus contention between temperature reads and fan writes. Accepting frequency commands in the BM13xx thread allows PLL adjustment for thermal throttling. Sustained-overshoot frequency reduction with cooldown avoids over-correction on transient temperature spikes. Filtered temperature and hysteresis-based state transitions from the thermal module keep behavior stable.
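The sustained-overshoot reduction with cooldown mentioned above can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `OvershootThrottle`, its fields, and the example limits are all hypothetical, and the caller passes the current time explicitly so the logic stays easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical throttle: step the frequency down only after the temperature
/// has stayed above `limit_c` for `sustain`, and then wait `cooldown`
/// before stepping again.
pub struct OvershootThrottle {
    limit_c: f32,
    sustain: Duration,
    cooldown: Duration,
    step_mhz: f32,
    min_mhz: f32,
    over_since: Option<Instant>, // when the current overshoot began
    last_step: Option<Instant>,  // when the last reduction was issued
}

impl OvershootThrottle {
    pub fn new(
        limit_c: f32,
        sustain: Duration,
        cooldown: Duration,
        step_mhz: f32,
        min_mhz: f32,
    ) -> Self {
        Self {
            limit_c,
            sustain,
            cooldown,
            step_mhz,
            min_mhz,
            over_since: None,
            last_step: None,
        }
    }

    /// Returns `Some(new_frequency)` when a reduction is due, else `None`.
    pub fn update(&mut self, now: Instant, temp_c: f32, freq_mhz: f32) -> Option<f32> {
        if temp_c <= self.limit_c {
            self.over_since = None; // overshoot ended; forget the transient
            return None;
        }
        let since = *self.over_since.get_or_insert(now);
        if now.duration_since(since) < self.sustain {
            return None; // not sustained yet, may be a spike
        }
        if let Some(last) = self.last_step {
            if now.duration_since(last) < self.cooldown {
                return None; // give the previous reduction time to take effect
            }
        }
        if freq_mhz <= self.min_mhz {
            return None; // already at the floor
        }
        self.last_step = Some(now);
        Some((freq_mhz - self.step_mhz).max(self.min_mhz))
    }
}
```

A brief spike resets `over_since` as soon as the temperature drops back under the limit, so it never triggers a reduction; the cooldown prevents stacking several reductions before the first one has had a thermal effect.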
Would you please review again, @johnnyasantoss @csgui?
Hey @jayrmotta, Mujina maintainer here. It's exciting you picked Mujina for your project, and it is impressive what you've got working! We're eager to help you upstream parts of this to Mujina, if you're willing.

For upstreaming, I would want the fan controller and frequency controller split into independent components, and submitted in different PRs.

The first version of the fan controller could simply control fan duty cycle for a hard-coded ASIC temperature target. Structurally, I'd like to see it start local to Bitaxe (so, in bitaxe.rs, or better yet, we move bitaxe.rs to a module directory and have the fan controller in a submodule) and later migrate to a more global location if we reuse it. Future versions (subsequent PRs) could add API override of the fan duty cycle or temperature target. See some of the API changes we've merged lately for ideas.

Frequency control gets a little more complicated, as it will eventually need to fit into a larger picture where the scheduler sets different modes or controls for different targets (max rate, max efficiency, a particular power target, etc.). Perhaps controlling to a temperature fits in there somewhere; I hadn't thought about it. What does make sense initially is a frequency controller that throttles the ASIC back if it goes over an unsafe temperature. That's internal to the board/thread, and would remain as a defensive mechanism no matter what the scheduler was requesting. I could see that defensive frequency controller as its own PR. (Maybe it could be sensitive to other temperatures too, like the voltage regulator temperature.)

But again, I'd like to see the frequency controller and the fan controller as completely independent parts, with their own control logic, even though they coincidentally use the same temperature as an input.
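To make the suggested first step concrete, here is a minimal sketch of a fan controller that regulates duty cycle toward a hard-coded ASIC temperature target. Everything in it is an assumption for illustration: the `FanPi` name, the `TARGET_C` value, the gains, and the choice of a PI loop with simple anti-windup are not taken from Mujina or from this PR.

```rust
/// Hypothetical hard-coded ASIC temperature target, in degrees Celsius.
pub const TARGET_C: f32 = 60.0;

/// Minimal PI controller mapping ASIC temperature to a fan duty cycle (percent).
pub struct FanPi {
    kp: f32,       // proportional gain, percent per degree
    ki: f32,       // integral gain, percent per degree-second
    integral: f32, // accumulated error, degree-seconds
    min_duty: f32, // floor so the fan never stops entirely
}

impl FanPi {
    pub fn new(kp: f32, ki: f32, min_duty: f32) -> Self {
        Self { kp, ki, integral: 0.0, min_duty: min_duty.clamp(0.0, 100.0) }
    }

    /// `temp_c` is the (ideally filtered) ASIC temperature and `dt_s` the
    /// time since the previous call. Returns the duty cycle in percent.
    pub fn update(&mut self, temp_c: f32, dt_s: f32) -> u8 {
        // Cooling loop: a hotter chip means positive error and more fan.
        let error = temp_c - TARGET_C;
        let candidate = self.integral + error * dt_s;
        let unclamped = self.kp * error + self.ki * candidate;
        let duty = unclamped.clamp(self.min_duty, 100.0);
        // Anti-windup: only accumulate while the output is not saturated.
        if duty == unclamped {
            self.integral = candidate;
        }
        duty.round() as u8
    }
}
```

Because the controller only takes a temperature and returns a duty cycle, it has no knowledge of frequency control, which keeps it independent in the way described above. An API override of the target could later replace the constant with a field.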
Summary
Currently, thermal management in Mujina sets the fan speed to 100% when the system initializes and leaves it at that value for the entire firmware lifecycle. Issue 256foundation#9 describes this behavior and points to esp-miner's implementation of a PID controller for fan speed control.
Problem
Even with the fan speed set to 100%, the Bitaxe I've been using for testing shoots up to an ASIC temperature of 100 Celsius shortly after it starts running.
Solution
This PR closes 256foundation#9 by introducing a thermal management module that monitors temperature and responds by adjusting fan speed and operating frequency, with the aim of keeping the device within a safe operating temperature range.
This solution starts stabilizing the temperature soon after boot and, in my experience, trends toward 75 Celsius while delivering 600 to 800 GH/s.
There is definitely room to explore tuning: I used to get 1 TH/s with a somewhat quiet fan using AxeOS, where I adjusted not only fan speed and frequency but also voltage.
Changes
Testing