Right now the 32x24 multiply and the 64-bit accumulator update sit in a single cycle in rtl/mac_engine.v (see the new_pv / old_pv products and the ma_nxt / num_nxt / den_nxt adds). That combinational path is what caps fmax, roughly 210 MHz on the -2 part.
Goal: register the products, push the accumulator update one stage later, and trade one cycle of latency for timing headroom.
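To make the latency trade concrete, here is a minimal cycle-level sketch in Python. It is not taken from rtl/mac_engine.v or sim/reference.py: the function names, the unsigned operands, the single product, and the absence of any valid/enable signal are all assumptions for illustration. It only shows that registering the product before the add produces the same accumulator sequence, one cycle later.

```python
MASK64 = (1 << 64) - 1  # assumed: accumulator wraps at 64 bits


def mac_single_cycle(samples):
    """Multiply and accumulate in the same cycle.

    Returns the accumulator value visible after each clock edge.
    """
    acc = 0
    trace = []
    for a, b in samples:
        acc = (acc + a * b) & MASK64
        trace.append(acc)
    return trace


def mac_pipelined(samples):
    """Stage 1 registers the product; stage 2 adds it to the accumulator.

    Both registers update on the same edge, so the add sees the product
    captured on the previous edge (hypothetical register name: prod_reg).
    """
    acc = 0
    prod_reg = 0
    trace = []
    for a, b in samples:
        acc_next = (acc + prod_reg) & MASK64  # uses the old prod_reg
        prod_next = a * b
        acc, prod_reg = acc_next, prod_next
        trace.append(acc)
    return trace


if __name__ == "__main__":
    # The trailing (0, 0) sample flushes the last product through stage 2.
    samples = [(3, 4), (5, 6), (7, 8), (0, 0)]
    print(mac_single_cycle(samples))
    print(mac_pipelined(samples))
```

In this toy model the pipelined trace is the single-cycle trace shifted right by one cycle, which is the kind of shift the interface doc and testbench would have to absorb.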
Notes
- The output latency contract changes by one cycle, so docs/interface.md and the testbench expectations need to move with it.
- sim/reference.py is the source of truth. If the visible timing changes, line the model and tb/mac_engine_tb.v back up so make sim still passes with 0 errors.
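One way to think about lining the model and testbench back up is as a comparison with an explicit latency offset. The helper below is a hypothetical sketch, not how sim/reference.py or tb/mac_engine_tb.v actually check results; the name count_mismatches and the list-based traces are assumptions.

```python
def count_mismatches(expected, observed, latency):
    """Compare observed[i + latency] against expected[i].

    expected: per-cycle values from the reference model
    observed: per-cycle values sampled from the design
    latency:  number of cycles the design output lags the model
    Returns the number of mismatching cycles.
    """
    errors = 0
    for i, want in enumerate(expected):
        j = i + latency
        if j >= len(observed):
            break  # ran out of observed samples to compare
        if observed[j] != want:
            errors += 1
    return errors


if __name__ == "__main__":
    expected = [12, 42, 98]
    observed = [0, 12, 42, 98]  # same values, one cycle later
    print(count_mismatches(expected, observed, latency=1))
    print(count_mismatches(expected, observed, latency=0))
```

With the offset left at its old value every cycle mismatches in this example, so a latency change that is not mirrored in the checker shows up as a wall of errors rather than a subtle failure.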
Done when: timing closes above the current ~210 MHz on the target part and make sim is still green.