Register the multiplier and pipeline the accumulator update #1

@tylrcc

Right now the 32x24 multiply and the 64-bit accumulator update both complete in a single cycle in rtl/mac_engine.v (see the new_pv / old_pv products and the ma_nxt / num_nxt / den_nxt adds). That multiply-then-add combinational path is what caps fmax at roughly 210 MHz on the -2 part.

Goal: register the products, push the accumulator update one stage later, and trade one cycle of latency for timing headroom.
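To make the latency trade concrete, here is a minimal cycle-level sketch in Python. It is not taken from sim/reference.py: the function names, the single `acc += a * b` accumulator, and the unsigned operand handling are all assumptions for illustration, and the real engine has several products and sums (new_pv, old_pv, ma_nxt, num_nxt, den_nxt) that this toy does not model. It only shows that registering the product delays the visible accumulator value by exactly one cycle without changing the values themselves.

```python
MASK64 = (1 << 64) - 1        # 64-bit accumulator width (from the issue)
MASK32 = (1 << 32) - 1        # 32-bit operand (unsigned is an assumption)
MASK24 = (1 << 24) - 1        # 24-bit operand (unsigned is an assumption)


def mac_single_cycle(samples):
    """Hypothetical model of today's behavior: multiply and add in one cycle.

    Returns the accumulator value visible after each clock edge.
    """
    acc = 0
    trace = []
    for a, b in samples:
        acc = (acc + (a & MASK32) * (b & MASK24)) & MASK64
        trace.append(acc)
    return trace


def mac_pipelined(samples, flush=1):
    """Hypothetical two-stage model: stage 1 registers the product,
    stage 2 adds the registered product to the accumulator.

    `flush` extra idle cycles drain the pipeline after the last sample.
    """
    acc = 0
    prod_q = 0          # registered product (stage 1 output)
    valid_q = False     # valid flag travelling with prod_q
    trace = []
    for item in list(samples) + [None] * flush:
        # Stage 2: consume the product registered on the previous edge.
        if valid_q:
            acc = (acc + prod_q) & MASK64
        # Stage 1: register this cycle's product for the next edge.
        if item is not None:
            a, b = item
            prod_q = (a & MASK32) * (b & MASK24)
            valid_q = True
        else:
            valid_q = False
        trace.append(acc)
    return trace
```

With the same input stream, `mac_pipelined(samples)[1:]` should equal `mac_single_cycle(samples)`: same values, one cycle later.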

Notes

  • The output latency contract changes by one cycle, so docs/interface.md and the testbench expectations need to move with it.
  • sim/reference.py is the source of truth. If the visible timing changes, bring the model and tb/mac_engine_tb.v back into agreement so that make sim still passes with 0 errors.
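One way to keep the model and testbench expectations aligned is to make the output latency an explicit parameter of the comparison rather than baking it into the expected values. The helper below is a hypothetical sketch, not existing code from sim/reference.py or the testbench; `first_mismatch` and its arguments are made-up names.

```python
def first_mismatch(expected, observed, latency):
    """Compare per-cycle expected values against observed outputs that
    appear `latency` cycles later.

    Returns the index into `expected` of the first disagreement,
    or None if every expected value shows up at the shifted cycle.
    """
    for i, exp in enumerate(expected):
        j = i + latency
        if j >= len(observed) or observed[j] != exp:
            return i
    return None


# Usage: the same expected stream checked against a one-cycle-later output.
expected = [12, 42, 98]
observed_after_pipelining = [0, 12, 42, 98]
result_new_contract = first_mismatch(expected, observed_after_pipelining, latency=1)
result_old_contract = first_mismatch(expected, observed_after_pipelining, latency=0)
```

Under the new contract (`latency=1`) the check finds no mismatch, while the old contract (`latency=0`) fails at the first cycle, which is the kind of failure to expect from make sim until the expectations are moved.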

Done when: timing closes at a higher fmax on the target part than the current ~210 MHz, and make sim is still green.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), help wanted (Extra attention is needed), rtl (Verilog / hardware)
