
arch-riscv,configs: Add SE matrix smoke support #819

Open: jensen-yan wants to merge 3 commits into xs-dev from se-matrix-smoke-se

@jensen-yan (Collaborator) commented Apr 9, 2026

Summary

Add the minimum RISC-V matrix support needed to run matrix smoke tests in SE mode.

What this changes

  • add matrix instruction definitions and decoder entries for the current SE smoke flow
  • add a simple functional matrix state/model in RiscvISA::ISA
  • support matrix tile configuration, loads, compute, stores, and minimal sync-token behavior
  • disable rename-time operand folding in configs/example/se.py so trap/syscall register updates remain visible in SE mode

Why

This PR is intended to make SE mode usable for matrix smoke validation before moving on to fuller FS-mode testing.

With these changes, the matrix smoke binaries can run in gem5 SE mode and the basic userland path is stable enough for bring-up and regression use.

Validation

Validated in SE mode with:

  • libc_mmap_smoke_xsai: passes
  • precomp_rand_repro: errors_shown=0
  • gemm_precomp: all 8 precomputed cases pass

Additional coverage:

  • hello_xsai enters userland, passes the allocator tests, and reaches the randomized matrix fuzz loop; it did not complete within the extended timeout window, but no correctness failure was observed before timeout

Notes

This PR intentionally stays focused on SE-mode smoke support in gem5 only.
It does not include FS/raw-linux bring-up changes.

Summary by CodeRabbit

  • New Features

    • Added RISC-V matrix instruction support: tile config, tile load/store, accumulator ops, and integer int8×int8→int32 matrix accumulation with synchronization tokens and checkpointing.
  • Documentation

    • Added SE-mode matrix smoke guide describing supported instruction subset, validation scenarios, and known limitations.
  • Chores

    • Ensured SE mode disables certain CPU optimization flags for correct syscall/trap behavior.

Change-Id: I082477f9d3ed53e4676638680a9b44a54255133d
Change-Id: Ie91d6c29d10ca45b16304cd1ab72a9810c7ef7b1
@jensen-yan changed the title from "Se matrix smoke se" to "arch-riscv,configs: Add SE matrix smoke support" Apr 9, 2026
@coderabbitai (bot) commented Apr 9, 2026

📝 Walkthrough

This PR adds RISC-V matrix instruction support (ISA, decoder, formats, helpers, state, memory/compute ops, serialization) and changes the SE config to unconditionally disable move elimination and constant folding on all system.cpu instances.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration Parameter Setup**<br>`configs/example/se.py` | After applying ideal_kmhv3 preset (optional), iterate all `system.cpu` and set `enableMoveElimination = False` and `enableConstantFolding = False` unconditionally for SE mode. |
| **Build / Insts List**<br>`src/arch/riscv/insts/SConscript` | Added `matrix.cc` to RISC-V ISA source list and ensured file newline termination. |
| **Matrix Helpers**<br>`src/arch/riscv/insts/matrix.hh`, `src/arch/riscv/insts/matrix.cc` | New header constants for max dimensions and derived sizes; three clamping helper functions `clampMatrixTileM/K/N` that bound inputs to those maxima. |
| **ISA Header / API**<br>`src/arch/riscv/isa.hh` | Added matrix state members (tile dims, A/B int8 buffers, accumulator int32 buffer, token vector), accessors, lifecycle/sync methods, memory/compute operation declarations, and required includes. |
| **ISA Implementation**<br>`src/arch/riscv/isa.cc` | Implemented matrix blob read/write helpers, matrix state init/reset, token acquire/release/sync, clamped tile setters, matrix load/store (A8/B8/C32, store C32), zeroing accumulator, MM accumulate (int8×int8→int32), and serialization/unserialization of matrix state. |
| **Decoder Entries**<br>`src/arch/riscv/isa/decoder.isa` | Added QUADRANT 0x0a opcode routes and handlers for matrix configuration ops (`msyncreset`, `mrelease`, `macquire`, `msettilem/k/n`, `mzero1r`), memory ops (`mlae8`, `mlbe8`, `mlce32`, `msce32`), and arithmetic op (`mmacc_w_b`), dispatching to ISA methods with serialization/non-speculative semantics. |
| **ISA Formats / Includes**<br>`src/arch/riscv/isa/formats/formats.isa`, `.../matrix_conf.isa`, `.../matrix_mem.isa`, `.../matrix_arith.isa`, `src/arch/riscv/isa/includes.isa` | Added three new ISA format templates (`MatrixConfOp`, `MatrixMemOp`, `MatrixArithOp`) with serialization/memory flags and added `arch/riscv/insts/matrix.hh` to generated includes. |
| **Docs**<br>`docs/Gem5_Docs/xsai/se_matrix_smoke.md` | New SE-mode matrix smoke documentation describing scope, supported instruction subset, implementation notes, validation targets, runtime commands, and known limitations. |
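
The clamping helpers listed in the Matrix Helpers row can be pictured with a small Python model. This is only a sketch under stated assumptions: the real helpers are C++ functions in `matrix.hh`/`matrix.cc`, the maxima below are hypothetical placeholders (the actual constants are not quoted in this thread), and inputs are assumed to be non-negative.

```python
# Hypothetical maxima for illustration only; the real constants are defined
# in src/arch/riscv/insts/matrix.hh and are not quoted in this PR thread.
MAX_TILE_M = 16
MAX_TILE_K = 64
MAX_TILE_N = 16


def clamp_tile_dim(requested: int, maximum: int) -> int:
    """Bound a requested (non-negative) tile dimension to `maximum`."""
    return min(requested, maximum)


def clamp_matrix_tile_m(requested: int) -> int:
    return clamp_tile_dim(requested, MAX_TILE_M)


def clamp_matrix_tile_k(requested: int) -> int:
    return clamp_tile_dim(requested, MAX_TILE_K)


def clamp_matrix_tile_n(requested: int) -> int:
    return clamp_tile_dim(requested, MAX_TILE_N)
```

Under this model, a request larger than the maximum is silently reduced to the maximum, while smaller requests pass through unchanged.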

Sequence Diagram(s)

sequenceDiagram
    participant Exec as ExecContext
    participant ISA as RiscvISA::ISA
    participant Proxy as PortProxy
    participant Mem as Memory

    Exec->>ISA: matrixLoadA8(xc, base, stride)
    activate ISA
    ISA->>ISA: for each row -> compute addr
    ISA->>Proxy: matrixReadBlob(addr, buf)
    activate Proxy
    Proxy->>Mem: read request
    Mem-->>Proxy: data / fault
    Proxy-->>ISA: fault status
    deactivate Proxy
    alt fault
        ISA-->>Exec: return GenericPageTableFault
    else
        ISA->>ISA: store row into tile buffer
    end
    ISA-->>Exec: return NoFault
    deactivate ISA
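
The row-by-row load in the diagram above can be modeled in a few lines of Python. This is a hedged sketch, not the PR's C++ code: the function name is hypothetical, memory is modeled as a flat `bytes` object, the address of each row is assumed to be `base + row * stride`, and an out-of-range read stands in for the fault branch.

```python
def load_tile_rows(memory: bytes, base: int, stride: int,
                   rows: int, row_bytes: int):
    """Read `rows` rows of `row_bytes` bytes each, `stride` bytes apart.

    Returns a list of rows, or None to model the fault branch when a row
    would fall outside the modeled memory.
    """
    tile = []
    for row in range(rows):
        addr = base + row * stride  # assumed address computation
        if addr < 0 or addr + row_bytes > len(memory):
            return None  # models returning a fault to the caller
        tile.append(memory[addr:addr + row_bytes])
    return tile
```

Returning early on the first failing row mirrors the `alt fault` branch: no later rows are read once a fault is seen.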
sequenceDiagram
    participant Exec as ExecContext
    participant ISA as RiscvISA::ISA
    participant TileA as TileA(buf)
    participant TileB as TileB(buf)
    participant Acc as Accumulator(buf)

    Exec->>ISA: matrixMMAccWB()
    activate ISA
    loop m in 0..M-1
        loop n in 0..N-1
            Acc->>ISA: acc = Acc[m][n]
            loop k in 0..K-1
                ISA->>TileA: load a = A[m][k]
                ISA->>TileB: load b = B[k][n]
                ISA->>Acc: acc += int32(a) * int32(b)
            end
            ISA->>Acc: store Acc[m][n] = acc
        end
    end
    deactivate ISA
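
The triple loop in the diagram above corresponds to a plain int8×int8→int32 accumulate. The following Python sketch models it on nested lists; the function names are hypothetical, and wrapping the result to 32-bit two's complement is an assumption about overflow behavior that the thread does not confirm.

```python
def to_int32(value: int) -> int:
    """Wrap an arbitrary Python int to signed 32-bit (assumed overflow rule)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def mm_acc_w_b(a, b, acc):
    """acc[m][n] += sum_k a[m][k] * b[k][n], with a and b holding int8 values.

    `a` is M x K, `b` is K x N, and `acc` is M x N; `acc` is updated in place
    and also returned for convenience.
    """
    m_dim, k_dim, n_dim = len(a), len(b), len(b[0])
    for m in range(m_dim):
        for n in range(n_dim):
            total = acc[m][n]
            for k in range(k_dim):
                total += a[m][k] * b[k][n]
            acc[m][n] = to_int32(total)
    return acc
```

Because the existing accumulator value is the starting point of each sum, repeated calls accumulate across K-blocks, which is what distinguishes an accumulate op from a plain multiply.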

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • tastynoob

Poem

🐰 I hopped through tiles of M, K, N,

tokens clacked and accumulators grinned,
loads and stores in tidy rows,
mmacc danced where multiplication goes—
hooray, the matrix code begins! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'arch-riscv,configs: Add SE matrix smoke support' directly and clearly describes the main change: adding matrix support for SE mode smoke tests in RISC-V architecture and configurations. |


@github-actions (bot) commented Apr 9, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.2665 | - |
| This PR | 2.2665 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!

@jensen-yan (Collaborator, Author) commented

Assuming the checkout is at xsai-env/GEM5, the SE matrix smoke flow can be reproduced with:

cd xsai-env/GEM5
scons build/RISCV/gem5.opt --gold-linker -j100

libc_mmap_smoke_xsai

./build/RISCV/gem5.opt \
  --outdir=/tmp/gem5-se-libc-mmap \
  configs/example/se.py \
  -c /tmp/libc_mmap_smoke_xsai \
  --enable-riscv-vector --no-pf

precomp_rand_repro

./build/RISCV/gem5.opt \
  --outdir=/tmp/gem5-se-precomp-rand \
  configs/example/se.py \
  -c /tmp/precomp_rand_repro \
  --enable-riscv-vector --no-pf

gemm_precomp

./build/RISCV/gem5.opt \
  --outdir=/tmp/gem5-se-gemm-precomp \
  configs/example/se.py \
  -c firmware/riscv-rootfs/apps/gemm_precomp/build/gemm_precomp \
  --enable-riscv-vector --no-pf

Optional longer-running coverage with hello_xsai:

timeout 300s ./build/RISCV/gem5.opt \
  --outdir=/tmp/gem5-se-hello-xsai \
  configs/example/se.py \
  -c firmware/riscv-rootfs/apps/hello_xsai/build/hello_xsai \
  --enable-riscv-vector --no-pf

Observed results on my side:

  • libc_mmap_smoke_xsai: passes
  • precomp_rand_repro: errors_shown=0
  • gemm_precomp: all 8 precomputed cases pass
  • hello_xsai: enters userland, passes allocator tests, and reaches randomized matrix fuzz

@coderabbitai (bot) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/arch/riscv/isa.hh (1)

84-148: ⚠️ Potential issue | 🟠 Major

Copy the new matrix ISA state in copyRegsFrom.

Line 84 onward adds architectural state, but src/arch/riscv/isa.cc lines 341-360 still clone only the int/float registers and PC state. Any CPU/context handoff that goes through ISA::copyRegsFrom will drop live matrix tiles, accumulator contents, and tokens.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/arch/riscv/isa.hh` around lines 84 - 148, ISA::copyRegsFrom currently
copies basic int/float regs and PC but fails to copy the new matrix ISA state
(matrixTileM, matrixTileK, matrixTileN, matrixTileA, matrixTileB, matrixAcc,
matrixTokens), so live tiles/accumulators/tokens are lost during context
handoff; update ISA::copyRegsFrom to copy all matrix state from the source ISA
instance (use the source ThreadContext/src->getISA or cast to ISA to access
these members) including scalar tile dims (matrixTileM/K/N), the vectors
matrixTileA/matrixTileB/matrixAcc and the matrixTokens (ensure deep copy of
vectors and any RegVal contents), and preserve any token/index state used by
matrixSyncReset/matrixAcquire/matrixRelease so the target context has identical
matrix execution state.
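
The deep-copy requirement described above can be illustrated with a small Python model. This is a sketch only: `MatrixState` and `copy_from` are hypothetical stand-ins for the C++ members and `ISA::copyRegsFrom`, and the field names simply mirror the members named in the review comment.

```python
from dataclasses import dataclass, field


@dataclass
class MatrixState:
    """Hypothetical model of the matrix state named in the review comment."""
    tile_m: int = 0
    tile_k: int = 0
    tile_n: int = 0
    tile_a: list = field(default_factory=list)   # models matrixTileA (int8)
    tile_b: list = field(default_factory=list)   # models matrixTileB (int8)
    acc: list = field(default_factory=list)      # models matrixAcc (int32)
    tokens: list = field(default_factory=list)   # models matrixTokens

    def copy_from(self, src: "MatrixState") -> None:
        """Copy every field; buffers are duplicated, not shared."""
        self.tile_m, self.tile_k, self.tile_n = src.tile_m, src.tile_k, src.tile_n
        self.tile_a = list(src.tile_a)
        self.tile_b = list(src.tile_b)
        self.acc = list(src.acc)
        self.tokens = list(src.tokens)
```

Duplicating the buffers matters because the source context may keep executing after the handoff; sharing them would let later writes leak into the destination.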
🧹 Nitpick comments (2)
src/arch/riscv/insts/matrix.cc (1)

1-3: Consider clarifying the comment terminology.

The comment mentions "AME helpers" but the functions and file are named with "matrix". Consider aligning terminology for clarity (e.g., "Matrix extension helpers" or expand "AME" if it's an acronym).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/arch/riscv/insts/matrix.cc` around lines 1 - 3, The top-of-file comment
uses the term "AME helpers" which conflicts with the file/functions named
"matrix"; update the header comment to use consistent terminology (e.g., "Matrix
extension helpers", or expand the acronym "AME" on first use; in this RISC-V
matrix context it most likely stands for "Attached Matrix Extension" rather than
"Atomic Memory Extension") so it matches the file name and function
names in matrix.cc and clarifies intent for readers.
src/arch/riscv/isa.cc (1)

331-333: Avoid double-initializing matrix state in the constructor.

clear() on Line 365 already calls resetMatrixState(), so the extra call here just zeros and reallocates the matrix backing vectors twice during construction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/arch/riscv/isa.cc` around lines 331 - 333, The constructor currently
calls resetMatrixState() and then clear(), but clear() already calls
resetMatrixState(), causing double-initialization; remove the explicit
resetMatrixState() invocation so the sequence becomes
miscRegFile.resize(NUM_MISCREGS); clear(); and rely on clear() to perform the
matrix reset (refer to the resetMatrixState(), clear(), and miscRegFile.resize
calls to locate the change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@configs/example/se.py`:
- Around line 388-390: The loop unconditionally sets enableMoveElimination and
enableConstantFolding on every CPU in system.cpu, which raises AttributeError
for non-O3 CPUs; update the loop to check the CPU class/type before assignment
(e.g., isinstance or hasattr) so only CPUs that support these O3-only params
(e.g., BaseO3CPU-derived cores) get those attributes set; specifically guard the
assignments around system.cpu so you only set cpu.enableMoveElimination and
cpu.enableConstantFolding when the CPU exposes those symbols or is an O3 CPU.
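
A minimal sketch of the guard suggested for `configs/example/se.py`, assuming a `hasattr` check is sufficient to detect the O3-only parameters. The two CPU classes below are hypothetical stand-ins so the snippet runs without gem5; whether `hasattr` behaves the same way on real gem5 SimObjects should be verified against the tree.

```python
class FakeO3CPU:
    """Stand-in for an O3-style CPU that exposes both parameters."""
    enableMoveElimination = True
    enableConstantFolding = True


class FakeSimpleCPU:
    """Stand-in for a CPU model without the O3-only parameters."""
    __slots__ = ()  # assigning an unknown attribute raises AttributeError


def disable_rename_folding(cpus):
    """Turn off rename-time folding only on CPUs that expose the knobs."""
    for cpu in cpus:
        if hasattr(cpu, "enableMoveElimination"):
            cpu.enableMoveElimination = False
        if hasattr(cpu, "enableConstantFolding"):
            cpu.enableConstantFolding = False
```

In `se.py` the same loop body would run over `system.cpu`; an `isinstance` check against the O3 CPU base class would be an alternative if the class is importable at that point.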

In `@src/arch/riscv/isa.cc`:
- Around line 1015-1022: The unserialize path must validate fixed-size matrix
state after reading matrixTileM, matrixTileK, matrixTileN and the containers
miscRegFile, matrixTileA, matrixTileB, matrixAcc, matrixTokens: check that tile
dimensions are within architectural limits and that each container length equals
the expected fixed size (pad with zeros or resize to the fixed capacity, or fail
fast and log an error) before returning from unserialize(); reject or correct
malformed checkpoints to prevent matrixLoad* and matrixMMAccWB() from indexing
past the end. Ensure you reference the same symbols (unserialize(),
matrixTileM/K/N, matrixTileA, matrixTileB, matrixAcc, matrixTokens, matrixLoad*,
matrixMMAccWB()) when adding these checks and error paths.
- Around line 77-111: matrixReadBlob and matrixWriteBlob currently use
TranslatingPortProxy/SETranslatingPortProxy and bypass the standard ExecContext
memory path; replace those proxy reads/writes with the ExecContext APIs used by
vector/AMO code (use the ExecContext instance associated with the ThreadContext
and call its readMem()/writeMem() methods with the same addr, dst/src and size),
propagate the boolean success/failure into the existing GenericPageTableFault
return on failure, and otherwise return NoFault; update matrixReadBlob and
matrixWriteBlob to remove TranslatingPortProxy/SETranslatingPortProxy usage and
call xc->readMem()/xc->writeMem() on the ExecContext for correct
timing/cache/fault behavior.
- Around line 421-423: The panic format specifiers are incorrect: in
matrixAcquire replace the "%u" for token_idx (uint64_t) with a proper 64-bit
specifier (use PRIu64 from <inttypes.h> or %llu consistently) and in both
matrixToken overloads replace "%u" for idx (size_t) with "%zu"; update the
panic_if call string ("macquire tok%u ...") and any other panic/printf uses
referencing token_idx or idx accordingly, and add an include for <inttypes.h> if
you choose PRIu64 so the types and format macros match exactly.
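
The checkpoint validation requested for `unserialize()` can be sketched as a fail-fast check. This Python model is illustrative only: the function name, the dictionary-based interface, and the limits passed in are all hypothetical, and the real fix would live in the C++ `unserialize()` path.

```python
def validate_matrix_checkpoint(dims, max_dims, containers, expected_sizes):
    """Raise ValueError if restored matrix state is malformed.

    dims / max_dims: tile dimension name -> restored value / architectural limit.
    containers / expected_sizes: buffer name -> restored data / fixed capacity.
    """
    for name, value in dims.items():
        if not 0 <= value <= max_dims[name]:
            raise ValueError(
                f"tile dimension {name}={value} outside [0, {max_dims[name]}]")
    for name, data in containers.items():
        if len(data) != expected_sizes[name]:
            raise ValueError(
                f"{name} has {len(data)} elements, expected {expected_sizes[name]}")
```

Failing fast is one of the two options the comment mentions; the other, padding or resizing to the fixed capacity, would replace the second `raise` with a resize step.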

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8ea2d01c-0528-40bb-aaf7-d822760a46e1

📥 Commits

Reviewing files that changed from the base of the PR and between 7a9783c and e471dff.

📒 Files selected for processing (12)
  • configs/example/se.py
  • src/arch/riscv/insts/SConscript
  • src/arch/riscv/insts/matrix.cc
  • src/arch/riscv/insts/matrix.hh
  • src/arch/riscv/isa.cc
  • src/arch/riscv/isa.hh
  • src/arch/riscv/isa/decoder.isa
  • src/arch/riscv/isa/formats/formats.isa
  • src/arch/riscv/isa/formats/matrix_arith.isa
  • src/arch/riscv/isa/formats/matrix_conf.isa
  • src/arch/riscv/isa/formats/matrix_mem.isa
  • src/arch/riscv/isa/includes.isa

Change-Id: I717bea122bbe582f5589c1f189b763be056d117a
@coderabbitai (bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/Gem5_Docs/xsai/se_matrix_smoke.md (1)

84-85: Polish wording to avoid ambiguity in execution-delay sentence.

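The phrase “没有给 AME 指令补单独的执行延迟” (roughly, "no separate execution latency was added for AME instructions") does not read smoothly. Consider a more direct wording to avoid ambiguity.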

✏️ Suggested doc edit
-- 没有给 AME 指令补单独的执行延迟
+- 没有为 AME 指令单独建模执行延迟
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/Gem5_Docs/xsai/se_matrix_smoke.md` around lines 84 - 85, Replace the
ambiguous sentence "没有给 AME 指令补单独的执行延迟" with a clearer, more direct phrasing
such as "未为 AME 指令单独建模执行延迟" ("execution latency is not modeled separately for
AME instructions") or "未为 AME 指令单独设置执行延迟" ("no separate execution latency is set
for AME instructions") so it explicitly states that AME instructions do not have
individual execution-delay modeling; update the line alongside the existing
bullet "没有做 matrix 单元与 LSU/L2 的时序建模" ("no timing model for the matrix unit and
the LSU/L2") to keep parallel structure and clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/Gem5_Docs/xsai/se_matrix_smoke.md`:
- Around line 166-173: The run command uses an ambiguous "cd GEM5" which is
unclear to first-time reproducers; update the shell snippet so it sets an
explicit repository path consistent with earlier instructions (e.g., replace "cd
GEM5" with "cd xsai-env/GEM5" or otherwise use the full repo path), and keep the
subsequent gem5 invocation lines unchanged so the example remains reproducible.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d86a7327-7a3b-45db-9eaa-750fe2aba49d

📥 Commits

Reviewing files that changed from the base of the PR and between e471dff and bf8e9ff.

📒 Files selected for processing (1)
  • docs/Gem5_Docs/xsai/se_matrix_smoke.md

Comment on lines +166 to +173
```bash
cd GEM5
./build/RISCV/gem5.opt \
--outdir=/tmp/gem5-se-gemm-precomp \
configs/example/se.py \
-c firmware/riscv-rootfs/apps/gemm_precomp/build/gemm_precomp \
--enable-riscv-vector --no-pf
```

⚠️ Potential issue | 🟡 Minor

Make run command more reproducible with explicit repo path context.

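Here, `cd GEM5` is not explicit enough for someone reproducing this for the first time; consider keeping it consistent with the earlier reproduction instructions (for example, `xsai-env/GEM5`).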

🧭 Suggested doc edit
-cd GEM5
+cd xsai-env/GEM5
 ./build/RISCV/gem5.opt \
   --outdir=/tmp/gem5-se-gemm-precomp \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```bash
cd xsai-env/GEM5
./build/RISCV/gem5.opt \
  --outdir=/tmp/gem5-se-gemm-precomp \
  configs/example/se.py \
  -c firmware/riscv-rootfs/apps/gemm_precomp/build/gemm_precomp \
  --enable-riscv-vector --no-pf
```

@github-actions (bot) commented Apr 9, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.2709 | - |
| This PR | 2.2709 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!
