arch-riscv,configs: Add SE matrix smoke support#819
arch-riscv,configs: Add SE matrix smoke support#819jensen-yan wants to merge 3 commits intoxs-devfrom
Conversation
Change-Id: I082477f9d3ed53e4676638680a9b44a54255133d
Change-Id: Ie91d6c29d10ca45b16304cd1ab72a9810c7ef7b1
📝 WalkthroughWalkthroughThis PR adds RISC-V matrix instruction support (ISA, decoder, formats, helpers, state, memory/compute ops, serialization) and moves SE config changes to unconditionally disable move-elimination and constant-folding across all Changes
Sequence Diagram(s)sequenceDiagram
participant Exec as ExecContext
participant ISA as RiscvISA::ISA
participant Proxy as PortProxy
participant Mem as Memory
Exec->>ISA: matrixLoadA8(xc, base, stride)
activate ISA
ISA->>ISA: for each row -> compute addr
ISA->>Proxy: matrixReadBlob(addr, buf)
activate Proxy
Proxy->>Mem: read request
Mem-->>Proxy: data / fault
Proxy-->>ISA: fault status
deactivate Proxy
alt fault
ISA-->>Exec: return GenericPageTableFault
else
ISA->>ISA: store row into tile buffer
end
ISA-->>Exec: return NoFault
deactivate ISA
sequenceDiagram
participant Exec as ExecContext
participant ISA as RiscvISA::ISA
participant TileA as TileA(buf)
participant TileB as TileB(buf)
participant Acc as Accumulator(buf)
Exec->>ISA: matrixMMAccWB()
activate ISA
loop m in 0..M-1
loop n in 0..N-1
Acc->>ISA: acc = Acc[m][n]
loop k in 0..K-1
ISA->>TileA: load a = A[m][k]
ISA->>TileB: load b = B[k][n]
ISA->>Acc: acc += int32(a) * int32(b)
end
ISA->>Acc: store Acc[m][n] = acc
end
end
deactivate ISA
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
|
Assuming the checkout is at cd xsai-env/GEM5
scons build/RISCV/gem5.opt --gold-linker -j100
./build/RISCV/gem5.opt \
--outdir=/tmp/gem5-se-libc-mmap \
configs/example/se.py \
-c /tmp/libc_mmap_smoke_xsai \
--enable-riscv-vector --no-pf
./build/RISCV/gem5.opt \
--outdir=/tmp/gem5-se-precomp-rand \
configs/example/se.py \
-c /tmp/precomp_rand_repro \
--enable-riscv-vector --no-pf
./build/RISCV/gem5.opt \
--outdir=/tmp/gem5-se-gemm-precomp \
configs/example/se.py \
-c firmware/riscv-rootfs/apps/gemm_precomp/build/gemm_precomp \
--enable-riscv-vector --no-pfOptional longer-running coverage with timeout 300s ./build/RISCV/gem5.opt \
--outdir=/tmp/gem5-se-hello-xsai \
configs/example/se.py \
-c firmware/riscv-rootfs/apps/hello_xsai/build/hello_xsai \
--enable-riscv-vector --no-pfObserved results on my side:
|
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/arch/riscv/isa.hh (1)
84-148:⚠️ Potential issue | 🟠 MajorCopy the new matrix ISA state in
copyRegsFrom.Line 84 onward adds architectural state, but
src/arch/riscv/isa.ccLines 341-360 still only clone int/float regs and PC state. Any CPU/context handoff that goes throughISA::copyRegsFromwill drop live matrix tiles, accumulator contents, and tokens.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/arch/riscv/isa.hh` around lines 84 - 148, ISA::copyRegsFrom currently copies basic int/float regs and PC but fails to copy the new matrix ISA state (matrixTileM, matrixTileK, matrixTileN, matrixTileA, matrixTileB, matrixAcc, matrixTokens), so live tiles/accumulators/tokens are lost during context handoff; update ISA::copyRegsFrom to copy all matrix state from the source ISA instance (use the source ThreadContext/src->getISA or cast to ISA to access these members) including scalar tile dims (matrixTileM/K/N), the vectors matrixTileA/matrixTileB/matrixAcc and the matrixTokens (ensure deep copy of vectors and any RegVal contents), and preserve any token/index state used by matrixSyncReset/matrixAcquire/matrixRelease so the target context has identical matrix execution state.
🧹 Nitpick comments (2)
src/arch/riscv/insts/matrix.cc (1)
1-3: Consider clarifying the comment terminology.The comment mentions "AME helpers" but the functions and file are named with "matrix". Consider aligning terminology for clarity (e.g., "Matrix extension helpers" or expand "AME" if it's an acronym).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/arch/riscv/insts/matrix.cc` around lines 1 - 3, The top-of-file comment uses the term "AME helpers" which conflicts with the file/functions named "matrix"; update the header comment to use consistent terminology (e.g., "Matrix extension helpers" or expand the acronym "AME" to "Atomic Memory Extension (AME) / Matrix extension" as appropriate) so it matches the file name and function names in matrix.cc and clarifies intent for readers.src/arch/riscv/isa.cc (1)
331-333: Avoid double-initializing matrix state in the constructor.
clear()on Line 365 already callsresetMatrixState(), so the extra call here just zeros and reallocates the matrix backing vectors twice during construction.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/arch/riscv/isa.cc` around lines 331 - 333, The constructor currently calls resetMatrixState() and then clear(), but clear() already calls resetMatrixState(), causing double-initialization; remove the explicit resetMatrixState() invocation so the sequence becomes miscRegFile.resize(NUM_MISCREGS); clear(); and rely on clear() to perform the matrix reset (refer to the resetMatrixState(), clear(), and miscRegFile.resize calls to locate the change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@configs/example/se.py`:
- Around line 388-390: The loop unconditionally sets enableMoveElimination and
enableConstantFolding on every CPU in system.cpu, which raises AttributeError
for non-O3 CPUs; update the loop to check the CPU class/type before assignment
(e.g., isinstance or hasattr) so only CPUs that support these O3-only params
(e.g., BaseO3CPU-derived cores) get those attributes set; specifically guard the
assignments around system.cpu so you only set cpu.enableMoveElimination and
cpu.enableConstantFolding when the CPU exposes those symbols or is an O3 CPU.
In `@src/arch/riscv/isa.cc`:
- Around line 1015-1022: The unserialize path must validate fixed-size matrix
state after reading matrixTileM, matrixTileK, matrixTileN and the containers
miscRegFile, matrixTileA, matrixTileB, matrixAcc, matrixTokens: check that tile
dimensions are within architectural limits and that each container length equals
the expected fixed size (pad with zeros or resize to the fixed capacity, or fail
fast and log an error) before returning from unserialize(); reject or correct
malformed checkpoints to prevent matrixLoad* and matrixMMAccWB() from indexing
past the end. Ensure you reference the same symbols (unserialize(),
matrixTileM/K/N, matrixTileA, matrixTileB, matrixAcc, matrixTokens, matrixLoad*,
matrixMMAccWB()) when adding these checks and error paths.
- Around line 77-111: matrixReadBlob and matrixWriteBlob currently use
TranslatingPortProxy/SETranslatingPortProxy and bypass the standard ExecContext
memory path; replace those proxy reads/writes with the ExecContext APIs used by
vector/AMO code (use the ExecContext instance associated with the ThreadContext
and call its readMem()/writeMem() methods with the same addr, dst/src and size),
propagate the boolean success/failure into the existing GenericPageTableFault
return on failure, and otherwise return NoFault; update matrixReadBlob and
matrixWriteBlob to remove TranslatingPortProxy/SETranslatingPortProxy usage and
call xc->readMem()/xc->writeMem() on the ExecContext for correct
timing/cache/fault behavior.
- Around line 421-423: The panic format specifiers are incorrect: in
matrixAcquire replace the "%u" for token_idx (uint64_t) with a proper 64-bit
specifier (use PRIu64 from <inttypes.h> or %llu consistently) and in both
matrixToken overloads replace "%u" for idx (size_t) with "%zu"; update the
panic_if call string ("macquire tok%u ...") and any other panic/printf uses
referencing token_idx or idx accordingly, and add an include for <inttypes.h> if
you choose PRIu64 so the types and format macros match exactly.
---
Outside diff comments:
In `@src/arch/riscv/isa.hh`:
- Around line 84-148: ISA::copyRegsFrom currently copies basic int/float regs
and PC but fails to copy the new matrix ISA state (matrixTileM, matrixTileK,
matrixTileN, matrixTileA, matrixTileB, matrixAcc, matrixTokens), so live
tiles/accumulators/tokens are lost during context handoff; update
ISA::copyRegsFrom to copy all matrix state from the source ISA instance (use the
source ThreadContext/src->getISA or cast to ISA to access these members)
including scalar tile dims (matrixTileM/K/N), the vectors
matrixTileA/matrixTileB/matrixAcc and the matrixTokens (ensure deep copy of
vectors and any RegVal contents), and preserve any token/index state used by
matrixSyncReset/matrixAcquire/matrixRelease so the target context has identical
matrix execution state.
---
Nitpick comments:
In `@src/arch/riscv/insts/matrix.cc`:
- Around line 1-3: The top-of-file comment uses the term "AME helpers" which
conflicts with the file/functions named "matrix"; update the header comment to
use consistent terminology (e.g., "Matrix extension helpers" or expand the
acronym "AME" to "Atomic Memory Extension (AME) / Matrix extension" as
appropriate) so it matches the file name and function names in matrix.cc and
clarifies intent for readers.
In `@src/arch/riscv/isa.cc`:
- Around line 331-333: The constructor currently calls resetMatrixState() and
then clear(), but clear() already calls resetMatrixState(), causing
double-initialization; remove the explicit resetMatrixState() invocation so the
sequence becomes miscRegFile.resize(NUM_MISCREGS); clear(); and rely on clear()
to perform the matrix reset (refer to the resetMatrixState(), clear(), and
miscRegFile.resize calls to locate the change).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8ea2d01c-0528-40bb-aaf7-d822760a46e1
📒 Files selected for processing (12)
configs/example/se.pysrc/arch/riscv/insts/SConscriptsrc/arch/riscv/insts/matrix.ccsrc/arch/riscv/insts/matrix.hhsrc/arch/riscv/isa.ccsrc/arch/riscv/isa.hhsrc/arch/riscv/isa/decoder.isasrc/arch/riscv/isa/formats/formats.isasrc/arch/riscv/isa/formats/matrix_arith.isasrc/arch/riscv/isa/formats/matrix_conf.isasrc/arch/riscv/isa/formats/matrix_mem.isasrc/arch/riscv/isa/includes.isa
Change-Id: I717bea122bbe582f5589c1f189b763be056d117a
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/Gem5_Docs/xsai/se_matrix_smoke.md (1)
84-85: Polish wording to avoid ambiguity in execution-delay sentence.“没有给 AME 指令补单独的执行延迟”读起来不够顺。建议改成更直接的表述,避免歧义。
✏️ Suggested doc edit
-- 没有给 AME 指令补单独的执行延迟 +- 没有为 AME 指令单独建模执行延迟🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/Gem5_Docs/xsai/se_matrix_smoke.md` around lines 84 - 85, Replace the ambiguous sentence "没有给 AME 指令补单独的执行延迟" with a clearer, more direct phrasing such as "未为 AME 指令单独建模执行延迟" (or "未为 AME 指令单独设置执行延迟") so it explicitly states that AME instructions do not have individual execution-delay modeling; update the line alongside the existing bullet "没有做 matrix 单元与 LSU/L2 的时序建模" to keep parallel structure and clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/Gem5_Docs/xsai/se_matrix_smoke.md`:
- Around line 166-173: The run command uses an ambiguous "cd GEM5" which is
unclear to first-time reproducers; update the shell snippet so it sets an
explicit repository path consistent with earlier instructions (e.g., replace "cd
GEM5" with "cd xsai-env/GEM5" or otherwise use the full repo path), and keep the
subsequent gem5 invocation lines unchanged so the example remains reproducible.
---
Nitpick comments:
In `@docs/Gem5_Docs/xsai/se_matrix_smoke.md`:
- Around line 84-85: Replace the ambiguous sentence "没有给 AME 指令补单独的执行延迟" with a
clearer, more direct phrasing such as "未为 AME 指令单独建模执行延迟" (or "未为 AME
指令单独设置执行延迟") so it explicitly states that AME instructions do not have
individual execution-delay modeling; update the line alongside the existing
bullet "没有做 matrix 单元与 LSU/L2 的时序建模" to keep parallel structure and clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d86a7327-7a3b-45db-9eaa-750fe2aba49d
📒 Files selected for processing (1)
docs/Gem5_Docs/xsai/se_matrix_smoke.md
| ```bash | ||
| cd GEM5 | ||
| ./build/RISCV/gem5.opt \ | ||
| --outdir=/tmp/gem5-se-gemm-precomp \ | ||
| configs/example/se.py \ | ||
| -c firmware/riscv-rootfs/apps/gemm_precomp/build/gemm_precomp \ | ||
| --enable-riscv-vector --no-pf | ||
| ``` |
There was a problem hiding this comment.
Make run command more reproducible with explicit repo path context.
这里的 cd GEM5 对首次复现者不够明确;建议和前文复现说明保持一致(例如 xsai-env/GEM5)。
🧭 Suggested doc edit
-cd GEM5
+cd xsai-env/GEM5
./build/RISCV/gem5.opt \
--outdir=/tmp/gem5-se-gemm-precomp \📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ```bash | |
| cd GEM5 | |
| ./build/RISCV/gem5.opt \ | |
| --outdir=/tmp/gem5-se-gemm-precomp \ | |
| configs/example/se.py \ | |
| -c firmware/riscv-rootfs/apps/gemm_precomp/build/gemm_precomp \ | |
| --enable-riscv-vector --no-pf | |
| ``` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/Gem5_Docs/xsai/se_matrix_smoke.md` around lines 166 - 173, The run
command uses an ambiguous "cd GEM5" which is unclear to first-time reproducers;
update the shell snippet so it sets an explicit repository path consistent with
earlier instructions (e.g., replace "cd GEM5" with "cd xsai-env/GEM5" or
otherwise use the full repo path), and keep the subsequent gem5 invocation lines
unchanged so the example remains reproducible.
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
Summary
Add the minimum RISC-V matrix support needed to run matrix smoke tests in SE mode.
What this changes
RiscvISA::ISAconfigs/example/se.pyso trap/syscall register updates remain visible in SE modeWhy
This PR is intended to make SE mode usable for matrix smoke validation before moving on to fuller FS-mode testing.
With these changes, the matrix smoke binaries can run in gem5 SE mode and the basic userland path is stable enough for bring-up and regression use.
Validation
Validated in SE mode with:
libc_mmap_smoke_xsai: passesprecomp_rand_repro:errors_shown=0gemm_precomp: all 8 precomputed cases passAdditional coverage:
hello_xsaienters userland, passes the allocator tests, and reaches the randomized matrix fuzz loop; it did not complete within the extended timeout window, but no correctness failure was observed before timeoutNotes
This PR intentionally stays focused on SE-mode smoke support in gem5 only.
It does not include FS/raw-linux bring-up changes.
Summary by CodeRabbit
New Features
Documentation
Chores