Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RISC-V NativeAOT port #106223

Open
am11 opened this issue Aug 10, 2024 · 188 comments
Open

RISC-V NativeAOT port #106223

am11 opened this issue Aug 10, 2024 · 188 comments
Labels
arch-riscv Related to the RISC-V architecture area-NativeAOT-coreclr
Milestone

Comments

@am11
Copy link
Member

am11 commented Aug 10, 2024

This is to track the progress of nativeaot port on riscv64 architecture.

WIP initial translation (based on @sunlijun-610's LA64 port):
main...am11:runtime:feature/nativeaot/riscv64-port

@am11 am11 added area-NativeAOT-coreclr arch-riscv Related to the RISC-V architecture labels Aug 10, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Aug 10, 2024
@am11
Copy link
Member Author

am11 commented Aug 10, 2024

cc @dotnet/samsung, @filipnavara

If someone wants take over, let me know. I can also give ssh access to riscv computer (bianbu OS, Debian derivative) if needed for testing. 👍

@am11
Copy link
Member Author

am11 commented Aug 12, 2024

Some assertions are failing at build-time in AsmOffset* and llvm-libunwind due to the sizes of various structs. llvm-libunwind is getting 96 bytes size difference in UnwindCursor<A, R> vs. unw_cursor_t which is a mystery. See the TODO added in src/native/external/llvm-libunwind/src/UnwindCursor.hpp, that template function is there temporarily to print the sizes on stderr during the build.

Also, I've used the register names we have in coreclr/pal. Some of them might not be needed, but I'm not sure since llvm-libunwind uses different names X1,X2,X3.. while coreclr/pal is using T1,T2,T3.. so I used the latter and handled the mapping in src/coreclr/nativeaot/Runtime/unix/UnwindHelpers.cpp.

@MichalStrehovsky MichalStrehovsky added this to the Future milestone Aug 12, 2024
@MichalStrehovsky MichalStrehovsky removed the untriaged New issue has not been triaged by the area owner label Aug 12, 2024
@am11
Copy link
Member Author

am11 commented Aug 23, 2024

Applied changes from #106225.

FWIW, this is my workflow:

# on macOS arm64
$ cd runtime
$ docker run -e ROOTFS_DIR=/crossrootfs/riscv64 -v$(pwd):/runtime -w /runtime --rm \
    --platform linux/arm64 -it ubuntu

$ eng/common/native/install-dependencies.sh 
$ apt install -y debootstrap
$ eng/common/cross/build-rootfs.sh riscv64 noble --skipunmount --rootfsdir /crossrootfs/riscv64
$ src/coreclr/build-runtime.sh -cross -riscv64

# once nativeaot runtime ^ starts building in silo, we can try a full build
# rm -rf artifacts
# ./build.sh clr+libs+packs -cross --arch riscv64

currently it is not getting too far 😅

...
[ 45%] Building CXX object nativeaot/Runtime/Full/CMakeFiles/Runtime.WorkstationGC.dir/__/__/__/gc/unix/events.cpp.o
[ 45%] Building CXX object classlibnative/bcltype/CMakeFiles/bcltype.dir/system.cpp.o
In file included from /runtime/src/coreclr/nativeaot/Runtime/AsmOffsetsVerify.cpp:43:
In file included from /runtime/src/coreclr/nativeaot/Runtime/AsmOffsets.h:79:
/runtime/src/coreclr/nativeaot/Runtime/riscv64/AsmOffsetsCpu.h:10:1: error: static assertion failed due to requirement '(sizeof(ExInfo) == 2120) || (sizeof(ExInfo) > 2120)': Bad asm size for 'ExInfo', the actual size is smaller than 0x848.
   10 | PLAT_ASM_SIZEOF(848, ExInfo)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/runtime/src/coreclr/nativeaot/Runtime/AsmOffsetsVerify.cpp:36:19: note: expanded from macro 'PLAT_ASM_SIZEOF'
   36 |     static_assert((sizeof(cls) == 0x##size) || (sizeof(cls) > 0x##size), "Bad asm size for '" #cls "', the actual size is smaller than 0x" #size "."); \
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /runtime/src/coreclr/nativeaot/Runtime/AsmOffsetsVerify.cpp:43:
In file included from /runtime/src/coreclr/nativeaot/Runtime/AsmOffsets.h:79:
/runtime/src/coreclr/nativeaot/Runtime/riscv64/AsmOffsetsCpu.h:15:1: error: static assertion failed due to requirement '(__builtin_offsetof(ExInfo, m_passNumber) == 28) || (__builtin_offsetof(ExInfo, m_passNumber) > 28)': Bad asm offset for 'ExInfo.m_passNumber', the actual offset is smaller than 0x1C.
   15 | PLAT_ASM_OFFSET(1C, ExInfo, m_passNumber)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/runtime/src/coreclr/nativeaot/Runtime/AsmOffsetsVerify.cpp:32:19: note: expanded from macro 'PLAT_ASM_OFFSET'
   32 |     static_assert((offsetof(cls, member) == 0x##offset) || (offsetof(cls, member) > 0x##offset), "Bad asm offset for '" #cls "." #member "', the actual offset is smaller than 0x" #offset "."); \
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...

@filipnavara
Copy link
Member

I tried to compile your branch and I ended up pretty much at the same spot. Currently away from computer, but this should be easy to fix…

@am11
Copy link
Member Author

am11 commented Sep 29, 2024

@filipnavara, have you had a chance to check out the assertion failures? No rush—just wondering if there are any blockers or details we need to sort out. Thanks! 👍

@filipnavara
Copy link
Member

Unfortunately no, still tackling some things for .NET 9 release and few other hack projects that I want to get done for the .NET 10 timeframe. I hope to eventually look into it but it’s currently not a priority, sorry.

@am11
Copy link
Member Author

am11 commented Sep 29, 2024

Ok, no worries, I'll also take a look when I get a chance. Was hoping to get some feedback from @viewizard et al. on register naming since it's not immediately clear to which convention is preferred at present day (and given llvm-libunwind has its own which is obviously different than HP-libunwind, coreclr VM and PAL). 😅

@sunlijun-610
Copy link
Contributor

Here are some of my ideas, and I'm not sure if they're helpful.
I have also encountered the same problem when adding the LoongArch architecture, and I think the assertion failures should come from the incorrect offsets and sizes in src/coreclr/nativeaot/Runtime/riscv64/AsmOffsetsCpu.h. The offsets and sizes should be defined according to the actual definition of the structures.
Before rewriting src/coreclr/nativeaot/Runtime/riscv64/AsmOffsetsCpu.h, I think we should first determine the register names defined in the structures, for example, in src/coreclr/nativeaot/Runtime/regdisplay.h. Now the register names in regdisplay.h and AsmOffsetsCpu.h are still conflicting. And then determine the order of variables in the structures. After determining the definition of the structures, we can easily rewrite AsmOffsetsCpu.h all at once.

@am11
Copy link
Member Author

am11 commented Sep 30, 2024

Thanks @sunlijun-610, https://github.com/am11/runtime/tree/feature/nativeaot/riscv64-port has the changes. Currently, the size of UnwindCursor<A, R> is 864 and unw_context_t is 768, so the does_fit assertion fails in llvm-libunwind UnwindCursor.hpp https://github.com/am11/runtime/blob/feature/nativeaot/riscv64-port/src/native/external/llvm-libunwind/src/UnwindCursor.hpp#L1349-L1357 (the existing assertions only tell us one operand's value not both sides that's why I added this template one to see the computed sizes at built time). If you have ideas how to satisfy AsmOffsetsCpu and UnwindCursor assertions simultaneously, feel free to push commits to the branch when you have time. I've sent you the colab invite. :)

@sunlijun-610
Copy link
Contributor

Thanks! About the does_fit assertion failure in llvm-libunwind UnwindCursor.hpp, I think it's no need to add +32 in # define _LIBUNWIND_CURSOR_SIZE (_LIBUNWIND_CONTEXT_SIZE + 12) + 32, because _LIBUNWIND_CONTEXT_SIZE has been added. Then the size of unw_cursor_t is 864, same size as UnwindCursor<A, R>.
https://github.com/am11/runtime/blob/4cbeceef967c727605869d687a612d43315a13a1/src/native/external/llvm-libunwind/include/__libunwind_config.h#L158

@am11
Copy link
Member Author

am11 commented Jan 10, 2025

Bulk of change is merged and SDK is enabling it dotnet/sdk#45852.

Last year, .NET 9 preview 1 was released on Feb. 13, so in a month or so, runtime's global.json will pick up the RID. ATM it throws unsupported platform error with PublishAot=true.

@am11
Copy link
Member Author

am11 commented Jan 10, 2025

#111273 is a baby step to get the bits working. After that update, next issue is:

root@0900f1880a04:/app1# /runtime/artifacts/bin/coreclr/linux.arm64.Debug/ilc-published/ilc @app1.ilc.rsp 
ILC: /runtime/src/coreclr/jit/emitriscv64.cpp:1630
ILC: Assertion failed 'isValidSimm32(addr - (ssize_t)dst)' in 'System.ObsoleteAttribute:.ctor(System.String,ubyte):this' during 'Emit code' (IL size 21; hash 0x817d7d95; FullOpts)

ILC: /runtime/src/coreclr/jit/emitriscv64.cpp:1630
ILC: Assertion failed 'isValidSimm32(addr - (ssize_t)dst)' in 'Internal.CompilerGenerated.<Module>:DynamicInvoke(ulong,byref,byref,ulong):byref' during 'Emit code' (IL size 50; hash 0xc1e09f14; FullOpts)

ILC: /runtime/src/coreclr/jit/emitriscv64.cpp:1630
ILC: Assertion failed 'isValidSimm32(addr - (ssize_t)dst)' in 'System.Runtime.CompilerServices.NullableAttribute:.ctor(ubyte):this' during 'Emit code' (IL size 23; hash 0x1237d66e; FullOpts)

Aborted

while we are waiting for the upcoming SDK 10 preview 1 to be picked up in runtime global.json (a month away), I found a way to test things locally.

  1. cross build runtime using this branch on linux-arm64: Use live apphost when publishing ilc as singlefile #105004 (i'm running build on osx-arm64 in container that's why linux-arm64)
    $ /runtime/eng/common/cross/build-rootfs.sh noble riscv64 --rootfsdir /crossrootfs/riscv64
    
    $ export ROOTFS_DIR=/crossrootfs/riscv64
    
    # stage 1
    $ /runtime/build.sh -arch riscv64 -cross -s clr+libs+host -c Debug -rc Checked --keepnativesymbols true -p:StageOneBuild=true
    # stage 2
    $ /runtime/build.sh -arch riscv64 -cross -s tools+clr.tools+packs -c Debug -rc Checked --keepnativesymbols true -p:StageTwoBuild=true
    
    # regular build for host
    $ /runtime/build.sh  clr+libs -c Debug --keepnativesymbols true
  2. PublishAot a helloworld app app1 on linux-arm64 machine (which has SDK integration). dotnet10 new console --aot -n app1 and dotnet10 publish
  3. copy the response file app1.ilc.rsp from obj/ directory and manually tweak it to point to riscv64 cross build paths: https://gist.github.com/am11/fc0bbf930f95f1bee2f100f145474e0a
  4. copy libclrjit_unix_riscv64_arm64.so from /runtime/artifacts/bin/coreclr/linux.riscv64.Checked/arm64/libclrjit_unix_riscv64_arm64.so to /runtime/artifacts/bin/coreclr/linux.arm64.Debug/ilc-published/
  5. run /runtime/artifacts/bin/coreclr/linux.arm64.Debug/ilc-published/ilc @app1.ilc.rsp
Under lldb
(lldb) r
Process 40540 launched: '/runtime/artifacts/bin/coreclr/linux.arm64.Debug/ilc-published/ilc' (aarch64)
ILC: /runtime/src/coreclr/jit/emitriscv64.cpp:1630
ILC: Assertion failed 'isValidSimm32(addr - (ssize_t)dst)' in 'System.ObsoleteAttribute:.ctor(System.String,ubyte):this' during 'Emit code' (IL size 21; hash 0x817d7d95; FullOpts)

ILC: /runtime/src/coreclr/jit/emitriscv64.cpp:1630
ILC: Assertion failed 'isValidSimm32(addr - (ssize_t)dst)' in 'Internal.CompilerGenerated.<Module>:DynamicInvoke(ulong,byref,byref,ulong):byref' during 'Emit code' (IL size 50; hash 0xc1e09f14; FullOpts)

ILC: /runtime/src/coreclr/jit/emitriscv64.cpp:1630
ILC: Assertion failed 'isValidSimm32(addr - (ssize_t)dst)' in 'System.Runtime.CompilerServices.NullableAttribute:.ctor(ubyte):this' during 'Emit code' (IL size 23; hash 0x1237d66e; FullOpts)

ILC: /runtime/src/coreclr/jit/emitriscv64.cpp:1630
Process 40540 stopped
* thread #1, name = 'ilc', stop reason = signal SIGTRAP
    frame #0: 0x0000ffffe46480a8 libclrjit_unix_riscv64_arm64.so`DBG_DebugBreak at debugbreak.S:7
   4   	#include "unixasmmacros.inc"
   5   	
   6   	LEAF_ENTRY DBG_DebugBreak, _TEXT
-> 7   	    EMIT_BREAKPOINT
   8   	    ret
   9   	LEAF_END_MARKED DBG_DebugBreak, _TEXT
   10  	
  thread #23, name = '.NET TP Worker', stop reason = signal SIGTRAP
    frame #0: 0x0000ffffe46480a8 libclrjit_unix_riscv64_arm64.so`DBG_DebugBreak at debugbreak.S:7
   4   	#include "unixasmmacros.inc"
   5   	
   6   	LEAF_ENTRY DBG_DebugBreak, _TEXT
-> 7   	    EMIT_BREAKPOINT
   8   	    ret
   9   	LEAF_END_MARKED DBG_DebugBreak, _TEXT
   10  	
  thread #25, name = '.NET TP Worker', stop reason = signal SIGTRAP
    frame #0: 0x0000ffffe46480a8 libclrjit_unix_riscv64_arm64.so`DBG_DebugBreak at debugbreak.S:7
   4   	#include "unixasmmacros.inc"
   5   	
   6   	LEAF_ENTRY DBG_DebugBreak, _TEXT
-> 7   	    EMIT_BREAKPOINT
   8   	    ret
   9   	LEAF_END_MARKED DBG_DebugBreak, _TEXT
   10  	
warning: This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
(lldb) bt
* thread #1, name = 'ilc', stop reason = signal SIGTRAP
  * frame #0: 0x0000ffffe46480a8 libclrjit_unix_riscv64_arm64.so`DBG_DebugBreak at debugbreak.S:7
    frame #1: 0x0000ffffe45ecc8c libclrjit_unix_riscv64_arm64.so`::DebugBreak() at debug.cpp:406:9 [opt]
    frame #2: 0x0000ffffe4391a20 libclrjit_unix_riscv64_arm64.so`assertAbort(why="isValidSimm32(addr - (ssize_t)dst)", file="/runtime/src/coreclr/jit/emitriscv64.cpp", line=1630) at error.cpp:288:9 [opt]
    frame #3: 0x0000ffffe45c4608 libclrjit_unix_riscv64_arm64.so`emitter::emitOutputCall(this=0x0000aaaaaaf4ee08, ig=<unavailable>, dst="\x97\U00000003", id=0x0000aaaaaaf6beec, code=<unavailable>) at emitriscv64.cpp:1630:9 [opt]
    frame #4: 0x0000ffffe45c87c4 libclrjit_unix_riscv64_arm64.so`emitter::emitOutputInstr(insGroup*, emitter::instrDesc*, unsigned char**) [inlined] emitter::emitOutputInstr_OptsC(this=0x0000aaaaaaf4ee08, dst="\x97\U00000003", id=0x0000aaaaaaf6beec, ig=0x0000aaaaaaf6bce0, size=<unavailable>) at emitriscv64.cpp:3235:12 [opt]
    frame #5: 0x0000ffffe45c8760 libclrjit_unix_riscv64_arm64.so`emitter::emitOutputInstr(this=0x0000aaaaaaf4ee08, ig=0x0000aaaaaaf6bce0, id=0x0000aaaaaaf6beec, dp=0x0000ffffffffad68) at emitriscv64.cpp:3293:20 [opt]
    frame #6: 0x0000ffffe438aa14 libclrjit_unix_riscv64_arm64.so`emitter::emitIssue1Instr(this=0x0000aaaaaaf4ee08, ig=0x0000aaaaaaf6bce0, id=0x0000aaaaaaf6beec, dp=0x0000ffffffffad68) at emit.cpp:4346:10 [opt]
    frame #7: 0x0000ffffe438beb0 libclrjit_unix_riscv64_arm64.so`emitter::emitEndCodeGen(this=0x0000aaaaaaf4ee08, comp=0x0000aaaaaaf4db28, contTrkPtrLcls=<unavailable>, fullyInt=<unavailable>, fullPtrMap=<unavailable>, xcptnsCount=3898893764, prologSize=0x0000aaaaaaf4edb4, epilogSize=<unavailable>, codeAddr=0x0000ffffffffbb88, codeAddrRW=0x0000aaaaaaf4ed78, coldCodeAddr=0x0000aaaaaaf4ed90, coldCodeAddrRW=0x0000aaaaaaf4ed98, consAddr=0x0000aaaaaaf4eda0, consAddrRW=0x0000aaaaaaf4eda8, instrCount=0x0000ffffffffae24) at emit.cpp:7314:43 [opt]
    frame #8: 0x0000ffffe43543e0 libclrjit_unix_riscv64_arm64.so`CodeGen::genEmitMachineCode(this=0x0000aaaaaaf4e9b8) at codegencommon.cpp:2113:23 [opt]
    frame #9: 0x0000ffffe435fcf0 libclrjit_unix_riscv64_arm64.so`CodeGenPhase::DoPhase(this=<unavailable>) at codegen.h:1727:9 [opt]
    frame #10: 0x0000ffffe4547c4c libclrjit_unix_riscv64_arm64.so`Phase::Run(this=0x0000ffffffffaed0) at phase.cpp:61:26 [opt]
    frame #11: 0x0000ffffe4353e88 libclrjit_unix_riscv64_arm64.so`CodeGen::genGenerateCode(void**, unsigned int*) [inlined] DoPhase(_codeGen=0x0000aaaaaaf4e9b8, _phase=PHASE_EMIT_CODE, _action=<unavailable>) at codegen.h:1741:11 [opt]
    frame #12: 0x0000ffffe4353e5c libclrjit_unix_riscv64_arm64.so`CodeGen::genGenerateCode(this=0x0000aaaaaaf4e9b8, codePtr=0x0000ffffffffbb88, nativeSizeOfCode=<unavailable>) at codegencommon.cpp:1774:5 [opt]
    frame #13: 0x0000ffffe43704a0 libclrjit_unix_riscv64_arm64.so`Compiler::compCompile(this=0x0000aaaaaaf4db28, methodCodePtr=<unavailable>, methodCodeSize=<unavailable>, compileFlags=0x0000ffffffffbbb0) at compiler.cpp:5266:14 [opt]
    frame #14: 0x0000ffffe4373eec libclrjit_unix_riscv64_arm64.so`Compiler::compCompileHelper(this=0x0000aaaaaaf4db28, classPtr=<unavailable>, compHnd=<unavailable>, methodInfo=<unavailable>, methodCodePtr=0x0000ffffffffbb88, methodCodeSize=0x0000ffffffffbf88, compileFlags=0x0000ffffffffbbb0) at compiler.cpp:7247:5 [opt]
    frame #15: 0x0000ffffe437230c libclrjit_unix_riscv64_arm64.so`Compiler::compCompile(CORINFO_MODULE_STRUCT_*, void**, unsigned int*, JitFlags*) [inlined] Compiler::compCompile(CORINFO_MODULE_STRUCT_*, void**, unsigned int*, JitFlags*)::$_0::operator()(this=<unavailable>, __JITpParam=<unavailable>) const at compiler.cpp:6435:28 [opt]
    frame #16: 0x0000ffffe43722f4 libclrjit_unix_riscv64_arm64.so`Compiler::compCompile(this=0x0000aaaaaaf4db28, classPtr=0x4000000000420128, methodCodePtr=0x0000ffffffffbb88, methodCodeSize=0x0000ffffffffbf88, compileFlags=0x0000ffffffffbbb0) at compiler.cpp:6454:5 [opt]
    frame #17: 0x0000ffffe4374e74 libclrjit_unix_riscv64_arm64.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) [inlined] jitNativeCode(this=<unavailable>, __JITpParam=<unavailable>)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::'lambda'(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::__JITParam*)::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::__JITParam*) const::__JITParam*) const at compiler.cpp:7898:32 [opt]
    frame #18: 0x0000ffffe4374d20 libclrjit_unix_riscv64_arm64.so`jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*) [inlined] jitNativeCode(CORINFO_METHOD_STRUCT_*, CORINFO_MODULE_STRUCT_*, ICorJitInfo*, CORINFO_METHOD_INFO*, void**, unsigned int*, JitFlags*, void*)::$_0::operator()(this=<unavailable>, __JITpParam=<unavailable>) const at compiler.cpp:7922:9 [opt]
    frame #19: 0x0000ffffe4374d10 libclrjit_unix_riscv64_arm64.so`jitNativeCode(methodHnd=0x4000000000420120, classPtr=0x4000000000420128, compHnd=0x0000ffffffffbc90, methodInfo=0x0000ffffffffbfa8, methodCodePtr=0x0000ffffffffbb88, methodCodeSize=0x0000ffffffffbf88, compileFlags=0x0000ffffffffbbb0, inlineInfoPtr=0x0000000000000000) at compiler.cpp:7924:5 [opt]
    frame #20: 0x0000ffffe437feec libclrjit_unix_riscv64_arm64.so`CILJit::compileMethod(this=<unavailable>, compHnd=0x0000ffffffffbc90, methodInfo=0x0000ffffffffbfa8, flags=<unavailable>, entryAddress=0x0000ffffffffbf90, nativeSizeOfCode=0x0000ffffffffbf88) at ee_il_dll.cpp:291:14 [opt]
    frame #21: 0x0000ffffeefcf0c4 libjitinterface_arm64.so`JitCompileMethod(ppException=0x0000ffffffffbf98, pJit=0x0000ffffe4688958, thisHandle=0x0000ffffffffbfa0, callbacks=0x0000aaaaaaf4b810, methodInfo=0x0000ffffffffbfa8, flags=4294967295, entryAddress=0x0000ffffffffbf90, nativeSizeOfCode=0x0000ffffffffbf88) at jitwrapper.cpp:36:22
    frame #22: 0x0000fffff2215644
    frame #23: 0x0000fffff221230c
    frame #24: 0x0000fffff2211e64
    frame #25: 0x0000fffff22118b0
    frame #26: 0x0000fffff220ef9c
    frame #27: 0x0000fffff1ae29a0
    frame #28: 0x0000fffff1b85da8
    frame #29: 0x0000ffffef32d2e0
    frame #30: 0x0000ffffef34f31c
    frame #31: 0x0000ffffef35ce6c
    frame #32: 0x0000ffffef35bb80
    frame #33: 0x0000ffffef34d0f4
    frame #34: 0x0000fffff1b89070
    frame #35: 0x0000fffff1ae1a88
    frame #36: 0x0000fffff1b8b8a4
    frame #37: 0x0000fffff1b897e8
    frame #38: 0x0000fffff220ee84
    frame #39: 0x0000fffff220eb8c
    frame #40: 0x0000fffff1ae0b58
    frame #41: 0x0000fffff01d57ac
    frame #42: 0x0000fffff220cf40
    frame #43: 0x0000fffff220ce3c
    frame #44: 0x0000fffff0180c0c
    frame #45: 0x0000fffff017b060
    frame #46: 0x0000fffff017a940
    frame #47: 0x0000fffff017a768
    frame #48: 0x0000fffff0161eb8
    frame #49: 0x0000fffff749c9e8 libcoreclr.so`___lldb_unnamed_symbol9959 + 132
    frame #50: 0x0000fffff730bd60 libcoreclr.so`___lldb_unnamed_symbol4754 + 768
    frame #51: 0x0000fffff7202d14 libcoreclr.so`___lldb_unnamed_symbol861 + 756
    frame #52: 0x0000fffff720306c libcoreclr.so`___lldb_unnamed_symbol863 + 272
    frame #53: 0x0000fffff722a00c libcoreclr.so`___lldb_unnamed_symbol1538 + 652
    frame #54: 0x0000fffff71efcd8 libcoreclr.so`coreclr_execute_assembly + 168
    frame #55: 0x0000fffff793ad70 libhostpolicy.so`___lldb_unnamed_symbol532 + 856
    frame #56: 0x0000fffff793bd44 libhostpolicy.so`corehost_main + 340
    frame #57: 0x0000fffff79af578 libhostfxr.so`___lldb_unnamed_symbol379 + 1188
    frame #58: 0x0000fffff79ae898 libhostfxr.so`___lldb_unnamed_symbol377 + 760
    frame #59: 0x0000fffff79a9114 libhostfxr.so`hostfxr_main_startupinfo + 208
    frame #60: 0x0000aaaaaaab73b0 ilc`___lldb_unnamed_symbol227 + 960
    frame #61: 0x0000aaaaaaab76e0 ilc`___lldb_unnamed_symbol229 + 184
    frame #62: 0x0000fffff7a284c4 libc.so.6`___lldb_unnamed_symbol3097 + 116
    frame #63: 0x0000fffff7a28598 libc.so.6`__libc_start_main + 152

@am11
Copy link
Member Author

am11 commented Jan 12, 2025

#111317 fixes ilc object file writing. The next step is to figure out the linker failure:

root@0900f1880a04:/app1$ "clang" "obj/Release/net10.0/linux-riscv64/native/app1.o" \
-o "bin/Release/net10.0/linux-riscv64/native/app1" \
-Wl,--version-script=obj/Release/net10.0/linux-x64/native/app1.exports \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libSystem.Native.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libSystem.Globalization.Native.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libSystem.IO.Compression.Native.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libSystem.Net.Security.Native.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libSystem.Security.Cryptography.Native.OpenSsl.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/aotsdk/libbootstrapper.o \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/aotsdk/libRuntime.WorkstationGC.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/aotsdk/libeventpipe-disabled.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/aotsdk/libstandalonegc-disabled.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/aotsdk/libaotminipal.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/aotsdk/libstdc++compat.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libz.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libbrotlienc.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libbrotlidec.a \
/runtime/artifacts/bin/coreclr/linux.riscv64.Checked/ilc-published/libbrotlicommon.a \
--sysroot="/crossrootfs/riscv64" --target=riscv64-linux-gnu -g \
-Wl,--export-dynamic -gz=zlib -fuse-ld=lld -Wl,-rpath,'$ORIGIN' -Wl,--build-id=sha1 -Wl,--as-needed -pthread \
-ldl -lrt -lm -pie -Wl,-pie -Wl,-z,relro -Wl,-z,now -Wl,--eh-frame-hdr -Wl,--discard-all -Wl,--gc-sections


ld.lld: error: unknown relocation (3) against symbol System_Console_System_ConsoleKeyInfo____GetFieldHelper
ld.lld: error: unknown relocation (3) against symbol _lsda0System_Console_System_ConsoleKeyInfo____GetFieldHelper
ld.lld: error: unknown relocation (3) against symbol S_P_CoreLib_System_AttributeUsageAttribute____GetFieldHelper
ld.lld: error: unknown relocation (3) against symbol _lsda0System_Console_System_ConsoleKeyInfo____GetFieldHelper
ld.lld: error: unknown relocation (3) against symbol S_P_CoreLib_System_CLSCompliantAttribute____GetFieldHelper
...

I changed 0x3 in ntimage.h and ilc to 0x2, but this relocation 3 error is not going away. Not sure yet where it's coming from. 😅

@filipnavara
Copy link
Member

filipnavara commented Jan 12, 2025

I changed 0x3 in ntimage.h and ilc to 0x2, but this relocation 3 error is not going away. Not sure yet where it's coming from. 😅

That would be R_RISCV_RELATIVE emitted in ElfObjectWriter (assuming that the architecture is correctly set to TargetArchitecture.RiscV64; maybe worth checking?).

@am11
Copy link
Member Author

am11 commented Jan 25, 2025

With #111735, it gets us a little far on our helloworld journey. Next issue is:

# also put a breakpoint in RhpInterfaceDispatchSlow since the PC gets corrupted and it disconnects the chain

Thread 1 "app1" received signal SIGSEGV, Segmentation fault.
S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhResolveDispatchWorker (pObject=..., cell=0x2aaac73790 <S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve>, cellInfo=...)
    at /runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:128
128	            MethodTable* pInstanceType = pObject.GetMethodTable();
(gdb) bt
#0  S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhResolveDispatchWorker (pObject=..., cell=0x2aaac73790 <S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve>, cellInfo=...)
    at /runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:128
#1  0x0000002aaac73810 in S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve_Worker (pObject=..., pCell=183253809040) at /runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:28
#2  0x0000002aaac737b0 in S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve (callerTransitionBlockParam=-6138604891393753078, pCell=183253809040)
    at /runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:21
#3  0x0000002aaac12782 in RhpUniversalTransition_DebugStepTailCall () at /runtime/src/coreclr/nativeaot/Runtime/riscv64/UniversalTransition.S:179

There is probably still some register mapping need work in StubDispatch.S UniversalTransition.S. I fixed some by comparing the relationship between callee (JIT'd or C++) and caller in arm64 and those of riscv64, now the call to transition method goes through but fails most likely some arg mapping is off.

@am11
Copy link
Member Author

am11 commented Jan 25, 2025

Regarding the workflow #106223 (comment), I have switched to all debug configs. For rebuilds, we can make it faster. e.g. if we make a change in src/coreclr/nativeaot/Runtime, then we normally just need to rebuild StageOne for clr:

# repeat step
$ /runtime/build.sh -arch riscv64 -cross -s clr.nativeaotruntime -c Debug -p:StageOneBuild=true --keepnativesymbols true
$ cp /runtime/artifacts/bin/coreclr/linux.riscv64.Debug/arm64/libclrjit_unix_riscv64_arm64.so /runtime/artifacts/bin/coreclr/linux.arm64.Debug/ilc-published

If it's src/coreclr/tools, the StageTwo. Every now and then, I do a clean rebuild just in case.

@filipnavara
Copy link
Member

The multi-threading issues are going to be pain to debug. We should probably start by fixing RhpCheckedXchg atomicity guarantees (and checking RhpCheckedLockCmpXchg too, just in case).

There's also this thing which was fixed recently for ARM64 (#106219):

--- a/src/coreclr/tools/Common/Compiler/DependencyAnalysis/Target_RiscV64/RiscV64Emitter.cs
+++ b/src/coreclr/tools/Common/Compiler/DependencyAnalysis/Target_RiscV64/RiscV64Emitter.cs
@@ -25,6 +25,11 @@ public void EmitBreak()
             Builder.EmitUInt(0x00100073);
         }
 
+        public void EmitFENCE()
+        {
+            Builder.EmitUInt(0x0ff0000f);
+        }
+
         public void EmitLI(Register regDst, int offset)
         {
             Debug.Assert((offset >= -2048) && (offset <= 2047));
diff --git a/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/Target_RiscV64/RiscV64ReadyToRunGenericHelperNode.cs b/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/Target_RiscV64/RiscV64ReadyToRunGenericHelperNode.cs
index 584b37e53a7..9514c05cb93 100644
--- a/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/Target_RiscV64/RiscV64ReadyToRunGenericHelperNode.cs
+++ b/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/Target_RiscV64/RiscV64ReadyToRunGenericHelperNode.cs
@@ -75,6 +75,7 @@ protected sealed override void EmitCode(NodeFactory factory, ref RiscV64Emitter
                             // We need to trigger the cctor before returning the base. It is stored at the beginning of the non-GC statics region.
                             encoder.EmitADDI(encoder.TargetRegister.Arg3, encoder.TargetRegister.Arg0, -NonGCStaticsNode.GetClassConstructorContextSize(factory.Target));
                             encoder.EmitLD(encoder.TargetRegister.Arg2, encoder.TargetRegister.Arg3, 0);
+                            encoder.EmitFENCE();
                             encoder.EmitRETIfZero(encoder.TargetRegister.Arg2);
 
                             encoder.EmitMOV(encoder.TargetRegister.Arg1, encoder.TargetRegister.Result);
@@ -106,6 +107,7 @@ protected sealed override void EmitCode(NodeFactory factory, ref RiscV64Emitter
 
                             encoder.EmitADDI(encoder.TargetRegister.Arg2, encoder.TargetRegister.Arg2, -NonGCStaticsNode.GetClassConstructorContextSize(factory.Target));
                             encoder.EmitLD(encoder.TargetRegister.Arg3, encoder.TargetRegister.Arg2, 0);
+                            encoder.EmitFENCE();
                             encoder.EmitRETIfZero(encoder.TargetRegister.Arg3);
 
                             encoder.EmitMOV(encoder.TargetRegister.Arg1, encoder.TargetRegister.Result);

@filipnavara
Copy link
Member

I made an attempt to fix the multi-threaded issues and had some luck with running UnitTests to a successful end:

--- a/src/coreclr/nativeaot/Runtime/riscv64/WriteBarriers.S
+++ b/src/coreclr/nativeaot/Runtime/riscv64/WriteBarriers.S
@@ -303,6 +303,7 @@ LEAF_END RhpAssignRef, _TEXT
 //  t0, t1, t2, t6: trashed
 //
 LEAF_ENTRY RhpCheckedLockCmpXchg
+        fence
 
 LOCAL_LABEL(CmpXchgRetry):
         // Load the current value at the destination address.
@@ -342,13 +343,15 @@ LEAF_END RhpCheckedLockCmpXchg
 //
 // On exit:
 //  a0: original value of objectref
-//  t1: trashed
-//  t3, t6, t4: trashed
+//  t1, t2, t6: trashed
 //
 LEAF_ENTRY RhpCheckedXchg
+        fence
 
-        ld   t1, 0(a0)
-        sd   a1, 0(a0)
+RhpCheckedXchgRetry:
+        lr.d   t1, 0(a0)
+        sc.d   t2, a1, 0(a0)
+        bnez   t2, RhpCheckedXchgRetry // if store conditional failed, retry
 
 DoCardsXchg:
         // We have successfully updated the value of the objectref so now we need a GC write barrier.

I'll commit it as WIP. You should review it carefully.

@filipnavara
Copy link
Member

filipnavara commented Feb 18, 2025

DynamicGenerics passed too on my machine with all the committed fixes. I guess it's time to build and run all the smoke tests to see which ones are still failing.

@am11
Copy link
Member Author

am11 commented Feb 18, 2025

Running UnitTests a few times gave me:

Testing TestRuntime109496Regression
===== Test Interfaces.Run succeeded =====
===== Running test Threading.Run =====
    WaitSubsystemTests.DoubleSetOnEventWithTimedOutWaiterShouldNotStayInWaitersList
    WaitSubsystemTests.ManualResetEventTest
    WaitSubsystemTests.AutoResetEventTest
    WaitSubsystemTests.SemaphoreTest
    WaitSubsystemTests.MutexTest
    WaitSubsystemTests.WaitDurationTest
    TimersCreatedConcurrentlyOnDifferentThreadsAllFire
    ThreadPoolTests.RunProcessorCountItemsInParallel
    ThreadPoolTests.RunMoreThanMaxJobsMakesOneJobWaitForStarvationDetection
    ThreadPoolTests.ThreadPoolCanPickUpOneJobWhenThreadIsAvailable
    ThreadPoolTests.ThreadPoolCanPickUpMultipleJobsWhenThreadsAreAvailable
    ThreadPoolTests.ThreadPoolCanProcessManyWorkItemsInParallelWithoutDeadlocking
--------------------------------------------------
Debug Assertion Violation

Expression: 'IS_ALIGNED(len, sizeof(size_t))'

File: /runtime/src/coreclr/nativeaot/Runtime/GCMemoryHelpers.inl, Line: 74
--------------------------------------------------
Aborted (core dumped)

another run:

===== Test Devirtualization.Run succeeded =====
===== Running test StackTraces.Run =====
Process terminated. Assertion failed.
A QueueUserWorkItemCallback was called twice!
   at System.Diagnostics.DebugProvider.Fail(String, String) + 0x64
   at System.Diagnostics.Debug.Fail(String, String) + 0x70
   at System.Diagnostics.Debug.Assert(Boolean, String, String) + 0x58
   at System.Diagnostics.Debug.Assert(Boolean, String) + 0x3c
   at System.Threading.QueueUserWorkItemCallbackBase.Execute() + 0x6c
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute() + 0x3c
   at System.Threading.ThreadPoolWorkQueue.DispatchWorkItem(Object, Thread) + 0x100
   at System.Threading.ThreadPoolWorkQueue.Dispatch() + 0x888
   at System.Threading.PortableThreadPool.WorkerThread.WorkerDoWork(PortableThreadPool, Boolean&) + 0x6c
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() + 0x31c
   at UnitTests!<BaseAddress>+0x7e5504
   at System.Threading.Thread.StartHelper.RunWorker() + 0xac
   at System.Threading.Thread.StartHelper.Run() + 0xf8
   at System.Threading.Thread.StartThread(IntPtr) + 0x18c
   at System.Threading.Thread.ThreadEntryPoint(IntPtr) + 0x38
Aborted (core dumped)

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

Ah, I had to delete artifacts/tests before rebuilding the tests. Apparently it doesn't recognize that updated/new toolchain is being used.

UnitTests are now passing under qemu (strangely even with BasicThreading), and without BasicThreading on the machine.
DynamicGenerics are failing at:

Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
Assert.AreEqual:Expected: [string3_thread#5]. Actual: [NULL]. 
Segmentation fault (core dumped)

sometimes:

--------------------------------
Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
Process terminated. Access Violation: Attempted to read or write protected memory. This is often an indication that other memory is corrupt. The application will be terminated since this platform does not support throwing an AccessViolationException.
Aborted (core dumped)

and sometimes:

--------------------------------
Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
Assert.AreEqual:Expected: [305?9110]. Actual: [305839110]. 
Process terminated. Assertion failed.
Invalid arguments provided to method.
   at System.Diagnostics.DebugProvider.Fail(String, String) + 0x64
   at System.Diagnostics.Debug.Fail(String, String) + 0x70
   at System.Diagnostics.Debug.Assert(Boolean, String, String) + 0x58
   at System.Diagnostics.Debug.Assert(Boolean, String) + 0x3c
   at System.Text.Encoding.GetCharsWithFallback(Byte*, Int32, Char*, Int32, Int32, Int32, Boolean) + 0x98
   at System.Text.UTF8Encoding.GetCharsCommon(Byte*, Int32, Char*, Int32, Boolean) + 0x1d0
   at System.Text.UTF8Encoding.GetChars(Byte*, Int32, Char*, Int32) + 0x150
   at System.String.CreateStringFromEncoding(Byte*, Int32, Encoding) + 0x164
   at System.Text.Encoding.GetString(Byte*, Int32) + 0x78
   at Internal.NativeFormat.NativeReader.DecodeString(UInt32, String&) + 0x180
   at Internal.Metadata.NativeFormat.MdBinaryReader.Read(NativeReader, UInt32, String&) + 0x3c
   at Internal.Metadata.NativeFormat.ConstantStringValue..ctor(MetadataReader, ConstantStringValueHandle) + 0xdc
   at Internal.Metadata.NativeFormat.ConstantStringValueHandle.GetConstantStringValue(MetadataReader) + 0x4c
   at Internal.Runtime.TypeLoader.MetadataNameExtensions.GetFullName(TypeDefinitionHandle, MetadataReader, String&, String&, String&) + 0x1f0
   at Internal.Runtime.TypeLoader.MetadataNameExtensions.GetFullName(TypeDefinitionHandle, MetadataReader) + 0x74
   at Internal.Runtime.TypeLoader.LowLevelStringConverter.LowLevelToString(RuntimeTypeHandle) + 0xbc
   at Internal.TypeSystem.NoMetadata.NoMetadataType.ToString() + 0xe4
   at Internal.TypeSystem.TypeNameHelper.WithDebugName[T](T) + 0x98
   at Internal.TypeSystem.TypeSystemContext.ResolveRuntimeTypeHandle(RuntimeTypeHandle) + 0xa44
   at Internal.Runtime.TypeLoader.TypeLoaderTypeSystemContext.GetWellKnownType(WellKnownType, Boolean) + 0xa94
   at Internal.TypeSystem.TypeDescExtensions.IsWellKnownType(TypeDesc, WellKnownType) + 0x68
   at Internal.TypeSystem.TypeDesc.get_IsObject() + 0x30
   at Internal.TypeSystem.DebugNameFormatter.AppendNameForNamespaceType(StringBuilder, DefType, DebugNameFormatter.FormatOptions) + 0x57c
   at Internal.TypeSystem.TypeNameFormatter`2.AppendName(StringBuilder, DefType, TOptions) + 0x164
   at Internal.TypeSystem.TypeNameFormatter`2.AppendName(StringBuilder, TypeDesc, TOptions) + 0x318
   at Internal.TypeSystem.TypeNameFormatter`2.FormatName(TypeDesc, TOptions) + 0x7c
   at Internal.TypeSystem.TypeDesc.ToString() + 0x74
   at Internal.TypeSystem.TypeNameHelper.WithDebugName[T](T) + 0x98
   at Internal.TypeSystem.TypeSystemContext.ResolveRuntimeTypeHandle(RuntimeTypeHandle) + 0xa44
   at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.CanInstantiationsShareCode(RuntimeTypeHandle[], RuntimeTypeHandle[], CanonicalFormKind) + 0x13c
   at Internal.Runtime.TypeLoader.CanonicallyEquivalentEntryLocator.IsCanonicallyEquivalent(RuntimeTypeHandle) + 0x230
   at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.InvokeMapEntryDataEnumerator`2.GetNext(NativeParser&, ExternalReferencesTable&, MethodSignatureComparer&, CanonicallyEquivalentEntryLocator) + 0x394
   at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.TryGetMethodInvokeMetadataFromInvokeMap(MetadataReader, RuntimeTypeHandle, MethodHandle, RuntimeTypeHandle[], MethodSignatureComparer&, CanonicalFormKind, MethodInvokeMetadata&) + 0x304
   at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.TryGetMethodInvokeMetadata(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[], MethodSignatureComparer&, CanonicalFormKind, MethodInvokeMetadata&) + 0xcc
   at Internal.Reflection.Execution.ExecutionEnvironmentImplementation.TryGetMethodInvokeInfo(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[], MethodBase, MethodSignatureComparer&, CanonicalFormKind) + 0x78
   at Internal.Reflection.Execution.ExecutionEnvironmentImplementation.TryGetMethodInvoker(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[]) + 0x110
   at Internal.Reflection.Core.Execution.ExecutionEnvironment.GetMethodInvoker(RuntimeTypeInfo, QMethodDefinition, RuntimeTypeInfo[], MemberInfo, Exception&) + 0x2d0
   at System.Reflection.Runtime.MethodInfos.NativeFormat.NativeFormatMethodCommon.GetUncachedMethodInvoker(RuntimeTypeInfo[], MemberInfo, Exception&) + 0xf0
   at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo`1.GetUncachedMethodInvoker(RuntimeTypeInfo[], MemberInfo) + 0x58
   at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo`1.get_UncachedMethodInvoker() + 0xc0
   at System.Reflection.Runtime.MethodInfos.RuntimeMethodInfo.get_MethodInvoker() + 0x7c
   at System.Reflection.Runtime.MethodInfos.RuntimeMethodInfo.Invoke(Object, BindingFlags, Binder, Object[], CultureInfo) + 0x5c
   at System.Reflection.MethodBase.Invoke(Object, Object[]) + 0x48
   at ThreadLocalStatics.TLSTesting.MakeType2(Type, Type, Boolean) + 0xfa0
   at ThreadLocalStatics.TLSTesting.<>c__DisplayClass3_0.<MultiThreaded_Test>b__0() + 0xa8
   at System.Threading.Tasks.Task.InnerInvoke() + 0x94
   at System.Threading.Tasks.Task.<>c.<.cctor>b__292_0(Object obj) + 0x78
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x114
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x214
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread) + 0xcc
   at System.Threading.Tasks.Task.ExecuteFromThreadPool(Thread) + 0x2c
   at System.Threading.ThreadPoolWorkQueue.DispatchWorkItem(Object, Thread) + 0x88
   at System.Threading.ThreadPoolWorkQueue.Dispatch() + 0x888
   at System.Threading.PortableThreadPool.WorkerThread.WorkerDoWork(PortableThreadPool, Boolean&) + 0x6c
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() + 0x31c
   at DynamicGenerics!<BaseAddress>+0x995e64
   at System.Threading.Thread.StartHelper.RunWorker() + 0xac
   at System.Threading.Thread.StartHelper.Run() + 0xf8
   at System.Threading.Thread.StartThread(IntPtr) + 0x18c
   at System.Threading.Thread.ThreadEntryPoint(IntPtr) + 0x38
Aborted (core dumped)

@filipnavara
Copy link
Member

UnitTests are now passing under qemu (strangely even with BasicThreading), and without BasicThreading on the machine.

I believe that the bug affecting BasicThreading was the incorrect implementation of RhpCheckedXchg.

DynamicGenerics are failing at:

I can reproduce those. Apparently my previous run used DOTNET_gcConservative=1 which was a leftover from some previous tests. The test still passes reliably with the conservative GC on my machine. That means we are looking at GC hole but at least it narrows it down.

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

With DOTNET_gcConservative=1 (and servergc), DynamicGenerics passes on my machine as well.

BasicThreading, without DOTNET_gcConservative, takes a while before it OOMs:

===== Running test BasicThreading.Run =====
System.AggregateException: One or more errors occurred. (Thread failed to start.)
 ---> System.Threading.ThreadStartException: Thread failed to start.
 ---> System.OutOfMemoryException: Insufficient memory to continue the execution of the program.
   --- End of inner exception stack trace ---
   at System.Threading.Thread.StartCore() + 0x318
   at System.Threading.Thread.Start(Boolean) + 0xf8
   at System.Threading.Thread.Start() + 0x28
   at ThreadTest.<>c__DisplayClass10_0.<TestConcurrentIsBackgroundProperty>b__1() + 0x30
   at System.Threading.Tasks.Task.InnerInvoke() + 0x94
   at System.Threading.Tasks.Task.<>c.<.cctor>b__292_0(Object) + 0x78
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x114
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x198
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x214
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.WaitAllCore(ReadOnlySpan`1, Int32, CancellationToken) + 0x874
   at System.Threading.Tasks.Task.WaitAll(Task[]) + 0x94
   at ThreadTest.TestConcurrentIsBackgroundProperty() + 0x2cc
   at ThreadTest.Run() + 0xc8
   at BasicThreading.Run() + 0x9c
   at UnitTests!<BaseAddress>+0x734a74
   at Program.<<Main>$>g__RunTest|0_0(Func`1, String) + 0x94
===== Test BasicThreading.Run failed =====
===== Running test Delegates.Run =====
..

although, there is enough:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15904         938       14305           4         890       14966
Swap:              0           0           0

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

A P/Invoke test is failing.

(gdb) b PInvokeNative.cpp:341
(gdb) b PInvoke.cs:702
(gdb) r

Thread 1 "PInvoke" hit Breakpoint 4, ReversePInvoke_Int (fnPtr=0x3ff7fbe000) at /runtime/src/tests/nativeaot/SmokeTests/PInvoke/PInvokeNative.cpp:341
341	    return fnPtr(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) == 55;
(gdb) c
Continuing.

Thread 1 "PInvoke" hit Breakpoint 6, PInvoke_PInvokeTests_Program__Sum (a=-136619912, b=2, c=3, d=4, e=5, f=-136620160, g=7, h=8, i=9, j=10) at /runtime/src/tests/nativeaot/SmokeTests/PInvoke/PInvoke.cs:702
702	            return a + b + c + d + e + f + g + h + i + j;
(gdb) bt
#0  PInvoke_PInvokeTests_Program__Sum (a=-136619912, b=2, c=3, d=4, e=5, f=-136620160, g=7, h=8, i=9, j=10) at /runtime/src/tests/nativeaot/SmokeTests/PInvoke/PInvoke.cs:702
#1  0x0000002aaaf99638 in Internal_CompilerGenerated__Module___<ReverseOpenStaticDelegateStub>PInvoke_PInvokeTests_Program_Delegate_Int () at /runtime/src/coreclr/nativeaot/Common/src/System/Collections/Generic/LowLevelDictionary.cs:289
#2  0x0000003ff7fd1800 in ReversePInvoke_Int (fnPtr=0x3ff7fbe000) at /runtime/src/tests/nativeaot/SmokeTests/PInvoke/PInvokeNative.cpp:341
#3  0x0000002aaad05cf0 in PInvoke_PInvokeTests_Program__ReversePInvoke_Int ()
#4  0x0000002aaad0c484 in PInvoke_PInvokeTests_Program__TestDelegate () at /runtime/src/tests/nativeaot/SmokeTests/PInvoke/PInvoke.cs:636
#5  0x0000002aaad0a9c4 in PInvoke_PInvokeTests_Program__Main () at /runtime/src/tests/nativeaot/SmokeTests/PInvoke/PInvoke.cs:339
#6  0x0000002aaafa04d4 in __managed__Main () at /runtime/src/coreclr/nativeaot/Common/src/System/Collections/Generic/LowLevelDictionary.cs:289
#7  0x0000002aaac5483a in main (argc=1, argv=0x3ffffff138) at /runtime/src/coreclr/nativeaot/Bootstrap/main.cpp:225

a and f have trash.

@filipnavara
Copy link
Member

filipnavara commented Feb 19, 2025

At least some of the PInvoke parameter trashing happens in RhCommonStub with the INLINE_GET_TLS_VAR. At very least it seems to trash a0 and likely a5 too.

At RhCommonStub+44 it's already trashed:

a0             0x3ff7db4878     274741282936
a1             0x2      2
a2             0x3      3
a3             0x4      4
a4             0x5      5
a5             0xf8     248
a6             0x7      7
a7             0x8      8
s2             0x1      1
s3             0x0      0

(also, INLINE_GET_TLS_VAR pushes the stack without producing unwinding information; we don't interrupt threads running native code so it should not cause issues but it may hinder debugging experience when stepping through the code)

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

Thanks! I've pushed a fix. When we will switch to tlsdesc in the future, these prolog/epilog won't be needed (as linker magic due to tlsdesc preserves it). I kept the old version because it requires "everything new" libc,binutils-bfd,lld etc. from 2024. In a year or so, we can bump the baseline and switch to tlsdesc.

@filipnavara
Copy link
Member

Thanks! I've pushed a fix.

Note that to be completely safe we should save all aX registers, including a7 and a8. It can also trash any tX registers and we currently only save t1 and use one of the tX for the return value. This is extremely fragile.

(FWIW I am running up-to-date Ubuntu STS so I should have all the packages necessary for the TLSDESC to work.)

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

Bianbu 2.1 (based on Ubuntu 24.04) is using linux 6.6, they are upstreaming their driver patches to linux, so hopefully I will be able to switch the distro soon (or they will switch to kernel 6.8 or above + the newer libc with TLSDESC 🙂). SpacemiT K1 SoC is also used in other devices (Milk-V https://github.com/milkv-jupiter/jupiter-bianbu-build and boards), so I hope people will extract the patches in other distros rather quickly.

We can also sync with @tomeksowi et al. on baseline versions after the port completion.

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

Note that to be completely safe we should save all aX registers

Pushed a7 (a0 is preserved at call-sites). PInvoke tests are passing. 👍

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

BasicThreading, without DOTNET_gcConservative, takes a while before it OOMs:

This one is sometimes stuck and from second terminal, strace -p shows it's waiting for futex sync:

$ strace -p 89552
strace: Process 89552 attached
futex(0x2ac8e17904, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY
(nothing afterwards)

and gdb -p:

Attaching to process 89552
[New LWP 97850]
[New LWP 97196]
[New LWP 97025]
[New LWP 95070]
[New LWP 93241]
[New LWP 89556]
[New LWP 89554]
[New LWP 89553]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
0x0000003fb635c742 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x2ac8e17904) at ./nptl/futex-internal.c:57

warning: 57	./nptl/futex-internal.c: No such file or directory
(gdb) bt
#0  0x0000003fb635c742 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x2ac8e17904) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x2ac8e17904, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x0000003fb635c77c in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x2ac8e17904, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x0000003fb635e7ac in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x2ac8e178b0, cond=0x2ac8e178d8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x2ac8e178d8, mutex=0x2ac8e178b0) at ./nptl/pthread_cond_wait.c:627
#5  0x0000002ac865956a in SystemNative_LowLevelMonitor_Wait (monitor=0x2ac8e178b0) at /runtime/src/native/libs/System.Native/pal_threading.c:153
#6  0x0000002ac89b3af0 in S_P_CoreLib_Interop_Sys__LowLevelMonitor_Wait ()
    at /runtime/artifacts/obj/coreclr/nativeaot/System.Private.CoreLib/linux.riscv64.Debug/generated/Microsoft.Interop.LibraryImportGenerator/Microsoft.Interop.LibraryImportGenerator/LibraryImports.g.cs:2524
#7  0x0000002ac88adacc in S_P_CoreLib_System_Threading_LowLevelMonitor__WaitCore (this=...) at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelMonitor.Unix.cs:44
#8  0x0000002ac88ad77c in S_P_CoreLib_System_Threading_LowLevelMonitor__Wait (this=...) at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelMonitor.cs:91
#9  0x0000002ac89e5394 in S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait (this=..., timeoutMilliseconds=-1, interruptible=true, isSleep=false, lockHolder=...)
    at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitSubsystem.ThreadWaitInfo.Unix.cs:322
#10 0x0000002ac89e7b74 in S_P_CoreLib_System_Threading_WaitSubsystem_WaitableObject__Wait_Locked (this=..., waitInfo=..., timeoutMilliseconds=-1, interruptible=true, prioritize=false, lockHolder=...)
    at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitSubsystem.WaitableObject.Unix.cs:368
#11 0x0000002ac89e76f0 in S_P_CoreLib_System_Threading_WaitSubsystem_WaitableObject__Wait (this=..., waitInfo=..., timeoutMilliseconds=-1, interruptible=true, prioritize=false)
    at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitSubsystem.WaitableObject.Unix.cs:320
#12 0x0000002ac88c1aa8 in S_P_CoreLib_System_Threading_WaitSubsystem__Wait_0 (waitableObject=..., timeoutMilliseconds=-1, interruptible=true, prioritize=false)
    at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitSubsystem.Unix.cs:339
#13 0x0000002ac88c19a0 in S_P_CoreLib_System_Threading_WaitSubsystem__Wait (handle=272382796208, timeoutMilliseconds=-1, interruptible=true) at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitSubsystem.Unix.cs:327
#14 0x0000002ac88bc6e0 in S_P_CoreLib_System_Threading_WaitHandle__WaitOneCore (handle=272382796208, millisecondsTimeout=-1, useTrivialWaits=false) at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.Unix.cs:11
#15 0x0000002ac88bab58 in S_P_CoreLib_System_Threading_WaitHandle__WaitOneNoCheck (this=..., millisecondsTimeout=-1, useTrivialWaits=false, associatedObject=..., waitSource=Unknown)
    at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs:183
#16 0x0000002ac88ba67c in S_P_CoreLib_System_Threading_WaitHandle__WaitOne (this=..., millisecondsTimeout=-1) at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs:104
#17 0x0000002ac88a5864 in S_P_CoreLib_System_Threading_Thread__JoinInternal (this=..., millisecondsTimeout=-1) at /runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.NativeAot.Unix.cs:82
#18 0x0000002ac88a4768 in S_P_CoreLib_System_Threading_Thread__Join (this=..., millisecondsTimeout=-1) at /runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.NativeAot.cs:327
#19 0x0000002ac88a6ac8 in S_P_CoreLib_System_Threading_Thread__Join_0 (this=...) at /runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs:540
#20 0x0000002ac8a6f174 in UnitTests_ThreadTest__ExpectPassed (testName=..., expectedPassed=10000) at /runtime/src/tests/nativeaot/SmokeTests/UnitTests/BasicThreading.cs:328
#21 0x0000002ac8a70748 in UnitTests_ThreadTest__TestConcurrentIsBackgroundProperty () at /runtime/src/tests/nativeaot/SmokeTests/UnitTests/BasicThreading.cs:448
#22 0x0000002ac8a71798 in UnitTests_ThreadTest__Run () at /runtime/src/tests/nativeaot/SmokeTests/UnitTests/BasicThreading.cs:605
#23 0x0000002ac8a6dddc in UnitTests_BasicThreading__Run () at /runtime/src/tests/nativeaot/SmokeTests/UnitTests/BasicThreading.cs:25
#24 0x0000002ac8acaad4 in S_P_CoreLib_System_Func_1<Int32>__InvokeOpenStaticThunk () at /runtime/src/libraries/System.Private.CoreLib/src/System/Array.Enumerators.cs:111
#25 0x0000002ac8a81394 in UnitTests_Program____Main___g__RunTest_0_0 (t=..., name=...) at /runtime/src/tests/nativeaot/SmokeTests/UnitTests/Main.cs:23
#26 0x0000002ac8a80e40 in UnitTests_Program___Main__ (args=...) at /runtime/src/tests/nativeaot/SmokeTests/UnitTests/Main.cs:7
#27 0x0000002ac8b8a30c in __managed__Main () at /runtime/src/libraries/System.Private.CoreLib/src/System/ValueTuple.cs:793
#28 0x0000002ac8669684 in main (argc=1, argv=0x3ffa116178) at /runtime/src/coreclr/nativeaot/Bootstrap/main.cpp:225

When I run it with strace -f, it completes. Also running it under gdb, it completes. So it seems like a race condition with locks. 👀

@filipnavara
Copy link
Member

filipnavara commented Feb 19, 2025

This one is sometimes stuck and from second terminal, strace -p shows it's waiting for futex sync:

That was happening to me before I made the RhpCheckedXchg fix. One of the threads running TestConcurrentIsBackgroundProperty/TestIsBackgroundProperty (iirc) was stuck on ManualResetEventSlim.Set, and the main thread was stuck on joining the thread that was blocked by the ManualResetEventSlim.

(Interrupting the syscall likely forced a loop in ManualResetEventSlim that did a recheck on the condition and unblocked the process.)

@am11
Copy link
Member Author

am11 commented Feb 19, 2025

Interesting, I have RhpCheckedXchg change in the branch. Did a fresh build, just in case, it persists.

Overall status of smoke tests is looking much better now, exceptions are:

  1. DynamicGenerics without DOTNET_gcConservative
  2. UnitTests -> BasicThreading category fails on my device if run standalone, passes under debugger/strace, passes on @filipnavara's device, passes on qemu (running on arm64).
  3. DwarfDump fails due to different count (not a real failure, just need if TARGET_RISCV64; unimportant for now)

I think we can send a PR after resolving 1. I haven't done any testing with Release build yet, but we can cover it after this phase is over (during the AOT libs tests 🙂).

@filipnavara
Copy link
Member

filipnavara commented Feb 19, 2025

I think we can send a PR after resolving 1.

I've started looking into it but haven't made much progress. From time to time I can make it crash inside the method table check (first pointer in object) with null value. One time it happened I dumped the pointer address and traced it back on the stack:

#0  0x0000002aab1b16ac in S_P_CoreLib_Internal_Runtime_MethodTable__get_HasComponentSize (this=...) at /home/ubuntu/runtime/src/coreclr/nativeaot/Common/src/Internal/Runtime/MethodTable.cs:183
#1  0x0000002aab1b17e8 in S_P_CoreLib_Internal_Runtime_MethodTable__get_ExtendedFlags (this=...) at /home/ubuntu/runtime/src/coreclr/nativeaot/Common/src/Internal/Runtime/MethodTable.cs:252
#2  0x0000002aab1b2b98 in S_P_CoreLib_Internal_Runtime_MethodTable__get_IsIDynamicInterfaceCastable (this=...)
    at /home/ubuntu/runtime/src/coreclr/nativeaot/Common/src/Internal/Runtime/MethodTable.cs:706
#3  0x0000002aab123280 in S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhResolveDispatchWorker (pObject=..., cell=0x2aab6a9760, cellInfo=...)
    at /home/ubuntu/runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:137
#4  0x0000002aab122d3c in S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve_Worker (pObject=..., pCell=183264515936)
    at /home/ubuntu/runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:28
#5  0x0000002aab122c8c in S_P_CoreLib_System_Runtime_CachedInterfaceDispatch__RhpCidResolve (callerTransitionBlockParam=273254704432, pCell=183264515936)
    at /home/ubuntu/runtime/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CachedInterfaceDispatch.cs:19
#6  0x0000002aaaed724a in RhpUniversalTransition_DebugStepTailCall () at /home/ubuntu/runtime/src/coreclr/nativeaot/Runtime/riscv64/UniversalTransition.S:182
#7  0x0000002aab30d84c in __VerifyUnifierConsistency (this=...) at /home/ubuntu/runtime/src/coreclr/nativeaot/Common/src/System/Collections/Concurrent/ConcurrentUnifierWKeyed.cs:341
#8  0x0000002aab30c194 in __GetOrAdd (this=..., key=...) at /home/ubuntu/runtime/src/coreclr/nativeaot/Common/src/System/Collections/Concurrent/ConcurrentUnifierWKeyed.cs:113
...

It was saved in the frame 7 on the stack. The value on the stack at -48(fp) and matched the value that was propagated down. The GC info for __VerifyUnifierConsistency marks that location as untracked but valid for the duration of the whole frame:

Stack slot id for offset -48 (-0x30) (frame) (untracked) = 1.

This should have been reported during the GC, yet somehow the object was collected and overwritten with zeroes. (On second thought, I didn't try tracing back in the JIT dump if there was a moment where it was not reported correctly before it was written to the stack location.)

@filipnavara
Copy link
Member

filipnavara commented Feb 19, 2025

I must be missing something in the JIT dump: https://gist.github.com/filipnavara/bce9efde00397de89c7f69c2f6166234

The garbage value I observed was loaded from stack to register at this instruction:

IN00b4: 000340      ld             a0, -48(fp)

The only time that stack location is referenced is in

IN009c: 0002D8      addi           a1, fp, -48

Where is it written? What am I missing?

Ah, nevermind, I see it now - System.WeakReference'1[System.__Canon]:TryGetTarget(byref):ubyte:this.

@filipnavara
Copy link
Member

filipnavara commented Feb 20, 2025

I think the instruction masks in IsInProlog may be incorrect. Coming from ARM I always get confused whether they are little/big endian with respect to the manual but regardless of the order they look weird to me.

May be worth checking. We cannot unwind prologs properly, or rather unwind to wrong functions or apply incorrect GC info, so if this was indeed incorrect it could explain some of the crashes in the heavily multi-threaded/GC tests.

Here's what I think the constants should be:

// store pair with signed offset
#define STW_PAIR_BITS 0x00003023
#define STW_PAIR_MASK 0x0000707F

// add fp, sp, x
// addi fp, sp, x
#define ADD_FP_SP_BITS 0x00010413
#define ADD_FP_SP_MASK 0x000FFFFF

#define STW_PAIR_RS1_MASK 0xF8000
#define STW_PAIR_RS1_SP  0x10000
#define STW_PAIR_RS1_FP  0x40000
#define STW_PAIR_RS2_MASK 0x1F00000
#define STW_PAIR_RS2_FP  0x800000
#define STW_PAIR_RS2_RA  0x100000

(I didn't actually check it yet. Also, it's missing the "addi sp, sp, imm" pattern.)

@filipnavara
Copy link
Member

filipnavara commented Feb 20, 2025

I ended up with this patch for IsInProlog:

diff --git a/src/coreclr/nativeaot/Runtime/unix/UnixNativeCodeManager.cpp b/src/coreclr/nativeaot/Runtime/unix/UnixNativeCodeManager.cpp
index a1e2e507df8..80b15694ece 100644
--- a/src/coreclr/nativeaot/Runtime/unix/UnixNativeCodeManager.cpp
+++ b/src/coreclr/nativeaot/Runtime/unix/UnixNativeCodeManager.cpp
@@ -699,22 +699,23 @@ int UnixNativeCodeManager::IsInProlog(MethodInfo * pMethodInfo, PTR_VOID pvAddre
 #elif defined(TARGET_RISCV64)
 
 // store pair with signed offset
-// 0100 00xx xxxxxxxx xxxx xxxx xxxx xxxx
-#define STW_PAIR_BITS 0x04000000
-#define STW_PAIR_MASK 0xFC000000
+#define STW_PAIR_BITS 0x00003023
+#define STW_PAIR_MASK 0x0000707F
 
-// add fp, sp, x
-// addi fp, sp, x
-// 0000 0001 100x xxxx xxxx xxxx 0000 0000
-#define ADD_FP_SP_BITS 0x01C00000
-#define ADD_FP_SP_MASK 0xFFFFE000
-
-#define STW_PAIR_RS1_MASK 0xF80
-#define STW_PAIR_RS1_SP  0xF80
-#define STW_PAIR_RS1_FP  0xF00
-#define STW_PAIR_RS2_MASK 0xF00
-#define STW_PAIR_RS2_FP  0xF00
-#define STW_PAIR_RS2_RA  0xF40
+// add[i] fp, sp, x
+#define ADD_FP_SP_BITS 0x00010413
+#define ADD_FP_SP_MASK 0x000FFFFF
+
+// add[i] sp, sp, x
+#define ADD_SP_SP_BITS 0x00010113
+#define ADD_SP_SP_MASK 0x000FFFFF
+
+#define STW_PAIR_RS1_MASK 0xF8000
+#define STW_PAIR_RS1_SP  0x10000
+#define STW_PAIR_RS1_FP  0x40000
+#define STW_PAIR_RS2_MASK 0x1F00000
+#define STW_PAIR_RS2_FP  0x800000
+#define STW_PAIR_RS2_RA  0x100000
 
     UnixNativeMethodInfo * pNativeMethodInfo = (UnixNativeMethodInfo *)pMethodInfo;
     ASSERT(pNativeMethodInfo != NULL);
@@ -740,7 +741,7 @@ int UnixNativeCodeManager::IsInProlog(MethodInfo * pMethodInfo, PTR_VOID pvAddre
         {
             establishedFp = true;
         }
-        else
+        else if ((instr & ADD_SP_SP_MASK) != ADD_SP_SP_BITS)
         {
             // JIT generates other patterns into the prolog that we currently don't
             // recognize (saving unpaired register, stack pointer adjustments). We

It doesn't fix the DynamicGenerics prolog by itself but it should still be reviewed for correctness.

UPD: TrailingEpilogueInstructionsCount could be wrong too, but at least it seems to get the masks right.

@am11
Copy link
Member Author

am11 commented Feb 20, 2025

Should we apply it to the branch?

@filipnavara
Copy link
Member

filipnavara commented Feb 20, 2025

Should we apply it to the branch?

I pushed it, but it certainly deserves extra scrutiny.

@am11
Copy link
Member Author

am11 commented Feb 20, 2025

So far testing has not reveled any regression/difference.

Two other types of failures in DynamicGenerics:

Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
DynamicGenerics: /runtime/src/coreclr/gc/gc.cpp:27196: bool WKS::is_in_heap_range(uint8_t *): Assertion `((g_gc_lowest_address <= o) && (o < g_gc_highest_address)) || (o == nullptr) || (ro_segment_lookup (o) != nullptr)' failed.
Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
Process terminated. Assertion failed.
hashCode == _entries[walk1]._hashCode
   at System.Diagnostics.DebugProvider.Fail(String, String) + 0x64
   at System.Diagnostics.Debug.Fail(String, String) + 0x70
   at System.Diagnostics.Debug.Assert(Boolean, String, String) + 0x58
   at System.Diagnostics.Debug.Assert(Boolean, String) + 0x3c
   at System.Collections.Concurrent.ConcurrentUnifier`2.Container.VerifyUnifierConsistency() + 0x390
   at System.Collections.Concurrent.ConcurrentUnifier`2.GetOrAdd(K) + 0x148
   at System.Reflection.Runtime.TypeInfos.RuntimeTypeInfo.TypeComponentsCache.GetQueriedMembers[M](MemberPolicies`1, String, Boolean) + 0x218
   at System.Reflection.Runtime.TypeInfos.RuntimeTypeInfo.Query[M](MemberPolicies`1, String, BindingFlags, Func`2) + 0x134
   at System.Reflection.Runtime.TypeInfos.RuntimeTypeInfo.Query[M](MemberPolicies`1, String, BindingFlags) + 0xb0
   at System.Reflection.Runtime.TypeInfos.RuntimeTypeInfo.GetMethodImpl(String, Int32, BindingFlags, Binder, CallingConventions, Type[], ParameterModifier[]) + 0x174
   at System.RuntimeType.GetMethodImpl(String, BindingFlags, Binder, CallingConventions, Type[], ParameterModifier[]) + 0x78
   at System.Type.GetMethod(String, BindingFlags) + 0x70
   at System.Reflection.TypeInfo.GetDeclaredMethod(String) + 0x34
   at ThreadLocalStatics.TLSTesting.MakeType1(Type, Boolean) + 0x608
   at ThreadLocalStatics.TLSTesting.<>c__DisplayClass3_0.<MultiThreaded_Test>b__0() + 0x48
   at System.Threading.Tasks.Task.InnerInvoke() + 0x94
   at System.Threading.Tasks.Task.<>c.<.cctor>b__292_0(Object obj) + 0x78
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread, ExecutionContext, ContextCallback, Object) + 0x114
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task&, Thread) + 0x214
   at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread) + 0xcc
   at System.Threading.Tasks.Task.ExecuteFromThreadPool(Thread) + 0x2c
   at System.Threading.ThreadPoolWorkQueue.DispatchWorkItem(Object, Thread) + 0x88
   at System.Threading.ThreadPoolWorkQueue.Dispatch() + 0x888
   at System.Threading.PortableThreadPool.WorkerThread.WorkerDoWork(PortableThreadPool, Boolean&) + 0x6c
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() + 0x31c
   at DynamicGenerics!<BaseAddress>+0x995ee4
   at System.Threading.Thread.StartHelper.RunWorker() + 0xac
   at System.Threading.Thread.StartHelper.Run() + 0xf8
   at System.Threading.Thread.StartThread(IntPtr) + 0x18c
   at System.Threading.Thread.ThreadEntryPoint(IntPtr) + 0x38
Aborted

Aside fromThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test, others are passing for me in DynamicGenerics suite.

@filipnavara
Copy link
Member

I found some bugs in the GC hijacking assembly code. Trying to fix that and I'll report back.

@filipnavara
Copy link
Member

filipnavara commented Feb 20, 2025

Here's my current diff. It doesn't fix it but it changes the failure mode quite a lot, so it may be on the right track.

Firstly, PTFF_THREAD_HIJACK_HI had incorrect value and it was not applied correctly (shift to high bits). Secondly, #110799 changed how the flags in RhpTrapThreads are interpreted and that was not reflected.

diff --git a/src/coreclr/nativeaot/Runtime/riscv64/GcProbe.S b/src/coreclr/nativeaot/Runtime/riscv64/GcProbe.S
index 019199d548b..fc25713938e 100644
--- a/src/coreclr/nativeaot/Runtime/riscv64/GcProbe.S
+++ b/src/coreclr/nativeaot/Runtime/riscv64/GcProbe.S
@@ -44,7 +44,7 @@
 
     # Perform the rest of the PInvokeTransitionFrame initialization.
     sd  \threadReg, OFFSETOF__PInvokeTransitionFrame__m_pThread(sp)        # Thread * (unused by stackwalker)
-    sd  \BITMASK, (OFFSETOF__PInvokeTransitionFrame__m_pThread + 8)(sp)    # Save the register bitmask passed in by caller
+    sd  \BITMASK, OFFSETOF__PInvokeTransitionFrame__m_Flags(sp)            # Save the register bitmask passed in by caller
 
     addi  \trashReg, sp, PROBE_FRAME_SIZE                                  # Recover value of caller's SP
     sd  \trashReg, 0x78(sp)                                                # Save caller's SP
@@ -100,14 +100,13 @@
 NESTED_ENTRY RhpGcProbeHijack, _TEXT, NoHandler
     FixupHijackedCallstack
 
-    PREPARE_EXTERNAL_VAR_INDIRECT_W RhpTrapThreads, a3
-    andi  t3, a3, 1 << TrapThreadsFlags_TrapThreads_Bit
+    PREPARE_EXTERNAL_VAR_INDIRECT_W RhpTrapThreads, t3
+    andi  t3, t3, 1 << TrapThreadsFlags_TrapThreads_Bit
     bnez  t3, LOCAL_LABEL(WaitForGC)
     jr  ra
 
 LOCAL_LABEL(WaitForGC):
-    li  t6, (DEFAULT_FRAME_SAVE_FLAGS + PTFF_SAVE_A0 + PTFF_SAVE_A1 + PTFF_THREAD_HIJACK_HI)
-    or  t3, t3, t6
+    li  t3, (DEFAULT_FRAME_SAVE_FLAGS + PTFF_SAVE_A0 + PTFF_SAVE_A1 + (PTFF_THREAD_HIJACK_HI << 32))
     tail  C_FUNC(RhpWaitForGC)
 NESTED_END RhpGcProbeHijack
 
diff --git a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosloongarch64.inc b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosloongarch64.inc
index b78210c8f85..535d7ca303c 100644
--- a/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosloongarch64.inc
+++ b/src/coreclr/nativeaot/Runtime/unix/unixasmmacrosloongarch64.inc
@@ -198,7 +198,7 @@ C_FUNC(\Name):
 #define PTFF_SAVE_R4            0x00000800
 #define PTFF_SAVE_R5            0x00001000
 #define PTFF_SAVE_ALL_PRESERVED 0x000001FF // NOTE: r23-r31
-#define PTFF_THREAD_HIJACK_HI   0x00000002 // upper 32 bits of the PTFF_THREAD_HIJACK
+#define PTFF_THREAD_HIJACK_HI   0x00000001 // upper 32 bits of the PTFF_THREAD_HIJACK
 
 #define DEFAULT_FRAME_SAVE_FLAGS (PTFF_SAVE_ALL_PRESERVED + PTFF_SAVE_SP)
 

@filipnavara
Copy link
Member

filipnavara commented Feb 20, 2025

I pushed the above change with few minor tweaks. There's still something way off though...

I am pretty sure that I got trashed registers again because FixupHijackedCallstack calls INLINE_GETTHREAD which calls INLINE_GET_TLS_VAR and doesn't save the a0 register containing the return value of the hijacked method. I tried to save it to t3 but that apparently was not enough. We really should rationalize this because this is fragile and it will bite us again and again.

To get further, here's a debugging trick:

--- a/src/coreclr/nativeaot/Runtime/riscv64/GcProbe.S
+++ b/src/coreclr/nativeaot/Runtime/riscv64/GcProbe.S
@@ -101,6 +101,7 @@
 //
 NESTED_ENTRY RhpGcProbeHijack, _TEXT, NoHandler
     FixupHijackedCallstack
+    ret
 
     PREPARE_EXTERNAL_VAR_INDIRECT_W RhpTrapThreads, t3
     andi  t3, t3, 1 << TrapThreadsFlags_TrapThreads_Bit

We can skip the GC check in the hijack. It currently still fails which means that the hijacking trashes the state. This could either be the hijacking machinery, or FixupHijackedCallstack not behaving correctly.

@filipnavara
Copy link
Member

Aaargh, t3 gets trashed by INLINE_GETTHREAD too. Using t1 works.

@filipnavara
Copy link
Member

I think DynamicGenerics is fixed now. Please test.

@am11
Copy link
Member Author

am11 commented Feb 20, 2025

Congrats! 🎉 I tested it, it passes on device and qemu. Also, UnitTests suite is also passing now on my device consistently. Opened #112736.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-riscv Related to the RISC-V architecture area-NativeAOT-coreclr
Projects
Status: No status
Development

No branches or pull requests

6 participants