Realm: assert at startup #1839

Open
Tracked by #1032
mariodirenzo opened this issue Feb 28, 2025 · 2 comments
@mariodirenzo

Because of changes on the master branch between a8906edfa and cc5db48d4, I can no longer start any Legion program on a system I use for the CI of HTR++.
In particular, I hit the following assertion:

/home/mdirenzo/legion/src/runtime/realm/hardware_topology.cc:789: Realm::HardwareTopology::HardwareTopology(const std::vector<Proc>&, const std::vector<MemoryInfo>&, size_t): Assertion `num_logical_cores_per_physical_core == proc.shares_fpu.size() + 1' failed.

This is the backtrace

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fffe2e4527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fffe2e288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fffe2e2881b in __assert_fail_base (fmt=0x7fffe2fd01e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555558b8b708 "num_logical_cores_per_physical_core == proc.shares_fpu.size() + 1", file=file@entry=0x555558b8b508 "/home/mdirenzo/legion/src/runtime/realm/hardware_topology.cc", line=line@entry=789,
    function=function@entry=0x555558b8b668 "Realm::HardwareTopology::HardwareTopology(const std::vector<Proc>&, const std::vector<MemoryInfo>&, size_t)") at ./assert/assert.c:96
#6  0x00007fffe2e3b517 in __assert_fail (assertion=0x555558b8b708 "num_logical_cores_per_physical_core == proc.shares_fpu.size() + 1", file=0x555558b8b508 "/home/mdirenzo/legion/src/runtime/realm/hardware_topology.cc", line=789, function=0x555558b8b668 "Realm::HardwareTopology::HardwareTopology(const std::vector<Proc>&, const std::vector<MemoryInfo>&, size_t)")
    at ./assert/assert.c:105
#7  0x0000555557ff20b4 in Realm::HardwareTopology::HardwareTopology (this=0x7fffffffb9c0, logical_cores=std::vector of length 32, capacity 32 = {...}, memories=std::vector of length 1, capacity 1 = {...}, host_memory=202388615168) at /home/mdirenzo/legion/src/runtime/realm/hardware_topology.cc:789
#8  0x0000555557ff1c47 in Realm::HardwareTopology::create_topology () at /home/mdirenzo/legion/src/runtime/realm/hardware_topology.cc:760
#9  0x0000555557f7bf75 in Realm::RuntimeImpl::create_configs (this=0x5555598960f0, argc=2, argv=0x7fffffffd9f8) at /home/mdirenzo/legion/src/runtime/realm/runtime_impl.cc:2910
#10 0x0000555557f70bd1 in Realm::Runtime::create_configs (this=0x7fffffffbb38, argc=2, argv=0x7fffffffd9f8) at /home/mdirenzo/legion/src/runtime/realm/runtime_impl.cc:777
#11 0x0000555556c35065 in Legion::Internal::Runtime::initialize (argc=0x7fffffffd62c, argv=0x7fffffffd620, parse=true, filter=false) at /home/mdirenzo/legion/src/runtime/legion/runtime.cc:34160
#12 0x0000555556c346e1 in Legion::Internal::Runtime::start (argc=2, argv=0x7fffffffd9f8, background=false, supply_default_mapper=true, filter=false) at /home/mdirenzo/legion/src/runtime/legion/runtime.cc:34017
#13 0x0000555556b013e6 in Legion::Runtime::start (argc=2, argv=0x7fffffffd9f8, background=false, default_mapper=true, filter=false) at /home/mdirenzo/legion/src/runtime/legion/legion.cc:7311
#14 0x00007fffe2e2a1ca in __libc_start_call_main (main=main@entry=0x555556a416f0 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffffffd9f8) at ../sysdeps/nptl/libc_start_call_main.h:58
#15 0x00007fffe2e2a28b in __libc_start_main_impl (main=0x555556a416f0 <main>, argc=2, argv=0x7fffffffd9f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd9e8) at ../csu/libc-start.c:360
#16 0x0000555556a420a5 in _start ()

and

(gdb) f 7
#7  0x0000555557ff20b4 in Realm::HardwareTopology::HardwareTopology (this=0x7fffffffb9c0, logical_cores=std::vector of length 32, capacity 32 = {...}, memories=std::vector of length 1, capacity 1 = {...}, host_memory=202388615168) at /home/mdirenzo/legion/src/runtime/realm/hardware_topology.cc:789
789	        assert(num_logical_cores_per_physical_core == proc.shares_fpu.size() + 1);
(gdb) p num_logical_cores_per_physical_core
$1 = 2
(gdb) p proc.shares_fpu
$2 = std::set with 0 elements
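
The two values printed above make the failure concrete: with 2 logical cores per physical core, the assertion expects each core to list exactly one FPU sibling, but the set is empty, so it compares 2 against 0 + 1. Below is a minimal sketch of that condition. The `Proc` struct here is a hypothetical stand-in that models only the `shares_fpu` field, and its `int` element type is an assumption; the real Realm definition may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for Realm's per-core record (assumption: only the
// field shown in the gdb session is modeled, with int core IDs).
struct Proc {
  int id;
  std::set<int> shares_fpu;  // assumed: IDs of the other logical cores sharing the FPU
};

// Same condition as the failing assertion: a core plus its FPU siblings
// should add up to the number of logical cores per physical core.
bool sibling_count_matches(const Proc &proc,
                           std::size_t num_logical_cores_per_physical_core) {
  return num_logical_cores_per_physical_core == proc.shares_fpu.size() + 1;
}
```

With the values from the session (`2` and an empty set), `sibling_count_matches` returns false. It would return true if the core listed one sibling, or if the per-core count were 1.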

Do you have any idea what is going on?

@elliottslaughter, can you please add this issue to #1032?

@elliottslaughter
Contributor

@eddy16112 have you touched the machine topology recently?

@eddy16112
Contributor

Yes, I recently refactored the topology. @mariodirenzo Could you please try removing the assertion and running with -ll:show_rsrv? I would like to see the topology of the machine.
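
As a cross-check that is independent of Realm, Linux exposes each logical core's hyperthread siblings in `/sys/devices/system/cpu/cpu<N>/topology/thread_siblings_list`, formatted as a CPU list such as `0,16` or `0-1`. The sketch below parses that format and reads the file for one core. The helper names `parse_cpu_list` and `read_thread_siblings` are made up for illustration, and whether Realm reads this same file is an assumption not confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <set>
#include <sstream>
#include <string>

// Parse a sysfs CPU list such as "0,16" or "0-3,8" into a set of core IDs.
std::set<int> parse_cpu_list(std::string text) {
  // Drop the trailing newline or spaces that sysfs files usually end with.
  while (!text.empty() && (text.back() == '\n' || text.back() == ' '))
    text.pop_back();
  std::set<int> cpus;
  std::istringstream in(text);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item.empty()) continue;
    std::size_t dash = item.find('-');
    int first = std::stoi(item.substr(0, dash));
    int last = (dash == std::string::npos) ? first
                                           : std::stoi(item.substr(dash + 1));
    for (int cpu = first; cpu <= last; ++cpu) cpus.insert(cpu);
  }
  return cpus;
}

// Read the sibling list for one logical core; returns an empty set if the
// file is missing (for example on non-Linux systems or in some containers).
std::set<int> read_thread_siblings(int cpu) {
  std::ifstream file("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/topology/thread_siblings_list");
  std::string line;
  if (!file || !std::getline(file, line)) return {};
  return parse_cpu_list(line);
}
```

On a machine with two hyperthreads per core, `read_thread_siblings(0)` would be expected to contain two entries, core 0 itself plus one sibling. Comparing that with what Realm reports could help narrow down where the sibling set ends up empty.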

eddy16112 self-assigned this Feb 28, 2025