Segmentation fault on both CPU and GPU builds #57

Closed
TaufeqRazakh opened this issue Oct 25, 2023 · 2 comments

Comments

@TaufeqRazakh
Member

I tried a CPU build with the following recipe:

cmake -DCMAKE_CXX_COMPILER=icpx -DCMAKE_Fortran_COMPILER=mpif90 ..

The DAT (input) file was built once with the following commands:

cd init
make all
./geninit -i nh3.xyz -n 
./geninit -i norm.xyz 
cp rxff.bin ../DAT

I run into the following error when running the executable from the project directory, where DAT/ is located:

~/RXMD> ./rxmd_exe 
INFO: myid,hostname          0 x1921c0s
              rxmd has started
================================================================================
element,mass,filename:   1-H      1.000   pot/deployed.pth
element,mass,filename:   2-N     14.000   pot/deployed.pth
================================================================================
 get_mdcontext_func : mdcontext_rxmdnn
================================================================================
element,mass,filename:   1-H      1.000   pot/deployed.pth
element,mass,filename:   2-N     14.000   pot/deployed.pth
================================================================================
INFO: Opening file in ReadBIN(): DAT/rxff.bin
------------------------------------------------------------
         req/alloc # of procs:        1  /        1
         req proc arrangement:        1        1        1
                time step[fs]:    2.00E-01
 MDMODE CURRENTSTEP NTIMESTEP:  1         0     20000
treq,vsfact,sstep,vmag_factor:      10.000   0.990       10   5.000
    fstep,pstep,xyz_num_stack:   100    10     1
               NATOMS GNATOMS:                     432                     432
                         LBOX:       1.000       1.000       1.000
                  Hmatrix [A]:         16.515          0.000          0.000
                  Hmatrix [A]:          0.000         16.515          0.000
                  Hmatrix [A]:          0.000          0.000         16.515
               lata,latb,latc:      16.515      16.515      16.515
          lalpha,lbeta,lgamma:      90.000      90.000      90.000
          NBUFFER, MAXNEIGHBS:      100000         100
------------------------------------------------------------
                     DataDir :         DAT
             FFPath, ParmPath:      ffield      rxmd.in
------------------------------------------------------------
------------------------------------------------------------
               density [g/cc]:    0.6768
         # of linkedlist cell:******************
            maxrc, lcsize [A]:     0.000       -0.00     -0.00     -0.00
          # of atoms per type:         324 - 1         108 - 2 
------------------------------------------------------------
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
libze_intel_gpu.s  000014908752C4D5  Unknown               Unknown  Unknown
libpthread-2.31.s  00001490909BC8C0  Unknown               Unknown  Unknown
rxmd_exe           0000000000417273  Unknown               Unknown  Unknown
rxmd_exe           00000000004691E5  Unknown               Unknown  Unknown
rxmd_exe           00000000004695E6  Unknown               Unknown  Unknown
rxmd_exe           000000000040CE5C  Unknown               Unknown  Unknown
rxmd_exe           000000000040CC6D  Unknown               Unknown  Unknown
libc-2.31.so       00001490907E624D  __libc_start_main     Unknown  Unknown
rxmd_exe           000000000040CB9A  Unknown               Unknown  Unknown

Any help would be greatly appreciated.

@ye-luo
Collaborator

ye-luo commented Oct 25, 2023

Compile with -g and run with a debugger.

@TaufeqRazakh
Member Author

After setting a breakpoint at utils::update_box_params, it looks like the issue comes from a bad division:

(gdb) list - 
271	LBOX(1)=lata/vprocs(1)
272	LBOX(2)=latb/vprocs(2)
273	LBOX(3)=latc/vprocs(3)
274	
275	!--- get the number of linkedlist cell per domain
276	cc(1:3)=int(LBOX(1:3)/maxrc)
277	
278	!--- local system size in the unscaled coordinate.
279	LBOX(1:3) = 1.d0/vprocs(1:3)
280	
(gdb) p vprocs 
$1 = (1, 1, 1)
(gdb) p lata
$2 = 16.51501
(gdb) p lbox 
$3 = (7.1253040736177892e-317, 7.1258139493642974e-317, 2.3815782291124779e-317)
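
One observation that is consistent with the log above: maxrc is printed as 0.000 and the linked-list cell count overflows its output field (the row of asterisks), so the division at line 276, int(LBOX(1:3)/maxrc), would be a division by zero. Below is a minimal standalone sketch of a guard, assuming that this is the failing division. The program name check_maxrc and the error message are hypothetical and are not part of RXMD; the values are copied from the log and the gdb session.

```fortran
program check_maxrc
  ! Standalone illustration only; not taken from the RXMD sources.
  implicit none
  real(8) :: lbox(3), maxrc
  integer :: cc(3)

  ! lata/vprocs(1) etc. with vprocs = (1,1,1) and the box length from the log
  lbox(1:3) = 16.515d0
  ! Value reported in the log line "maxrc, lcsize [A]"
  maxrc = 0.d0

  ! Hypothetical guard: refuse to divide by a non-positive cutoff
  if (maxrc <= 0.d0) then
     print '(a,f8.3)', 'ERROR: maxrc must be positive, got ', maxrc
     stop 1
  end if

  ! Same expression as line 276 in the gdb listing
  cc(1:3) = int(lbox(1:3)/maxrc)
  print '(a,3i6)', 'linked-list cells per domain: ', cc
end program check_maxrc
```

With the values from the log, this stops at the guard instead of reaching the division. It only makes the failure explicit; why maxrc is zero in this run would still need to be tracked down.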
