Assert/assert_in_simultaneously_multiple_tus.cpp randomly failing on CUDA #8800

Open
@cperkinsintel

This test has been failing intermittently on various pull requests to intel/llvm. The failure appears to be flaky: simply re-running the tests usually clears it.
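To estimate how often the failure occurs, the test binary can be re-run in a loop while tallying which runs print the CUDA error. Below is a minimal sketch, not part of the test suite: the helper name `rerun`, its parameters, and the default marker string `CUDA_ERROR_ASSERT` (taken from the failure log) are assumptions made for illustration. It matches on output rather than exit status because the test is expected to abort on the assertion either way.

```python
import subprocess
from collections import Counter


def rerun(cmd, runs=20, marker="CUDA_ERROR_ASSERT", timeout=120):
    """Run `cmd` `runs` times and tally runs whose output contains `marker`.

    `cmd` is an argument list, e.g. the path to the compiled test binary.
    Returns a Counter with the keys "marker", "clean" and "timeout".
    """
    tally = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # merge stderr into stdout, like `&>` in the RUN lines
                text=True,
                errors="replace",  # tolerate non-UTF-8 bytes in the output
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            tally["timeout"] += 1
            continue
        # Classify by output content, not by exit code.
        tally["marker" if marker in proc.stdout else "clean"] += 1
    return tally
```

The child process inherits the caller's environment, so variables such as `SYCL_DEVICE_FILTER=ext_oneapi_cuda:gpu,host` (as used in the RUN lines of the log) would need to be set before calling `rerun` on the test binary.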

I'm planning on disabling the test, but it should be investigated.

Not sure if this link will work - it should go to an example failure: https://github.com/intel/llvm/runs/7246848829?check_suite_focus=true

Here are the relevant excerpts from the failure log:

sycl-ls
  [ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 3090 0.0 [CUDA 11.7]
  [ext_intel_esimd_emulator:gpu:0] Intel(R) ESIMD_EMULATOR/GPU, ESIMD_EMULATOR 7.3 [0.1.0]
  [host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

[0/1] Running the SYCL tests for check-sycl-cuda-gpu_host backend
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:231: note: Backend: ext_oneapi_cuda
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:316: note: Test HOST device
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:346: warning: CPU device not used
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:361: note: Test GPU device
-- Testing: 912 tests, 12 workers --
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:397: warning: Accelerator device not used
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:432: note: Found llvm-spirv
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:432: note: Found llvm-link
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:451: warning: Couldn't find pre-installed AOT device compiler ocloc
lit.py: /__w/llvm/llvm/llvm_test_suite/SYCL/lit.cfg.py:448: note: Found pre-installed AOT device compiler opencl-aot
FAIL: SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp (26 of 912)
******************** TEST 'SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp' FAILED ********************
Script:
--
: 'RUN: at line 3';    /__w/llvm/llvm/toolchain/bin/clang++      -DSYCL_FALLBACK_ASSERT=1 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -DDEFINE_NDEBUG_INFILE2 -I /__w/llvm/llvm/llvm_test_suite/SYCL/Assert/Inputs /__w/llvm/llvm/llvm_test_suite/SYCL/Assert/assert_in_simultaneously_multiple_tus.cpp /__w/llvm/llvm/llvm_test_suite/SYCL/Assert/Inputs/kernels_in_file2.cpp -o /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.out -lpthread
: 'RUN: at line 4';   true /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.out &> /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt || true
: 'RUN: at line 5';   true FileCheck /__w/llvm/llvm/llvm_test_suite/SYCL/Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp --input-file /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt
: 'RUN: at line 12';   env SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY=1  env SYCL_DEVICE_FILTER=ext_oneapi_cuda:gpu,host SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1  /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.out &> /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt || true
: 'RUN: at line 13';    env SYCL_DEVICE_FILTER=ext_oneapi_cuda:gpu,host SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT=1  FileCheck /__w/llvm/llvm/llvm_test_suite/SYCL/Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp --input-file /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt
: 'RUN: at line 15';   true /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.out &> /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt
: 'RUN: at line 16';   true FileCheck /__w/llvm/llvm/llvm_test_suite/SYCL/Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp --check-prefix=CHECK-ACC --input-file /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt
--
Exit Code: 1
Command Output (stdout):
--
$ ":" "RUN: at line 3"
note: command had no output on stdout or stderr
$ "/__w/llvm/llvm/toolchain/bin/clang++" "-DSYCL_FALLBACK_ASSERT=1" "-fsycl" "-fsycl-targets=nvptx64-nvidia-cuda" "-DDEFINE_NDEBUG_INFILE2" "-I" "/__w/llvm/llvm/llvm_test_suite/SYCL/Assert/Inputs" "/__w/llvm/llvm/llvm_test_suite/SYCL/Assert/assert_in_simultaneously_multiple_tus.cpp" "/__w/llvm/llvm/llvm_test_suite/SYCL/Assert/Inputs/kernels_in_file2.cpp" "-o" "/__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.out" "-lpthread"
# command stderr:
warning: linking module '/__w/llvm/llvm/toolchain/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/__w/llvm/llvm/toolchain/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas '/__w/llvm/llvm/llvm_test_suite/SYCL/Assert/assert_in_simultaneously_multiple_tus.cpp' is 'nvptx64-nvidia-cuda'
^
/__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt:17:62: note: possible intended match here
terminate called after throwing an instance of 'cl::sycl::runtime_error'
                                                             ^
Input file: /__w/llvm/llvm/build/SYCL/Assert/Output/assert_in_simultaneously_multiple_tus_one_ndebug.cpp.tmp.txt
Check file: /__w/llvm/llvm/llvm_test_suite/SYCL/Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
-dump-input=help explains the following input dump.
Input was:
<<<<<<
            1:  
check:18'0     X error: no match found
            2: PI CUDA ERROR: 
check:18'0     ~~~~~~~~~~~~~~~
            3:  Value: 710 
check:18'0     ~~~~~~~~~~~~
            4:  Name: CUDA_ERROR_ASSERT 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~~~~
            5:  Description: device-side assert triggered 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            6:  Function: wait 
check:18'0     ~~~~~~~~~~~~~~~~
            .
            .
            .
           12:  Name: CUDA_ERROR_ASSERT 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~~~~
           13:  Description: device-side assert triggered 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           14:  Function: operator() 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~
           15:  Source Location: /__w/llvm/llvm/src/sycl/plugins/cuda/pi_cuda.cpp:2537 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           16:  
check:18'0     ~
           17: terminate called after throwing an instance of 'cl::sycl::runtime_error' 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:18'1                                                                  ?            possible intended match
           18:  what(): Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error) 
check:18'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>
error: command failed with exit status: 1
--
********************


Labels: bug (Something isn't working), cuda (CUDA back-end), runtime (Runtime library related issue)
