nvptx-none-run: Reduce CU_LIMIT_STACK_SIZE from 256 KiB to 128 KiB.

... to work around <#8>. According to Table 12, "Technical Specifications per Compute Capability", at <http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications>, there is 512 KiB of local memory per thread, so a 256 KiB stack seemed workable. It does work on a system with Nvidia Quadro K1000M hardware, driver version 340.24, and a CUDA 5.5 installation, but not on a system with Nvidia Tesla K20c hardware, driver version 319.37, and a CUDA 5.5 installation. On the Nvidia Quadro K1000M system, there are no changes in GCC testsuite results.
SourceryTools · Feb 13, 2015 · b4ca018
1 parent 2104452
Showing 1 changed file with 3 additions and 2 deletions.
diff --git a/nvptx-run.c b/nvptx-run.c
@@ -224,14 +224,15 @@ This program has absolutely no warranty.\n",
   }
 
 #if 0
-  /* Default seems to be 8k stack, 8M heap.  */
+  /* Default seems to be 1 KiB stack, 8 MiB heap.  */
   size_t stack, heap;
   cuCtxGetLimit (&stack, CU_LIMIT_STACK_SIZE);
   cuCtxGetLimit (&heap, CU_LIMIT_MALLOC_HEAP_SIZE);
   printf ("stack %ld heap %ld\n", stack, heap);
 #endif
 
-  r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 256 * 1024);
+  /* <https://github.com/MentorEmbedded/nvptx-tools/issues/8>.  */
+  r = cuCtxSetLimit(CU_LIMIT_STACK_SIZE, 128 * 1024);
   fatal_unless_success (r, "could not set stack limit");
   r = cuCtxSetLimit(CU_LIMIT_MALLOC_HEAP_SIZE, 256 * 1024 * 1024);
   fatal_unless_success (r, "could not set heap limit");