Memory Usage in Kernel
        Kilik Kuo edited this page Mar 7, 2017 
    This page shares findings or problems we've encountered.
NOTE: On Windows, an out-of-resources error can occur easily on an NVIDIA device because TDR (Timeout Detection and Recovery, controlled by the TdrLevel registry value) is enabled. Turn it off to fix the error.
- On a CPU device, local memory is regular RAM, the same as global memory.
- On a GPU device, local memory is a very fast, on-chip, software-controllable cache.
To find out a device's local memory size:
import pyopencl as cl
from pyopencl import device_info as di
# dev is the target cl.Device instance.
local_memory_size = dev.get_info(di.LOCAL_MEM_SIZE)
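As a rough sketch of how the queried value could be used, the helper below converts a local memory size in bytes into the largest element count that could fit. The function name `max_local_elements` is hypothetical, and the sizes in the loop are copied from the device listings on this page rather than queried live. Treat the result as an upper bound only, since a kernel may need local memory for other purposes.

```python
def max_local_elements(local_mem_bytes, itemsize=4):
    """Return how many elements of `itemsize` bytes fit in `local_mem_bytes`.

    `itemsize` defaults to 4, the size of an OpenCL `int`.
    """
    if local_mem_bytes < 0 or itemsize <= 0:
        raise ValueError("sizes must be non-negative and itemsize positive")
    return local_mem_bytes // itemsize


# Local memory sizes as reported in the device listings on this page.
reported_sizes = {
    "Intel CPU": 32 * 1024,
    "GeForce GTX 950M": 48 * 1024,
    "Intel HD Graphics 530": 64 * 1024,
}

for name, size in reported_sizes.items():
    print(name, "can hold at most", max_local_elements(size), "ints")
```

With a live device, `local_memory_size` from the snippet above would be passed in place of the hard-coded values.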
Memory caching implementation on Intel architecture
There are two ways to allocate local memory:
- a) In-kernel allocation, e.g.
    #define LM_SIZE 1024
    __kernel void test_1(...) { __local int localArray[LM_SIZE]; }
    __kernel void test_2(...) { __local int localArray[1024]; }
- b) Host-side allocation, e.g. allocating 32 KB of local memory.
  - Python
      prog.test_input_local(queue, global_work_items, local_work_items, cl.LocalMemory(32*1024)).wait()
  - Kernel
      __kernel void test_input_local(__local int* localArray) {}
  NOTE: With host-side allocation, the local memory usage cannot be calculated at compile time, so an out-of-resources error may occur at run time.
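Because a host-side request is only validated when the kernel is launched, it may help to compare the requested size with the device limit beforehand. The following is a minimal sketch under stated assumptions: `check_local_request` is a hypothetical helper, not part of pyopencl, and it works on plain integers so it can run without a device. In a real program the limit would come from `dev.get_info(di.LOCAL_MEM_SIZE)`, and the kernel's own usage could be read with `kernel.get_work_group_info(cl.kernel_work_group_info.LOCAL_MEM_SIZE, dev)`.

```python
def check_local_request(requested_bytes, device_local_bytes, kernel_static_bytes=0):
    """Check that a host-side local memory request fits on the device.

    requested_bytes     -- size passed to cl.LocalMemory(...)
    device_local_bytes  -- device limit (LOCAL_MEM_SIZE)
    kernel_static_bytes -- local memory already declared inside the kernel

    Returns the number of bytes left over, or raises ValueError.
    """
    total = requested_bytes + kernel_static_bytes
    if total > device_local_bytes:
        raise ValueError(
            "local memory request of %d bytes exceeds device limit of %d bytes"
            % (total, device_local_bytes)
        )
    return device_local_bytes - total


# 32 KB requested on a device that reports 48 KB of local memory.
print(check_local_request(32 * 1024, 48 * 1024))

# 32 KB requested on a 32 KB device whose kernel already uses 4 KB.
try:
    check_local_request(32 * 1024, 32 * 1024, kernel_static_bytes=4096)
except ValueError as err:
    print("rejected:", err)
```

Passing this check does not guarantee that the launch succeeds, since the runtime may reserve local memory of its own. It only catches requests that are certain to be too large.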
 
- 
TBD
- Used when 1) register spilling happens, or 2) a private array is used.
TBD
TBD
   Device name                             : Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
   Device type                             :                                      CPU
   Device version                          :                    OpenCL 2.1 (Build 18)
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                      64B
   Global memory cache size                :                                 256.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   7.71GB
   Max preferred size of global variables  :                                  64.00KB
   Local memory size                       :                                  32.00KB
   Local memory type                       :                                   GLOBAL
   Max constant arguments count            :                                      480
   Max size of a constant buffer           :                                 128.00KB
   Max global variable size                :                                  64.00KB
   Max size of memory object allocation    :                                   1.93GB
   Max parameter size                      :                                   3.75KB
   Max pipe objects                        :                                       16
   max work group size                     :                                     8192
   max work item dimensions                :                                        3
   max work item size                      :                       [8192, 8192, 8192]
   base address align                      :                                     1024
   Local memory size                       :                                  32.00KB
   The max size of the device queue        :                                   4.00GB
   The size of the device queue            :                                   4.00GB
   Compute Units                           :                                        4
   Device name                             :                         GeForce GTX 950M
   Device type                             :                                      GPU
   Device version                          :                          OpenCL 1.2 CUDA
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                     128B
   Global memory cache size                :                                  80.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   2.00GB
   Max preferred size of global variables  :                  Not available (version)
   Local memory size                       :                                  48.00KB
   Local memory type                       :                                    LOCAL
   Max constant arguments count            :                                        9
   Max size of a constant buffer           :                                  64.00KB
   Max global variable size                :                  Not available (version)
   Max size of memory object allocation    :                                 512.00MB
   Max parameter size                      :                                   4.25KB
   Max pipe objects                        :                  Not available (version)
   max work group size                     :                                     1024
   max work item dimensions                :                                        3
   max work item size                      :                         [1024, 1024, 64]
   base address align                      :                                     4096
   Local memory size                       :                                  48.00KB
   The max size of the device queue        :                  Not available (version)
   The size of the device queue            :                  Not available (version)
   Device command-queue properties         :                  Not available (version)
   Host command-queue properties           :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
   Device name                             :                 Intel(R) HD Graphics 530
   Device type                             :                                      GPU
   Device version                          :                              OpenCL 2.0
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                      64B
   Global memory cache size                :                                 512.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   3.15GB
   Max preferred size of global variables  :                                   2.00GB
   Local memory size                       :                                  64.00KB
   Local memory type                       :                                    LOCAL
   Max constant arguments count            :                                        8
   Max size of a constant buffer           :                                   2.00GB
   Max global variable size                :                                  64.00KB
   Max size of memory object allocation    :                                   2.00GB
   Max parameter size                      :                                   1.00KB
   Max pipe objects                        :                                       1
   max work group size                     :                                      256
   max work item dimensions                :                                        3
   max work item size                      :                          [256, 256, 256]
   base address align                      :                                     1024
   Local memory size                       :                                  64.00KB
   The max size of the device queue        :                                  64.00MB
   The size of the device queue            :                                 128.00KB
   Device command-queue properties         :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
   Host command-queue properties           :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
   Device name                             : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
   Device type                             :                                      CPU
   Device version                          :                   OpenCL 2.0 (Build 359)
   Device Profile                          :                             FULL_PROFILE
=====================================================================================
   Global memory cache line size           :                                      64B
   Global memory cache size                :                                 256.00KB
   Global memory cache type                :                         READ_WRITE_CACHE
   Global memory size                      :                                   7.89GB
   Max preferred size of global variables  :                                  64.00KB
   Local memory size                       :                                  32.00KB
   Local memory type                       :                                   GLOBAL
   Max constant arguments count            :                                      480
   Max size of a constant buffer           :                                 128.00KB
   Max global variable size                :                                  64.00KB
   Max size of memory object allocation    :                                   1.97GB
   Max parameter size                      :                                   3.75KB
   Max pipe objects                        :                                       16
   max work group size                     :                                     8192
   max work item dimensions                :                                        3
   max work item size                      :                       [8192, 8192, 8192]
   base address align                      :                                     1024
   Local memory size                       :                                  32.00KB
   The max size of the device queue        :                                   4.00GB
   The size of the device queue            :                                   4.00GB
   Device command-queue properties         :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
   Host command-queue properties           :            OUT_OF_ORDER_EXEC_MODE_ENABLE
                                           :                         PROFILING_ENABLE
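The script that produced the listings above is not shown on this page. As a minimal sketch, assuming a layout of a three-space indent and two 40-character columns (inferred from the listings, not confirmed), rows in a similar style could be formatted like this. `format_size` and `format_row` are hypothetical names.

```python
def format_size(num_bytes):
    """Format a byte count in the style used above, e.g. 64B, 32.00KB, 2.00GB."""
    if num_bytes < 1024:
        return "%dB" % num_bytes
    value = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        value /= 1024
        # Stop at the first unit that keeps the value below 1024,
        # or at GB, the largest unit handled here.
        if value < 1024 or unit == "GB":
            return "%.2f%s" % (value, unit)


def format_row(label, value, width=40):
    """Left-justify the label and right-justify the value in fixed-width columns."""
    return "   " + label.ljust(width) + ": " + value.rjust(width)


print(format_row("Global memory cache line size", format_size(64)))
print(format_row("Local memory size", format_size(32 * 1024)))
print(format_row("Max size of memory object allocation", format_size(512 * 1024 ** 2)))
```

With pyopencl, the byte counts would come from calls such as `dev.get_info(di.LOCAL_MEM_SIZE)` instead of the literal values used here.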