Hardware runtime decision and utility #29
base: master
Conversation
Hardware selection is now a runtime decision, so that precompiled binaries can run on a system without a GPU instead of crashing. In addition, the target can be selected at runtime to define custom hardware selection logic. Several utility features have been added, including device runtime functions (getting the number of devices / setting the active device), basic shared memory support, and 3D execution grids. All of these changes have been made with careful consideration to avoid compatibility issues with pre-existing code.
NOTE: My advisors at JPL are interested in using HEMI / CUDA for image processing on upcoming missions, but would like to maintain full CPU compatibility. Preferably, these changes could be pushed to the master branch for support in the future. Thank you for your time!
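For context, a minimal sketch of how the runtime selection described above might look in user code. Only ExecutionPolicy, hemi::device, getLocation(), queryForDevice(), and the policy-taking launch appear in this PR's diff; setLocation(), hemi::host, and the exact header names are assumptions made for illustration, not confirmed API.

```cpp
#include "hemi/hemi.h"
#include "hemi/launch.h"
#include "hemi/device_api.h"

// Simple functor kernel in the style hemi already supports.
struct saxpy {
    HEMI_DEV_CALLABLE_MEMBER
    void operator()(int n, float a, const float *x, float *y) const {
        for (int i = hemi::globalThreadIndex(); i < n; i += hemi::globalThreadCount())
            y[i] = a * x[i] + y[i];
    }
};

void run_saxpy(int n, float a, const float *x, float *y) {
    hemi::ExecutionPolicy p;
    // Hypothetical setter and host target: fall back to the CPU path when no
    // GPU is present, instead of crashing at launch time.
    p.setLocation(hemi::queryForDevice() ? hemi::device : hemi::host);
    hemi::launch(p, saxpy(), n, a, x, y);
}
```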
Overall I'm not sure some of the added complexity is in the spirit of hemi's original design. Please address the comments. I also think there's too much in this for one pull request. Please separate the CPU/GPU runtime execution selection from all of the utility pieces. That would make it easier for me to test and integrate.
Also, I won't accept any new functionality without tests. Please add tests (see the google tests already included in hemi for reference) for your new functionality.
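To illustrate the kind of test being requested, a minimal Google Test sketch follows; hemi::getDeviceCount() is a placeholder name inferred from the PR description ("getting the number of devices"), not a confirmed API.

```cpp
#include "gtest/gtest.h"
#include "hemi/hemi.h"

// Sketch only: a device-count query such as the one described in the PR
// (name assumed) should succeed and report zero or more devices, even on a
// host with no GPU.
TEST(RuntimeSelection, DeviceCountIsNonNegative) {
    int count = hemi::getDeviceCount();  // hypothetical utility from this PR
    EXPECT_GE(count, 0);
}
```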
@@ -69,6 +69,47 @@ size_t availableSharedBytesPerBlock(size_t sharedMemPerMultiprocessor,
    return bytes - sharedSizeBytesStatic;
}

template <typename Function, typename... Arguments>
cudaError_t getMaxBlocksForDevice(unsigned int& NumBlocks, ExecutionPolicy &p)
A comment to explain what this getMaxBlocksForDevice... function is for vs. configureGrid would help me review this.
@@ -21,70 +21,272 @@

namespace hemi
{
///////Global Grid Gets//////////////////////////////////////////////////////
This comment doesn't match the style of other comments in the project. And what do you mean by "Gets"?
    return 0;
#endif
}
int globalThreadIndex() {
Why the deep indent?
}
int globalThreadIndex() {
#ifdef HEMI_DEV_CODE
    return (blockIdx.x + blockIdx.y * gridDim.x + blockIdx.z * gridDim.x * gridDim.y) // get block idx
OK, so you went full-3D for the global index.... I think this needs more design thought. The common case is 1D, and this is going to make the common case slow. I would leave globalThreadIndex() alone, and then make new accessors such as globalThreadIndex1D, globalThreadIndex2D, globalThreadIndex3D. Or something like that. Do you use a lot of 3D blocks/grids?
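For concreteness, a sketch of what such dimension-specific accessors could look like as additions alongside the existing accessors in this header, reusing the HEMI_DEV_CALLABLE_INLINE / HEMI_DEV_CODE conventions visible in this diff; the names and exact formulas are illustrative, not part of the PR. A 2D variant would follow the same pattern.

```cpp
// 1D form keeps the existing fast path (illustrative sketch).
HEMI_DEV_CALLABLE_INLINE
int globalThreadIndex1D() {
#ifdef HEMI_DEV_CODE
    return blockIdx.x * blockDim.x + threadIdx.x;
#else
    return 0;
#endif
}

// Full 3D linearization: block-linear index times threads per block, plus the
// thread's linear index within its block (illustrative sketch).
HEMI_DEV_CALLABLE_INLINE
int globalThreadIndex3D() {
#ifdef HEMI_DEV_CODE
    int block  = blockIdx.x + blockIdx.y * gridDim.x
               + blockIdx.z * gridDim.x * gridDim.y;
    int thread = threadIdx.x + threadIdx.y * blockDim.x
               + threadIdx.z * blockDim.x * blockDim.y;
    return block * (blockDim.x * blockDim.y * blockDim.z) + thread;
#else
    return 0;
#endif
}
```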
HEMI_DEV_CALLABLE_INLINE
int globalThreadCount() {
#ifdef HEMI_DEV_CODE
    return blockDim.x * gridDim.x * blockDim.y * gridDim.y * blockDim.z * gridDim.z;
See my comment about globalThreadIndex above.
@@ -0,0 +1,26 @@
///////////////////////////////////////////////////////////////////////////////
I would consider renaming this hemi.h and renaming hemi.h hemi_core.h... (So the convenient default is to include everything, but if you want to pare it down you can).
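A sketch of the umbrella arrangement being suggested; which headers it would pull in is a guess based on the files touched in this PR.

```cpp
// hemi/hemi.h -- convenient default that includes everything (sketch only)
#pragma once

#include "hemi_core.h"         // today's hemi.h, renamed
#include "execution_policy.h"
#include "launch.h"
#include "device_api.h"
```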
hemi/launch.h (Outdated)
@@ -24,10 +24,18 @@ namespace hemi {
template <typename Function, typename... Arguments>
void launch(Function f, Arguments... args);

// Automatic parallel launch
template <typename Function, typename... Arguments>
void launch(Arguments... args);
What is a launch without a kernel?
hemi/launch.h (Outdated)
// Launch function object with an explicit execution policy / configuration
template <typename Function, typename... Arguments>
void launch(const ExecutionPolicy &p, Function f, Arguments... args);

// Launch function with an explicit execution policy / configuration
template <typename Function, typename... Arguments>
void launch(const ExecutionPolicy &p, Arguments... args);
Ditto.
// Automatic parallel launch
//
template <typename Function, typename... Arguments>
void launch(Arguments... args)
How does one pass a concrete function reference to this launch()? Can you show a working example?
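One guess at the intended call pattern, if the idea is to pass the functor type as an explicit template argument and have launch default-construct it; whether that matches the author's intent is exactly the question above.

```cpp
#include "hemi/hemi.h"
#include "hemi/launch.h"
#include "hemi/device_api.h"

// Guess at the usage of the kernel-less overload: Function supplied
// explicitly and (presumably) default-constructed inside launch.
struct fill {
    HEMI_DEV_CALLABLE_MEMBER
    void operator()(float *data, int n, float value) const {
        for (int i = hemi::globalThreadIndex(); i < n; i += hemi::globalThreadCount())
            data[i] = value;
    }
};

void example(float *d_data, int n) {
    hemi::launch<fill>(d_data, n, 1.0f);  // Function = fill given explicitly
}
```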
if (p.getLocation() == hemi::device && queryForDevice())
{
    checkCuda(configureGrid(p, Kernel<Function, Arguments...>));
    Kernel << <p.getExecutionGrid(),
Please fix spacing ("Kernel << <" --> "Kernel<<<").