
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).

tugrul512bit/Cekirdekler


Cekirdekler

A simple C# multi-device GPGPU (OpenCL) compute API with iterative inter-device load balancing and pipelined read/write/compute operations for developers' custom OpenCL kernels.

64-bit only: set "Project Settings -> Build -> Platform Target -> x64", and make sure the Configuration Manager targets x64 as well.

Requires an extra C++ DLL built for 64-bit (x86_64) from https://github.com/tugrul512bit/CekirdeklerCPP, which must be named KutuphaneCL.dll.

The other required DLL is Microsoft's System.Threading.dll (and its XML helper) for .NET 2.0. Alternatively, you can adjust the "using" directives and target .NET 3.5+ in your own project, in which case System.Threading.dll is not needed.

In total, Cekirdekler.dll, KutuphaneCL.dll, and targeting .NET 3.5 should be enough.

Usage: add only Cekirdekler.dll and System.Threading.dll as references to your C# projects. The other files need to exist in the same folder as Cekirdekler.dll or the main project's executable.

This project is being enhanced using ZenHub.

Features

  • Implicit multi-device control: from CPUs to any number of GPUs and accelerators. Control is explicit on the library side for compatibility and performance, but implicit for the client coder, who can concentrate on the OpenCL kernel code. Devices can be selected implicitly for ease of setup or explicitly through detailed device queries.
  • Iterative load balancing between devices: done uniquely for each different compute (explicit control with a user-given compute id). Work loads across multiple devices become fairer with each iteration until the distribution ratio converges. Partitioning the workload completes a kernel with less latency, which suits hot-spot loops and some simple embarrassingly parallel algorithms, and works even better for streaming data with the pipelining option enabled.
  • Pipelining for reads, computes and writes (host-device link): either by the mercy of device drivers or by explicit event-based queue management. Hides the latency of the least time-consuming part (such as writes) behind the most time-consuming part (such as compute). GPUs can run buffer copies and OpenCL kernels concurrently.
  • Pipelining between devices (device-host-device): concurrently runs multiple stages so they overlap in the timeline, gaining an advantage from multiple GPUs (and FPGAs, CPUs) even for kernels of a time-consuming pipeline that are non-separable (because of atomics and low-level optimizations). Each device runs a different kernel, but at the same time as the other devices, and uses double buffers to overlap even the data movements between pipeline stages.
  • Works with different numeric arrays: either C# arrays such as float[], int[], byte[], ... or C++ array wrappers such as ClFloatArray, ClArray<float>, ClByteArray, ClArray<byte>.
  • Automatic buffer copy optimizations per device: if a device shares RAM with the CPU, it uses map/unmap commands instead of read/write to reduce the number of array copies. If that device is also given a C++ wrapper array (such as ClArray<float>), it additionally uses the cl_use_host_ptr flag on the buffer for zero-copy access, aka "streaming". By default, all devices have their own buffers.
  • Two usage styles: the first lets the developer declare all kernel parameters as arrays for a more explicitly readable execution; the second creates the same thing with a much shorter definition, taking fewer lines of code and changing only the necessary flags instead of all of them.
  • Automatic resource disposal: when C++ array wrappers are finalized (out of scope, garbage collected), they release their resources. The dispose method can also be called explicitly by the developer.
  • Uses OpenCL 1.2: built on the C++ bindings from Khronos.org.
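The resource-disposal behavior described above can be sketched as follows. This is a minimal illustration, assuming the wrapper exposes an explicit dispose method as stated; the exact member name and casing may differ in your Cekirdekler version:

```csharp
// Sketch: C++ array wrappers release their native resources when
// garbage-collected, or earlier via an explicit dispose call.
Cekirdekler.ClArrays.ClArray<float> data =
    new Cekirdekler.ClArrays.ClArray<float>(1024); // native-side allocation

// ... use 'data' with a ClNumberCruncher's compute() here ...

data.dispose(); // assumed explicit release; do not use 'data' afterwards
// Without the explicit call, finalization at GC time frees the resources.
```

Calling dispose eagerly keeps device/native memory usage predictable instead of waiting for a nondeterministic garbage-collection pass.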

Documentation

You can see details and a tutorial in the Cekirdekler wiki.

Known Issues

  • For C++ array wrappers such as ClArray<float> there is no out-of-bounds check; don't cross boundaries when indexing into the array.
  • Don't use C++ array wrappers after they are disposed. These checks are omitted to keep array indexing fast.
  • Don't use ClNumberCruncher or Core instances after they are disposed.
  • Pay attention to the "number of array elements used" per workitem in the kernel and how it is given as a parameter to the API's compute() method.
  • Pay attention to the "partial read"/"read"/"write" array copy modifiers when your kernel alters (or reads) the whole array versus just a part of it.
  • No performance output on the first iteration. The load balancer needs at least several iterations to distribute work fairly, and the performance report needs at least 2 iterations before producing console output.

Example

        Cekirdekler.ClNumberCruncher cr = new Cekirdekler.ClNumberCruncher(
            Cekirdekler.AcceleratorType.GPU, @"
                __kernel void hello(__global char * arr)
                {
                    printf(""hello world"");
                }
            ");

        Cekirdekler.ClArrays.ClArray<byte> array = new Cekirdekler.ClArrays.ClArray<byte>(1000);
        array.compute(cr, 1, "hello", 1000, 100); // compute id = 1 (used by the load balancer)
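Building on the same call pattern, a kernel that actually writes into the array can be sketched as below. Only the API names shown in the example above are used; the meaning of the last two compute() arguments (here 1000 and 100) mirrors the hello-world call and should be treated as an assumption, not a documented signature:

```csharp
// Hedged sketch: each workitem writes one element of the array.
Cekirdekler.ClNumberCruncher cr = new Cekirdekler.ClNumberCruncher(
    Cekirdekler.AcceleratorType.GPU, @"
        __kernel void fill(__global float * arr)
        {
            int i = get_global_id(0);
            arr[i] = 3.14f; // one element per workitem
        }
    ");

Cekirdekler.ClArrays.ClArray<float> array =
    new Cekirdekler.ClArrays.ClArray<float>(1000);

// Same parameter pattern as the hello-world example:
// compute id = 1, kernel name, then the two size arguments.
array.compute(cr, 1, "fill", 1000, 100);
```

Because the load balancer is iterative (see Features), repeating this compute() call with the same compute id lets the work distribution across devices converge over several iterations.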
