A simple C# multi-device GPGPU (OpenCL) compute API with iterative inter-device load balancing and pipelining of read/write/compute operations for developers' custom OpenCL kernels.
It needs an extra C++ DLL built for 64-bit (x86_64) from https://github.com/tugrul512bit/CekirdeklerCPP, which must be named KutuphaneCL.dll.
The other needed DLL is Microsoft's System.Threading.dll (and its XML helper) for .NET 2.0 - or - you can adjust the "using" directives and target .NET 3.5+ in your own project, in which case System.Threading.dll is not needed.
In total, Cekirdekler.dll and KutuphaneCL.dll together with .NET 3.5 should be enough.
This project is being enhanced using ZenHub:
- Implicit multi-device control: from CPUs to any number of GPUs and accelerators. Control is explicit on the library side for compatibility and performance, but implicit for the client coder, so GPGPU developers can concentrate on their OpenCL kernel code.
- Iterative load balancing between devices: done separately for each distinct compute (explicitly controlled with a user-given compute id). Devices receive increasingly fair workloads until the work-distribution ratio converges. Partitioning the workload completes a kernel with less latency, which suits hot-spot loops and simple embarrassingly parallel algorithms, and works even better for streaming data with the pipelining option enabled (see the sketch after this list).
- Pipelining for reads, computes and writes: either left to the device drivers or handled with explicit event-based queue management. This hides the latency of the least time-consuming part (such as writes) behind the most time-consuming part (such as compute). GPUs can run buffer copies and OpenCL kernels concurrently.
- Works with different numeric arrays: either C# arrays such as float[], int[], byte[], ... or C++ array wrappers such as ClFloatArray, ClArray<float>, ClByteArray, ClArray<byte>.
- Automatic buffer copy optimizations per device: if a device shares RAM with the CPU, map/unmap commands are used instead of read/write to reduce the number of array copies. If that device is also given a C++ wrapper array (such as ClArray<float>), the CL_MEM_USE_HOST_PTR flag is used on the buffer for zero-copy access, aka "streaming". By default, all devices have their own buffers.
- Two different usage styles: the first lets the developer specify all kernel parameters as arrays explicitly for a more readable execution; the second does the same thing with a much shorter definition, taking fewer lines of code and changing only the necessary flags instead of all of them.
- Automatic resource disposal: when C++ array wrappers are finalized (out of scope, garbage collected), they release their resources. The dispose method can also be called explicitly by the developer.
- Uses OpenCL 1.2: built on the C++ bindings from Khronos.org.
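
Below is a minimal sketch (not taken from the wiki) of how the iterative load balancer can be exercised: the same user-given compute id is reused across repeated compute() calls so the work distribution between the selected devices can converge. The kernel, array size and ranges are only illustrative, and the performanceFeed property name is an assumption to be checked against the wiki.

```C#
// illustrative sketch: repeated compute() calls with the same compute id (1)
// let the load balancer converge toward a fair work distribution per device
Cekirdekler.ClNumberCruncher gpus = new Cekirdekler.ClNumberCruncher(
    Cekirdekler.AcceleratorType.GPU, @"
    __kernel void scale(__global float * data)
    {
        int i = get_global_id(0);
        data[i] *= 2.0f;
    }
");
gpus.performanceFeed = true; // assumed property name: prints per-device timings (from the 2nd iteration on)

Cekirdekler.ClArray<float> data = new Cekirdekler.ClArray<float>(1024);

for (int i = 0; i < 100; i++)
{
    // compute id = 1, kernel name = "scale", global range = 1024, local range = 64;
    // reusing compute id 1 keeps the converging work distribution for this kernel call
    data.compute(gpus, 1, "scale", 1024, 64);
}
```
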
You can see details and a tutorial in the Cekirdekler-wiki.
- For C++ array wrappers like ClArray<float> there is no out-of-bounds check; don't cross array boundaries when indexing.
- Don't use C++ array wrappers after they are disposed. These checks are not added, in order to keep array indexing fast.
- Don't use ClNumberCruncher or Core instances after they are disposed.
- Pay attention to the "number of array elements used" per work item in the kernel and how it is given as a parameter to the API's compute() method.
- Pay attention to the "partial read"/"read"/"write" array copy modifiers, depending on whether your kernel alters (or reads) the whole array or just a part of it (see the flags sketch after the hello-world example below).
- No performance output on the first iteration: the load balancer needs at least several iterations to distribute work fairly, and the performance report needs at least 2 iterations before console output appears.
```C#
Cekirdekler.ClNumberCruncher cr = new Cekirdekler.ClNumberCruncher(
    Cekirdekler.AcceleratorType.GPU, @"
    __kernel void hello(__global char * arr)
    {
        printf(""hello world"");
    }
");

Cekirdekler.ClArray<byte> array = new Cekirdekler.ClArray<byte>(1000);
array.compute(cr, 1, "hello", 1000, 100);
```
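
As a follow-up to the hello-world example, here is a hedged sketch of the array copy modifiers and the elements-per-work-item setting mentioned in the notes above. The property names (read, write, partialRead, numberOfElementsPerWorkItem) are assumptions written from memory of the Cekirdekler-wiki; check the wiki for the exact spelling and semantics.

```C#
// illustrative sketch: per-array copy flags set before a compute() call
Cekirdekler.ClNumberCruncher cr2 = new Cekirdekler.ClNumberCruncher(
    Cekirdekler.AcceleratorType.GPU, @"
    __kernel void addOne(__global float * data)
    {
        int i = get_global_id(0);
        data[i] += 1.0f;
    }
");

Cekirdekler.ClArray<float> data = new Cekirdekler.ClArray<float>(1024);

// assumed property names (check the wiki):
data.read = true;                      // assumed: host-to-device copy before the kernel
data.partialRead = false;              // assumed: devices read the whole array, not only their own partitions
data.write = true;                     // assumed: device-to-host copy of results after the kernel
data.numberOfElementsPerWorkItem = 1;  // assumed: 1 array element consumed per work item

// compute id = 2, kernel name = "addOne", global range = 1024 work items, local range = 64
data.compute(cr2, 2, "addOne", 1024, 64);
```
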