Skip to content

Conversation

@charleskawczynski
Copy link
Member

@charleskawczynski charleskawczynski commented Jul 19, 2024

This is a big step towards closing #32.

What else is needed:

  • Remove CUDA/Adapt extension/dependency
  • Add other backend tests?
  • Improve the performance
  • Fix Add CUDA tests #34

Right now, at least on the CPU, the KA kernels are more than 5x slower. cc @avik-pal @vchuravy

@avik-pal
Copy link

Thanks for looking into this so quickly. I generally tend not to use KA for CPU (in favor of just the @simd loop), but I think it uses threading for CPU, which might not be ideal for smaller broadcasting problems.

@charleskawczynski
Copy link
Member Author

Thanks for looking into this so quickly. I generally tend not to use KA for CPU (in favor of just the @simd loop), but I think it uses threading for CPU, which might not be ideal for smaller broadcasting problems.

Yeah, I thought I'd give it a try, it's kind of a nice solution, to be honest. The only reason CUDA is still needed in the extension is for Adapt. I think that this branch should mostly support other backends, but if Adapt is needed for other backends, then I think you'd need to add it. I don't have experience adapting AMD gpus, but I imagine the pattern is very similar.

Feel free to try things on this branch. I probably won't merge this in until we add tests (to close #34), and perhaps fix the cpu perf, or just revert things to how we previously had it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add CUDA tests

3 participants