This repository was archived by the owner on Mar 12, 2021. It is now read-only.

CUBLAS dot far slower than BLAS dot #17

@dpo

I wrote simple functions that repeatedly compute dot products on Arrays and CudaArrays. For vectors of length 10,000, I'm finding that the CUBLAS version is about 4x slower than the BLAS version. Is this expected?

using CUDArt
using CUBLAS

function blasdots(x :: Vector{Float64}, y :: Vector{Float64}; kmax :: Int=100)
  for k = 1:kmax
    BLAS.dot(x, y)
  end
end

function cublasdots(d_x :: CudaArray{Float64}, d_y :: CudaArray{Float64}; kmax :: Int=100)
  for k = 1:kmax
    CUBLAS.dot(d_x, d_y)
  end
end

n = 10000
x = rand(n); y = rand(n)
d_x = CudaArray(x); d_y = CudaArray(y)

blasdots(x, y, kmax=1)  # compile
@time blasdots(x, y)

cublasdots(d_x, d_y, kmax=1)  # compile
@time cublasdots(d_x, d_y)

Running this script gives:

$ julia time_cublas.jl 
  0.001865 seconds (431 allocations: 27.450 KB)
  0.007459 seconds (583 allocations: 28.250 KB)
jl_uv_writecb() ERROR: bad file descriptor EBADF
jl_uv_writecb() ERROR: bad file descriptor EBADF
jl_uv_writecb() ERROR: bad file descriptor EBADF
jl_uv_writecb() ERROR: bad file descriptor EBADF
jl_uv_writecb() ERROR: bad file descriptor EBADF

(Bonus question: what's up with the EBADF errors?)

This is on OS X 10.9 with CUDA 7.5, using Julia 0.4.1 installed from Homebrew and built against OpenBLAS.
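
For a rough per-call figure, the two timings reported above can be divided by the 100 iterations of each loop. This is only a sketch of that arithmetic using the numbers from the output above; the variable names are illustrative and not part of the original script, and a single `@time` run is a noisy measurement.

```julia
# Timings copied from the @time output above (seconds for kmax = 100 calls).
kmax = 100
t_blas = 0.001865
t_cublas = 0.007459

# Average cost of a single dot call, in microseconds.
per_call_blas = t_blas / kmax * 1e6
per_call_cublas = t_cublas / kmax * 1e6

println("BLAS.dot:   ", per_call_blas, " us per call")
println("CUBLAS.dot: ", per_call_cublas, " us per call")
println("ratio:      ", t_cublas / t_blas)
```

That works out to roughly 19 microseconds per BLAS call versus roughly 75 microseconds per CUBLAS call, which is where the "about 4x" figure comes from.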
