Alexandr-Konovalov
Contributor

Make it possible to delay creating a copy of HostKernel until it is used outside the submit call stack, i.e. by the scheduler. Type-erase the kernel lambda via the vptr in HostKernelRefBase.
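
To make the idea in the description concrete, here is a minimal sketch of a non-owning, type-erased kernel reference that copies the callable only on request. The class names echo the PR, but every body, the `run` method, `CountingKernel`, `CopyCount`, `submit`, `submitImpl` and the `NeedsScheduler` flag are assumptions made for illustration, not the actual SYCL runtime code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the owning, type-erased kernel holder.
struct HostKernelBase {
  virtual ~HostKernelBase() = default;
  virtual void run(int Id) = 0;
};

// Owning wrapper: stores its own copy of the callable.
template <typename KernelType> struct HostKernel : HostKernelBase {
  KernelType MKernel;
  explicit HostKernel(const KernelType &Kernel) : MKernel(Kernel) {}
  void run(int Id) override { MKernel(Id); }
};

// Non-owning view. The vptr remembers the concrete KernelType, so code that
// only sees HostKernelRefBase can still make a correctly typed copy later.
struct HostKernelRefBase {
  virtual ~HostKernelRefBase() = default;
  virtual std::unique_ptr<HostKernelBase> takeOrCopyOwnership() const = 0;
};

template <typename KernelType> struct HostKernelRef : HostKernelRefBase {
  const KernelType &MKernel;
  explicit HostKernelRef(const KernelType &Kernel) : MKernel(Kernel) {}
  std::unique_ptr<HostKernelBase> takeOrCopyOwnership() const override {
    return std::make_unique<HostKernel<KernelType>>(MKernel); // the only copy
  }
};

// Illustrative callable that counts how often it is copied.
int CopyCount = 0;
struct CountingKernel {
  CountingKernel() = default;
  CountingKernel(const CountingKernel &) { ++CopyCount; }
  void operator()(int) const {}
};

// Hypothetical non-template entry point: it never sees KernelType.
std::unique_ptr<HostKernelBase> submitImpl(const HostKernelRefBase &Ref,
                                           bool NeedsScheduler) {
  if (!NeedsScheduler)
    return nullptr;                 // fast path: no copy is made
  return Ref.takeOrCopyOwnership(); // slow path: copy outlives the caller
}

// Hypothetical template front end: wraps the callable by reference.
template <typename KernelType>
std::unique_ptr<HostKernelBase> submit(const KernelType &Kernel,
                                       bool NeedsScheduler) {
  HostKernelRef<KernelType> Ref(Kernel);
  return submitImpl(Ref, NeedsScheduler);
}
```

In this sketch the callable is copied exactly once, and only on the path that hands it to a consumer outliving the submit call.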

Comment on lines 261 to 266
#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
// This function is needed for host-side compilation to keep kernels
// instantiated. This is important for debuggers to be able to associate
// kernel code instructions with source code lines.
// NOTE: InstantiateKernelOnHost() should not be called.
void InstantiateKernelOnHost() override {
Contributor

I think this can be empty. HostKernelRef<...> instantiates HostKernel<...> on line 256, and its InstantiateKernelOnHost already does the right thing (outside preview). And for preview we need a mechanism that doesn't require copy-paste.

Contributor Author

It seems it can be empty. I tried to describe the reason in a comment.

> And for preview we need a mechanism that doesn't require copy-paste.

We are talking about the GetInstantiateKernelOnHostPtr() call, right? It requires a template parameter, so it is unclear what we can do other than adding the call to the templated constructor.

virtual char *getPtr() override {
return const_cast<char *>(reinterpret_cast<const char *>(&MKernel));
}
virtual std::shared_ptr<HostKernelBase> takeOrCopyOwnership() const override {
Contributor

IMO, this should be std::unique_ptr because it has no overhead, and one can always easily create a shared_ptr from it via unique_ptr::release.

Contributor Author

I don't totally agree. The overhead is the construction/destruction of the unique_ptr, while the caller needs a shared_ptr. (And for shared_ptr we return 2 pointers vs. 1 for unique_ptr, so it's hard to judge.) Is there a chance that someday the caller would need a unique_ptr?

Contributor

> Overhead is the creation/destructor of unique_ptr

is exactly zero with optimizations enabled: https://godbolt.org/z/fcaos1Wr7

That is not true for std::shared_ptr (which not only needs an extra memory allocation but also involves atomics).

Contributor Author

Thanks, good point. Done.
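
For readers following the thread, a minimal sketch of the pattern the reviewer describes: the factory returns `std::unique_ptr`, and a caller that needs shared ownership converts at the point of use. `Payload`, `makePayload` and `makeSharedPayload` are hypothetical names for illustration. The sketch moves the `unique_ptr` into the `shared_ptr` rather than calling `release()`; both transfer ownership, but the move keeps the pointer owned at every step.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Illustrative payload type.
struct Payload {
  int Value;
  explicit Payload(int V) : Value(V) {}
};

// The factory hands out unique ownership; no control block is allocated here.
std::unique_ptr<Payload> makePayload(int Value) {
  return std::make_unique<Payload>(Value);
}

// A caller that needs shared ownership converts when it needs to.
std::shared_ptr<Payload> makeSharedPayload(int Value) {
  // shared_ptr has a converting constructor from a unique_ptr rvalue; the
  // control block is allocated here, and only on this path.
  return std::shared_ptr<Payload>(makePayload(Value));
}
```

Callers that never need shared ownership keep the `unique_ptr` and do not pay for a control block or atomic reference counting.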


- std::shared_ptr<detail::HostKernelBase> HostKernel = std::make_shared<
-     detail::HostKernel<KernelType, TransformedArgType, Dims>>(KernelFunc);
+ HostKernelRef<KernelType, TransformedArgType, Dims> HostKernel(KernelFunc);
Contributor

I wish we could do

const HostKernelRefBase &TypeErasedKernel = HostKernelRef<...>{KernelFunc};

(https://godbolt.org/z/h9v9s3TrG), but getPtr() isn't marked as const 😞
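
A minimal sketch of why the constness matters: a temporary bound to a `const` base reference has its lifetime extended to that of the reference, but only `const` member functions can then be called through it. `RefBase`, `Ref`, `Functor` and `pointsAtKernel` are hypothetical simplifications, and this `getPtr()` is assumed to be `const` and to return `const char *`, unlike the one under review.

```cpp
#include <cassert>

// Simplified stand-in for the type-erased reference base.
struct RefBase {
  virtual ~RefBase() = default;
  // Marked const so it can be called through a const RefBase &.
  virtual const char *getPtr() const = 0;
};

template <typename KernelType> struct Ref : RefBase {
  const KernelType &MKernel;
  explicit Ref(const KernelType &Kernel) : MKernel(Kernel) {}
  const char *getPtr() const override {
    return reinterpret_cast<const char *>(&MKernel);
  }
};

// Illustrative callable.
struct Functor {
  int Tag = 42;
  void operator()() const {}
};

bool pointsAtKernel(const Functor &F) {
  // The Ref<Functor> temporary lives as long as TypeErased does, because it
  // is bound directly to a const reference.
  const RefBase &TypeErased = Ref<Functor>{F};
  return TypeErased.getPtr() == reinterpret_cast<const char *>(&F);
}
```

With a non-const `getPtr()`, the call through `TypeErased` would not compile, which is the obstacle the comment points out.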


class TestMoveFunctor {
public:
static int MoveCtorCalls;
Contributor

Stupid C++ doesn't allow initializing it here, right?
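
For context, a short sketch of the two ways to give such a counter its initial value. A non-const static data member can be initialized inside the class only if it is declared `inline`, which requires C++17; otherwise it needs a single out-of-class definition. `CounterOld` and `CounterNew` are hypothetical names, and whether the test in this PR can rely on C++17 is an assumption.

```cpp
#include <cassert>
#include <utility>

// Before C++17: declare in the class, define exactly once outside it.
struct CounterOld {
  static int MoveCtorCalls;
  CounterOld() = default;
  CounterOld(CounterOld &&) noexcept { ++MoveCtorCalls; }
};
// Objects with static storage duration are zero-initialized anyway, so the
// explicit "= 0" mainly documents intent.
int CounterOld::MoveCtorCalls = 0;

// C++17 and later: an inline static member is defined and initialized in place.
struct CounterNew {
  static inline int MoveCtorCalls = 0;
  CounterNew() = default;
  CounterNew(CounterNew &&) noexcept { ++MoveCtorCalls; }
};
```

Both variants behave the same at run time; the inline form just avoids the separate definition.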

void operator()(sycl::nd_item<3> Item) const {}
};

int TestMoveFunctor::MoveCtorCalls;
Contributor

Suggested change
- int TestMoveFunctor::MoveCtorCalls;
+ int TestMoveFunctor::MoveCtorCalls = 0;

IIUC, still need explicit zero here.

Contributor Author

Done.

Comment on lines 59 to 64
static constexpr const char *getFileName() { return "TestMoveFunctor.hpp"; }
static constexpr const char *getFunctionName() {
return "TestMoveFunctorFunctionName";
}
static constexpr unsigned getLineNumber() { return 13; }
static constexpr unsigned getColumnNumber() { return 8; }
Contributor

I think

Suggested change
static constexpr const char *getFileName() { return "TestMoveFunctor.hpp"; }
static constexpr const char *getFunctionName() {
return "TestMoveFunctorFunctionName";
}
static constexpr unsigned getLineNumber() { return 13; }
static constexpr unsigned getColumnNumber() { return 8; }

would work just fine.

Contributor Author

Done.

#ifndef __INTEL_PREVIEW_BREAKING_CHANGES
// This function can't be called from old user code, because there is no
// HostKernelRef in old user code. So, make it empty.
void InstantiateKernelOnHost() override {}
Contributor

Should we put assert(false && "Should never be called") inside the function body?

const KernelType &MKernel;

public:
HostKernelRef(const KernelType &Kernel) : MKernel(Kernel) {}
Contributor

Should we delete copy ctor here as well?

Contributor Author

We need to create HostKernelRef from a constant reference, as in sycl/include/sycl/queue.hpp, so we can't.

  HostKernelRef<KernelType, KernelTypeUniversalRef, TransformedArgType, Dims>
      HostKernel(std::forward<KernelTypeUniversalRef>(KernelFunc));

Contributor

I mean adding HostKernelRef(const HostKernelRef &) = delete;
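
A minimal sketch of the distinction being drawn: deleting the copy constructor of the wrapper does not affect the constructor that takes a const reference to the kernel. `KernelRef`, `Functor` and `readTag` are hypothetical, and the sketch omits the forwarding-reference template parameter that the real HostKernelRef takes.

```cpp
#include <cassert>
#include <type_traits>

// Simplified stand-in for the reference wrapper under review.
template <typename KernelType> class KernelRef {
  const KernelType &MKernel;

public:
  // Constructing from a const reference to the kernel stays available.
  explicit KernelRef(const KernelType &Kernel) : MKernel(Kernel) {}

  // Copying the wrapper itself is forbidden.
  KernelRef(const KernelRef &) = delete;
  KernelRef &operator=(const KernelRef &) = delete;

  const KernelType &get() const { return MKernel; }
};

// Illustrative callable type.
struct Functor {
  int Tag = 7;
};

static_assert(std::is_constructible_v<KernelRef<Functor>, const Functor &>,
              "still constructible from a const kernel reference");
static_assert(!std::is_copy_constructible_v<KernelRef<Functor>>,
              "the wrapper itself cannot be copied");

int readTag(const Functor &F) {
  KernelRef<Functor> Ref(F); // uses the KernelType constructor, not a copy
  return Ref.get().Tag;
}
```

The two constructors take different parameter types, so overload resolution never confuses them here.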

event __SYCL_EXPORT submit_kernel_direct_with_event_impl(
const queue &Queue, const nd_range<Dims> &Range,
- std::shared_ptr<detail::HostKernelBase> &HostKernel,
+ detail::HostKernelRefBase &HostKernel,
Contributor

You need to update sycl/test/abi/sycl_symbols_linux.dump and sycl/test/abi/sycl_symbols_windows.dump.
