[PLAN] Improve performance with dimension compaction and indexer #105

@sonots

  1. Stop using ndloop, and compute each operation in a single CUDA kernel using an indexer
  2. Compact dimensions to make the indexer computation fast
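
To make the two steps concrete, here is a minimal sketch in plain Python. It is not cumo's actual implementation: the names `compact_dims` and `offset_of` are hypothetical, and strides are assumed to be in elements rather than bytes. `offset_of` plays the role of the indexer (each CUDA thread would map its linear index to a memory offset), and `compact_dims` merges adjacent dimensions that are contiguous in memory so the indexer loops over fewer dimensions.

```python
def compact_dims(shape, strides):
    """Merge adjacent dimensions that are contiguous in memory.

    Strides are assumed to be in elements, not bytes.
    Returns (compacted_shape, compacted_strides) as lists.
    """
    out_shape, out_strides = [], []
    for size, stride in zip(shape, strides):
        if size == 1:
            # A length-1 dimension never contributes to the offset.
            continue
        if out_shape and out_strides[-1] == size * stride:
            # One step of the previous dimension spans this whole
            # dimension, so the two can be folded into one.
            out_shape[-1] *= size
            out_strides[-1] = stride
        else:
            out_shape.append(size)
            out_strides.append(stride)
    return out_shape, out_strides


def offset_of(i, shape, strides):
    """Indexer: map a linear element index to a memory offset."""
    offset = 0
    # Peel off the innermost dimension first.
    for size, stride in zip(reversed(shape), reversed(strides)):
        i, r = divmod(i, size)
        offset += r * stride
    return offset


# A fully contiguous 2x3x4 array collapses to a single dimension.
contiguous = compact_dims((2, 3, 4), (12, 4, 1))

# A 2x3x4 view sliced out of a 2x6x4 array: only the inner two
# dimensions are contiguous, so three dimensions become two.
sliced = compact_dims((2, 3, 4), (24, 4, 1))
```

The indexer performs one division and one modulo per dimension for every element, so reducing the number of dimensions directly reduces the per-thread index arithmetic. Compaction does not change which memory offsets are visited, only how cheaply they are computed.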

Element-wise operations (binary ops) were already done in #64,
but reductions and other operations such as store_from are not yet done.

Without this, cumo (and red-chainer) cannot compete with cupy (and chainer).

Current performance comparison on a K80 machine:

  • chainer mnist: 5 sec / epoch
  • red-chainer mnist: 13 sec / epoch
