[PLAN] Improve performance with dimension compaction and indexer

1. Stop using ndloop and compute each operation with a single CUDA kernel using an indexer
2. Compact dimensions to make the indexer computation fast
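
To make the two steps concrete, here is a minimal sketch in plain Python, not cumo's actual implementation. The names `compact_dims` and `offset_of` are hypothetical, and strides are assumed to be counted in elements rather than bytes. `offset_of` stands in for the work each CUDA thread would do to turn its flat thread index into a memory offset; `compact_dims` merges adjacent dimensions that are laid out contiguously relative to each other, so that loop runs over fewer dimensions.

```python
def compact_dims(shape, strides):
    """Merge adjacent dimensions whose memory layout allows it.

    An outer dimension can be folded into the next inner one when
    outer_stride == inner_size * inner_stride. Size-1 dimensions
    never affect the offset and are dropped.
    """
    out_shape, out_strides = [], []
    for size, stride in zip(shape, strides):
        if size == 1:
            continue
        if out_shape and out_strides[-1] == size * stride:
            # The previous (outer) dim steps exactly over this one: merge.
            out_shape[-1] *= size
            out_strides[-1] = stride
        else:
            out_shape.append(size)
            out_strides.append(stride)
    if not out_shape:
        # Every dimension had size 1: keep a single dummy dimension.
        out_shape, out_strides = [1], [0]
    return out_shape, out_strides


def offset_of(linear_index, shape, strides):
    """Indexer: map a flat index to a memory offset (in elements).

    Costs one divmod per dimension, which is why fewer dimensions
    means less work per thread.
    """
    offset = 0
    for size, stride in zip(reversed(shape), reversed(strides)):
        linear_index, i = divmod(linear_index, size)
        offset += i * stride
    return offset


# A contiguous 2x3x4 array collapses to one dimension.
print(compact_dims([2, 3, 4], [12, 4, 1]))
# A transposed 4x3 view cannot be merged and stays two-dimensional.
print(compact_dims([4, 3], [1, 4]))
```

Under these assumptions, the offsets produced with the compacted shape and strides are the same as with the originals for every flat index, so a kernel could use the compacted form without changing its results.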

Element-wise binary ops are already done in https://github.com/sonots/cumo/pull/64,
but reductions and other operations such as `store_from` are not yet done.

Without this, cumo (and red-chainer) cannot compete with cupy (and chainer).

Current performance comparison on a K80 machine:

* chainer mnist: 5 sec / epoch
* red-chainer mnist: 13 sec / epoch
