SKETCH

This is the official implementation of "Doodle to Detect: A Goofy but Powerful Approach to Skeleton-based Hand Gesture Recognition" (NeurIPS 2025). The code is released under the MIT License.

Overview

SKETCH is a novel skeleton-based hand gesture recognition framework that directly transforms raw 4D skeleton sequences into interpretable visual graph representations. It introduces a learnable Dynamic Range Embedding (DRE) to preserve motion magnitudes and applies per-window normalization for robustness to viewpoint and body variations. Leveraging pre-trained vision backbones, SKETCH achieves state-of-the-art accuracy and generalization on SHREC’19 and SHREC’22 benchmarks.
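To make the per-window normalization idea concrete, here is a minimal illustrative sketch (our own simplification, not necessarily the paper's exact scheme): each window of the skeleton sequence is centered and scaled using only that window's statistics, so the downstream representation is less sensitive to camera viewpoint and body size.

```python
import numpy as np

def normalize_window(window):
    """Per-window normalization sketch.

    window: (T, J, 3) array of T frames, J joints, 3D coordinates.
    Centers joints on the window's mean position and scales by the
    window's standard deviation, using no statistics from other windows.
    """
    center = window.reshape(-1, 3).mean(axis=0)  # mean joint position over the window
    centered = window - center
    scale = centered.std() + 1e-8                # guard against a degenerate all-zero window
    return centered / scale
```

Because every window is normalized independently, a gesture performed close to the sensor and the same gesture performed far away map to similar inputs.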

Figure 1. Overall architecture of SKETCH.


Development Progress

Core Implementation

  • Demo script
  • Add training pipeline (SHREC'22)
  • Release pretrained model (SHREC'22)
  • Add post-processing pipeline (SHREC'22)

Results

Figure 2. Visualization of joint-level attention for gestures in the SHREC’22 dataset.

Comparison on SHREC’19

| Method | DR↑ | FP↓ | Time (s) | FPS |
|---|---|---|---|---|
| PSUMNet | 0.64 | 0.22 | 0.0250 | 40 |
| MS-G3D | 0.69 | 0.25 | 0.0303 | 33 |
| SeS-GCN | 0.75 | 0.12 | 0.0020 | 500 |
| SW 3-cent | 0.76 | 0.19 | 0.0030 | 333 |
| DSTA | 0.81 | 0.08 | 0.0088 | 114 |
| DG-STA | 0.81 | 0.07 | 0.0042 | 238 |
| DDNet | 0.82 | 0.10 | 0.0022 | 455 |
| uDeepGRU | 0.85 | 0.10 | 0.0030 | 333 |
| OO-dMVMT | 0.88 | 0.05 | 0.0058 | 172 |
| DS-GCN | 0.80 | 0.05 | - | - |
| BlockGCN | 0.83 | 0.04 | - | - |
| ProtoGCN | 0.86 | 0.05 | 0.0334 | 30 |
| SKETCH (V-L-16-384) | 0.90 | 0.03 | 0.0176 | 57 |
| SKETCH (S-S-4-7-224) | 0.88 | 0.04 | 0.0039 | 256 |
| SKETCH (S-B-4-12-384) | 0.91 | 0.03 | 0.0091 | 110 |
| SKETCH (S-L-16-384) | 0.92 | 0.02 | 0.0142 | 70 |
| SKETCH (S-L-16-384) + Aug | 0.93 | 0.01 | 0.0142 | 70 |

Comparison on SHREC’22

| Method | DR↑ | FP↓ | JI↑ | Delay (fr.) | Time (s) |
|---|---|---|---|---|---|
| DeepGRU | 0.26 | 0.25 | 0.21 | 8.0 | 0.0031 |
| DG-STA | 0.51 | 0.32 | 0.40 | 8.0 | 0.0042 |
| SeS-GCN | 0.60 | 0.16 | 0.53 | 8.0 | 0.0018 |
| PSUMNet | 0.62 | 0.24 | 0.52 | 8.0 | 0.0244 |
| MS-G3D | 0.68 | 0.21 | 0.57 | 8.0 | 0.0293 |
| Stronger | 0.72 | 0.34 | 0.59 | 14.8 | 0.1000 |
| DSTA | 0.73 | 0.24 | 0.61 | 8.0 | 0.0092 |
| 2ST-GCN+5F | 0.74 | 0.23 | 0.61 | 13.3 | 0.0021 |
| TN-FSM+JD | 0.77 | 0.23 | 0.63 | 10.0 | 0.0046 |
| Causal TCN | 0.80 | 0.29 | 0.68 | 19.0 | 0.0280 |
| DDNet | 0.88 | 0.16 | 0.78 | 8.0 | 0.0022 |
| OO-dMVMT | 0.92 | 0.09 | 0.85 | 8.0 | 0.0041 |
| SKETCH (S-B-4-12-384) | 0.91 | 0.06 | 0.86 | 8.0 | 0.0097 |
| SKETCH (S-L-4-12-384) | 0.92 | 0.07 | 0.87 | 8.0 | 0.0124 |
| SKETCH (S-L-4-12-384) + Aug | 0.95 | 0.06 | 0.91 | 8.0 | 0.0124 |

Installation

```bash
conda create -n sketch python=3.7 -y
conda activate sketch
git clone https://github.com/capableofanything/SKETCH.git
cd SKETCH
pip install -r requirements.txt
```

Dataset Structure

SHREC'22

The dataset should be organized as follows:

```text
SKETCH/
└── shrec22/
    └── SHREC2022/        # SHREC'22 dataset root
        ├── shrec2022_training_set/
        │   ├── annotations.txt
        │   ├── 1.txt
        │   ├── 2.txt
        │   └── ...
        └── shrec2022_test_set/
            ├── annotations.txt
            ├── 1.txt
            ├── 2.txt
            └── ...
```
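Before running the pipeline, it can help to verify the folders match the layout above. The helper below is hypothetical (not part of the repo); it only checks for the `annotations.txt` files and at least one sequence file per split.

```python
from pathlib import Path

def check_shrec22_layout(root="shrec22/SHREC2022"):
    """Return a list of missing paths; an empty list means the layout looks correct."""
    root = Path(root)
    missing = []
    for split in ("shrec2022_training_set", "shrec2022_test_set"):
        split_dir = root / split
        if not (split_dir / "annotations.txt").is_file():
            missing.append(str(split_dir / "annotations.txt"))
        if not list(split_dir.glob("*.txt")):
            missing.append(str(split_dir) + "/*.txt")
    return missing
```

Run it once after extracting the dataset and fix any paths it reports before moving on.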

Demo Script

The demo.py script generates a small sample of SHREC'22 windows and corresponding stacked coordinate images.

```bash
python demo.py --num_files 2 --window 16 --stride 1
```
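The `--window`/`--stride` options control how each skeleton sequence is sliced into fixed-length, overlapping windows. A minimal sketch of that slicing (assuming sequences are `(T, J, C)` arrays, which is our assumption rather than the repo's documented format):

```python
import numpy as np

def sliding_windows(seq, window=16, stride=1):
    """Slice a skeleton sequence of shape (T, J, C) into overlapping
    windows of shape (window, J, C), mirroring --window / --stride."""
    T = seq.shape[0]
    return [seq[s:s + window] for s in range(0, T - window + 1, stride)]
```

With `window=16` and `stride=1`, a 20-frame sequence yields 5 windows; a larger stride trades temporal coverage for fewer images to draw.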

Full Pipeline Execution

To reproduce the full SHREC'22 workflow, run the following scripts in order:

```bash
python prepare_window/shrec22_prepare_window.py
python draw/shrec22_draw_3stack.py
python train.py
python post_processing/shrec22/post_processing.py
```

Pretrained Models

We provide pretrained weights for SHREC'22.
You can download them from the following link:

🔗 Download Pretrained Models (Google Drive)

Community & Contribution

We encourage everyone to try applying SKETCH to other human action recognition tasks and to share your results or insights with the community!

If you adapt or extend our framework, feel free to open an issue, start a discussion, or submit a pull request; we’d love to hear from you.


Citation

```bibtex
@inproceedings{handoodle,
  title={Doodle to Detect: A Goofy but Powerful Approach to Skeleton-based Hand Gesture Recognition},
  author={Han, Sang Hoon and Lee, Seonho and Nam, Hyeok and Park, Jae Hyeon and Cha, Min Hee and Kim, Min Geol and Lee, Hyunse and Ahn, Sangyeon and Cho, Sung In and others},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
```

License

This project is licensed under the MIT License.


Contact

For questions, collaborations, or feedback, please contact:
SangHoon Han (Co-first Author) — [email protected]
Seonho Lee (Co-first Author) — [email protected]
Hyeok Nam (Co-first Author) — [email protected]
Sung In Cho (Corresponding Author) — [email protected]
