Add NetEaseCrowd dataset #101

shenxiangzhuang · 2024-03-12T08:00:34Z

Checklist

I have read the CONTRIBUTING document
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

Dataset info

Adding our open-source dataset, NetEaseCrowd (https://github.com/fuxiAIlab/NetEaseCrowd-Dataset).

NetEaseCrowd is a large-scale dataset for long-term and online crowdsourcing truth inference, which contains about 2,400 workers, 1,000,000 tasks, and 6,000,000 annotations collected over 6 months. We believe that this dataset could be an invaluable asset to the Toloka/crowd-kit community by providing a new benchmark for crowdsourcing-related research and development.

codecov-commenter · 2024-03-12T08:08:36Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.96%. Comparing base (07c4240) to head (08440a2).
Report is 34 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #101      +/-   ##
==========================================
+ Coverage   92.80%   92.96%   +0.15%     
==========================================
  Files          47       47              
  Lines        2070     2216     +146     
==========================================
+ Hits         1921     2060     +139     
- Misses        149      156       +7

pilot7747

Hi @shenxiangzhuang! Thank you for contributing this dataset. LGTM.

shenxiangzhuang · 2024-03-12T08:11:46Z

Besides the CI tests, I also tried using this dataset for categorical aggregation, and it works well:

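from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

# Load the crowd annotations and the ground-truth labels
df, gt = load_dataset('netease_crowd')

# Run Dawid-Skene for 10 iterations
ds = DawidSkene(n_iter=10)
result = ds.fit_predict(df)

# Number of tasks that received an aggregated label
print(len(result))
# 999799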
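
A further sanity check would be to compare aggregated labels with the second value returned by load_dataset (gt in the snippet above), assumed here to hold ground-truth labels keyed by task. The sketch below illustrates that comparison in plain Python on hypothetical toy data, using simple majority voting rather than Dawid-Skene. It is not crowd-kit's implementation, and every name in it is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations as (task, worker, label) triples.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
    ("t3", "w1", "cat"), ("t3", "w2", "dog"), ("t3", "w3", "dog"),
]

# Hypothetical ground truth, keyed by task.
ground_truth = {"t1": "cat", "t2": "dog", "t3": "cat"}


def majority_vote(rows):
    """Return the most frequent label per task.

    Ties are broken by first occurrence; the toy data has no ties.
    """
    votes = defaultdict(Counter)
    for task, _worker, label in rows:
        votes[task][label] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


def accuracy(predicted, truth):
    """Share of tasks, among those present in both mappings, labeled correctly."""
    shared = predicted.keys() & truth.keys()
    if not shared:
        return 0.0
    return sum(predicted[t] == truth[t] for t in shared) / len(shared)


predicted = majority_vote(annotations)
print(predicted)
print(accuracy(predicted, ground_truth))
```

Restricting the comparison to tasks present in both mappings matters when only a subset of tasks carries ground truth.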

dustalov

Thank you for a very well-done PR! I noticed a small imperfection in the dataset metadata. Could you please check my suggestion?

crowdkit/datasets/_loaders.py

shenxiangzhuang · 2024-03-12T11:37:04Z

> Thank you for a very well-done PR! I noticed a small imperfection in the dataset metadata. Could you please check my suggestion?

Thanks a lot for your careful review!

dustalov · 2024-03-12T11:48:44Z

Great job, thank you again!

shenxiangzhuang added 2 commits March 12, 2024 15:55

add: netease_crowd dataset

af3501c

change: revert the debug setting

08440a2

shenxiangzhuang requested review from dustalov, pilot7747, aliskin, denaxen, varfolomeii, alexdrydew, Pocoder, vlad-mois and DrhF as code owners March 12, 2024 08:00

shenxiangzhuang mentioned this pull request Mar 12, 2024

Add the dataset to crowd-kit fuxiAIlab/NetEaseCrowd-Dataset#5

Closed

pilot7747 approved these changes Mar 12, 2024

dustalov approved these changes Mar 12, 2024

crowdkit/datasets/_loaders.py (outdated, resolved)

Fix description spaces

d4674eb

Co-authored-by: Dmitry Ustalov <[email protected]>

dustalov merged commit ec05dcc into Toloka:main Mar 12, 2024
5 checks passed
