Enhance comments in examples and READMEs
dustalov committed Feb 6, 2023
1 parent c94b591 commit ddd252b
Showing 6 changed files with 36 additions and 38 deletions.
9 changes: 5 additions & 4 deletions AUTHORS
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
The following authors have created the source code of "crowd-kit" published and distributed by YANDEX LLC as the owner:
The following authors have created the source code of "crowd-kit" published and distributed by Crowd-Kit team as the owner:

Dmitry Ustalov dustalov@yandex-team.ru
Dmitry Ustalov dustalov@toloka.ai
Evgeny Tulin [email protected]
Nikita Pavlichenko [email protected]
Vladimir Losev [email protected]
Nikita Pavlichenko [email protected]
Vladimir Losev [email protected]
Boris Tseitlin [email protected]
12 changes: 3 additions & 9 deletions CONTRIBUTING.md
@@ -1,11 +1,8 @@
# Notice to external contributors


## General info

Hello! In order for us (YANDEX LLC) to accept patches and other contributions from you, you will have to adopt our Yandex Contributor License Agreement (the “**CLA**”). The current version of the CLA can be found here:
1) https://yandex.ru/legal/cla/?lang=en (in English) and
2) https://yandex.ru/legal/cla/?lang=ru (in Russian).
Hello! In order for us to accept patches and other contributions from you, you will have to adopt the Yandex Contributor License Agreement (the “**CLA**”). The current version of the CLA can be found at https://yandex.ru/legal/cla/?lang=en.

By adopting the CLA, you state the following:

@@ -22,14 +19,11 @@ If you agree with these principles, please read and adopt our CLA. By providing
If you have already adopted terms and conditions of the CLA, you are able to provide your contributions. When you submit your first pull request, please add the following information into it:

```
I hereby agree to the terms of the CLA available at: [link].
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en.
```

Replace the bracketed text as follows:
* [link] is the link to the current version of the CLA: https://yandex.ru/legal/cla/?lang=en (in English) or https://yandex.ru/legal/cla/?lang=ru (in Russian).

It is enough to provide this notification only once.

## Other questions

If you have any questions, please mail us at [email protected].
If you have any questions, please mail us at [email protected].
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2020 YANDEX LLC
Copyright 2020 Crowd-Kit team authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion README.md
@@ -139,4 +139,4 @@ Below is the list of currently implemented methods, including the already availa

## License

© YANDEX LLC, 2020-2022. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
© Crowd-Kit team authors, 2020–2023. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
45 changes: 24 additions & 21 deletions examples/Readability-Pairwise.ipynb
@@ -1018,9 +1018,7 @@
"source": [
"We’re all set now. Let’s import the NDCG computation function from scikit-learn and use it to compute NDCG@10 values. Remember that NDCG tends to converge to 1 as k goes to infinity (Wang et al., 2013), and since our dataset has only 490 elements, we need to stick with a relatively small value of k=10. Feel free to experiment.\n",
"\n",
"Having computed the NDCG@10 values for the three models we have (baseline, Bradley-Terry, and noisy Bradley-Terry) we found that the random baseline expectedly showed the worst performance. In contrast, the Bradley-Terry models demonstrated higher and similar scores. However, the simpler model outperformed the more complex one on this dataset. This means you can perform model selection even with crowdsourced data.\n",
"\n",
"As a final indicator of quality, let’s look at the rank correlations between predictions without limiting ourselves to the top-k items. We see that the Bradley-Terry models moderately correlate to each other and to the ground truth labels even though the granularity of the grades is different."
"Having computed the NDCG@10 values for the three models we have (baseline, Bradley-Terry, and noisy Bradley-Terry) we found that the random baseline expectedly showed the worst performance. In contrast, the Bradley-Terry models demonstrated higher and similar scores. However, the simpler model outperformed the more complex one on this dataset. This means you can perform model selection even with crowdsourced data."
]
},
{
@@ -1092,6 +1090,13 @@
"ndcg_score(df_agg['noisybt_rank'].values.reshape(1, -1), df_agg['gt'].values.reshape(1, -1), k=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a final indicator of quality, let’s look at the rank correlations between predictions without limiting ourselves to the top-k items using the Spearman's ρ rank correlation coefficient. We see that the Bradley-Terry models moderately correlate to each other and to the ground truth labels even though the granularity of the grades is different."
]
},
{
"cell_type": "code",
"execution_count": 17,
@@ -1128,29 +1133,29 @@
" <tr>\n",
" <th>bt_rank</th>\n",
" <td>1.000000</td>\n",
" <td>0.872997</td>\n",
" <td>0.030279</td>\n",
" <td>0.423757</td>\n",
" <td>0.872988</td>\n",
" <td>0.030259</td>\n",
" <td>0.430876</td>\n",
" </tr>\n",
" <tr>\n",
" <th>noisybt_rank</th>\n",
" <td>0.872997</td>\n",
" <td>0.872988</td>\n",
" <td>1.000000</td>\n",
" <td>0.018814</td>\n",
" <td>0.481291</td>\n",
" <td>0.482654</td>\n",
" </tr>\n",
" <tr>\n",
" <th>random_rank</th>\n",
" <td>0.030279</td>\n",
" <td>0.030259</td>\n",
" <td>0.018814</td>\n",
" <td>1.000000</td>\n",
" <td>-0.043134</td>\n",
" <td>-0.036178</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gt</th>\n",
" <td>0.423757</td>\n",
" <td>0.481291</td>\n",
" <td>-0.043134</td>\n",
" <td>0.430876</td>\n",
" <td>0.482654</td>\n",
" <td>-0.036178</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
@@ -1159,10 +1164,10 @@
],
"text/plain": [
" bt_rank noisybt_rank random_rank gt\n",
"bt_rank 1.000000 0.872997 0.030279 0.423757\n",
"noisybt_rank 0.872997 1.000000 0.018814 0.481291\n",
"random_rank 0.030279 0.018814 1.000000 -0.043134\n",
"gt 0.423757 0.481291 -0.043134 1.000000"
"bt_rank 1.000000 0.872988 0.030259 0.430876\n",
"noisybt_rank 0.872988 1.000000 0.018814 0.482654\n",
"random_rank 0.030259 0.018814 1.000000 -0.036178\n",
"gt 0.430876 0.482654 -0.036178 1.000000"
]
},
"execution_count": 17,
@@ -1171,14 +1176,14 @@
}
],
"source": [
"df_agg[['bt_rank', 'noisybt_rank', 'random_rank', 'gt']].corr()"
"df_agg[['bt_rank', 'noisybt_rank', 'random_rank', 'gt']].corr(method='spearman')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose we want to export the aggregated data and use it in downstream applications. We put our aggregation results into a data frame and save them to a TSV file."
"Suppose we want to export the aggregated data and use it in downstream applications. We put our aggregation results into a data frame for later use."
]
},
{
@@ -1376,8 +1381,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The file should appear in the Files pane. If you open it, you’ll see that the data is there, as are the weights, and the ranks.\n",
"\n",
"And there you have it: we obtained aggregated pairwise comparisons in a few lines of code and performed model selection using [Crowd-Kit](https://github.com/Toloka/crowd-kit) and commonly-used Python data science libraries."
]
},
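The notebook cells changed in this file call scikit-learn's `ndcg_score` and pandas' `corr(method='spearman')`. Below is a minimal self-contained sketch of both calls on synthetic ranks (not the notebook's readability data; the array sizes and noise level are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import ndcg_score

# Synthetic ground-truth relevances and a noisy model ranking
# (illustrative stand-ins for the notebook's df_agg columns).
rng = np.random.default_rng(0)
n = 490
gt = rng.permutation(n).astype(float)        # ground-truth relevance per item
model = gt + rng.normal(scale=50.0, size=n)  # a model's noisy scores

# ndcg_score expects 2-D arrays of shape (n_queries, n_items);
# a small k keeps the metric informative on a small item set.
score = ndcg_score(gt.reshape(1, -1), model.reshape(1, -1), k=10)

# Spearman's rho over all items, computed by pandas without scipy.
df = pd.DataFrame({'model_rank': model, 'gt': gt})
rho = df.corr(method='spearman').loc['model_rank', 'gt']
print(round(score, 3), round(rho, 3))
```

Note that the default `corr()` computes Pearson correlation on the raw values; passing `method='spearman'` ranks the columns first, which is what makes rankings with different grade granularity comparable.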
4 changes: 2 additions & 2 deletions examples/TlkAgg-Categorical.ipynb
@@ -444,7 +444,7 @@
"source": [
"In our experiment, the best quality was offered by the Dawid-Skene model. Having selected the model, we want to export all of the aggregated data, which makes sense in downstream applications.\n",
"\n",
"We’ll now use pandas to save the whole aggregation results to a TSV file, after transforming the series to a data frame just to specify the desired column name.\n",
"We now transform the series to a data frame for later use by specifing the desired column name.\n",
"\n",
"Let’s take a look inside it. The data is here, the responses are here, and the aggregation results are also here."
]
@@ -570,7 +570,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We’ve obtained aggregated data in just a few lines of code."
"We’ve obtained aggregated data in just a few lines of code using [Crowd-Kit](https://github.com/Toloka/crowd-kit) and commonly-used Python data science libraries."
]
},
{
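The edited cell above describes turning an aggregation result (a pandas Series indexed by task) into a data frame with a chosen column name. A minimal pandas-only sketch of that step, with hypothetical task IDs and labels standing in for a Crowd-Kit aggregator's output:

```python
import pandas as pd

# Stand-in for an aggregator's fit_predict result: a Series of labels
# indexed by task (data, index name, and column name are assumptions).
agg = pd.Series(['cat', 'dog', 'cat'],
                index=pd.Index(['t1', 't2', 't3'], name='task'))

# Transform the series into a data frame with the desired column name,
# promoting the task index to a regular column.
df = agg.to_frame(name='label').reset_index()

# Ready for downstream use, e.g. export to a TSV file:
# df.to_csv('aggregated.tsv', sep='\t', index=False)
print(df.columns.tolist())
```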
