Enhance comments in examples and READMEs
dustalov committed Feb 6, 2023
1 parent c94b591 commit ddd252b
Showing 6 changed files with 36 additions and 38 deletions.
9 changes: 5 additions & 4 deletions AUTHORS
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
The following authors have created the source code of "crowd-kit" published and distributed by YANDEX LLC as the owner:
The following authors have created the source code of "crowd-kit" published and distributed by Crowd-Kit team as the owner:

Dmitry Ustalov dustalov@yandex-team.ru
Dmitry Ustalov dustalov@toloka.ai
Evgeny Tulin [email protected]
Nikita Pavlichenko [email protected]
Vladimir Losev [email protected]
Nikita Pavlichenko [email protected]
Vladimir Losev [email protected]
Boris Tseitlin [email protected]
12 changes: 3 additions & 9 deletions CONTRIBUTING.md
@@ -1,11 +1,8 @@
# Notice to external contributors


## General info

Hello! In order for us (YANDEX LLC) to accept patches and other contributions from you, you will have to adopt our Yandex Contributor License Agreement (the “**CLA**”). The current version of the CLA can be found here:
1) https://yandex.ru/legal/cla/?lang=en (in English) and
2) https://yandex.ru/legal/cla/?lang=ru (in Russian).
Hello! In order for us to accept patches and other contributions from you, you will have to adopt the Yandex Contributor License Agreement (the “**CLA**”). The current version of the CLA can be found at https://yandex.ru/legal/cla/?lang=en.

By adopting the CLA, you state the following:

@@ -22,14 +19,11 @@ If you agree with these principles, please read and adopt our CLA. By providing
If you have already adopted terms and conditions of the CLA, you are able to provide your contributions. When you submit your first pull request, please add the following information into it:

```
I hereby agree to the terms of the CLA available at: [link].
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en.
```

Replace the bracketed text as follows:
* [link] is the link to the current version of the CLA: https://yandex.ru/legal/cla/?lang=en (in English) or https://yandex.ru/legal/cla/?lang=ru (in Russian).

It is enough to provide this notification only once.

## Other questions

If you have any questions, please mail us at [email protected].
If you have any questions, please mail us at [email protected].
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2020 YANDEX LLC
Copyright 2020 Crowd-Kit team authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion README.md
@@ -139,4 +139,4 @@ Below is the list of currently implemented methods, including the already availa

## License

© YANDEX LLC, 2020-2022. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
© Crowd-Kit team authors, 2020–2023. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
45 changes: 24 additions & 21 deletions examples/Readability-Pairwise.ipynb
@@ -1018,9 +1018,7 @@
"source": [
"We’re all set now. Let’s import the NDCG computation function from scikit-learn and use it to compute NDCG@10 values. Remember that NDCG tends to converge to 1 as k goes to infinity (Wang et al., 2013), and since our dataset has only 490 elements, we need to stick with a relatively small value of k=10. Feel free to experiment.\n",
"\n",
"Having computed the NDCG@10 values for the three models we have (baseline, Bradley-Terry, and noisy Bradley-Terry) we found that the random baseline expectedly showed the worst performance. In contrast, the Bradley-Terry models demonstrated higher and similar scores. However, the simpler model outperformed the more complex one on this dataset. This means you can perform model selection even with crowdsourced data.\n",
"\n",
"As a final indicator of quality, let’s look at the rank correlations between predictions without limiting ourselves to the top-k items. We see that the Bradley-Terry models moderately correlate to each other and to the ground truth labels even though the granularity of the grades is different."
"Having computed the NDCG@10 values for the three models we have (baseline, Bradley-Terry, and noisy Bradley-Terry) we found that the random baseline expectedly showed the worst performance. In contrast, the Bradley-Terry models demonstrated higher and similar scores. However, the simpler model outperformed the more complex one on this dataset. This means you can perform model selection even with crowdsourced data."
]
},
{
@@ -1092,6 +1090,13 @@
"ndcg_score(df_agg['noisybt_rank'].values.reshape(1, -1), df_agg['gt'].values.reshape(1, -1), k=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a final indicator of quality, let’s look at the rank correlations between predictions without limiting ourselves to the top-k items using the Spearman's ρ rank correlation coefficient. We see that the Bradley-Terry models moderately correlate to each other and to the ground truth labels even though the granularity of the grades is different."
]
},
{
"cell_type": "code",
"execution_count": 17,
@@ -1128,29 +1133,29 @@
" <tr>\n",
" <th>bt_rank</th>\n",
" <td>1.000000</td>\n",
" <td>0.872997</td>\n",
" <td>0.030279</td>\n",
" <td>0.423757</td>\n",
" <td>0.872988</td>\n",
" <td>0.030259</td>\n",
" <td>0.430876</td>\n",
" </tr>\n",
" <tr>\n",
" <th>noisybt_rank</th>\n",
" <td>0.872997</td>\n",
" <td>0.872988</td>\n",
" <td>1.000000</td>\n",
" <td>0.018814</td>\n",
" <td>0.481291</td>\n",
" <td>0.482654</td>\n",
" </tr>\n",
" <tr>\n",
" <th>random_rank</th>\n",
" <td>0.030279</td>\n",
" <td>0.030259</td>\n",
" <td>0.018814</td>\n",
" <td>1.000000</td>\n",
" <td>-0.043134</td>\n",
" <td>-0.036178</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gt</th>\n",
" <td>0.423757</td>\n",
" <td>0.481291</td>\n",
" <td>-0.043134</td>\n",
" <td>0.430876</td>\n",
" <td>0.482654</td>\n",
" <td>-0.036178</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
@@ -1159,10 +1164,10 @@
],
"text/plain": [
" bt_rank noisybt_rank random_rank gt\n",
"bt_rank 1.000000 0.872997 0.030279 0.423757\n",
"noisybt_rank 0.872997 1.000000 0.018814 0.481291\n",
"random_rank 0.030279 0.018814 1.000000 -0.043134\n",
"gt 0.423757 0.481291 -0.043134 1.000000"
"bt_rank 1.000000 0.872988 0.030259 0.430876\n",
"noisybt_rank 0.872988 1.000000 0.018814 0.482654\n",
"random_rank 0.030259 0.018814 1.000000 -0.036178\n",
"gt 0.430876 0.482654 -0.036178 1.000000"
]
},
"execution_count": 17,
@@ -1171,14 +1176,14 @@
}
],
"source": [
"df_agg[['bt_rank', 'noisybt_rank', 'random_rank', 'gt']].corr()"
"df_agg[['bt_rank', 'noisybt_rank', 'random_rank', 'gt']].corr(method='spearman')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose we want to export the aggregated data and use it in downstream applications. We put our aggregation results into a data frame and save them to a TSV file."
"Suppose we want to export the aggregated data and use it in downstream applications. We put our aggregation results into a data frame for later use."
]
},
{
@@ -1376,8 +1381,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The file should appear in the Files pane. If you open it, you’ll see that the data is there, as are the weights, and the ranks.\n",
"\n",
"And there you have it: we obtained aggregated pairwise comparisons in a few lines of code and performed model selection using [Crowd-Kit](https://github.com/Toloka/crowd-kit) and commonly-used Python data science libraries."
]
},
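The notebook cells changed in this file call scikit-learn's `ndcg_score` and pandas' `corr(method='spearman')`. Below is a minimal self-contained sketch of both calls on synthetic ranks (not the notebook's readability data; the array sizes and noise level are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import ndcg_score

# Synthetic ground-truth relevances and a noisy model ranking
# (illustrative stand-ins for the notebook's df_agg columns).
rng = np.random.default_rng(0)
n = 490
gt = rng.permutation(n).astype(float)        # ground-truth relevance per item
model = gt + rng.normal(scale=50.0, size=n)  # a model's noisy scores

# ndcg_score expects 2-D arrays of shape (n_queries, n_items);
# a small k keeps the metric informative on a small item set.
score = ndcg_score(gt.reshape(1, -1), model.reshape(1, -1), k=10)

# Spearman's rho over all items, computed by pandas without scipy.
df = pd.DataFrame({'model_rank': model, 'gt': gt})
rho = df.corr(method='spearman').loc['model_rank', 'gt']
print(round(score, 3), round(rho, 3))
```

Note that the default `corr()` computes Pearson correlation on the raw values; passing `method='spearman'` ranks the columns first, which is what makes rankings with different grade granularity comparable.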
4 changes: 2 additions & 2 deletions examples/TlkAgg-Categorical.ipynb
@@ -444,7 +444,7 @@
"source": [
"In our experiment, the best quality was offered by the Dawid-Skene model. Having selected the model, we want to export all of the aggregated data, which makes sense in downstream applications.\n",
"\n",
"We’ll now use pandas to save the whole aggregation results to a TSV file, after transforming the series to a data frame just to specify the desired column name.\n",
"We now transform the series to a data frame for later use by specifing the desired column name.\n",
"\n",
"Let’s take a look inside it. The data is here, the responses are here, and the aggregation results are also here."
]
@@ -570,7 +570,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We’ve obtained aggregated data in just a few lines of code."
"We’ve obtained aggregated data in just a few lines of code using [Crowd-Kit](https://github.com/Toloka/crowd-kit) and commonly-used Python data science libraries."
]
},
{
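The edited cell above describes turning an aggregation result (a pandas Series indexed by task) into a data frame with a chosen column name. A minimal pandas-only sketch of that step, with hypothetical task IDs and labels standing in for a Crowd-Kit aggregator's output:

```python
import pandas as pd

# Stand-in for an aggregator's fit_predict result: a Series of labels
# indexed by task (data, index name, and column name are assumptions).
agg = pd.Series(['cat', 'dog', 'cat'],
                index=pd.Index(['t1', 't2', 't3'], name='task'))

# Transform the series into a data frame with the desired column name,
# promoting the task index to a regular column.
df = agg.to_frame(name='label').reset_index()

# Ready for downstream use, e.g. export to a TSV file:
# df.to_csv('aggregated.tsv', sep='\t', index=False)
print(df.columns.tolist())
```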
