Commit

deploy: 1d8ff03
JesperDramsch committed Feb 11, 2024
1 parent 9e12c24 commit aba1832
Showing 9 changed files with 181 additions and 152 deletions.
Binary file not shown.
Binary file not shown.
40 changes: 33 additions & 7 deletions _sources/notebooks/5-interpretability.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -383,21 +383,33 @@
"\n",
"We have the 3 features and how varying these changes the impact in predicting a specific class.\n",
"\n",
"Interestingly, we can see that the Culmen length for [A] is smaller, because larger values reduce the partial dependence , [B] however seems to have a larger Culmen length and [C] is almost unaffected by this feature!\n",
"Interestingly, we can see that the Culmen length for Adelie is smaller, because larger values reduce the partial dependence, Chinstrap penguins however seem to have a larger Culmen length, and Gentoo is almost unaffected by this feature!\n",
"\n",
"Similarly only [C] seems to have larger Flippers, whereas smaller flippers have a lower partial dependence for large values.\n",
"Similarly only Gentoo seems to have larger Flippers, whereas smaller flippers have a lower partial dependence for large values.\n",
"\n",
"I'm not a penguin expert, I just find them adorable, and I'm able to glean this interpretable information from the plots.\n",
"\n",
"I think is a great tool!"
"I'm not a penguin expert, I just find them adorable, and I'm able to glean this interpretable information from the plots. I think is a great tool! 🐧"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature importances with Tree importance vs Permutation importance\n",
"\n"
"\n",
"Understanding feature importance is crucial in machine learning, as it helps us identify which features have the most significant impact on model predictions. \n",
"\n",
"Two standard methods for assessing feature importance are Tree Importance and Permutation Importance.\n",
"Tree Importance, usually associated with tree-based models like random forests, calculates feature importances based on how frequently a feature is used to split nodes in the trees. It's a counting exercise.\n",
"\n",
"Features frequently selected for splitting are considered more important because they contribute more to the model's predictive performance. One benefit of Tree Importance is its computational efficiency, as feature importance can be readily obtained by training. However, Tree Importance may overestimate the importance of correlated features, features with high cardinality and randomness, and features that struggle with feature interactions.\n",
"\n",
"On the other hand, Permutation Importance assesses feature importance by measuring the decrease in model performance when the values of a feature are randomly shuffled. Features that, when shuffled, lead to a significant decrease in model performance are deemed more important. Permutation Importance is model-agnostic and can be applied to any type of model, making it versatile and applicable in various scenarios. Additionally, Permutation Importance accounts for feature interactions and is less biased by correlated features. However, it is computationally more expensive, especially for models with large numbers of features or complex interactions.\n",
"\n",
"People are interested in feature importances for several reasons. Firstly, feature importances provide insights into the underlying relationships between features and the target variable, aiding in feature selection and dimensionality reduction. \n",
"\n",
"Moreover, understanding feature importances helps researchers and practitioners interpret model predictions and identify potential areas for improvement or further investigation. Feature importances can also inform domain experts and stakeholders about which features are driving model decisions, enhancing transparency and trust in machine learning systems.\n",
"\n",
"We'll start out by training a different type of model in this section, a standard Random Forest. Then we can directly compare the tree-based feature importnace with permutation importances. The data split from [the Data notebook](/notebooks/0-basic-data-prep-and-model.html) we established earlier remains the same and the pre-processing is also the same, despite Random Forests dealing with non-normalised data well."
]
},
{
Expand Down Expand Up @@ -437,13 +449,22 @@
"\n",
"rf = Pipeline(steps=[\n",
" ('preprocessor', preprocessor),\n",
" ('classifier', RandomForestClassifier()),\n",
" ('classifier', RandomForestClassifier(random_state=42)),\n",
"])\n",
"\n",
"rf.fit(X_train, y_train)\n",
"rf.score(X_test, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can simply plot the feature importances obtained from training the model.\n",
"\n",
"These will always be slightly different, due to the training process of Random Forests on randomly selected subsets of the data."
]
},
{
"cell_type": "code",
"execution_count": 10,
Expand Down Expand Up @@ -473,6 +494,11 @@
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 11,
Expand Down
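The shuffling idea behind permutation importance, as described in the notebook text above, can be sketched in a few lines. This is a minimal, hand-rolled illustration using only the standard library, not scikit-learn's implementation and not the notebook's penguin pipeline. The helper names `accuracy`, `permutation_importance` and `toy_model`, as well as the toy data, are assumptions made up for this sketch. In practice, scikit-learn's `sklearn.inspection.permutation_importance` would presumably be applied to the fitted `rf` pipeline on the test split.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=42):
    """Mean drop in accuracy when a single feature column is shuffled.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving all other columns untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Made-up toy data: feature 0 decides the label, feature 1 is pure noise.
data_rng = random.Random(0)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    """Stand-in for a fitted classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling feature 0 breaks the link between input and label, so accuracy drops noticeably. Shuffling feature 1 changes nothing, because the stand-in model never reads it, and its importance is exactly zero.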
132 changes: 66 additions & 66 deletions notebooks/0-basic-data-prep-and-model.html
Original file line number Diff line number Diff line change
Expand Up @@ -984,39 +984,39 @@ <h2>Machine Learning<a class="headerlink" href="#machine-learning" title="Permal
</thead>
<tbody>
<tr>
<th>326</th>
<td>48.1</td>
<td>16.4</td>
<td>199.0</td>
<td>FEMALE</td>
</tr>
<tr>
<th>225</th>
<td>46.5</td>
<td>14.8</td>
<td>217.0</td>
<th>38</th>
<td>37.6</td>
<td>19.3</td>
<td>181.0</td>
<td>FEMALE</td>
</tr>
<tr>
<th>289</th>
<td>52.0</td>
<th>93</th>
<td>39.6</td>
<td>18.1</td>
<td>201.0</td>
<td>186.0</td>
<td>MALE</td>
</tr>
<tr>
<th>180</th>
<td>48.2</td>
<td>14.3</td>
<td>210.0</td>
<th>152</th>
<td>46.1</td>
<td>13.2</td>
<td>211.0</td>
<td>FEMALE</td>
</tr>
<tr>
<th>90</th>
<td>35.7</td>
<td>18.0</td>
<td>202.0</td>
<td>FEMALE</td>
<th>209</th>
<td>45.5</td>
<td>15.0</td>
<td>220.0</td>
<td>MALE</td>
</tr>
<tr>
<th>161</th>
<td>46.8</td>
<td>15.4</td>
<td>215.0</td>
<td>MALE</td>
</tr>
<tr>
<th>...</th>
Expand All @@ -1026,39 +1026,39 @@ <h2>Machine Learning<a class="headerlink" href="#machine-learning" title="Permal
<td>...</td>
</tr>
<tr>
<th>324</th>
<td>51.5</td>
<td>18.7</td>
<td>187.0</td>
<th>91</th>
<td>41.1</td>
<td>18.1</td>
<td>205.0</td>
<td>MALE</td>
</tr>
<tr>
<th>147</th>
<td>36.6</td>
<td>18.4</td>
<td>184.0</td>
<th>183</th>
<td>42.8</td>
<td>14.2</td>
<td>209.0</td>
<td>FEMALE</td>
</tr>
<tr>
<th>73</th>
<td>45.8</td>
<td>18.9</td>
<td>197.0</td>
<td>MALE</td>
<th>286</th>
<td>46.6</td>
<td>17.8</td>
<td>193.0</td>
<td>FEMALE</td>
</tr>
<tr>
<th>114</th>
<td>39.6</td>
<td>20.7</td>
<td>191.0</td>
<th>337</th>
<td>46.8</td>
<td>16.5</td>
<td>189.0</td>
<td>FEMALE</td>
</tr>
<tr>
<th>143</th>
<td>40.7</td>
<td>17.0</td>
<td>190.0</td>
<td>MALE</td>
<th>330</th>
<td>42.5</td>
<td>17.3</td>
<td>187.0</td>
<td>FEMALE</td>
</tr>
</tbody>
</table>
Expand Down Expand Up @@ -1095,48 +1095,48 @@ <h2>Machine Learning<a class="headerlink" href="#machine-learning" title="Permal
</thead>
<tbody>
<tr>
<th>326</th>
<td>Chinstrap penguin (Pygoscelis antarctica)</td>
<th>38</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
</tr>
<tr>
<th>225</th>
<td>Gentoo penguin (Pygoscelis papua)</td>
<th>93</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
</tr>
<tr>
<th>289</th>
<td>Chinstrap penguin (Pygoscelis antarctica)</td>
<th>152</th>
<td>Gentoo penguin (Pygoscelis papua)</td>
</tr>
<tr>
<th>180</th>
<th>209</th>
<td>Gentoo penguin (Pygoscelis papua)</td>
</tr>
<tr>
<th>90</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
<th>161</th>
<td>Gentoo penguin (Pygoscelis papua)</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
</tr>
<tr>
<th>324</th>
<td>Chinstrap penguin (Pygoscelis antarctica)</td>
<th>91</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
</tr>
<tr>
<th>147</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
<th>183</th>
<td>Gentoo penguin (Pygoscelis papua)</td>
</tr>
<tr>
<th>73</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
<th>286</th>
<td>Chinstrap penguin (Pygoscelis antarctica)</td>
</tr>
<tr>
<th>114</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
<th>337</th>
<td>Chinstrap penguin (Pygoscelis antarctica)</td>
</tr>
<tr>
<th>143</th>
<td>Adelie Penguin (Pygoscelis adeliae)</td>
<th>330</th>
<td>Chinstrap penguin (Pygoscelis antarctica)</td>
</tr>
</tbody>
</table>
Expand Down Expand Up @@ -1264,7 +1264,7 @@ <h3>Model Training<a class="headerlink" href="#model-training" title="Permalink
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>0.9914163090128756
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>0.9871244635193133
</pre></div>
</div>
</div>
Expand All @@ -1278,7 +1278,7 @@ <h3>Model Training<a class="headerlink" href="#model-training" title="Permalink
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>1.0
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>0.9900990099009901
</pre></div>
</div>
</div>
Expand Down
4 changes: 2 additions & 2 deletions notebooks/1-model-evaluation.html
Original file line number Diff line number Diff line change
Expand Up @@ -1146,8 +1146,8 @@ <h2><span class="section-number">1.3.5. </span>Choosing the appropriate Evaluati
</div>
</div>
<div class="cell_output docutils container">
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>{&#39;fit_time&#39;: array([0.00643802, 0.0052402 , 0.00526881, 0.00526285, 0.0052402 ]),
&#39;score_time&#39;: array([0.00420523, 0.0040257 , 0.00399923, 0.00399208, 0.0042634 ]),
<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>{&#39;fit_time&#39;: array([0.00581956, 0.00523829, 0.00519466, 0.00517082, 0.00550675]),
&#39;score_time&#39;: array([0.00410533, 0.00397706, 0.00397515, 0.00400424, 0.00418091]),
&#39;test_MCC&#39;: array([0.37796447, 0.27863911, 0.40824829, 0.02424643, 0.08625819]),
&#39;test_ACC&#39;: array([0.73333333, 0.7 , 0.76666667, 0.66666667, 0.62068966])}
</pre></div>
Expand Down