📓 doc
karbartolome committed May 10, 2024
1 parent 8b5dc48 commit 544d2fa
Showing 37 changed files with 1,371 additions and 11,465 deletions.
1,951 changes: 993 additions & 958 deletions 01_ml/01_clasificacion/01_calibracion/01_calibration_slides.html

Large diffs are not rendered by default.

35 changes: 18 additions & 17 deletions 01_ml/01_clasificacion/01_calibracion/01_calibration_slides.qmd
@@ -14,9 +14,12 @@ institute: |
**CIMBAGE (IADCOM)** - Facultad Ciencias Económicas (UBA)
date: 2024-05-13
bibliography: bib.bib
nocite: |
@*
format:
revealjs:
theme: [default, custom_testing.scss]
theme: [default, custom.scss]
logo: logo-uba.png
footer: |
<body>
@@ -101,7 +104,7 @@ from sklearn.metrics import (
from IPython.display import display, Markdown
# Additional functions:
from custom_functions import (
from functions import (
plot_distribution,
plot_calibration_models,
plot_calibration,
@@ -148,7 +151,7 @@ color_verde_claro = "#BDCBCC"

## Machine Learning
::: {style="font-size: 50%;"}
Broadly speaking, machine learning refers to several types of models. @fig-ml-types shows the different types of machine learning models.
Broadly speaking, **machine learning** refers to several types of models. @fig-ml-types shows a commonly used classification^[For an introduction to these models, see @james2023introduction].

This presentation analyzes the particular case of [classification models]{style="color: blue"}, with emphasis on [probability estimation]{style="color: blue"}.
```{mermaid}
@@ -210,6 +213,8 @@ The goal is to [predict the occurrence of an event]{style="color: blue"} from

▪️**Iris**: Classification of plant species

▪️**Breast Cancer Wisconsin (Diagnostic)**: Classification of tumors as benign or malignant

▪️**Titanic**: Classification of individuals into survivors and non-survivors

::: {.fragment .highlight-red}
@@ -229,7 +234,7 @@ The goal is to [predict the occurrence of an event]{style="color: blue"} from
#| echo: true
#| code-fold: true
#| code-summary: "Code"
PATH_DATA = 'https://raw.githubusercontent.com/ayseceyda/german-credit-gini-analysis/master/gc.csv'
PATH_DATA = 'https://raw.githubusercontent.com/karbartolome/workshops/main/01_ml/01_clasificacion/01_calibracion/df_german_credit.csv'
TARGET = "Risk"
# pd.read_csv(PATH_DATA).to_csv('df_german_credit.csv', index=False)
df = (pd.read_csv('df_german_credit.csv')
@@ -238,7 +243,7 @@ df = (pd.read_csv('df_german_credit.csv')
)
```

The dataset has `{python} df.shape[0]` observations and `{python} df.shape[1]` variables.
The dataset has `{python} df.shape[0]` observations and `{python} df.shape[1]` variables^[Data source: [Statlog (German Credit Data)](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data), also available on Kaggle in a simpler format: [German Credit Risk - With Target](https://www.kaggle.com/datasets/kabure/german-credit-data-with-risk)].

```{python}
#| echo: true
@@ -863,7 +868,7 @@ A well-calibrated binary classifier should assign scores such that, among the observations given a predicted probability p, approximately a fraction p actually belong to the positive class.
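To make this property concrete, here is a minimal sketch of checking calibration empirically with scikit-learn's `calibration_curve`; the synthetic dataset and random-forest model are illustrative stand-ins, not the slides' own data:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative stand-in data (not the German credit dataset)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Observed event rate vs. mean predicted probability per bin:
# for a well-calibrated model the two columns roughly coincide
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted = {mp:.2f}   observed = {fp:.2f}")
```

Plotting `mean_pred` against `frac_pos` gives the reliability diagram used throughout the slides.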
::: {style="font-size: 50%;"}
## Sigmoid calibration (Platt scaling)

This method assumes a logistic relationship between the scores (z) and the true probability (p):
This method [@platt2000] assumes a logistic relationship between the scores (z) and the true probability (p):

> $\log\left(\frac{p}{1-p}\right)=\alpha+\beta z_{i}$
>
@@ -880,11 +885,6 @@ Two parameters ($\alpha$ and $\beta$) are estimated, as in a logistic regression

- It is useful when the uncalibrated model makes similar errors when predicting low and high values

<br>
**References:**

- Platt, John. (2000). Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. Adv. Large Margin Classif, Volume 10 (pp. 61-74).
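As a sketch, sigmoid calibration is available in scikit-learn via `CalibratedClassifierCV(method="sigmoid")`, which fits $\alpha$ and $\beta$ on held-out folds; the data and base model below are illustrative stand-ins, not the slides' own:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Illustrative stand-in data and base model (not the slides' own)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = GaussianNB().fit(X_tr, y_tr)
# method="sigmoid" fits the two Platt-scaling parameters per CV fold
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X_tr, y_tr)

brier_raw = brier_score_loss(y_te, base.predict_proba(X_te)[:, 1])
brier_cal = brier_score_loss(y_te, calibrated.predict_proba(X_te)[:, 1])
print(f"Brier score, raw: {brier_raw:.4f} | sigmoid-calibrated: {brier_cal:.4f}")
```

A lower Brier score after calibration suggests the sigmoid mapping corrected the base model's over- or under-confidence.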

:::

::: {style="font-size: 50%;"}
@@ -967,7 +967,9 @@ for i in models_dict.keys():
::: {style="font-size: 50%;"}
## Isotonic calibration

A non-parametric isotonic regressor is fitted, producing a non-decreasing step function:
@Zadrozny2002 propose an alternative to the method of @platt2000: calibrating probabilities with a non-parametric regression method (**isotonic regression**).

If the model is assumed to rank observations well, the mapping from `scores` to probabilities is `non-decreasing`. This method therefore produces a non-decreasing step function:

> $\sum_{i=1}^{n}(y_i - \hat{f_i})^2$
>
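This least-squares fit, constrained to be non-decreasing, can be sketched with scikit-learn's `IsotonicRegression`; the toy scores and labels are purely illustrative:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical uncalibrated scores and the labels observed for them
scores = np.array([0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90])
labels = np.array([0, 0, 1, 0, 1, 1, 1, 1])

# Least-squares fit constrained to be non-decreasing;
# out_of_bounds="clip" maps new scores outside the fitted range to the ends
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(scores, labels)

mapped = iso.predict([0.05, 0.50, 0.95])
print(mapped)  # step-wise, non-decreasing calibrated probabilities
```

Unlike Platt scaling, no parametric shape is imposed, so more calibration data is needed to avoid overfitting the steps.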
@@ -1118,19 +1120,18 @@ This is useful for decision making, since it allows setting a cut-off point
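As an illustration of choosing such a cut-off point, the sketch below picks the threshold minimizing an expected cost on calibrated probabilities; the costs, probabilities, and labels are hypothetical, not from the slides' dataset:

```python
import numpy as np

# Hypothetical asymmetric costs: a false negative costs 5x a false positive
COST_FP, COST_FN = 1.0, 5.0

# Hypothetical calibrated probabilities and true labels
proba = np.array([0.05, 0.20, 0.40, 0.55, 0.70, 0.90])
y_true = np.array([0, 0, 1, 0, 1, 1])

def expected_cost(threshold):
    """Total misclassification cost when predicting 1 above the threshold."""
    pred = (proba >= threshold).astype(int)
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    return COST_FP * fp + COST_FN * fn

# Scan candidate cut-off points and keep the cheapest one
thresholds = np.linspace(0.05, 0.95, 19)
best = min(thresholds, key=expected_cost)
print(f"best threshold = {best:.2f}, cost = {expected_cost(best):.1f}")
```

With well-calibrated probabilities this cost-minimizing threshold is meaningful; with uncalibrated scores it would not be.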

::: {style="font-size: 50%;"}
## References / Resources

[Probability Calibration Workshop - PyData 2020](https://www.youtube.com/watch?v=A1NGGV3Z4m4&list=PLeVfk5xTWHYBw22D52etymvcpxey4QFIk&ab_channel=numeristical)

::: {#refs}
:::
:::


## Contact

{{< fa brands linkedin size=1x >}} [karinabartolome](https://www.linkedin.com/in/karinabartolome/)

{{< fa brands twitter size=1x >}} [@karbartolome](https://twitter.com/karbartolome)
{{< fa brands twitter size=1x >}} [karbartolome](https://twitter.com/karbartolome)

{{< fa brands github size=1x >}} [@karbartolome](http://github.com/karbartolome)
{{< fa brands github size=1x >}} [karbartolome](http://github.com/karbartolome)

{{< fa link >}} [Blog](https://karbartolome-blog.netlify.com)

63 changes: 63 additions & 0 deletions 01_ml/01_clasificacion/01_calibracion/bib.bib
@@ -0,0 +1,63 @@
# Sigmoid calibration
@article{platt2000,
author = {Platt, John},
year = {2000},
month = {06},
pages = {},
title = {Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods},
volume = {10},
journal = {Adv. Large Margin Classif.}
}

# Isotonic calibration
@article{Zadrozny2002,
author = {Zadrozny, Bianca and Elkan, Charles},
year = {2002},
month = {08},
pages = {},
title = {Transforming Classifier Scores into Accurate Multiclass Probability Estimates},
journal = {Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
doi = {10.1145/775047.775151}
}

# Beta calibration
@article{kull2017,
author = {Meelis Kull and Telmo M. Silva Filho and Peter Flach},
title = {{Beyond sigmoids: How to obtain well-calibrated probabilities from binary classifiers with beta calibration}},
volume = {11},
journal = {Electronic Journal of Statistics},
number = {2},
publisher = {Institute of Mathematical Statistics and Bernoulli Society},
pages = {5052 -- 5080},
keywords = {Beta distribution, Binary classification, classifier calibration, logistic function, posterior probabilities, sigmoid},
year = {2017},
doi = {10.1214/17-EJS1338SI},
URL = {https://doi.org/10.1214/17-EJS1338SI}
}

# Recommendations
@online{pydata2020,
title = {Probability Calibration Workshop},
date = {2020},
organization = {PyData},
author = {Lucena, Brian},
url = {https://www.youtube.com/watch?v=A1NGGV3Z4m4&list=PLeVfk5xTWHYBw22D52etymvcpxey4QFIk&ab_channel=numeristical},
}

@book{james2023introduction,
added-at = {2023-11-14T15:31:56.000+0100},
address = {Cham},
author = {James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert and Taylor, Jonathan},
biburl = {https://www.bibsonomy.org/bibtex/22f0ef5cea78a278ad410067c5f36ef69/jaeschke},
doi = {10.1007/978-3-031-38747-0},
interhash = {9a8a7486348d9be69907361b87dbe6f4},
intrahash = {2f0ef5cea78a278ad410067c5f36ef69},
isbn = {978-3-031-38746-3},
keywords = {book learning python statistics},
publisher = {Springer},
series = {Springer Texts in Statistics},
timestamp = {2023-11-14T15:31:56.000+0100},
title = {An Introduction to Statistical Learning with Applications in Python},
url = {https://link.springer.com/book/10.1007/978-3-031-38747-0},
year = 2023
}