📓 doc
karbartolome committed May 10, 2024
1 parent 8b5dc48 commit 544d2fa
Showing 37 changed files with 1,371 additions and 11,465 deletions.
1,951 changes: 993 additions & 958 deletions 01_ml/01_clasificacion/01_calibracion/01_calibration_slides.html

Large diffs are not rendered by default.

35 changes: 18 additions & 17 deletions 01_ml/01_clasificacion/01_calibracion/01_calibration_slides.qmd
@@ -14,9 +14,12 @@ institute: |
**CIMBAGE (IADCOM)** - Facultad Ciencias Económicas (UBA)
date: 2024-05-13
bibliography: bib.bib
nocite: |
@*
format:
revealjs:
theme: [default, custom_testing.scss]
theme: [default, custom.scss]
logo: logo-uba.png
footer: |
<body>
@@ -101,7 +104,7 @@ from sklearn.metrics import (
from IPython.display import display, Markdown
# Additional functions:
from custom_functions import (
from functions import (
plot_distribution,
plot_calibration_models,
plot_calibration,
@@ -148,7 +151,7 @@ color_verde_claro = "#BDCBCC"

## Machine Learning
::: {style="font-size: 50%;"}
Broadly speaking, machine learning refers to several types of models. @fig-ml-types shows the different types of machine learning models.
Broadly speaking, **machine learning** refers to several types of models. @fig-ml-types shows a commonly used classification^[For an introduction to these models, see @james2023introduction].

This presentation analyzes the particular case of [classification models]{style="color: blue"}, with emphasis on [probability estimation]{style="color: blue"}.
```{mermaid}
@@ -210,6 +213,8 @@ The goal is to [predict the occurrence of an event]{style="color: blue"} from

▪️**Iris**: Classification of plant species

▪️**Breast Cancer Wisconsin (Diagnostic)**: Classification of tumors as benign or malignant

▪️**Titanic**: Classification of individuals into survivors and non-survivors

::: {.fragment .highlight-red}
@@ -229,7 +234,7 @@ The goal is to [predict the occurrence of an event]{style="color: blue"} from
#| echo: true
#| code-fold: true
#| code-summary: "Code"
PATH_DATA = 'https://raw.githubusercontent.com/ayseceyda/german-credit-gini-analysis/master/gc.csv'
PATH_DATA = 'https://raw.githubusercontent.com/karbartolome/workshops/main/01_ml/01_clasificacion/01_calibracion/df_german_credit.csv'
TARGET = "Risk"
# pd.read_csv(PATH_DATA).to_csv('df_german_credit.csv', index=False)
df = (pd.read_csv('df_german_credit.csv')
@@ -238,7 +243,7 @@ df = (pd.read_csv('df_german_credit.csv')
)
```

The dataset has `{python} df.shape[0]` observations and `{python} df.shape[1]` variables.
The dataset has `{python} df.shape[0]` observations and `{python} df.shape[1]` variables^[Data source: [Statlog (German Credit Data)](https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data), also available on Kaggle in a simpler format: [German Credit Risk - With Target](https://www.kaggle.com/datasets/kabure/german-credit-data-with-risk)].

```{python}
#| echo: true
@@ -863,7 +868,7 @@ A well-calibrated binary classifier should assign scores such that, among the observations given a predicted probability p, approximately a fraction p actually belong to the positive class.
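To make this property concrete, here is a minimal sketch of checking calibration empirically with scikit-learn's `calibration_curve`; the synthetic dataset and random-forest model are illustrative stand-ins, not the slides' own data:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative stand-in data (not the German credit dataset)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Observed event rate vs. mean predicted probability per bin:
# for a well-calibrated model the two columns roughly coincide
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted = {mp:.2f}   observed = {fp:.2f}")
```

Plotting `mean_pred` against `frac_pos` gives the reliability diagram used throughout the slides.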
::: {style="font-size: 50%;"}
## Sigmoid calibration (Platt scaling)

This method assumes a logistic relationship between the scores (z) and the true probability (p):
This method [@platt2000] assumes a logistic relationship between the scores (z) and the true probability (p):

> $\log\left(\frac{p}{1-p}\right)=\alpha+\beta z_{i}$
>
@@ -880,11 +885,6 @@ Two parameters ($\alpha$ and $\beta$) are estimated, as in a logistic regression

- It is useful when the uncalibrated model makes similar errors when predicting low and high values

<br>
**References:**

- Platt, John. (2000). Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. Adv. Large Margin Classif, Volume 10 (pp. 61-74).
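As a sketch, sigmoid calibration is available in scikit-learn via `CalibratedClassifierCV(method="sigmoid")`, which fits $\alpha$ and $\beta$ on held-out folds; the data and base model below are illustrative stand-ins, not the slides' own:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Illustrative stand-in data and base model (not the slides' own)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = GaussianNB().fit(X_tr, y_tr)
# method="sigmoid" fits the two Platt-scaling parameters per CV fold
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X_tr, y_tr)

brier_raw = brier_score_loss(y_te, base.predict_proba(X_te)[:, 1])
brier_cal = brier_score_loss(y_te, calibrated.predict_proba(X_te)[:, 1])
print(f"Brier score, raw: {brier_raw:.4f} | sigmoid-calibrated: {brier_cal:.4f}")
```

A lower Brier score after calibration suggests the sigmoid mapping corrected the base model's over- or under-confidence.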

:::

::: {style="font-size: 50%;"}
@@ -967,7 +967,9 @@ for i in models_dict.keys():
::: {style="font-size: 50%;"}
## Isotonic calibration

A non-parametric isotonic regressor is fitted, producing a non-decreasing step function:
@Zadrozny2002 propose an alternative to the method of @platt2000: calibrating probabilities with a non-parametric regression method (**isotonic regression**).

If the model is assumed to rank observations well, the mapping from `scores` to probabilities is `non-decreasing`. This method therefore produces a non-decreasing step function:

> $\sum_{i=1}^{n}(y_i - \hat{f_i})^2$
>
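This least-squares fit, constrained to be non-decreasing, can be sketched with scikit-learn's `IsotonicRegression`; the toy scores and labels are purely illustrative:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical uncalibrated scores and the labels observed for them
scores = np.array([0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90])
labels = np.array([0, 0, 1, 0, 1, 1, 1, 1])

# Least-squares fit constrained to be non-decreasing;
# out_of_bounds="clip" maps new scores outside the fitted range to the ends
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(scores, labels)

mapped = iso.predict([0.05, 0.50, 0.95])
print(mapped)  # step-wise, non-decreasing calibrated probabilities
```

Unlike Platt scaling, no parametric shape is imposed, so more calibration data is needed to avoid overfitting the steps.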
@@ -1118,19 +1120,18 @@ This is useful for decision making, since it allows setting a cut-off point
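As an illustration of choosing such a cut-off point, the sketch below picks the threshold minimizing an expected cost on calibrated probabilities; the costs, probabilities, and labels are hypothetical, not from the slides' dataset:

```python
import numpy as np

# Hypothetical asymmetric costs: a false negative costs 5x a false positive
COST_FP, COST_FN = 1.0, 5.0

# Hypothetical calibrated probabilities and true labels
proba = np.array([0.05, 0.20, 0.40, 0.55, 0.70, 0.90])
y_true = np.array([0, 0, 1, 0, 1, 1])

def expected_cost(threshold):
    """Total misclassification cost when predicting 1 above the threshold."""
    pred = (proba >= threshold).astype(int)
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    return COST_FP * fp + COST_FN * fn

# Scan candidate cut-off points and keep the cheapest one
thresholds = np.linspace(0.05, 0.95, 19)
best = min(thresholds, key=expected_cost)
print(f"best threshold = {best:.2f}, cost = {expected_cost(best):.1f}")
```

With well-calibrated probabilities this cost-minimizing threshold is meaningful; with uncalibrated scores it would not be.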

::: {style="font-size: 50%;"}
## References / Resources

[Probability Calibration Workshop - PyData 2020](https://www.youtube.com/watch?v=A1NGGV3Z4m4&list=PLeVfk5xTWHYBw22D52etymvcpxey4QFIk&ab_channel=numeristical)

::: {#refs}
:::
:::


## Contact

{{< fa brands linkedin size=1x >}} [karinabartolome](https://www.linkedin.com/in/karinabartolome/)

{{< fa brands twitter size=1x >}} [@karbartolome](https://twitter.com/karbartolome)
{{< fa brands twitter size=1x >}} [karbartolome](https://twitter.com/karbartolome)

{{< fa brands github size=1x >}} [@karbartolome](http://github.com/karbartolome)
{{< fa brands github size=1x >}} [karbartolome](http://github.com/karbartolome)

{{< fa link >}} [Blog](https://karbartolome-blog.netlify.com)

63 changes: 63 additions & 0 deletions 01_ml/01_clasificacion/01_calibracion/bib.bib
@@ -0,0 +1,63 @@
# Sigmoid calibration
@article{platt2000,
author = {Platt, John},
year = {2000},
month = {06},
pages = {},
title = {Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods},
volume = {10},
journal = {Adv. Large Margin Classif.}
}

# Isotonic calibration
@article{Zadrozny2002,
author = {Zadrozny, Bianca and Elkan, Charles},
year = {2002},
month = {08},
pages = {},
title = {Transforming Classifier Scores into Accurate Multiclass Probability Estimates},
journal = {Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
doi = {10.1145/775047.775151}
}

# Beta calibration
@article{kull2017,
author = {Meelis Kull and Telmo M. Silva Filho and Peter Flach},
title = {{Beyond sigmoids: How to obtain well-calibrated probabilities from binary classifiers with beta calibration}},
volume = {11},
journal = {Electronic Journal of Statistics},
number = {2},
publisher = {Institute of Mathematical Statistics and Bernoulli Society},
pages = {5052 -- 5080},
keywords = {Beta distribution, Binary classification, classifier calibration, logistic function, posterior probabilities, sigmoid},
year = {2017},
doi = {10.1214/17-EJS1338SI},
URL = {https://doi.org/10.1214/17-EJS1338SI}
}

# Recommendations
@online{pydata2020,
title = {Probability Calibration Workshop},
date = {2020},
organization = {PyData},
author = {Lucena, Brian},
url = {https://www.youtube.com/watch?v=A1NGGV3Z4m4&list=PLeVfk5xTWHYBw22D52etymvcpxey4QFIk&ab_channel=numeristical},
}

@book{james2023introduction,
added-at = {2023-11-14T15:31:56.000+0100},
address = {Cham},
author = {James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert and Taylor, Jonathan},
biburl = {https://www.bibsonomy.org/bibtex/22f0ef5cea78a278ad410067c5f36ef69/jaeschke},
doi = {10.1007/978-3-031-38747-0},
interhash = {9a8a7486348d9be69907361b87dbe6f4},
intrahash = {2f0ef5cea78a278ad410067c5f36ef69},
isbn = {978-3-031-38746-3},
keywords = {book learning python statistics},
publisher = {Springer},
series = {Springer Texts in Statistics},
timestamp = {2023-11-14T15:31:56.000+0100},
title = {An Introduction to Statistical Learning with Applications in Python},
url = {https://link.springer.com/book/10.1007/978-3-031-38747-0},
year = 2023
}