Update documentation
shappiron committed Nov 20, 2023
1 parent 20b2a0e commit ce76003
Showing 10 changed files with 483 additions and 502 deletions.
56 changes: 28 additions & 28 deletions _sources/stat/survival_analysis_part2.ipynb
@@ -202,16 +202,16 @@
"id": "9adc7d5c",
"metadata": {},
"source": [
"# Classical machine learning methods\n",
"## Classical machine learning methods\n",
"The main advantage is the ability to model non-linear relationships and to work with high-dimensional data.\n",
"## Decision tree\n",
"### Decision tree\n",
"The basic intuition behind the tree models is to recursively partition the data based on a particular splitting criterion, so that the objects that are similar to each other based on the value of interest will be placed in the same node.\n",
"\n",
"We will start with the simplest case - decision tree for classification:\n",
"### Classification decision tree\n",
"#### Classification decision tree\n",
"```{figure} figs/9.PNG\n",
"```\n",
"#### Probabilities\n",
"##### Probabilities\n",
"\n",
"Before the first split:\n",
"\n",
@@ -228,7 +228,7 @@
"$$P(y=\\text{YELLOW}|X > 12) = \\frac{6}{7} \\approx 0.86$$\n",
"\n",
"\n",
"#### Entropy:\n",
"##### Entropy:\n",
"$$\n",
"H(p) = - \\sum_i^K p_i\\log(p_i)\n",
"$$\n",
@@ -246,7 +246,7 @@
"\n",
"$$H_{\\text{total}} = \\frac{13}{20} \\cdot 0.96 + \\frac{7}{20} \\cdot 0.58 \\approx 0.83$$\n",
"\n",
"#### Information Gain:\n",
"##### Information Gain:\n",
"$$\n",
"IG = H(\\text{parent}) - H(\\text{child})\n",
"$$\n",
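The entropy and information-gain arithmetic above can be checked with a short helper (a sketch; the rounded entropies quoted in the text differ slightly from the exact base-2 values):

```python
import math

def entropy(p):
    """Shannon entropy (base 2) of a binary class distribution with P(class 1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Class proportions from the example above:
# P(YELLOW | X <= 12) = 7/13 and P(YELLOW | X > 12) = 6/7.
h_left = entropy(7 / 13)
h_right = entropy(6 / 7)

# Weighted child entropy: node sizes are 13 and 7 out of 20 samples.
h_children = (13 / 20) * h_left + (7 / 20) * h_right

def information_gain(h_parent, h_children):
    """IG = H(parent) - weighted entropy of the children."""
    return h_parent - h_children
```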
@@ -263,7 +263,7 @@
"id": "929d56ed",
"metadata": {},
"source": [
"### Regression decision tree\n",
"#### Regression decision tree\n",
"A regression tree is a piecewise-constant predictor.\n",
"\n",
"Let's look at this example:\n",
@@ -294,7 +294,7 @@
"id": "637ecf43",
"metadata": {},
"source": [
"## Ensembling\n",
"### Ensembling\n",
"Ensembling is combining the predictions of multiple base learners to obtain a powerful overall model. The base learners are often very simple models, also referred to as weak learners. \n",
"Multiple diverse models are created to predict an outcome, either by using many different modeling algorithms or by using different training data sets.\n",
"\n",
@@ -320,14 +320,14 @@
"id": "5ada40bb",
"metadata": {},
"source": [
"### Random forest\n",
"#### Random forest\n",
"Random Forest fits a set of trees independently and then averages their predictions.\n",
"\n",
"The general principles of RF are: (a) trees are grown using bootstrapped data; (b) random feature selection is used when splitting tree nodes; (c) trees are generally grown deeply; (d) the forest ensemble is calculated by averaging terminal node statistics.\n",
"\n",
"Importantly, a high number of base learners does not lead to overfitting. \n",
"\n",
"### Gradient boosting\n",
"#### Gradient boosting\n",
"In contrast to a random forest, a gradient boosted model is constructed sequentially in a greedy stagewise fashion.\n",
"\n",
"After training a decision tree, the prediction errors are obtained and the next decision tree is trained on these errors.\n",
@@ -347,7 +347,7 @@
"\n",
"If we want to include a high number of base learners, we should use a very low learning rate to restrict the influence of individual base learners, similar to regularization.\n",
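The interaction between the number of rounds and the learning rate can be seen in a toy sketch where each "weak learner" is simply the mean of the current residuals (a deliberate simplification, not a real tree-based booster):

```python
def boost_constant_learners(y, learning_rate, n_rounds):
    """Stagewise boosting with constant weak learners: each round fits the
    mean of the residuals and adds it, scaled by the learning rate."""
    prediction = 0.0
    for _ in range(n_rounds):
        residuals = [yi - prediction for yi in y]
        weak_learner = sum(residuals) / len(residuals)  # "fit" to the residuals
        prediction += learning_rate * weak_learner      # shrink its contribution
    return prediction

y = [1.0, 2.0, 3.0]
# With a low learning rate each learner contributes little, so many rounds
# are needed before the prediction converges to the target mean.
pred = boost_constant_learners(y, learning_rate=0.1, n_rounds=200)
```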
"\n",
"# Survival machine learning\n",
"## Survival machine learning\n",
"\n",
"Survival analysis is a type of regression problem as we want to predict a continuous value, but with a twist. It differs from traditional regression by the fact that parts of the data can only be partially observed: they are censored.\n",
"\n",
@@ -357,11 +357,11 @@
"Survival machine learning refers to machine learning methods adapted to work with survival data and censoring.\n",
"\n",
"\n",
"## 1. Survival random forest\n",
"### Survival random forest\n",
"Survival trees are a form of decision tree tailored\n",
"to handle censored data. The goal is to split a tree node into left and right daughter nodes with dissimilar event history (survival) behavior.\n",
"\n",
"### Splitting criterion\n",
"#### Splitting criterion\n",
"The primary difference between a survival tree and the standard decision tree is\n",
"in the choice of splitting criterion - the log-rank test. The log-rank test has traditionally been used for two-sample testing of survival data, but it can also serve as a survival splitting criterion that maximizes between-node survival differences. \n",
"\n",
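A minimal pure-Python version of the two-sample log-rank statistic used for splitting might look like this (a sketch; production implementations handle ties and weighting more carefully):

```python
def logrank_statistic(times1, events1, times2, events2):
    """Two-sample log-rank statistic; larger values mean more dissimilar survival."""
    event_times = sorted(set(
        t for t, e in zip(times1 + times2, events1 + events2) if e == 1))
    num, var = 0.0, 0.0
    for t in event_times:
        n1 = sum(1 for ti in times1 if ti >= t)  # at risk in group 1
        n2 = sum(1 for ti in times2 if ti >= t)  # at risk in group 2
        n = n1 + n2
        o1 = sum(1 for ti, ei in zip(times1, events1) if ti == t and ei == 1)
        o2 = sum(1 for ti, ei in zip(times2, events2) if ti == t and ei == 1)
        o = o1 + o2
        if n <= 1:
            continue
        num += o1 - o * n1 / n                   # observed minus expected events
        var += o * (n1 / n) * (1 - n1 / n) * (n - o) / (n - 1)
    return num ** 2 / var if var > 0 else 0.0
```

A candidate split is evaluated by computing this statistic for the two daughter nodes it would create; the split with the largest value is chosen.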
@@ -403,7 +403,7 @@
"id": "559e3cad",
"metadata": {},
"source": [
"### Prediction\n",
"#### Prediction\n",
"For prediction, a sample is dropped down each tree in the forest until it reaches a terminal node.\n",
"\n",
"Data in each terminal node is used to non-parametrically estimate the cumulative hazard function and the survival function using the Nelson-Aalen and Kaplan-Meier estimators, respectively. \n",
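Both terminal-node estimators can be sketched in a few lines (ignoring the conventions around ties and left-continuity a production implementation would follow):

```python
def nelson_aalen(times, events):
    """Cumulative hazard at each distinct event time: H(t) = sum of d_i / n_i."""
    out, h = [], 0.0
    for t in sorted(set(t for t, e in zip(times, events) if e == 1)):
        n = sum(1 for ti in times if ti >= t)  # number at risk just before t
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        h += d / n
        out.append((t, h))
    return out

def kaplan_meier(times, events):
    """Survival at each distinct event time: S(t) = product of (1 - d_i / n_i)."""
    out, s = [], 1.0
    for t in sorted(set(t for t, e in zip(times, events) if e == 1)):
        n = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        s *= 1 - d / n
        out.append((t, s))
    return out
```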
@@ -431,11 +431,11 @@
"id": "0cd849af",
"metadata": {},
"source": [
"## 2. Survival Gradient boosting\n",
"### Survival Gradient boosting\n",
"\n",
"Gradient Boosting does not refer to one particular model, but to a framework for optimizing loss functions. \n",
"\n",
"### Cox’s Partial Likelihood Loss\n",
"#### Cox’s Partial Likelihood Loss\n",
"The default loss function is the partial likelihood loss of Cox’s proportional hazards model. \n",
"The objective is to maximize the log partial likelihood function, but with the traditional linear model replaced by an additive model. \n"
]
@@ -456,7 +456,7 @@
"id": "64c915e1",
"metadata": {},
"source": [
"# Neural networks - Multi-Layer Perceptron Network \n",
"## Neural networks - Multi-Layer Perceptron Network \n",
"\n",
"Here is the model of an artificial neuron, the basic element of an artificial neural network. The output is computed by applying an activation function to the sum of the inputs multiplied by their weights, plus the bias value.\n",
"\n",
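The neuron computation is a one-liner; `sigmoid` is one common choice of activation function:

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of inputs plus bias, passed through the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# z = 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0, so the output is sigmoid(0) = 0.5
y = neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.0)
```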
@@ -481,7 +481,7 @@
"id": "4a762b0f",
"metadata": {},
"source": [
"## MLP training\n",
"### MLP training\n",
"The most popular method for training MLPs is backpropagation. During backpropagation, the output values are compared with the correct answers to compute the value of some predefined error function. The error is then fed back through the network. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function by some small amount. After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the calculations is small. In this case, one would say that the network has learned a certain target function. \n"
]
},
@@ -576,11 +576,11 @@
"id": "740e6a49",
"metadata": {},
"source": [
"# Survival neural networks\n",
"## Survival neural networks\n",
"Neural network methods adapted to work with survival data and censoring!\n",
"Pycox is a Python package for survival analysis and time-to-event prediction with PyTorch, built on the torchtuples package for training PyTorch models.\n",
"\n",
"## DeepSurv (CoxPH NN)\n",
"### DeepSurv (CoxPH NN)\n",
"Continuous-time model. \n",
"\n",
"A nonlinear Cox proportional hazards network: a deep feed-forward neural network with the Cox proportional hazards loss function. It can be considered a nonlinear extension of the Cox proportional hazards model, able to capture both linear and nonlinear effects of covariates. \n",
@@ -606,7 +606,7 @@
"id": "9a8b7c3b",
"metadata": {},
"source": [
"## Nnet-survival (Logistic hazard NN)\n",
"### Nnet-survival (Logistic hazard NN)\n",
"A discrete-time, fully parametric survival model.\n",
"\n",
"The Logistic-Hazard method parametrizes the discrete hazards and optimizes the survival likelihood.\n",
@@ -623,21 +623,21 @@
"id": "86a4434a",
"metadata": {},
"source": [
"## Performance metrics\n",
"### Performance metrics\n",
"Our test data is usually subject to censoring too, therefore common metrics like root mean squared error or correlation are unsuitable. Instead, we use metrics specific to survival analysis.\n",
"### 1. Harrell’s concordance index\n",
"#### Harrell’s concordance index\n",
"Predictions are often evaluated by a measure of rank correlation between predicted risk scores and observed time points in the test data. Harrell’s concordance index, or c-index, computes the ratio of correctly ordered (concordant) pairs to comparable pairs.\n",
"\n",
"The higher the C-index, the better the model performance.\n",
"\n",
"\n",
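A minimal sketch of Harrell’s c-index (censored subjects cannot anchor a comparable pair, but can appear as the later member of one):

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    A pair (i, j) is comparable when subject i has an observed event and
    t_i < t_j; it is concordant when i is assigned the higher risk.
    Ties in risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```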
"### 2. Time-dependent ROC AUC\n",
"#### Time-dependent ROC AUC\n",
"Extention of the well known receiver operating characteristic curve (ROC curve) to possibly censored survival times. Given a time point, we can estimate how well a predictive model can distinguishing subjects who will experience an event by time \n",
" (sensitivity) from those who will not (specificity).\n",
" \n",
"The higher the ROC AUC is - the better model performance is\n",
"\n",
" ### 3. TIme-dependent Brier score\n",
"#### Time-dependent Brier score\n",
"The time-dependent Brier score is an extension of the mean squared error to right-censored data.\n",
"\n",
"The lower the Brier score, the better the model performance."
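A simplified sketch of the idea at a single time horizon `t`; note this version skips subjects censored before `t`, whereas the full estimator reweights them with inverse-probability-of-censoring weights (IPCW):

```python
def brier_score_at(t, times, events, predicted_survival):
    """Mean squared error between predicted survival probability at horizon t
    and the observed survival status, for subjects whose status at t is known."""
    total, count = 0.0, 0
    for ti, ei, s in zip(times, events, predicted_survival):
        if ti <= t and ei == 1:
            total += (0.0 - s) ** 2  # event occurred by t: true status is 0
            count += 1
        elif ti > t:
            total += (1.0 - s) ** 2  # still event-free at t: true status is 1
            count += 1
        # subjects censored before t are skipped in this simplified version
    return total / count
```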
@@ -648,7 +648,7 @@
"id": "b05e6c79",
"metadata": {},
"source": [
"## Features selection\n",
"### Feature selection\n",
"Which variable is most predictive?\n",
"\n",
"Different methodologies exist; however, we will only discuss one simple but valuable method: permutation importance."
@@ -659,11 +659,11 @@
"id": "2ff2731e",
"metadata": {},
"source": [
"### 1. Permutation feature importance \n",
"#### Permutation feature importance \n",
"Permutation feature importance is a model inspection technique which can be used for any fitted estimator with tabular data. This is especially useful for non-linear estimators. \n",
"\n",
"The permutation feature importance is the decrease in a model score when a single feature's values are randomly shuffled. This procedure breaks the relationship between the feature and the target; thus the drop in the model score indicates how much the model depends on that feature. \n",
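The procedure can be sketched for any fitted predictor; here `predict`, `X`, and `y` are hypothetical stand-ins (a model that uses only feature 0, so feature 1 should score near zero importance):

```python
import random

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, feature, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling one feature column
    (higher = the model depends more on that feature)."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)  # break the feature-target relationship
        X_perm = [row[:feature] + [v] + row[feature + 1:]
                  for row, v in zip(X, column)]
        increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return sum(increases) / n_repeats

# Hypothetical example: the model uses only feature 0; feature 1 is noise.
predict = lambda row: row[0]
X = [[float(i), float(i % 3)] for i in range(30)]
y = [row[0] for row in X]

imp0 = permutation_importance(predict, X, y, feature=0)
imp1 = permutation_importance(predict, X, y, feature=1)
```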
"# Credits \n",
"## Credits \n",
"This notebook was prepared by Margarita Sidorova"
]
}