% 2. Supervised learning: predicting an output variable from high-dimensional observations
\subsection{The problem solved in supervised learning}
Supervised learning consists of learning the link between two datasets: the observed data \texttt{X} and an external variable \texttt{y} that we are trying to predict, usually called the target or labels. Most often, \texttt{y} is a 1D array of length \texttt{n\_samples}.

All supervised estimators in scikit-learn implement a \texttt{fit(X, y)} method to fit the model and a \texttt{predict(X)} method that, given unlabeled observations \texttt{X}, returns the predicted labels \texttt{y}.
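To make this interface concrete, here is a minimal sketch of the \texttt{fit}/\texttt{predict} pattern; the toy data and the choice of \texttt{KNeighborsClassifier} are illustrative, not part of the original example:
\begin{framed}
\begin{verbatim}
>>> import numpy as np
>>> from sklearn.neighbors import KNeighborsClassifier
>>> X = np.array([[0.0], [1.0], [2.0], [3.0]])  # 4 observations, 1 feature
>>> y = np.array([0, 0, 1, 1])                  # known labels
>>> clf = KNeighborsClassifier(n_neighbors=1)
>>> clf.fit(X, y)                # learn the link between X and y
KNeighborsClassifier(n_neighbors=1)
>>> clf.predict([[0.9], [2.1]])  # labels for unlabeled observations
array([0, 1])
\end{verbatim}
\end{framed}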
%================================================================================ %
\subsection{Vocabulary: classification and regression}
If the prediction task is to classify the observations into a finite set of labels, in other words to ``name'' the objects observed, the task is said to be a classification task. Conversely, if the goal is to predict a continuous target variable, it is said to be a regression task.

In scikit-learn, for classification tasks, \texttt{y} is a vector of integers.
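This distinction can be checked directly on the bundled datasets; the use of \texttt{load\_diabetes} as the regression example here is an illustrative choice:
\begin{framed}
\begin{verbatim}
>>> from sklearn import datasets
>>> datasets.load_iris().target.dtype.kind      # classification: integer labels
'i'
>>> datasets.load_diabetes().target.dtype.kind  # regression: continuous values
'f'
\end{verbatim}
\end{framed}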
%================================================================================ %
\subsection{Nearest neighbor and the curse of dimensionality}
\subsubsection{Classifying irises}
The iris dataset is a classification task consisting of identifying three different types of irises (Setosa, Versicolour, and Virginica) from the length and width of their petals and sepals:
\begin{framed}
\begin{verbatim}
>>> import numpy as np
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> iris_X = iris.data    # 150 observations with 4 features each
>>> iris_y = iris.target  # the species of each observation
>>> np.unique(iris_y)
array([0, 1, 2])
\end{verbatim}
\end{framed}
%============================================================================= %
\subsubsection{k-Nearest neighbors classifier}
The simplest possible classifier is the nearest neighbor: given a new observation \texttt{x\_test}, find in the training set (i.e., the data used to train the estimator) the observation with the closest feature vector, and predict its label, as sketched below.
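Continuing from the iris snippet above, here is a sketch of this idea; the random train/test split and the choice of \texttt{n\_neighbors=1} are illustrative, not prescribed by the text:
\begin{framed}
\begin{verbatim}
>>> # Shuffle the iris data and hold out the last 10 observations for testing
>>> np.random.seed(0)
>>> indices = np.random.permutation(len(iris_X))
>>> iris_X_train = iris_X[indices[:-10]]
>>> iris_y_train = iris_y[indices[:-10]]
>>> iris_X_test = iris_X[indices[-10:]]
>>> iris_y_test = iris_y[indices[-10:]]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=1)  # predict the label of the
...                                            # single closest training point
>>> knn = knn.fit(iris_X_train, iris_y_train)
>>> knn.predict(iris_X_test)  # compare with the held-out labels iris_y_test
\end{verbatim}
\end{framed}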
%============================================================================== %
\end{document}