Here are some review notes on Machine Learning.

There are several ways to measure the performance of an ML model. One is to use a loss function to measure the closeness between the predicted output and the ground truth. The second way evaluates the prediction not on a single test point but over all possible inputs $(X, Y) \sim P_{XY}$. Given a sample drawn randomly from this distribution, the performance of the model can be represented as the risk, which is the average loss:

$$R(f) = \mathbb{E}_{XY}[\text{loss}(Y, f(X))]$$

For regression, if the loss function is the square loss, then the risk is the mean squared error (MSE).
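To make this concrete, here is a minimal Monte Carlo sketch, assuming a hypothetical distribution $P_{XY}$ (Gaussian $X$, linear signal plus noise) and a hypothetical prediction rule `f`; averaging the square loss over many random draws approximates the risk:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """A hypothetical prediction rule; any function of x would do."""
    return 2.0 * x

# Hypothetical P_XY: X ~ N(0, 1), Y = 2X + Gaussian noise.
n = 100_000
X = rng.normal(size=n)
Y = 2.0 * X + rng.normal(scale=0.5, size=n)

# Averaging the square loss over many draws approximates the risk R(f);
# for square loss, this is exactly the mean squared error (MSE).
risk_estimate = np.mean((Y - f(X)) ** 2)
print(f"Estimated risk (MSE): {risk_estimate:.3f}")  # close to 0.25, the noise variance
```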

Bayes Optimal Rule

Ideal goal: to construct the prediction rule $f^*: \mathcal{X} \to \mathcal{Y}$, which means:

$$f^* = \arg\min_{f} \mathbb{E}_{XY}[\text{loss}(Y, f(X))]$$

The best possible performance, called the Bayes risk, satisfies:

$$R(f^*) \le R(f) \quad \text{for all } f$$

This optimal rule is not computable because it depends on the unknown $P_{XY}$.
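For square loss, the Bayes optimal rule is the conditional mean $f^*(x) = \mathbb{E}[Y \mid X = x]$. The sketch below uses a hypothetical synthetic distribution where this conditional mean is known in closed form ($\sin$), so we can check numerically that a competing rule does not beat it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical P_XY: Y = sin(X) + Gaussian noise, so E[Y | X = x] = sin(x).
n = 100_000
X = rng.uniform(-np.pi, np.pi, size=n)
Y = np.sin(X) + rng.normal(scale=0.3, size=n)

def bayes_rule(x):
    """f*(x) = E[Y | X = x]: the Bayes optimal rule under square loss."""
    return np.sin(x)

def other_rule(x):
    """An arbitrary competing prediction rule."""
    return 0.5 * x

bayes_risk = np.mean((Y - bayes_rule(X)) ** 2)  # near 0.09, the noise variance
other_risk = np.mean((Y - other_rule(X)) ** 2)  # strictly larger
print(f"Bayes risk ~ {bayes_risk:.3f}, other rule's risk ~ {other_risk:.3f}")
```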

Training process

When we talk about the performance of a learning algorithm, we are asking how well the algorithm does on average: over a test example drawn at random, which gives the risk, and over a set of training examples and labels $D_n = \{(X_i, Y_i)\}_{i=1}^n$ drawn at random, which gives the expected risk (aka generalization error).

$$\mathbb{E}_{D_n}[R(\hat{f}_n)] = \mathbb{E}_{D_n}\big[\mathbb{E}_{XY}[\text{loss}(Y, \hat{f}_n(X))]\big]$$
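As an illustration of this averaging over training sets, the sketch below (with a hypothetical distribution and a simple least-squares learner standing in for the algorithm) estimates the expected risk by repeatedly drawing $D_n$, fitting $\hat{f}_n$, and evaluating its risk on a large held-out sample:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw(n):
    """Draw n samples from a hypothetical P_XY: Y = 2X + Gaussian noise."""
    X = rng.normal(size=n)
    return X, 2.0 * X + rng.normal(scale=0.5, size=n)

def fit(X, Y):
    """A simple learning algorithm: least-squares line through the data."""
    slope, intercept = np.polyfit(X, Y, deg=1)
    return lambda x: slope * x + intercept

# Large sample standing in for the population P_XY when evaluating risk.
X_test, Y_test = draw(100_000)

# Average the risk of the fitted rule over many independent training sets D_n.
risks = []
for _ in range(200):
    f_hat = fit(*draw(n=30))
    risks.append(np.mean((Y_test - f_hat(X_test)) ** 2))
print(f"Estimated expected risk: {np.mean(risks):.3f}")
```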

The ideal goal of a learning problem is the Bayes optimal rule. In practice, however, the goal is: given $\{(X_i, Y_i)\}_{i=1}^n$, learn a prediction rule $\hat{f}_n: \mathcal{X} \to \mathcal{Y}$. Often,

$$\hat{f}_n = \arg\min_{f} \frac{1}{n} \sum_{i=1}^{n} \text{loss}(Y_i, f(X_i))$$

This is called the empirical risk minimizer. By the Law of Large Numbers, we have:

$$\frac{1}{n} \sum_{i=1}^{n} \text{loss}(Y_i, f(X_i)) \xrightarrow{\text{Law of Large Numbers}} \mathbb{E}_{XY}[\text{loss}(Y, f(X))]$$
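This convergence can be observed numerically. In the sketch below (same hypothetical distribution as before, with a fixed rule `f`), the empirical risk settles toward the true risk as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    """A fixed hypothetical prediction rule."""
    return 2.0 * x

# For each sample size n, compare the empirical risk on n random draws
# against the true risk E_XY[loss(Y, f(X))], which is 0.25 here
# (the noise variance, since f matches the signal exactly).
for n in [10, 100, 10_000, 1_000_000]:
    X = rng.normal(size=n)
    Y = 2.0 * X + rng.normal(scale=0.5, size=n)
    empirical_risk = np.mean((Y - f(X)) ** 2)
    print(f"n = {n:>9,}: empirical risk = {empirical_risk:.4f}")
```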
