Here are some review notes on Machine Learning.

There are several ways to measure the performance of an ML model. One is to use a loss function to measure the closeness between the predicted output and the ground truth. The second way evaluates the prediction not on a single test point but over all possible inputs $(X, Y) \sim P_{XY}$. Given a sample drawn randomly from this distribution, the performance of the model can be represented as the risk, which is the average loss:

$$R(f) = \mathbb{E}_{XY}[\text{loss}(Y, f(X))]$$

For regression, if the loss function is the square loss, then the risk is the mean squared error (MSE).
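To make this concrete, here is a minimal Monte Carlo sketch, assuming a hypothetical distribution $P_{XY}$ (Gaussian $X$, linear signal plus noise) and a hypothetical prediction rule `f`; averaging the square loss over many random draws approximates the risk:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """A hypothetical prediction rule; any function of x would do."""
    return 2.0 * x

# Hypothetical P_XY: X ~ N(0, 1), Y = 2X + Gaussian noise.
n = 100_000
X = rng.normal(size=n)
Y = 2.0 * X + rng.normal(scale=0.5, size=n)

# Averaging the square loss over many draws approximates the risk R(f);
# for square loss, this is exactly the mean squared error (MSE).
risk_estimate = np.mean((Y - f(X)) ** 2)
print(f"Estimated risk (MSE): {risk_estimate:.3f}")  # close to 0.25, the noise variance
```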

Bayes Optimal Rule

Ideal goal: to construct the prediction rule $f^*: \mathcal{X} \to \mathcal{Y}$, which means:

$$f^* = \arg\min_{f} \mathbb{E}_{XY}[\text{loss}(Y, f(X))]$$

The best possible performance, called the Bayes risk, satisfies:

$$R(f^*) \le R(f) \quad \text{for all } f$$

This optimal rule is not computable because it depends on the unknown $P_{XY}$.
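For square loss, the Bayes optimal rule is the conditional mean $f^*(x) = \mathbb{E}[Y \mid X = x]$. The sketch below uses a hypothetical synthetic distribution where this conditional mean is known in closed form ($\sin$), so we can check numerically that a competing rule does not beat it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical P_XY: Y = sin(X) + Gaussian noise, so E[Y | X = x] = sin(x).
n = 100_000
X = rng.uniform(-np.pi, np.pi, size=n)
Y = np.sin(X) + rng.normal(scale=0.3, size=n)

def bayes_rule(x):
    """f*(x) = E[Y | X = x]: the Bayes optimal rule under square loss."""
    return np.sin(x)

def other_rule(x):
    """An arbitrary competing prediction rule."""
    return 0.5 * x

bayes_risk = np.mean((Y - bayes_rule(X)) ** 2)  # near 0.09, the noise variance
other_risk = np.mean((Y - other_rule(X)) ** 2)  # strictly larger
print(f"Bayes risk ~ {bayes_risk:.3f}, other rule's risk ~ {other_risk:.3f}")
```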

Training process

When we talk about the performance of a learning algorithm, we are asking how well the algorithm does on average: over a test example drawn at random, which gives the risk, and over a set of training examples and labels $D_n = \{(X_i, Y_i)\}_{i=1}^n$ drawn at random, which gives the expected risk (aka generalization error).

$$\mathbb{E}_{D_n}[R(\hat{f}_n)] = \mathbb{E}_{D_n}\big[\mathbb{E}_{XY}[\text{loss}(Y, \hat{f}_n(X))]\big]$$
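As an illustration of this averaging over training sets, the sketch below (with a hypothetical distribution and a simple least-squares learner standing in for the algorithm) estimates the expected risk by repeatedly drawing $D_n$, fitting $\hat{f}_n$, and evaluating its risk on a large held-out sample:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw(n):
    """Draw n samples from a hypothetical P_XY: Y = 2X + Gaussian noise."""
    X = rng.normal(size=n)
    return X, 2.0 * X + rng.normal(scale=0.5, size=n)

def fit(X, Y):
    """A simple learning algorithm: least-squares line through the data."""
    slope, intercept = np.polyfit(X, Y, deg=1)
    return lambda x: slope * x + intercept

# Large sample standing in for the population P_XY when evaluating risk.
X_test, Y_test = draw(100_000)

# Average the risk of the fitted rule over many independent training sets D_n.
risks = []
for _ in range(200):
    f_hat = fit(*draw(n=30))
    risks.append(np.mean((Y_test - f_hat(X_test)) ** 2))
print(f"Estimated expected risk: {np.mean(risks):.3f}")
```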

The ideal goal of a learning problem is the Bayes optimal rule. In practice, however, the goal is: given $\{(X_i, Y_i)\}_{i=1}^n$, learn a prediction rule $\hat{f}_n: \mathcal{X} \to \mathcal{Y}$. Often,

$$\hat{f}_n = \arg\min_{f} \frac{1}{n} \sum_{i=1}^{n} \text{loss}(Y_i, f(X_i))$$

This is called the empirical risk minimizer. By the Law of Large Numbers, we have:

$$\frac{1}{n} \sum_{i=1}^{n} \text{loss}(Y_i, f(X_i)) \xrightarrow{\text{Law of Large Numbers}} \mathbb{E}_{XY}[\text{loss}(Y, f(X))]$$
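This convergence can be observed numerically. In the sketch below (same hypothetical distribution as before, with a fixed rule `f`), the empirical risk settles toward the true risk as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    """A fixed hypothetical prediction rule."""
    return 2.0 * x

# For each sample size n, compare the empirical risk on n random draws
# against the true risk E_XY[loss(Y, f(X))], which is 0.25 here
# (the noise variance, since f matches the signal exactly).
for n in [10, 100, 10_000, 1_000_000]:
    X = rng.normal(size=n)
    Y = 2.0 * X + rng.normal(scale=0.5, size=n)
    empirical_risk = np.mean((Y - f(X)) ** 2)
    print(f"n = {n:>9,}: empirical risk = {empirical_risk:.4f}")
```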
