In machine learning, weights and biases are adjusted using large amounts of training data to minimize the empirical loss function. A fundamental question is: how far is the empirical optimum from the ideal (population) optimum? In this report, I will introduce the foundational work of Vapnik and Talagrand on this question and extend it to the case of unbounded loss functions. The key tools are Bernstein's inequality, empirical process theory, and the VC dimension or metric entropy.
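To fix ideas, the gap in question can be written as follows (a standard formulation, assuming an i.i.d. sample and a hypothesis class $\mathcal{F}$; the notation is illustrative, not taken from the source):

```latex
% Empirical risk over an i.i.d. sample (x_1, y_1), ..., (x_n, y_n),
% and the population (expected) risk:
R_n(f) = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr),
\qquad
R(f) = \mathbb{E}\,\ell\bigl(f(X), Y\bigr).

% The empirical minimizer and the excess risk it incurs, bounded by
% the uniform deviation of empirical from population risk:
\hat{f}_n = \arg\min_{f \in \mathcal{F}} R_n(f),
\qquad
R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f)
\;\le\; 2 \sup_{f \in \mathcal{F}} \bigl| R_n(f) - R(f) \bigr|.
```

Controlling the supremum on the right is precisely where empirical process theory and complexity measures such as VC dimension or metric entropy enter.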