Loss function
Definition
Formally, we begin by considering a family of probability distributions for a random variable X, indexed by a parameter θ taking values in some set Θ.
More intuitively, we can think of X as our data, perhaps X = (X₁, …, Xₙ), where the Xᵢ are i.i.d. X is the set of things the decision rule will be making decisions on. There are various possible probability models for the data X, which the decision function can use in making decisions. For a finite collection of models, we can thus think of θ as the index into this family of probability models. For an infinite family of models, θ is a set of parameters to the family of distributions.
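The indexing role of θ can be sketched in a small example, a finite family of candidate coin models versus a continuously parameterized one (the names `finite_family` and `model`, and the Bernoulli setting, are illustrative choices, not from the text):

```python
import numpy as np

# Finite case: theta in {0, 1, 2} indexes three candidate coin biases.
finite_family = {0: 0.1, 1: 0.5, 2: 0.9}   # theta -> Bernoulli success probability

# Infinite case: theta is itself the parameter of the distribution,
# here any bias in [0, 1].
def model(theta, n, rng):
    """Draw X = (X_1, ..., X_n), i.i.d. Bernoulli(theta)."""
    return rng.binomial(1, theta, size=n)

rng = np.random.default_rng(0)
x = model(finite_family[1], n=5, rng=rng)   # data from the theta = 1 model
```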
On a more practical note, it is important to understand that, while it is tempting to think of loss functions as necessarily parametric (since they appear to take θ as a parameter), θ need not be finite-dimensional: if the family of probability distributions is uncountably infinite, θ indexes an uncountably infinite space, so the setting is not parametric in the usual sense.
From here, given a set A of possible actions, a decision rule is a function δ : 𝒳 → A, where 𝒳 denotes the set of possible values of X.
A loss function is a real-valued, lower-bounded function L on Θ × A. The value L(θ, δ(X)) is the cost of taking action δ(X) when the parameter is θ.[1]
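A minimal numerical sketch of these definitions, under assumed choices not in the text (Bernoulli data, squared-error loss, and the sample mean as the decision rule):

```python
import numpy as np

# Estimating the bias theta of a coin: the data X are n i.i.d.
# Bernoulli(theta) flips, the action space A is [0, 1] (report an
# estimate), and the decision rule delta is the sample mean.

rng = np.random.default_rng(0)

def delta(x):
    """Decision rule delta : X -> A; here, report the sample mean."""
    return x.mean()

def squared_error_loss(theta, action):
    """L(theta, a) = (theta - a)^2, a real-valued, lower-bounded loss."""
    return (theta - action) ** 2

theta = 0.3                                  # true parameter (unknown in practice)
x = rng.binomial(1, theta, size=100)         # observed data X
cost = squared_error_loss(theta, delta(x))   # L(theta, delta(X))
```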
Decision rules
A decision rule makes a choice using an optimality criterion. Some commonly used criteria are:
Minimax: Choose the decision rule with the lowest worst loss, that is, minimize the worst-case (maximum possible) loss: choose δ to minimize max over θ ∈ Θ of R(θ, δ), where R(θ, δ) denotes the risk (expected loss) of the rule δ under θ.
Invariance: Choose the best decision rule among those satisfying an invariance requirement.
Choose the decision rule with the lowest average loss, i.e. minimize the expected value of the loss function, E[L(θ, δ(X))].
Expected loss
The value of the loss function is itself a random quantity, because it depends on the outcome of the random variable X. Both frequentist and Bayesian statistical theory involve making a decision based on the expected value of the loss function; however, this quantity is defined differently under the two paradigms.
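The two expectations can be made concrete side by side. In the sketch below (a hypothetical Bernoulli example with squared-error loss and a uniform Beta(1, 1) prior, all assumed for illustration), the frequentist expectation fixes θ and averages over the data, while the Bayesian expectation fixes the observed data and averages over the posterior for θ:

```python
# Observed data: s successes in n flips.
n, s = 10, 7

# Frequentist: fix theta, average the loss over the data distribution.
# For the sample mean under squared-error loss this is theta*(1-theta)/n.
theta0 = 0.5
frequentist_risk = theta0 * (1 - theta0) / n

# Bayesian: fix the data, average the loss over the posterior,
# theta | x ~ Beta(s + 1, n - s + 1) under a uniform prior.
alpha, beta = s + 1, n - s + 1
post_mean = alpha / (alpha + beta)
post_var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

def posterior_expected_loss(a):
    """E[(theta - a)^2 | x] = Var(theta|x) + (E[theta|x] - a)^2."""
    return post_var + (post_mean - a) ** 2

# The posterior expected squared-error loss is minimized by the
# posterior mean, i.e. the Bayes action under this loss.
bayes_action = post_mean
```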
Frequentist risk