The Beast of Bias (Discovering Statistics)
Exploring Data: The Beast of Bias
Sources of Bias
A bit of revision. We've seen that having collected data we usually fit a model that represents the hypothesis that we
want to test. This model is usually a linear model, which takes the form of:
$$\text{outcome}_i = b_1 X_{1i} + b_2 X_{2i} + \cdots + b_n X_{ni} + \text{error}_i \qquad \text{(Eq. 1)}$$
Therefore, we predict an outcome variable from one or more predictor variables (the Xs) and parameters (the bs in the
equation) that tell us something about the relationship between each predictor and the outcome variable. Finally, the
model will not predict the outcome perfectly, so for each observation there will be some error.
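To make the pieces of Eq. 1 concrete, here is a minimal sketch in Python. The parameter values, predictor values and outcomes are all hypothetical numbers invented for illustration; the point is only that each observation's error is what is left over after the model part has made its prediction.

```python
# Hypothetical parameter values and data, chosen only for illustration.
b = [0.5, 2.0]             # the bs: b_1 and b_2
X = [                      # each row is one observation: (X_1i, X_2i)
    (1.0, 3.0),
    (2.0, 1.0),
    (4.0, 2.0),
]
outcome = [7.0, 2.5, 6.5]  # observed outcome for each observation


def predict(row, params):
    """Model part of Eq. 1: b_1*X_1i + b_2*X_2i + ... + b_n*X_ni."""
    return sum(b_j * x_j for b_j, x_j in zip(params, row))


# Predicted outcome for each observation.
predicted = [predict(row, b) for row in X]

# Error for each observation: observed outcome minus model prediction.
errors = [y - y_hat for y, y_hat in zip(outcome, predicted)]

print(predicted)
print(errors)
```

None of the errors is zero here, which is the sense in which the model does not predict the outcome perfectly.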
When we fit a model, we often estimate the parameters (b) using the method of least squares (known as ordinary least
squares or OLS). We're not interested in our sample so much as a general population, so we use the sample data to
estimate the value of the parameters in the population (that's why we call them estimates rather than values). When
we estimate a parameter we also compute an estimate of how well it represents the population such as a standard
error or confidence interval. We can test hypotheses about these parameters by computing test statistics and their
associated probabilities (p-values). Therefore, when we think about bias, we need to think about it within three
contexts:
1. Things that bias the parameter estimates.
2. Things that bias standard errors and confidence intervals.
3. Things that bias test statistics and p-values.
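The three contexts above correspond to three kinds of quantity that come out of fitting a model. The sketch below computes each of them for a single predictor using only the Python standard library. It rests on several assumptions that are not in the text: the data are made up, the model includes an intercept (called b0 here), and the confidence interval and p-value use a normal approximation because the standard library has no t-distribution. With a sample this small, a t-distribution with n - 2 degrees of freedom would normally be used instead, so the interval and p-value shown are rougher than a statistics package would report.

```python
from statistics import NormalDist, mean


def ols_simple(x, y):
    """OLS fit of y = b0 + b1*x + error for one predictor (illustrative sketch)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # 1. Parameter estimates (least squares).
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # 2. Standard error and confidence interval for b1.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance estimate
    se_b1 = (mse / sxx) ** 0.5
    z_crit = NormalDist().inv_cdf(0.975)            # assumption: normal, not t
    ci_b1 = (b1 - z_crit * se_b1, b1 + z_crit * se_b1)

    # 3. Test statistic and two-sided p-value for H0: b1 = 0.
    stat = b1 / se_b1
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))  # assumption: normal, not t

    return {"b0": b0, "b1": b1, "se_b1": se_b1,
            "ci_b1": ci_b1, "stat": stat, "p_value": p_value}


# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 5.0, 4.0, 9.0, 8.0]

result = ols_simple(x, y)
for name, value in result.items():
    print(name, value)
```

Anything that distorts the sums feeding b1 affects the estimate itself; anything that distorts the residual variance affects the standard error, and through it the confidence interval, the test statistic and the p-value.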