- Uliana Plotnikova (4.3) http://rpubs.com/uplotnik/475303
- Mary Anna Kivenson (4.23) http://rpubs.com/mkivenson/475036
March 13, 2019
There are two competing hypotheses: the null and the alternative. In a hypothesis test, we make a decision about which might be true, but our choice might be incorrect.
| | fail to reject H_{0} | reject H_{0} |
|---|---|---|
| H_{0} true | ✔ | Type I Error |
| H_{A} true | Type II Error | ✔ |
If we again think of a hypothesis test as a criminal trial, then it makes sense to frame the verdict in terms of the null and alternative hypotheses:
H_{0} : Defendant is innocent
H_{A} : Defendant is guilty
Which type of error is being committed in the following circumstances?
Which error do you think is the worse error to make?
(cv <- qnorm(0.05, mean=0, sd=1, lower.tail=FALSE))
## [1] 1.644854
PlotDist(alpha=0.05, distribution='normal', alternative='greater')
abline(v=cv, col='blue')
cord.x1 <- c(-5, seq(from = -5, to = cv, length.out = 100), cv)
cord.y1 <- c(0, dnorm(mean=cv, x=seq(from=-5, to=cv, length.out = 100)), 0)
curve(dnorm(x, mean=cv), from = -5, to = 5, n = 1000, col = "black", lty = 1, lwd = 2, ylab = "Density", xlab = "Values")
polygon(x = cord.x1, y = cord.y1, col = 'lightgreen')
abline(v=cv, col='blue')
pnorm(cv, mean=cv, lower.tail = FALSE)
## [1] 0.5
mu <- 2.5
(cv <- qnorm(0.05, mean=0, sd=1, lower.tail=FALSE))
## [1] 1.644854
Type I Error
pnorm(mu, mean=0, sd=1, lower.tail=FALSE)
## [1] 0.006209665
Type II Error
pnorm(cv, mean=mu, lower.tail = TRUE)
## [1] 0.1962351
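As a rough check on the critical value and the Type II error probability above, both error rates can be approximated by simulation. This is a minimal sketch, assuming a single observation drawn from a normal distribution with sd = 1. The names `alpha`, `n_sims`, `null_draws`, `alt_draws`, `type1_rate`, and `type2_rate` and the seed are illustrative and not from the original slides. Because `cv` was chosen as the upper 5% point of the null distribution, the rejection rate under the null should land near 0.05, while `pnorm(mu, mean=0, ...)` above is the tail area beyond `mu` itself.

```r
set.seed(606)  # arbitrary seed, used only for reproducibility

alpha  <- 0.05
mu     <- 2.5   # mean under the alternative, as in the slides
cv     <- qnorm(alpha, mean = 0, sd = 1, lower.tail = FALSE)
n_sims <- 1e5   # assumed number of simulated observations

# One observation per simulated "study" under each hypothesis
null_draws <- rnorm(n_sims, mean = 0,  sd = 1)
alt_draws  <- rnorm(n_sims, mean = mu, sd = 1)

# Type I: H0 is true but the observation exceeds the critical value
type1_rate <- mean(null_draws > cv)

# Type II: HA is true but the observation does not exceed the critical value
type2_rate <- mean(alt_draws <= cv)

c(type1 = type1_rate, type2 = type2_rate)
```

The simulated Type II rate should be close to `pnorm(cv, mean=mu, lower.tail = TRUE)`, up to simulation noise.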
Visualizing Type I and Type II errors: http://shiny.albany.edu/stat/betaprob/
Check out this page: https://www.openintro.org/stat/why05.php
See also:
Kelly M. (2013). Emily Dickinson and monkeys on the stair, or: What is the significance of the 5% significance level? Significance 10:5.
The boot R package provides a framework for doing bootstrapping: https://www.statmethods.net/advstats/bootstrapping.html
Define our population with a uniform distribution.
n <- 1e5
pop <- runif(n, 0, 1)
mean(pop)
## [1] 0.5008915
We observe one random sample from the population.
samp1 <- sample(pop, size = 50)
boot.samples <- numeric(1000) # 1,000 bootstrap samples
for(i in seq_along(boot.samples)) {
    tmp <- sample(samp1, size = length(samp1), replace = TRUE)
    boot.samples[i] <- mean(tmp)
}
head(boot.samples)
## [1] 0.5771871 0.5807470 0.5244116 0.5417600 0.5402336 0.5336786
# COL is assumed here to be the color palette from the openintro package (library(openintro))
d <- density(boot.samples)
h <- hist(boot.samples, plot=FALSE)
hist(boot.samples, main='Bootstrap Distribution', xlab="", freq=FALSE, ylim=c(0, max(d$y, h$density)+.5), col=COL[1,2], border = "white", cex.main = 1.5, cex.axis = 1.5, cex.lab = 1.5)
lines(d, lwd=3)
c(mean(boot.samples) - 1.96 * sd(boot.samples), mean(boot.samples) + 1.96 * sd(boot.samples))
## [1] 0.4703869 0.6215272
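The interval above relies on a normal approximation (mean ± 1.96 standard deviations of the bootstrap means). A common alternative, not used in the original slides, is the percentile interval, which reads the 2.5% and 97.5% quantiles directly off the bootstrap distribution. The sketch below regenerates the population and sample so it can be run on its own; the seed and the names `ci_normal` and `ci_percentile` are assumptions for illustration, so the numbers will differ from the output shown above.

```r
set.seed(606)  # arbitrary seed, used only for reproducibility

pop   <- runif(1e5, 0, 1)        # uniform population, as in the slides
samp1 <- sample(pop, size = 50)  # one observed sample

# 1,000 bootstrap means, written with replicate() instead of a for loop
boot.samples <- replicate(
  1000,
  mean(sample(samp1, size = length(samp1), replace = TRUE))
)

# Normal-approximation interval (same formula as above)
ci_normal <- mean(boot.samples) + c(-1.96, 1.96) * sd(boot.samples)

# Percentile interval: middle 95% of the bootstrap distribution
ci_percentile <- quantile(boot.samples, probs = c(0.025, 0.975))

ci_normal
ci_percentile
```

When the bootstrap distribution is roughly symmetric and bell-shaped, as it typically is for a mean, the two intervals tend to agree closely.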
boot.samples.median <- numeric(1000) # 1,000 bootstrap samples
for(i in seq_along(boot.samples.median)) {
    tmp <- sample(samp1, size = length(samp1), replace = TRUE)
    boot.samples.median[i] <- median(tmp) # NOTICE WE ARE NOW USING THE median FUNCTION!
}
head(boot.samples.median)
## [1] 0.5183295 0.5857431 0.4789167 0.6613506 0.4997448 0.5135616
95% confidence interval for the median
c(mean(boot.samples.median) - 1.96 * sd(boot.samples.median), mean(boot.samples.median) + 1.96 * sd(boot.samples.median))
## [1] 0.4398765 0.6971332
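The hand-written loops above can also be expressed with the boot package mentioned earlier. This is a minimal sketch rather than the slides' own code. It assumes the boot package is installed (it ships with standard R distributions), draws a stand-in sample directly from a uniform distribution, and uses the illustrative names `median_stat`, `boot_out`, and `ci`. `boot()` expects a statistic function that takes the data and a vector of resampled indices.

```r
library(boot)

set.seed(606)  # arbitrary seed, used only for reproducibility

# Stand-in for samp1 from the slides: 50 draws from Uniform(0, 1)
samp1 <- runif(50, 0, 1)

# boot() passes the original data plus the indices of each resample
median_stat <- function(data, indices) {
  median(data[indices])
}

# 1,000 bootstrap replicates of the median
boot_out <- boot(data = samp1, statistic = median_stat, R = 1000)

# Bootstrap standard error of the median
sd(boot_out$t[, 1])

# Percentile-based 95% confidence interval for the median
ci <- boot.ci(boot_out, conf = 0.95, type = "perc")
ci
```

Swapping `median` for `mean` inside `median_stat` gives the bootstrap for the mean with no other changes.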