February 20, 2019

## Announcements

• For labs going forward, if you find a particular block produces a lot of output, feel free to comment it out and add a note like: “Omitted to save space in output”
• Really happy to see so many of you starting to use ggplot2!
• A handy tool for viewing HTML files stored in a GitHub repo: go to https://htmlpreview.github.io, paste in the URL to the HTML file in your repo, and it (usually) renders as a webpage.
• Random numbers and seeds: https://data606.net/post/2019-02-17-random_numbers_and_seeds/
• A few notes about colors in plots (my general advice, not hard rules).

## Coin Tosses Revisited

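```r
coins <- sample(c(-1, 1), 100, replace = TRUE)   # 100 fair tosses: -1 or +1
plot(1:length(coins), cumsum(coins), type = 'l') # running total after each toss
abline(h = 0)                                    # reference line at zero

cumsum(coins)[length(coins)]                     # final position after 100 tosses
## [1] -12
```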
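Because every step is -1 or +1, the last element of the running total is simply the sum of all tosses. A minimal sketch, assuming we want a reproducible run; the seed value 2019 and the names `path` and `final_pos` are arbitrary choices for illustration:

```r
set.seed(2019)                                   # arbitrary seed, so the same tosses come out each run
coins <- sample(c(-1, 1), 100, replace = TRUE)   # 100 fair tosses
path <- cumsum(coins)                            # position after each toss
final_pos <- path[length(path)]                  # position after the last toss

# The final position is just the total of all tosses
identical(final_pos, sum(coins))
```

Note that with 100 tosses of ±1 the final position is always an even number.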

## Many Random Samples

```r
samples <- rep(NA, 1000)            # one slot per simulated game
for(i in seq_along(samples)) {
    coins <- sample(c(-1, 1), 100, replace = TRUE)  # 100 tosses
    samples[i] <- cumsum(coins)[length(coins)]      # final position of this game
}
head(samples)
## [1]  -8   8  -2 -10  -8   6
```
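The same simulation can be written without an explicit loop using base R's `replicate()`, which evaluates an expression a given number of times and collects the results. A sketch, assuming an arbitrary seed; it should produce an equivalent distribution of final positions:

```r
set.seed(2019)  # arbitrary seed for reproducibility
# Each replication: toss 100 coins and keep only the final position (the sum)
samples <- replicate(1000, sum(sample(c(-1, 1), 100, replace = TRUE)))
head(samples)
```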

## Histogram of Many Random Samples

```r
hist(samples)
```

## Properties of Distribution

```r
(m.sam <- mean(samples))
## [1] 0.162
(s.sam <- sd(samples))
## [1] 9.883088
```
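These simulated values can be checked against theory. Each toss has mean 0 and variance 1 (both outcomes square to 1), and the tosses are independent, so the sum of 100 tosses has mean 0 and variance 100, i.e. a standard deviation of 10. That is close to the simulated 0.162 and 9.88. A small sketch of the arithmetic; the variable names are illustrative:

```r
n_tosses  <- 100
outcomes  <- c(-1, 1)                         # equally likely results of one toss
toss_mean <- mean(outcomes)                   # 0
toss_var  <- mean(outcomes^2) - toss_mean^2   # E[X^2] - E[X]^2 = 1

theory_mean <- n_tosses * toss_mean           # means add: 0
theory_sd   <- sqrt(n_tosses * toss_var)      # variances add for independent tosses: sqrt(100) = 10
c(mean = theory_mean, sd = theory_sd)
```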

## Properties of Distribution (cont.)

```r
within1sd <- samples[samples >= m.sam - s.sam & samples <= m.sam + s.sam]
length(within1sd) / length(samples)
## [1] 0.677
within2sd <- samples[samples >= m.sam - 2 * s.sam & samples <= m.sam + 2 * s.sam]
length(within2sd) / length(samples)
## [1] 0.951
within3sd <- samples[samples >= m.sam - 3 * s.sam & samples <= m.sam + 3 * s.sam]
length(within3sd) / length(samples)
## [1] 0.999
```
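The simulated proportions (0.677, 0.951, 0.999) are close to what a normal distribution predicts for 1, 2 and 3 standard deviations, often summarized as the 68-95-99.7 rule. A quick check with base R's `pnorm()`:

```r
k <- 1:3
# P(-k < Z < k) for a standard normal Z
within_k <- pnorm(k) - pnorm(-k)
round(within_k, 3)
# [1] 0.683 0.954 0.997
```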

## Standard Normal Distribution

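The normal density with mean $\mu$ and standard deviation $\sigma$ is

$f\left( x \mid \mu, \sigma \right) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$

The standard normal distribution is the special case $\mu = 0$, $\sigma = 1$.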

```r
x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)   # standard normal density
plot(x, y, type = "l", lwd = 2, xlim = c(-3.5, 3.5), ylab = '', xlab = 'z-score', yaxt = 'n')
```
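To connect the formula to `dnorm()`, here is the density written out by hand. This is a sketch for illustration only; `normal_density` is a hypothetical helper, and in practice `dnorm()` is what you would use:

```r
# Normal density typed in directly from the formula f(x | mu, sigma)
normal_density <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-4, 4, length.out = 200)
all.equal(normal_density(x), dnorm(x))                         # standard normal
all.equal(normal_density(24, mu = 21, sigma = 5), dnorm(24, mean = 21, sd = 5))
```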

## What’s the probability of ending with less than 15?

```r
pnorm(15, mean = mean(samples), sd = sd(samples))
## [1] 0.9333678
```
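Passing `mean` and `sd` to `pnorm()` is equivalent to first converting 15 to a z-score and then using the standard normal. A sketch that regenerates its own `samples` (with an arbitrary seed) so it runs on its own:

```r
set.seed(2019)  # arbitrary seed; regenerates 1000 simulated final positions
samples <- replicate(1000, sum(sample(c(-1, 1), 100, replace = TRUE)))

z <- (15 - mean(samples)) / sd(samples)                 # standardize 15
p_direct <- pnorm(15, mean = mean(samples), sd = sd(samples))
p_z      <- pnorm(z)                                    # standard normal (mean 0, sd 1)
all.equal(p_direct, p_z)
```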

## What’s the probability of ending with more than 15?

```r
1 - pnorm(15, mean = mean(samples), sd = sd(samples))
## [1] 0.06663219
```
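`pnorm()` can return the upper tail directly with `lower.tail = FALSE`, and the normal answer can be compared with the fraction of simulated games that actually ended above 15. A sketch with an arbitrary seed; the two numbers should be similar but need not match exactly, since the final positions are discrete (always even, so `samples > 15` counts the same games as `samples >= 16`):

```r
set.seed(2019)  # arbitrary seed
samples <- replicate(1000, sum(sample(c(-1, 1), 100, replace = TRUE)))
m <- mean(samples)
s <- sd(samples)

p_upper     <- pnorm(15, mean = m, sd = s, lower.tail = FALSE)  # same as 1 - pnorm(...)
p_empirical <- mean(samples > 15)  # share of simulated games ending above 15
c(normal = p_upper, simulated = p_empirical)
```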

## Comparing Scores on Different Scales

SAT scores are distributed nearly normally with mean 1500 and standard deviation 300. ACT scores are distributed nearly normally with mean 21 and standard deviation 5. A college admissions officer wants to determine which of the two applicants scored better on their standardized test relative to the other test takers: Pam, who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?

## Z-Scores

• Z-scores are often called standard scores:

$Z = \frac{\text{observation} - \text{mean}}{\text{SD}}$

• The z-scores of any set of observations have mean 0 and standard deviation 1.
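A quick way to see this in R: standardize some values by hand and compare with base R's `scale()`, which centers by the mean and divides by the standard deviation. A sketch; the `scores` are hypothetical simulated SAT-like values, not real data:

```r
set.seed(2019)                                   # arbitrary seed
scores <- rnorm(500, mean = 1500, sd = 300)      # hypothetical SAT-like scores

z <- (scores - mean(scores)) / sd(scores)        # z-score of every observation
c(mean = mean(z), sd = sd(z))                    # mean is 0 up to rounding error, sd is 1

all.equal(z, as.vector(scale(scores)))           # scale() returns a matrix, so flatten it
```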

Converting Pam and Jim’s scores to z-scores:

$Z_{Pam} = \frac{1800 - 1500}{300} = 1$

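$Z_{Jim} = \frac{24-21}{5} = 0.6$

Pam's score is 1 standard deviation above the SAT mean, while Jim's is only 0.6 standard deviations above the ACT mean, so Pam did better relative to the other test takers.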
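The same comparison in R, assuming both tests are normally distributed as stated. `z_score` is a hypothetical helper; `pnorm()` then gives the approximate share of test takers each applicant outscored:

```r
# Hypothetical helper: how many SDs an observation lies from the mean
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_pam <- z_score(1800, mu = 1500, sigma = 300)  # 1
z_jim <- z_score(24, mu = 21, sigma = 5)        # 0.6

# Approximate percentile of each score: about 0.84 for Pam, about 0.73 for Jim
pnorm(c(Pam = z_pam, Jim = z_jim))
```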