February 20, 2019

Announcements

  • For labs going forward, if you find a particular block produces a lot of output, feel free to comment it out and add a note like: “Omitted to save space in output”
    • Really happy to see so many of you starting to use ggplot2!
  • GitHub has a great feature that lets you render HTML pages from a GitHub repo. Go to https://htmlpreview.github.io, paste in the URL of the HTML file in your repo, and it (usually) renders as a webpage.
  • Random numbers and seeds: https://data606.net/post/2019-02-17-random_numbers_and_seeds/
  • A few notes about colors in plots (my general advice, not rules per se).

Presentations

Coin Tosses Revisited

coins <- sample(c(-1,1), 100, replace=TRUE)
plot(1:length(coins), cumsum(coins), type='l')
abline(h=0)

cumsum(coins)[length(coins)]
## [1] -12
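
Because the tosses are random, the final value changes on every run. A minimal sketch of making the draw reproducible with `set.seed()`; the seed value 2112 and the names `coins_a` and `coins_b` are arbitrary choices for illustration, not values from these slides:

```r
# Fix the seed so the same 100 tosses are drawn each time.
# The seed value 2112 is an arbitrary, illustrative choice.
set.seed(2112)
coins_a <- sample(c(-1, 1), 100, replace = TRUE)

# Resetting to the same seed reproduces the same sequence.
set.seed(2112)
coins_b <- sample(c(-1, 1), 100, replace = TRUE)

# TRUE: both draws are identical.
identical(coins_a, coins_b)

# The final position of the walk is simply the sum of all steps.
sum(coins_a) == cumsum(coins_a)[length(coins_a)]
```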

Many Random Samples

samples <- rep(NA, 1000)
for(i in seq_along(samples)) {
    coins <- sample(c(-1,1), 100, replace=TRUE)
    samples[i] <- cumsum(coins)[length(coins)]
}
head(samples)
## [1]  -8   8  -2 -10  -8   6
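
The loop above can also be written with `replicate()`. This is only a sketch of an equivalent approach; the name `samples_alt` and the seed are assumptions made for illustration:

```r
# Arbitrary seed, only so the sketch is reproducible.
set.seed(1)

# Repeat the experiment 1000 times: toss 100 coins and keep the final total.
# sum(x) gives the same value as cumsum(x)[length(x)].
samples_alt <- replicate(1000, sum(sample(c(-1, 1), 100, replace = TRUE)))

head(samples_alt)
```

Every value is even, because 100 steps of +1 or -1 always sum to an even number.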

Histogram of Many Random Samples

hist(samples)

Properties of Distribution

(m.sam <- mean(samples))
## [1] 0.162
(s.sam <- sd(samples))
## [1] 9.883088
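
These values can be compared with what theory predicts. Each toss is +1 or -1 with equal probability, so a single step has mean 0 and variance 1; assuming the 100 tosses are independent, their sum has mean 0 and standard deviation sqrt(100) = 10. A short sketch of that arithmetic (variable names are illustrative):

```r
n_tosses <- 100
step_values <- c(-1, 1)   # equally likely outcomes of one toss

# Mean and (population) variance of a single step.
step_mean <- mean(step_values)
step_var <- mean((step_values - step_mean)^2)

# For a sum of independent steps, means add and variances add.
theoretical_mean <- n_tosses * step_mean
theoretical_sd <- sqrt(n_tosses * step_var)

c(mean = theoretical_mean, sd = theoretical_sd)
## mean   sd
##    0   10
```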

Properties of Distribution (cont.)

within1sd <- samples[samples >= m.sam - s.sam & samples <= m.sam + s.sam]
length(within1sd) / length(samples)
## [1] 0.677
within2sd <- samples[samples >= m.sam - 2 * s.sam & samples <= m.sam + 2* s.sam]
length(within2sd) / length(samples)
## [1] 0.951
within3sd <- samples[samples >= m.sam - 3 * s.sam & samples <= m.sam + 3 * s.sam]
length(within3sd) / length(samples)
## [1] 0.999
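
For comparison, here is a sketch of the proportions a normal distribution puts within 1, 2, and 3 standard deviations of its mean, computed with `pnorm()` (the names `k` and `normal_share` are illustrative):

```r
k <- 1:3

# Area under the standard normal curve between -k and +k.
normal_share <- pnorm(k) - pnorm(-k)

round(normal_share, 3)
## [1] 0.683 0.954 0.997
```

The simulated proportions above (0.677, 0.951, 0.999) are close to these values.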

Standard Normal Distribution

\[ f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]

x <- seq(-4, 4, length = 200)
y <- dnorm(x, mean = 0, sd = 1)
plot(x, y, type = "l", lwd = 2, xlim = c(-3.5,3.5), ylab='', xlab='z-score', yaxt='n')
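
As a check on the density formula, it can be coded directly and compared with `dnorm()`. This is a sketch; `normal_pdf` is a hypothetical helper written for this comparison, not a base R function:

```r
# Normal density written out from the formula (hypothetical helper).
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-4, 4, length.out = 200)

# TRUE: the hand-written formula agrees with dnorm() for mean 0, sd 1.
all.equal(normal_pdf(x), dnorm(x, mean = 0, sd = 1))

# It also agrees for other parameters, e.g. mean 1500 and sd 300.
all.equal(normal_pdf(1800, mu = 1500, sigma = 300),
          dnorm(1800, mean = 1500, sd = 300))
```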

Standard Normal Distribution

What’s the likelihood of ending with less than 15?

pnorm(15, mean=mean(samples), sd=sd(samples))
## [1] 0.9333678

What’s the likelihood of ending with more than 15?

1 - pnorm(15, mean=mean(samples), sd=sd(samples))
## [1] 0.06663219
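
The `pnorm()` answers are normal approximations. As a cross-check, the same probabilities can be sketched exactly with the binomial distribution, assuming fair and independent tosses. With h heads out of 100, the final total is 2h - 100, so "less than 15" means h <= 57 (variable names are illustrative):

```r
n_tosses <- 100

# final total = heads - tails = 2 * heads - n_tosses
# final < 15  <=>  heads < 57.5  <=>  heads <= 57
max_heads <- 57

# Exact probability of ending with less than 15.
p_less_exact <- pbinom(max_heads, size = n_tosses, prob = 0.5)

# Exact probability of ending with more than 15
# (a total of exactly 15 is impossible, since the total is always even).
p_more_exact <- 1 - p_less_exact

c(less = p_less_exact, more = p_more_exact)
```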

Comparing Scores on Different Scales

SAT scores are distributed nearly normally with mean 1500 and standard deviation 300. ACT scores are distributed nearly normally with mean 21 and standard deviation 5. A college admissions officer wants to determine which of the two applicants scored better on their standardized test with respect to the other test takers: Pam, who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?

Z-Scores

  • Z-scores are often called standard scores:

\[ Z = \frac{\text{observation} - \text{mean}}{\text{SD}} \]

  • Z-scores have a mean of 0 and a standard deviation of 1.
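
A quick sketch of that property on made-up data: after subtracting the mean and dividing by the standard deviation, the result has mean 0 and standard deviation 1. The simulated `scores` vector and the seed are assumptions for illustration only:

```r
# Arbitrary seed and simulated data, for illustration only.
set.seed(606)
scores <- rnorm(500, mean = 1500, sd = 300)

# Standardize: subtract the mean, divide by the standard deviation.
z <- (scores - mean(scores)) / sd(scores)

# Mean is 0 and SD is 1, up to floating-point rounding.
round(c(mean = mean(z), sd = sd(z)), 10)
```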

Converting Pam and Jim’s scores to z-scores:

\[ Z_{\text{Pam}} = \frac{1800 - 1500}{300} = 1 \]

\[ Z_{\text{Jim}} = \frac{24 - 21}{5} = 0.6 \]
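
The same calculation can be sketched in R. `z_score` is a hypothetical helper defined here, and the `pnorm()` step assumes both tests are exactly normal, which the problem only states approximately:

```r
# Hypothetical helper: standardize an observation.
z_score <- function(observation, mean, sd) {
  (observation - mean) / sd
}

z_pam <- z_score(1800, mean = 1500, sd = 300)  # SAT
z_jim <- z_score(24, mean = 21, sd = 5)        # ACT

c(Pam = z_pam, Jim = z_jim)
## Pam Jim
## 1.0 0.6

# Approximate share of test takers each applicant outscored.
round(pnorm(c(z_pam, z_jim)), 3)
## [1] 0.841 0.726
```

Pam's score is 1 standard deviation above the SAT mean and Jim's is 0.6 standard deviations above the ACT mean, so Pam did better relative to the other test takers.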

Standard Normal Parameters