STAT 3360 Notes
1 One-Population Inference: Population Mean
1.1 Population Standard Deviation Known
1.1.1 Assumptions and Notations
- X: the random variable following a Normal distribution;
- μ: the unknown population mean of X;
- σ: the known population standard deviation of X;
- X1,X2,…,Xn: the sample of X;
- n: sample size;
- ˉX: sample mean;
- s: sample standard deviation;
- Z: the standard Normal random variable;
- Zα: the number such that P(Z>Zα)=α;
1.1.2 Confidence Interval
- This topic is already covered in a previous section titled "Estimating Population Mean". For completeness, let's repeat the relevant procedure and formulae here.
- When the population standard deviation of X, denoted by σ, is known, at Confidence Level 1−α, the Confidence Interval for the population mean μ can be constructed as follows,
- Midpoint: ˉX;
- Critical Value: Zα/2;
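- Margin of Error: $Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$;
- Lower Confidence Limit (LCL): $\bar{X}-Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$;
- Upper Confidence Limit (UCL): $\bar{X}+Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$;
- Confidence Interval: $\left[\bar{X}-Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}},\ \bar{X}+Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\right]$;
- Width of Confidence Interval: $2\cdot Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$.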
- At confidence level 1−α, to achieve a confidence interval for the population mean μ whose margin of error is no larger than M, the smallest sample size needed is
$$n=\left(\frac{Z_{\alpha/2}\,\sigma}{M}\right)^2$$
If the value of n calculated by the formula is not an integer, then we should round it up in order to ensure the actual margin of error is no larger than M.
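The two formulas above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function names `z_upper`, `z_interval` and `min_sample_size` are hypothetical, and the critical value is computed with the standard library's `statistics.NormalDist` rather than read from a table, so it is slightly more precise than the tabulated 1.64.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_upper(alpha: float) -> float:
    """Return Z_alpha, the number with P(Z > Z_alpha) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)


def z_interval(xbar: float, sigma: float, n: int, conf_level: float):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - conf_level
    margin = z_upper(alpha / 2) * sigma / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin


def min_sample_size(sigma: float, max_margin: float, conf_level: float) -> int:
    """Smallest n whose margin of error is at most max_margin (rounded up)."""
    alpha = 1 - conf_level
    return ceil((z_upper(alpha / 2) * sigma / max_margin) ** 2)


# Inputs borrowed from the example in Section 1.1.4.
lcl, ucl = z_interval(xbar=79, sigma=3, n=16, conf_level=0.90)
print(round(lcl, 2), round(ucl, 2))  # 77.77 80.23
print(min_sample_size(sigma=3, max_margin=1, conf_level=0.90))  # 25
```

Rounding up with `ceil` mirrors the rule stated above: any smaller integer n would give a margin of error larger than M.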
1.1.3 Hypothesis Testing
- When the population standard deviation of X, denoted by σ, is known, at Significance Level α,
- the Z-test for H0:μ≤μ0 vs HA:μ>μ0 (which is equivalent to H0:μ=μ0 vs HA:μ>μ0) can be constructed as follows,
- Test Statistic: $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma}$;
- Critical Value: Zα;
- Rejection Rule: Reject H0 if T>Zα.
- the Z-test for H0:μ≥μ0 vs HA:μ<μ0 (which is equivalent to H0:μ=μ0 vs HA:μ<μ0) can be constructed as follows,
- Test Statistic: $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma}$;
- Critical Value: Z1−α, which equals −Zα;
- Rejection Rule: Reject H0 if T<Z1−α, which is equivalent to T<−Zα.
- the Z-test for H0:μ=μ0 vs HA:μ≠μ0 can be constructed as follows,
- Test Statistic: $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma}$;
- Critical Values: Z1−α/2 (which equals −Zα/2) and Zα/2;
- Rejection Rule: Reject H0 if T>Zα/2 or T<−Zα/2, which is equivalent to |T|>Zα/2.
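The three rejection rules can be collected into one function. The sketch below is an illustration under stated assumptions: the name `z_test` and the labels "greater", "less" and "two-sided" are hypothetical conventions chosen here, and critical values come from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha, alternative):
    """Return (T, reject) for the Z-test with known sigma.

    alternative: "greater" (HA: mu > mu0), "less" (HA: mu < mu0)
    or "two-sided" (HA: mu != mu0).
    """
    t_stat = sqrt(n) * (xbar - mu0) / sigma
    quantile = NormalDist().inv_cdf  # quantile function of the standard Normal
    if alternative == "greater":
        reject = t_stat > quantile(1 - alpha)           # T > Z_alpha
    elif alternative == "less":
        reject = t_stat < -quantile(1 - alpha)          # T < -Z_alpha
    elif alternative == "two-sided":
        reject = abs(t_stat) > quantile(1 - alpha / 2)  # |T| > Z_{alpha/2}
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t_stat, reject
```

For instance, `z_test(79, 80, 3, 16, 0.05, "less")` evaluates the lower-tailed rule with a sample mean of 79, μ0=80, σ=3 and n=16.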
1.1.4 Example
- Suppose the students' scores follow a Normal distribution with unknown population mean μ and known population standard deviation σ=3.
- Q1: (Confidence Interval) If we randomly select 16 students and find their average score is 79 and sample standard deviation is 3.5, what is the confidence interval of the population mean at confidence level 90%?
- Let X be the score of a student, then we know
- X follows a Normal distribution;
- the population mean of X, denoted by μ, is unknown;
- the population standard deviation of X is σ=3;
- the sample size n=16;
- the sample mean ˉX=79;
- the sample standard deviation is s=3.5.
- To construct the confidence interval of μ when population standard deviation σ=3 is known,
- the confidence level is 1−α=90%, so α=0.1;
- the midpoint of the confidence interval is ˉX=79;
- the critical value is Zα/2=Z0.1/2=Z0.05=1.64 because P(Z>1.64)=0.05;
- the margin of error is $Z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}=1.64\cdot\frac{3}{\sqrt{16}}=1.23$;
- the lower confidence limit is 79−1.23=77.77;
- the upper confidence limit is 79+1.23=80.23;
- the confidence interval is [77.77,80.23];
- the width of the confidence interval is 2×1.23=2.46.
- Q2: (Sample Size) At confidence level 90%, to achieve a confidence interval no wider than 2, what is the smallest sample size needed?
- the confidence level is 1−α=90%, so α=0.1;
- the critical value is Zα/2=Z0.1/2=Z0.05=1.64 because P(Z>1.64)=0.05;
- since the width of the confidence interval is 2M, where M is the margin of error, requiring 2M≤2 means we need M≤1;
- by the formula, with M=1, $\left(\frac{Z_{\alpha/2}\,\sigma}{M}\right)^2=\left(\frac{1.64\cdot 3}{1}\right)^2=24.21$, which rounds up to the smallest sample size n=25.
- Q3: (Hypothesis Testing) If we randomly select 16 students and find their average score is 79 and sample standard deviation is 3.5, at 5% significance level, do we have sufficient evidence that the population mean is lower than 80?
- Here we want to confirm the claim that μ<80, so we let it be the alternative hypothesis. That is, we want to test H0:μ≥80 vs HA:μ<80 (or equivalently H0:μ=80 vs HA:μ<80);
- The test statistic is $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma}=\frac{\sqrt{16}(79-80)}{3}=-1.33$;
- The significance level is α=5%;
- The critical value is −Zα=−Z0.05=−1.64;
- The rejection rule is: reject H0 if T<−Zα;
- Here T=−1.33>−1.64=−Zα, so we don't have enough evidence to reject H0 and thus don't have enough evidence to say that the population mean is lower than 80.
- Q4: (Hypothesis Testing) For Q3, at 10% significance level, do we have sufficient evidence that the population mean is lower than 80?
- The hypotheses are the same, ie, H0:μ≥80 vs HA:μ<80 (or equivalently H0:μ=80 vs HA:μ<80);
- The test statistic is the same, ie, $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sigma}=\frac{\sqrt{16}(79-80)}{3}=-1.33$;
- The significance level now is α=10%;
- The critical value now is −Zα=−Z0.1=−1.28;
- The rejection rule is the same, ie, reject H0 if T<−Zα;
- Here T=−1.33<−1.28=−Zα, so we have enough evidence to reject H0 and thus have enough evidence to say that the population mean is lower than 80.
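The calculations in Q3 and Q4 can be checked numerically. The snippet below is a sketch using only Python's standard library. The last step is a supplementary observation rather than something the notes rely on: since T<−Zα holds exactly when P(Z<T)<α, the quantity P(Z<T) marks the boundary between significance levels that reject H0 and those that do not. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from Q3 and Q4.
n, xbar, mu0, sigma = 16, 79, 80, 3
t_stat = sqrt(n) * (xbar - mu0) / sigma  # about -1.33

std_normal = NormalDist()
decisions = {}
for alpha in (0.05, 0.10):
    critical_value = -std_normal.inv_cdf(1 - alpha)  # -Z_alpha
    decisions[alpha] = t_stat < critical_value       # True means "reject H0"
    print(alpha, round(critical_value, 2), decisions[alpha])

# Boundary significance level: H0 is rejected exactly when alpha exceeds it.
boundary_alpha = std_normal.cdf(t_stat)
print(round(boundary_alpha, 3))
```

The boundary comes out at roughly 0.09, which lies between 5% and 10% and is why the two questions reach different conclusions from the same data.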
- Remark:
- Comparing Q4 with Q3, we see that at a higher significance level, we are more likely to reject H0. Recall that the significance level is just the probability of Type I error, and Type I error occurs if we reject H0 while H0 is true. With a higher significance level, we are allowed more chance for Type I error and thus are more aggressive in rejecting H0.
- In Q3, we were conservative (prudent) in rejecting H0 by using a lower significance level α=5%. We didn't reject H0 and thus avoided the risk of Type I error. However, if H0 is not true, then we miss the chance to disprove it (ie, Type II error occurs).
- In Q4, we were aggressive in rejecting H0 by using a high significance level α=10%. We rejected H0 and supported HA. However, if H0 is true and HA is not true, then Type I error occurs.
- Therefore, the choice of significance level is critical for decision making: different significance levels can lead to opposite conclusions for the same data.
- In general, a lower significance level means a conservative (prudent) attitude toward Type I error, which leads to a low chance of rejecting H0. Therefore, at a low significance level,
- if H0 is rejected, then we believe we have enough evidence to support HA because we are already prudent;
- if H0 is not rejected, then we don't say "H0 is true". Instead, we say "we don't have enough evidence to reject H0". When we are conservative, we need very strong evidence for rejecting H0. Therefore, it may happen that H0 is not true but we choose not to reject it because we believe the evidence is not strong enough.
- In general, a high significance level means an aggressive attitude toward Type I error, which leads to a high chance of rejecting H0. Therefore, at a high significance level,
- even if H0 is rejected, we are still not quite confident in HA because we are inclined to reject H0 and thus have a high chance of rejecting a true H0;
- if H0 is not rejected, then we believe there is enough evidence to support H0 (ie, H0 is not rejected even when we are aggressive).
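The link between the significance level and the Type I error can be illustrated by simulation. The sketch below assumes the setting of the example (σ=3, n=16, μ0=80) with H0 true at its boundary μ=μ0, draws many samples, and counts how often the lower-tailed Z-test rejects. The function name, the number of trials and the seed are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(alpha, mu0=80.0, sigma=3.0, n=16, trials=10000, seed=0):
    """Estimate how often the lower-tailed Z-test rejects a true H0 (mu = mu0)."""
    rng = random.Random(seed)
    critical_value = -NormalDist().inv_cdf(1 - alpha)  # -Z_alpha
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        t_stat = sqrt(n) * (fmean(sample) - mu0) / sigma
        if t_stat < critical_value:
            rejections += 1  # H0 is true here, so this is a Type I error
    return rejections / trials


rates = {alpha: type_one_error_rate(alpha) for alpha in (0.05, 0.10)}
print(rates)
```

Each estimated rate should land close to its α, so doubling the significance level roughly doubles how often a true H0 is rejected.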
1.2 Population Standard Deviation Unknown
1.2.1 Assumptions and Notations
- X: the random variable following a Normal distribution;
- μ: the unknown population mean of X;
- σ: the unknown population standard deviation of X;
- X1,X2,…,Xn: the sample of X;
- n: the sample size;
- ˉX: sample mean;
- s: sample standard deviation;
1.2.2 Student's t Distribution
- When the population standard deviation is unknown, for inference about the population mean, the (Student's) t distribution is used.
- Unlike the standard Normal distribution, there are many versions of t distribution, indexed by the so-called degrees of freedom.
- We use td to represent a random variable following the t distribution with d degrees of freedom.
- Use table 4 to find the number tα,d such that P(td>tα,d)=α.
For example, the table entry for 15 degrees of freedom and α=0.05 is 1.753, which means P(t15>1.753)=0.05, so t0.05,15=1.753.
1.2.3 Confidence Interval
- When the population standard deviation of X, denoted by σ, is unknown, at Confidence Level 1−α, the confidence interval for the population mean μ can be constructed as follows,
- Midpoint: ˉX;
- Critical Value: tα/2,n−1;
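- Margin of Error: $t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}}$;
- Lower Confidence Limit (LCL): $\bar{X}-t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}}$;
- Upper Confidence Limit (UCL): $\bar{X}+t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}}$;
- Confidence Interval: $\left[\bar{X}-t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}},\ \bar{X}+t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}}\right]$;
- Width of Confidence Interval: $2\cdot t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}}$.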
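A minimal sketch of this construction follows. Python's standard library has no t quantile function, so the hypothetical helper `t_interval` takes the critical value read from the t table as an argument instead of computing it.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is t_{alpha/2, n-1}, looked up in the t table by the caller.
    """
    margin = t_crit * s / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin


# Inputs borrowed from the example in Section 1.2.5, with t_{0.05,15} = 1.753.
lcl, ucl = t_interval(xbar=79, s=3.5, n=16, t_crit=1.753)
print(round(lcl, 3), round(ucl, 3))  # 77.466 80.534
```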
1.2.4 Hypothesis Testing
- When the population standard deviation of X, denoted by σ, is unknown, at Significance Level α,
- the t-test for H0:μ≤μ0 vs HA:μ>μ0 (which is equivalent to H0:μ=μ0 vs HA:μ>μ0) can be constructed as follows,
- Test Statistic: $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{s}$;
- Critical Value: tα,n−1;
- Rejection Rule: Reject H0 if T>tα,n−1.
- the t-test for H0:μ≥μ0 vs HA:μ<μ0 (which is equivalent to H0:μ=μ0 vs HA:μ<μ0) can be constructed as follows,
- Test Statistic: $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{s}$;
- Critical Value: $t_{1-\alpha,n-1}$, which equals $-t_{\alpha,n-1}$;
- Rejection Rule: Reject H0 if $T<t_{1-\alpha,n-1}$, which is equivalent to $T<-t_{\alpha,n-1}$.
- the t-test for H0:μ=μ0 vs HA:μ≠μ0 can be constructed as follows,
- Test Statistic: $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{s}$;
- Critical Values: t1−α/2,n−1 (which equals −tα/2,n−1) and tα/2,n−1;
- Rejection Rule: Reject H0 if T>tα/2,n−1 or T<−tα/2,n−1, which is equivalent to |T|>tα/2,n−1.
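The t-test rules can be sketched in the same way. As an assumption of this illustration, the positive critical value is supplied by the caller from the t table: tα,n−1 for a one-sided test and tα/2,n−1 for the two-sided test. The name `t_test` and the labels for the alternative are hypothetical.

```python
from math import sqrt


def t_test(xbar, mu0, s, n, t_crit, alternative):
    """Return (T, reject) for the t-test with unknown sigma.

    t_crit is the positive table value: t_{alpha, n-1} for "greater" or
    "less", and t_{alpha/2, n-1} for "two-sided".
    """
    t_stat = sqrt(n) * (xbar - mu0) / s
    if alternative == "greater":
        reject = t_stat > t_crit
    elif alternative == "less":
        reject = t_stat < -t_crit
    elif alternative == "two-sided":
        reject = abs(t_stat) > t_crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t_stat, reject
```

Passing the table value explicitly keeps the sketch free of third-party libraries, at the cost of leaving the lookup to the reader.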
1.2.5 Example
- Suppose the students' scores follow a Normal distribution with unknown population mean μ and unknown population standard deviation σ.
- Q1: (Confidence Interval) If we randomly select 16 students and find their average score is 79 and sample standard deviation is 3.5, what is the confidence interval of the population mean at confidence level of 90%?
- Let X be the score of a student, then we know
- X follows a Normal distribution;
- the population mean of X, denoted by μ, is unknown;
- the population standard deviation of X, denoted by σ, is unknown;
- the sample size n=16;
- the sample mean ˉX=79;
- the sample standard deviation s=3.5.
- To construct the confidence interval of μ when σ is unknown,
- the confidence level is 1−α=90%, so α=0.1;
- the midpoint of the confidence interval is ˉX=79;
- the critical value is tα/2,n−1=t0.1/2,16−1=t0.05,15=1.753;
- the margin of error is $t_{\alpha/2,n-1}\cdot\frac{s}{\sqrt{n}}=1.753\cdot\frac{3.5}{\sqrt{16}}=1.534$;
- the lower confidence limit is 79−1.534=77.466;
- the upper confidence limit is 79+1.534=80.534;
- the confidence interval is [77.466,80.534];
- the width of the confidence interval is 2×1.534=3.068.
- Q2: (Hypothesis Testing) If we randomly select 16 students and find their average score is 79 and sample standard deviation is 3.5, at 5% significance level, do we have sufficient evidence that the population mean is lower than 80?
- Here we want to confirm the claim that μ<80, so we let it be the alternative hypothesis. That is, we want to test H0:μ≥80 vs HA:μ<80 (or equivalently H0:μ=80 vs HA:μ<80);
- The test statistic is $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{s}=\frac{\sqrt{16}(79-80)}{3.5}=-1.143$;
- The significance level is α=5%;
- The critical value is −tα,n−1=−t0.05,16−1=−1.753;
- The rejection rule is: reject H0 if T<−tα,n−1;
- Here T=−1.143>−1.753=−tα,n−1, so we don't have enough evidence to reject H0 and thus don't have enough evidence to say that the population mean is lower than 80.
- Q3: (Hypothesis Testing) For Q2, at 10% significance level, do we have sufficient evidence that the population mean is lower than 80?
- The hypotheses are the same, H0:μ≥80 vs HA:μ<80 (or equivalently H0:μ=80 vs HA:μ<80);
- The test statistic is the same, $T=\frac{\sqrt{n}(\bar{X}-\mu_0)}{s}=\frac{\sqrt{16}(79-80)}{3.5}=-1.143$;
- The significance level now is α=10%;
- The critical value now is −tα,n−1=−t0.1,16−1=−1.341;
- The rejection rule is the same: reject H0 if T<−tα,n−1;
- Here T=−1.143>−1.341=−tα,n−1, so we still don't have enough evidence to reject H0 and thus still don't have enough evidence to say that the population mean is lower than 80.
2 One-Population Inference: Population Variance
2.1 Assumptions and Notations
- X: the random variable following a Normal distribution;
- μ: the (known or unknown) population mean of X;
- σ²: the unknown population variance of X;
- X1,X2,…,Xn: the sample (observations) of X;
- ˉX: sample mean;
- s: sample standard deviation.
2.2 χ² Distribution
- For inference about population variance, the χ2 distribution is used.
- Unlike the standard Normal distribution, there are many versions of χ2 distribution, indexed by the so-called degrees of freedom.
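- We use $\chi^2_d$ to represent a random variable following the χ² distribution with d degrees of freedom.
- Use table 5 to find the number $\chi^2_{\alpha,d}$ such that $P(\chi^2_d>\chi^2_{\alpha,d})=\alpha$.
For example, the table entry for 10 degrees of freedom and α=0.1 is 16, which means $P(\chi^2_{10}>16)=0.1$, so $\chi^2_{0.1,10}=16$.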
2.3 Confidence Interval
- The 1−α confidence interval of the unknown population variance σ² is
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right]$$
- $\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}}$ is called the Lower Confidence Limit;
- $\chi^2_{\alpha/2,n-1}$ is called the Critical Value for the Lower Confidence Limit.
- $\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$ is called the Upper Confidence Limit;
- $\chi^2_{1-\alpha/2,n-1}$ is called the Critical Value for the Upper Confidence Limit.
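The interval can be sketched as below. As with the t distribution, the two χ² critical values are assumed to be read from the table and passed in by the caller. The function and parameter names are hypothetical. Note that the larger critical value produces the lower limit, because it sits in the denominator.

```python
def variance_interval(s, n, chi2_for_lower, chi2_for_upper):
    """Confidence interval for the population variance.

    chi2_for_lower is chi2_{alpha/2, n-1} (the larger table value) and
    chi2_for_upper is chi2_{1-alpha/2, n-1} (the smaller table value).
    """
    scaled = (n - 1) * s ** 2
    return scaled / chi2_for_lower, scaled / chi2_for_upper


# Inputs borrowed from the example in Section 2.5.
lcl, ucl = variance_interval(s=3.5, n=16, chi2_for_lower=25.0, chi2_for_upper=7.26)
print(round(lcl, 2), round(ucl, 2))  # 7.35 25.31
```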
2.4 Hypothesis Testing
- At Significance Level α,
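- the test for $H_0:\sigma^2\le\sigma_0^2$ vs $H_A:\sigma^2>\sigma_0^2$ (which is equivalent to $H_0:\sigma^2=\sigma_0^2$ vs $H_A:\sigma^2>\sigma_0^2$) can be constructed as follows,
- Test Statistic: $T=\frac{(n-1)s^2}{\sigma_0^2}$;
- Critical Value: $\chi^2_{\alpha,n-1}$;
- Rejection Rule: Reject H0 if $T>\chi^2_{\alpha,n-1}$.
- the test for $H_0:\sigma^2\ge\sigma_0^2$ vs $H_A:\sigma^2<\sigma_0^2$ (which is equivalent to $H_0:\sigma^2=\sigma_0^2$ vs $H_A:\sigma^2<\sigma_0^2$) can be constructed as follows,
- Test Statistic: $T=\frac{(n-1)s^2}{\sigma_0^2}$;
- Critical Value: $\chi^2_{1-\alpha,n-1}$;
- Rejection Rule: Reject H0 if $T<\chi^2_{1-\alpha,n-1}$.
- the test for $H_0:\sigma^2=\sigma_0^2$ vs $H_A:\sigma^2\ne\sigma_0^2$ can be constructed as follows,
- Test Statistic: $T=\frac{(n-1)s^2}{\sigma_0^2}$;
- Critical Values: $\chi^2_{1-\alpha/2,n-1}$ and $\chi^2_{\alpha/2,n-1}$;
- Rejection Rule: Reject H0 if $T>\chi^2_{\alpha/2,n-1}$ or $T<\chi^2_{1-\alpha/2,n-1}$.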
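A sketch of these three rules follows, again assuming the caller supplies the χ² critical values from the table. Unlike the Z and t cases, the χ² distribution is not symmetric, so the lower and upper critical values are passed separately. The name `variance_test` and its parameters are hypothetical.

```python
def variance_test(s, n, sigma0_sq, alternative, crit_upper=None, crit_lower=None):
    """Return (T, reject) for the chi-square test on a population variance.

    "greater":   needs crit_upper = chi2_{alpha, n-1}
    "less":      needs crit_lower = chi2_{1-alpha, n-1}
    "two-sided": needs crit_upper = chi2_{alpha/2, n-1}
                 and crit_lower = chi2_{1-alpha/2, n-1}
    """
    t_stat = (n - 1) * s ** 2 / sigma0_sq
    if alternative == "greater":
        reject = t_stat > crit_upper
    elif alternative == "less":
        reject = t_stat < crit_lower
    elif alternative == "two-sided":
        reject = t_stat > crit_upper or t_stat < crit_lower
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t_stat, reject
```

For example, `variance_test(3.5, 16, 13, "less", crit_lower=7.26)` applies the lower-tailed rule with s=3.5, n=16 and σ0²=13.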
2.5 Example
- Suppose the students' scores follow a Normal distribution with unknown population mean μ and unknown population variance σ2.
- Q1: (Confidence Interval) If we randomly select 16 students and find their average score is 79 and sample standard deviation is 3.5, what is the confidence interval of the population variance at confidence level of 90%?
- Let X be the score of a student, then we know
- X follows a Normal distribution;
- the population variance of X, denoted by σ2, is unknown;
- the sample size n=16;
- the sample mean ˉX=79;
- the sample standard deviation is s=3.5 and thus the sample variance is $s^2=3.5^2=12.25$.
- To construct the confidence interval of the unknown σ2,
- the confidence level is 1−α=90%, so α=0.1;
- the critical values are
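- $\chi^2_{1-\alpha/2,n-1}=\chi^2_{1-0.1/2,16-1}=\chi^2_{0.95,15}=7.26$;
- $\chi^2_{\alpha/2,n-1}=\chi^2_{0.1/2,16-1}=\chi^2_{0.05,15}=25.0$;
- the confidence interval is $\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right]=\left[\frac{(16-1)\cdot 3.5^2}{25},\ \frac{(16-1)\cdot 3.5^2}{7.26}\right]=[7.35,\ 25.31]$;
- the width of the confidence interval is 25.31−7.35=17.96.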
- Q2: (Hypothesis Testing) If we randomly select 16 students and find their average score is 79 and sample standard deviation is 3.5, at 5% significance level, do we have sufficient evidence that the population variance is lower than 13?
- Here we want to confirm the claim that $\sigma^2<13$, so we let it be the alternative hypothesis. That is, we want to test $H_0:\sigma^2\ge 13$ vs $H_A:\sigma^2<13$ (or equivalently $H_0:\sigma^2=13$ vs $H_A:\sigma^2<13$) where $\sigma_0^2=13$;
- The test statistic is $T=\frac{(n-1)s^2}{\sigma_0^2}=\frac{(16-1)\cdot 3.5^2}{13}=14.135$;
- The significance level is α=5%;
- The critical value is $\chi^2_{1-\alpha,n-1}=\chi^2_{1-0.05,16-1}=\chi^2_{0.95,15}=7.26$;
- The rejection rule is: reject H0 if $T<\chi^2_{1-\alpha,n-1}$;
- Here $T=14.135>7.26=\chi^2_{1-\alpha,n-1}$, so we don't have enough evidence to reject H0 and thus don't have enough evidence to say that the population variance is lower than 13.
3 One-Population Inference: Population Proportion
3.1 Notations
- p: the unknown population proportion (probability) of the event of interest.
- ˆp: the sample proportion (relative frequency) of the event among the n independent trials.
3.2 Sampling Distribution
- Suppose X∼Binomial(n,p). When np and n(1−p) are greater than or equal to 5, ˆp is approximately Normally distributed with
- population mean $\mu_{\hat{p}}=E(\hat{p})=p$;
- population variance $\sigma^2_{\hat{p}}=\mathrm{Var}(\hat{p})=\frac{p(1-p)}{n}$.
- That is,
$$\hat{p}\sim N\left(p,\ \frac{p(1-p)}{n}\right)$$
3.3 Confidence Interval
3.3.1 Construction
- At Confidence Level 1−α, the confidence interval for the unknown population proportion p is constructed as follows,
- Midpoint: ˆp;
- Critical Value: Zα/2;
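- Margin of Error: $Z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$;
- Lower Confidence Limit (LCL): $\hat{p}-Z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$;
- Upper Confidence Limit (UCL): $\hat{p}+Z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$;
- Confidence Interval: $\left[\hat{p}-Z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p}+Z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$;
- Width of Confidence Interval: $2\cdot Z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.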
3.3.2 Sample Size
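At confidence level 1−α, to achieve a confidence interval for the population proportion p whose margin of error is no larger than M, the smallest sample size needed is
$$n=\left(\frac{Z_{\alpha/2}}{2M}\right)^2$$
This formula uses the most conservative case $\hat{p}=0.5$, where $\hat{p}(1-\hat{p})$ attains its maximum value 1/4, so it guarantees the margin of error whatever the sample proportion turns out to be.
If the value of n calculated by the formula is not an integer, then we should round it up in order to ensure the actual margin of error is no larger than M.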
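The interval and the sample-size formula for a proportion can be sketched as follows. To stay close to the table-based calculations in these notes, the hypothetical helpers below take the critical value Zα/2 as an argument (`z_crit`) instead of computing it.

```python
from math import ceil, sqrt


def proportion_interval(successes, n, z_crit):
    """Confidence interval for a population proportion; z_crit is Z_{alpha/2}."""
    p_hat = successes / n
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    return p_hat - margin, p_hat + margin


def min_sample_size(max_margin, z_crit):
    """Smallest n that keeps the margin of error at or below max_margin."""
    return ceil((z_crit / (2 * max_margin)) ** 2)


# Inputs borrowed from the example in Section 3.5, with Z_{0.05} = 1.64.
lcl, ucl = proportion_interval(successes=45, n=100, z_crit=1.64)
print(round(lcl, 4), round(ucl, 4))                # 0.3684 0.5316
print(min_sample_size(max_margin=0.05, z_crit=1.64))  # 269
```

The sample size is sensitive to how the critical value is rounded: with the more precise value 1.645, the same call returns 271 rather than 269.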
3.4 Hypothesis Testing
- At Significance Level α,
- the test for H0:p≤p0 vs HA:p>p0 (which is equivalent to H0:p=p0 vs HA:p>p0) can be constructed as follows,
- Test Statistic: $T=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$;
- Critical Value: Zα;
- Rejection Rule: Reject H0 if T>Zα.
- the test for H0:p≥p0 vs HA:p<p0 (which is equivalent to H0:p=p0 vs HA:p<p0) can be constructed as follows,
- Test Statistic: $T=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$;
- Critical Value: Z1−α, which equals −Zα;
- Rejection Rule: Reject H0 if T<−Zα.
- the test for H0:p=p0 vs HA:p≠p0 can be constructed as follows,
- Test Statistic: $T=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$;
- Critical Values: $Z_{1-\alpha/2}$ (which equals $-Z_{\alpha/2}$) and $Z_{\alpha/2}$;
- Rejection Rule: Reject H0 if $|T|>Z_{\alpha/2}$ (ie, $T>Z_{\alpha/2}$ or $T<-Z_{\alpha/2}$).
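A minimal sketch of the proportion test follows. The name `proportion_z_test` and the labels for the alternative are hypothetical, and critical values come from `statistics.NormalDist`. As an assumption of this sketch, the condition from Section 3.2 is checked with p0 in place of the unknown p, since the test statistic is computed under H0.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha, alternative):
    """Return (T, reject) for the large-sample test on a proportion."""
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("Normal approximation needs n*p0 and n*(1-p0) >= 5")
    p_hat = successes / n
    t_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    quantile = NormalDist().inv_cdf  # quantile function of the standard Normal
    if alternative == "greater":
        reject = t_stat > quantile(1 - alpha)           # T > Z_alpha
    elif alternative == "less":
        reject = t_stat < -quantile(1 - alpha)          # T < -Z_alpha
    elif alternative == "two-sided":
        reject = abs(t_stat) > quantile(1 - alpha / 2)  # |T| > Z_{alpha/2}
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t_stat, reject
```

For example, `proportion_z_test(45, 100, 0.5, 0.05, "less")` tests whether 45 successes in 100 trials give evidence that p is below 0.5.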
3.5 Example
- Suppose the proportion of female students in the university is p.
- Q1: (Confidence Interval) If we randomly select 100 students and find 45 of them are female, what is the confidence interval for the population proportion of female students, p, at confidence level of 90%?
- We know the sample proportion of female students is $\hat{p}=\frac{45}{100}=0.45$, and want to construct the 90% confidence interval for the population proportion p.
- To construct the confidence interval of the unknown population proportion p,
- the confidence level is 1−α=90%, so α=0.1;
- the midpoint is ˆp=0.45;
- the critical value is Zα/2=Z0.05=1.64;
- the margin of error is $Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=1.64\sqrt{\frac{0.45(1-0.45)}{100}}=0.0816$;
- the lower confidence limit is 0.45−0.0816=0.3684;
- the upper confidence limit is 0.45+0.0816=0.5316;
- the confidence interval is [0.3684,0.5316]
- the width of the confidence interval is 2⋅0.0816=0.1632.
- Q2: (Sample Size) At confidence level 90%, to achieve a confidence interval no wider than 10%, what is the smallest sample size needed?
- the confidence level is 1−α=90%, so α=0.1;
- the critical value is Zα/2=Z0.1/2=Z0.05=1.64;
- since the width of the confidence interval is 2M, where M is the margin of error, requiring 2M≤10%=0.1 means we need M≤0.05;
- by the formula, with M=0.05, $\left(\frac{Z_{\alpha/2}}{2M}\right)^2=\left(\frac{1.64}{2\cdot 0.05}\right)^2=268.96$, which rounds up to the smallest sample size n=269.
- Q3: (Hypothesis Testing) If we randomly select 100 students and find 45 of them are female, at 5% significance level, do we have sufficient evidence that the population proportion of female is lower than 50%?
- Here we want to confirm the claim that p<0.5, so we let it be the alternative hypothesis. That is, we want to test H0:p≥0.5 vs HA:p<0.5 (or equivalently H0:p=0.5 vs HA:p<0.5) where p0=0.5 and n=100;
- The test statistic is $T=\frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}=\frac{0.45-0.5}{\sqrt{0.5(1-0.5)/100}}=-1$;
- The significance level is α=5%;
- The critical value is −Zα=−Z0.05=−1.64;
- The rejection rule is: reject H0 if T<−Zα;
- Here T=−1>−1.64=−Zα, so we don't have enough evidence to reject H0 and thus don't have enough evidence to say that the population proportion of female is lower than 50%.
4 References
- Keller, Gerald. (2015). Statistics for Management and Economics, 10th Edition. Stamford: Cengage Learning.