STAT 3360 Notes

1 Discrete Probability Distribution

1.1 Probability Mass Function

1.1.1 Definition

  • The function \(f(x)\) which describes the probability distribution of a discrete random variable is called the Probability Mass Function.
  • The input (the \(x\) inside \(f(\cdot)\)) of \(f(x)\) is a value \(x\) of the discrete random variable \(X\).
  • The output (function value) of \(f(x)\) is the probability of the event \(X=x\). That is, \[ \boxed{ f(x) = P(x) = P(X=x)} \]
  • If the random variable has a finite number of values, then the probability mass function can be represented by a distribution table.
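The two defining requirements of a probability mass function — every \(f(x)\) lies in \([0, 1]\) and the values sum to \(1\) — can be checked mechanically. Below is a minimal Python sketch; the helper name `is_valid_pmf` is ours, not part of the notes, and a PMF is represented as a dictionary mapping each value \(x\) to \(P(X=x)\):

```python
# A pmf stored as a dictionary mapping each value x of the discrete
# random variable X to the probability P(X = x).
def is_valid_pmf(pmf):
    """Check the PMF requirements: each P(x) in [0, 1] and the total is 1."""
    probs = pmf.values()
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1.0) < 1e-9

# The Gender example from the next subsection: P(X=1) = 60%, P(X=2) = 40%.
print(is_valid_pmf({1: 0.60, 2: 0.40}))  # True
print(is_valid_pmf({1: 0.60, 2: 0.60}))  # False: total is 1.2, not 1
```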

1.1.2 Example

  • Example 1

    If \(X\) is the random variable representing the nominal variable Gender and let \[ X = \left\{ \begin{array}{ll} 1, & \text{if Gender = “male”}\\ 2, & \text{if Gender = “female”}\end{array} \right.\] then

    • \(X\) is a discrete random variable
    • The following \(f(x)\) is an eligible probability mass function for \(X\) \[ f(x) = \left\{ \begin{array}{ll} 60\%, & \text{if } x = 1 \\ 40\%, & \text{if } x = 2 \end{array} \right.\]
    • The input of \(f(x)\) is \(x\), which is a value of the random variable \(X\). Here \(x\) can be \(1\) and \(2\).
    • The output of \(f(x)\) is \(P(x) = P(X=x)\), the probability of the event "\(X=x\)".
      • The probability of the event "\(X=1\)" is \(P(1) = P(X=1) = f(1) = 60\%\),
      • The probability of the event "\(X=2\)" is \(P(2) = P(X=2) = f(2) = 40\%\).
    • Since the random variable \(X\) has a finite number of values (ie, \(1\) and \(2\)), we can represent its probability mass function by the following distribution table

      \(x\) \(1\) \(2\)
      \(P(x) = f(x)\) \(60\%\) \(40\%\)
  • Example 2

    If \(X\) is the random variable representing the ordinal variable Height and let \[ X = \left\{ \begin{array}{ll} 1, & \text{if Height = “tall”}\\ 2, & \text{if Height = “medium”}\\ 3, & \text{if Height = “short”} \end{array} \right.\] then

    • \(X\) is a discrete random variable
    • The following \(f(x)\) is an eligible probability mass function for \(X\) \[ f(x) = \left\{ \begin{array}{ll} 12\%, & \text{if } x = 1 \\ 80\%, & \text{if } x = 2 \\ 8\%, & \text{if } x = 3 \end{array} \right.\]
    • The input of \(f(x)\) is \(x\), which is a value of the random variable \(X\). Here \(x\) can be \(1,2\) and \(3\).
    • The output of \(f(x)\) is \(P(x) = P(X=x)\), the probability of the event "\(X=x\)".
      • The probability of the event "\(X=1\)" is \(P(1) = P(X=1) = f(1) = 12\%\),
      • The probability of the event "\(X=2\)" is \(P(2) = P(X=2) = f(2) = 80\%\),
      • The probability of the event "\(X=3\)" is \(P(3) = P(X=3) = f(3) = 8\%\).
    • Since the random variable \(X\) has a finite number of values (ie, \(1, 2\) and \(3\)), we can represent its probability mass function by the following distribution table

      \(x\) \(1\) \(2\) \(3\)
      \(P(x) = f(x)\) \(12\%\) \(80\%\) \(8\%\)
  • Example 3

    If \(X\) is the random variable representing the integer-value numerical variable Score (suppose \(5\) is the full score), and let \[ X = \left\{ \begin{array}{ll} 0, & \text{if Score = “0 point”}\\ 1, & \text{if Score = “1 point”}\\ 2, & \text{if Score = “2 points”}\\ 3, & \text{if Score = “3 points”}\\ 4, & \text{if Score = “4 points”}\\ 5, & \text{if Score = “5 points”}\end{array} \right.\] then

    • \(X\) is a discrete random variable
    • The following \(f(x)\) is an eligible probability mass function for \(X\) \[ f(x) = \left\{ \begin{array}{ll} 5\%, & \text{if } x = 0 \\ 10\%, & \text{if } x = 1 \\ 35\%, & \text{if } x = 2 \\ 25\%, & \text{if } x = 3 \\ 15\%, & \text{if } x = 4 \\ 10\%, & \text{if } x = 5 \end{array} \right.\]
    • The input of \(f(x)\) is \(x\), which is a value of the random variable \(X\). Here \(x\) can be \(0,1,2,3,4\) and \(5\).
    • The output of \(f(x)\) is \(P(x) = P(X=x)\), the probability of the event "\(X=x\)".
      • The probability of the event "\(X=0\)" is \(P(0) = P(X=0) = f(0) = 5\%\),
      • The probability of the event "\(X=1\)" is \(P(1) = P(X=1) = f(1) = 10\%\),
      • The probability of the event "\(X=2\)" is \(P(2) = P(X=2) = f(2) = 35\%\),
      • The probability of the event "\(X=3\)" is \(P(3) = P(X=3) = f(3) = 25\%\),
      • The probability of the event "\(X=4\)" is \(P(4) = P(X=4) = f(4) = 15\%\),
      • The probability of the event "\(X=5\)" is \(P(5) = P(X=5) = f(5) = 10\%\).
    • Since the random variable \(X\) has a finite number of values (ie, \(0, 1, 2, 3, 4\) and \(5\)), we can represent its probability mass function by the following distribution table

      \(x\) \(0\) \(1\) \(2\) \(3\) \(4\) \(5\)
      \(P(x) = f(x)\) \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\)

1.2 Population Mean

1.2.1 Definition

  • Suppose
    • \(X\) is a discrete random variable,
    • \(X\) has \(k\) possible values \(x_1, x_2, \dots, x_k\).
  • Then the Population Mean (or Expected Value) of \(X\), denoted by \(E(X)\) or \(\mu_x\), is \[ \boxed{ E(X) = \mu_x = \sum\limits_{i=1}^{k} [x_i \cdot P(x_i)] } \]

1.2.2 Properties

  • Suppose \(X\) and \(Y\) are two random variables and \(a, b, c\) are constants. Then
    • \(\boxed{ E(c) = c }\)
    • \(\boxed{ E(aX + b) = aE(X) + b }\)
    • \(\boxed{ E(aX + bY) = aE(X) + bE(Y) }\)
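These properties can be verified numerically. The sketch below checks \(E(aX+b) = aE(X)+b\) using the Score distribution from Example 3, with the constants \(a = 2\) and \(b = 3\) chosen arbitrarily for illustration:

```python
# Verify E(aX + b) = a*E(X) + b using the Score pmf from Example 3.
pmf = {0: 0.05, 1: 0.10, 2: 0.35, 3: 0.25, 4: 0.15, 5: 0.10}

def expectation(pmf):
    """E(X) = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

a, b = 2, 3
# Y = aX + b takes the values a*x + b with the same probabilities as X.
pmf_y = {a * x + b: p for x, p in pmf.items()}
print(round(expectation(pmf), 2))    # 2.65
print(round(expectation(pmf_y), 2))  # 8.3, which equals 2 * 2.65 + 3
```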

1.2.3 Note The Variable Type

  • For interval (numerical) variables, we can calculate the population mean \(E(X)\) of the corresponding random variable \(X\), and \(E(X)\) usually has a practical meaning.
  • For categorical (ie, nominal or ordinal) variables, the population mean \(E(X)\) of the corresponding random variable \(X\) is usually hard to interpret or meaningless, so we usually do not calculate it.

1.2.4 Examples

  • We don't calculate the population mean of the random variable for the nominal variable Gender.
    • Suppose we insist on doing it; then \(E(X) = 1 \cdot P(1) + 2 \cdot P(2) = 1 \cdot 60\% + 2 \cdot 40\% = 1.4\). What does the number \(1.4\) mean? That the student is expected to be neither male (represented by \(1\)) nor female (represented by \(2\))? Obviously, the number \(1.4\) does not have a reasonable interpretation.

      Gender male female
      \(x\) \(1\) \(2\)
      \(P(x) = f(x)\) \(60\%\) \(40\%\)
  • We don't calculate the population mean of the random variable for the ordinal variable Height, either.
  • We can calculate the population mean of the random variable for the numerical variable Score as follows.

    \(E(X) = \mu_x\)

    \(= 0 \cdot P(0) + 1 \cdot P(1) + 2 \cdot P(2) + 3 \cdot P(3) + 4 \cdot P(4) + 5 \cdot P(5)\)

    \(= 0 \cdot 5\% + 1 \cdot 10\% + 2 \cdot 35\% + 3 \cdot 25\% + 4 \cdot 15\% + 5 \cdot 10\%\)

    \(= 2.65\)

    \(x\) \(0\) \(1\) \(2\) \(3\) \(4\) \(5\)
    \(P(x) = f(x)\) \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\)
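The table-based calculation above maps directly to one line of code: multiply each value by its probability and add up. A minimal sketch:

```python
# Population mean of Score: sum of x * P(x) over the distribution table.
pmf = {0: 0.05, 1: 0.10, 2: 0.35, 3: 0.25, 4: 0.15, 5: 0.10}
mu = sum(x * p for x, p in pmf.items())
print(round(mu, 2))  # 2.65
```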

1.3 Population Variance and Standard Deviation

1.3.1 Definitions

  • Suppose
    • \(X\) is a discrete random variable,
    • \(X\) has \(k\) possible values \(x_1, x_2, \dots, x_k\).
  • Then
    • the Population Variance of \(X\), denoted by \(Var(X)\) or \(\sigma_x^2\), is \[ \boxed{ Var(X) = \sigma_x^2 = \sum\limits_{i=1}^{k} [(x_i - \mu_x)^2 \cdot P(x_i)] } \]
    • the Population Standard Deviation of \(X\), denoted by \(\sigma_x\), is \[ \boxed{ \sigma_x = \sqrt{\sigma_x^2} } \]
  • Note: To find the standard deviation, always find the variance first.

1.3.2 Properties

  • Suppose \(X\) and \(Y\) are two random variables and \(a, b, c\) are constants. Then
    • \(\boxed{ Var(c) = 0 }\)
    • \(\boxed{ Var(aX + b) = a^2 Var(X) }\)
    • \(\boxed{ Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\cdot Cov(X, Y) }\) where \(Cov(X, Y)\) is the population covariance between \(X\) and \(Y\), which will be discussed later.

1.3.3 Note The Variable Type

  • Similar to population mean, we only calculate population variance and population standard deviation of random variables for numerical variables.

1.3.4 Example

  • We can calculate the population variance and population standard deviation of the random variable \(X\) for the numerical variable Score as follows.

    \(Var(X) = \sigma_x^2\)

    \(= (0 - 2.65)^2 \cdot P(0) + (1-2.65)^2 \cdot P(1) + (2-2.65)^2 \cdot P(2)\)

    \(+ (3-2.65)^2 \cdot P(3) + (4-2.65)^2 \cdot P(4) + (5-2.65)^2 \cdot P(5)\)

    \(= 1.6275\)

    \(\sigma_x = \sqrt{\sigma_x^2} = \sqrt{1.6275} = 1.2757\)

    \(x\) \(0\) \(1\) \(2\) \(3\) \(4\) \(5\)
    \(P(x) = f(x)\) \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\)
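The arithmetic above can be reproduced with a short sketch: the variance is the probability-weighted sum of squared deviations from the mean, and the standard deviation is its square root (computed after the variance, as the note in the definition says):

```python
# Var(X) and sigma_x for the Score pmf.
pmf = {0: 0.05, 1: 0.10, 2: 0.35, 3: 0.25, 4: 0.15, 5: 0.10}
mu = sum(x * p for x, p in pmf.items())               # 2.65
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # weighted squared deviations
sigma = var ** 0.5                                    # square root of the variance
print(round(var, 4), round(sigma, 4))  # 1.6275 1.2757
```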

1.4 Joint/Marginal Probability Mass Function

1.4.1 Introduction

  • In the section Probability of Events, when we considered two variables simultaneously, we used the joint distribution table.
  • Similarly, when we consider two random variables simultaneously, the probability of the intersection event \([X = x \text{ and } Y = y]\), denoted by \(P(X = x, Y = y)\) or \(P(x, y)\), is described by the Joint Probability Mass Function \(f(x, y)\). That is \[ \boxed{ f(x, y) = P(x, y) = P(X = x, Y = y) } \]
  • Also, based on the joint probability mass function, we can calculate Marginal Probability Mass Function of each random variable.

1.4.2 Example

  • Suppose
    • \(X\) is the random variable for the numerical variable Studying Hours
    • \(Y\) is the random variable for the numerical variable Score
  • Then the following table represents an eligible Joint Probability Mass Function of \(X\) and \(Y\)

    \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\)
    \(X=10\) \(\boxed{4\%}\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\)
    \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\)
    • For example, the upper-left cell means \(f(10, 0) = P(10, 0) = P(X = 10, Y = 0) = 4\%\).
  • Also, we can represent the Marginal Probability Mass Function in the table

    \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\) Marginal
    \(X=10\) \(4\%\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\) \(46\%\)
    \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\) \(54\%\)
    Marginal \(\boxed{5\%}\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\) \(100\%\)
    • For example, the lower-left cell means \(f_y(0) = P_y(0) = P(Y = 0) = 5\%\).
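The marginal row and column above are just sums over the joint table: summing each row over \(y\) gives the marginal of \(X\), and summing each column over \(x\) gives the marginal of \(Y\). A minimal sketch:

```python
# Joint pmf of X (Studying Hours) and Y (Score), keyed by (x, y) pairs.
joint = {
    (10, 0): 0.04, (10, 1): 0.07, (10, 2): 0.18,
    (10, 3): 0.10, (10, 4): 0.05, (10, 5): 0.02,
    (20, 0): 0.01, (20, 1): 0.03, (20, 2): 0.17,
    (20, 3): 0.15, (20, 4): 0.10, (20, 5): 0.08,
}

fx, fy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0.0) + p  # marginal of X: sum P(x, y) over y
    fy[y] = fy.get(y, 0.0) + p  # marginal of Y: sum P(x, y) over x

print({x: round(p, 2) for x, p in fx.items()})  # {10: 0.46, 20: 0.54}
print(round(fy[0], 2))                          # 0.05, matching the table
```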

1.5 Independence between Random Variables

1.5.1 Introduction

  • In the section Probability of Events, we studied the independence between two events.
  • A random variable has multiple values, each of which corresponds to a simple event.
  • When we consider two random variables simultaneously, there are multiple simple events for each.
  • Therefore, the independence between random variables \(X\) and \(Y\) is defined not just between one event of \(X\) and one event of \(Y\). Instead, it is defined through all simple events of the two random variables.

1.5.2 Definition

  • Suppose
    • \(X\) and \(Y\) are two random variables,
    • \(f_x(x)\) is the probability mass function of \(X\),
    • \(f_y(y)\) is the probability mass function of \(Y\),
    • \(f(x, y)\) is the joint probability mass function of \(X\) and \(Y\).
  • \(X\) and \(Y\) are said to be Independent if \[ \boxed{ f(x, y) = f_x(x) \cdot f_y(y) \text{ for all } x \text{ and } y } \]

    Note: the word "all" means the above formula must hold for every combination of \(x\) and \(y\). Therefore, if even one specific combination of \(x\) and \(y\) does not satisfy the formula, then the two variables are definitely not independent.

1.5.3 Example

  • For the example of numerical variables Studying Hours and Score, we have

    \(f(10, 0) = P(X = 10, Y = 0) = 4\%\),

    \(f_x(10) = P(X = 10) = 46\%\),

    \(f_y(0) = P(Y = 0) = 5\%\),

    \(f_x(10) \cdot f_y(0) = 46\% \cdot 5\% = 2.3\% \neq 4\% = f(10, 0)\),

    so the two random variables are NOT independent.

    \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\) Marginal
    \(X=10\) \(\boxed{4\%}\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\) \(\boxed{46\%}\)
    \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\) \(54\%\)
    Marginal \(\boxed{5\%}\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\) \(100\%\)
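Because the definition must hold for every cell, the check can be run over the whole table at once; the single failing cell \(f(10,0)\) is enough to rule independence out. A minimal sketch:

```python
# Independence requires f(x, y) = fx(x) * fy(y) for EVERY (x, y) pair.
joint = {
    (10, 0): 0.04, (10, 1): 0.07, (10, 2): 0.18,
    (10, 3): 0.10, (10, 4): 0.05, (10, 5): 0.02,
    (20, 0): 0.01, (20, 1): 0.03, (20, 2): 0.17,
    (20, 3): 0.15, (20, 4): 0.10, (20, 5): 0.08,
}
fx, fy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0.0) + p  # marginal of X
    fy[y] = fy.get(y, 0.0) + p  # marginal of Y

independent = all(abs(p - fx[x] * fy[y]) < 1e-9 for (x, y), p in joint.items())
print(independent)  # False: e.g. f(10, 0) = 4% but fx(10) * fy(0) = 2.3%
```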

1.6 Population Covariance

1.6.1 Introduction

  • When describing a sample of two variables, we calculate the sample covariance.
  • If instead we want to describe the whole population, then the population covariance is calculated.

1.6.2 Definition

  • Suppose
    • \(X\) is a discrete random variable which has \(k\) possible values \(x_1, x_2, \dots, x_k\),
    • \(Y\) is a discrete random variable which has \(l\) possible values \(y_1, y_2, \dots, y_l\),
  • Then
    • the Population Covariance between \(X\) and \(Y\), denoted by \(Cov(X, Y)\) or \(\sigma_{xy}\), is \[ \boxed{ Cov(X, Y) = \sigma_{xy} = \sum\limits_{i=1}^{k} \sum\limits_{j=1}^{l} [(x_i - \mu_x) ( y_j - \mu_y) \cdot P(x_i, y_j)] } \]
    • the Population Coefficient of Correlation between \(X\) and \(Y\), denoted by \(Corr(X, Y)\) or \(\rho_{xy}\), is \[ \boxed{ Corr(X, Y) = \rho_{xy} = \frac{\sigma_{xy}}{\sigma_{x}\sigma_{y}} } \] which implies \[ \boxed{ \sigma_{xy} = \rho_{xy} \cdot \sigma_{x} \cdot \sigma_{y} } \]

1.6.3 Properties

  • \(\boxed{ Cov(aX, bY) = a \cdot b \cdot Cov(X, Y)}\).
  • \(\boxed{ \text{If } X \text{ and } Y \text{ are independent, then } Cov(X, Y) = 0 }\).

1.6.4 Example

  • For the example of numerical variables Studying Hours and Score
    • The joint and marginal probability mass functions are represented by the following table

      \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\) Marginal
      \(X=10\) \(4\%\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\) \(46\%\)
      \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\) \(54\%\)
      Marginal \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\) \(100\%\)
    • The population mean of \(X\) is

      \(E(X) = \mu_x\)

      \(= 10 \cdot P(X = 10) + 20 \cdot P(X=20)\)

      \(= 10 \cdot 46\% + 20 \cdot 54\%\)

      \(= 4.6 + 10.8 = 15.4\)

    • The population mean of \(Y\) is

      \(E(Y) = \mu_y\)

      \(= 0 \cdot P(0) + 1 \cdot P(1) + 2 \cdot P(2) + 3 \cdot P(3) + 4 \cdot P(4) + 5 \cdot P(5)\)

      \(= 0 \cdot 5\% + 1 \cdot 10\% + 2 \cdot 35\% + 3 \cdot 25\% + 4 \cdot 15\% + 5 \cdot 10\%\)

      \(= 2.65\)

    • The covariance between random variables \(X\) and \(Y\) is

      \(Cov(X,Y) = \sigma_{xy}\)

      \(=\) \((10-15.4)(0-2.65) \cdot P(10, 0)\)

      \(+(10-15.4)(1-2.65) \cdot P(10, 1)\)

      \(+(10-15.4)(2-2.65) \cdot P(10, 2)\)

      \(+(10-15.4)(3-2.65) \cdot P(10, 3)\)

      \(+(10-15.4)(4-2.65) \cdot P(10, 4)\)

      \(+(10-15.4)(5-2.65) \cdot P(10, 5)\)

      \(+(20-15.4)(0-2.65) \cdot P(20, 0)\)

      \(+(20-15.4)(1-2.65) \cdot P(20, 1)\)

      \(+(20-15.4)(2-2.65) \cdot P(20, 2)\)

      \(+(20-15.4)(3-2.65) \cdot P(20, 3)\)

      \(+(20-15.4)(4-2.65) \cdot P(20, 4)\)

      \(+(20-15.4)(5-2.65) \cdot P(20, 5)\)

      \(=1.89\)
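The twelve-term sum above is exactly a loop over the joint table: for each cell, weight the product of deviations from the two means by the cell's probability. A sketch reproducing the result:

```python
# Cov(X, Y): probability-weighted products of deviations from the means.
joint = {
    (10, 0): 0.04, (10, 1): 0.07, (10, 2): 0.18,
    (10, 3): 0.10, (10, 4): 0.05, (10, 5): 0.02,
    (20, 0): 0.01, (20, 1): 0.03, (20, 2): 0.17,
    (20, 3): 0.15, (20, 4): 0.10, (20, 5): 0.08,
}
mu_x = sum(x * p for (x, y), p in joint.items())  # 15.4
mu_y = sum(y * p for (x, y), p in joint.items())  # 2.65
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
print(round(cov, 2))  # 1.89
```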

1.7 Portfolio Investment

1.7.1 Formula

  • Suppose
    • A and B are two stocks
    • \(X\) is the random variable for the Return On Investment (ROI) of stock A
      • The Expected Value of \(X\) is \(\mu_x\), which is called the Expected ROI of stock A
      • The Standard Deviation of \(X\) is \(\sigma_x\), which is called the Volatility of stock A
    • \(Y\) is the random variable for the Return On Investment (ROI) of stock B
      • The Expected Value of \(Y\) is \(\mu_y\), which is called the Expected ROI of stock B
      • The Standard Deviation of \(Y\) is \(\sigma_y\), which is called the Volatility of stock B
    • The Coefficient of Correlation between \(X\) and \(Y\) is \(\rho_{xy}\)
    • We invest \(\alpha\) (a proportion, \(0 \le \alpha \le 1\)) of our money in stock A
    • We invest \(\beta\) (a proportion, \(0 \le \beta \le 1\)) of our money in stock B
  • Then
    • Our Portfolio Investment C is represented by the random variable \[ \boxed{ Z = \alpha X + \beta Y } \]
    • The Expected ROI of portfolio investment C is the expected value of random variable \(Z\) \[ \mu_z = E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y) = \alpha \mu_x + \beta \mu_y \] For short, \[ \boxed{ \mu_z = \alpha \mu_x + \beta \mu_y } \]
    • The Variance of \(X\) is \[ \boxed{ \sigma_x^2 = (\sigma_x)^2 } \]
    • The Variance of \(Y\) is \[ \boxed{ \sigma_y^2 = (\sigma_y)^2 } \]
    • The Covariance between \(X\) and \(Y\) is \[ \boxed{ \sigma_{xy} = \rho_{xy} \sigma_x \sigma_y } \]
    • The Variance of the random variable \(Z\) is

      \(\sigma_z^2 = Var(Z) = Var(\alpha X + \beta Y)\)

      \(= \alpha^2 Var(X) + \beta^2 Var(Y) + 2 \alpha \beta Cov(X, Y)\)

      \(= \alpha^2 \sigma_x^2 + \beta^2 \sigma_y^2 + 2 \alpha \beta \rho_{xy} \sigma_x \sigma_y\)

      For short, \[ \boxed{ \sigma_z^2 = \alpha^2 \sigma_x^2 + \beta^2 \sigma_y^2 + 2 \alpha \beta \rho_{xy} \sigma_x \sigma_y } \]

    • The Standard Deviation of random variable \(Z\), which is also the Volatility of portfolio C, is \[ \boxed{ \sigma_z = \sqrt{\sigma_z^2} }\]

1.7.2 Example

  • Suppose
    • \(\mu_x = 100\)
    • \(\mu_y = 120\)
    • \(\sigma_x = 20\)
    • \(\sigma_y = 30\)
    • \(\rho_{xy} = 0.5\)
    • \(\alpha = 0.3\)
    • \(\beta=0.7\)
  • Then
    • Our Portfolio Investment C is represented by the random variable \[ Z = \alpha X + \beta Y = 0.3 X + 0.7 Y \]
    • The Expected ROI of Portfolio Investment C is the Expected value of random variable \(Z\) \[ \mu_z = \alpha \mu_x + \beta \mu_y = 0.3 \cdot 100 + 0.7 \cdot 120 = 114 \]
    • The Variance of \(X\) is \[ \sigma_x^2 = (\sigma_x)^2 = 20^2 = 400 \]
    • The Variance of \(Y\) is \[ \sigma_y^2 = (\sigma_y)^2 = 30^2 = 900 \]
    • The Covariance between \(X\) and \(Y\) is \[ \sigma_{xy} = \rho_{xy} \sigma_x \sigma_y = 0.5 \cdot 20 \cdot 30 = 300 \]
    • The Variance of the random variable \(Z\) is \[ \sigma_z^2 = \alpha^2 \sigma_x^2 + \beta^2 \sigma_y^2 + 2 \alpha \beta \rho_{xy} \sigma_x \sigma_y = 0.3^2 \cdot 400 + 0.7^2 \cdot 900 + 2 \cdot 0.3 \cdot 0.7 \cdot 0.5 \cdot 20 \cdot 30 = 603\]
    • The Standard Deviation of random variable \(Z\), which is also the volatility of portfolio C, is \[ \sigma_z = \sqrt{\sigma_z^2} = \sqrt{603} = 24.56 \]
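All of the numbers above follow from the boxed formulas in 1.7.1; a short sketch plugging in the given values:

```python
# Expected ROI and volatility of the portfolio Z = 0.3*X + 0.7*Y.
mu_x, mu_y = 100, 120      # expected ROIs of stocks A and B
sigma_x, sigma_y = 20, 30  # volatilities of stocks A and B
rho = 0.5                  # correlation between the two ROIs
alpha, beta = 0.3, 0.7     # proportions invested in A and B

mu_z = alpha * mu_x + beta * mu_y
var_z = (alpha ** 2 * sigma_x ** 2 + beta ** 2 * sigma_y ** 2
         + 2 * alpha * beta * rho * sigma_x * sigma_y)
sigma_z = var_z ** 0.5
print(round(mu_z, 2), round(var_z, 2), round(sigma_z, 2))  # 114.0 603.0 24.56
```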

1.7.3 Comment

  • The expected ROI measures on average how much money we will make.
  • The volatility measures the instability (risk) of ROI.
  • We always want high expected ROI with low volatility.
  • Unfortunately, high expected ROI is usually accompanied with high volatility.
  • For two investments having the same expected ROI, we should choose the one with lower volatility.


Yunfei Wang

2016-02-04 Thu 02:02