STAT 3360 Notes

1 Discrete Probability Distribution

1.1 Probability Mass Function

1.1.1 Definition

  • The function \(f(x)\) which describes the probability distribution of a discrete random variable is called the Probability Mass Function.
  • The input (the \(x\) inside \(f(\cdot)\)) of \(f(x)\) is a value \(x\) of the discrete random variable \(X\).
  • The output (function value) of \(f(x)\) is the probability of the event \(X=x\). That is, \[ \boxed{ f(x) = P(x) = P(X=x)} \]
  • If the random variable has a finite number of values, then the probability mass function can be represented by a distribution table.
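The two defining requirements of a probability mass function — every \(f(x)\) lies in \([0, 1]\) and the values sum to \(1\) — can be checked mechanically. Below is a minimal Python sketch; the helper name `is_valid_pmf` is ours, not part of the notes, and a PMF is represented as a dictionary mapping each value \(x\) to \(P(X=x)\):

```python
# A pmf stored as a dictionary mapping each value x of the discrete
# random variable X to the probability P(X = x).
def is_valid_pmf(pmf):
    """Check the PMF requirements: each P(x) in [0, 1] and the total is 1."""
    probs = pmf.values()
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1.0) < 1e-9

# The Gender example from the next subsection: P(X=1) = 60%, P(X=2) = 40%.
print(is_valid_pmf({1: 0.60, 2: 0.40}))  # True
print(is_valid_pmf({1: 0.60, 2: 0.60}))  # False: total is 1.2, not 1
```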

1.1.2 Example

  • Example 1

    If \(X\) is the random variable representing the nominal variable Gender and let \[ X = \left\{ \begin{array}{ll} 1, & \text{if Gender = “male”}\\ 2, & \text{if Gender = “female”}\end{array} \right.\] then

    • \(X\) is a discrete random variable
    • The following \(f(x)\) is an eligible probability mass function for \(X\) \[ f(x) = \left\{ \begin{array}{ll} 60\%, & \text{if } x = 1 \\ 40\%, & \text{if } x = 2 \end{array} \right.\]
    • The input of \(f(x)\) is \(x\), which is a value of the random variable \(X\). Here \(x\) can be \(1\) and \(2\).
    • The output of \(f(x)\) is \(P(x) = P(X=x)\), the probability of the event "\(X=x\)".
      • The probability of the event "\(X=1\)" is \(P(1) = P(X=1) = f(1) = 60\%\),
      • The probability of the event "\(X=2\)" is \(P(2) = P(X=2) = f(2) = 40\%\).
    • Since the random variable \(X\) has a finite number of values (ie, \(1\) and \(2\)), we can represent its probability mass function by the following distribution table

      \(x\) \(1\) \(2\)
      \(P(x) = f(x)\) \(60\%\) \(40\%\)
  • Example 2

    If \(X\) is the random variable representing the ordinal variable Height and let \[ X = \left\{ \begin{array}{ll} 1, & \text{if Height = “tall”}\\ 2, & \text{if Height = “medium”}\\ 3, & \text{if Height = “short”} \end{array} \right.\] then

    • \(X\) is a discrete random variable
    • The following \(f(x)\) is an eligible probability mass function for \(X\) \[ f(x) = \left\{ \begin{array}{ll} 12\%, & \text{if } x = 1 \\ 80\%, & \text{if } x = 2 \\ 8\%, & \text{if } x = 3 \end{array} \right.\]
    • The input of \(f(x)\) is \(x\), which is a value of the random variable \(X\). Here \(x\) can be \(1,2\) and \(3\).
    • The output of \(f(x)\) is \(P(x) = P(X=x)\), the probability of the event "\(X=x\)".
      • The probability of the event "\(X=1\)" is \(P(1) = P(X=1) = f(1) = 12\%\),
      • The probability of the event "\(X=2\)" is \(P(2) = P(X=2) = f(2) = 80\%\),
      • The probability of the event "\(X=3\)" is \(P(3) = P(X=3) = f(3) = 8\%\).
    • Since the random variable \(X\) has a finite number of values (ie, \(1, 2\) and \(3\)), we can represent its probability mass function by the following distribution table

      \(x\) \(1\) \(2\) \(3\)
      \(P(x) = f(x)\) \(12\%\) \(80\%\) \(8\%\)
  • Example 3

    If \(X\) is the random variable representing the integer-value numerical variable Score (suppose \(5\) is the full score), and let \[ X = \left\{ \begin{array}{ll} 0, & \text{if Score = “0 point”}\\ 1, & \text{if Score = “1 point”}\\ 2, & \text{if Score = “2 points”}\\ 3, & \text{if Score = “3 points”}\\ 4, & \text{if Score = “4 points”}\\ 5, & \text{if Score = “5 points”}\end{array} \right.\] then

    • \(X\) is a discrete random variable
    • The following \(f(x)\) is an eligible probability mass function for \(X\) \[ f(x) = \left\{ \begin{array}{ll} 5\%, & \text{if } x = 0 \\ 10\%, & \text{if } x = 1 \\ 35\%, & \text{if } x = 2 \\ 25\%, & \text{if } x = 3 \\ 15\%, & \text{if } x = 4 \\ 10\%, & \text{if } x = 5 \end{array} \right.\]
    • The input of \(f(x)\) is \(x\), which is a value of the random variable \(X\). Here \(x\) can be \(0,1,2,3,4\) and \(5\).
    • The output of \(f(x)\) is \(P(x) = P(X=x)\), the probability of the event "\(X=x\)".
      • The probability of the event "\(X=0\)" is \(P(0) = P(X=0) = f(0) = 5\%\),
      • The probability of the event "\(X=1\)" is \(P(1) = P(X=1) = f(1) = 10\%\),
      • The probability of the event "\(X=2\)" is \(P(2) = P(X=2) = f(2) = 35\%\),
      • The probability of the event "\(X=3\)" is \(P(3) = P(X=3) = f(3) = 25\%\),
      • The probability of the event "\(X=4\)" is \(P(4) = P(X=4) = f(4) = 15\%\),
      • The probability of the event "\(X=5\)" is \(P(5) = P(X=5) = f(5) = 10\%\).
    • Since the random variable \(X\) has a finite number of values (ie, \(0, 1, 2, 3, 4\) and \(5\)), we can represent its probability mass function by the following distribution table

      \(x\) \(0\) \(1\) \(2\) \(3\) \(4\) \(5\)
      \(P(x) = f(x)\) \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\)

1.2 Population Mean

1.2.1 Definition

  • Suppose
    • \(X\) is a discrete random variable,
    • \(X\) has \(k\) possible values \(x_1, x_2, \dots, x_k\).
  • Then the Population Mean (or Expected Value) of \(X\), denoted by \(E(X)\) or \(\mu_x\), is \[ \boxed{ E(X) = \mu_x = \sum\limits_{i=1}^{k} [x_i \cdot P(x_i)] } \]

1.2.2 Properties

  • Suppose \(X\) and \(Y\) are two random variables and \(a, b, c\) are constants. Then
    • \(\boxed{ E(c) = c }\)
    • \(\boxed{ E(aX + b) = aE(X) + b }\)
    • \(\boxed{ E(aX + bY) = aE(X) + bE(Y) }\)
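These properties can be verified numerically. The sketch below checks \(E(aX+b) = aE(X)+b\) using the Score distribution from Example 3, with the constants \(a = 2\) and \(b = 3\) chosen arbitrarily for illustration:

```python
# Verify E(aX + b) = a*E(X) + b using the Score pmf from Example 3.
pmf = {0: 0.05, 1: 0.10, 2: 0.35, 3: 0.25, 4: 0.15, 5: 0.10}

def expectation(pmf):
    """E(X) = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

a, b = 2, 3
# Y = aX + b takes the values a*x + b with the same probabilities as X.
pmf_y = {a * x + b: p for x, p in pmf.items()}
print(round(expectation(pmf), 2))    # 2.65
print(round(expectation(pmf_y), 2))  # 8.3, which equals 2 * 2.65 + 3
```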

1.2.3 Note The Variable Type

  • For interval (numerical) variables, we can calculate the population mean \(E(X)\) of the corresponding random variable \(X\), and \(E(X)\) usually has a practical meaning.
  • For categorical (ie, nominal or ordinal) variables, the population mean \(E(X)\) of the corresponding random variable \(X\) is usually hard to interpret or meaningless, so we usually do not calculate it.

1.2.4 Examples

  • We don't calculate the population mean of the random variable for the nominal variable Gender.
    • Suppose we insist on doing it; then \(E(X) = 1 \cdot P(1) + 2 \cdot P(2) = 1 \cdot 60\% + 2 \cdot 40\% = 1.4\). What does the number \(1.4\) mean? That the student is expected to be neither male (represented by \(1\)) nor female (represented by \(2\))? Obviously, the number \(1.4\) does not have a reasonable interpretation.

      Gender male female
      \(x\) \(1\) \(2\)
      \(P(x) = f(x)\) \(60\%\) \(40\%\)
  • We don't calculate the population mean of the random variable for the ordinal variable Height, either.
  • We can calculate the population mean of the random variable for the numerical variable Score as follows.

    \(E(X) = \mu_x\)

    \(= 0 \cdot P(0) + 1 \cdot P(1) + 2 \cdot P(2) + 3 \cdot P(3) + 4 \cdot P(4) + 5 \cdot P(5)\)

    \(= 0 \cdot 5\% + 1 \cdot 10\% + 2 \cdot 35\% + 3 \cdot 25\% + 4 \cdot 15\% + 5 \cdot 10\%\)

    \(= 2.65\)

    \(x\) \(0\) \(1\) \(2\) \(3\) \(4\) \(5\)
    \(P(x) = f(x)\) \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\)
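The table-based calculation above maps directly to one line of code: multiply each value by its probability and add up. A minimal sketch:

```python
# Population mean of Score: sum of x * P(x) over the distribution table.
pmf = {0: 0.05, 1: 0.10, 2: 0.35, 3: 0.25, 4: 0.15, 5: 0.10}
mu = sum(x * p for x, p in pmf.items())
print(round(mu, 2))  # 2.65
```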

1.3 Population Variance and Standard Deviation

1.3.1 Definitions

  • Suppose
    • \(X\) is a discrete random variable,
    • \(X\) has \(k\) possible values \(x_1, x_2, \dots, x_k\).
  • Then
    • the Population Variance of \(X\), denoted by \(Var(X)\) or \(\sigma_x^2\), is \[ \boxed{ Var(X) = \sigma_x^2 = \sum\limits_{i=1}^{k} [(x_i - \mu_x)^2 \cdot P(x_i)] } \]
    • the Population Standard Deviation of \(X\), denoted by \(\sigma_x\), is \[ \boxed{ \sigma_x = \sqrt{\sigma_x^2} } \]
  • Note: To find the standard deviation, always find the variance first.

1.3.2 Properties

  • Suppose \(X\) and \(Y\) are two random variables and \(a, b, c\) are constants. Then
    • \(\boxed{ Var(c) = 0 }\)
    • \(\boxed{ Var(aX + b) = a^2 Var(X) }\)
    • \(\boxed{ Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\cdot Cov(X, Y) }\) where \(Cov(X, Y)\) is the population covariance between \(X\) and \(Y\), which will be discussed later.

1.3.3 Note The Variable Type

  • Similar to population mean, we only calculate population variance and population standard deviation of random variables for numerical variables.

1.3.4 Example

  • We can calculate the population variance and population standard deviation of the random variable \(X\) for the numerical variable Score as follows.

    \(Var(X) = \sigma_x^2\)

    \(= (0 - 2.65)^2 \cdot P(0) + (1-2.65)^2 \cdot P(1) + (2-2.65)^2 \cdot P(2)\)

    \(+ (3-2.65)^2 \cdot P(3) + (4-2.65)^2 \cdot P(4) + (5-2.65)^2 \cdot P(5)\)

    \(= 1.6275\)

    \(\sigma_x = \sqrt{\sigma_x^2} = \sqrt{1.6275} = 1.2757\)

    \(x\) \(0\) \(1\) \(2\) \(3\) \(4\) \(5\)
    \(P(x) = f(x)\) \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\)
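The arithmetic above can be reproduced with a short sketch: the variance is the probability-weighted sum of squared deviations from the mean, and the standard deviation is its square root (computed after the variance, as the note in the definition says):

```python
# Var(X) and sigma_x for the Score pmf.
pmf = {0: 0.05, 1: 0.10, 2: 0.35, 3: 0.25, 4: 0.15, 5: 0.10}
mu = sum(x * p for x, p in pmf.items())               # 2.65
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # weighted squared deviations
sigma = var ** 0.5                                    # square root of the variance
print(round(var, 4), round(sigma, 4))  # 1.6275 1.2757
```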

1.4 Joint/Marginal Probability Mass Function

1.4.1 Introduction

  • In the section Probability of Events, when we considered two variables simultaneously, we used the joint distribution table.
  • Similarly, when we consider two random variables simultaneously, the probability of the intersection event \([X = x \text{ and } Y = y]\), denoted by \(P(X = x, Y = y)\) or \(P(x, y)\), is described by the Joint Probability Mass Function \(f(x, y)\). That is \[ \boxed{ f(x, y) = P(x, y) = P(X = x, Y = y) } \]
  • Also, based on the joint probability mass function, we can calculate Marginal Probability Mass Function of each random variable.

1.4.2 Example

  • Suppose
    • \(X\) is the random variable for the numerical variable Studying Hours
    • \(Y\) is the random variable for the numerical variable Score
  • Then the following table represents an eligible Joint Probability Mass Function of \(X\) and \(Y\)

    \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\)
    \(X=10\) \(\boxed{4\%}\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\)
    \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\)
    • For example, the upper-left cell means \(f(10, 0) = P(10, 0) = P(X = 10, Y = 0) = 4\%\).
  • Also, we can represent the Marginal Probability Mass Function in the table

    \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\) Marginal
    \(X=10\) \(4\%\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\) \(46\%\)
    \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\) \(54\%\)
    Marginal \(\boxed{5\%}\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\) \(100\%\)
    • For example, the lower-left cell means \(f_y(0) = P_y(0) = P(Y = 0) = 5\%\).
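The marginal row and column above are just sums over the joint table: summing each row over \(y\) gives the marginal of \(X\), and summing each column over \(x\) gives the marginal of \(Y\). A minimal sketch:

```python
# Joint pmf of X (Studying Hours) and Y (Score), keyed by (x, y) pairs.
joint = {
    (10, 0): 0.04, (10, 1): 0.07, (10, 2): 0.18,
    (10, 3): 0.10, (10, 4): 0.05, (10, 5): 0.02,
    (20, 0): 0.01, (20, 1): 0.03, (20, 2): 0.17,
    (20, 3): 0.15, (20, 4): 0.10, (20, 5): 0.08,
}

fx, fy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0.0) + p  # marginal of X: sum P(x, y) over y
    fy[y] = fy.get(y, 0.0) + p  # marginal of Y: sum P(x, y) over x

print({x: round(p, 2) for x, p in fx.items()})  # {10: 0.46, 20: 0.54}
print(round(fy[0], 2))                          # 0.05, matching the table
```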

1.5 Independence between Random Variables

1.5.1 Introduction

  • In the section Probability of Events, we studied the independence between two events.
  • A random variable has multiple values, each of which corresponds to a simple event.
  • When we consider two random variables simultaneously, there are multiple simple events for each.
  • Therefore, the independence between random variables \(X\) and \(Y\) is defined not just between one event of \(X\) and one event of \(Y\). Instead, it is defined through all simple events of the two random variables.

1.5.2 Definition

  • Suppose
    • \(X\) and \(Y\) are two random variables,
    • \(f_x(x)\) is the probability mass function of \(X\),
    • \(f_y(y)\) is the probability mass function of \(Y\),
    • \(f(x, y)\) is the joint probability mass function of \(X\) and \(Y\).
  • \(X\) and \(Y\) are said to be Independent if \[ \boxed{ f(x, y) = f_x(x) \cdot f_y(y) \text{ for all } x \text{ and } y } \]

    Note: the word "all" means the above formula must hold for every combination of \(x\) and \(y\). Therefore, if even one specific combination of \(x\) and \(y\) does not satisfy the formula, then the two variables are definitely not independent.

1.5.3 Example

  • For the example of numerical variables Studying Hours and Score, we have

    \(f(10, 0) = P(X = 10, Y = 0) = 4\%\),

    \(f_x(10) = P(X = 10) = 46\%\),

    \(f_y(0) = P(Y = 0) = 5\%\),

    \(f_x(10) \cdot f_y(0) = 46\% \cdot 5\% = 2.3\% \neq 4\% = f(10, 0)\),

    so the two random variables are NOT independent.

    \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\) Marginal
    \(X=10\) \(\boxed{4\%}\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\) \(\boxed{46\%}\)
    \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\) \(54\%\)
    Marginal \(\boxed{5\%}\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\) \(100\%\)
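Because the definition must hold for every cell, the check can be run over the whole table at once; the single failing cell \(f(10,0)\) is enough to rule independence out. A minimal sketch:

```python
# Independence requires f(x, y) = fx(x) * fy(y) for EVERY (x, y) pair.
joint = {
    (10, 0): 0.04, (10, 1): 0.07, (10, 2): 0.18,
    (10, 3): 0.10, (10, 4): 0.05, (10, 5): 0.02,
    (20, 0): 0.01, (20, 1): 0.03, (20, 2): 0.17,
    (20, 3): 0.15, (20, 4): 0.10, (20, 5): 0.08,
}
fx, fy = {}, {}
for (x, y), p in joint.items():
    fx[x] = fx.get(x, 0.0) + p  # marginal of X
    fy[y] = fy.get(y, 0.0) + p  # marginal of Y

independent = all(abs(p - fx[x] * fy[y]) < 1e-9 for (x, y), p in joint.items())
print(independent)  # False: e.g. f(10, 0) = 4% but fx(10) * fy(0) = 2.3%
```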

1.6 Population Covariance

1.6.1 Introduction

  • When describing a sample of two variables, we calculate the sample covariance.
  • If instead we want to describe the whole population, then the population covariance is calculated.

1.6.2 Definition

  • Suppose
    • \(X\) is a discrete random variable which has \(k\) possible values \(x_1, x_2, \dots, x_k\),
    • \(Y\) is a discrete random variable which has \(l\) possible values \(y_1, y_2, \dots, y_l\),
  • Then
    • the Population Covariance between \(X\) and \(Y\), denoted by \(Cov(X, Y)\) or \(\sigma_{xy}\), is \[ \boxed{ Cov(X, Y) = \sigma_{xy} = \sum\limits_{i=1}^{k} \sum\limits_{j=1}^{l} [(x_i - \mu_x) ( y_j - \mu_y) \cdot P(x_i, y_j)] } \]
    • the Population Coefficient of Correlation between \(X\) and \(Y\), denoted by \(Corr(X, Y)\) or \(\rho_{xy}\), is \[ \boxed{ Corr(X, Y) = \rho_{xy} = \frac{\sigma_{xy}}{\sigma_{x}\sigma_{y}} } \] which implies \[ \boxed{ \sigma_{xy} = \rho_{xy} \cdot \sigma_{x} \cdot \sigma_{y} } \]

1.6.3 Properties

  • \(\boxed{ Cov(aX, bY) = a \cdot b \cdot Cov(X, Y)}\).
  • \(\boxed{ \text{If } X \text{ and } Y \text{ are independent, then } Cov(X, Y) = 0 }\).

1.6.4 Example

  • For the example of numerical variables Studying Hours and Score
    • The joint and marginal probability mass functions are represented by the following table

      \(f(x,y)\) \(Y=0\) \(Y=1\) \(Y=2\) \(Y=3\) \(Y=4\) \(Y=5\) Marginal
      \(X=10\) \(4\%\) \(7\%\) \(18\%\) \(10\%\) \(5\%\) \(2\%\) \(46\%\)
      \(X=20\) \(1\%\) \(3\%\) \(17\%\) \(15\%\) \(10\%\) \(8\%\) \(54\%\)
      Marginal \(5\%\) \(10\%\) \(35\%\) \(25\%\) \(15\%\) \(10\%\) \(100\%\)
    • The population mean of \(X\) is

      \(E(X) = \mu_x\)

      \(= 10 \cdot P(X = 10) + 20 \cdot P(X=20)\)

      \(= 10 \cdot 46\% + 20 \cdot 54\%\)

      \(= 4.6 + 10.8 = 15.4\)

    • The population mean of \(Y\) is

      \(E(Y) = \mu_y\)

      \(= 0 \cdot P(0) + 1 \cdot P(1) + 2 \cdot P(2) + 3 \cdot P(3) + 4 \cdot P(4) + 5 \cdot P(5)\)

      \(= 0 \cdot 5\% + 1 \cdot 10\% + 2 \cdot 35\% + 3 \cdot 25\% + 4 \cdot 15\% + 5 \cdot 10\%\)

      \(= 2.65\)

    • The covariance between random variables \(X\) and \(Y\) is

      \(Cov(X,Y) = \sigma_{xy}\)

      \(=\) \((10-15.4)(0-2.65) \cdot P(10, 0)\)

      \(+(10-15.4)(1-2.65) \cdot P(10, 1)\)

      \(+(10-15.4)(2-2.65) \cdot P(10, 2)\)

      \(+(10-15.4)(3-2.65) \cdot P(10, 3)\)

      \(+(10-15.4)(4-2.65) \cdot P(10, 4)\)

      \(+(10-15.4)(5-2.65) \cdot P(10, 5)\)

      \(+(20-15.4)(0-2.65) \cdot P(20, 0)\)

      \(+(20-15.4)(1-2.65) \cdot P(20, 1)\)

      \(+(20-15.4)(2-2.65) \cdot P(20, 2)\)

      \(+(20-15.4)(3-2.65) \cdot P(20, 3)\)

      \(+(20-15.4)(4-2.65) \cdot P(20, 4)\)

      \(+(20-15.4)(5-2.65) \cdot P(20, 5)\)

      \(=1.89\)
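The twelve-term sum above is exactly a loop over the joint table: for each cell, weight the product of deviations from the two means by the cell's probability. A sketch reproducing the result:

```python
# Cov(X, Y): probability-weighted products of deviations from the means.
joint = {
    (10, 0): 0.04, (10, 1): 0.07, (10, 2): 0.18,
    (10, 3): 0.10, (10, 4): 0.05, (10, 5): 0.02,
    (20, 0): 0.01, (20, 1): 0.03, (20, 2): 0.17,
    (20, 3): 0.15, (20, 4): 0.10, (20, 5): 0.08,
}
mu_x = sum(x * p for (x, y), p in joint.items())  # 15.4
mu_y = sum(y * p for (x, y), p in joint.items())  # 2.65
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
print(round(cov, 2))  # 1.89
```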

1.7 Portfolio Investment

1.7.1 Formula

  • Suppose
    • A and B are two stocks
    • \(X\) is the random variable for the Return On Investment (ROI) of stock A
      • The Expected Value of \(X\) is \(\mu_x\), which is called the Expected ROI of stock A
      • The Standard Deviation of \(X\) is \(\sigma_x\), which is called the Volatility of stock A
    • \(Y\) is the random variable for the Return On Investment (ROI) of stock B
      • The Expected Value of \(Y\) is \(\mu_y\), which is called the Expected ROI of stock B
      • The Standard Deviation of \(Y\) is \(\sigma_y\), which is called the Volatility of stock B
    • The Coefficient of Correlation between \(X\) and \(Y\) is \(\rho_{xy}\)
    • We invest \(\alpha\) (a proportion, \(0 \le \alpha \le 1\)) of our money in stock A
    • We invest \(\beta\) (a proportion, \(0 \le \beta \le 1\)) of our money in stock B
  • Then
    • Our Portfolio Investment C is represented by the random variable \[ \boxed{ Z = \alpha X + \beta Y } \]
    • The Expected ROI of portfolio investment C is the expected value of random variable \(Z\) \[ \mu_z = E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y) = \alpha \mu_x + \beta \mu_y \] For short, \[ \boxed{ \mu_z = \alpha \mu_x + \beta \mu_y } \]
    • The Variance of \(X\) is \[ \boxed{ \sigma_x^2 = (\sigma_x)^2 } \]
    • The Variance of \(Y\) is \[ \boxed{ \sigma_y^2 = (\sigma_y)^2 } \]
    • The Covariance between \(X\) and \(Y\) is \[ \boxed{ \sigma_{xy} = \rho_{xy} \sigma_x \sigma_y } \]
    • The Variance of the random variable \(Z\) is

      \(\sigma_z^2 = Var(Z) = Var(\alpha X + \beta Y)\)

      \(= \alpha^2 Var(X) + \beta^2 Var(Y) + 2 \alpha \beta Cov(X, Y)\)

      \(= \alpha^2 \sigma_x^2 + \beta^2 \sigma_y^2 + 2 \alpha \beta \rho_{xy} \sigma_x \sigma_y\)

      For short, \[ \boxed{ \sigma_z^2 = \alpha^2 \sigma_x^2 + \beta^2 \sigma_y^2 + 2 \alpha \beta \rho_{xy} \sigma_x \sigma_y } \]

    • The Standard Deviation of random variable \(Z\), which is also the Volatility of portfolio C, is \[ \boxed{ \sigma_z = \sqrt{\sigma_z^2} }\]

1.7.2 Example

  • Suppose
    • \(\mu_x = 100\)
    • \(\mu_y = 120\)
    • \(\sigma_x = 20\)
    • \(\sigma_y = 30\)
    • \(\rho_{xy} = 0.5\)
    • \(\alpha = 0.3\)
    • \(\beta=0.7\)
  • Then
    • Our Portfolio Investment C is represented by the random variable \[ Z = \alpha X + \beta Y = 0.3 X + 0.7 Y \]
    • The Expected ROI of Portfolio Investment C is the Expected value of random variable \(Z\) \[ \mu_z = \alpha \mu_x + \beta \mu_y = 0.3 \cdot 100 + 0.7 \cdot 120 = 114 \]
    • The Variance of \(X\) is \[ \sigma_x^2 = (\sigma_x)^2 = 20^2 = 400 \]
    • The Variance of \(Y\) is \[ \sigma_y^2 = (\sigma_y)^2 = 30^2 = 900 \]
    • The Covariance between \(X\) and \(Y\) is \[ \sigma_{xy} = \rho_{xy} \sigma_x \sigma_y = 0.5 \cdot 20 \cdot 30 = 300 \]
    • The Variance of the random variable \(Z\) is \[ \sigma_z^2 = \alpha^2 \sigma_x^2 + \beta^2 \sigma_y^2 + 2 \alpha \beta \rho_{xy} \sigma_x \sigma_y = 0.3^2 \cdot 400 + 0.7^2 \cdot 900 + 2 \cdot 0.3 \cdot 0.7 \cdot 0.5 \cdot 20 \cdot 30 = 603\]
    • The Standard Deviation of random variable \(Z\), which is also the volatility of portfolio C, is \[ \sigma_z = \sqrt{\sigma_z^2} = \sqrt{603} = 24.56 \]
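All of the numbers above follow from the boxed formulas in 1.7.1; a short sketch plugging in the given values:

```python
# Expected ROI and volatility of the portfolio Z = 0.3*X + 0.7*Y.
mu_x, mu_y = 100, 120      # expected ROIs of stocks A and B
sigma_x, sigma_y = 20, 30  # volatilities of stocks A and B
rho = 0.5                  # correlation between the two ROIs
alpha, beta = 0.3, 0.7     # proportions invested in A and B

mu_z = alpha * mu_x + beta * mu_y
var_z = (alpha ** 2 * sigma_x ** 2 + beta ** 2 * sigma_y ** 2
         + 2 * alpha * beta * rho * sigma_x * sigma_y)
sigma_z = var_z ** 0.5
print(round(mu_z, 2), round(var_z, 2), round(sigma_z, 2))  # 114.0 603.0 24.56
```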

1.7.3 Comment

  • The expected ROI measures on average how much money we will make.
  • The volatility measures the instability (risk) of ROI.
  • We always want high expected ROI with low volatility.
  • Unfortunately, high expected ROI is usually accompanied with high volatility.
  • For two investments having the same expected ROI, we should choose the one with lower volatility.


Yunfei Wang

2016-02-04 Thu 02:02