Binary Logistic Regression


The binary logistic regression model is an example of a Generalized Linear Model (GLM).

[1] Assumptions
Observations are independent, and the sample size is large enough for valid inference (tests and confidence intervals), since the Generalized Linear Model uses MLE (Maximum Likelihood Estimation) to estimate the parameter coefficients. Also, the underlying probability model for the response is Bernoulli.

The correct form of the model is a linear relationship between the logits and the explanatory variables.

As this is a Generalized Linear Model, there are fewer assumptions to check than in linear regression. (Note that the General Linear Model uses the least-squares method to estimate parameter coefficients.)
- We don't need to check for outliers, since the response variable is either 0 or 1.
- There are no residual plots, as the residuals carry no meaning here.
- The variance is not constant. (See below.)


[2] Model   
$Y_{i}|X_{i} = 1$ if the response is in the category of interest, $0$ otherwise. (Binary!)
Here $Y_{i}|X_{i}$ is distributed as Bernoulli($\pi_{i}$), where $\pi_{i}$ is the probability of success.

Then $E[Y_{i}|X_{i}]=\pi_{i}$, and $Var[Y_{i}|X_{i}]=\pi_{i} \cdot (1-\pi_{i})$
Therefore, the variance is not constant, since it changes as the probability $\pi_{i}$ varies.

Logistic Regression Model : $log(\frac{\pi}{1-\pi})= \beta_{0}+\beta_{1}X_{1}+...+\beta_{p}X_{p}$

Logistic Function : $\pi=\frac{\exp(\beta_{0}+\beta_{1}X_{1}+...+\beta_{p}X_{p})}{1+\exp(\beta_{0}+\beta_{1}X_{1}+...+\beta_{p}X_{p})}$,
                                where $\beta_{0}+\beta_{1}X_{1}+...+\beta_{p}X_{p}\in (-\infty, \infty )$ while $\pi \in (0, 1)$
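To make this range behavior concrete, here is a minimal numerical sketch in Python (the `logistic` helper and the sample values of the linear predictor are made up for illustration):

```python
import numpy as np

def logistic(eta):
    """Map a linear predictor eta in (-inf, inf) to a probability in (0, 1)."""
    return np.exp(eta) / (1.0 + np.exp(eta))

# The linear predictor can be any real number...
for eta in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    # ...but the resulting probability is always strictly between 0 and 1.
    print(f"eta = {eta:+.1f}  ->  pi = {logistic(eta):.4f}")
```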


[3] Estimation Method of Parameter Coefficients - by MLE
The data are $Y_{i}$, equal to 1 if the response is in the category of interest and 0 otherwise.
Model : $P(Y_{i}=y_{i})=\pi_{i}^{y_{i}}\cdot (1-\pi_{i})^{1-y_{i}}$.

Joint Density: $P(Y_{i}=y_{i},...,Y_{n}=y_{n})=\prod_{i=1}^{n}\pi_{i}^{y_{i}}\cdot (1-\pi_{i})^{1-y_{i}}$
           where  $\pi_{i}=\frac{\exp(\beta_{0}+\beta_{1}X_{1}+...+\beta_{p}X_{p})}{1+\exp(\beta_{0}+\beta_{1}X_{1}+...+\beta_{p}X_{p})}$ and $1-\pi_{i}= \frac{1}{1+\exp(\beta_{0}+\beta_{1}X_{1}+...+\beta_{p}X_{p})}$
Note that we assume n observations are independent!

So if we think of the joint density as a function of $\beta$, then
 - the Likelihood Function is $L(\beta_{0},...,\beta_{p})= \prod_{i=1}^{n}\pi_{i}(\beta)^{y_{i}}(1-\pi_{i}(\beta))^{1-y_{i}}$
Finally, we maximize the log-likelihood, $\log L(\beta_{0},...,\beta_{p})=\sum_{i=1}^{n}\left[y_{i}\log \pi_{i}(\beta)+(1-y_{i})\log (1-\pi_{i}(\beta))\right]$, giving $(\hat{\beta_{0}},...,\hat{\beta_{p}})=\arg \max \left \{ \log L(\beta_{0},...,\beta_{p}) \right \}$.
However, there is no closed-form solution, so we compute the estimates numerically with the Newton-Raphson algorithm or the Fisher scoring algorithm (see the sketch below). Because the parameter coefficients are estimated this way, one of our assumptions must be a large enough sample!
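As a rough illustration of how such a numerical fit works (a sketch, not the exact routine any particular package uses), here is a Newton-Raphson / Fisher scoring loop in Python; for the logistic (canonical) link the two algorithms coincide. The function name and the toy data are hypothetical:

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-10):
    """Newton-Raphson / Fisher scoring for logistic regression.
    X is n x (p+1) with a leading column of ones; y holds 0/1 responses."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                     # linear predictor
        pi = 1.0 / (1.0 + np.exp(-eta))    # fitted probabilities pi_i
        W = pi * (1.0 - pi)                # Var[Y_i|X_i] = pi_i * (1 - pi_i)
        score = X.T @ (y - pi)             # gradient of the log-likelihood
        info = X.T @ (X * W[:, None])      # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:     # stop once the update is tiny
            break
    return beta

# Toy data, made up for illustration:
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.2])))))
print(fit_logistic_newton(X, y))  # estimates should land near (-0.5, 1.2)
```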


[4] Wald Procedure
How can we know whether an explanatory variable has an effect on the log-odds?
We can use the Wald procedure to test whether the $\beta$'s are zero or not!!

Hypothesis : $H_{0}:\beta_{j}=0$ (which means $X_{j}$ has no effect on the log-odds!!) vs. $H_{1}:\beta_{j}\neq 0$
Test Statistic : $Z_{obs}=\frac{\hat{\beta_{j}}}{se(\hat{\beta_{j}})}$
                        where $\hat{\beta_{j}}$ is the maximum likelihood estimate.
Note that with a large enough sample, MLEs are approximately normally distributed, so under $H_{0}$ our test statistic, $Z_{obs}$, is an observation from an approximate Normal(0,1) distribution!!

95% Confidence interval : $\hat{\beta_{j}}\pm 1.96 \cdot se(\hat{\beta_{j}})$
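Continuing the sketch from [3] (so `fit_logistic_newton`, `X`, and `y` are the illustrative objects defined there), the standard errors can be taken from the inverse Fisher information evaluated at the MLE; the `wald_summary` helper below is hypothetical, not any library's API:

```python
import numpy as np
from scipy import stats

def wald_summary(X, y, beta_hat):
    """Wald z statistics, two-sided p-values, and 95% CIs for each beta_j."""
    pi = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    W = pi * (1.0 - pi)
    info = X.T @ (X * W[:, None])              # Fisher information at the MLE
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta_hat / se                          # Z_obs = beta_hat_j / se(beta_hat_j)
    p = 2 * stats.norm.sf(np.abs(z))           # approximate N(0,1) reference
    ci = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])
    return z, p, ci

z, p, ci = wald_summary(X, y, fit_logistic_newton(X, y))
```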


[5] Likelihood Ratio Test
To compare two nested models and decide which is more appropriate, we can use the likelihood ratio test.
Likelihood Ratio : $\frac{L_{R}}{L_{F}}$, where $L_{R}$ is the maximized likelihood of the reduced model and $L_{F}$ is that of the full model, fitted to the same data.

Hypothesis :
$H_{0}: \beta_{1}=...=\beta_{k}=0$ (the reduced model is appropriate, i.e., it fits the data as well as the full model)
$H_{1}:$ at least one of $\beta_{1},...,\beta_{k}$ is nonzero

Test Statistic : $G^2=-2 \log L_{R}-(-2\log L_{F})=-2 \log \frac{L_R}{L_F}$
Note that, under the null hypothesis, $G^2$ is an observation from a chi-square distribution with $k$ degrees of freedom for large $n$, where $k$ is the number of parameters dropped from the full model to obtain the reduced model. (A sketch follows below.)
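Again reusing the `fit_logistic_newton` sketch from [3] and its toy data, a minimal version of the test might look like this (`log_lik` and `lr_test` are hypothetical helpers):

```python
import numpy as np
from scipy import stats

def log_lik(X, y, beta_hat):
    """Bernoulli log-likelihood evaluated at the fitted probabilities."""
    pi = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    return np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

def lr_test(X_full, X_reduced, y):
    ll_F = log_lik(X_full, y, fit_logistic_newton(X_full, y))
    ll_R = log_lik(X_reduced, y, fit_logistic_newton(X_reduced, y))
    g2 = -2 * (ll_R - ll_F)                   # G^2 = -2 log(L_R / L_F)
    k = X_full.shape[1] - X_reduced.shape[1]  # number of parameters dropped
    return g2, stats.chi2.sf(g2, df=k)        # p-value from chi-square(k)

# Example: does the slope matter? Compare against the intercept-only model.
g2, p_value = lr_test(X, X[:, :1], y)
```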


Remark!! Both the Wald test and the likelihood ratio test assess whether a $\beta$ parameter is zero! But the two procedures are different, so the distributions they rely on are also different!! In case they do not agree, we use the likelihood ratio test, since it is more reliable.


