The Binomial Logistic Regression


The binomial logistic regression model is an example of a Generalized Linear Model (GLM).

[1] Assumptions
Observations are independent, and the sample size is large enough for valid inference (tests and confidence intervals), since Generalized Linear Models use MLE (Maximum Likelihood Estimation) to estimate the parameter coefficients.

There is a linear relationship between the observed logits and the quantitative explanatory variables.

[2] Model
The response variable is a binomial count of the number of "successes": $Y \sim \text{Binomial}(m, \pi)$
$P(Y=y)=\binom{m}{y}\pi^y(1-\pi)^{m-y}$, y=0,1,...,m

If we consider $\frac{y}{m}$ as the proportion of successes out of $m$ independent Bernoulli trials, then
 - The mean: $E(\frac{y}{m})=\pi$
 - The variance: $Var(\frac{y}{m})= \frac{\pi(1-\pi)}{m}$  

Model: $\log (\frac{\pi}{1-\pi})= f(\mathbb{X}; \beta)$ where $f(\mathbb{X}; \beta)$ is a linear function of the $\beta$'s.
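
To make this concrete, here is a minimal sketch of fitting such a model in Python with statsmodels; the data (x, successes, trials) are made-up toy values used only for illustration, not from any real study.

    # Toy grouped binomial data: y_i successes out of m_i trials at each x_i
    import numpy as np
    import statsmodels.api as sm

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # quantitative explanatory variable
    successes = np.array([3, 8, 15, 24, 28])     # y_i = number of successes
    trials = np.array([30, 30, 30, 30, 30])      # m_i = number of Bernoulli trials

    # Grouped binomial response for statsmodels: two columns = (successes, failures)
    endog = np.column_stack([successes, trials - successes])
    X_fit = sm.add_constant(x)                   # design matrix with the intercept beta_0

    fitted = sm.GLM(endog, X_fit, family=sm.families.Binomial()).fit()
    print(fitted.summary())                      # estimated coefficients are on the logit (log-odds) scale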


[3] Models and Deviance & Global Likelihood Ratio Test 
In Binomial Logistic regression, we can do more tests for model adequacy than in Binary Logistic Regression.

There are three types of models (Saturated, Fitted, and Null) describing the data, and we need to choose the most adequate one among them through tests.

3.1 Saturated Model (or Full Model)
It is an exact model describing the data: the predictor variables are indicator variables for each level (covariate pattern) of X!
$\mathrm{logit}(\hat{\pi}) = \hat{\alpha}_0 + \hat{\alpha}_1 I_1 + \dots + \hat{\alpha}_{n-1} I_{n-1}$

3.2 Fitted Model (or Reduced Model)
The model with only a few predictor variables.
$logit (\hat{\pi})= \hat{\beta_{0}}+ \hat{\beta_{1}}X$

3.3 Null model
The model with intercept only!
$logit (\hat{\pi})= \hat{\gamma_{0}}$
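
As a sketch of how these three model forms look in practice (reusing the toy endog, X_fit, and fitted objects from the earlier snippet), the saturated model can be built with one indicator per covariate pattern and the null model with an intercept only:

    # Saturated model: one indicator per covariate pattern, so each pi_i is reproduced exactly
    n = len(endog)
    X_sat = np.eye(n)
    sat = sm.GLM(endog, X_sat, family=sm.families.Binomial()).fit()

    # Null model: intercept only
    X_null = np.ones((n, 1))
    null = sm.GLM(endog, X_null, family=sm.families.Binomial()).fit()

    # Maximized log-likelihoods satisfy log L_S >= log L_F >= log L_0
    print(sat.llf, fitted.llf, null.llf)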

To compare the Saturated and Fitted models, we use the Deviance Test!
To compare the Fitted and Null models, we use the Global Likelihood Ratio Test!

3.4 Deviance Test - Saturated vs. Fitted
$H_0$ : The Fitted model is enough vs $H_1$ : The Saturated model is required.
A smaller deviance implies the fitted model is good enough for the data; the deviance is compared to a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters between the two models.

Wait! What is the Deviance?
Deviance = $-2 \log \frac{L_F}{L_S}= -2 (\log L_F-\log L_S)=2(\log L_S-\log L_F)$,
where $L_F$ is the maximized likelihood of the Fitted model and $L_S$ is the maximized likelihood of the Saturated model.
Recall) The likelihood is $L=\prod_{i=1}^{n}\binom{m_i}{y_i}\pi^{y_i}_i (1-\pi_i)^{m_i-y_i}$ ($m_i$ = fixed number of trials for observation $i$, $n$ = number of binomial observations)
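
Continuing the toy fits above, a small sketch of the deviance test: the deviance is referred to a $\chi^2$ distribution whose degrees of freedom equal the difference in the number of parameters between the Saturated and Fitted models.

    from scipy import stats

    deviance = 2 * (sat.llf - fitted.llf)    # statsmodels also reports this as fitted.deviance
    df = X_sat.shape[1] - X_fit.shape[1]     # difference in the number of parameters
    p_value = stats.chi2.sf(deviance, df)
    print(deviance, df, p_value)             # a large p-value -> the Fitted model is adequate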

3.5 Global Likelihood Ratio Test - Fitted vs. Null
$H_0$ : The Null model is enough vs $H_1$ : The Fitted model is required.  
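
The test statistic is $G^2 = -2(\log L_0 - \log L_F) = 2(\log L_F - \log L_0)$, where $L_0$ and $L_F$ are the maximized likelihoods of the Null and Fitted models; it is again referred to a $\chi^2$ distribution with degrees of freedom equal to the number of extra parameters in the Fitted model. A quick sketch with the toy fits above:

    from scipy import stats

    G2 = 2 * (fitted.llf - null.llf)         # G^2 = -2(log L_0 - log L_F)
    df = X_fit.shape[1] - X_null.shape[1]    # extra parameters in the Fitted model
    p_value = stats.chi2.sf(G2, df)
    print(G2, df, p_value)                   # a small p-value -> the predictor(s) are needed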


[4] Other Model Fit Statistics - AIC & BIC
There are two popular fit statistics, AIC and SC, for comparing models with the same response and the same data.

4.1 Akaike's Information Criterion (AIC)
     AIC = -2 log L + 2(p+1),
              where 2(p+1) is a penalty and p+1 = the number of parameters, including the intercept $\beta_0$

4.2 Schwarz's (Bayesian Information) Criterion (SC)
     SC = -2 log L + (p+1) log N,
            where (p+1) log N is a penalty, p is the number of explanatory variables, and N is the sample size
     SC applies a stronger penalty for model complexity than AIC.

If the difference in AICs is greater than 10, the model with the smaller AIC fits better than the other. If the difference is less than 2, the two models can be considered equivalent. Overall, smaller is better!
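
A small sketch computing AIC and SC by hand for the toy Fitted and Null models above (statsmodels also reports AIC directly as res.aic); note that for grouped binomial data the choice of N, the number of binomial observations versus the total number of trials, is a convention that varies across texts.

    def aic(loglik, n_params):
        return -2 * loglik + 2 * n_params                # penalty: 2(p+1)

    def sc(loglik, n_params, n_obs):
        return -2 * loglik + n_params * np.log(n_obs)    # penalty: (p+1) log N

    N = len(endog)   # number of binomial observations; some texts use trials.sum() instead
    for name, res, X in [("fitted", fitted, X_fit), ("null", null, X_null)]:
        p_plus_1 = X.shape[1]                            # p+1 parameters, intercept included
        print(name, aic(res.llf, p_plus_1), sc(res.llf, p_plus_1, N))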
