Case Study : The Binary Logistic Regression in SAS

 
Case Study : The Donner Party in SAS  
 
The Donner and Reed families (87 people), travelling by covered wagon, became stranded by a snowstorm in the Sierra Nevada in October 1846. By the time they were rescued the following April, 40 members had died from starvation and the harsh conditions. Researchers investigated whether females were better able to withstand these conditions than males, and whether the odds of survival differed between males and females of any given age.
 
Reference: Grayson, D.K., 1990, "Donner Party deaths: A demographic assessment," Journal of Anthropological Research, 46, 223-42, and Ramsey, F.L. and Schafer, D.W., 2002, The Statistical Sleuth, 2nd Ed, Duxbury Press, p. 580.
 
[1] Data and Model
Response $Y_{i}$ : Binary variable; survived/died. (Note that it is neither continuous nor normally distributed.)
Predictors $X_{i}$ : Age and sex of the $i$th pioneer.
Odds in favor of success : $\frac{\pi}{1-\pi}$, and log odds : $\log \frac{\pi}{1-\pi}$
Model :  $\log \frac{\pi_i}{1-\pi_i}= \beta_{0} + \beta_{1}AGE_{i} + \beta_{2}SEX_{i}$, $i=1,...,45$ (binary logistic regression)
 
Note that we cannot predict with certainty whether a pioneer survives ($Y_{i}$=1) or dies ($Y_{i}$=0), but we can estimate $\pi_{i}$, the probability of survival, as well as the odds and the log-odds of survival, based on the predictors!
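
To make the link between the log-odds and the probability explicit, the model equation can be inverted. Writing $\eta_i$ for the linear predictor (a symbol introduced here only for illustration), a short sketch of the algebra is:

$$\eta_i = \beta_{0} + \beta_{1}AGE_{i} + \beta_{2}SEX_{i}, \qquad \frac{\pi_i}{1-\pi_i} = e^{\eta_i}, \qquad \pi_i = \frac{e^{\eta_i}}{1+e^{\eta_i}}$$

So any estimate of the coefficients gives an estimated log-odds, odds and probability for each pioneer.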
 
[2] SAS Code & Result - proc logistic
The default ordering of the response levels is alphanumeric. So if we run SAS with the defaults, Die comes before Survived and therefore $\pi$=P(Die)!! To reverse the order, we can use the DESCENDING option in the proc logistic statement, which makes $\pi$=P(Survived).

In a generalized linear model, the coefficients are estimated by maximum likelihood. In SAS, proc logistic computes the maximum likelihood estimates with the Fisher scoring algorithm.
 
2.1 Default Code
 

Note that the predictor variable SEX is a categorical variable. The class statement creates female = 1, and male = -1.
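
The original listing for this step is not reproduced here. As a minimal sketch only, a default call could look like the following, assuming the data are in a data set named donner (a hypothetical name) with the variables status, age and sex used later in this post:

```sas
/* Sketch only: "donner" is an assumed data set name, not taken from the post. */
proc logistic data=donner;
  class sex;               /* default effect coding: 1 / -1                   */
  model status = age sex;  /* default: models the first ordered response level */
run;
```

Because no DESCENDING option is given, the modelled probability is that of the response level that sorts first.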
 
2.2 Default  Result


2.3 SAS Result Interpretation
* Model Equation
$\log(\frac{\hat{\pi}}{1-\hat{\pi}})=-2.43+0.078 \cdot AGE_{i}-0.80\cdot SEX_{i}$


* Wald Procedures
Hypothesis : $H_{0}: \beta_j=0$ vs $H_{1}: \beta_j\neq 0$
Test Statistics: $Z_{obs}=\frac{\hat{\beta_j}}{se(\hat{\beta_j})}$ ~ approx. Normal(0,1) distribution
95% Confidence Interval : $\hat{\beta_j}\pm1.96 \cdot se(\hat{\beta_j})$

Does the Age predictor variable have an effect on the log-odds? (The estimates below are for $\pi$=P(Survived), so the AGE coefficient has the opposite sign to the one in the default model equation above.)
Hypothesis: $H_{0}: \beta_{AGE}=0$ vs $H_{1}: \beta_{AGE}\neq 0$
Test Statistics :  $(\frac{-0.078}{0.0373})^2=4.3988$ ~ $\chi^2_{1}$ (value as reported by SAS, which uses unrounded estimates. Why chi-square? Because if $Z$ ~N(0,1) then $Z^2$ ~ $\chi^2_{1}$)
P-value : 0.036, which means we have moderate evidence that AGE has an effect on survival.
95% Confidence Interval : $-0.078 \pm 1.96 \cdot 0.0373 = (-0.15, -0.0055)$
CI for Odds Ratio : $(e^{-0.15}, e^{-0.0055})=(0.86, 0.995)$
Interpretation for Odds Ratio: For the same sex, each 1-year increase in age multiplies the odds of survival by a factor between 0.86 and 0.995.

Note that if a predictor variable ($X_1$) increases by 1 unit while all other predictor variables are held constant, the odds that Y=1 change by a multiplicative factor of $e^{\beta_1}$.
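
The Wald quantities for AGE can be recomputed by hand in a short data step. This is only a sketch using the rounded estimate and standard error quoted in the text, so the results will agree with the SAS output to about two significant digits rather than exactly. The variable names are chosen here for illustration.

```sas
/* Recompute the Wald test and confidence intervals for AGE from rounded values. */
data _null_;
  b  = -0.078;                        /* AGE coefficient, survival scale    */
  se = 0.0373;                        /* its standard error                 */
  chisq = (b / se)**2;                /* Wald chi-square with 1 df          */
  p     = 1 - probchi(chisq, 1);      /* upper-tail p-value                 */
  z     = quantile('NORMAL', 0.975);  /* normal critical value, about 1.96  */
  lower = b - z*se;                   /* CI for the coefficient             */
  upper = b + z*se;
  or_lower = exp(lower);              /* CI for the odds ratio              */
  or_upper = exp(upper);
  put chisq= p= lower= upper= or_lower= or_upper=;  /* written to the log  */
run;
```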


* Likelihood Ratio Test
Hypothesis : $H_{0}: \beta_1= \beta_2=0$ vs $H_{1}$: at least one of $\beta_1, \beta_2$ is not 0 (the fitted model is better).
Test Statistics: $G^2=-2 \log \frac{L_R}{L_F}=(-2 \log L_R)-(-2 \log L_F)=61.827 - 51.256=10.57$ ~ approx. $\chi^2_{2}$ under $H_{0}$
Interpretation : Since the p-value is small (0.0051), we have strong evidence that the fitted model is better than the intercept-only model.
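
As a quick check of the arithmetic, the p-value can be obtained from the two -2 Log L values with the PROBCHI function. This is a sketch with illustrative variable names. The 2 degrees of freedom correspond to the two coefficients set to 0 under the null hypothesis.

```sas
/* Likelihood ratio test: intercept-only model vs model with AGE and SEX. */
data _null_;
  neg2logl_reduced = 61.827;             /* -2 Log L, intercept only       */
  neg2logl_full    = 51.256;             /* -2 Log L, fitted model         */
  g2 = neg2logl_reduced - neg2logl_full; /* test statistic G^2             */
  df = 2;                                /* two coefficients tested        */
  p  = 1 - probchi(g2, df);              /* upper-tail chi-square p-value  */
  put g2= p=;                            /* written to the log             */
run;
```

The printed p-value should be close to the 0.0051 quoted above.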
 
Reference: https://onlinecourses.science.psu.edu/stat504/node/159 


Remark 1) SAS Code - DESCENDING Options
With the DESCENDING option in the proc logistic statement, we get $\pi$ =P(Survived).

 
Remark 2) SAS Code - CLASS Options
For a categorical predictor declared in the class statement, the default is effect coding (1/-1). To use an indicator variable instead, such as $I_{Female}=1$ if female and 0 if male, add the / param=ref option to the class statement.
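
A minimal sketch combining both remarks for the main-effects model is given below. The data set name donner is an assumption (it does not appear in the post). The variable names follow the code used in the next section.

```sas
/* Sketch only: "donner" is an assumed data set name. */
proc logistic data=donner descending;  /* model the last ordered response level */
  class sex / param=ref;               /* 0/1 indicator coding; by default the  */
                                       /* last ordered level is the reference   */
  model status = age sex;
run;
```

With reference coding, the SEX coefficient is the difference in log-odds between the non-reference level and the reference level, rather than a deviation from the average as under effect coding.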


[3] Model with age, sex and their interaction!
Q) Is the coefficient for the age-sex interaction statistically significantly different from 0?

A) We can use a Likelihood ratio test!

First, the SAS code is as follows.
proc logistic descending;
  title 'Model with age sex and their interaction';
  class sex / param=ref;
  model status = age sex age*sex;
run;

Then, the SAS result is as follows.

The likelihood ratio test has $H_0$: the coefficient of the interaction term ($\beta_3$) = 0.
The deviance of the model with the interaction is 47.346, whereas that of the model without the interaction is 51.256. So the test statistic is 51.256-47.346 = 3.91, which follows a chi-squared distribution with 1 degree of freedom under $H_0$ (as we are testing only $\beta_3$).
From the chi-square table, the p-value is between 0.025 and 0.05.

Therefore, there is moderate evidence that the coefficient of the interaction is not 0.
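
Instead of reading the chi-square table, the p-value can be computed directly. This is a sketch with illustrative variable names, using the two deviances quoted above:

```sas
/* Likelihood ratio test for the age*sex interaction (1 df). */
data _null_;
  dev_without = 51.256;           /* deviance, model without interaction */
  dev_with    = 47.346;           /* deviance, model with interaction    */
  g2 = dev_without - dev_with;    /* drop in deviance                    */
  p  = 1 - probchi(g2, 1);        /* upper-tail chi-square p-value       */
  put g2= p=;                     /* written to the log                  */
run;
```

The result should fall between 0.025 and 0.05, in line with the table lookup.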

Q) What is the estimated odds ratio for a female that is 10 years older than another female?
A) Referring to the coefficient estimates, the estimated odds ratio is
$\exp\left \{ -0.0325\cdot10-0.1616\cdot 10 \right \}=0.14$  
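
The exponent comes from comparing the log-odds of two females whose ages differ by 10 years. Assuming the indicator coding from param=ref with female coded 1, and writing $\beta_1$ for the age coefficient and $\beta_3$ for the interaction coefficient, the intercept and sex terms cancel. The age $a$ is a symbol used here only for illustration:

$$\left[\beta_0+\beta_1(a+10)+\beta_2+\beta_3(a+10)\right]-\left[\beta_0+\beta_1 a+\beta_2+\beta_3 a\right]=10(\beta_1+\beta_3)$$

$$10(\hat{\beta}_1+\hat{\beta}_3)=10(-0.0325-0.1616)=-1.941, \qquad e^{-1.941}\approx 0.14$$

So the estimated odds of survival for the older female are about 0.14 times those of the younger one.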
