Case Study: 2x2 Contingency Table in SAS

Case Study: Framingham Heart Study
In 1948, in Massachusetts, 5209 healthy men and women aged 30-60 were recruited and followed to examine risk factors for cardiovascular disease (CVD). After 10 years, the participants were checked to see whether they had developed CVD. The main question is whether high cholesterol is associated with an increased risk of CVD.
 
Reference: https://www.framinghamheartstudy.org/, https://en.wikipedia.org/wiki/Framingham_Heart_Study  
 
 
[1] Data Assumptions and SAS Code for Reading the Data
The total sample size (n) is 1329 men. Cholesterol was measured in 1948, and ten years later each man was checked for the presence of cardiovascular disease (CVD). Each observation can therefore be classified in two ways: by cholesterol status (high or low) and by CVD status (present or absent).
 
Recall the assumption that the total sample size (n = 1329) is fixed, and that cholesterol status and CVD status are each categorical random variables with 2 levels.
 
/* One row per cell of the 2x2 table: cholesterol level, CVD status, cell count */
data fram;
  input chol $ cvd $ count;
  datalines;
low present 51
low absent 992
high present 41
high absent 245
;
run;
 

 
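This SAS code reads in the 2x2 table of counts below.

Cholesterol    CVD present    CVD absent    Total
Low                     51           992     1043
High                    41           245      286
Total                   92          1237     1329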
 

 
[2] Hypothesis
Recall that two events A and B are independent if and only if $P(A \cap B) = P(A) \times P(B)$. The main idea is therefore to compare the joint distribution with the product of the marginal distributions, where C denotes cholesterol status (rows) and D denotes CVD status (columns).
 
The Joint distribution:
The probability that an observation falls into row i, column j, for i, j = 1, 2: $P(C=i, D=j) = \pi_{ij}$
 
The Marginal distribution:
The probability that an observation falls into row i: $P(C=i) = \pi_{i\cdot}$
The probability that an observation falls into column j: $P(D=j) = \pi_{\cdot j}$
 
$H_0$: $\pi_{ij} = \pi_{i\cdot}\pi_{\cdot j}$ for all i, j. There is no relationship between cholesterol and CVD.
$H_1$: $\pi_{ij} \neq \pi_{i\cdot}\pi_{\cdot j}$ for at least one pair (i, j). Cholesterol and CVD are associated.
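
To make the comparison concrete, here is a short worked sketch of the cell counts one would expect if $H_0$ held. The symbols $n_{i\cdot}$ and $n_{\cdot j}$ (observed row and column totals) and $\hat{\mu}_{ij}$ (estimated expected count) are notation assumed here for illustration, not taken from the text above. The totals are computed from the four counts in the data step.

```latex
% Estimated expected count under independence:
% plug the sample proportions into n * pi_{i.} * pi_{.j}
\[
\hat{\mu}_{ij} = n\,\hat{\pi}_{i\cdot}\,\hat{\pi}_{\cdot j}
               = \frac{n_{i\cdot}\, n_{\cdot j}}{n},
\qquad n = 1329 .
\]
% Row totals:    low = 51 + 992 = 1043,   high = 41 + 245 = 286
% Column totals: present = 51 + 41 = 92,  absent = 992 + 245 = 1237
\begin{align*}
\hat{\mu}_{\text{low, present}}  &= \frac{1043 \times 92}{1329}   \approx 72.20,  &
\hat{\mu}_{\text{low, absent}}   &= \frac{1043 \times 1237}{1329} \approx 970.80, \\
\hat{\mu}_{\text{high, present}} &= \frac{286 \times 92}{1329}    \approx 19.80,  &
\hat{\mu}_{\text{high, absent}}  &= \frac{286 \times 1237}{1329}  \approx 266.20 .
\end{align*}
```

The observed count for high cholesterol with CVD present (41) is roughly twice the value expected under independence (about 19.8), which is the kind of discrepancy the test is designed to detect.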
 
 
[3] SAS Code - 2x2 Contingency Table
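proc freq data=fram;
  weight count;             /* each input row is a cell count, not a single subject */
  tables chol*cvd / chisq;  /* chi-square test of independence */
run;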
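
As a possible variation, not part of the original analysis, the same call can also print the expected cell counts under independence next to the observed counts. This sketch assumes the `fram` data set created in section [1]. The `expected`, `norow`, `nocol` and `nopercent` options are additions here that only change what is displayed, not the test itself.

```sas
/* Sketch: same test, with expected counts shown and percentages suppressed */
proc freq data=fram;
  weight count;   /* use the cell counts as frequencies */
  tables chol*cvd / chisq expected norow nocol nopercent;
run;
```

Comparing the observed and expected frequencies in each cell shows which cells contribute most to the chi-square statistic.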
  
 
 
[4] Conclusion
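Under the null hypothesis, with large samples, the test statistic approximately follows a chi-squared distribution with (I-1)(J-1) degrees of freedom, where I and J are the numbers of rows and columns. For a 2x2 table this gives (2-1)(2-1) = 1 degree of freedom.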
 
From the SAS output, the chi-square statistic is 31.0818 with 1 degree of freedom, and the p-value is less than 0.0001, so we reject the null hypothesis. We therefore have strong evidence that cholesterol and CVD are not independent; that is, CVD status is associated with cholesterol level.
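
As a rough check on the reported statistic, the value can be reproduced by hand. The sketch below uses the standard shortcut form of the Pearson chi-square statistic for a 2x2 table, without continuity correction. The notation $n_{ij}$ for observed cell counts and $n_{i\cdot}$, $n_{\cdot j}$ for row and column totals is assumed here for illustration.

```latex
% Shortcut form of the Pearson chi-square statistic for a 2x2 table
\[
X^2 = \frac{n\,(n_{11}n_{22} - n_{12}n_{21})^2}
           {n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2}}
\]
% Cells: n11 = 51 (low, present),  n12 = 992 (low, absent),
%        n21 = 41 (high, present), n22 = 245 (high, absent)
\[
X^2 = \frac{1329 \times (51 \times 245 - 992 \times 41)^2}
           {1043 \times 286 \times 92 \times 1237}
    = \frac{1329 \times (-28177)^2}{33{,}947{,}505{,}592}
    \approx 31.08
\]
```

This agrees with the value of 31.0818 in the SAS output.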
