The following is the short summary of the project from Winter Semester 2009. For the full article, click here.
The term "heart disease" is often used interchangeably with "cardiovascular disease" (CVD), which generally refers to conditions involving narrowed or blocked blood vessels that can lead to a heart attack, chest pain (angina), or stroke. CVD is one of the leading causes of death worldwide among both men and women. This project investigates the effect of risk factors on the odds of being diagnosed with CVD.
Framingham Data
A total of 1,615 men, aged 31 to 65, from Framingham, Massachusetts, were recruited for extensive examination related to CVD. The Framingham Heart Study was originally directed by the National Heart Institute. The study has played an important role in identifying many major risk factors for cardiovascular disease and has led to the publication of about 1,200 research articles in leading journals.
The following were investigated as risk factors for cardiovascular disease: (1) overall average blood pressure, (2) overall average cholesterol level, (3) the difference in average blood pressure, (4) the difference in average cholesterol level, (5) the subject's age at the follow-up period, and (6) whether the subject smoked at the first examination. A model was created to predict the probability of a CVD diagnosis given a subject's profile. In this data, only 128 of the 1,615 subjects were diagnosed with CVD (7.925% of the data).
Cardiovascular Disease Model
The data were randomly split so that 80% of the data (1,292 subjects) were used for model construction. Using this training data, a model was constructed to predict the probability of a CVD diagnosis. The table below shows the effect of each risk factor in the model and the P-value for the Wald test, which tests whether a risk factor has an effect.
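As a rough illustration of how a Wald test P-value is obtained, the sketch below divides a coefficient estimate by its standard error and compares the result with a standard normal distribution. It assumes the model's coefficients are on the log-odds scale (as in a logistic regression, which this summary does not state explicitly), and the estimate and standard error used here are hypothetical numbers, not values from the project.

```python
import math


def wald_test(beta, std_err):
    """Return (z, two-sided p-value) for the null hypothesis beta = 0."""
    z = beta / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def odds_ratio(beta):
    """Multiplicative change in the odds per one-unit increase in the factor,
    assuming beta is a log-odds coefficient (an assumption, see lead-in)."""
    return math.exp(beta)


# Hypothetical estimate and standard error, for illustration only.
beta_hat, se_hat = 0.05, 0.02
z, p = wald_test(beta_hat, se_hat)
print(f"z = {z:.2f}, p = {p:.4f}, odds ratio = {odds_ratio(beta_hat):.3f}")
```

A small P-value suggests the risk factor has an effect on the odds; a large one means the data do not show a detectable effect.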
The plot below shows how each risk factor affects the likelihood of a CVD diagnosis.
Model Validation
The performance of the model was evaluated using the remaining 20% of the data (272 subjects). The specificity (the proportion of subjects without CVD who were correctly predicted to have no CVD) is 0.7368, and the sensitivity (the proportion of subjects with CVD who were correctly predicted to have CVD) is 0.7273. Therefore, given that the probability of a CVD diagnosis is 7.925%, the probability that a subject predicted to have CVD actually has it is 19.21%. This probability may seem small, but considering that only 7.925% of all subjects (128 out of 1,615) actually had CVD, the model seems to perform well. The model needs more subjects with CVD to improve its performance.
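The 19.21% figure can be reproduced with Bayes' rule from the sensitivity, the specificity, and the 7.925% diagnosis rate, reading it as the probability that a subject predicted to have CVD actually has CVD. The sketch below is only a check of that arithmetic; the function name is made up for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(has CVD | predicted CVD), computed with Bayes' rule."""
    # Fraction of all subjects who have CVD and are predicted to have it.
    true_pos = sensitivity * prevalence
    # Fraction of all subjects who do not have CVD but are predicted to have it.
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values reported in this summary; 0.07925 is the rounded rate 128 / 1,615.
ppv = positive_predictive_value(sensitivity=0.7273,
                                specificity=0.7368,
                                prevalence=0.07925)
print(f"P(CVD | predicted CVD) = {ppv:.2%}")
```

Because only about 8% of subjects have CVD, false positives among the much larger group without CVD outnumber true positives, which is why this probability stays near 19% even though sensitivity and specificity are both above 0.72.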
Conclusion
Age, systolic blood pressure, and cholesterol level have significant effects on the odds of being diagnosed with CVD, while smoking and changes in blood pressure and cholesterol level do not have a significant effect. Furthermore, there is no interaction among these risk factors. The model is moderately accurate: given that the probability of a CVD diagnosis is 7.925%, a prediction of CVD is correct 19.21% of the time.