Clustering and predicting antidepressant response in patients with major depressive disorder – a machine learning based secondary analysis of “The EMC trial”-data

Dissertation (Open Access)

Abstract

Current antidepressant treatment strategies in major depressive disorder (MDD) evaluate treatment response only after 4 weeks of treatment. The early improvement criterion (a 20% improvement in the symptom severity sum score after 2 weeks), currently the state of the art for earlier prediction of treatment response, has not yet been shown to be an effective trigger for clinical decision making. This thesis investigates potential sources of predictive information other than symptom severity sum scores in order to generate new hypotheses for future treatment strategies. In Experiment 1, early treatment response patterns over time are identified by clustering with the k-means algorithm, and several possible cluster structures are evaluated for mathematical fit as well as clinical interpretability. A structure with 5 clusters of early response is identified as a candidate for further investigation and hypothesis building. In Experiment 2, traditional clinical response and remission criteria are predicted using random forest classifiers with different sets of clinical variables at different timepoints as predictors. The classifiers are compared with the early improvement criterion, which is outperformed by some of the predictor sets at every timepoint. This shows that clinical variables other than the sum score contain predictive information. These variables are assessed and selected for further model building based on their relative feature importance scores. In Experiment 3, a random forest classifier based on the variables selected in Experiment 2 is trained to predict assignment to the clusters from Experiment 1, thereby combining the two sources of predictive information. The results show that this prediction is possible above the zero-information rate for later timepoints. In the combined discussion, the results of the three experiments are brought together to formulate two new hypotheses for treatment strategy.
The first hypothesis assumes that the “Delayed Improvement” cluster from Experiment 1 benefits (on average) from continuing treatment beyond 4 weeks. The second hypothesis assumes that patients who, based on predictions like those in Experiment 3, will likely belong to the “Non Improvement” cluster from Experiment 1 benefit from an early medication change. The role of the algorithms developed in this thesis for research into these hypotheses, as well as their additional scientific use, is discussed.
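
The two reference points named in the abstract, the early improvement criterion and the zero-information rate, can be illustrated with a minimal Python sketch. It is not taken from the thesis. The function names, the example scores, and the choice of "greater than or equal to" at the 20% threshold are assumptions made for illustration. The zero-information rate is taken here in its usual sense, the accuracy obtained by always predicting the most frequent class.

```python
from collections import Counter


def early_improvement(baseline_score, week2_score, threshold=0.20):
    """Return True if the sum score fell by at least `threshold`
    (as a fraction of the baseline score) between baseline and week 2.

    Assumption: lower sum scores mean less severe symptoms, and the
    criterion is met at exactly 20% (>=), which the abstract does not specify.
    """
    if baseline_score <= 0:
        raise ValueError("baseline sum score must be positive")
    relative_improvement = (baseline_score - week2_score) / baseline_score
    return relative_improvement >= threshold


def zero_information_rate(labels):
    """Accuracy of a trivial classifier that always predicts the
    most frequent class in `labels`."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have equal length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical data: (baseline sum score, week-2 sum score, true outcome)
patients = [
    (30, 22, "response"),
    (28, 27, "non-response"),
    (25, 24, "non-response"),
    (32, 20, "response"),
    (27, 26, "response"),
]

true_labels = [outcome for _, _, outcome in patients]
# Use the early improvement criterion as a simple baseline predictor.
baseline_predictions = [
    "response" if early_improvement(b, w) else "non-response"
    for b, w, _ in patients
]

print("criterion accuracy:", accuracy(baseline_predictions, true_labels))
print("zero-information rate:", zero_information_rate(true_labels))
```

A classifier is informative in this sense only if its accuracy exceeds the zero-information rate computed on the same labels, which is the comparison the abstract reports for Experiment 3.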
