Clustering and predicting antidepressant response in patients with major depressive disorder – a machine learning based secondary analysis of “The EMC trial”-data
Abstract
Current antidepressant treatment strategies in major depressive disorder (MDD) evaluate treatment response only
after 4 weeks of treatment. The early improvement criterion (20% sum score
improvement after 2 weeks), which is currently the state of the art for earlier prediction of
treatment response, has not yet been proven to be an effective trigger for clinical decision making.
This thesis investigates potential sources of predictive information other than
symptom severity sum scores, in order to find new hypotheses for future treatment
strategies. In Experiment 1, early treatment response patterns over time are identified
by clustering with the k-means algorithm, and several possible cluster structures are
evaluated for mathematical fit as well as clinical interpretability. A structure with 5
clusters of early response is identified as a candidate for further investigation and
hypothesis building. In Experiment 2, traditional clinical response and remission criteria
are predicted using random forest classifiers with different sets of clinical variables at
different timepoints as predictors. The classifiers are evaluated against the
early improvement criterion, which is outperformed by some of the predictor sets
at every timepoint. This shows that predictive information is contained in clinical variables
other than the sum score. These variables are assessed and selected for further model
building based on their relative feature importance scores. In Experiment 3, a random
forest classifier based on the variables selected in Experiment 2 is trained to predict
assignment to the clusters from Experiment 1, thereby combining the two sources of
predictive information. The results show that this prediction is possible above the zero-information rate for later timepoints. In the combined discussion, the results from these
three experiments are brought together to formulate two new hypotheses for treatment
strategy. The first hypothesis assumes that the “Delayed Improvement” cluster from
Experiment 1 benefits (on average) from continuing treatment beyond 4 weeks.
The second hypothesis assumes that patients who, based on predictions
such as those in Experiment 3, are likely to belong to the “Non Improvement” cluster from Experiment 1
benefit from an early medication change. The role of the algorithms from this thesis in
research into the hypotheses as well as their additional scientific use is discussed.
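
The two baselines named in the abstract can be illustrated with a minimal sketch. It assumes the early improvement criterion is a relative reduction of at least 20% in a symptom severity sum score between baseline and week 2, and that the zero-information rate is the accuracy obtained by always predicting the most frequent class. The function names and the example scores are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def early_improvement(baseline_score, week2_score, threshold=0.20):
    """Return True if the sum score fell by at least `threshold` (relative to baseline)."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    relative_drop = (baseline_score - week2_score) / baseline_score
    return relative_drop >= threshold


def zero_information_rate(labels):
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical patients: (baseline sum score, week-2 sum score)
patients = [(24, 18), (24, 22), (30, 21)]
flags = [early_improvement(b, w) for b, w in patients]

# Hypothetical cluster assignments for four patients
clusters = ["Non Improvement", "Non Improvement", "Delayed Improvement", "Other"]
baseline_accuracy = zero_information_rate(clusters)
```

A classifier such as the one in Experiment 3 is only informative if its accuracy exceeds `baseline_accuracy` on the same label distribution, which is the comparison the abstract refers to.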