Multilevel Modelling of Electronic Health Records
Thesis posted on 19.11.2019, 14:16 by Alessandro Gasparini
The use of electronic health records (EHRs) is increasingly common in applied research, providing the opportunity to answer more relevant and detailed clinical questions. Examples include assessing the quality of routine care, enabling pragmatic clinical trials, and investigating temporal trends and the natural evolution of diseases. EHRs therefore offer several opportunities for medical research, but challenges persist. The principal aim of this Thesis is to investigate methodological challenges, with a focus on the multilevel structure of EHRs. First, I studied shared frailty survival models for clustered survival data and the impact of model misspecification on estimates of risk and heterogeneity. Then, I investigated joint models for longitudinal and survival data and their use to account for the drop-out and observation processes in the analysis of longitudinal data. In EHR settings, drop-out and the timing of observations are likely not independent of the outcome of interest, thereby violating common assumptions of traditional methods. Focussing on the observation process, I compared the joint modelling approach to other methods previously proposed in the literature via Monte Carlo simulation. Lastly, given the use of simulation methods throughout this Thesis, I introduced newly developed software in R to aid, support, and supplement their analysis. The results of this Thesis highlight the importance of properly modelling the baseline hazard and the frailty distribution, and of assessing model fit, in shared frailty survival models, as clinically relevant biases may otherwise arise. Moreover, the joint modelling approach showed superior performance and flexibility when modelling the observation process, with a consistent pattern across all simulated scenarios.
I illustrated these results in practice using real-world data on chronic kidney disease and intensive care medicine, emphasising once again the need for appropriate statistical methods that can accommodate the complexities commonly encountered in EHR settings.