How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data

Stavseth, Marianne Riksheim; Clausen, Thomas; Røislien, Jo

dc.contributor.author	Stavseth, Marianne Riksheim
dc.contributor.author	Clausen, Thomas
dc.contributor.author	Røislien, Jo
dc.date.accessioned	2020-02-17T14:13:13Z
dc.date.available	2020-02-17T14:13:13Z
dc.date.created	2019-09-26T14:16:02Z
dc.date.issued	2019-01
dc.identifier.citation	Stavseth, M.R., Clausen, T., Røislien, J. (2019) How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data. SAGE Open Medicine, 7, 1-12.	nb_NO
dc.identifier.issn	2050-3121
dc.identifier.uri	http://hdl.handle.net/11250/2642055
dc.description.abstract	Objectives: Missing data is a recurrent issue in many fields of medical research, particularly in questionnaires. The aim of this article is to describe and compare six conceptually different multiple imputation methods, alongside the commonly used complete case analysis, and to explore whether the choice of methodology for handling missing data might impact clinical conclusions drawn from a regression model when data are categorical. Methods: In addition to the commonly used complete case analysis, we tested the following six imputation methods: multiple imputation using expectation–maximization with bootstrapping, multiple imputation using multiple correspondence analysis, multiple imputation using latent class analysis, multiple hot deck imputation, and multivariate imputation by chained equations with two different model specifications: logistic regression and random forests. The methods were tested on real data from a questionnaire-based study in the Norwegian opioid maintenance treatment programme. Results: All methods performed relatively well when the sample size was large (n = 1000). For a smaller sample size (n = 200), the regression estimates depended heavily on the proportion of missing data. When the proportion of missing data was ⩾20%, complete case analysis, hot deck and random forests in particular gave biased estimates with too low coverage. Multiple imputation using multiple correspondence analysis had the best overall performance. Conclusion: The choice of method for handling missing data has a significant impact on the clinical interpretation of the accompanying statistical analyses. With missing data, the choice of whether to impute or not, and the choice of imputation method, can influence the clinical conclusions drawn from a regression model and should therefore be given sufficient consideration.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	SAGE Publishing	nb_NO
dc.relation.uri	https://journals.sagepub.com/doi/pdf/10.1177/2050312118822912
dc.rights	Attribution-NonCommercial 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/deed.no	*
dc.subject	medical research	nb_NO
dc.title	How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	publishedVersion	nb_NO
dc.rights.holder	© The Author(s) 2019	nb_NO
dc.subject.nsi	VDP::Medical disciplines: 700	nb_NO
dc.source.pagenumber	1-12	nb_NO
dc.source.volume	7	nb_NO
dc.source.journal	SAGE Open Medicine	nb_NO
dc.identifier.doi	10.1177/2050312118822912
dc.identifier.cristin	1729648
cristin.unitcode	217,13,2,0
cristin.unitname	Avdeling for kvalitet og helseteknologi
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1

Associated file(s)

File name: How+handling+missing+data+may+ ...
Size: 321.4Kb
Format: PDF

This item appears in the following collection(s)

Publications from CRIStin [4330]
Scientific publications (HV) [900]

Unless otherwise stated, this item is licensed under Attribution-NonCommercial 4.0 International