How prevalent is data fabrication in household surveys? Would such fabrication substantially affect the validity of empirical analyses? We document how we identified such fabrication in South Africa's longitudinal National Income Dynamics Study, which affected about 7% of the sample. The fabrication was detected while fieldwork was still on-going, and the relevant interviews were reconducted. We thus have an observed counterfactual that allows us to measure how problematic such fabrication would have been, had it remained undetected. We compare estimates from the dataset that includes the fabricated interviews with corresponding estimates that includes the corrected data instead. We find that the fabrication would not have affected our univariate and cross-sectional estimates meaningfully, but would have led us to reach substantially different conclusions when implementing panel estimators. We estimate that the data quality investigation in this survey had a benefit-cost ratio of at least 24, and was thus easily justifiable.
Comments
(Leave your comments here about this item.)