A data-driven approach for determining source proportion in mixed-mode studies
Anuja Hariharan, Jenny Geiger, Alexandra Wachenfeld-Schell
GIM - Gesellschaft für Innovative Marktforschung mbH
In mixed-mode studies that use dual-frame data sources, such as Computer Assisted Telephone Interviews (CATI) and Computer Assisted Web Interviews (CAWI), a design decision has to be made ex ante about the proportion of CATI to CAWI data to collect, so that the variable of interest (such as online shopping preferences or any other consumer behavior) is as representative as possible. Our work investigates how a data-driven approach can reduce the risk attached to this design choice. In particular, we compare the data structure of five studies in which similar attributes were measured (online shopping behavior and internet usage) and examine whether the variance in these attributes can be minimized by choosing an appropriate CATI/CAWI split. The studies were conducted in 2021/22 with varying CATI/CAWI sampling proportions and different online panels. First, we explore how the studies can be compared with each other and whether differences in representation arise when CAWI and CATI data sources are combined in different proportions (e.g. 10/90 or 30/70 splits). Sampling differences were compared between two procedures: (1) repeated random sampling and (2) assigning different weights to the data sources. Using total squared errors and regressions, we compared data representativity on demographic variables (such as gender, age, occupation, and state) and the goodness of fit of the variable of interest across the five studies. We discuss our findings for choosing an appropriate CAWI/CATI proportion and conclude by highlighting practical implications for designing and analyzing mixed-mode studies.
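
The abstract does not specify how the two comparison procedures were implemented. As a rough illustration only, the following Python sketch shows one way a total squared error against population benchmarks could be computed for a given CATI share, once by repeated random sampling and once by weighting the two sources. All function names, the age categories, the benchmark shares, and the toy data are hypothetical and are not taken from the studies; sampling with replacement is likewise an assumption.

```python
import random


def shares(values, categories, weights=None):
    """Return the (optionally weighted) share of each category in `values`."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    acc = {c: 0.0 for c in categories}
    for v, w in zip(values, weights):
        acc[v] += w
    return {c: acc[c] / total for c in categories}


def total_squared_error(sample_shares, target_shares):
    """Sum of squared deviations between sample shares and benchmark shares."""
    return sum((sample_shares[c] - target_shares[c]) ** 2 for c in target_shares)


def tse_by_resampling(cati, cawi, p_cati, target, n=200, reps=500, seed=0):
    """Procedure (1): draw `reps` blended samples of size n and average the error.

    Assumption: respondents are drawn with replacement from each source.
    """
    rng = random.Random(seed)
    n_cati = round(n * p_cati)
    errors = []
    for _ in range(reps):
        blend = rng.choices(cati, k=n_cati) + rng.choices(cawi, k=n - n_cati)
        errors.append(total_squared_error(shares(blend, target), target))
    return sum(errors) / reps


def tse_by_weighting(cati, cawi, p_cati, target):
    """Procedure (2): keep all respondents and weight each source to its share."""
    weights = ([p_cati / len(cati)] * len(cati)
               + [(1 - p_cati) / len(cawi)] * len(cawi))
    return total_squared_error(shares(cati + cawi, target, weights), target)


# Hypothetical toy data: age group of each respondent per source.
cati = ["18-39"] * 10 + ["40-59"] * 30 + ["60+"] * 60
cawi = ["18-39"] * 50 + ["40-59"] * 40 + ["60+"] * 10
# Hypothetical population benchmark for the same age groups.
target = {"18-39": 0.40, "40-59": 0.35, "60+": 0.25}

grid = [0.1, 0.3, 0.5, 0.7, 0.9]  # candidate CATI shares
best_split = min(grid, key=lambda p: tse_by_weighting(cati, cawi, p, target))

for p in grid:
    print(p,
          tse_by_weighting(cati, cawi, p, target),
          tse_by_resampling(cati, cawi, p, target))
```

In this sketch the resampled error also contains sampling variance, so it will generally be larger than the weighted error for the same split; comparing the two across the grid is one way to see how much of the error is due to the mix itself rather than to the draw.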