First, the latent classes were derived for the sample for which all three measures of smoking behavior were available (complete case dataset), that is, obtained through listwise deletion. The validity of any results based on this small subset will depend on the degree to which the data are missing completely at random (MCAR), that is, nonresponse is related to neither measured nor unmeasured variables. We refer the interested reader to excellent introductory texts on the various types of missing data (MCAR/MAR/MNAR; Graham, 2009; Schafer & Graham, 2002; Sterne et al., 2009). Second, the latent class estimation was repeated for those with at least one of the three smoking measures present (partially missing dataset). Mplus achieves this through estimation using FIML.
Here, the assumption is that missing data are missing at random (MAR) conditional on the repeated measures data that are observed (nonresponse is random, conditional on these data). Rather than imputing data to fill in any of the missing values, FIML directly estimates all parameters using all available data (Enders, 2001; Enders & Bandalos, 2001). As stated in Enders and Bandalos (2001), "... under MAR, the partially observed cases provide important information about the underlying marginal distributions of the incomplete variables and hence may reduce the bias that would result from the listwise deletion of cases." Results from regression models involving the output from this latent class model will be referred to and labeled as the "FIML" results to distinguish them from imputation results described below.
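To make the distinction between the complete case and FIML analyses concrete, the following is a minimal Python sketch of an observed-data log-likelihood for a latent class model. It is not the Mplus implementation. It assumes binary indicators for simplicity, and the function name `lca_observed_loglik`, the parameter values, and the toy data are all hypothetical. Each case contributes only the items it actually answered, which is the sense in which FIML uses all available data.

```python
import numpy as np

def lca_observed_loglik(y, class_probs, item_probs):
    """Per-case observed-data log-likelihood for a latent class model.

    Illustrative assumptions: binary items, item_probs strictly in (0, 1).
    y           : (n, J) array of 0/1 responses, np.nan where missing
    class_probs : (C,) class prevalences summing to 1
    item_probs  : (C, J) P(item j = 1 | class c)
    """
    y = np.asarray(y, dtype=float)
    class_probs = np.asarray(class_probs, dtype=float)
    item_probs = np.asarray(item_probs, dtype=float)

    observed = ~np.isnan(y)                   # (n, J) response indicator
    y_filled = np.where(observed, y, 0.0)     # placeholder, masked out below

    log_p1 = np.log(item_probs)               # log P(y = 1 | class)
    log_p0 = np.log1p(-item_probs)            # log P(y = 0 | class)

    # (n, C, J) log-probability of each response under each class
    item_ll = (y_filled[:, None, :] * log_p1[None, :, :]
               + (1.0 - y_filled[:, None, :]) * log_p0[None, :, :])
    # Missing items contribute nothing to the likelihood
    item_ll = item_ll * observed[:, None, :]

    case_class_ll = item_ll.sum(axis=2) + np.log(class_probs)[None, :]

    # Log-sum-exp over classes gives the marginal log-likelihood per case
    m = case_class_ll.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(case_class_ll - m).sum(axis=1))

# Toy data: three measures, with one complete and one partially observed case
y = np.array([[1.0, 1.0, 0.0],
              [1.0, np.nan, np.nan],
              [np.nan, np.nan, np.nan]])

complete_case = ~np.isnan(y).any(axis=1)          # listwise deletion keeps these
partially_observed = (~np.isnan(y)).any(axis=1)   # at least one measure present

class_probs = np.array([0.6, 0.4])                # hypothetical values
item_probs = np.array([[0.9, 0.8, 0.7],
                       [0.1, 0.2, 0.3]])          # hypothetical values

ll = lca_observed_loglik(y, class_probs, item_probs)
```

In this toy example the second case would be discarded under listwise deletion but still contributes to the likelihood through its single observed measure. The third case, with nothing observed, carries no information and falls outside the partially missing dataset.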
The FIML approach deals with missing data among the repeated measures but not within the independent variables. Consequently, as an alternative to FIML estimation, MI was carried out using chained equations (van Buuren, Boshuizen, & Knook, 1999) with the ice routine (Royston, 2009) in Stata. This is an iterative procedure, which uses univariable regression equations applied to each variable in turn to predict any missing data, based on the other variables included in the imputation model. Unlike the FIML approach, MI creates multiple datasets over which any imputed data can vary, reflecting the uncertainty in the true values. This approach avoided any drop in sample size, which would otherwise occur when the latent class outcome was regressed on the covariates (see Supplementary Material for more details on the ice procedure).
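The iterative logic of chained equations can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the Stata ice routine. It assumes that all variables are continuous and imputed by normal linear regression, and it does not draw the regression parameters from their posterior distribution, as a proper MI procedure would. The function name `chained_imputation` and the simulated data are hypothetical.

```python
import numpy as np

def chained_imputation(data, n_datasets=5, n_cycles=10, seed=0):
    """Simplified chained-equations imputation (illustrative only).

    data : (n, p) array with np.nan for missing values.
    Returns a list of n_datasets completed copies of data.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    n, p = data.shape
    completed = []

    for _ in range(n_datasets):
        filled = data.copy()
        # Starting values: random draws from the observed values of each column
        for j in range(p):
            n_miss = int(missing[:, j].sum())
            if n_miss:
                filled[missing[:, j], j] = rng.choice(
                    data[~missing[:, j], j], size=n_miss)

        for _ in range(n_cycles):
            for j in range(p):              # each variable in turn
                if not missing[:, j].any():
                    continue
                obs = ~missing[:, j]
                # Regress variable j on all other (currently filled) variables
                X = np.column_stack([np.ones(n), np.delete(filled, j, axis=1)])
                beta, *_ = np.linalg.lstsq(X[obs], filled[obs, j], rcond=None)
                resid = filled[obs, j] - X[obs] @ beta
                dof = max(int(obs.sum()) - X.shape[1], 1)
                sigma = np.sqrt(resid @ resid / dof)
                # Prediction plus residual noise, so imputations vary by dataset
                filled[~obs, j] = (X[~obs] @ beta
                                   + rng.normal(0.0, sigma, size=int((~obs).sum())))
        completed.append(filled)
    return completed

# Simulated example: three variables, two of them partially missing
sim = np.random.default_rng(1)
x = sim.normal(size=200)
data = np.column_stack([x,
                        0.8 * x + sim.normal(scale=0.5, size=200),
                        sim.normal(size=200)])
data[sim.random(200) < 0.3, 1] = np.nan
data[sim.random(200) < 0.2, 2] = np.nan

imputed = chained_imputation(data, n_datasets=5)
```

Each element of `imputed` keeps the observed values unchanged and differs from the others only in the imputed cells, which is what allows the between-dataset variation to reflect uncertainty about the missing values. With categorical variables such as smoking measures, the linear regression step would be replaced by a logistic or ordinal model.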
Previous substance use work combining imputation and mixture modeling has simplified the task either by using a single imputed dataset (Hix-Small et al., 2004; Li et al., 2001) or by restricting the imputation step to the covariates (Guo et al., 2002). The derivation of the latent classes was carried out on each imputed dataset in turn; however, the choice of the optimal number of classes was based on the earlier CC/FIML analyses.