FAQ Latent GOLD®

LC Cluster Analysis
Discrete Factor (DFactor) Analysis
LC Regression Analysis
Advanced/Syntax and Technical questions
LG Choice


What resources are available to learn about Latent GOLD® and latent class modeling?

Before purchasing the program, you can try out the free demo version of the program, which allows access to all program features with sample data files.

Tutorials take you step by step through several analyses of these sample files. These tutorials, along with various publications, are available on our website. Upon purchase of the program, users can download a 200-page User's Guide and other manuals that cover a wide range of topics on latent class analysis and Latent GOLD®.

We also offer Onsite training as well as Online Courses.

What data file formats can Latent GOLD® handle?

Latent GOLD® can handle ASCII Text data formats as well as SPSS files.

How can I use Latent GOLD® with SAS data sets?

SAS Export can create an SPSS .sav file, which can be opened by Latent GOLD®. The SAS documentation illustrates the Export function; view the relevant Export page for instructions.

How many records and variables can I use? How much time will it take to run?

There is NO limit on the number of records. Run time depends on several factors, including the number of variables and records, the speed of your machine, and the requested output. For many models, Latent GOLD® runs 20 or more times faster than other latent class programs, and version 5.0 is much faster than earlier versions. We suggest trying the demo program to see how fast Latent GOLD® works on your machine.

How does Latent GOLD differ from the LEM program?

Latent GOLD® implements the 3 most important types of latent class (LC) models. It was designed to be extremely easy to use and to make it possible for people without a strong statistical background to apply LC analysis to their own data in a safe and easy way. LEM is a command-language research tool that Prof. Jeroen Vermunt developed for applied researchers with a strong statistical background who want to apply nonstandard log-linear and latent class models to their categorical data. With LEM you can specify more probability structures with many more kinds of restrictions (if you know how to do it), but LEM is not designed to be Windows friendly, requires strict data and input formats, and does not provide error checks.

With Latent GOLD, continuous and count variables can be included in the model, and special LC output not available in LEM is provided, such as various graphs, classification statistics, and bivariate residuals. Latent GOLD® also has faster (full Newton-Raphson) and safer (sets of starting values, Bayes constants) estimation methods for LC models than LEM. Both programs give information on nonidentifiability and boundary solutions, but Latent GOLD® , unlike LEM, can prevent boundary solutions through the use of Bayes constants.

Do you have any tutorials for event history analysis?

The set of example data files on our website contains various event history analysis examples. The setup for several event history models can be opened in Latent GOLD using the HELP GUI Example Regression menu. Full tutorials are not yet available for these. However, to get you started, you might look at the data file land.sav, the full reference for which is "Land, K.C., Nagin, D.S., and McCall (2001). Discrete-time hazard regression models with hidden heterogeneity: the semi-parametric mixed Poisson approach. Sociological Methods and Research, 29, 342-373." Another good example is jobchange.dat.

Land.sav contains information on 411 males from a working-class area of London who were followed from ages 10 through 31. The dependent variable is "first serious delinquency". As can be seen, there is one record for each time point, which is called a person-period data format. The dependent variable "first" is zero for all records of a person, except for the last record if the person experienced the event of interest at that age. The variables age and age_sq are the duration variables; these can also be seen as time-varying predictors. The variable "tot" is a time-constant covariate/predictor (a composite risk factor). Of course, ID should be used as Case ID to indicate which records belong to the same case.

The dependent variable "first" can be treated as a Poisson count or as a binomial count. The former option yields a piecewise-constant log-linear hazard model, the latter a discrete-time logit model. If treated as a Poisson count, it is best to set the exposure to one half (exp_half: the event occurs in the middle of the interval) for the time point at which the event occurs. With a binomial count the exposure should be one throughout (the default). Age and age_sq should be used as class-dependent predictors. You should identify two groups with clearly different age patterns in the rate of first delinquency. The variable "tot" can be used as a class-independent predictor, but it is more interesting to use it as a covariate: does the risk factor determine the type of delinquency trajectory?
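The person-period setup and exposure rule described above can be sketched as follows (the records and ages are made up for illustration; this is not Latent GOLD syntax):

```python
# One record per person-year; 'first' is 0 until the year the event occurs.
records = [
    {"id": 1, "age": 10, "first": 0},
    {"id": 1, "age": 11, "first": 0},
    {"id": 1, "age": 12, "first": 1},   # event occurs at age 12
]

for r in records:
    # Poisson treatment: event assumed to occur mid-interval,
    # so exposure is halved at the event record
    r["exp_half"] = 0.5 if r["first"] == 1 else 1.0
    # Binomial treatment: exposure stays at the default of 1
    r["exp_binom"] = 1.0
```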

This example can be modified or extended in many ways:
- You can include time-varying predictors other than the time variables. These can be assumed to have the same or different effects across classes.
- You can include information on another event. In that case your classes describe the pattern in multiple events.
- You can include as many covariates as you want (these will usually be demographics, but can also be a treatment).
- You can model the time dependence as nominal, yielding a Cox-like model.

A general reference on event history combined with LC analysis is Vermunt (1997), Log-linear event history analysis. Sage Publications.

Can I use Latent GOLD® on a Mac?

Latent GOLD® is a Windows program but we have had anecdotal reports that the emulation software Wine and other virtual machine software have allowed users to run Latent GOLD on Macintosh machines.

Please note that we only support the installation process of Latent GOLD® on a Windows machine.

LC Cluster Analysis

How does latent class cluster analysis compare with the traditional clustering procedures in SAS and SPSS?

LC clustering is model-based, in contrast to traditional approaches that are based on ad hoc distance measures. The general probability model underlying LC clustering more readily accommodates reality by allowing for unequal variances in each cluster, use of variables with mixed scale types, and formal statistical procedures for determining the number of clusters, among many other improvements. For a detailed comparison showing how LC clustering outperforms SPSS K-means clustering and SAS FASTCLUS procedures, see Magidson and Vermunt (2002) and Kent, Jensen, and Kongsted (2014).

How does Latent GOLD® classify cases into latent classes?

Cases are assigned to the latent class having the highest posterior membership probability. Covariates can be added to the model for improved description and prediction of the latent classes.

When the 'Include Missing' option is selected, does Latent Gold do some kind of imputation?

No, imputation is not necessary. Classification with missing values works exactly the same as classification without missing values: it is simply based on the variables that are observed for the case concerned. Missing values on the indicators are never imputed; one of the nice features of LC analysis is that imputation is not needed.

In the User's Guide, we give the general form of the density with missing values. The crucial element is the delta term, which is 0 if an indicator is missing. In that case the corresponding term cancels (it equals 1 irrespective of the value of y).

Thus, with four indicators y1, y2, y3, and y4, two clusters, and y2 missing:

P(x|y1,y3,y4) = P(x) P(y1|x) P(y3|x) P(y4|x) / P(y1,y3,y4)


P(y1,y3,y4) = P(1) P(y1|1) P(y3|1) P(y4|1) + P(2) P(y1|2) P(y3|2) P(y4|2).
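As a sketch (not Latent GOLD internals), the posterior computation with a missing indicator can be written out directly; the class sizes and conditional response probabilities below are made-up illustrative values:

```python
# P(x): prior class sizes for a 2-class model
p_class = [0.6, 0.4]

# P(y=1 | x) for four dichotomous indicators y1..y4, one row per class
p_yes = [
    [0.8, 0.7, 0.9, 0.6],  # class 1
    [0.2, 0.3, 0.1, 0.4],  # class 2
]

def posterior(responses, p_class, p_yes):
    """responses: list of 1, 0, or None (missing) per indicator.
    Missing indicators are simply skipped -- their term drops out of
    the product (delta = 0), so no imputation is needed."""
    joint = []
    for x, prior in enumerate(p_class):
        prob = prior
        for j, y in enumerate(responses):
            if y is None:        # delta = 0: term cancels
                continue
            prob *= p_yes[x][j] if y == 1 else 1 - p_yes[x][j]
        joint.append(prob)
    total = sum(joint)           # = P(y1, y3, y4) in the formula above
    return [p / total for p in joint]

# y2 missing: posterior based on y1, y3, y4 only
post = posterior([1, None, 1, 0], p_class, p_yes)
```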

How are Latent Class (LC) clustering techniques related to Fuzzy Clustering Techniques?

In fuzzy clustering, a case has grades of membership which are the "parameters" to be estimated (Kaufman and Rousseeuw, 1990). In contrast, in LC clustering an individual's posterior class-membership probabilities are computed from the estimated model parameters and the observed scores. The advantage of the LC approach is that it is possible to use the LC model to classify other cases (outside the sample used to estimate the model) which belong to the population. This is not possible with standard fuzzy clustering techniques.

Kaufman, L. and Rousseeuw, P.J. 1990. Finding groups in data: An introduction to cluster analysis, New York: John Wiley and Sons.

How can I tell if my latent class cluster model contains local dependencies?

Local dependence for a K-class model exists if the model does NOT fit the data. One such measure of model fit is given by the bivariate residuals (BVRs) associated with each pair of model indicators. Large BVRs (values over 2) can be viewed as evidence of local dependence associated with that pair of model indicators (see the residuals output of Latent GOLD for these statistics).

How can I handle local dependencies in my LC cluster model?

Local dependence can be accounted for by simply adding latent classes or by maintaining the current number of classes and modifying the model in other ways such as adding direct effects associated with 2 variables that have large bivariate residuals. See the LG manual for details of how to add direct effects. See also section 3 in Magidson and Vermunt (2004) for further details of the different approaches for dealing with local dependence.

In LC cluster models containing continuous indicators, how can I determine whether a model should contain a within class correlation between 2 or more of these variables?

You can estimate several models and select the one that fits best according to BIC. For example, six types of LC cluster models are reported in Table 1 of Magidson and Vermunt (2002). These models differ with respect to a) the specification of class-dependent vs. class-independent error variances and b) the 'direct effects' included in the LC cluster model estimated by Latent GOLD. The 3-class type-5 model is best according to the BIC statistic. Various parameter estimates and standard errors from this 'final' model are obtained from the Profile and Parameters output.

Discrete Factor (DFactor) Analysis

How does latent class factor (DFactor) analysis compare with traditional factor analysis?

The Discrete Factor (DFactor) model assumes that each factor contains 2 or more ordered categories, as opposed to traditional factor analysis, which assumes that the factors (as well as the variables) are continuous (interval scaled). The variables in LC factor analysis need not be continuous. They may be of mixed scale types (nominal, ordinal, continuous, counts, or combinations of these). LC factor analysis also has a close relationship to cluster analysis. For an introduction to LC factor analysis, and to see how it relates to LC cluster analysis, see Magidson and Vermunt (2001). For a comparison with traditional factor analysis in data mining, see Magidson and Vermunt (2003).

I was looking at the lifestyle data set (tutorial #1) and was trying to run a factor model on all lifestyle indicator variables (from Tennis to Military). I have requested an 8-factor model and it has been running for 30 minutes. Am I doing something wrong?

As the number of discrete factors (DFactors) increases, the estimation time increases exponentially. From an exploratory perspective, you may well find that a 2- or 3-factor solution is already quite informative -- 3 dichotomous DFactors segment the sample into 8 distinct clusters! On the other hand, 8 dichotomous factors correspond to 2^8 = 256 clusters. To see the relationship between factors and clusters, see Magidson and Vermunt (2001).

Traditional factor analysis (FA) is faster because it makes a simplifying assumption that all variables are continuous and that they follow a multivariate normal (MVN) distribution. When these assumptions are true, only the second order moments (the correlations between the variables) are needed to estimate the model. Since these data contain dichotomous variables, the FA assumptions are not justified.

Latent GOLD® does not assume MVN and hence is much more general. It utilizes information from all higher order associations (more than means and correlations) in the estimation of parameters. The resulting solution will be directly interpretable and unique, unlike the traditional FA solution which requires a rotation for interpretability. Vermunt and Magidson (2003) is an article by the developers of LG that appears in a book on datamining. It compares the DFactor model with traditional FA and shows why the former often provides insights into data that are missed by traditional FA. For the relationship between factor loadings reported for DFactor models and those from traditional FA, see Vermunt and Magidson (2004).

LC Regression Analysis

How does latent class regression analysis compare with traditional regression modeling?

There are 2 primary kinds of differences. First, the particular regression model is automatically determined according to the scale type of the dependent variable: for continuous dependent variables, traditional linear regression is employed; for dichotomous, logistic regression; for ordinal, the baseline/adjacent-category logit extension; for nominal, multinomial logit; and for counts, Poisson regression. Second, LC regression is a mixture model and hence is more general than traditional regression. The special case of 1 class corresponds to the homogeneous-population assumption made in traditional regression. In LC regression, separate regressions are estimated simultaneously for each latent class.

I need a mixture modeling program that can handle dependent variables that are dichotomous as well as continuous. Does Latent GOLD® handle this?

Yes. Mixture modeling and latent class modeling are synonymous.

Is there any "stepwise" inclusion feature in the LC regression module?

No. Since the latent classes may be highly dependent on the predictors that are included, stepwise features have not been implemented in the latent class regression module.

I have a binary dependent variable and five categorical independent variables. I am using Latent GOLD® to find 3 segments among the respondents. The Parameters output shows separate estimates for each segment. However, there appear to be both intercepts and betas for the dependent variable. I am confused about how to use both of them for prediction.

The 'gamma' parameters labeled ‘Intercept’ (and other gamma parameters that would appear if you have covariates) refer to the model that predicts the latent classes as a function of the covariates. If no covariates are included in the model, only the Intercept appears under the label ‘(gamma)’. Beneath the gamma parameters, the parameters labeled 'beta' appear. These refer to the model that predicts the dependent variable (including the regression intercept for the dependent variable). This output has been rearranged in Latent GOLD® to provide better separation of the parameters from these two different models. See Tutorial 3 for an example.

Latent GOLD® also has many additional features useful for prediction, such as the automatic generation of predicted values, the ability to restrict the regression coefficients in many ways, and R-square statistics. See the User's Guide and Technical Guide for further details on these new features.

Can Latent GOLD® perform multinomial LC regression models?

Multinomial LC regression models are estimated simply by specifying the dependent variable to be nominal. In the case of repeated measures, (multiple time points, multiple ratings by the same respondent, etc.) an ID variable can be used to identify the records associated with the same case. (See Tutorial 3 for an example of a repeated measures conjoint study.)

For LC Regression models, there are several R square statistics reported in the Latent Gold output. When there are 2 or more latent segments (latent classes), do these still measure the overall strength of the predictors to predict the dependent variable?

Yes. One important additional aspect is that estimated class membership also improves overall prediction and contributes to the magnitude of R square.

I understand that the covariates are used to predict membership in a class based upon the probabilities derived from a multinomial logit model. The classification errors, reduction errors, entropy R square, etc. are associated with this estimation. Correct?

This is not fully correct: These measures indeed indicate how well we can predict class membership. But, the covariates alone do not determine classification -- the regression model itself plays a major role in predicting class membership. This prediction/classification is based on a person’s responses on the dependent variable (given predictor values). If you look at the formulas, you can see that the posterior membership probabilities do not only depend on P(x|z), but also on P(y|x,z). Even without any covariates (z), these models usually predict class membership quite well.

Intuitively, one determines which class-specific regression model fits best to the responses of a certain case. The better that a regression model associated with a particular class fits, the higher the probability of belonging to that class. Price sensitive people are assigned to the class for which the regression shows higher price effects, etc.

In Latent GOLD, we also report a separate R-squared for the prediction of class membership based on covariates only.

I need additional information on Tutorial # 3: LC Regression with Repeated Measures. Specifically, I would like to know how preference ratings and probabilities for different preference levels for a given profile are computed in this example. In other words, how do I use the estimated beta coefficients to compute the probabilities of choosing Ratings 1 thru 5 for a given profile, say [Fashion=Traditional, Quality=High, Price=Lower]? Also, what exactly are the gamma coefficients as distinguished from those labeled betas in the parameters output?

The 'ordinal' dependent variable specification is used in this example, which causes the baseline-category logit model to be used. The beta coefficients listed in the column of the Parameters output corresponding to a particular latent class are the b-coefficients in the following model: f(j | Z1, Z2, Z3) = b0(j) + b1*Z1*y(j) + b2*Z2*y(j) + b3*Z3*y(j).

The b0 estimates are the betas associated with each rating category j of the dependent variable RATING. The y(j), j = 1, 2, 3, 4, 5, are the fixed scores used for the dependent variable (1, 2, 3, 4, and 5 in this example). The desired probabilities are thus computed as: Prob(Rating = j | Z1, Z2, Z3) = exp[f(j)] / [exp(f(1)) + exp(f(2)) + exp(f(3)) + exp(f(4)) + exp(f(5))], j = 1,…,5.
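As a sketch, the probability computation can be carried out directly from a set of beta coefficients; the numbers below are hypothetical and not taken from Tutorial 3:

```python
import math

b0 = [0.5, 0.8, 0.0, -0.3, -1.0]   # b0(j): one intercept per rating category
b  = [0.4, 0.6, -0.5]              # b1, b2, b3: attribute effects for one class
y  = [1, 2, 3, 4, 5]               # fixed scores y(j) for the ordinal ratings

def rating_probs(z, b0, b, y):
    """z: coded attribute levels (Z1, Z2, Z3) for one profile."""
    # f(j | Z1, Z2, Z3) = b0(j) + (b1*Z1 + b2*Z2 + b3*Z3) * y(j)
    slope = sum(bk * zk for bk, zk in zip(b, z))
    f = [b0j + slope * yj for b0j, yj in zip(b0, y)]
    # Prob(Rating = j) = exp[f(j)] / sum_k exp[f(k)]
    denom = sum(math.exp(fj) for fj in f)
    return [math.exp(fj) / denom for fj in f]

# e.g. an effect-coded profile such as [Traditional, High, Lower]
probs = rating_probs([1, 1, -1], b0, b, y)
```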

(For additional technical information on this model, see the associated Magidson references:)

"Maximum Likelihood Assessment of Clinical Trials Based on an Ordered Categorical Response." Drug Information Journal, Maple Glen, PA: Drug Information Association, Vol. 30, No. 1, 1996.

"Multivariate Statistical Models for Categorical Data," chapter 3 in Bagozzi, Richard, Advanced Methods of Marketing Research, Blackwell, 1994.

Additional coefficients, labeled gammas (as opposed to betas), pertaining to the multinomial logit model for predicting the latent variable as a function of the covariates (SEX and AGE in this example), are listed at the bottom of the Parameters output (in Latent GOLD®). In a model containing no covariates, the gamma coefficients (labeled 'intercepts') relate to the sizes of the classes, which are always ordered from largest (the first latent class) to smallest (the last class).

After I run the model, say on a binary response, and get two latent classes with their set of parameters, I'd like to predict the response of a new observation, with a given set of predictors and with or without covariates, but unknown response. How can I get this from Latent Gold?

With active covariates, posterior membership probabilities are computed for cases with missing responses (whether or not they are 'new' cases) based on their covariate values, as shown in Latent GOLD's 'covariate classification' output. These probabilities are used as weights: for each class, a prediction is obtained by applying that class's regression coefficients to the case's predictor values, and the class-specific predictions are then combined using these weights. Without active covariates, the posterior membership probabilities are taken to be the overall class sizes.

In practice, if the new cases are included in the data file but given a case weight close to 0 (say 1E-49) while all other cases are given a weight of 1, and the 'include missing' option is used, such cases will not be used to estimate the model parameters (so the same solution is obtained as without the new cases). By requesting that Predictions be output to a file, predictions for ALL cases, including the new cases, will be written to that file.
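The class-weighted prediction described above can be sketched as follows (a hypothetical 2-class binary logistic regression; the coefficients and membership weights are made-up illustrative values):

```python
import math

betas = [  # (intercept, slope) of a logistic regression per latent class
    (-1.0, 0.8),   # class 1
    (0.5, -0.4),   # class 2
]

def class_predictions(z):
    """P(y=1 | class) for a single predictor value z, one value per class."""
    return [1 / (1 + math.exp(-(b0 + b1 * z))) for b0, b1 in betas]

def weighted_prediction(z, membership):
    """Combine per-class predictions using membership weights:
    posterior membership probabilities when covariates are active,
    or the overall class sizes when they are not."""
    return sum(w * p for w, p in zip(membership, class_predictions(z)))

pred = weighted_prediction(z=1.0, membership=[0.7, 0.3])
```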

Advanced/Syntax and Technical questions

What additional functionalities are gained with the advanced version of Latent Gold?

The advanced version of Latent GOLD consists of an advanced module containing the ability to 1) estimate multi-level latent class models, 2) incorporate complex sampling designs, and 3) include random effects with continuous factors (CFactors). An overview of these capabilities is provided in section 8 of the Technical Guide, followed by detailed documentation for each of these 3 advanced features in sections 9, 10 and 11, as well as output produced by these advanced features in section 12. You may download a demo version that contains the Advanced module and use it with any of our demo data sets.

Beginning with version 5.1, the Advanced version of Latent GOLD also includes the Syntax Module, which allows users to customize their models with many additional advanced capabilities. For details on the Syntax Module see the Syntax User Guide.

I have the Advanced version of Latent GOLD. Is it possible to estimate IRT models such as the Rasch model, Rost's Rasch mixture model, partial credit model and rating scale models that can be estimated in the WINMIRA program? If so, how do the parameterizations differ?

Yes, Latent GOLD Advanced (LGA) can be used to estimate a wide variety of IRT and IRT mixture models. This Tutorial describes the connections between various LGA and standard IRT models.

How does Latent GOLD take into account the complex sampling scheme during multiple imputation? Is there technical documentation that explains the details?

Yes, it does. This is not documented specifically for multiple imputation, but the nonparametric bootstrap used in the multiple imputation procedure (to account for parameter uncertainty) deals with the complex sampling design (the resampling is done at the PSU level). This is explained in the LG-Syntax User's Guide in the discussion of complex sampling features. On page 85 we say: "In a nonparametric bootstrap, replicate samples are obtained by sampling C(o) PSUs with replacement from each stratum. The values of delta(r) is 1/R."

I scored my data file using the 'classification output file' option and found that the percentage of each class is different than the class sizes given in the profile output. What am I (or Latent GOLD® ) doing wrong?

Nothing is wrong. What you are observing are the effects of misclassification errors associated with assignment to a latent class based upon the modal (highest) class probability. For example, in a 3-class model, if the posterior membership probabilities for cases having a given response pattern are 0.2 (for class 1), 0.7 (for class 2), and 0.1 (for class 3), the modal probability is 0.7. Assignment based on the modal probability means that all such cases will be assigned to class 2. However, such assignment is expected to be correct for only 70% of these cases, since 20% truly belong to class 1 and the remaining 10% belong to class 3. The expected misclassification rate for these cases will be 20% + 10% = 30%. For cases with other response patterns, the misclassification rate may be 7%, or 2%, etc. The modal assignment rule minimizes the overall expected misclassification rate (the overall expected misclassification rate is given in the output). To the extent the misclassification rate is greater than 0, the observed frequency distribution of class memberships will reflect the effects of such misclassification. The marginal distributions in the classification table show how the marginal distribution changes when using modal class assignment.
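The arithmetic in this answer can be sketched directly (the response-pattern proportions below are made up for illustration):

```python
# Posteriors for one response pattern in a 3-class model
posteriors = [0.2, 0.7, 0.1]
modal = max(posteriors)          # assign every such case to class 2
pattern_error = 1 - modal        # expected error: 0.2 + 0.1 = 0.3

# Overall expected rate: weight each pattern's error by its sample share
patterns = [
    (0.5, [0.2, 0.7, 0.1]),      # (proportion of cases, posteriors)
    (0.3, [0.9, 0.05, 0.05]),
    (0.2, [0.1, 0.1, 0.8]),
]
overall_error = sum(w * (1 - max(post)) for w, post in patterns)
```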

I have included several ordinal variables with many values in my model, and the program takes a very long time to run. Can I do anything to speed it up?

A substantial improvement in speed can be achieved by specifying the variables as continuous. Alternatively, the grouping option can be used to reduce the number of levels in the ordinal variables (to, say, 10 or 20).

I’ve been reading through pp. 56-57 of the Latent Gold manual to try to understand the difference between ‘Replication Weights’ and ‘Case Weights,’ and I’m having some difficulty understanding which I should be using. I have survey data where different respondents may have differing numbers of rows of data. Different respondents will generally have different weights, but the weight variable would have the same value for each row of data within a respondent. The respondent weights reflect how much we want that respondent to ‘count’ in any analysis. It’s not clear to me whether I should import the weight variable into ‘Replication Weight’ or ‘Case Weight’ when I’m setting up my analysis. I want to use all the data – i.e., it’s not a question of creating a holdout sample by weighting down certain rows to 1.0e-100.

In this situation you should use a case weight since you want to modify the weights of cases.

A replication weight is used to increase or decrease the weight of a choice within a case. A weight of ‘2’ means that this person made this choice twice. So setting all replication weights to 2 for a particular case is not the same as having a case weight of 2:

A case weight of 2 means: there are 2 cases with this set of choices (I have to count this person twice).
A series of replication weights of 2 means: this person made each of these choices twice (I have twice as much information for one person).

In the special case of a 1 class model, the two are equivalent because all observations are assumed to be independent. In all other situations, they are very different. You can consult the manual to see how the weights enter into the log-likelihood function, which is very different for the two types of weights.
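The difference can be illustrated with a hypothetical 2-class model for a dichotomous choice (the class sizes and response probabilities are made-up values; the log-likelihood forms follow the general mixture structure described in the manual):

```python
import math

p_class = [0.5, 0.5]
p_yes = [0.9, 0.2]           # P(choice = 1 | class) for each class

def case_ll(choices, case_weight=1, rep_weights=None):
    """Log-likelihood contribution of one case with a list of 0/1 choices."""
    if rep_weights is None:
        rep_weights = [1] * len(choices)
    lik = 0.0
    for prior, p in zip(p_class, p_yes):
        contrib = prior
        for choice, v in zip(choices, rep_weights):
            pr = p if choice == 1 else 1 - p
            contrib *= pr ** v   # replication weight acts inside the class product
        lik += contrib           # mix over classes AFTER the product
    return case_weight * math.log(lik)   # case weight acts outside the log

choices = [1, 0, 1]
ll_case = case_ll(choices, case_weight=2)          # count the whole case twice
ll_rep  = case_ll(choices, rep_weights=[2, 2, 2])  # count each choice twice
# For K > 1 classes the two contributions differ; for K = 1 they coincide.
```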

LG Choice

What is conjoint and discrete choice analysis?

In conjoint analysis, the goal is to obtain share estimates for various product or service configurations of interest. These configurations are defined by combinations of attribute levels (e.g., PRICE = $499, BRAND = Sony, …). In traditional conjoint analysis, also known as ratings-based conjoint, respondents rate the various products/services/alternatives/options. In discrete choice studies, also known as choice-based conjoint (CBC) experiments, respondents are presented with various competitive scenarios in which they are asked to choose between 2 or more products/services/alternatives/options. Rather than rating each alternative, they are asked to select the most preferred or most important, or to rank the alternatives. In both kinds of conjoint analyses, alternatives are expressed in terms of 1 or more attributes. A model then estimates the importances/utilities of these attributes.

Why can’t discriminant analysis and traditional multinomial logit models be used to estimate discrete choice models?

Such models utilize predictors that are characteristics of the respondents, and cannot accommodate characteristics (attributes) of the choices as predictors. To do the latter requires the conditional logit model, now also known as the multinomial logit model.

How do latent classes enter into conjoint analysis?

Since not all respondents have the same utilities, K > 1 latent classes (segments), each with its own importances for the various attributes, are assumed to exist, and utilities are estimated for each segment. Various statistics are available to help determine the number of segments K.

Can latent class conjoint models be estimated with traditional statistical modeling software?

No. Traditional conjoint programs do not include latent classes and therefore typically produce distorted share estimates (see article). Ratings based conjoint models can be estimated using Latent GOLD (see LG tutorial #2) or Latent GOLD Choice. Discrete choice (choice-based conjoint) models require Latent GOLD Choice.

We usually build our own simulator in Excel. If I use the individual HB-like utilities generated by the Latent GOLD Choice program, can I utilize my MNL share calculator with Latent Gold to estimate individual share of preference?

Yes. Latent GOLD Choice allows you to specify both active and inactive sets (and active as well as inactive alternatives) for an analysis. The inactive sets (and inactive alternatives) are not included in the experiment so that no response information is available for these. They are used because they are of interest and the resulting model can be used to generate/simulate shares.

The output tab of the program allows you to output to a file a) predicted values and/or b) individual coefficients for each choice set for each case. For the individual predicted values, this output consists of 1) the probability (share) of each choice and 2) the most likely choice. If you select the "HB prediction" option, these shares are calculated (automatically) using the HB-like individual coefficients. Alternatively, the default option is to calculate shares using maximum likelihood estimation which generally differs slightly from the HB estimates.

Regarding simulation, you could use these individual coefficients in your Excel simulator as you do now.

For the K-class model, the standard output of the program includes for each of the K classes (for each of the K segments) separately, as well as for the overall sample a) the part-worth utilities ("parameters output"), plus b) predicted probabilities (shares) for each choice set ("Sets Profile").

Regarding simulation, the program allows you to specify any number of additional choice sets to be simulated (we call these 'inactive' sets), and includes share estimates for each of these in addition to each of the 'active' sets in the Sets Profile output. See Tutorial 2 for an example of this. If the "HB Prediction" option is set in the Output tab, the overall share estimates in the Sets Profile output are based on the individual HB-like coefficients. Otherwise, they are based on the maximum likelihood estimates.

Does inclusion of latent classes in my model allow better share predictions?

Yes. The standard aggregate model generally suffers from violations of the independence of irrelevant alternatives (IIA) assumption, which distorts share predictions. Including latent classes to account for the fact that different segments have different utility preferences (with IIA holding within each segment) is a way of resolving this problem and improving share predictions. See the article "New Developments in Latent Class Choice Models" for more information.