**General**

What resources are available to learn about Latent GOLD® and latent class modeling?

What data file formats can Latent GOLD® handle?

How can I use Latent GOLD® with SAS data sets?

How many records and variables can I use? How much time will it take to run?

How does Latent GOLD® differ from the LEM program?

Do you have any tutorials for event history analysis?

**LC Cluster Analysis**

How does Latent GOLD® classify cases into latent classes?

When the ‘Include Missing’ option is selected, does Latent GOLD do some kind of imputation?

How are Latent Class (LC) clustering techniques related to Fuzzy Clustering Techniques?

How can I tell if my latent class cluster model contains local dependencies?

How can I handle local dependencies in my LC cluster model?

**LC Factor Analysis**

How does latent class factor analysis compare with traditional factor analysis?

Latent class factor analysis running time.

**LC Regression Analysis**

How does LC regression analysis compare with traditional regression modeling?

Is there any “stepwise” inclusion feature in the LC regression module?

Questions on Tutorial #3: LC Regression with Repeated Measures

**Technical Questions from Latent GOLD® Users**

**Latent GOLD Advanced Questions**

What additional functionalities are gained with the advanced version of Latent GOLD?

**LG-Syntax Questions**

**Replication and Case Weights**

**General**

**Q. What resources are available to learn about Latent GOLD® and latent class modeling?**

A. Before purchasing the program, you can try the free demo version, which allows access to all program features with sample data files. Tutorials take you step by step through several analyses of these sample files. These tutorials, along with several articles, are available on our website. Upon purchase of the program, users can download a 200-page User’s Guide that covers a wide range of topics on latent class analysis and Latent GOLD®. We also offer a once-a-year training program (Statistical Modeling Week), which includes a 2-day course on latent class analysis, as well as online courses.

**Q. What data file formats can Latent GOLD® handle?**

A. Latent GOLD® can handle ASCII text data formats as well as SPSS files.

**Q. How can I use Latent GOLD® with SAS data sets?**

A. SAS Export can create an SPSS .sav file, which can be opened by Latent GOLD®. The SAS documentation illustrates the Export function. View the relevant Export page for instructions.

**Q. How many records and variables can I use? How much time will it take to run?**

A. There is no limit on the number of records. The running time depends on several factors, including the number of variables and records, the speed of your machine, and the requested output. For many models, Latent GOLD® runs 20 or more times faster than other latent class programs. We suggest trying the demo program to see how fast Latent GOLD® works on your machine.

**Q. How does Latent GOLD® differ from the LEM program?**

A. Latent GOLD® implements the 3 most important types of latent class (LC) models. It was designed to be extremely easy to use and to make it possible for people without a strong statistical background to apply LC analysis to their own data in a safe and easy way. LEM is a command-language research tool that Prof. Jeroen Vermunt developed for applied researchers *with* a strong statistical background who want to apply nonstandard log-linear and latent class models to their categorical data. With LEM you can specify more probability structures with many more kinds of restrictions (if you know how to do it), but it is not designed to be Windows friendly, requires strict data and input formats, and does not provide error checks.

With Latent GOLD®, continuous and count variables can be included in the model, and special LC output not available in LEM is provided, such as various graphs, classification statistics, and bivariate residuals. Latent GOLD® also has faster (full Newton-Raphson) and safer (sets of starting values, Bayes constants) estimation methods for LC models than LEM. Both programs give information on nonidentifiability and boundary solutions, but Latent GOLD®, unlike LEM, can prevent boundary solutions through the use of Bayes constants.

**Q. Do you have any tutorials for event history analysis?**

A. The set of example data files on our website contains various event history analysis examples. Tutorials are not yet available for these. However, to get you started, you might look at the data file land.sav, the full reference for which is “Land, K.C., Nagin, D.S., and McCall (2001). Discrete-time hazard regression models with hidden heterogeneity: the semi-parametric mixed Poisson approach. Sociological Methods and Research, 29, 342-373.” Another good example is jobchange.dat.

Land.sav contains information on 411 males from a working-class area of London who were followed from ages 10 through 31. The dependent variable is “first serious delinquency”.

As can be seen, there is one record for each time point, which is called a person-period data format. The dependent variable “first” is zero for all records of a person, except for the last one if the person experienced the event of interest at that age.

The variables age and age_sq are the duration variables. These can also be seen as time-varying predictors. The variable “tot” is a time-constant covariate/predictor (a composite risk factor).

The ID should be used as the Case ID to indicate which records belong to the same case.

The dependent variable “first” can be treated as a Poisson count or as a binomial count. The former option yields a piece-wise constant log-linear hazard model, the latter a discrete-time logit model. If it is treated as a Poisson count, it is best to set the exposure to one half (exp_half: the event occurs in the middle of the interval) for the time point at which the event occurs. With a binomial count, the exposure should be one at every time point (the default).
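The person-period layout and the exposure coding described above can be sketched in Python. The helper name `to_person_period`, the record keys, and the two toy cases are illustrative assumptions; they are not the contents of land.sav.

```python
def to_person_period(case_id, start_age, last_age, event):
    """Expand one person into one record per age (person-period format).

    `first` is 0 on every record except the last one, and only when the
    event occurred. `exp_half` is 1.0 everywhere except 0.5 on the event
    record (assumed Poisson coding: the event falls mid-interval).
    """
    records = []
    for age in range(start_age, last_age + 1):
        is_event_record = event and age == last_age
        records.append({
            "id": case_id,          # used as Case ID to group records
            "age": age,             # duration variable
            "age_sq": age * age,    # duration variable (quadratic term)
            "first": 1 if is_event_record else 0,
            "exp_half": 0.5 if is_event_record else 1.0,
        })
    return records

# Hypothetical case 1 has the event at age 13; case 2 is censored at age 12.
rows = to_person_period(case_id=1, start_age=10, last_age=13, event=True)
rows += to_person_period(case_id=2, start_age=10, last_age=12, event=False)
for row in rows:
    print(row)
```

For a binomial-count analysis the `exp_half` column would simply be ignored, since the exposure is then one for every record.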

Age and age_sq should be used as class-dependent predictors. You will identify two groups with clearly different age patterns in the rate of first delinquency. The variable “tot” can be used as a class-independent predictor, but it is more interesting to use it as a covariate: does the risk factor determine the type of delinquency trajectory?

This example can be modified or extended in many ways.

– You can include time-varying predictors other than the time variables. These can be assumed to have the same or different effects across classes.

– You can include information on another event. In that case your classes describe the pattern in multiple events.

– You can include as many covariates as you want (these will usually be demographics, but can also be a treatment).

– You can model the time dependence as nominal, yielding a Cox-like model.

A general reference on event history combined with LC analysis is Vermunt (1997), Log-linear event history analysis. Sage Publications.

**LC Cluster Analysis**

A. LC clustering is model-based, in contrast to traditional approaches that are based on ad hoc distance measures. The general probability model underlying LC clustering more readily accommodates reality by allowing for unequal variances in each cluster, use of variables with mixed scale types, and formal statistical procedures for determining the number of clusters, among many other improvements. For a detailed comparison showing how LC cluster outperforms SPSS K-means clustering and SAS FASTCLUS procedures, see Latent Class Modeling as a Probabilistic Extension of K-means Clustering.

See also the published article comparing SPSS TwoStep Cluster with Latent GOLD.

**Q. How does Latent GOLD® classify cases into latent classes?**

A. Cases are assigned to the latent class having the highest posterior membership probability. Covariates can be added to the model for improved description and prediction of the latent classes.

**Q. When the ‘Include Missing’ option is selected, does Latent GOLD do some kind of imputation?**

A. No. Classification with missing values works exactly the same as classification without missing values: it is simply based on the variables that are observed for the case concerned. There is no imputation of missing values for indicators. One of the nice things about LC analysis is that imputation is not necessary.

In the User’s Guide, we give the general form of the density with missing values. The crucial thing is the delta, which is 0 if an indicator is missing. If that occurs, the term cancels (it is equal to 1 irrespective of the value of y).

Thus, with 4 indicators y1, y2, y3, and y4, two clusters, and y2 missing:

P(x|y1,y3,y4) = P(x) P(y1|x) P(y3|x) P(y4|x) / P(y1,y3,y4)

where

P(y1,y3,y4) = P(1) P(y1|1) P(y3|1) P(y4|1) + P(2) P(y1|2) P(y3|2) P(y4|2)
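The formula above can be checked numerically with a small Python sketch. The function name `posterior` and all probabilities below are made-up illustrations, not Latent GOLD output; the point is only that a missing indicator is left out of the product.

```python
def posterior(class_sizes, cond_probs, observed):
    """Posterior class probabilities P(x | observed indicators).

    class_sizes: list with P(x) for each class.
    cond_probs:  cond_probs[x][name][value] = P(indicator = value | class x).
    observed:    dict of indicator name -> observed value; a missing
                 indicator is simply absent, so its term drops out.
    """
    joint = []
    for x, p_x in enumerate(class_sizes):
        term = p_x
        for name, value in observed.items():
            term *= cond_probs[x][name][value]
        joint.append(term)
    total = sum(joint)  # the denominator, e.g. P(y1, y3, y4)
    return [j / total for j in joint]

# Two clusters, four dichotomous indicators (hypothetical parameter values).
class_sizes = [0.6, 0.4]
cond_probs = [
    {"y1": [0.8, 0.2], "y2": [0.7, 0.3], "y3": [0.9, 0.1], "y4": [0.6, 0.4]},
    {"y1": [0.3, 0.7], "y2": [0.2, 0.8], "y3": [0.4, 0.6], "y4": [0.1, 0.9]},
]

observed = {"y1": 0, "y3": 0, "y4": 1}  # y2 is missing for this case
post = posterior(class_sizes, cond_probs, observed)
modal_class = post.index(max(post))     # class with the highest posterior
print(post, modal_class)
```

With these numbers the joint terms are 0.1728 and 0.0432, so the posterior probabilities are 0.8 and 0.2 and the case is assigned to the first cluster.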

**Q. How are Latent Class (LC) clustering techniques related to Fuzzy Clustering Techniques?**

A. In fuzzy clustering, a case has *grades of membership* which are the “parameters” to be estimated (Kaufman and Rousseeuw, 1990). In contrast, in LC clustering an individual’s posterior class-membership probabilities are computed from the estimated model parameters and the observed scores. The advantage of the LC approach is that it is possible to use the LC model to classify *other* cases (outside the sample used to estimate the model) which belong to the population. This is not possible with standard fuzzy clustering techniques.

Kaufman, L. and Rousseeuw, P.J. 1990. Finding groups in data: An introduction to cluster analysis, New York: John Wiley and Sons.

**Q. How can I tell if my latent class cluster model contains local dependencies?**

A. Local dependence for a K-class model exists if the model does NOT fit the data. One measure of model fit is given by the bivariate residuals (BVRs) associated with each pair of model indicators. Large BVRs (values over 2) can be viewed as evidence of local dependence associated with that pair of model indicators (see the residuals output of Latent GOLD for these statistics).

**Q. How can I handle local dependencies in my LC cluster model?**

A. Local dependence can be accounted for by simply adding latent classes, or by maintaining the current number of classes and modifying the model in other ways, such as adding direct effects associated with 2 variables that have large bivariate residuals. See the LG manual for details of how to add direct effects. See also section 3 in https://www.statisticalinnovations.com/articles/sage11.pdf for further details of the different approaches for dealing with local dependence.

A. You can estimate several models and select the one that fits best according to BIC. For example, six types of LC cluster models are reported in Table 1 of the Latent Class Cluster Analysis article. These models differ with respect to a) the specification of class-dependent vs. class-independent error variances and b) the ‘direct effects’ included in the LC cluster model estimated by Latent GOLD. The 3-class type-5 model is best according to the BIC statistic. Various parameter estimates and standard errors from this ‘final’ model are obtained from the Profile and Parameters output.

Click on dataset #29 to download the data and the model setup file diabetes.lgf, which contains the specifications for each of the 6 types of 3-class cluster models described in Table 1.
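Selecting among candidate models by BIC can be sketched as follows. The sketch assumes the usual definition BIC = -2 LL + ln(N) × (number of parameters); the model names, log-likelihoods, parameter counts, and sample size are hypothetical and are not the values from Table 1.

```python
import math

def bic(log_likelihood, n_parameters, n_cases):
    """BIC based on the log-likelihood: lower values indicate a better model."""
    return -2.0 * log_likelihood + math.log(n_cases) * n_parameters

# Hypothetical candidates: name -> (log-likelihood, number of parameters).
candidates = {
    "model A": (-1250.0, 12),
    "model B": (-1235.0, 16),
    "model C": (-1232.0, 22),
}
n_cases = 200  # hypothetical sample size

scores = {name: bic(ll, npar, n_cases) for name, (ll, npar) in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: BIC = {score:.2f}")
print("best:", best)
```

In this toy comparison the most heavily parameterized model has the highest log-likelihood but not the lowest BIC, because the penalty term grows with the number of parameters.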

**LC Factor Analysis**

**Q. How does latent class factor analysis compare with traditional factor analysis?**

A. The LC factor model assumes that each factor contains 2 or more ordered categories, as opposed to traditional factor analysis, which assumes that the factors (as well as the variables) are continuous (interval scaled). The variables in LC factor analysis need not be continuous. They may be mixed scale types (nominal, ordinal, continuous, counts, or combinations of these). LC Factor also has a close relationship to cluster analysis. For an introduction to LC factor analysis, and to see how it relates to LC cluster analysis, see Magidson and Vermunt, Sociological Methodology 2001. For a comparison with traditional factor analysis in datamining, see Traditional vs. Latent Class Factor Analysis for Datamining.

A. A 2-factor model on a 650 MHz computer took less than 2 minutes to estimate and a 3-factor model 4 minutes. As the number of factors increases, the estimation time increases exponentially. From an exploratory perspective, you may well find that a 2- or 3-factor solution will already be quite informative: 3 dichotomous factors will segment the sample into 8 distinct clusters! On the other hand, 8 dichotomous factors correspond to 2 to the power 8 = 256 clusters. To see the relationship between factors and clusters, see Magidson and Vermunt, Sociological Methodology 2001.
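The factor-to-cluster arithmetic above can be illustrated by enumerating every combination of factor levels. This is a counting sketch only; the function name `factor_level_combinations` is an assumption for illustration, and nothing here estimates a model.

```python
from itertools import product

def factor_level_combinations(n_factors, levels_per_factor=2):
    """All joint level combinations of the factors; each one is a cluster."""
    return list(product(range(levels_per_factor), repeat=n_factors))

three = factor_level_combinations(3)
eight = factor_level_combinations(8)
print(len(three))  # 2**3 combinations of 3 dichotomous factors
print(len(eight))  # 2**8 combinations of 8 dichotomous factors
print(three[:3])   # first few level patterns, e.g. (0, 0, 0)
```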

Traditional factor analysis (FA) is faster because it makes the simplifying assumption that all variables are continuous and that they follow a multivariate normal (MVN) distribution. When these assumptions are true, only the second-order moments (the correlations between the variables) are needed to estimate the model. For these data, the FA assumptions are not justified.

Latent GOLD® does not assume MVN and hence is much more general. It utilizes information from all higher-order associations (more than means and correlations) in the estimation of parameters. The resulting solution will be directly interpretable and unique, unlike the traditional FA solution, which requires a rotation for interpretability. Traditional vs. Latent Class Factor Analysis for Datamining is an article by the developers of LG that will appear in a book on datamining. It shows why LG factor analysis often provides insights into data that are missed by traditional FA.

**LC Regression Analysis**

**Q. How does latent class regression analysis compare with traditional regression modeling?**

A. There are 2 primary kinds of differences. First, the *particular* regression is automatically determined according to the scale type of the dependent variable: for continuous, traditional linear regression; for dichotomous, logistic regression; for ordinal, the baseline/adjacent category logit extension; for nominal, multinomial logit; for count, Poisson regression. Second, LC Regression is a mixture model and hence is more general than traditional regression. The special case of 1 class corresponds to the homogeneous population assumption made in traditional regression. In LC regression, separate regressions are estimated simultaneously for each latent class.

**Q. I need a mixture modeling program that can handle dependent variables that are dichotomous as well as continuous. Does Latent GOLD® handle this?**

A. Yes. Mixture modeling and latent class modeling are synonymous.

**Q. Is there any “stepwise” inclusion feature in the LC regression module?**

A. No. Since the latent classes may be highly dependent on the predictors that are included, stepwise features have not been implemented in the latent class regression module.

A. The ‘gamma’ parameters labeled ‘Intercept’ (and other gamma parameters that would appear if you have covariates) refer to the model that predicts the latent classes as a function of the covariates. If no covariates are included in the model, only the Intercept appears under the label ‘(gamma)’. Beneath the gamma parameters, the parameters labeled ‘beta’ appear. These refer to the model that predicts the dependent variable (including the regression intercept for the dependent variable). This output has been rearranged in Latent GOLD® to provide better separation of the parameters from these two different models. See Tutorial 3 (PDF) for an example.

Latent GOLD® also has many additional features useful for prediction, such as the automatic generation of predicted values, the ability to restrict the regression coefficients in many ways, and R-square statistics. See the User’s Guide (PDF) and Technical Guide (PDF) for further details on these new features.

A. Multinomial LC regression models are estimated simply by specifying the dependent variable to be nominal. In the case of repeated measures (multiple time points, multiple ratings by the same respondent, etc.), an ID variable can be used to identify the records associated with the same case. (See tutorial #2 for an example of a repeated measures conjoint study.) Latent GOLD® cannot currently estimate *conditional* logit models of the kind used in discrete choice studies, although such capability will be incorporated in Latent GOLD Choice, an add-on to Latent GOLD® that is now under development.

A. Yes. One important additional aspect is that estimated class membership also improves overall prediction and contributes to the magnitude of R-square.

A. This is not fully correct. These measures do indicate how well we can predict class membership. However, the covariates alone do not determine classification: the regression model itself plays a major role in predicting class membership. This prediction/classification is based on a person’s responses on the dependent variable (given predictor values). If you look at the formulas, you can see that the posterior membership probabilities depend not only on P(x|z), but also on P(y|x,z). Even without any covariates (z), these models usually predict class membership quite well.

Intuitively, one determines which class-specific regression model fits best to the responses of a certain case. The better that a regression model associated with a particular class fits, the higher the probability of belonging to that class. Price sensitive people are assigned to the class for which the regression shows higher price effects, etc.

In Latent GOLD, we also report a separate R-squared for the prediction of class membership based on covariates only.

A. The ‘ordinal’ dependent variable specification is used in this example, which causes the baseline category logit model to be used. The *beta* coefficients listed in the column of the *parameters output file* corresponding to a particular latent class are the b-coefficients in the following model:

f( j | Z1, Z2, Z3) = b0(j) + b1 * Z1 * y(j) + b2 * Z2 * y(j) + b3 * Z3 * y(j).

The b0 estimates are the betas associated with each rating category j of the dependent variable RATING.

The y(j), j = 1,2,3,4,5, are the fixed scores used for the dependent variable (1, 2, 3, 4, and 5 in this example).

The desired probabilities are thus computed as:

Prob(Rating = j | Z1, Z2, Z3) = exp[f(j)] / [exp(f(1)) + exp(f(2)) + exp(f(3)) + exp(f(4)) + exp(f(5))], j = 1,…,5
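The probability computation above can be reproduced with a few lines of Python. The function name `rating_probabilities` and the coefficient values are hypothetical, chosen only to show the mechanics; they are not estimates from the tutorial.

```python
import math

def rating_probabilities(b0, slopes, z, scores):
    """Prob(Rating = j | Z) for one latent class.

    b0:     intercepts b0(j), one per rating category.
    slopes: coefficients (b1, b2, b3) for the predictors.
    z:      predictor values (Z1, Z2, Z3).
    scores: fixed category scores y(j).
    """
    linear = sum(b * zk for b, zk in zip(slopes, z))
    # f(j) = b0(j) + (b1*Z1 + b2*Z2 + b3*Z3) * y(j)
    f = [b0_j + linear * y_j for b0_j, y_j in zip(b0, scores)]
    f_max = max(f)  # subtract the maximum for numerical stability
    exp_f = [math.exp(v - f_max) for v in f]
    total = sum(exp_f)
    return [e / total for e in exp_f]

scores = [1, 2, 3, 4, 5]
b0 = [0.0, 0.4, 0.6, 0.3, -0.2]   # hypothetical intercepts
slopes = [0.5, -0.3, 0.2]         # hypothetical b1, b2, b3
probs = rating_probabilities(b0, slopes, z=[1.0, 0.0, 1.0], scores=scores)
print([round(p, 4) for p in probs])
```

Because the predictors enter only through the product with the fixed scores y(j), a larger linear term shifts probability toward the higher rating categories.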

(For additional technical information on this model, see the associated Magidson references.)

“Maximum Likelihood Assessment of Clinical Trials Based on an Ordered Categorical Response.” *Drug Information Journal*, Maple Glen, PA: Drug Information Association, Vol. 30, No. 1, 1996.

“Multivariate Statistical Models for Categorical Data,” chapter 3 in Bagozzi, Richard, *Advanced Methods of Marketing Research*, Blackwell, 1994.

*Additional* coefficients, labeled *gammas* (as opposed to *betas*), pertaining to the multinomial logit model for predicting the *latent variable* as a function of the covariates (SEX and AGE for this example), are listed at the bottom of the parameters output file (in Latent GOLD®). In the model containing *no* covariates, the gamma coefficients (labeled ‘intercepts’) relate to the size of the classes, which are always ordered from largest (the first latent class) to smallest (the last class).

A. With active covariates, posterior membership probabilities are computed for cases with missing responses (whether or not they are ‘new’ cases), based on their covariate values, as shown in Latent GOLD’s ‘covariate classification’ output. These probabilities are used as weights applied to the predictions for each latent class, using the predictors for such cases and the regression coefficients associated with that class to get the appropriate prediction for each class. Without active covariates, the posterior membership probabilities are taken to be the overall class sizes.
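The weighting described above amounts to a probability-weighted average of class-specific predictions. The sketch below assumes a continuous dependent variable with a linear regression in each class; the function name, coefficients, and membership probabilities are hypothetical.

```python
def predict_missing_response(class_probs, class_coefs, predictors):
    """Prediction for a case whose response is missing.

    class_probs: membership probabilities used as weights (from the
                 covariates, or the overall class sizes without covariates).
    class_coefs: per class, a tuple (intercept, slope_1, slope_2, ...).
    predictors:  predictor values for the case.
    """
    prediction = 0.0
    for p_x, coefs in zip(class_probs, class_coefs):
        intercept, slopes = coefs[0], coefs[1:]
        class_prediction = intercept + sum(
            b * v for b, v in zip(slopes, predictors)
        )
        prediction += p_x * class_prediction  # weight by class probability
    return prediction

# Hypothetical 2-class model with one predictor.
class_probs = [0.7, 0.3]
class_coefs = [(1.0, 2.0), (4.0, -1.0)]
pred = predict_missing_response(class_probs, class_coefs, predictors=[3.0])
print(pred)
```

Here the class-specific predictions are 7.0 and 1.0, so the weighted prediction is 0.7 × 7.0 + 0.3 × 1.0 = 5.2.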

In practice, if the new cases are included in the data file but given a case weight close to 0 (say 1E-49), while all other cases are given a weight of 1, and the ‘include missing’ option is used, such cases will not be used to estimate the model parameters (so the same solution would be obtained without the new cases). By requesting that Predictions be output to a file, predictions for ALL cases, including the new cases, will be written to the file.

**Technical Questions from Latent GOLD® Users**

A. Nothing is wrong. What you are observing is the effect of misclassification errors associated with assignment to a latent class based upon the modal (highest) class probability. For example, in a 3-class model, if the posterior membership probabilities for cases having a given response pattern are 0.2 (for class 1), 0.7 (for class 2), and 0.1 (for class 3), the modal probability is 0.7. Assignment based on the modal probability means that *all* such cases will be assigned to class 2. However, such assignment is expected to be correct for only 70% of these cases, since 20% truly belong to class 1 and the remaining 10% belong to class 3. The expected misclassification rate for these cases will be 20% + 10% = 30%. For cases with other response patterns, the misclassification rate may be 7%, or 2%, etc. The modal assignment rule minimizes the *overall* expected misclassification rate (the *overall* expected misclassification rate is given in the output). To the extent the misclassification rate is greater than 0, the observed frequency distribution of class memberships will reflect the effects of such misclassification. The marginal distributions in the classification table show how the marginal distribution changes when using modal class assignment.
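The arithmetic in this answer can be traced with a short Python sketch. The function names and the second response pattern (with its frequency) are hypothetical; only the 0.2 / 0.7 / 0.1 pattern comes from the example above.

```python
def modal_assignment(posteriors):
    """Return (modal class index, expected misclassification for the pattern)."""
    modal_class = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return modal_class, 1.0 - posteriors[modal_class]

def overall_error(patterns):
    """Frequency-weighted expected misclassification rate.

    patterns: list of (number of cases, posterior probabilities).
    """
    total_cases = sum(n for n, _ in patterns)
    expected_errors = sum(n * modal_assignment(p)[1] for n, p in patterns)
    return expected_errors / total_cases

# Class indices are 0-based here, so index 1 is "class 2" in the text.
assigned, error = modal_assignment([0.2, 0.7, 0.1])
print(assigned, round(error, 2))

patterns = [
    (100, [0.2, 0.7, 0.1]),     # the pattern discussed above
    (300, [0.93, 0.05, 0.02]),  # hypothetical pattern with a 7% error rate
]
rate = overall_error(patterns)
print(round(rate, 4))
```

With these two patterns, 30 of the first 100 cases and 21 of the next 300 are expected to be misassigned, giving an overall expected rate of 51/400.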

A. Substantial improvement in speed can be accomplished by specifying the variables to be continuous. Alternatively, the grouping option can be used to reduce the number of levels in the ordinal variables (to, say, 10 or 20).

A. The output listings in the manual for the IRIS data contain some errors in the statistics. You can download the correct specification for each model (iris.lgf) and the data (iris.dat).

**Latent GOLD Advanced Questions**

**Q. What additional functionalities are gained with the advanced version of Latent Gold?**

A. The advanced version of Latent GOLD consists of an advanced module containing the ability to 1) estimate multi-level latent class models, 2) incorporate complex sampling designs, and 3) include random effects with continuous factors (CFactors). An overview of these capabilities is provided in section 8 of the Technical Guide, followed by detailed documentation for each of these 3 advanced features in sections 9, 10 and 11, as well as output produced by these advanced features in section 12. You may download a demo version that contains the Advanced module and use it with any of our demo data sets.

A. Yes, Latent GOLD Advanced (LGA) can be used to estimate a wide variety of IRT and IRT mixture models. This .pdf describes the connections between various LGA and standard IRT models. The associated .lgf and data files illustrate examples that can be run with our demo data sets. (Note that we set the Bayes constant to 0 in these runs.)

**LG-Syntax Questions**

A. Yes, it does. This is not documented specifically for multiple imputation, but the nonparametric bootstrap used in the multiple imputation procedure (to account for parameter uncertainty) deals with the complex sampling design (the resampling is done at the PSU level). This is explained in the LG-Syntax User’s Guide when discussing complex sampling features. On page 85 we say: “In a nonparametric bootstrap, replicate samples are obtained by sampling C(o) PSUs with replacement from each stratum. The values of delta(r) is 1/R.”
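The resampling scheme in the quoted passage can be sketched in Python. This is an illustration of PSU-level resampling within strata, not Latent GOLD's implementation; it assumes C(o) is the number of PSUs observed in stratum o, and the stratum names, PSU labels, and R are made up.

```python
import random

def bootstrap_replicate(strata, rng):
    """Draw one replicate: within each stratum, sample as many PSUs as were
    observed in that stratum, with replacement."""
    return {
        stratum: [rng.choice(psus) for _ in psus]
        for stratum, psus in strata.items()
    }

# Hypothetical design: two strata with 3 and 2 primary sampling units (PSUs).
strata = {"north": ["p1", "p2", "p3"], "south": ["p4", "p5"]}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
R = 5                   # number of replicate samples
replicates = [bootstrap_replicate(strata, rng) for _ in range(R)]
delta = [1.0 / R] * R   # each replicate receives weight delta(r) = 1/R

for rep in replicates:
    print(rep)
```

All records belonging to a drawn PSU would enter the replicate together, which is what keeps the sampling design intact in each replicate.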

**Replication and Case Weights**

A. In this situation you should use a case weight, since you want to modify the weights of cases.

A replication weight is used to increase or decrease the weight of a choice within a case. A weight of ‘2’ means that this person made this choice twice. So setting all replication weights to 2 for a particular case is not the same as having a case weight of 2:

- A case weight of 2 means: there are 2 cases with this set of choices (I have to count this person twice).
- A series of replication weights of 2 means: this person made each of these choices twice (I have twice as much information for one person).

In the special case of a 1-class model, the two are equivalent because all observations are assumed to be independent. In all other situations, they are very different. You can consult the manual to see how the weights enter into the log-likelihood function, which is very different for the two types of weights.
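The difference between the two kinds of weights can be made concrete with a small Python sketch. It assumes the common mixture form in which a case weight multiplies the case's log-likelihood contribution and a replication weight acts as an exponent on the within-class probability of a single choice; consult the manual for the exact formula. All probabilities are hypothetical.

```python
import math

def log_likelihood(cases, class_sizes):
    """Weighted mixture log-likelihood (assumed form, see lead-in).

    cases: list of (case_weight, replications), where each replication is
           (replication_weight, [P(choice | class x) for each class]).
    """
    total = 0.0
    for case_weight, replications in cases:
        mixture = 0.0
        for x, p_x in enumerate(class_sizes):
            within = 1.0
            for rep_weight, probs in replications:
                within *= probs[x] ** rep_weight  # replication weight: exponent
            mixture += p_x * within
        total += case_weight * math.log(mixture)  # case weight: multiplier
    return total

choices = [[0.8, 0.3], [0.6, 0.2]]  # two choices, probabilities under 2 classes
sizes = [0.5, 0.5]

ll_case = log_likelihood([(2, [(1, c) for c in choices])], sizes)
ll_rep = log_likelihood([(1, [(2, c) for c in choices])], sizes)
print(ll_case, ll_rep)  # different in a 2-class model

one_class = [[0.8], [0.6]]
ll_case_1 = log_likelihood([(2, [(1, c) for c in one_class])], [1.0])
ll_rep_1 = log_likelihood([(1, [(2, c) for c in one_class])], [1.0])
print(ll_case_1, ll_rep_1)  # equal in a 1-class model
```

In the 2-class run the case-weighted value is 2 × ln(0.27) while the replication-weighted value is ln(0.117), so the two specifications lead to different likelihoods; with one class both reduce to 2 × ln(0.48).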