General
What resources are available to learn about Latent GOLD® and latent class modeling?
What data file formats can Latent GOLD® handle?
How can I use Latent GOLD® with SAS data sets?
How many records and variables can I use? How much time will it take to run?
How does Latent GOLD® differ from the LEM Program?
Do you have any tutorials for event history analysis?
LC Cluster Analysis
How does Latent GOLD® classify cases into latent classes?
When the ‘Include Missing’ option is selected, does Latent GOLD® do some kind of imputation?
How are Latent Class (LC) clustering techniques related to Fuzzy Clustering Techniques?
How can I tell if my latent class cluster model contains local dependencies?
How can I handle local dependencies in my LC cluster model?
LC Factor Analysis
How does latent class factor analysis compare with traditional factor analysis?
Latent class factor analysis running time
LC Regression Analysis
How does LC regression analysis compare with traditional regression modeling?
Is there any “stepwise” inclusion feature in the LC regression module?
Questions on Tutorial #3: LC Regression with Repeated Measures
Technical Questions from Latent GOLD® Users
Latent GOLD Advanced Questions
What additional functionalities are gained with the advanced version of Latent GOLD®?
LG-Syntax Questions
Replication and Case Weights
General
Q. What resources are available to learn about Latent GOLD® and latent class modeling?
A. Before purchasing the program, you can try out the free demo version, which allows access to all program features with sample data files. Tutorials take you step-by-step through several analyses of these sample files. These tutorials, along with several articles, are available on our website. Upon purchase of the program, users can download a 200-page User’s Guide that covers a wide range of topics on latent class analysis and Latent GOLD®. We also offer a once-a-year training program (Statistical Modeling Week), which includes a 2-day course on latent class analysis, as well as Online Courses.
Q. What data file formats can Latent GOLD® handle?
A. Latent GOLD® can handle ASCII text data formats as well as SPSS files.
Q. How can I use Latent GOLD® with SAS data sets?
A. SAS Export can create an SPSS .sav file, which can then be opened by Latent GOLD®. The SAS Documentation illustrates the Export function; view the relevant Export page for instructions.
Q. How many records and variables can I use? How much time will it take to run?
A. There is NO limit on the number of records. The time will depend on several factors, including the number of variables and records, the speed of your machine, and the requested output. For many models, Latent GOLD® runs 20 or more times faster than other latent class programs. We suggest trying the demo program to see how fast Latent GOLD® works on your machine.
Q. How does Latent GOLD® differ from the LEM program?
A. Latent GOLD® implements the 3 most important types of latent class (LC) models. It was designed to be extremely easy to use and to make it possible for people without a strong statistical background to apply LC analysis to their own data in a safe and easy way. LEM is a command-language research tool that Prof. Jeroen Vermunt developed for applied researchers with a strong statistical background who want to apply nonstandard log-linear and latent class models to their categorical data. With LEM you can specify more probability structures with many more kinds of restrictions (if you know how to do it), but it is not designed to be Windows-friendly, requires strict data and input formats, and does not provide error checks.
With Latent GOLD®, continuous and count variables can be included in the model, and special LC output not available in LEM is provided, such as various graphs, classification statistics, and bivariate residuals. Latent GOLD® also has faster (full Newton-Raphson) and safer (sets of starting values, Bayes constants) estimation methods for LC models than LEM. Both programs give information on nonidentifiability and boundary solutions, but Latent GOLD®, unlike LEM, can prevent boundary solutions through the use of Bayes constants.
Q. Do you have any tutorials for event history analysis?
A. The set of example data files on our website contains various event history analysis examples. Tutorials are not yet available for these. However, to get you started, you might look at the data file land.sav, the full reference for which is “Land, K.C., Nagin, D.S., and McCall (2001). Discrete-time hazard regression models with hidden heterogeneity: the semi-parametric mixed Poisson approach. Sociological Methods and Research, 29, 342-373.” Another good example is jobchange.dat.
Land.sav contains information on 411 males from a working-class area of London who were followed from ages 10 through 31. The dependent variable is “first serious delinquency”.
As can be seen, there is one record for each time point, which is called a person-period data format. The dependent variable “first” is zero for all records of a person, except for the last record if the person experienced the event of interest at that age.
The variables age and age_sq are the duration variables. These can also be seen as time-varying predictors. The variable “tot” is a time-constant covariate/predictor (a composite risk factor).
Of course, the ID should be used as Case ID to indicate which records belong to the same case.
The dependent variable “first” can be treated as a Poisson count or as a binomial count. The former option yields a piecewise-constant log-linear hazard model, the latter a discrete-time logit model. If treated as a Poisson count, it is best to set the exposure to one half (exp_half: the event occurs in the middle of the interval) for the time point at which the event occurs. With a binomial count the exposure should be one at all time points (the default).
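To make this data layout concrete, here is a minimal sketch in Python (using pandas and made-up case-level records, not the actual land.sav file) of how a person-period file with the variables described above could be constructed:

    import pandas as pd

    cases = pd.DataFrame({          # hypothetical case-level data
        "id":        [1, 2, 3],
        "event_age": [14, None, 19],   # age at first serious delinquency (None = censored)
        "tot":       [0.3, 1.2, 0.8],  # time-constant composite risk factor
    })

    rows = []
    for _, c in cases.iterrows():
        last_age = int(c["event_age"]) if pd.notna(c["event_age"]) else 31
        for age in range(10, last_age + 1):
            event = int(pd.notna(c["event_age"]) and age == c["event_age"])
            rows.append({
                "id": c["id"],
                "age": age,
                "age_sq": age ** 2,
                "tot": c["tot"],
                "first": event,                      # dependent variable
                "exp_half": 0.5 if event else 1.0,   # exposure for the Poisson specification
            })

    person_period = pd.DataFrame(rows)
    print(person_period.head(10))

Each person contributes one record per age at risk, with “first” equal to 1 only in the record for the age at which the event occurred.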
Age and age_sq should be used as class-dependent predictors. You will identify two groups with clearly different age patterns in the rate of first delinquency. The variable “tot” can be used as a class-independent predictor, but it is more interesting to use it as a covariate: does the risk factor determine the type of delinquency trajectory?
This example can be modified or extended in many ways:
– you can include time-varying predictors other than the time variables; these can be assumed to have the same or different effects across classes
– you can include information on another event; in that case your classes describe the pattern in multiple events
– you can include as many covariates as you want (these will usually be demographics, but can also be a treatment)
– you can model the time dependence as nominal, yielding a Cox-like model
A general reference on event history analysis combined with LC analysis is Vermunt, J.K. (1997), Log-Linear Models for Event Histories, Sage Publications.
LC Cluster Analysis
Q. How does LC cluster analysis compare with traditional cluster analysis?
A. LC clustering is model-based, in contrast to traditional approaches that are based on ad-hoc distance measures. The general probability model underlying LC clustering more readily accommodates reality by allowing for unequal variances in each cluster, use of variables with mixed scale types, and formal statistical procedures for determining the number of clusters, among many other improvements. For a detailed comparison showing how LC clustering outperforms the SPSS K-means and SAS FASTCLUS procedures, see Latent Class Modeling as a Probabilistic Extension of K-means Clustering. A published article comparing SPSS TwoStep Cluster with Latent GOLD® is also available.
Q. How does Latent GOLD® classify cases into latent classes?
A. Cases are assigned to the latent class having the
highest posterior membership probability. Covariates can be added to the model
for improved description and prediction of the latent classes.
Q. When the ‘Include Missing’ option is selected, does Latent GOLD® do some kind of imputation?
A. No. Classification with missing values works exactly the same as classification without missing values: it is simply based on the variables that are observed for the case concerned. There is no imputation of missing values for indicators; one of the nice things about LC analysis is that imputation is not necessary.
In the User’s Guide, we give the general form of the density with missing values. The crucial element is the delta, which is 0 if an indicator is missing. In that case the corresponding term cancels (it is equal to 1 irrespective of the value of y).
Thus, with 4 indicators y1, y2, y3, and y4, two clusters, and y2 missing:
P(x|y1,y3,y4) = P(x) P(y1|x) P(y3|x) P(y4|x) / P(y1,y3,y4)
where
P(y1,y3,y4) = P(1) P(y1|1) P(y3|1) P(y4|1) + P(2) P(y1|2) P(y3|2) P(y4|2)
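To illustrate, here is a minimal sketch in Python, with made-up parameter values, showing that the term for a missing indicator simply drops out of the posterior computation:

    # Posterior P(x | observed indicators) for a 2-cluster model with 4 dichotomous
    # indicators; y2 is missing, so its term is skipped (no imputation needed).
    p_class = [0.6, 0.4]                 # P(x) for clusters 1 and 2 (made up)
    p_yes = [                            # P(y_j = 1 | x), per cluster (made up)
        [0.9, 0.8, 0.7, 0.6],            # cluster 1
        [0.2, 0.3, 0.4, 0.5],            # cluster 2
    ]

    y = [1, None, 0, 1]                  # observed responses; y2 is missing

    joint = []
    for x in range(2):
        prob = p_class[x]
        for j, val in enumerate(y):
            if val is None:              # missing indicator: term cancels (equals 1)
                continue
            prob *= p_yes[x][j] if val == 1 else 1 - p_yes[x][j]
        joint.append(prob)

    total = sum(joint)                   # = P(y1, y3, y4)
    posterior = [p / total for p in joint]
    print(posterior)                     # P(x=1 | y1,y3,y4), P(x=2 | y1,y3,y4)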
Q. How are Latent Class (LC) clustering techniques related to Fuzzy Clustering Techniques?
A. In fuzzy clustering, a case has grades of
membership which are the “parameters” to be estimated (Kaufman and
Rousseeuw, 1990). In contrast, in LC clustering an individual’s posterior
class-membership probabilities are computed from the estimated model parameters
and the observed scores. The advantage of the LC approach is that it is possible
to use the LC model to classify other cases (outside the sample used to
estimate the model) which belong to the population. This is not possible with
standard fuzzy clustering techniques.
Kaufman, L., and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley and Sons.
Q. How can I tell if my latent class cluster model contains local dependencies?
A. Local dependence is present in a K-class model if the model does NOT fit the data. One such measure of model fit is given by the bivariate residuals (BVRs) associated with each pair of model indicators. Large BVRs (values over 2) can be viewed as evidence of local dependence associated with that pair of indicators (see the residuals output of Latent GOLD® for these statistics).
Q. How can I handle local dependencies in my LC cluster model?
A. Local dependence can be accounted for by simply adding latent classes, or by maintaining the current number of classes and modifying the model in other ways, such as adding direct effects between 2 variables that have a large bivariate residual. See the LG manual for details of how to add direct effects. See also section 3 in http://www.statisticalinnovations.com/articles/sage11.pdf for further details on the different approaches for dealing with local dependence.
A. You can estimate several models and select the one that fits best according to BIC. For example, six types of LC cluster models are reported in Table 1 of the Latent Class Cluster Analysis article. These models differ with respect to a) the specification of class-dependent vs. class-independent error variances, and b) the ‘direct effects’ included in the LC cluster model estimated by Latent GOLD®. The 3-class type-5 model is best according to the BIC statistic. Various parameter estimates and standard errors from this ‘final’ model are obtained from the Profile and Parameters output.
Click on dataset #29 and download the data and the model setup file diabetes.lgf, which contains the specifications for each of the 6 types of 3-class cluster models described in Table 1.
LC Factor Analysis
Q. How does latent class factor analysis compare with traditional factor analysis?
A. The LC factor model assumes that each factor contains 2 or more ordered categories, as opposed to traditional factor analysis, which assumes that the factors (as well as the variables) are continuous (interval scaled). The variables in LC factor analysis need not be continuous; they may be of mixed scale types (nominal, ordinal, continuous, counts, or combinations of these). LC factor analysis also has a close relationship to cluster analysis. For an introduction to LC factor analysis, and to see how it relates to LC cluster analysis, see Magidson and Vermunt, Sociological Methodology 2001. For a comparison with traditional factor analysis in data mining, see Traditional vs. Latent Class Factor Analysis for Datamining.
Q. How long does latent class factor analysis take to run?
A. A 2-factor model on a 650 MHz computer took less than 2 minutes to estimate, and a 3-factor model 4 minutes. As the number of factors increases, the estimation time increases exponentially. From an exploratory perspective, you may well find that a 2- or 3-factor solution will already be quite informative: 3 dichotomous factors will segment the sample into 8 distinct clusters! On the other hand, 8 dichotomous factors corresponds to 2^8 = 256 clusters. To see the relationship between factors and clusters, see Magidson and Vermunt, Sociological Methodology 2001.
Traditional factor analysis (FA) is faster because it makes the simplifying assumption that all variables are continuous and that they follow a multivariate normal (MVN) distribution. When these assumptions are true, only the second-order moments (the correlations between the variables) are needed to estimate the model. For these data, the FA assumptions are not justified.
Latent GOLD® does not assume MVN and hence is much more general. It utilizes information from all higher-order associations (more than means and correlations) in the estimation of parameters. The resulting solution will be directly interpretable and unique, unlike the traditional FA solution, which requires a rotation for interpretability. Traditional vs. Latent Class Factor Analysis for Datamining is an article by the developers of LG that will appear in a book on data mining. It shows why LG factor analysis often provides insights into data that are missed by traditional FA.
LC Regression Analysis
Q. How does latent class regression analysis compare with traditional regression modeling?
A. There are 2 primary kinds of differences. First, the particular regression model is automatically determined according to the scale type of the dependent variable: for continuous dependent variables, traditional linear regression is employed; for dichotomous, logistic regression; for ordinal, the baseline/adjacent-category logit extension; for nominal, multinomial logit; and for counts, Poisson regression. Second, LC regression is a mixture model and hence is more general than traditional regression. The special case of 1 class corresponds to the homogeneous-population assumption made in traditional regression. In LC regression, separate regressions are estimated simultaneously for each latent class.
Q. I need a mixture modeling program that can handle dependent variables that are dichotomous as well as continuous. Does Latent GOLD® handle this?
A. Yes. Mixture modeling and latent class modeling are synonymous.
Q. Is there any “stepwise” inclusion feature in the LC regression module?
A. No. Since the latent classes may be highly dependent on the predictors that are included, stepwise features have not been implemented in the latent class regression module.
A. The ‘gamma’ parameters labeled ‘Intercept’ (and other gamma parameters that would appear if you have covariates) refer to the model that predicts the latent classes as a function of the covariates. If no covariates are included in the model, only the Intercept appears under the label ‘(gamma)’. Beneath the gamma parameters appear the parameters labeled ‘beta’. These refer to the model that predicts the dependent variable (including the dependent variable regression intercept). This output has been rearranged in Latent GOLD® to provide better separation of the parameters from these two different models. See Tutorial 3 (PDF) for an example.
Latent GOLD® also has many additional features useful for prediction, such as
the automatic generation of predicted values, the ability to restrict the regression coefficients in many ways, and R-squared statistics. See the User’s Guide (PDF) and Technical Guide (PDF) for further details on these features.
A. Multinomial LC regression models are estimated simply
by specifying the dependent variable to be nominal. In the case of repeated measures (multiple time points, multiple ratings by the same respondent, etc.),
an ID variable can be used to identify the records associated with the same
case. (See tutorial
#2 for an example of a repeated measures conjoint study.) Latent GOLD®
cannot currently estimate conditional logit models of the kind used in discrete choice studies, although such capability will be incorporated in Latent GOLD® Choice, an add-on to Latent GOLD® that is now under development.
A. Yes. One important additional aspect is that estimated class membership also improves overall prediction and contributes to the magnitude of R-squared.
A. This is not fully correct: these measures indeed indicate how well we can predict class membership. But the covariates alone do not determine classification; the regression model itself plays a major role in predicting class membership. This prediction/classification is based on a person’s responses on the dependent variable (given predictor values). If you look at the formulas, you can see that the posterior membership probabilities depend not only on P(x|z), but also on P(y|x,z). Even without any covariates (z), these models usually predict class membership quite well.
Intuitively, one determines which class-specific regression model fits best to the responses of a certain case. The better that a regression model associated with a particular class fits, the higher the probability of belonging to that class. Price sensitive people are assigned to the class for which the regression shows higher price effects, etc.
In Latent GOLD, we also report a separate R-squared for the prediction of class membership based on covariates only.
A. The ‘ordinal’ dependent variable specification is used in this example, which causes the baseline-category logit model to be used. The beta coefficients listed in the column of the parameters output file corresponding to a particular latent class are the b-coefficients in the following model:
f(j | Z1, Z2, Z3) = b0(j) + b1*Z1*y(j) + b2*Z2*y(j) + b3*Z3*y(j)
The b0(j) estimates are the betas associated with each rating category j of the dependent variable RATING. The y(j), j=1,2,3,4,5, are the fixed scores used for the dependent variable (1, 2, 3, 4, and 5 in this example).
The desired probabilities are thus computed as:
Prob(Rating = j | Z1, Z2, Z3) = exp[f(j)] / [exp(f(1)) + exp(f(2)) + exp(f(3)) + exp(f(4)) + exp(f(5))], j = 1,…,5
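To illustrate, here is a minimal sketch in Python (with made-up coefficient values, not estimates from the tutorial) of how these probabilities are computed for one latent class:

    import math

    b0 = [0.5, 0.2, 0.0, -0.1, -0.6]   # b0(j), one per rating category (made up)
    b1, b2, b3 = 0.3, -0.2, 0.1        # predictor effects for this class (made up)
    scores = [1, 2, 3, 4, 5]           # fixed scores y(j) for RATING

    def rating_probs(z1, z2, z3):
        # f(j | Z1, Z2, Z3) for each rating category j
        f = [b0[j] + (b1 * z1 + b2 * z2 + b3 * z3) * scores[j] for j in range(5)]
        denom = sum(math.exp(fj) for fj in f)
        return [math.exp(fj) / denom for fj in f]

    print(rating_probs(z1=1, z2=0, z3=2))   # probabilities for ratings 1..5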
(For additional technical information on this model, see the associated Magidson references.)
Magidson, J. (1996). “Maximum Likelihood Assessment of Clinical Trials Based on an Ordered Categorical Response.” Drug Information Journal, Vol. 30, No. 1. Maple Glen, PA: Drug Information Association.
Magidson, J. (1994). “Multivariate Statistical Models for Categorical Data.” Chapter 3 in Bagozzi, Richard (ed.), Advanced Methods of Marketing Research. Blackwell.
Additional coefficients, labeled gammas (as
opposed to betas) pertaining to the multinomial logit model for
predicting the latent variable as a function of the covariates (SEX and
AGE for this example) are listed at the bottom of the parameters output file (in Latent GOLD®). In
the model containing no covariates, the gamma coefficients (labeled ‘intercepts’) relate to the size of the classes, which are always ordered from largest (the first latent class) to smallest (the last class).
A. With active covariates, posterior membership probabilities are computed for cases with missing responses (whether or not they are ‘new’ cases) based on their covariate values, as shown in Latent GOLD®’s ‘covariate classification’ output. These probabilities are used as weights applied to the class-specific predictions: for each latent class, the predictors for such a case and the regression coefficients associated with that class yield the prediction for that class. Without active covariates, the posterior membership probabilities are taken to be the overall class sizes.
In practice, if the new cases are included in the data file but given a case weight close to 0 (say 1E-49), while all other cases are given a weight of 1, and the ‘include missing’ option is used, such cases will not be used to estimate the model parameters (so the same solution would be obtained without the new cases). By requesting that Predictions be output to a file, predictions for ALL cases, including the new cases, will be written to that file.
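To illustrate the weighting described above, here is a minimal sketch in Python with made-up numbers; the posterior probabilities and class-specific predictions are hypothetical, not Latent GOLD output:

    # Overall prediction for one case = posterior-weighted average of the
    # predictions from each class-specific regression.
    posterior = [0.2, 0.7, 0.1]      # posterior membership probabilities (made up)
    class_pred = [3.1, 4.5, 2.0]     # prediction from each class-specific regression (made up)

    prediction = sum(p * yhat for p, yhat in zip(posterior, class_pred))
    print(prediction)                # overall prediction for this case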
Technical Questions from Latent GOLD® Users
A. Nothing is wrong. What you are observing is the effect of misclassification errors associated with assignment to a latent class based upon the modal (highest) class probability. For example, in a 3-class model, if the posterior membership probabilities for cases having a given response pattern are 0.2 (for class 1), 0.7 (for class 2), and 0.1 (for class 3), the modal probability is 0.7. Assignment based on the modal probability means that all such cases will be assigned to class 2. However, such assignment is expected to be correct for only 70% of these cases, since 20% truly belong to class 1 and the remaining 10% belong to class 3. The expected misclassification rate for these cases is therefore 20% + 10% = 30%. For cases with other response patterns, the misclassification rate may be 7%, or 2%, etc. The modal assignment rule minimizes the overall expected misclassification rate (which is given in the output). To the extent that the misclassification rate is greater than 0, the observed frequency distribution of class memberships will reflect the effects of such misclassification. The marginal distributions in the classification table show how the class distribution changes when modal class assignment is used.
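To illustrate, here is a minimal sketch in Python (with made-up posterior probabilities) of modal assignment and the expected misclassification rate it implies:

    # Each row holds a case's posterior membership probabilities (made up).
    posteriors = [
        [0.2, 0.7, 0.1],
        [0.05, 0.02, 0.93],
        [0.4, 0.5, 0.1],
    ]

    errors = []
    for p in posteriors:
        modal = max(p)               # probability of the class the case is assigned to
        errors.append(1 - modal)     # expected misclassification for this case

    print(sum(errors) / len(errors)) # overall expected misclassification rate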
A. Substantial improvement in speed can be accomplished by
specifying the variables to be continuous. Alternatively, the grouping option
can be used to reduce the number of levels in the ordinal variables (to say 10
or 20).
A. The output listings in the manual for the IRIS data
contain some errors in the statistics. You can download the correct
specification for each model (iris.lgf)
and the data
(iris.dat).
Latent GOLD Advanced Questions
Q. What additional functionalities are gained with the advanced version of Latent GOLD?
A. The advanced version of Latent GOLD consists of an Advanced module providing the ability to 1) estimate multilevel latent class models, 2) incorporate complex sampling designs, and 3) include random effects using continuous factors (CFactors). An overview of these capabilities is provided in section 8 of the Technical Guide, followed by detailed documentation for each of these 3 advanced features in sections 9, 10, and 11, and by the output produced by these advanced features in section 12. You may download a demo version that contains the Advanced module and use it with any of our demo data sets.
Q. Can Latent GOLD Advanced be used to estimate IRT models?
A. Yes, Latent GOLD Advanced (LGA) can be used to estimate a wide variety of IRT and IRT mixture models. This .pdf describes the connections between various LGA and standard IRT models. The associated .lgf and data files illustrate examples that can be run with our demo data sets. (Note that we set the Bayes constant to 0 in these runs.)
LG-Syntax Questions
Q. Does the multiple imputation procedure take the complex sampling design into account?
A. Yes, it does. This is not documented specifically for multiple imputation, but the nonparametric bootstrap used in the multiple imputation procedure (to account for parameter uncertainty) deals with the complex sampling design: the resampling is done at the PSU level. This is explained in the LG-Syntax User’s Guide in the discussion of the complex sampling features. On page 85 we say: “In a nonparametric bootstrap, replicate samples are obtained by sampling C(o) PSUs with replacement from each stratum. The value of delta(r) is 1/R.”
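To illustrate the resampling step described in that passage, here is a minimal sketch in Python with hypothetical survey data; it only mimics the PSU-level resampling idea and is not the procedure implemented in LG-Syntax:

    import random
    import pandas as pd

    # hypothetical survey data: stratum, PSU (cluster) id, and a response
    data = pd.DataFrame({
        "stratum": [1, 1, 1, 1, 2, 2, 2, 2],
        "psu":     [1, 1, 2, 2, 3, 3, 4, 4],
        "y":       [0, 1, 1, 1, 0, 0, 1, 0],
    })

    def bootstrap_replicate(df):
        # Within each stratum, draw PSUs with replacement and keep all of
        # their records, mirroring the resampling scheme quoted above.
        parts = []
        for _, stratum in df.groupby("stratum"):
            psus = list(stratum["psu"].unique())
            drawn = random.choices(psus, k=len(psus))
            for p in drawn:
                parts.append(stratum[stratum["psu"] == p])
        return pd.concat(parts, ignore_index=True)

    replicate = bootstrap_replicate(data)
    print(replicate)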
Replication and Case Weights
A. In this situation you should use a case weight, since you want to modify the weights of cases. A replication weight is used to increase or decrease the weight of a choice within a case; a weight of ‘2’ means that this person made this choice twice. So setting all replication weights to 2 for a particular case is not the same as having a case weight of 2:
- A case weight of 2 means: there are 2 cases with this set of choices (I have to count this person twice).
- A series of replication weights of 2 means: this person made each of these choices twice (I have twice as much information for one person).
In the special case of a 1-class model, the two are equivalent because all observations are assumed to be independent. In all other situations, they are very different. You can consult the manual to see how the weights enter into the log-likelihood function, which is very different for the two types of weights.
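To illustrate the difference, here is a minimal sketch in Python with made-up probabilities. It assumes the usual mixture-model form in which a replication weight acts as an exponent on the class-conditional probability of the corresponding observation, while a case weight multiplies the case’s log-likelihood contribution; consult the manual for the exact form used by Latent GOLD®:

    import math

    p_class = [0.5, 0.5]                 # class sizes (made up)
    p_choice = [                         # P(choice made | class) for 3 choice sets (made up)
        [0.8, 0.6, 0.7],                 # class 1
        [0.3, 0.4, 0.2],                 # class 2
    ]

    def loglik(case_weight, rep_weights):
        case_lik = 0.0
        for x in range(2):
            contrib = p_class[x]
            for t, v in enumerate(rep_weights):
                contrib *= p_choice[x][t] ** v   # replication weight as exponent
            case_lik += contrib
        return case_weight * math.log(case_lik)  # case weight multiplies the log-likelihood

    print(loglik(case_weight=2, rep_weights=[1, 1, 1]))   # case weight of 2
    print(loglik(case_weight=1, rep_weights=[2, 2, 2]))   # replication weights of 2
    # With more than 1 class these differ; in a 1-class model they coincide.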