Program Tutorials
- Latent GOLD
- LG Choice
- LG-Syntax
- CORExpress
- SI-CHAID
- GOLDMineR
All sample datasets and sample .lgf files used in the tutorials below are downloaded to your computer when you install the demo version of Latent GOLD. To download individual datasets and .lgf files, please refer to our Sample Datasets Page.
|
Tutorial 1: Using Latent GOLD® 4.5 to Estimate LC Cluster Models  
download PDF
Watch Video Tutorial
In this tutorial, we use 4 categorical indicators to show how to estimate LC Cluster models and interpret the
resulting output. You will:
- Open a data file
- Set up and estimate traditional latent class (cluster) models
- Explore which models best fit the data
- Generate and interpret output and interactive graphs
- Save results
|
Tutorial 2: Using Latent GOLD® 4.5 to Estimate DFactor Models  
download PDF
Watch Video Tutorial
In this tutorial, we re-examine the results obtained from tutorial #1 using discrete factor (DFactor) models
instead of LC Cluster models. We show how a 2-DFactor model consisting of 2 dichotomous factors can be
viewed as a restricted form of the 4-cluster model and use the L² difference statistic to test whether the
unrestricted 4-class model provides an improvement. In addition, this tutorial illustrates:
- The use of the Ordinal scale type
- Estimating DFactor models
- Factor Loadings Output
- Restricting Factor Loadings to Zero
- Joint Profile output
- Classification Output
- The Bi-plot
For these data the DFactor models provide additional insights into the different survey respondent types.
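The L² difference test described above can be sketched as follows. This is only an illustration in Python, not Latent GOLD output: the L² values and degrees of freedom in the example are made-up placeholders, and `chi2_sf` and `l2_difference_test` are hypothetical helper names. The sketch assumes the restricted model is nested in the unrestricted one, so that the difference in L² is approximately chi-squared distributed with degrees of freedom equal to the difference in df.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def l2_difference_test(l2_restricted, df_restricted, l2_unrestricted, df_unrestricted):
    """Return (L2 difference, df difference, p-value) for two nested models."""
    delta_l2 = l2_restricted - l2_unrestricted
    delta_df = df_restricted - df_unrestricted
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return delta_l2, delta_df, chi2_sf(delta_l2, delta_df)


# Placeholder values, NOT results from the tutorial data.
delta_l2, delta_df, p_value = l2_difference_test(30.0, 20, 24.0, 18)
print(f"L2 difference = {delta_l2:.2f}, df = {delta_df}, p = {p_value:.4f}")
```

A small p-value would suggest that the unrestricted model fits significantly better than the restricted one; a large p-value would favor keeping the more parsimonious restricted model.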
|
Tutorial 3: LC Regression with Repeated Measures
download PDF
This tutorial shows how to develop Latent Class (LC) Regression models using the sample data file
“conjoint.sav”. You will learn how to:
- Select the dependent variable and specify its scale type
- Distinguish predictors from covariates
- Impose restrictions on the predictor effects
- Specify covariates as active or inactive
- Determine the number of latent classes (i.e., segments)
- Examine R² and various other information related to model prediction
|
Tutorial 4: Profiling LC Segments using the CHAID Option
download PDF
In this tutorial, we obtain further insights into the latent class segments obtained from tutorials #1 and #2
using additional variables (covariates) to profile these segments in terms of respondent demographics –
gender (SEX), education (EDUCR), marital status (MARITAL), and age (AGE).
This tutorial illustrates:
- Use of the ‘inactive’ covariates feature to describe LC segments
- Use of the SI-CHAID add-on program to obtain additional descriptive profiles and tests of significance
In addition, it illustrates:
- Use of the Grouping option to reduce the number of categories of a variable
|
Tutorial 6A: Comparing Segments obtained from LC Cluster and DFactor Models in a Consumer Preference Study
download PDF
The goal of this research was to determine if consumers can be segmented in a meaningful way on the
basis of their liking ratings of the crackers. In this tutorial 6A we will estimate and compare a number of
LC Cluster and DFactor models and interpret the resulting segments. In Tutorial 6B we will use Regression Models to obtain segments.
In this tutorial, you will:
- Estimate LC Cluster and DFactor Models
- Examine various output including the Loadings output from DFactor Models
- Use the DFactor module to obtain ‘clusters’ after ‘factoring out’ a nuisance factor
- Use the ‘Equal Effects’ option to obtain a general factor
|
Tutorial 7A: Latent Class Growth Model
download PDF
What you will learn:
- How to use Latent GOLD to identify distinct latent class growth trajectories in the data
- How to name the identified latent class subgroups based on their growth patterns, classifying:
  - 36 cases as Unchanged (Class 1)
  - 17 cases as Improved (Class 2)
  - the remaining 6 cases as Unstable (Class 3)
- How to fit a latent class growth model (a Poisson mixture model) to these data. The model shows that
those receiving the drug treatment were significantly more likely than the placebo group to
improve and significantly less likely to show no change over their baseline seizure rate (p = .02).
|
Tutorial 7B: Latent Class Growth Model Using an Active Covariate
download PDF
Continuation of Tutorial 7A
|
Tutorial 8: LC Regression with High-dimensional Data
download PDF
The overall goal is to predict the liking ratings as a function of the 16 attributes. There are 2 methodological challenges that need to be addressed in accomplishing this goal:
- Observations are not independent -- Because these data consist of multiple records per case, residuals from records associated with the same judge may be correlated. Traditional (1-class) regression methods therefore violate the independent-observations assumption, which yields suboptimal prediction. In this tutorial we show how a latent class (LC) regression can be used to identify 2 LC segments having different OJ preferences and thereby account for the correlated observations.
- High-dimensional data – With only 6 juices being rated, use of the 16 correlated attributes as predictors yields a high-dimensional data situation such that traditional regression is not possible due to multicollinearity. We use the Correlated Component Regression (CCR) methods implemented in CORExpress to address this problem.
Two specific goals are:
- Goal 1 – to determine if the judges can be segmented on the basis of their juice liking ratings.
- Goal 2 – to determine if the juice attributes can predict the liking ratings, and if so which attributes are the most important predictors for each segment.
|
Advanced Tutorial: Latent GOLD 4.5 and IRT Modeling
download PDF
This tutorial shows that various IRT/latent trait models can be estimated with the Cluster and/or Regression modules in Latent GOLD Advanced (LGA) by including a continuous factor (CFactor) in the model. While LGA uses a somewhat different parameterization of these models, its parameters can easily be transformed to obtain the traditional ones (item locations, difficulties, thresholds, etc.). The .pdf shows how to do this by illustrating the equivalences between the LGA and standard IRT parameterizations. We also show how latent-class based IRT models can be defined using the DFactor module, as well as how these relate to standard IRT models.
Data files:
Download all data files for this example
The .lgf files show how to use LGA to estimate various IRT and IRT mixture models. Simply open the .lgf files from within the (demo or standard) LGA program and select 'Estimate All' from the Model Menu. Several appropriately labeled IRT models will be estimated, so you can view the Parameters and other Output files.
Note: These .lgf files are designed to show how to set up these types of models. Since the data sets are small, they do not make good examples for 'mixture' IRT models (i.e., IRT models containing 2 or more latent classes). As a result, when estimating mixture variants of the IRT models, you may well encounter local solutions, even with many random startsets and many start iterations.
|
We would like to thank our colleagues at the Statistical Consulting Group at UCLA Academic Technology Services for creating video seminars for our tutorials. Tutorials for which video is available are marked with the icon.
|
All sample datasets and sample .lgf files used in the tutorials below are downloaded to your computer when you install the demo version of Latent GOLD Choice. To download individual datasets and .lgf files, please refer to our Sample Datasets Page.
|
|
Tutorial 1: Using LG Choice 4.5 to Estimate Discrete Choice Models
download PDF
In this tutorial, we analyze data from a simple choice-based conjoint (CBC) experiment designed to
estimate market shares (choice shares) for shoes. In this tutorial you will:
- Set up an analysis
- Estimate choice models that specify different numbers of classes (segments)
- Explore which of these models provides the best fit to the data
- Utilize restrictions to refine the best fitting model
- Interpret results using our ‘final’ model
- Save results
|
Tutorial 1A: Using CHAID to Profile Latent Class Segments
download PDF
From Tutorial 1, the final model will be used to:
- Predict future choices
- Simulate choices among additional products of interest
|
Tutorial 2: Using LG Choice to Predict Future Choices
download PDF
One of the major benefits of discrete choice modeling is the ability to use the model to
predict choices for any choice set of interest including ones that were not utilized in the
original choice experiment (inactive sets). In this tutorial, we utilize our final 3-segment
model from tutorial 1 to simulate choice results for additional product alternatives of
interest. You will:
- Retrieve our previous model setup
- Utilize different Alternatives and Sets Files
- Examine predicted choice shares for current and inactive sets
- Create your own sets and obtain choice share predictions for these
- Include your sets in the tri-plot display
|
Tutorial 3: Estimating Brand and Price Effects
download PDF
A popular application of discrete choice modeling is to simulate how market share changes when the price
of a brand changes and when the price of a competitive brand changes. With latent class choice modeling,
it is possible to estimate these changes separately among those who are price sensitive, and those who are
not so price sensitive. That is, separate effects can be obtained for each segment as well as the overall
market.
One of the flexible features of the 3-file format in Latent GOLD Choice is that it is easy to define the
effects and interactions to be included in a model. In this tutorial, we first show how to estimate the effects
of brand and price using a model where PRICE and BRAND are treated as distinct attributes – that is,
where the effect of price sensitivity is assumed to be the same for both brands. We then show how easy it
is to relax this assumption and examine the consequences if in fact the PRICE effect differs by brand.
In this tutorial you will:
- Retrieve a previously saved model setup
- Re-estimate all models
- Determine the number of classes and name the segments
- Impose restrictions to simplify model
- Examine Output including share simulations for additional choice sets
- Include Brand x Price interactions in the model
|
Tutorial 4: Using the 1-file Format
download PDF
In tutorial #3, we illustrated some analyses on data from the Brand Pricing Experiment,
where the data was input from a 3-file format. In this tutorial, we illustrate use of the
program when the same data is provided in the 1-file format.
In this tutorial you will:
- Retrieve a previously saved model setup
- Re-estimate all models
- Determine the number of classes and name the segments
- Impose restrictions to simplify model
- Examine Output including share simulations for additional choice sets
- Include Brand x Price interactions in the model
|
Tutorial 5: Analyzing Ranking Data
download PDF
Choice tutorials 1-4 all dealt with the analysis of first choices among sets of alternatives.
In applications where information is also available on additional choices -- 2nd choice, 3rd
choice, last choice, etc. -- improved efficiency of the part-worth utility estimates is
possible by taking this additional information into account. In such cases, Latent GOLD
Choice allows the utilization of the sequential logit model, a generalization of the
conditional logit model, to account for the additional choice information.
The way the sequential logit model works is that the first choice is analyzed as usual
based on the conditional logit model. If the model specified in Latent GOLD Choice is
set to a ‘Ranking’ Model, after the first record within a set, any additional records are
assumed to be associated with a 2nd choice, 3rd choice, etc. A 2nd choice is considered to
be a first choice from the set of alternatives that excludes the 1st choice, and so on for the
3rd, 4th and additional choices. For ranking models, Latent GOLD Choice automatically
excludes these prior choices from the consideration set of alternatives used for a current
choice.
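The sequential logit logic described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not Latent GOLD Choice code: the function names and the utility values are hypothetical, and a single class with fixed utilities is assumed. Each choice is treated as a conditional logit first choice from the alternatives that remain after earlier choices are removed.

```python
import math


def conditional_logit_probs(utilities):
    """Choice probabilities for one choice set under the conditional logit model."""
    max_u = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {alt: math.exp(u - max_u) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


def ranking_probability(utilities, ranking):
    """Probability of a full or partial ranking under the sequential logit model."""
    remaining = dict(utilities)
    prob = 1.0
    for alt in ranking:
        # The current choice is a first choice among the remaining alternatives.
        prob *= conditional_logit_probs(remaining)[alt]
        # Exclude it from the consideration set used for later choices.
        del remaining[alt]
    return prob


# Hypothetical utilities for three alternatives (illustration only).
utilities = {"A": 1.0, "B": 0.5, "C": 0.0}
print(ranking_probability(utilities, ["A", "B", "C"]))
```

Passing only the first choice, for example `["A"]`, reduces the calculation to the ordinary conditional logit probability, which is why the sequential logit model is a generalization of it.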
This tutorial deals with full ranking data obtained from a real bank segmentation study as
described in Kamakura, Wedel, and Agrawal (1994), “Concomitant variable latent class
models for conjoint analysis”, International Journal of Research in Marketing, 11,
451-464. The data was provided for our use by Wagner Kamakura.
This tutorial illustrates the use of the Latent GOLD Choice program to analyze ranking
data.
You will:
- Identify 4 segments that differ in the importance placed upon various checking
account attributes.
- Interpret output in the context of rank-order preference data.
- Use concomitant variables (“covariates”) to predict and describe these segments.
|
Tutorial 6: Using LG Choice to Estimate max-diff (best-worst) and Other Partial Ranking Models
download PDF
In this tutorial, we will perform a re-analysis of the data used in tutorial #5. In this
application, 9 checking account alternatives were ranked in order of preference by N=256
bank customers. These alternatives are defined in terms of 4 attributes in the file
bank9ALT.sav:
|
Tutorial 7: LC Segmentation with Ratings-based Conjoint Data
download PDF
This tutorial shows how to use the Latent GOLD Choice program when the scale type of
the dependent variable corresponds to a Rating as opposed to a Choice or Ranking.
Ratings data can also be analyzed using the Regression module in Latent GOLD 3.0 and
the resulting parameter estimates will be identical. However, the Latent GOLD Choice
program produces additional output, and can be used to ‘fuse’ rating and choice data,
which is useful for situations where both types of data are available.
In this tutorial, we will reanalyze the conjoint data used previously in Latent GOLD
Tutorial #2: (http://www.statisticalinnovations.com/products/lg_tutorial2.pdf) “LC
Regression with Repeated Measures”.
You will learn how to:
- Set up the data for the Latent GOLD Choice program using the 1-file format
- Examine the additional output not available in the Latent GOLD Regression module.
By examining the additional ‘Set Profile’ Output generated by the LG Choice program
we will see that the predicted ratings obtained using the standard aggregate (1-class)
conjoint model fail to provide an adequate fit to the observed ratings. In contrast, we will
see that the predictions generated by the 3-class model are quite good.
|
Tutorial 7A: LC Segmentation with Ratings-based Conjoint Data
download PDF
This tutorial shows how to use the Latent GOLD Choice program when the scale type of
the dependent variable corresponds to a Rating as opposed to a Choice or Ranking. For
this tutorial, the file setup is shown for the 3-file format structure. A more extensive
analysis of these data is provided in Tutorial #7, which utilizes the 1-file format, and also in the
Latent GOLD Tutorial #2: “LC Regression with Repeated Measures”
(http://www.statisticalinnovations.com/products/lg_tutorial2.pdf).
In this tutorial, you will learn how to:
- Set up the data for a Ratings-based conjoint analysis using the 3-file format in the
Latent GOLD Choice program.
|
All sample datasets and sample .lgf (and .lgs) files used in the tutorial below are downloaded to your computer when you install the demo version of Latent GOLD. To download individual datasets and .lgf files, please refer to our Sample Datasets Page.
|
|
Tutorial 1: Getting Started with LG-Syntax
download PDF
This tutorial introduces the use of the LG-Syntax module, an add-on to the Advanced version of Latent
GOLD. In this tutorial we utilize the data which was also used in ‘Tutorial #3: LC Regression with
Repeated Measures.’
Since it is quite easy to set up a GUI model in Latent GOLD using the LG4.5 Windows Menu system, it is
often useful to begin with a GUI model containing the basic elements of the desired syntax model. This
GUI model can then be converted to an initial syntax model automatically using the ‘Generate Syntax’
option from the ‘Model’ menu. The goal of this tutorial is to illustrate this process, as well as show how to
modify LG-Equations to obtain additional models.
In this tutorial we will:
- Introduce the use of LG-Syntax
- Show how the LG-Syntax can be generated from a GUI model
- Examine the Equations section of the LG-Syntax
- Modify the LG-Equations to specify a different LC regression model
- See how parameter restrictions may be specified in different ways using the syntax
We will reuse the data from ‘Tutorial 3: LC regression with Repeated Measures’ in this tutorial. While the
data were generated under the assumption of the ordinal logit model, for simplicity in introducing the
equation section of the LG-Syntax we will treat the dependent variable as continuous rather than ordinal, so
that the models obtained are LC (linear) regression models.
|
All sample datasets and sample .spp files used in the tutorials below are downloaded to your computer when you install the demo version of CORExpress. To download individual datasets and .spp files, please refer to our Sample Datasets Page.
|
|
Tutorial 1: Getting Started with Correlated Component Regression (CCR) in CORExpress
download PDF
This tutorial introduces Key Driver Regression using CCR-LM in CORExpress.
|
Tutorial 2: CCR for a Continuous Dependent Variable with Many Predictors
download PDF
This tutorial introduces the use of CCR-Linear in CORExpress with many predictors.
|
Tutorial 3: Correlated Component Regression for a Dichotomous Dependent Variable
download PDF
This tutorial introduces the use of CCR-LDA in CORExpress.
|
Tutorial 4: Estimation of Naive Bayes and Extended Naive Bayes Models (forthcoming)
Forthcoming
For an introduction to Naive Bayes and its relationship to Logistic Regression, see: http://www.cs.cmu.edu/~tom/mlbook/NBayesLogReg.pdf
|
All sample datasets and sample .chd files used in the tutorials below are downloaded to your computer when you install the demo version of SI-CHAID.
|
|
Tutorial 1: Beginning a CHAID Analysis
download PDF
In this Tutorial we illustrate the basic functions and uses of SI-CHAID. We will show how to set up an
analysis (.chd) file and grow a CHAID tree by using the standard CHAID algorithm, which is designed for
a dichotomous or nominal dependent variable. In our example, we show how to determine CHAID
segments that differ on response rates, and how gains charts can be used to predict the expected response
from mailing/targeting the most responsive segments.
|
Tutorial 2: Using SI-CHAID to identify profitable segments
download PDF
This tutorial shows how to use the CHAID ordinal algorithm to segment based on profitability scores. We
will again use the magazine subscription data set, subscribe.sav, used previously in Tutorial 1. However,
our dependent variable will now be RESP3, coded 1 (paid responder), 2 (unpaid responder) and 3
(nonresponder). We'll compare a default nominal CHAID segmentation of RESP3 to the ordinal CHAID
analysis that takes into account the gain (or loss) associated with each response group. For simplicity, we
utilize the SI-CHAID option settings used in Magidson (1993).
|
Tutorial 3: Using SI-CHAID with a Hold-out Sample
download PDF
Sometimes cases on the analysis file are randomly assigned to a ‘hold-out’ sample and not used in the
development of the segmentation tree. Instead, such cases are reserved for the purpose of ‘validating’ the
tree. In this tutorial we utilize the data file holdout.sav to illustrate the use of SI-CHAID in this way.
In particular, from each dependent category (‘paid respondents’, ‘unpaid respondents’ and ‘nonresponders’)
we randomly assigned each case in the ‘subscrib.sav’ file to one of two equally likely groups
by generating the variable SAMPLE (1=test, 2 = holdout).
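The random assignment described above can be sketched as follows. This is a minimal Python illustration, not part of SI-CHAID: the function name `assign_holdout`, the case identifiers, and the category labels are hypothetical. Within each dependent category, every case is independently assigned to group 1 (test) or group 2 (holdout) with equal probability, so the two groups are roughly, not exactly, equal in size.

```python
import random


def assign_holdout(cases, seed=0):
    """Map each case id to SAMPLE = 1 (test) or 2 (holdout).

    `cases` is a list of (case_id, dependent_category) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    by_category = {}
    for case_id, category in cases:
        by_category.setdefault(category, []).append(case_id)
    sample = {}
    for members in by_category.values():
        for case_id in members:
            # Two equally likely groups: 1 = test, 2 = holdout.
            sample[case_id] = rng.choice((1, 2))
    return sample


# Hypothetical cases for illustration only.
cases = [(1, "paid"), (2, "unpaid"), (3, "nonresponder"), (4, "paid"), (5, "nonresponder")]
print(assign_holdout(cases, seed=42))
```

Cases with SAMPLE = 2 would then be set aside while the tree is grown and used afterwards to check how well the segments hold up.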
|
Tutorial 4: Using CHAID with Multiple Correlated Dependent Variables
download PDF
Often a segmentation is desired that is predictive of not one but multiple criteria. For example, in database
marketing, dependent variables might include 1) response to the most recent mailing (responder vs.
nonresponder), 2) response to past mailings, 3) the amount spent, 4) profitability, and possibly others.
Magidson and Vermunt (2005) described an extended CHAID algorithm for such situations, which has
been implemented in SI-CHAID 4.0. A copy of that article, entitled An Extension of the CHAID Tree-based
Segmentation Algorithm to Multiple Dependent Variables, is included with the SI-CHAID 4.0 manual, and
may also be obtained in the Articles section.
|
All sample datasets and sample .arr files used in the tutorials below are downloaded to your computer when you install the demo version of GOLDMineR.
|
|
Tutorial 1: Beginning a GOLDMineR® Analysis
download PDF
In this tutorial we use sample dataset #4 and show how to specify a
model and obtain an effects plot. You will:
- open a previously saved data file
- select the dependent and predictor variables
- experiment with different scaling types
- generate plots and tables
- use various display options
|
Tutorial 2: Assessing the Effects of a Reminder Call
download PDF
In this tutorial, we estimate the effects of 1) the dollar amount mailed out with the survey (X1:
PAYMENT - $1, $2, or $10) and 2) a reminder call (X2: CALL - Yes or No) on the return rate of a mail
survey.
This tutorial illustrates:
- the use of the goodness of fit statistic to examine the validity of the scaling and other assumptions
made by a model
- obtaining various tables and regression plots, and
- the poor prediction and biased parameter estimates that can result from model misspecification
(i.e., use of improper scores).
|
Tutorial 3: Analysis of a Qualitative Predictor Variable
download PDF
|
Advanced Tutorial: Detailed example with qualitative predictor
download PDF
In Chapter 8 hypothetical mail survey data were used to demonstrate the importance of
assessing the model fit to test the validity of the assumptions made by a model. In the
current example we will analyze real data from the Mail Survey Experiment
(Magidson, 1994b) where the actual dollar payments tested were $1 (the control), $2,
$3 and $4, the experiment being designed so that there is no correlation between
X1:PAYMENT and X2:CALL. We will again use the model fit statistic to choose
between two alternative models, one that treats PAYMENT as a quantitative predictor
with equidistant category scores (Model A) and one that treats PAYMENT as a
qualitative predictor (Model B).
|
Advanced Tutorial: Stouffer's American Soldier Data
link to tutorial
This note presents an analysis of a famous data example in GOLDMineR® and shows unique insights into the effects present in the data. This note also shows how you can use an odds framework to understand effects in the data.
In his 1972 article "A Modified Multiple Regression Approach to the Analysis of Dichotomous Variables," Leo Goodman presents a version of data presented by Stouffer et al. in their study "The American Soldier." Stouffer's study was published in 1949 and was based upon surveying American soldiers in World War II. The data should be understood in the context of the segregated American military and society of the time. The data have been analyzed many times, and Goodman's approach is both very fruitful for understanding what is going on in the data and superior to other approaches used in the past. Goodman's analysis is replicable in SPSS Genlog or in GOLDMineR®.
|
Advanced Tutorial: GOLDMineR® in biomedical research
link to tutorial
In his book Categorical Data Analysis, Alan Agresti discusses a 40-case dataset that relates respondents’ mental impairment to two explanatory variables. Agresti employs the cumulative logit model, while we use the adjacent category logit model to get insight into the data.
E-mail Contact: will@statisticalinnovations.com
Address:
Statistical Innovations,
375 Concord Avenue,
Belmont, MA 02478-3084
Phone: +1.617.489.4490
Fax: +1.617.489.4499
Copyright © 2008 by Statistical Innovations Inc., Belmont, MA. All rights reserved.