Statistical Innovations

  Latent GOLD®: Sample Data Sets
The following data sets accompany the Latent GOLD® demo version.
If a filename is highlighted, a tutorial is available in addition to the data set.
A filename marked with an icon has an associated .lgf file. Opening the .lgf file retrieves the saved model setup.

A. LC Cluster and Factor Models With Ordinal and Nominal Indicators
Dichotomous Indicators

1. hannover.sav

2. political.sav 

3. landis77.sav 

4. heinen2.sav 

5. heinen_mf.sav 

6. vdheijden.sav

7. depression.sav

8. knownclass.sav 

9. lcamis.dat

10. lifestyle.sav

11. store.sav

12. coleman.sav

13. gss94.sav

14. financial.sav

15. hadgu.sav 

Polytomous indicators

16. judges.dat

17. gss82white.sav 

18. gss82.sav 

19. elliot.sav

20. heinen3.sav 

20A. heinen3reg.sav 

21. environment.dat

22. internet99.sav

23. uselection2000.sav

84. crackers0.sav 

B. Mixtures of Univariate Distributions (Cluster or Regression)

24. galaxy.dat 

25. enzyme.dat

26. acidity.dat

27. candy.dat 

28. candy_trunc.txt 

29. nov2002.sav 

30. sids.dat 

C. Cluster Models with Continuous Indicators and Indicators of mixed scale types

31. iris.dat

32. kmeans.sav

33. diabetes.dat 

34. cancer.dat 

35. srcddata.txt 

36. abortion_cluster.sav

D. LC Regression Models
Mixture Regression Models for Single Response

37. follman.dat

38. fabric.dat

39. beta.dat

40. dmft.sav 

41. cace.sav 

42. long1.sav 

43. long2.sav 

44. runshoes.dat 

Two-level and Multiple Response Data Sets

45. bang.txt 

46. snijdersbosker.sav 

47. conjoint.sav 

48. crackers.sav

49. USelection2000reg.sav

Restricted LC Cluster Models for Multiple Responses Specified Using Multiple Records per Case

50. landisreg.sav 

51. heinen2reg.sav 

52. heinenreg_mf.sav 

53. colemanreg.sav

54. gss94reg.sav

55. financialreg.sav

LC Growth Models for Longitudinal Data

56. abortion.sav 

57. elliotreg.sav  

58. rats.dat

66A. growth1.sav 

Models for Event History and Transition Data

59. jobchange.dat 

60. empltran.dat 

61. dropout.dat

62. land.sav  

63. poulsen.sav 

64. vinken.sav 

Mixture Regression Models for Repeated Measures/Clinical Trials

65. koch.sav

66. epilep.sav

67. aspartame.dat

68. genomics.sav

69. schizophrenia.sav  

82. allocation.sav  

83. crackers4.sav  

E. Multilevel Latent Class Models and Complex Surveys
Two-level Cluster and DFactor Models

70. mierlo_socmeth.txt 

71. mierlo_mbr.sav 

72. cito.dat 

73. meulders.sav

Three-Level Regression Models

74. immunization.sav 

75. tvsfp.sav 

76. socatt.txt 

77. zugugl.sav

78. tob3vote.sav 

Complex Survey Options

79. patterson.sav 

80. pattersonreg.sav

81. pattersonreg2.sav

F. High Dimensional Data

82. OJtutorial.sav 



A. LC Cluster and Factor Models With Ordinal and Nominal Indicators

Dichotomous indicators

1. hannover.sav    [ Download data set ]
  • 5 dichotomous indicators
  • survey data on pain related to rheumatic arthritis
  • cluster or factor model
Sources:
Kohlmann, T., and Formann, A.K. (1997). "Using Latent Class Models to Analyze Response Patterns in Epidemiologic Mail Surveys", Chapter 33 in Applications of Latent Trait and Latent Class Models in the Social Sciences, edited by J. Rost and R. Langeheine. New York: Waxmann.

Magidson, J., and Vermunt, J.K. (2001). Latent Class Factor and Cluster Models, Bi-plots and Related Graphical Displays. Sociological Methodology, 31, 223-264.



2. political.sav   [ Download data set ]  [ Download .lgf file ]
  • 5 dichotomous indicators on political involvement and tolerance
  • 3 (nominal) covariates
  • 3-cluster model, 2-factor model, or 2-cluster model with local dependencies
  • data from Political Action Survey
  • used in user's manual (additional example IV)
Sources:
Hagenaars, J.A. (1993). Loglinear models with latent variables. Newbury Park: Sage.

Vermunt, J.K., and Magidson, J. (2000b). Graphical displays for latent class cluster and latent class factor models. W. Jansen and J.G. Bethlehem (eds.), Proceedings in Computational Statistics 2000, 121-122. Statistics Netherlands. ISSN 0253-018X.



3. landis77.sav   [ Download data set ]  [ Download .lgf file ]
  • dichotomous rating (presence/absence of carcinoma in the uterine cervix) of 118 slides by 7 pathologists
  • see also landisreg.sav for other data structure
  • 3-cluster, 2-DFactor model, CFactor model (2PLM), CFactor model with equal effects (Rasch), or combination of 2-cluster and Rasch model
  • sparse table: use bootstrap p value
  • used as illustration in Agresti (2002), Magidson and Vermunt (2004), and Vermunt and Magidson (2005a). Original data in Landis and Koch (1977)
Sources:
Agresti, A. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.

Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. D. Kaplan (ed.), Handbook of Quantitative Methodology for the Social Sciences, Sage Publications.

Magidson, J. and Vermunt, J.K. (2003a). Comparing Latent Class Factor Analysis with the Traditional Approach in Datamining. H. Bozdogan (ed.), Statistical data mining & knowledge discovery, Chapman & Hall/CRC.

Landis, J.R., and Koch, G.G. (1977). The measurement of observer agreement for categorical data, Biometrics, 33, 159-174.



4. heinen2.sav    [ Download data set ]  [ Download .lgf file ]
  • 5 dichotomous indicators of gender roles (male sample)
  • same data set in other format in heinen2reg.sav
  • 3-cluster, 3-level 1-DFactor model, and various types of IRT models
Source:
Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oaks: Sage Publications.



5. heinen_mf.sav    [ Download data set ]  [ Download heinen_mf.lgf ]  [ Download IRT lgf file ]
  • same data as heinen2.sav but now for males and females
  • gender can be used as a covariate, possibly affecting indicators (item bias)
  • data can also be used for unrestricted multiple group analysis (with female2 as known-class indicator)
  • see also SMABS 2004 workshop transparencies
Source:
Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oaks: Sage Publications.

Tutorial info:
This data set is used in Latent GOLD Advanced Tutorial 1: Latent GOLD and IRT Modeling (PDF, 77KB).



6. vdheijden.sav   [ Download data set ]
  • 3 dichotomous indicators of youth delinquency
  • ethnic group and age group are covariates
  • used by Van der Heijden et al. (1992) to illustrate logit-restricted latent budget analysis, which is an LC cluster model with covariates
Sources:
Van der Heijden, P.G.M., Mooijaart, A., and De Leeuw, J. (1992). Constrained latent budget analysis. Sociological Methodology, 22, 279-320.



7. depression.sav   [ Download data set ]
  • 5 depression indicators and covariate sex
  • 3-cluster, 3-level 1-DFactor, or 2-cluster model combined with a CFactor
  • used in Magidson and Vermunt (2001) and Schaeffer (1988)
Sources:
Magidson, J., and Vermunt, J.K. (2001). Latent Class Factor and Cluster Models, Bi-plots and Related Graphical Displays. Sociological Methodology, 31, 223-264.

Schaeffer, N.C. (1988). "An application of item response theory to the measurement of depression", pp. 271-308 in Sociological Methodology 1988, edited by C. Clogg. Washington, DC: American Sociological Association.



8. knownclass.sav   [ Download data set ]  [ Download .lgf file ]
  • simulated data set based on the 3-cluster solution obtained with the depression.sav data set
  • information on known class membership generated using 3 mechanisms: MCAR, MAR (depending on the sum of all item responses), and NMAR (depending on class membership itself)
  • in the NMAR case, the known-class indicator (yes/no) should be used as a covariate
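A minimal Python sketch of how a known-class flag could be generated under the three mechanisms listed above. The function name known_class_flags, the observation probabilities, and the exact MAR and NMAR rules are illustrative assumptions, not the settings used to create knownclass.sav.

```python
import random


def known_class_flags(classes, sum_scores, mechanism, rng, p=0.3):
    """Return a 0/1 flag per case: 1 if its class membership is 'known'.

    Illustrative rules (assumptions, not those used for knownclass.sav):
      MCAR: each case is observed with the same probability p
      MAR:  the probability grows with the observed sum of item responses
      NMAR: the probability depends on the (unobserved) class itself
    """
    max_score = max(sum_scores) or 1  # avoid division by zero
    flags = []
    for cls, score in zip(classes, sum_scores):
        if mechanism == "MCAR":
            prob = p
        elif mechanism == "MAR":
            prob = min(1.0, 2 * p * score / max_score)
        elif mechanism == "NMAR":
            prob = 0.6 if cls == 1 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        flags.append(1 if rng.random() < prob else 0)
    return flags


# Hypothetical example: 1,000 simulated cases, 3 classes, sum scores 0-5
rng = random.Random(2024)
classes = [rng.choice([1, 2, 3]) for _ in range(1000)]
scores = [rng.randint(0, 5) for _ in range(1000)]
nmar = known_class_flags(classes, scores, "NMAR", rng)
```

Under NMAR the flag carries information about the class itself, which is why it belongs in the model as a covariate.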
Sources:




9. lcamis.dat   [ Download data set ]
  • 5 dichotomous indicators
  • example of LC model with missing data on indicators
  • simulated data set
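The sketch below shows one common way cases with missing indicator values enter a latent class likelihood: assuming the data are missing at random, a missing item is simply left out of that case's product of conditional probabilities. The function pattern_probability and the parameter values are hypothetical, for illustration only.

```python
from math import prod


def pattern_probability(pattern, class_sizes, item_probs):
    """P(y) for one response pattern of dichotomous (0/1) indicators.

    pattern:          responses, with None marking a missing value
    class_sizes[c]:   P(X = c)
    item_probs[c][j]: P(y_j = 1 | X = c)
    Missing items are skipped, so they contribute a factor of 1.
    """
    total = 0.0
    for size, probs in zip(class_sizes, item_probs):
        total += size * prod(
            p if y == 1 else 1.0 - p
            for y, p in zip(pattern, probs)
            if y is not None
        )
    return total


# Hypothetical 2-class model with 2 indicators
sizes = [0.4, 0.6]
probs = [[0.9, 0.8], [0.2, 0.1]]
partial = pattern_probability([1, None], sizes, probs)
```

A pattern with one missing item gets the same probability as the sum over its possible completions, so partially observed cases still contribute what they do tell us.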


10. lifestyle.sav   [ Download data set ]
  • data on a large set of lifestyle activities (dichotomous indicators) and a few covariates (Source: The Polk Co.)
  • demo data set in Latent GOLD® 2.0
Source:
Magidson, J., and Vermunt, J.K. (2003b). A nontechnical introduction to latent class models. DMA Research Council Journal.



11. store.sav   [ Download data set ]
  • 5 dichotomous items related to consumer behavior
  • standard LC cluster model
Source:
Dillon, W.R., and Kumar, A. (1994). Latent structure and other mixture models in marketing: An integrative survey and overview. Chapter 9 in R.P. Bagozzi (ed.), Advanced Methods of Marketing Research, 352-388. Cambridge: Blackwell Publishers.



12. coleman.sav   [ Download data set ]
  • classical data set of Coleman
  • 2 indicators, membership of and attitude toward the leading crowd, each measured at two occasions
  • 2-factor model (unrestricted or restricted)
Sources:
Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215-231.

Agresti, A. (2002). Categorical Data Analysis. (table 12.8). Second Edition. New York: Wiley.



13. gss94.sav   [ Download data set ]
  • data from the 1994 General Social Survey
  • 3 attitudes toward abortion indicators, and covariate gender
  • 2-cluster model, LC Rasch (two-cluster with equal effects), parametric Rasch (CFactor with equal effects)
Source:
Agresti, A. (2002). Categorical Data Analysis. (table 10.13). Second Edition. New York: Wiley.



14. financial.sav   [ Download data set ]
  • data on ownership of 4 financial products
Source:
Paas, L. (2002). Acquisition pattern analysis with Mokken scales: Applications in the financial services market. PhD thesis. Tilburg University.



15. hadgu.sav   [ Download data set ]  [ Download hadgu.lgf   Download hadgu_bayes0.lgf ]
  • 6 measures (tests) for diagnosing chlamydia trachomatis (the most common sexually transmitted disease), where one test (culture) is a gold standard and can therefore be used as a Known-Class indicator
  • 2-class model with local dependencies modeled with a CFactor (with equal effects across tests)
  • used by Hadgu and Qu (1998) to determine the sensitivity and specificity of the various tests
Source:
Hadgu and Qu (1998)



Polytomous indicators

16. judges.dat   [ Download data set ]
  • trichotomous ratings by three judges, which can be treated as ordinal
  • 3-cluster, 3-level 1-DFactor model, and CFactor/IRT (partial credit) model, possibly with equal effects across indicators
  • used in Dillon and Kumar (1994), and in the Latent GOLD version 2.0 user’s manual (Vermunt and Magidson, 2000a)
Source:
Dillon, W.R., and Kumar, A. (1994). Latent structure and other mixture models in marketing: An integrative survey and overview. Chapter 9 in R.P. Bagozzi (ed.), Advanced Methods of Marketing Research, 352-388. Cambridge: Blackwell Publishers.



17. gss82white.sav   [ Download data set ]  [ Download .lgf file ]
  • 2 dichotomous and 2 trichotomous indicators that can be treated as nominal or ordinal
  • data from General Social Survey '82, white sample
  • the purpose of the analysis is to construct a typology of survey respondents
  • 3-cluster or 2-factor model
Sources:
McCutcheon, A.L. (1987). Latent class analysis, Sage University Paper. Newbury Park: Sage Publications.

Magidson, J., and Vermunt, J.K. (2001). Latent Class Factor and Cluster Models, Bi-plots and Related Graphical Displays. Sociological Methodology, 31, 223-264.

Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. D. Kaplan (ed.), Handbook of Quantitative Methodology for the Social Sciences, Sage Publications.

Tutorial info:
This data set is used in Tutorial 1: Using Latent GOLD® to estimate LC Cluster Models (PDF, 1.69MB).



18. gss82.sav   [ Download data set ]  [ Download .lgf file ]
  • same indicators as in gss82white.sav, but for full sample (whites and non-whites)
  • several covariates that can be treated as active or inactive
Source:
Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. D. Kaplan (ed.), Handbook of Quantitative Methodology for the Social Sciences, Sage Publications.

Tutorial info:
This data set is used in Tutorial 2: Using Latent GOLD® to estimate DFactor Models (PDF, 1.63MB).



19. elliot.sav   [ Download data set ]
  • marijuana use of children (13 years of age in 1976) in 5 consecutive years (trichotomous ordinal response variable)
  • see also elliotreg.sav for other data structure
  • standard cluster model with time-specific indicators and sex as covariate; use bootstrap p value because of sparseness
Sources:
Elliott, D.S., Huizinga, D., and Menard, S. (1989). Multiple problem youth: Delinquency, substance use and mental health problems. New York: Springer-Verlag.

Vermunt, J.K., Rodrigo, M.F., and Ato-Garcia, M. (2001). Modeling joint and marginal distributions in the analysis of categorical panel data. Sociological Methods and Research, 30, 170-196.



20. heinen3.sav   [ Download data set ]  [ Download heinen3_nominal.lgf   Download heinen3_ordinal.lgf ]  [ Download IRT lgf file ]
  • 5 trichotomous indicators of gender roles that can be treated as nominal or ordinal
  • cluster, order-restricted cluster, DFactor, and various types of IRT models
  • use bootstrap p-value
Source:
Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oaks: Sage Publications.

Tutorial info:
This data set is used in Latent GOLD Advanced Tutorial 1: Latent GOLD and IRT Modeling (PDF, 77KB).



20A. heinen3reg.sav   [ Download data set ]  [ Download IRT lgf file ]
  • same data as heinen3.sav, but in regression format
Source:
Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oaks: Sage Publications.

Tutorial info:
This data set is used in Latent GOLD Advanced Tutorial 1: Latent GOLD and IRT Modeling (PDF, 77KB).



21. environment.dat   [ Download data set ]
  • 6 trichotomous items measuring attitudes towards environmental issues
  • there are two underlying dimensions: willingness (items 1-3) and awareness (items 4-6)
Source:
Croon, M.A. (2002). Ordering the classes. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis, 137-162. Cambridge University Press.



22. internet99.sav   [ Download data set ]
  • data on internet use (Source: Mediamark Research Inc. 1999)
  • relationship between internet usage and several demographic covariates
Source:
Magidson, J., and Vermunt, J.K. (2003b). A nontechnical introduction to latent class models. DMA Research Council Journal.



23. uselection2000.sav   [ Download data set ]
  • National Election Studies election survey data set 2000.T
  • relationship between vote and ratings of Bush and Gore
  • ICPSR Study Number: 3131
Sources:
Burns, N., D.R. Kinder, S.J. Rosenstone, V. Sapiro, and the National Election Studies. NATIONAL ELECTION STUDIES, 2000: PRE-/POST- ELECTION STUDY [dataset id:2000.T]. Ann Arbor, MI: University of Michigan, Center for Political Studies [producer and distributor], 2001. http://www.umich.edu/~nes/



84. crackers0.sav   [ Download data set ]  [ Download .lgf file ]
  • based on a consumer preference study conducted by Kellogg
  • consumers rated their liking of 15 crackers on a nine-point liking scale
Source:
Popper, R., Kroll, J., and Magidson, J. (2004). Applications of latent class models to food product development: A case study. Sawtooth Software Proceedings.

Tutorial info:
This data set is used in Tutorial 6A: Comparing Segments obtained from LC Cluster and DFactor Models in a Consumer Preference Study (PDF, 281KB).



B. Mixtures of Univariate Distributions (Cluster or Regression)

24. galaxy.dat   [ Download data set ]  [ Download .lgf file ]
  • velocities of 82 galaxies diverging away from our own galaxy
  • mixture of univariate normals
  • set Bayes constants off and increase the number of start sets to reproduce results
Source:
McLachlan, G. and Peel, D. (2000). Finite mixture models. New York: Wiley & Sons, Inc.
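A bare-bones EM sketch for a mixture of univariate normals, with class-specific means and variances and no prior on the variances (the analogue of switching Bayes constants off). It also shows why several start sets help: EM can stop at different local maxima, so the run with the highest log-likelihood is kept. Function names and settings are assumptions; this is not the Latent GOLD estimation algorithm.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_normal_mixture(data, k, n_iter=200, seed=0):
    """Fit a k-class univariate normal mixture by EM from one random start."""
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    means = rng.sample(list(data), k)  # start values: k random observations
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E step: posterior class-membership probabilities per observation
        post = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            post.append([d / total for d in dens])
        # M step: update class sizes, means and variances
        for c in range(k):
            r = [row[c] for row in post]
            n_c = sum(r)
            weights[c] = n_c / n
            means[c] = sum(ri * x for ri, x in zip(r, data)) / n_c
            var_c = sum(ri * (x - means[c]) ** 2 for ri, x in zip(r, data)) / n_c
            variances[c] = max(var_c, 1e-8)  # crude guard against a collapsing class
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )
    return weights, means, variances, loglik


def best_of_starts(data, k, n_starts=10):
    """Run EM from several random starts and keep the highest log-likelihood."""
    fits = [em_normal_mixture(data, k, seed=s) for s in range(n_starts)]
    return max(fits, key=lambda fit: fit[3])
```

Without a prior, a class can shrink onto a single observation and drive the likelihood toward infinity; the variance floor here is only a crude stand-in for the regularization that Bayes constants provide.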



25. enzyme.dat   [ Download data set ]
  • enzymatic activity in the blood among a group of 245 individuals
  • mixture of univariate normals
Source:
McLachlan, G. and Peel, D. (2000). Finite mixture models. New York: Wiley & Sons, Inc.



26. acidity.dat   [ Download data set ]
  • acidity index measured in a sample of 155 lakes in north-central Wisconsin
  • mixture of univariate normals
Source:
McLachlan, G. and Peel, D. (2000). Finite mixture models. New York: Wiley & Sons, Inc.



27. candy.dat   [ Download data set ]  [ Download .lgf file ]
  • single count variable: number of packages of hard candy purchased in a week
  • example of simple mixture model
  • can be specified as a regression or a cluster model, with the number of packages treated as a count
Sources:
Dillon, W.R., and Kumar, A. (1994). Latent structure and other mixture models in marketing: An integrative survey and overview. Chapter 9 in R.P. Bagozzi (ed.), Advanced Methods of Marketing Research, 352-388. Cambridge: Blackwell Publishers.

Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. D. Kaplan (ed.), Handbook of Quantitative Methodology for the Social Sciences, Sage Publications.



28. candy_trunc.txt   [ Download data set ]  [ Download .lgf file ]
  • single truncated count variable: number of packages of hard candy purchased in a week among consumers
  • example of simple mixture model for truncated counts
  • can be specified as a regression or a cluster model
  • used in Dillon and Kumar (1994)
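A small sketch of the zero-truncated Poisson distribution, assuming the count is truncated at zero so that only consumers who bought at least one package are observed. The listing does not state the truncation point, so treat that as an assumption. In the mixture version each latent class has its own rate; function names and values are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zero_truncated_poisson_pmf(k, lam):
    """P(Y = k | Y > 0): renormalize the Poisson over the counts 1, 2, ..."""
    if k < 1:
        return 0.0
    return poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def truncated_mixture_pmf(k, class_sizes, rates):
    """Mixture of zero-truncated Poissons with class-specific rates."""
    return sum(w * zero_truncated_poisson_pmf(k, lam) for w, lam in zip(class_sizes, rates))


# Hypothetical 2-class example: light buyers (rate 0.8) and heavy buyers (rate 4.5)
probabilities = [truncated_mixture_pmf(k, [0.7, 0.3], [0.8, 4.5]) for k in range(1, 6)]
```

Dividing by 1 - exp(-lam) is what distinguishes the truncated model from an ordinary Poisson fitted to the observed counts.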
Sources:
Dillon, W.R., and Kumar, A. (1994). Latent structure and other mixture models in marketing: An integrative survey and overview. Chapter 9 in R.P. Bagozzi (ed.), Advanced Methods of Marketing Research, 352-388. Cambridge: Blackwell Publishers.



29. nov2002.sav   [ Download data set ]  [ Download .lgf file ]
  • results of a statistics exam in November 2002
  • a 2-class binomial (exposure equal to 20) or normal mixture perfectly separates the students who passed the exam from those who did not
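A sketch of how a two-class binomial mixture with exposure 20 assigns students to classes: each score gets posterior class-membership probabilities, and modal assignment picks the larger one. The class sizes and success probabilities are made-up values for illustration, not estimates from nov2002.sav.

```python
from math import comb


def binomial_pmf(y, n, p):
    return comb(n, y) * p ** y * (1.0 - p) ** (n - y)


def posterior_classes(score, class_sizes, success_probs, exposure=20):
    """Posterior P(class | score) under a mixture of binomials."""
    joint = [w * binomial_pmf(score, exposure, p) for w, p in zip(class_sizes, success_probs)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical parameters: class 1 = "fail" (low p), class 2 = "pass" (high p)
sizes, probs = [0.4, 0.6], [0.35, 0.75]
for score in (3, 10, 18):
    post = posterior_classes(score, sizes, probs)
    modal = 1 + post.index(max(post))
    print(score, [round(q, 3) for q in post], "-> class", modal)
```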


30. sids.dat   [ Download data set ]  [ Download .lgf file ]
  • data on 100 counties in North Carolina concerning sudden infant death syndrome (SIDS): number of deaths and population at risk
  • mixture of Poisson rates
  • example used by Böhning (1999) to illustrate disease mapping
Source:
Böhning, D. (1999) Computer-assisted analysis of mixtures and applications: meta-analysis, disease mapping, and others. New York: Chapman & Hall/CRC.



C. Cluster Models with Continuous Indicators and Indicators of mixed scale types

31. iris.dat   [ Download data set ]
  • 4 continuous indicators: measures taken on 150 irises
  • LC cluster model or mixture model clustering
  • true species is known and can be compared with the cluster solution (use true as an inactive covariate)
  • illustrates different specifications of within cluster variance-covariance matrix
  • classical data set from Fisher
  • used in user's manual (additional example I) and by many others
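The variance-covariance specifications differ mainly in how many parameters they use. A counting sketch, assuming K classes, p continuous indicators, class-specific means, and a within-class covariance matrix that is either full or diagonal (local independence) and either class-dependent or class-independent. Intermediate specifications, such as a few selected covariances, are not covered; the function name is hypothetical.

```python
def n_parameters(n_classes, n_indicators, covariance="full", class_dependent=True):
    """Free parameters of a normal mixture (LC cluster) model."""
    k, p = n_classes, n_indicators
    n_sizes = k - 1  # class sizes sum to one
    n_means = k * p  # one mean per class and indicator
    if covariance == "full":
        per_matrix = p * (p + 1) // 2  # variances plus covariances
    elif covariance == "diagonal":
        per_matrix = p  # variances only (local independence)
    else:
        raise ValueError(f"unknown covariance structure: {covariance}")
    n_cov = per_matrix * (k if class_dependent else 1)
    return n_sizes + n_means + n_cov


# Example: 3 classes and the 4 iris measurements
for cov in ("diagonal", "full"):
    for dep in (False, True):
        label = "class-dependent" if dep else "class-independent"
        print(cov, label, n_parameters(3, 4, cov, dep))
```

For K = 3 and p = 4 the counts run from 18 (diagonal, class-independent) to 44 (full, class-dependent).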


32. kmeans.sav   [ Download data set ]
  • simulated data set to illustrate LC clustering with continuous variables and compare it with K-means clustering
  • different specifications of the error variances
Sources:
Magidson, J. and Vermunt, J.K. (2002a). Latent class modeling as a probabilistic extension of K-means clustering. Quirk's Marketing Research Review, March 2002, 20 & 77-80.

Magidson, J. and Vermunt, J.K. (2002b). Latent class models for clustering: A comparison with K-means. Canadian Journal of Marketing Research, 20, 36-43.



33. diabetes.dat   [ Download data set ]  [ Download .lgf file ]
  • 3 continuous indicators
  • example of LC clustering
  • clinical classification can be compared with LC cluster classification
Sources:
Fraley, C., and Raftery, A.E. (1998). MCLUST: Software for model-based cluster and discriminant analysis. Department of Statistics, University of Washington: Technical Report No. 342.

Vermunt, J.K., and Magidson, J. (2002). Latent class cluster analysis. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis, 89-106. Cambridge University Press.

Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. D. Kaplan (ed.), Handbook of Quantitative Methodology for the Social Sciences, Sage Publications.

Comments and tutorial info:
Six types of LC cluster models are reported in Table 1 of the Latent Class Cluster Analysis article. These models differ with respect to a) the specification of class dependent vs. class independent error variances and b) the 'direct effects' included in the LC cluster model estimated by Latent GOLD®. The 3-class type-5 model is best according to the BIC statistic. Various parameter estimates and standard errors from this 'final' model are obtained from the Profile and Parameters Output.
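The BIC comparison can be reproduced by hand from each model's log-likelihood (LL), number of parameters (Npar), and sample size N, using BIC = -2LL + ln(N) × Npar, where lower values are preferred. The model labels and numbers below are placeholders, not the values in Table 1.

```python
import math


def bic(log_likelihood, n_parameters, n_cases):
    """BIC based on the log-likelihood; lower values indicate a better model."""
    return -2.0 * log_likelihood + math.log(n_cases) * n_parameters


def best_by_bic(models, n_cases):
    """models maps a label to (log-likelihood, number of parameters)."""
    return min(models, key=lambda label: bic(*models[label], n_cases))


# Placeholder values for three hypothetical models fitted to N = 150 cases
models = {
    "model A": (-2500.0, 20),
    "model B": (-2440.0, 26),
    "model C": (-2435.0, 35),
}
for label, (ll, npar) in models.items():
    print(label, round(bic(ll, npar, 150), 2))
print("lowest BIC:", best_by_bic(models, 150))
```

Extra parameters pay off only if the gain in log-likelihood outweighs the ln(N) penalty per parameter, which is why model C loses here despite its higher log-likelihood.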

Download diabetes.lgf containing the specifications for each of the 6 types of 3-class cluster models described in Table 1.

NOTE: To estimate the various LC cluster models, use the Latent GOLD® menu command 'File Open' to retrieve the 6 model specifications. Then double-click the name of any retrieved model and click 'Estimate'.



34. cancer.dat   [ Download data set ]  [ Download .lgf file ]
  • clustering based on pre-trial "covariates" collected for a prostate cancer clinical trial
  • eight are treated as continuous and four as categorical indicators
  • example of LC clustering with mixed mode data
Sources:
Hunt, L., and Jorgensen, M. (1999). "Mixture model clustering using the MULTIMIX program." Australian and New Zealand Journal of Statistics, 41, 153-172.

McLachlan, G. and Peel, D. (2000). Finite mixture models. New York: Wiley & Sons, Inc.

Vermunt, J.K., and Magidson, J. (2002). Latent class cluster analysis. J.A. Hagenaars and A.L. McCutcheon (eds.), Applied Latent Class Analysis, 89-106. Cambridge University Press.



35. srcddata.txt   [ Download data set ]  [ Download .lgf file ]
  • continuous outcome variable read# (child's reading recognition) measured at 4 occasions (with many missing values) on 405 children
  • covariates: child's gender (male=1), mother's age in years at Time 1, child's age in years at Time 1, child's cognitive stimulation at home, and child's emotional support at home
  • longitudinal data for specifying a growth model: cluster model with one or two CFactors
  • data used in Vermunt and Magidson (2005c)
  • file also contains an ordinal outcome variable anti# (child's antisocial behavior) measured at four time points, which was not used in Vermunt and Magidson (2005c)
Sources:
Vermunt, J.K., and Magidson, J. (2005c). Structural equation models: Mixture models. B. Everitt and D. Howell, (Eds.), Encyclopedia of Statistics in Behavioral Science, 1922–1927. Wiley: Chichester, UK.



36. abortion_cluster.sav   [ Download data set ]
  • same data as abortion.sav example (see below), but here in standard rectangular data format instead of repeated measures format
  • indicators are binomial counts: the number of "agree" responses out of 7 abortion situations, measured at 4 occasions
Sources:
McGrath & Waterton (1986)



D. LC Regression Models

Standard Mixture Regression Models

37. follman.dat   [ Download data set ]
  • effect of poison on survival
  • dichotomous dependent variable survival can be treated as nominal, ordinal, or binomial count, since all are equivalent for dichotomous variables
  • using logdose as a numeric class-independent predictor yields a non-parametric random effects logistic regression model
Sources:
Follman, D.A. and Lambert, D. (1989). Generalizing logistic regression by nonparametric mixing. Journal of the American Statistical Association, 84, 295-300.

Formann, A.K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476-486.

Agresti, A. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.
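To make the model in the third bullet concrete: with logdose as a class-independent predictor, only the intercept varies across latent classes, so the classes act as a discrete, nonparametric distribution for a random intercept. A sketch of the resulting survival probability, with hypothetical parameter values and function names:

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def survival_probability(logdose, class_sizes, intercepts, slope):
    """P(survival = 1 | logdose) averaged over latent classes.

    Each class c has its own intercept; the logdose slope is the same in
    every class (class-independent), so the class sizes and intercepts form
    a nonparametric (discrete) random-intercept distribution.
    """
    return sum(w * logistic(a + slope * logdose) for w, a in zip(class_sizes, intercepts))


# Hypothetical 3-class solution; a negative slope means survival drops with dose
sizes = [0.3, 0.5, 0.2]
intercepts = [4.0, 1.5, -1.0]
slope = -2.0
for dose in (0.0, 1.0, 2.0):
    print(dose, round(survival_probability(dose, sizes, intercepts, slope), 3))
```

With a single class the expression reduces to an ordinary logistic regression.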



38. fabric.dat   [ Download data set ]
  • number of faults in a bolt of fabric of a certain length
  • random-effects Poisson regression with log length as predictor
Sources:
Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6, 251-262

McLachlan, G. and Peel, D. (2000). Finite mixture models. New York: Wiley & Sons, Inc.



39. beta.dat    [ Download data set ]
  • meta analysis of 22 clinical trials of beta-blockers for reducing mortality after myocardial infarction
  • dependent is a binomial count
  • observations within a clinic are dependent
  • LC regression model with random intercept (3 classes) and fixed treatment effect
Sources:
Aitkin, M. (1996). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing, 6, 251-262

McLachlan, G. and Peel, D. (2000). Finite mixture models. New York: Wiley & Sons, Inc.



40. dmft.sav    [ Download data set ]  [ Download .lgf file ]
  • dental health trial on prevention of tooth decay among 797 Brazilian children
  • dependent variable: # of decayed, missing or filled teeth (DMFT)
  • explanatory variables are: Treatment (1 = no treatment; 2 = oral health education; 3 = school diet enriched with rice bran; 4 = mouth rinse with 0.2% NaF solution; 5 = oral hygiene; 6 = all four treatments), Ethnic group (1 = brown; 2 = white; 3 = black) and Gender (1 = male; 2 = female)
  • Poisson or binomial count regression with overdispersion, using a LC regression, a zero-inflated regression, or random-intercept regression model
  • data analyzed by Skrondal and Rabe-Hesketh (2004, section 11.2)
  • see also SMABS 2004 workshop transparencies
Sources:
Skrondal and Rabe-Hesketh (2004)
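A zero-inflated Poisson model can be read as a two-class mixture: one class contains only zero counts, the other follows a Poisson regression. A minimal sketch of the resulting probability for a DMFT count, assuming a log link for the Poisson mean; names and values are hypothetical, and in a full regression the mean and the zero-class probability would depend on treatment, ethnic group, and gender.

```python
import math


def zip_pmf(count, zero_prob, log_mean):
    """Zero-inflated Poisson probability of `count`.

    zero_prob: probability of the 'structural zero' class
    log_mean:  linear predictor of the Poisson class (log link)
    """
    lam = math.exp(log_mean)
    poisson = math.exp(-lam) * lam ** count / math.factorial(count)
    return (zero_prob if count == 0 else 0.0) + (1.0 - zero_prob) * poisson


# Hypothetical values: 25% structural zeros, Poisson mean of 2 otherwise
for dmft in range(4):
    print(dmft, round(zip_pmf(dmft, 0.25, math.log(2.0)), 4))
```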



41. cace.sav    [ Download data set ]  [ Download .lgf file ]
  • illustrates the “complier average causal effect” (CACE) model using the Known-Class option
  • compliance is known for the treatment group but unknown (latent) for the control group
  • LC regression in which treatment has an effect in the compliance class, and in which compliance (yes/no) is predicted using covariates
  • data analyzed by Skrondal and Rabe-Hesketh (2004) and can be obtained from the ICPSR website (under JOB #2739)
Sources:
Skrondal and Rabe-Hesketh (2004)



42. long1.sav    [ Download data set ]  [ Download .lgf file ]
  • continuous dependent: firstjobcens0 or firstjobtrunc0 is the prestige of the first academic job (minus 1 to get the censoring/truncation at 0 instead of 1)
  • various predictors
  • censored normal regression, censored-inflated normal regression, or truncated normal regression
  • data used by Long (1997, chapter 8)
Sources:
Long (1997)
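The censored and truncated normal regressions differ only in how a case near the limit enters the likelihood. A sketch of the per-case log-likelihood, assuming the limit at 0 described in the first bullet and a linear predictor mu for the mean; function names are hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def censored_loglik(y, mu, sigma, limit=0.0):
    """Censored from below: a value at the limit means 'at or below the limit'."""
    if y <= limit:
        return math.log(normal_cdf((limit - mu) / sigma))
    return normal_logpdf(y, mu, sigma)


def truncated_loglik(y, mu, sigma, limit=0.0):
    """Truncated from below: only values strictly above the limit are observed."""
    if y <= limit:
        raise ValueError("truncated data cannot contain values at or below the limit")
    return normal_logpdf(y, mu, sigma) - math.log(1.0 - normal_cdf((limit - mu) / sigma))
```

A censored case keeps its row but contributes a probability mass, whereas truncation drops such cases and renormalizes the density of the remaining ones.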



43. long2.sav    [ Download data set ]  [ Download .lgf file ]
  • count dependent: number of articles in last 3 years of PhD
  • various predictors (two copies of each in file)
  • Poisson regression, zero-inflated Poisson regression, random-intercept Poisson regression, and zero-inflated random-intercept Poisson regression
  • data used by Long (1997, chapter 9)
Sources:
Long (1997)



44. runshoes.sav    [ Download data set ]  [ Download .lgf file ]
  • count dependent: number of running shoes for a sample of runners
  • predictors: runs per week, miles run per week, distance runner
  • truncated Poisson count regression model
  • used in textbook “Analyzing Categorical Data” by Jeffrey S. Simonoff
Sources:
Simonoff



Two-Level and Multiple Response Data Sets

45. bang.txt    [ Download data set ]  [ Download .lgf file ]
  • contraceptive use (dichotomous outcome)
  • data from 1989 Bangladesh Fertility Survey (Huq and Cleland 1990)
  • women nested within districts
  • predictors: number of children, age in years (centered), and urban (0=rural)
  • data obtained from multilevel modeling website
Sources:
Huq and Cleland (1990)



46. snijdersbosker.sav    [ Download data set ]  [ Download .lgf file ]
  • performance of pupils on a language test (continuous outcome)
  • data taken from the Snijders and Bosker (1999) book on multilevel analysis
  • children nested within schools
  • pupil-level predictors: IQ, SES (both overall centered)
  • school-level predictors: school_IQ, school_SES, groupsize (centered), combination classes (yes/no)
  • see also SMABS 2004 workshop transparencies
Sources:
Snijders and Bosker (1999)



47. conjoint.sav   [ Download data set ]  [ Download .lgf file ]
  • rating-based conjoint example
  • simulated data
  • full factorial design (2*2*2) with 8 replications
  • LC regression with ordinal dependent, 3 predictors (product attributes) and 2 covariates (individual characteristics)
Source:
Magidson, J., and Vermunt J.K. (2003b) A nontechnical introduction to latent class models. DMA Research Council Journal.

Tutorial info:
This data set is used in Tutorial 3: LC Regression with Repeated Measures (PDF, 1.67MB).
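A sketch of how a 2×2×2 full factorial design expands into the repeated-measures layout used for LC regression, assuming each respondent rates all 8 profiles and contributes one record per rating. The attribute names, level codes, and respondent IDs are hypothetical, not the variables in conjoint.sav.

```python
from itertools import product

# Hypothetical attributes, each with two levels coded 1 and 2
attributes = {"attribute_1": (1, 2), "attribute_2": (1, 2), "attribute_3": (1, 2)}

# Full factorial: every combination of levels, 2 * 2 * 2 = 8 profiles
profiles = [dict(zip(attributes, levels)) for levels in product(*attributes.values())]


def long_format(respondent_ids, profiles):
    """One record per respondent-profile pair; the rating column is left empty."""
    return [
        {"id": rid, "profile": j + 1, **profile, "rating": None}
        for rid in respondent_ids
        for j, profile in enumerate(profiles)
    ]


records = long_format([101, 102, 103], profiles)
print(len(profiles), len(records))
```

The attributes then enter as predictors and the individual characteristics as covariates, with the respondent ID identifying the case.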



48. crackers.sav    [ Download data set ]
  • data from a consumer taste study sponsored by the Kellogg Company, where consumers rated their liking of 15 crackers on a nine-point liking scale
  • an independent trained sensory panel evaluated the same crackers in terms of their sensory attributes (e.g. saltiness, crispness, thickness, etc.), yielding ratings on 12 flavor, texture, and appearance dimensions
  • LC regression analysis with a random intercept
  • mixed regression model which predicts the ratings of 15 different crackers as a function of 12 appearance, texture and flavor variables
Source:
The Kellogg Company (2004)



49. USelection2000reg.sav    [ Download data set ]
  • National Election Studies election survey data set 2000.T
  • same data as uselection2000.sav, but now in regression format
Sources:
Burns et al. (2001), ICPSR Study Number: 3131



Restricted LC Cluster Models for Multiple Responses Specified Using Multiple Records per Case

50. landisreg.sav   [ Download data set ]  [ Download .lgf file ]
  • dichotomous rating (presence/absence of carcinoma in the uterine cervix) of 118 slides by 7 pathologists
  • dependent variable “rating” can be treated as nominal, ordinal, or binomial count since all are equivalent for dichotomous variables.
  • LC regression model with rater as nominal predictor. Specifying the rater effect as class independent yields an LC Rasch model; specifying it as class dependent yields a standard LC model.
  • variable "sumscore" can be used as inactive covariate to see how the latent classification is related to the sum of the ratings.
  • the file contains dummies for the raters to change the coding scheme.
  • a copy of the predictor rater (rater_) is included to specify a two-dimensional model (LC factor model).
  • sparse table: use bootstrap p-value
Sources:
Agresti, A. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.

Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. D. Kaplan (ed.), Handbook of Quantitative Methodology for the Social Sciences, Sage Publications.

Magidson, J. and Vermunt, J.K. (2003a). Comparing Latent Class Factor Analysis with the Traditional Approach in Datamining. H. Bozdogan (ed.), Statistical data mining & knowledge discovery, Chapman & Hall/CRC.

Landis, J.R., and Koch, G.G. (1977). The measurement of observer agreement for categorical data, Biometrics, 33, 159-174.
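The difference between the LC Rasch and standard LC specifications in the third bullet can be written in terms of the logit of a positive rating. With class-independent rater effects the logit is class effect plus rater effect, so the raters keep the same relative severity in every class. With class-dependent effects, each class has its own rater effects. A sketch with hypothetical parameter values (2 classes, 7 raters) and hypothetical function names:

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_positive_rasch(cls, rater, class_effects, rater_effects):
    """LC Rasch: rater effects are class-independent."""
    return logistic(class_effects[cls] + rater_effects[rater])


def p_positive_standard(cls, rater, class_effects, rater_effects_by_class):
    """Standard LC model: rater effects may differ by class."""
    return logistic(class_effects[cls] + rater_effects_by_class[cls][rater])


# Hypothetical values: class 0 = "no carcinoma", class 1 = "carcinoma"
class_effects = [-2.0, 2.0]
rater_effects = [0.0, -0.5, 0.3, -1.0, 0.8, 0.1, -0.2]
for cls in (0, 1):
    print(cls, [round(p_positive_rasch(cls, j, class_effects, rater_effects), 2) for j in range(7)])
```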



51. heinen2reg.sav    [ Download data set ]  [ Download .lgf file ]
  • 5 dichotomous indicators of gender roles
  • same data as heinen2.sav, but in a different data structure
  • 3-class regression model: a class-independent item effect yields an LC Rasch model; a class-dependent item effect yields a standard LC model
Source:
Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oaks: Sage Publications.



52. heinenreg_mf.sav    [ Download data set ]  [ Download .lgf file ]  [ Download IRT lgf file ]
  • same as heinen2reg.sav, but now for males and females (also the same data as heinen_mf.sav, but in a different format)
  • gender can be used as a covariate, as a predictor, or in a gender-by-item interaction (item bias)
  • standard LC, restricted LC, LC Rasch, and IRT models
  • see also SMABS 2004 workshop transparencies
Source:
Heinen, T. (1996). Latent Class and Discrete Latent Trait Models: Similarities and Differences. Thousand Oaks: Sage Publications.

Tutorial info:
This data set is used in Latent GOLD Advanced Tutorial 1: Latent GOLD and IRT Modeling (PDF, 77KB).



53. colemanreg.sav   [ Download data set ]
  • same data as coleman.sav but in a different format
  • item characteristics are included as predictors to test several assumptions
  • predictors: item, member, attitude, time1, time2, member1, member2, attitude1, and attitude2
  • the best model has a 2-factor-like structure, with a member factor and an attitude factor
Sources:
Goodman, L.A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215-231.

Agresti, A. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.



54. gss94reg.sav   [ Download data set ]
  • same data as gss94.sav but in a different format


55. financialreg.sav   [ Download data set ]
  • same data as financial.sav: ownership of 4 financial products
Sources:
Paas, L. (2002). Acquisition pattern analysis with Mokken scales: Applications in the financial services market. PhD thesis, Tilburg University.



LC Growth Models for Longitudinal Data

56. abortion.sav   [ Download data set ]  [ Download .lgf file ]
  • data from the British Social Survey
  • the dependent variable, the number of situations (out of 7) in which the respondent agrees with abortion, should be treated as a binomial count
  • year is a class-dependent (random, level-1) predictor and religion a class-independent (fixed, level-2) predictor
  • the data file contains dummy variables for the time and religion categories, so that dummy coding can be used instead of the default effects coding
  • the data file also contains an incremental coding of the time categories and time squared, which can be used to explore different specifications of the time effect
  • used by Vermunt and Van Dijk (2001) to illustrate the connection between LC regression and random-coefficients, mixed, hierarchical, or multilevel models.
Sources:
Vermunt, J.K., and Van Dijk, L. (2001). A nonparametric random-coefficients approach: the latent class regression model. Multilevel Modelling Newsletter, 13, 6-13.

Magidson, J., and Vermunt, J.K. (2004). Latent class analysis. In D. Kaplan (ed.), Handbook of Quantitative Methodology for the Social Sciences. Sage Publications.
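The difference between the default effects coding and the dummy coding mentioned above can be sketched in Python. This is an illustration only, not Latent GOLD's internal code: it assumes that the last category is the one coded -1 under effects coding and serves as the default reference category under dummy coding. The function names and the category labels (t1 to t4) are hypothetical.

```python
def effects_coding(category, levels):
    """Effects-coded columns for `category`: one column per level except the last.

    The last level is coded -1 on every column, so the category parameters
    sum to zero across all levels.
    """
    if category == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if category == lev else 0 for lev in levels[:-1]]


def dummy_coding(category, levels, reference=None):
    """Dummy-coded columns: 1 for the matching non-reference level, 0 otherwise.

    The reference level defaults to the last level (an assumption) and is
    coded 0 on every column.
    """
    ref = levels[-1] if reference is None else reference
    return [1 if category == lev else 0 for lev in levels if lev != ref]


# Hypothetical time categories, used only to show the two coding schemes side by side.
time_levels = ["t1", "t2", "t3", "t4"]
for t in time_levels:
    print(t, effects_coding(t, time_levels), dummy_coding(t, time_levels))
```

Under effects coding each category parameter is a deviation from the average over categories; under dummy coding each is a contrast with the reference category.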



57. elliotreg.sav   [ Download data set ]  [ Download .lgf file ]
  • marijuana use of children (13 years of age in 1976) in 5 consecutive years (trichotomous ordinal response variable)
  • LC growth model with time as nominal/ascending/class-dependent predictor and sex as covariate (see Vermunt and Hagenaars, 2004)
  • a random intercept (CFactor) can be included
  • references to data set: Elliott et al. (1989) and Vermunt et al. (2001)
  • see also SMABS 2004 workshop transparencies
Sources:
Elliott, D.S., Huizinga, D., and Menard, S. (1989). Multiple problem youth: delinquency, substance use and mental health problems. New York: Springer-Verlag.

Vermunt, J.K., Rodrigo, M.F., and Ato-Garcia, M. (2001). Modeling joint and marginal distributions in the analysis of categorical panel data. Sociological Methods and Research, 30, 170-196.



58. rats.dat   [ Download data set ]
  • growth of rats during their first weeks
  • LC growth model for a continuous outcome variable
Source:
Gelfand, A.E., Hills, S.E., Racine-Poon, A., and Smith, A.F.M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85, 972-985.



66A. growth1.sav   [ Download data set ]  [ Download .lgf file ]
  • LC Growth Model based on 59 epileptics who were randomly assigned to either an anti-seizure medication or placebo.
Source:
Thall, P.F., and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46, 657-671.

Tutorial info:
This data set is used in Tutorial 7A: Latent Class Growth Model (PDF, 276 KB).



Models for Event History and Transition Data

59. jobchange.dat   [ Download data set ]  [ Download .lgf file ]
  • LC regression model for event history data (piecewise exponential survival model)
  • data from the 1975 Social Stratification and Mobility Survey in Japan (see Yamaguchi, 1991)
  • the event of interest is first interfirm job change
  • event should be treated as a Poisson count with an exposure variable
  • time, categorized in 3 intervals, is a class-independent nominal predictor
  • a single covariate, firm size (either nominal, or linear with an extra dummy for government)
Source:
Vermunt, J.K. (2002). A general non-parametric approach to unobserved heterogeneity in the analysis of event history data. In J. Hagenaars and A. McCutcheon (eds.), Applied Latent Class Analysis, 383-407. Cambridge: Cambridge University Press.
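Treating the event as a Poisson count with an exposure variable is the usual way to estimate a piecewise exponential survival model: each person's spell is split at the interval boundaries, and each piece records the time at risk spent in it (the exposure) and whether the event occurred in it. A minimal Python sketch of that episode splitting is shown below. The cutpoints and function names are hypothetical, since the actual boundaries of the 3 time intervals in jobchange.dat are not given here.

```python
from dataclasses import dataclass


@dataclass
class PersonPeriod:
    interval: int    # index of the time interval (the nominal time predictor)
    exposure: float  # time at risk spent in this interval
    event: int       # 1 if the event occurred in this interval, else 0


def split_episode(duration, event, cutpoints):
    """Split one spell of length `duration` at `cutpoints`.

    cutpoints: ascending interval start times, the first one being 0 (hypothetical).
    The last interval is open-ended. Returns one record per interval entered.
    """
    records = []
    for k, start in enumerate(cutpoints):
        if duration <= start:
            break  # the spell ended before this interval began
        end = cutpoints[k + 1] if k + 1 < len(cutpoints) else float("inf")
        exposure = min(duration, end) - start
        # An observed event belongs to the interval containing the end of the spell.
        happened = int(bool(event) and duration <= end)
        records.append(PersonPeriod(k, exposure, happened))
    return records


# A spell of 7 time units ending in a job change, with hypothetical cutpoints 0, 2, 5.
for rec in split_episode(7.0, 1, [0.0, 2.0, 5.0]):
    print(rec)
```

The log of the exposure acts as an offset, so the interval parameters are log hazard rates that are constant within each interval.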



60. empltran.dat   [ Download data set ]  [ Download .lgf file ]
  • discrete-time event history or survival model with multiple outcomes
  • two predictors/covariates: cohort and sex
  • used in the addendum to illustrate the use of a replication weight.
Source:
Blossfeld, H.P., and Rohwer, G. (1995). Techniques of event history modeling. Mahwah, New Jersey: Lawrence Erlbaum Associates, Publishers.



61. dropout.dat   [ Download data set ]
  • school drop-out rate of brothers at two school levels
  • modelled as a discrete-time event history model with unobserved heterogeneity to capture the dependence between respondent and brother (family effect)
  • brother and time (school level) are predictors; father's education can serve as predictor or as covariate
Sources:
Mare, R.D. (1994). Discrete-time bivariate hazards with unobserved heterogeneity: a partially observed contingency table approach. P.V. Marsden (ed.), Sociological Methodology 1994, 341-385. Oxford: Basil Blackwell.

Vermunt, J.K. (1997). Log-linear models for event histories. Advanced Quantitative Techniques in the Social Sciences Series, vol. 8, 348 pages. Thousand Oaks: Sage Publications.



62. land.sav   [ Download data set ]  [ Download .lgf file ]
  • duration time to first serious delinquency
  • 411 males from a working-class area of London, followed from age 10 through age 31
  • the dependent variable "first" can be treated as a Poisson count or as a binomial count. If it is treated as a Poisson count, the exposure for the time point at which the event occurs can be set to either one or one half.
  • variable "tot" is a risk index that can be used either as predictor or as covariate
  • the duration effect (age effect) can be modelled by a quadratic function
See the FAQ for more information on this data set

Source:
Land, K.C., Nagin, D.S., and McCall (2001). Discrete-time hazard regression models with hidden heterogeneity: the semi-parametric mixed Poisson approach. Sociological Methods and Research, 29, 342-373.
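A minimal Python sketch of how a single case could be laid out for the Poisson-count specification: one record per time point at risk, with the exposure at the event time point optionally set to one half. It assumes integer time points (for example, ages 10 to 31) and marks a censored case with None; the function name and record layout are hypothetical and need not match how land.sav is actually stored.

```python
def person_periods(event_time, first_time, last_time, half_at_event=True):
    """Expand one case into (time, first, exposure) records for a Poisson-count model.

    event_time: time point at which the first serious delinquency occurs,
                or None if the case is censored at last_time.
    Records run from first_time through the event time (inclusive) or last_time.
    """
    end = last_time if event_time is None else event_time
    rows = []
    for t in range(first_time, end + 1):
        occurred = int(event_time is not None and t == event_time)
        # Full exposure for every time point survived; optionally one half
        # at the time point in which the event occurs.
        exposure = 0.5 if (occurred and half_at_event) else 1.0
        rows.append((t, occurred, exposure))
    return rows


# Hypothetical case: event at age 13, observed from age 10.
for row in person_periods(13, 10, 31):
    print(row)
```

Setting the event-period exposure to one half corresponds to assuming that, on average, events occur halfway through the period; with half_at_event=False every record has exposure one. A quadratic duration (age) effect can then be specified with t and t squared as predictors.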



63. poulsen.sav   [ Download data set ]  [ Download .lgf file ]
  • transitions in brand preference (brand A or other brand) across 5 occasions
  • example of a mixture transition (mixed Markov) model
  • predictors are time0 (whether record corresponds to the initial state), ylag_a (previous time point equals brand A), and ylag_oth (previous time point equals other brand).
Source:
Poulsen, C.S. (1982). Latent structure analysis with choice modeling applications. PhD dissertation. The Arhus School of Business Administration, Institute of Applied Mathematical Statistics and Computer Science.

Comments:
A mover-stayer model can be specified in Latent GOLD using the regression module. A stayer class is the same as a "zero-inflated" class when the dependent variable is nominal or ordinal. In this example, there are 2 stayer classes: stayers with other brand, and stayers with brand A. See the discussion on pages 35-36 of the Technical Guide. To make the state at time point t dependent on the state at t-1 for the mover class (class 1), you have to add the state at the previous time point as a predictor in the model.

The profile output shows the class sizes. The 'Estimated values' output gives the initial and transition probabilities for each class. Note that: 1) time0 is used as a predictor to ensure that the initial-state probabilities are freely estimated, and 2) the missing values technical option 'Use ALL' is used because the previous state is missing for the first time point (and these records should be retained).

The moverstayer.lgf file also includes an example of a 2-class mixed Markov model. It is specified in the same way as the mover-stayer model, but with 2 "free" classes instead of 1 free and 2 "zero-inflated" classes. The two models can be combined: one can have a 2-class mixed Markov model with 2 additional stayer classes (this is a 4-class mixed Markov model with two restricted classes). To run the models, open the attached .lgf file in Latent GOLD, highlight the name of the data file, and select 'Estimate All' from the Model menu.

Note that the L-sq is "incorrect" in the sense that Latent GOLD does not recognize that the dependent state at t is the same variable as the predictor state at t+1.
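The three predictors listed for poulsen.sav can be derived from each case's sequence of states. A minimal Python sketch, assuming states coded "A" or "other" and one record per occasion; the lag dummies are left missing (None) on the first record, which is why records with a missing previous state must be retained. The function name and state labels are hypothetical, and the actual file may code these values differently.

```python
def markov_records(case_id, states):
    """Return one record per occasion with the predictors time0, ylag_a and ylag_oth.

    states: sequence of observed states, each "A" or "other" (hypothetical coding).
    """
    rows = []
    for t, y in enumerate(states):
        if t == 0:
            # Initial state: there is no previous state, so the lag dummies are missing.
            rows.append({"id": case_id, "y": y, "time0": 1,
                         "ylag_a": None, "ylag_oth": None})
        else:
            prev = states[t - 1]
            rows.append({"id": case_id, "y": y, "time0": 0,
                         "ylag_a": int(prev == "A"),
                         "ylag_oth": int(prev == "other")})
    return rows


# Hypothetical case observed at 5 occasions.
for rec in markov_records(1, ["A", "A", "other", "other", "A"]):
    print(rec)
```

With time0 separating the first record from the rest, the initial-state probabilities and the transition probabilities (conditional on ylag_a or ylag_oth) are estimated as separate sets of parameters within each class.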



64. vinken.sav   [ Download data set ]  [ Download vinken.lgf   Download vinken_binomial.lgf ]
  • timing of four events related to first experience with relationships
  • Cox model for correlated events
  • see also SMABS 2004 workshop transparencies
Source:
Vermunt, J.K. (2002a)



Mixture Regression Models for Repeated Measures/Clinical Trials

65. koch.sav   [ Download data set ]
  • repeated measures clinical trial with outcome normal (1) or not normal (0)
  • time is a class-dependent predictor, severity a class-independent predictor, and treatment a covariate; this yields an LC growth model in which treatment affects the type of growth curve that a patient follows.
  • an alternative is to use time, severity, treatment, and the treatment-time interaction as class-independent predictors, yielding a standard non-parametric random-effects model.
Sources:
Agresti, A. (2002). Categorical Data Analysis. Second Edition. New York: Wiley.

Koch, G.G., Landis, J.R., Freeman, J.L., Freeman, D.H., and Lehnen, R.G. (1977). A general methodology for the analysis of experiments with repeated measurement of categorical data. Biometrics, 33, 133-158.



66. epilep.sav   [ Download data set ]
  • randomized controlled trial comparing a new drug with placebo
  • outcome variable y is the number of epileptic seizures during the two weeks before each of 4 clinic visits (Poisson count)
  • 4 replications per case (4 visits)
  • class-independent numeric predictors: treatment, log baseline, log age, visit number, dummy for fourth visit, and the treatment-by-log-baseline interaction
Sources:
Thall, P.F., and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics, 46, 657-671.

Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2002). Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal, 2, 1-21.



67. aspartame.dat   [ Download data set ]
  • multiple-period (5-week) crossover trial to test a side effect of aspartame
  • the dependent variable is a binomial count; that is, the number of days with a headache out of a total of 7 days (a week).
  • the total number of days exposed in a period may be smaller than 7, and the total number of periods may be less than 5, because of dropout.
  • predictors are week and aspartame (1= aspartame; 0=placebo)
  • covariate: belief as to whether drug can cause a headache
Sources:
McKnight, B. and Van Den Eeden (1993). A conditional analysis for two-treatment multiple period crossover designs with binomial or Poisson outcomes and subjects who drop out. Statistics in Medicine, 12, 825-834.

Hedeker, D. (1998). MIXPREG: a computer program for mixed-effect Poisson regression. University of Illinois at Chicago.



68. genomics.sav   [ Download data set ]
  • multi-visit follow-up of 7 rheumatoid arthritis patients diagnosed as unstable at the first visit and assigned to a new drug therapy
  • blood sample taken during each visit to obtain genetic expressions
  • drug effects assessed using IndexZ to see if levels approach those of normals
  • source of IndexZ (Source Precision Medicine, Inc.) - patents pending

69. schizophrenia.sav    [ Download data set ]  [ Download .lgf file ]
  • effect of drug on severity of schizophrenia
  • dichotomous or ordinal dependent variable “severity” measured at 7 occasions (with many missing values)
  • can be used for random-effects logistic regression, LC logistic regression, and LC logistic regression with a random intercept
Sources:
Hedeker and Gibbons (1996)

Vermunt (2006). Growth models for categorical response variables: standard, latent-class, and hybrid approaches. K. van Montfort, H. Oud, and A. Satorra (eds.) Longitudinal Models in the Behavioral and Related Sciences, Erlbaum.

82. allocation.sav    [ Download data set ]  [ Download .lgf file ]
  • example of allocation model
  • survey question that asks respondents to allocate 100 points over 8 alternatives to indicate the relative importance of each alternative


83. crackers4.sav    [ Download data set ]  [ Download .lgf file ]
  • mixed regression model which predicts the ratings of 15 different crackers as a function of 12 appearance, texture and flavor variables
Source:
The Kellogg Company (2004)



E. Multilevel Latent Class Models and Complex Surveys

Two-level Cluster and DFactor Models

70. mierlo_socmeth.txt    [ Download data set ]  [ Download .lgf file ]
  • 5 dichotomous items measuring task variety
  • missing values on items (some of which were caused by a mistake made in the recoding of the items)
  • employees nested within teams
  • the simplest variant of a multilevel LC model, with either GClasses or GCFactors affecting the clusters
  • data set taken from the dissertation of Van Mierlo (2003), and used by Vermunt (2003)
Sources:
Van Mierlo (2003), used by Vermunt (2003)



71. mierlo_mbr.sav    [ Download data set ]  [ Download .lgf file ]
  • same data set as mierlo_socmeth.txt, but without the recoding mistake (results are therefore slightly different)
  • in addition, 4 individual-level covariates: year of birth (4 levels), number of years in the current job (3 levels), number of working hours per week (3 levels), and gender. The 57 cases with missing values on items and/or covariates can be retained in the analysis (using the 'include missing all' option).
  • random-intercept model for the clusters using a GCFactor
Sources:
Vermunt (2005)



72. cito.dat    [ Download data set ]  [ Download .lgf file ]
  • data on the mathematical skills of pupils: 18 mathematics test items (correct/incorrect) administered to 2157 pupils
  • pupils are nested within 97 schools
  • three individual-level covariates (SES, IQ and Gender) and one school-level covariate (CITO)
Sources:
Fox and Glas (2001) and Vermunt (2003)



73. meulders.sav    [ Download data set ]
  • three-mode three-way data from a psychological "experiment" in which 101 first-year psychology students indicated whether, when angry at someone, they would display each of 8 behaviors (fly off the handle, quarrel, leave, avoid, pour out one's heart, tell one's story, make up, clear up the matter) in each of 6 situations (like the other, dislike the other, unfamiliar with the other, other has higher status, other has lower status, other has equal status)
  • situations are nested within persons
  • situations are not exchangeable; therefore, situation is used as a covariate affecting class membership
  • GClasses (of persons) affecting intercept of and situation effect on clusters (of persons in situations)
Sources:
Meulders et al. (2002, Journal of Classification) paper on LC models for three-mode data



Three-Level Regression Models

74. immunization.sav    [ Download data set ]  [ Download .lgf file ]
  • complete immunization of children in Guatemala (binary response variable)
  • individuals (children) nested within families, and families nested within communities
  • three-level binary logistic regression using parametric or nonparametric random effects
  • 4 individual, 5 family and 2 community level predictors (some are dummies)
Sources:
Rodriguez and Goldman (2001)



75. tvsfp.sav    [ Download data set ]  [ Download .lgf file ]
  • ordinal outcome variable: the tobacco and health knowledge scale (THKS) score, defined as the number of correct answers to seven items on tobacco and health knowledge (collapsed into four ordinal categories).
  • schools were randomized into one of four conditions combining the factors TV (a television intervention, 1=present, 0=absent) and CC (a social-resistance classroom curriculum, 1=present, 0=absent)
  • classes are nested within schools and pupils are nested within classes
  • data are from the Television School and Family Smoking Prevention and Cessation Project (TVSFP) and used by Hedeker and Gibbons (1996)
Sources:
Hedeker and Gibbons (1996)



76. socatt.txt    [ Download data set ]  [ Download .lgf file ]
  • same data as abortion.sav, but now with the district number and some extra covariates
  • repeated measures nested within cases and cases nested within districts
  • three-level binomial count regression using either parametric or nonparametric random effects
Sources:
McGrath and Waterton (1986) and the multilevel modeling webpage



77. zugugl.sav    [ Download data set ]
  • three well-being items (zufrieden, gut, glücklich) measured at three occasions
  • responses are available in both 3-point and 5-point scale formats
  • 3-level regression or 2-level IRT model
  • data used by Steyer and Partchev (2001) to illustrate their state-trait model for ordinal variables, which is a 2-level IRT model
Sources:
Steyer and Partchev (2001)



78. tob3vote.sav    [ Download data set ]  [ Download .lgf file ]
  • response variable: pro-tobacco voting by members of Congress, 1997-2000
  • predictors/covariates: party, amount of money the member received from the tobacco industry (money), and the number of harvest acres in the member’s state in 1999 (acres)
  • votes (bills) nested within members and members nested within states
  • 3-level random-effects regression model and 2-level LC model
Sources:
Luke (2004) in his Sage textbook “Multilevel Modeling”



Complex Survey Options

79. patterson.sav    [ Download data set ]  [ Download .lgf file ]
  • standard LC model for 4 dichotomous response variables (vegetable consumption at 4 occasions)
  • stratum, PSU, and weight variable
  • the totvgt1-totvgt4 variables were used by Patterson, Dayton, and Graubard (2002) in their article on LC analysis of complex sampling survey data
  • the variables v1-v6 were used by Vermunt (2002b), who made use of the fact that the 4 occasions were actually 6 different time points with missing values on at least two time points
  • the data set also contains some covariates, as well as information on fruit consumption (not used by the above authors)
Sources:
Patterson, Dayton, and Graubard (2002)



80. pattersonreg.sav    [ Download data set ]
  • regression format data set based on patterson.sav
  • contains the variables totvgt1-totvgt4 used by Patterson, Dayton, and Graubard (2002)
  • LC growth model
Sources:
Patterson, Dayton, and Graubard (2002)



81. pattersonreg2.sav    [ Download data set ]
  • regression format data set based on patterson.sav
  • contains the variables v1-v6 (vegetables) used in Vermunt (2002b), as well as f1-f6 (fruit); these can be used to specify an LC growth model for vegetables or for fruit, or a multilevel LC model in which vegetable and fruit consumption are used as indicators of a time-specific latent variable
Sources:
Patterson, Dayton, and Graubard (2002)



F. High Dimensional Data

82. Orange Juice Data    [ Download data set ]  [ Download .lgf file ]
  • liking ratings on each of 6 different orange juice (OJ) products by 96 judges.
Sources:
Tenenhaus, M., Pagès, J., Ambroisine, L., and Guinot, C. (2005). PLS methodology for studying relationships between hedonic judgments and product characteristics. Food Quality and Preference, 16(4), 315-325.

Tutorial info:
This dataset is used in Latent GOLD Advanced Tutorial 8: Obtaining Predictions from a 2-class Regression (PDF, 708KB).




Download all data sets

E-mail Contact: support@statisticalinnovations.com
Address: Statistical Innovations, 375 Concord Avenue, Belmont, MA 02478-3084
Phone: +1.617.489.4490
Fax: +1.617.489.4499