  Presentations: US


  Upcoming Presentations

  2012 American Statistical Association Conference on Statistical Practice
  Renaissance Orlando, 6677 Sea Harbor Drive, Orlando, Florida 32821
Short Course, Thursday, February 16, 8:30 a.m.-5:30 p.m.:
Regression Modeling with Many Correlated Predictors: High-Dimensional Data Analysis in Practice
Jay Magidson, Statistical Innovations Inc. & Tony Babinec, AB Analytics

Abstract: The availability of a vast amount of data in fields such as genomics, marketing research, and signal processing has led to recent advances in high-dimensional data analysis. It is now possible to develop reliable regression models, even when the number of predictors exceeds the number of cases. In this course, we will begin by reviewing problems and limitations with traditional linear and logistic regression. We will then introduce the two primary regularization approaches for analyzing such data—penalized regression and component methods—related software, and recent advances in feature selection. Our applications-oriented presentation provides insight into how the new approaches work in examples with both low- and high-dimensional data and an overview of the relevant theory, supplemented by supporting equations. We will use real and simulated data sets to illustrate the different methods. The material presented will be included in a forthcoming book on this topic by Jay Magidson.
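As a rough illustration of why penalization helps when predictors outnumber cases, the sketch below uses ridge regression, one common penalized method. The abstract does not say which penalties the course covers, so ridge is an assumption here, and the data, the helper names `det3` and `ridge_dual`, and the penalty value are all hypothetical. It shows that X'X is singular when p > n, so ordinary least squares has no unique solution, while the ridge solution is still well defined.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ridge_dual(X, y, lam):
    """Ridge coefficients via the dual form beta = X'(XX' + lam*I)^(-1) y.

    Hard-coded for n = 2 cases so the 2x2 inverse can be written out.
    """
    n, p = len(X), len(X[0])
    assert n == 2, "this sketch hard-codes a 2x2 inverse"
    # Gram matrix of the cases, K = XX'.
    k = [[sum(X[i][g] * X[j][g] for g in range(p)) for j in range(2)]
         for i in range(2)]
    # Add the ridge penalty to the diagonal, then invert the 2x2 matrix.
    a, b, c, d = k[0][0] + lam, k[0][1], k[1][0], k[1][1] + lam
    det = a * d - b * c
    alpha = [(d * y[0] - b * y[1]) / det, (-c * y[0] + a * y[1]) / det]
    # Map back to one coefficient per predictor: beta = X' alpha.
    return [sum(X[i][g] * alpha[i] for i in range(2)) for g in range(p)]


# Hypothetical data: n = 2 cases, p = 3 predictors.
X = [[1, 2, 3],
     [2, 1, 0]]
y = [1.0, 2.0]

# OLS needs the inverse of X'X, which is singular when p > n.
xtx = [[sum(row[g] * row[h] for row in X) for h in range(3)] for g in range(3)]
print(det3(xtx))  # → 0

# Ridge adds lam to the diagonal, which makes the problem solvable.
beta = ridge_dual(X, y, lam=1.0)
print(beta)
```

The dual form is used only to keep the example dependency-free; in practice a numerical library would solve the same system, and the penalty would be chosen by cross-validation rather than fixed at 1.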

  PLS'12: 7th International Conference on Partial Least Squares and Related Methods
  Omni Hotel & Resort, Four Riverway, Houston, Texas 77056


Conference Overview: The 7th International Conference on Partial Least Squares and Related Methods (PLS 2012) will be held in Houston, Texas on May 9th-12th, 2012. This will be the first time it is hosted in the United States. The conference theme, "Advancing the Theoretical and Applied Frontier," is concerned with both theory and exemplars in practice for areas as diverse as Business, Engineering and the Life Sciences. This conference is a unique opportunity for outstanding experts and practitioners in PLS methods, from all over the world, to meet and discuss mutual interests. During the four-day meeting, our global attendees, reflecting different applied domains, will share their state-of-the-art understanding and advances in the use of PLS methods.

Correlated Component Regression: Re-thinking Regression in the Presence of Near Multicollinearity
Jay Magidson, Statistical Innovations Inc.

Abstract: Correlated Component Regression (CCR), a new method recently added to XLSTAT, addresses the prediction of a single Y as a function of many correlated predictors. As in PLS-regression (PLS-R), the model is based on K components, but it differs from PLS-R in that 1) components are correlated, and 2) predictions are invariant to linear transformations of the predictors. In addition, a new cross-validation-based step-down algorithm has been shown to improve prediction by eliminating irrelevant and weak predictors from the CCR or PLS-R model. CCR methods improve on linear regression, logistic regression and linear discriminant analysis by yielding reliable predictions even with more predictors than cases.

In this presentation, CCR is introduced and illustrated in the following applications:
  • Linear regression with 6 correlated predictors and small N, resulting in coefficients that are more interpretable than those obtained from traditional OLS regression.
  • Chemometrics: Near-infrared (NIR) spectroscopy – The CCR coefficients weighting higher, less reliable wavelengths are shown to be more realistic than those obtained from PLS-R with or without standardizing the 700 wavelength predictors.
  • Linear Discriminant Analysis (LDA): CCR is shown to outperform stepwise LDA, with data simulated according to LDA assumptions, where the data contains many irrelevant predictors as well as important suppressor variables.
  • Sensometrics: A hybrid latent class/CCR model identifies the key drivers for judges’ liking ratings of different orange juice products, and yields distinctly different effects of these product attribute drivers for different latent class segments of judges.

  Past Presentations

  Regression Modeling with Many Correlated Predictors, Sponsored by the Chicago Chapter of the American Statistical Association
  Rush University Medical Center, 1653 W Congress Parkway, Chicago, IL
Workshop, Friday, April 8, 2011, 8:30 AM - 4:30 PM:
Regression Modeling with Many Correlated Predictors
Jay Magidson, Statistical Innovations Inc. & Tony Babinec, AB Analytics

Abstract: Recent advances in analysis of high dimensional data now allow reliable regression models to be developed even when the number of predictors exceeds the number of cases! In this course we begin by reviewing problems and limitations with traditional linear and logistic regression. Our applications-oriented presentation provides insight into how the new approaches work through examples and by providing an overview of the relevant theory, supplemented by the supporting equations. We use real and simulated data sets to illustrate the different approaches.

  Modern Modeling Methods Conference
  University of Connecticut
Presentation, May 25, 2011 -- 1:15-2:15PM
Rethinking Regression, Prediction and Variable Selection in the Presence of High Dimensional Data: Correlated Component Regression
Jay Magidson, Statistical Innovations Inc.

  Annual Statistical Modeling Week
  Annual conference featuring applications-oriented seminars focusing on the latest trends in statistical analysis.


  2010 Sawtooth Software Conference
  October 6-8, 2010, Newport Beach Marriott Hotel & Spa, Newport Beach, California.
Workshop, Tuesday, October 5:
SEGMENTATION AND PREDICTION IN PRACTICE: Applications of Latent GOLD®, LG Choice, and CORExpress
Jay Magidson, Statistical Innovations Inc. & Tony Babinec, AB Analytics

In this tutorial we will use popular software to account for heterogeneity in data by identifying actionable segments and developing predictive models that allow us to better understand the segments. Output will be interpreted from a practical perspective. For each analysis we will begin with basic models understandable by non-statisticians and then introduce some advanced features.

Latent GOLD -- We will uncover segments with ratings-based conjoint data and show how to use the output to customize a separate product for each segment. An advanced feature includes the use of a C-factor (random effects) as an alternative to centering.

LG Choice -- We will identify respondents that differ in their brand preference and price sensitivity. We will interpret the output and show how it may be used to develop an Excel-based simulator. Advanced features include the incorporation of scale factors into the model. We will also analyze max-diff data and show how it can be integrated with ratings to improve the reliability of the estimates.

CORExpress -- We will show how segments can be described/predicted using exogenous variables. Recent advances in high-dimensional data analysis allow inferences to be made for large numbers of attributes, respondent characteristics and interactions. We show how these advances can be used to simplify a model to include only the most important attributes/predictors/interactions.


  2007 Sawtooth Software Conference
  October 15-19, 2007, Hyatt Vineyard Creek Hotel and Spa, Santa Rosa, CA.
SI Presentation, Wednesday, October 17:
Removing the Scale Factor Confound in Multinomial Logit Choice Models to Obtain Better Estimates of Preference
Jay Magidson, Jeroen Vermunt

A theoretical weakness of CBC as currently practiced is that individual utility estimates are confounded by differential measures of uncertainty (error variances). By separating the scale factor from the utilities we obtain clearer estimates of preference. Results from extended latent class models indicate that quick respondents have the highest variances.
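The confound can be seen directly in the multinomial logit formula, where choice probabilities depend only on the product of the scale factor and the utilities. The sketch below is illustrative only: the utilities, the two hypothetical respondents, and the function name `mnl_probs` are assumptions, not taken from the presentation. In the standard logit model the scale factor is inversely related to the error variance, so a respondent with noisier answers has a smaller scale.

```python
import math


def mnl_probs(utilities, scale=1.0):
    """Multinomial logit choice probabilities for one choice set."""
    scaled = [scale * u for u in utilities]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    expo = [math.exp(s - m) for s in scaled]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical part-worth utilities for three alternatives.
utilities = [1.0, 0.5, 0.0]

# Respondent A: these utilities with scale 2 (lower error variance).
p_a = mnl_probs(utilities, scale=2.0)
# Respondent B: utilities twice as large with scale 1.
p_b = mnl_probs([2.0 * u for u in utilities], scale=1.0)

# Both parameterizations imply the same choice probabilities, so choice
# data alone cannot separate preference strength from scale.
same = all(math.isclose(a, b) for a, b in zip(p_a, p_b))
print(same)  # → True

# A smaller scale (more error variance) flattens the probabilities.
p_noisy = mnl_probs(utilities, scale=0.5)
print(p_noisy[0] < p_a[0])  # → True
```

Separating the two requires extra structure, such as the latent class scale classes the abstract alludes to; this sketch only demonstrates the identification problem, not that solution.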


  Joint Annual Meeting of the Interface and the Classification Society of North America
  June 8-12, 2005, Washington University School of Medicine, St. Louis, MO.
SI Short Course, Wednesday, June 8, 1-4pm:
Latent Class Models for Clustering and Classification, Jay Magidson and Anthony Babinec


  16th Annual Advanced Research Techniques FORUM
  June 12-15, 2005, Coeur d'Alene Resort, Idaho.
SI Short Course, Monday, June 13, 3-3:30pm:
Using Parsimonious Conjoint and Choice Models to Improve the Accuracy of Out-of-Sample Share Predictions
Jay Magidson, Jeroen Vermunt and Thomas Eagle

A primary benefit of conjoint and choice models is the availability of a simulator to generate predictions of ratings and/or market share for new products based on estimated part-worth utilities. To the extent that the estimated utilities are based on an over-fitted, over-parameterized model, such predictions may not validate outside the sample. In this session we introduce new models that posit continuous factors (C-Factors) underlying the part-worth utility parameters to account for respondent heterogeneity and show that they are much simpler to estimate and easier to interpret than HB models. We use data from a rating-based conjoint and from a choice/ranking study to compare various HB, C-Factor, latent class and hybrid models with respect to over-fitting. The results suggest that new hybrid models containing both latent classes and C-Factors can be used to obtain segments and improve the accuracy of out-of-sample predictions.

  2004 Sawtooth Software Conference
  October 6-8, 2004, Shelter Pointe Hotel and Marina, San Diego, CA.
SI Presentation, Tuesday, October 5:
APPLICATIONS IN SEGMENTATION MODELING USING LATENT GOLD, GOLDMineR, SI-CHAID, AND LATENT GOLD CHOICE

Segmentation modeling is a statistical approach for identifying and describing market segments. Depending upon the type of application, different approaches are available. In this tutorial, emphasizing methodological and practical issues of model building, we use commercially available software packages to illustrate four approaches:

Latent class (LC) models. LC/finite mixture models provide powerful ways to obtain segments based on multiple criteria, whether 1) ratings/rankings/choices obtained from conjoint studies or 2) the criteria used in general clustering applications. Compared to Hierarchical Bayes (HB) models, individual-level predictions are of comparable validity, but LC models are much quicker and easier to estimate, and segments are obtained directly as part of the model. (Latent GOLD and LG Choice will be used).

CHAID models. CHAID (Chi-Squared Automatic Interaction Detection) is a tree-based segmentation technique effective in obtaining meaningful segments that differ with respect to a single categorical criterion variable such as response to a mailing. (SI-CHAID, the successor to our original SPSS CHAID program, will be used).

Regression models. In cases where a single criterion such as profitability is available, a regression approach is often a good way to identify the best (most profitable) segments. (The ordinal logit program GOLDMineR will be used here and the resulting segments will be compared to those obtained from CHAID).

Hybrid models. We describe an LC hybrid alternative that maintains the simple structure of a CHAID or regression model, but may be used to obtain segments that are predictive across multiple criteria. We illustrate this in a discrete choice study.

  2004 Meeting of the International Federation of Classification Societies
  July 15-18, 2004, Illinois Institute of Technology, Chicago, IL.
Short Course: Latent Class Models for Clustering and Classification; Jay Magidson & Tony Babinec
Presentation Abstract: Latent Class Models for Clustering and Classification

The various uses of latent class and finite mixture models for clustering and classification are growing rapidly because of:
  1. the lack of restrictive assumptions underlying the general model,
  2. major developments in maximum likelihood estimation of these models, and
  3. availability of model parameters to use for classifying new cases.
This short course introduces the latent class (LC) and finite mixture approach to clustering and focuses on three important LC models -- cluster, factor, and regression -- for combinations of nominal, ordinal, and/or continuous variables. Applications are taken from the fields of marketing research and the biomedical sciences. Topics include (1) relationship to and improvements over K-means clustering, (2) use of simultaneous cluster and regression/discriminant/choice analyses as an improvement over the traditional tandem cluster-regression analyses, and (3) use of covariates and an extended CHAID algorithm to describe the resulting latent class segments. We will provide extensive references and a description of the major latent class modeling software available. The Latent GOLD program and new Latent GOLD Choice module will be used for illustration.
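To make the contrast with K-means concrete, here is a minimal sketch of a two-class finite mixture for a single continuous variable, estimated by EM. It is a generic textbook version, not the algorithm used in Latent GOLD, and the data, starting values, and function names are hypothetical. Unlike K-means, the model yields class sizes, class-specific means and standard deviations, and posterior membership probabilities, and the estimated parameters can be reused to classify new cases.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(x, weights, means, sds):
    """Posterior class-membership probabilities for one case."""
    joint = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    total = sum(joint)
    return [j / total for j in joint]


def fit_two_class_mixture(data, iterations=200):
    """EM for a two-class univariate normal mixture."""
    # Crude starting values: the extremes as class means, a common spread.
    n = len(data)
    grand_mean = sum(data) / n
    sd0 = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)
    weights, means, sds = [0.5, 0.5], [min(data), max(data)], [sd0, sd0]
    for _ in range(iterations):
        # E-step: soft assignment of every case to both classes.
        post = [posterior(x, weights, means, sds) for x in data]
        # M-step: re-estimate class sizes, means, and standard deviations.
        for k in range(2):
            nk = sum(p[k] for p in post)
            weights[k] = nk / n
            means[k] = sum(p[k] * x for p, x in zip(post, data)) / nk
            var = sum(p[k] * (x - means[k]) ** 2 for p, x in zip(post, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a degenerate class
    return weights, means, sds


# Hypothetical data with two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.7, 5.1, 4.9]
weights, means, sds = fit_two_class_mixture(data)
print(weights, means, sds)

# The fitted parameters classify a new case without refitting the model.
print(posterior(1.05, weights, means, sds))
```

Allowing each class its own standard deviation is one way the mixture approach relaxes K-means, which implicitly treats all clusters as equally spread.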

E-mail Contact: will@statisticalinnovations.com
Address: Statistical Innovations, 375 Concord Avenue, Belmont, MA 02478-3084
Phone: +1.617.489.4490
Fax: +1.617.489.4499