Online course: Regression Modeling with Many Correlated Predictors — Key Driver Regression and Other Applications Using CCR Methods

Next course: not yet scheduled

Recent advances in the analysis of high-dimensional data now allow reliable regression models to be developed even when the number of predictors exceeds the number of cases. In this course we begin by reviewing the problems and limitations of traditional linear and logistic regression. Our applications-oriented presentation shows how the new Correlated Component Regression (CCR) approach works, using examples and an overview of the relevant theory with its supporting equations. We use real and simulated data sets to illustrate the different approaches.

Additional attendees from the same organization receive a 50% discount. To receive this discount, all attendees should register at the same time. This discount is automatically applied to your order.

Description

Course Structure: The course takes place online. Course participants will be given a username and password for access to a private bulletin board that serves as a forum for discussion and interaction with the instructor. The course is divided into four weekly sessions. Attendees typically spend about 5-15 hours on each session. At the beginning of each week, participants receive the relevant material, in addition to answers to exercises from the previous session. All course materials are posted to a dedicated course homepage, which can be accessed via the same username and password. During each session, participants review the course materials and work through exercises using the CORExpress® program. The instructor will provide answers to the exercises and to posted questions, but participants may also engage in discussions with other course participants.

Course program:

Session 1: Regression Analysis Basics

  • Primary types of regression – linear, logistic, linear discriminant analysis (LDA)
  • Prediction vs. classification
  • Assessing model performance
    • R² for linear regression
    • Classification table and ROC curve for a dichotomous dependent variable
    • Accuracy and AUC
  • Examples with simulated data
    • Linear regression
    • Logistic regression / LDA
  • Cross-validation
    • Assessing/Validating model performance with and without cut-points
    • Holdout samples and M-fold cross-validation
    • Generalizability and R²
    • Repeated rounds of M-folds
  • Graphical displays
  • Raw vs. standardized coefficients and measures of predictor importance
  • Problems with stepwise regression methods

Session 2: Regularization: Penalized Regression and Dimension Reduction Approaches

  • Background/Introduction to Problems with High Dimensional Data
    • Need for Regularization
    • Naïve Bayes as an extreme form of regularization
    • Naïve Bayes outperforms LDA and logistic regression
    • Naïve Bayes as starting point for Correlated Component Regression (CCR)

    • Tutorial with a dichotomous dependent variable and P = 3,571 predictors

  • Sparseness and Regularization
    • Stepwise Regression
    • Penalized regression approaches – Ridge Regression, Lasso, Elastic Net
    • Principal Components Regression (PCR) and Partial Least Squares Regression (PLS-R)
    • Correlated Component Regression (CCR)
      • CCR step-down algorithm
      • M-fold cross-validation with CCR step-down
      • Right and wrong way to do cross-validation
      • Improvements over Penalized Regression, PCR and PLS-R
  • Relationship between CCR, Naïve Bayes and traditional regression
    • Saturated CCR is equivalent to traditional regression
    • K = 1-component CCR is equivalent to Naïve Bayes
  • Example: Key Driver Regression
  • CCR Variants
    • CCR-Linear
    • CCR-Logistic
    • CCR-LDA
  • Graphical displays: boxplots and coefficient trace plots
  • Examples with real and simulated data
    • Example with Near Infrared (NIR) Data

Session 3: Comparison of Variable Selection/Reduction Approaches

  • Importance of suppressor variables
  • Using M-fold cross-validation for model tuning
  • M-fold cross-validation with CCR step-down
  • Examples with simulated data with many true predictors
    • Logistic Regression and LDA
    • Use of interactive ROC/Scatter plot
  • Failure of common prescreening methods to capture suppressor variables
  • Failure of Naïve Bayes to capture suppressor variables
  • Coefficient path plots

Session 4: Issues and Extensions

  • Guidelines to avoid over-fitting
  • Extended CCR models
    • Discrete dependent variable with more than 2 categories
    • CCR-Survival/Event history model
    • Hybrid CCR/Latent Class models
      • Example: Key Driver Regression on orange juice ratings data

Who should sign up for this course: Marketing researchers, biomedical researchers, survey analysts, and anyone who wants to learn the latest tools for developing reliable regression models when the number of correlated predictors approaches or exceeds the sample size N (high-dimensional data). Applications include predictive models based on gene expression, key driver regression, and more.

Prerequisite: Participants should have taken at least two courses in statistics, and be familiar with the use of linear and logistic regression.

Course Material: No text required. The material for this course will appear in Dr. Magidson’s forthcoming book, to be published by Chapman & Hall/CRC. Through your comments and questions, you have the unique opportunity to contribute to this book. Copies of published articles, forthcoming book excerpts, and other material will be made available. All participants will have free access to the demo versions of CORExpress® and XLSTAT-CCR, which allow unrestricted analyses of all course datasets.

Instructor: Dr. Jay Magidson, founder and president of Statistical Innovations Inc. Dr. Magidson’s clients have included A.C. Nielsen Co., Household Finance Corp., Blue Cross Blue Shield Association, and Pfizer. He taught statistics at Tufts and Boston University and is widely published on the theory and applications of multivariate statistical methods. Dr. Magidson designed SPSS CHAID, SI-CHAID®, GOLDMineR® and CORExpress®, and is the co-developer (with Jeroen K. Vermunt) of the Latent GOLD® and Latent GOLD® Choice programs and (with Thierry Fahmy) of the XLSTAT-CCR module.