Editorial
Mark Stemmler & Alexander von Eye
Identification of confounded subgroups using linear model-based recursive partitioning
Michael P. van Wie, Xintong Li & Wolfgang Wiedermann
Regression trees and random forests as alternatives to classical regression modeling: Investigating the risk factors for corporal punishment
Markus Fritsch, Harry Haupt, Friedrich Lösel & Mark Stemmler
Analyzing tree structures with Configural Frequency Analysis and the R-package confreq
Mark Stemmler, Jörg-Henrik Heine & Susanne Wallner
Log-linear and configural analysis of tree structures
Alexander von Eye, Wolfgang Wiedermann & Stefan von Weber
Using tree-based regression to examine factors related to math ability among 15-year old students
Cody Ding & Yuyang Zhao
Predicting post-experiment fatigue among healthy young adults: Random forest regression analysis
Eun-Young Mun & Feng Geng
Identification of confounded subgroups using linear model-based recursive partitioning
Michael P. van Wie, Xintong Li & Wolfgang Wiedermann
Abstract
The absence of confounding is the fundamental assumption needed to endow parameters of a statistical model with causal meaning. Causal inference is prone to biases due to confounding when data are purely observational. Often the assumption of unconfoundedness may be too rigid for the entire population under study, but may be plausible for subpopulations. The present article introduces an approach to detect confounded subgroups in linear regression models by combining a recently proposed confounder detection approach based on kernel-based independence testing with model-based recursive partitioning. Results of a simulation study indicate that Bonferroni-corrected independence tests are able to protect the (family-wise) Type I error rate of multiple independence testing across recursively partitioned local models. We discuss data scenarios under which the proposed approach can be expected to show adequate statistical power to detect confounded subgroups. Data requirements to ensure best practice for applications and strategies to further improve the statistical power of the approach are discussed.
Keywords: causal inference, recursive partitioning, confounding, Hilbert-Schmidt independence criterion, non-normality
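The Bonferroni correction mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each recursively partitioned local model yields one independence-test p-value; the function name and the p-values below are hypothetical and are not taken from the article.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one reject/retain decision per test at family-wise level alpha.

    Each p-value is compared against alpha divided by the number of tests,
    which bounds the probability of at least one false rejection by alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values, one per terminal node of a partitioned model
node_p_values = [0.004, 0.030, 0.200, 0.012]
decisions = bonferroni_reject(node_p_values)
print(decisions)
```

With four tests and alpha = 0.05, the per-test threshold is 0.0125, so only nodes whose p-value falls at or below that value would be flagged as confounded in this sketch.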
Wolfgang Wiedermann, PhD
Statistics, Measurement,
and Evaluation in Education
Department of Educational,
School, and Counseling Psychology
College of Education
University of Missouri
13B Hill Hall
Columbia, MO, 65211, USA
wiedermannw@missouri.edu
Regression trees and random forests as alternatives to classical regression modeling: Investigating the risk factors for corporal punishment
Markus Fritsch, Harry Haupt, Friedrich Lösel & Mark Stemmler
Abstract
Examining the behavior of individuals is a challenging task due to complex patterns underlying the observable outcome. Even if all predictors contributing to the behavior are known in a particular context, the functional form and potential multi-level interactions between the predictors are difficult to grasp. Any modeling decisions involved in specifying a functional form should withstand comparisons with data-driven techniques. Decision trees and random forests enable data-driven modeling and are valuable tools to overcome limitations of least squares regression and to validate existing results. We illustrate the relevant modeling steps required to carry out the two techniques by investigating the complex patterns of aggressiveness, dysfunctional parent-child interactions, and other risk factors for corporal punishment of children by their fathers. We replicate existing results on the corresponding risk factors, interpret the modeling outcomes, and describe the setting of relevant meta parameters in empirical practice.
Keywords: regression trees, random forests, functional form selection, predictor selection, corporal punishment, parenting behavior
Markus Fritsch
Department of Statistics
University of Passau
94030 Passau, Germany
markus.fritsch@uni-passau.de
Analyzing tree structures with Configural Frequency Analysis and the R-package confreq
Mark Stemmler, Jörg-Henrik Heine & Susanne Wallner
Abstract
This article describes the use of Configural Frequency Analysis (CFA; von Eye, 2002) for detecting a tree structure in data that was analyzed with the SPSS module Answer Tree. At the same time, the application of the R package confreq (Heine, 2019) is demonstrated. The data example is taken from a longitudinal study on deviant and delinquent behavior in juveniles. The study is called ‘Chances and Risks in the Life-Course (CURL)’ (Reinecke et al., 2013). Here, the offender status of juveniles in 11th grade was predicted by several risk factors measured two years earlier. The CHAID analysis detected a two-level tree with illegal substance use as the first node and with the nodes antisocial attitudes and living in a single-parent home on the second level. Functional Configural Frequency Analysis (von Eye & Mair, 2008) was applied to model the data structure in several steps: first, a main effects model was estimated; then, a significant configuration addressing non-offending in juveniles was blanked out; and finally, effect vectors modeling the decision tree sequence were introduced into the design matrix. This resulted in a fitting model, which suggests that the tree structure was correctly modeled. The theoretical background for this data application can be found in this Special Issue (see von Eye, Wiedermann, & von Weber, 2019). Syntax and outputs based on confreq are presented to create a hands-on application.
Keywords: Configural Frequency Analysis (CFA), R-package confreq, tree-based methods, antisocial behavior, log-linear modeling (LLM)
Mark Stemmler
Friedrich-Alexander University Erlangen-Nürnberg
Department of Psychology
Nägelsbachstrasse 49c
91054 Erlangen, Germany
mark.stemmler@fau.de
Log-linear and configural analysis of tree structures
Alexander von Eye, Wolfgang Wiedermann & Stefan von Weber
Abstract
In this article, two methods are proposed for the analysis of existing tree structures. The first method, log-linear modeling, is variable-oriented. The goals pursued with a log-linear analysis of tree structures concern the modeling of patterns of a sequence of decisions. Starting from a base model that is a standard hierarchical model, it adds special contrasts that represent the decisions that are made when progressing through the tree. The goal of the analysis is to fit a model of the sequence of decisions. These contrasts render the model non-standard. The method of Schuster transformation is needed to create a design matrix with contrasts that can be interpreted as intended. The second method, Configural Frequency Analysis (CFA), is person-oriented. Starting from the final, fitting log-linear model, it removes the special contrasts from the model and asks whether there are individual patterns (configurations) that are discrepant from the base model and thus represent the sequence of decisions. These patterns can be interpreted as types and antitypes in a CFA sense. In a data example, students’ decisions and life satisfaction are examined. The intertwining of the proposed methods and possible extensions are discussed.
Keywords: modeling decision trees, nonstandard log-linear models, Configural Frequency Analysis, parameter interpretation, Schuster transformation
Alexander von Eye, PhD
Michigan State University
190 Allée du Nouveau Monde
34000 Montpellier, France
voneye@msu.edu
Using tree-based regression to examine factors related to math ability among 15-year old students
Cody Ding & Yuyang Zhao
Abstract
Few studies have examined factors that can predict students’ mathematical ability, particularly for subgroups of students who share common characteristics that are associated with different levels of math ability. Based on PISA 2012 data from the United States and China, we used regression tree analysis to select the most salient predictors of math ability and identify the subgroups of 15-year-old students who were likely to be proficient in math. The regression tree analysis showed that students whose math self-efficacy score was over 3.33 and whose perceived positive peer math norm score was below 3.33 on a rating scale of 1 to 4 were most likely to be associated with proficient math ability. In contrast, students whose math self-efficacy score was below 2.81 were most likely to be associated with low or below-average math ability. However, for students whose math self-efficacy score was between 2.81 and 3.33, their math ability level was likely to be associated with their perceived positive peer math norm. The significance of the study is that it uniquely identified distinct subgroups of students who were more likely to be associated with different levels of math ability. Methodologically, the study demonstrated the application of regression tree analysis in studying students’ math ability.
Keywords: mathematic ability, regression tree, tree-based method, math self-efficacy, PISA 2012
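The subgroup rules reported in the abstract can be read as a small decision function. The sketch below is only an illustration of those reported splits, not the fitted tree from the article: the function name and return labels are hypothetical, the handling of scores exactly equal to 2.81 or 3.33 is an assumption, and combinations the abstract does not describe are reported as such rather than guessed.

```python
def describe_subgroup(self_efficacy, peer_norm):
    """Map two scores (rating scale 1 to 4) to the subgroup named in the abstract.

    Assumption: scores exactly equal to a cut point (2.81 or 3.33) are placed
    in the middle band, because the abstract does not say where they belong.
    """
    if self_efficacy < 2.81:
        return "low or below-average math ability"
    if self_efficacy > 3.33:
        if peer_norm < 3.33:
            return "proficient math ability"
        # The abstract does not describe this combination.
        return "not stated in the abstract"
    # Between 2.81 and 3.33: the abstract only says the level is
    # associated with the perceived positive peer math norm.
    return "depends on perceived positive peer math norm"


print(describe_subgroup(3.5, 3.0))
print(describe_subgroup(2.5, 3.9))
print(describe_subgroup(3.0, 2.0))
```

Writing the splits out this way makes it visible that the abstract leaves one branch (high self-efficacy with a high peer norm score) unspecified.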
Cody Ding, PhD
Education Sciences & Professional Programs
University of Missouri-St. Louis
St. Louis, USA
dingc@umsl.edu
Predicting post-experiment fatigue among healthy young adults: Random forest regression analysis
Eun-Young Mun & Feng Geng
Abstract
The current study utilized a random forest regression analysis to predict post-experiment fatigue in a sample of 212 healthy participants (mean age = 20.5, SD = 2.21; 52% women) between the ages of 18 and 30 following a mildly stressful experiment. We used a total of 30 features of demographic variables, lifestyle variables, alcohol and other drug use behaviors and problems, state anxiety and depressive symptoms, and physiological indicators that were lab-assessed or self-reported. A random forest regression analysis with 10-fold cross-validation resulted in accurate prediction of post-experiment fatigue (R² equivalent = 0.93) with the average “out-of-bag” (OOB) R² = 0.52. Not surprisingly, self-reported pre-experiment fatigue was the most important variable (54%) in the prediction of post-experiment fatigue. Feeling anxious (state anxiety) pre- and post-experiment (3%, 7%), feeling less vigorous post-experiment (3%), systolic and diastolic blood pressure (3%, 2%) and LF HRV (2%) assessed at baseline, and self-reported alcohol-related problems (3%) and sleep (2%) additionally contributed to the prediction of post-experiment fatigue. Other remaining input variables had relatively minimal importance. Substantively, this study suggests that complex interactions across multiple systems domains that support regulation may be linked to fatigue. A random forest regression analysis can relatively easily be implemented with a built-in cross-validation function and reveal a web of connections undergirding health behavior and risks.
Keywords: fatigue, stress, regulation, random forests, machine learning
Eun-Young Mun, PhD
Department of Health Behavior and Health Systems
University of North Texas Health Science Center
3500 Camp Bowie Blvd.
EAD 709, Fort Worth, TX 76107-2699, USA
eun-young.mun@unthsc.edu
Psychological Test and Assessment Modeling
Volume 61 · 2019 · Issue 4
Pabst, 2019
ISSN 2190-0493 (Print)
ISSN 2190-0507 (Internet)