**Special Topic: Tree-based methods for regression and classification – Statistical methods at the interface of graphics and statistics**

Guest editors: Mark Stemmler & Alexander von Eye

Editorial

*Mark Stemmler & Alexander von Eye*

Identification of confounded subgroups using linear model-based recursive partitioning

*Michael P. van Wie, Xintong Li & Wolfgang Wiedermann*

Regression trees and random forests as alternatives to classical regression modeling: Investigating the risk factors for corporal punishment

*Markus Fritsch, Harry Haupt, Friedrich Lösel & Mark Stemmler*

Analyzing tree structures with Configural Frequency Analysis and the R-package confreq

*Mark Stemmler, Jörg-Henrik Heine & Susanne Wallner*

Log-linear and configural analysis of tree structures

*Alexander von Eye, Wolfgang Wiedermann & Stefan von Weber*

Using tree-based regression to examine factors related to math ability among 15-year old students

*Cody Ding & Yuyang Zhao*

Predicting post-experiment fatigue among healthy young adults: Random forest regression analysis

*Eun-Young Mun & Feng Geng*

**Identification of confounded subgroups using linear model-based recursive partitioning**

*Michael P. van Wie, Xintong Li & Wolfgang Wiedermann*

*Abstract*

The absence of confounding is the fundamental assumption needed to endow the parameters of a statistical model with causal meaning. Causal inference is prone to biases due to confounding when data are purely observational. Often the assumption of unconfoundedness may be too rigid for the entire population under study, but may be plausible for subpopulations. The present article introduces an approach to detect confounded subgroups in linear regression models by combining a recently proposed confounder detection approach based on kernel-based independence testing with model-based recursive partitioning. Results of a simulation study indicate that Bonferroni-corrected independence tests are able to protect the (family-wise) Type I error rate of multiple independence testing across recursively partitioned local models. We discuss data scenarios under which the proposed approach can be expected to show adequate statistical power to detect confounded subgroups. Data requirements to ensure best practice for applications and strategies to further improve the statistical power of the approach are discussed.

*Keywords:* causal inference, recursive partitioning, confounding, Hilbert-Schmidt independence criterion, non-normality
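
The Bonferroni correction mentioned in the abstract can be illustrated with a minimal sketch. It assumes one independence-test p-value per subgroup (terminal node) of the partitioned model; the function name `bonferroni_flags` and the example p-values are hypothetical and are not taken from the article.

```python
def bonferroni_flags(p_values, alpha=0.05):
    """Return one boolean per test: True if the test is rejected at the
    Bonferroni-adjusted level alpha / m, where m is the number of tests."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # each individual test is run at a stricter level
    return [p <= threshold for p in p_values]


# Hypothetical p-values from independence tests in three subgroups.
flags = bonferroni_flags([0.001, 0.02, 0.04], alpha=0.05)
print(flags)  # [True, False, False]
```

With three tests, each p-value is compared against 0.05 / 3, so only the first hypothetical subgroup would be flagged. This bounds the family-wise Type I error rate at `alpha` at the cost of some power per test.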

*Wolfgang Wiedermann, PhD*

Statistics, Measurement,

and Evaluation in Education

Department of Educational,

School, and Counseling Psychology

College of Education

University of Missouri

13B Hill Hall

Columbia, MO, 65211, USA

wiedermannw@missouri.edu

**Regression trees and random forests as alternatives to classical regression modeling: Investigating the risk factors for corporal punishment**

*Markus Fritsch, Harry Haupt, Friedrich Lösel & Mark Stemmler*

*Abstract*

Examining the behavior of individuals is a challenging task due to the complex patterns underlying the observable outcome. Even if all predictors contributing to the behavior are known in a particular context, the functional form and potential multi-level interactions between the predictors are difficult to grasp. Any modeling decisions involved in specifying a functional form should withstand comparisons with data-driven techniques. Decision trees and random forests enable data-driven modeling and are valuable tools to overcome limitations of least squares regression and to validate existing results. We illustrate the relevant modeling steps required to carry out the two techniques by investigating the complex patterns of aggressiveness, dysfunctional parent-child interactions, and other risk factors for corporal punishment of children by their fathers. We replicate existing results on the corresponding risk factors, interpret the modeling outcomes, and describe the setting of relevant meta parameters in empirical practice.

*Keywords:* regression trees, random forests, functional form selection, predictor selection, corporal punishment, parenting behavior

*Markus Fritsch*

Department of Statistics

University of Passau

94030 Passau, Germany

markus.fritsch@uni-passau.de

**Analyzing tree structures with Configural Frequency Analysis and the R-package confreq**

*Mark Stemmler, Jörg-Henrik Heine & Susanne Wallner*

*Abstract*

This article describes the use of Configural Frequency Analysis (CFA; von Eye, 2002) for detecting a tree structure in data that were analyzed with the SPSS module Answer Tree. At the same time, the application of the R package confreq (Heine, 2019) is demonstrated. The data example is taken from a longitudinal study on deviant and delinquent behavior in juveniles. The study is called ‘Chances and Risks in the Life-Course (CURL)’ (Reinecke et al., 2013). Here, the offender status of juveniles in 11th grade was predicted by several risk factors measured two years earlier. The CHAID analysis detected a two-level tree with illegal substance use as the first node and with the nodes antisocial attitudes and living in a single-parent home on the second level. Functional Configural Frequency Analysis (von Eye & Mair, 2008) was applied to model the data structure in several steps: first, a main effects model was fitted; then, a significant configuration addressing non-offending in juveniles was blanked out; finally, effect vectors modeling the decision tree sequence were introduced into the design matrix. This resulted in a fitting model, which suggests that the tree structure was correctly modeled. The theoretical background for this data application can be found in this Special Issue (see von Eye, Wiedermann, & von Weber, 2019). Syntax and outputs based on confreq are presented to create a hands-on application.

*Keywords:* Configural Frequency Analysis (CFA), R-package confreq, tree-based methods, antisocial behavior, log-linear modeling (LLM)

*Mark Stemmler*

Friedrich-Alexander University Erlangen-Nürnberg

Department of Psychology

Nägelsbachstrasse 49c

91054 Erlangen, Germany

mark.stemmler@fau.de

**Log-linear and configural analysis of tree structures**

*Alexander von Eye, Wolfgang Wiedermann & Stefan von Weber*

*Abstract*

In this article, two methods are proposed for the analysis of existing tree structures. The first method, log-linear modeling, is variable-oriented. The goals pursued with a log-linear analysis of tree structures concern the modeling of patterns of a sequence of decisions. Starting from a base model that is a standard hierarchical model, it adds special contrasts that represent the decisions that are made when progressing through the tree. The goal of the analysis is to fit a model of the sequence of decisions. These contrasts render the model non-standard. The method of Schuster transformation is needed to create a design matrix with contrasts that can be interpreted as intended. The second method, Configural Frequency Analysis (CFA), is person-oriented. Starting from the final, fitting log-linear model, it removes the special contrasts from the model and asks whether there are individual patterns (configurations) that are discrepant from the base model and thus represent the sequence of decisions. These patterns can be interpreted as types and antitypes in a CFA sense. In a data example, students’ decisions and life satisfaction are examined. The intertwining of the proposed methods and possible extensions are discussed.

*Keywords:* modeling decision trees, nonstandard log-linear models, Configural Frequency Analysis, parameter interpretation, Schuster transformation

*Alexander von Eye, PhD*

Michigan State University

190 Allée du Nouveau Monde

34000 Montpellier, France

voneye@msu.edu

**Using tree-based regression to examine factors related to math ability among 15-year old students**

*Cody Ding & Yuyang Zhao*

*Abstract*

Few studies have examined factors that can predict students’ mathematical ability, particularly for subgroups of students who share common characteristics associated with different levels of math ability. Based on PISA 2012 data from the United States and China, we used regression tree analysis to select the most salient predictors of math ability and to identify the subgroups of 15-year-old students who were likely to be proficient in math. The regression tree analysis showed that students whose math self-efficacy score was over 3.33 and whose perceived positive peer math norm score was below 3.33, on a rating scale of 1 to 4, were most likely to be associated with proficient math ability. In contrast, students whose math self-efficacy score was below 2.81 were most likely to be associated with low or below-average math ability. For students whose math self-efficacy score was between 2.81 and 3.33, however, math ability level was likely to be associated with their perceived positive peer math norm. The significance of the study is that it identified distinct subgroups of students who were more likely to be associated with different levels of math ability. Methodologically, the study demonstrated the application of regression tree analysis in studying students’ math ability.

*Keywords:* mathematical ability, regression tree, tree-based method, math self-efficacy, PISA 2012
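
The split rules reported in the abstract can be written out as a small decision function. This is only a sketch of the rules as summarized there, not the fitted tree from the article: the function name `describe_math_subgroup` is hypothetical, the treatment of scores exactly at 2.81 or 3.33 is an assumption, and the abstract does not describe the branch with high self-efficacy and a peer norm score of 3.33 or more.

```python
def describe_math_subgroup(self_efficacy, peer_norm):
    """Map two scores on the 1-to-4 rating scale to the subgroup
    described in the abstract. Boundary handling is an assumption."""
    if not (1 <= self_efficacy <= 4 and 1 <= peer_norm <= 4):
        raise ValueError("scores are expected on the 1-to-4 rating scale")
    if self_efficacy < 2.81:
        return "low or below-average"
    if self_efficacy <= 3.33:
        # Middle band: the abstract only says ability is associated
        # with the perceived positive peer math norm.
        return "depends on peer math norm"
    if peer_norm < 3.33:
        return "proficient"
    return "not described in the abstract"


print(describe_math_subgroup(3.5, 3.0))  # proficient
print(describe_math_subgroup(2.5, 3.0))  # low or below-average
```

Writing the tree as nested conditions shows why regression trees are easy to communicate: each terminal node corresponds to one readable rule over the predictors.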

*Cody Ding, PhD*

Education Sciences & Professional Programs

University of Missouri-St. Louis

St. Louis, USA

dingc@umsl.edu

**Predicting post-experiment fatigue among healthy young adults: Random forest regression analysis**

*Eun-Young Mun & Feng Geng*

*Abstract*

The current study used a random forest regression analysis to predict post-experiment fatigue in a sample of 212 healthy participants (mean age = 20.5, SD = 2.21; 52% women) between the ages of 18 and 30 following a mildly stressful experiment. We used a total of 30 features covering demographic variables, lifestyle variables, alcohol and other drug use behaviors and problems, state anxiety and depressive symptoms, and physiological indicators that were lab-assessed or self-reported. A random forest regression analysis with 10-fold cross-validation resulted in accurate prediction of post-experiment fatigue (R² equivalent = 0.93), with an average “out-of-bag” (OOB) R² = 0.52. Not surprisingly, self-reported pre-experiment fatigue was the most important variable (54%) in the prediction of post-experiment fatigue. Feeling anxious (state anxiety) pre- and post-experiment (3%, 7%), feeling less vigorous post-experiment (3%), systolic and diastolic blood pressure (3%, 2%) and LF HRV (2%) assessed at baseline, and self-reported alcohol-related problems (3%) and sleep (2%) additionally contributed to the prediction of post-experiment fatigue. The remaining input variables had relatively minimal importance. Substantively, this study suggests that complex interactions across multiple systems domains that support regulation may be linked to fatigue. A random forest regression analysis can be implemented relatively easily with a built-in cross-validation function and can reveal a web of connections undergirding health behavior and risks.

*Keywords:* fatigue, stress, regulation, random forests, machine learning
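
The abstract reports two R² values. As a reminder of what such a value measures, the sketch below computes R² from observed and predicted scores using the standard definition, 1 minus the ratio of residual to total sum of squares. The function name `r_squared` and the example numbers are hypothetical; which predictions are passed in (for example cross-validated or out-of-bag predictions) determines which kind of R² results, and the article's exact procedure is not reproduced here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviations from the observed mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have no variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical fatigue scores and predictions for four participants.
print(round(r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5]), 3))  # 0.9
```

Predictions made for observations that a model has not seen during fitting, as with out-of-bag predictions, typically yield a lower R² than predictions for observations used in fitting.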

*Eun-Young Mun, PhD*

Department of Health Behavior and Health Systems

University of North Texas Health Science Center

3500 Camp Bowie Blvd.

EAD 709, Fort Worth, TX 76107-2699, USA

eun-young.mun@unthsc.edu

**Psychological Test and Assessment Modeling Volume 61 · 2019 · Issue 4**

Pabst, 2019

ISSN 2190-0493 (Print)

ISSN 2190-0507 (Internet)