Editorial
Lale Khorramdel, Artur Pokropek & Peter van Rijn
Using a multilevel random item Rasch model to examine item difficulty variance between random groups
Johannes Hartig, Carmen Köhler & Alexander Naumann
Measurement invariance testing in questionnaires: A comparison of three Multigroup-CFA and IRT-based approaches
Janine Buchholz & Johannes Hartig
Improving measurement properties of the PISA home possessions scale through partial invariance modeling
Selene S. Lee & Matthias von Davier
PISA reading: Mode effects unveiled in short text responses
Fabian Zehner, Ulf Kroehne, Carolin Hahnel & Frank Goldhammer
Comparability of response time scales in PISA
Hyo Jeong Shin, Emily Kerzabi, Seang-Hwane Joo, Frederic Robin & Kentaro Yamamoto
Using a multilevel random item Rasch model to examine item difficulty variance between random groups
Johannes Hartig, Carmen Köhler & Alexander Naumann
Abstract
In educational assessments, item difficulties are typically assumed to be invariant across groups (e.g., schools or countries). We refer to group-level variance in item difficulties, which violates this assumption, as random group differential item functioning (RG-DIF). We examine the performance of three methods for estimating RG-DIF: (1) three-level generalized linear mixed models (GLMMs), (2) three-level GLMMs with anchor items, and (3) item-wise multilevel logistic regression (ML-LR) controlling for the estimated trait score. In a simulation study, the magnitude of RG-DIF and the covariance of the item difficulties at the group level were varied. When group-level effects were independent, all three methods performed well. With correlated DIF, the estimated group-level variances were biased for the full three-level GLMM and for ML-LR, and this bias was more pronounced for ML-LR than for the full three-level GLMM. Using a three-level GLMM with anchor items allowed unbiased estimation of RG-DIF.
Keywords: Multilevel Rasch Model, Random Item Effects, Measurement Invariance
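As a reading aid, a minimal sketch of a random item Rasch model of the kind examined in this article (the notation is assumed here, not taken from the article): for person p in group g responding to item i,

\[ P(Y_{pig} = 1 \mid \theta_{pg}) = \frac{\exp(\theta_{pg} - \beta_i - u_{ig})}{1 + \exp(\theta_{pg} - \beta_i - u_{ig})}, \qquad u_{ig} \sim N(0, \sigma_i^2), \]

where \theta_{pg} is the person's trait level, \beta_i the overall item difficulty, and u_{ig} a group-specific difficulty deviation. RG-DIF corresponds to nonzero group-level variances \sigma_i^2 of these random item effects; correlated u_{ig} across items is the condition under which the article reports biased variance estimates for two of the three methods.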
Johannes Hartig, PhD
DIPF | Leibniz Institute for Research
and Information in Education
Rostocker Str. 6
60323 Frankfurt, Germany
hartig@dipf.de
Measurement invariance testing in questionnaires: A comparison of three Multigroup-CFA and IRT-based approaches
Janine Buchholz & Johannes Hartig
Abstract
International large-scale assessments aim to compare countries with respect to latent constructs such as attitudes, values, and beliefs. Measurement invariance (MI) needs to hold for such comparisons to be valid. Several statistical approaches for testing MI have been proposed: while multigroup confirmatory factor analysis (MGCFA) is particularly popular, a newer, IRT-based approach was introduced for non-cognitive constructs in PISA 2015, raising the question of consistency between these approaches. A total of three approaches (MGCFA for ordinal data, MGCFA for continuous data, and multi-group IRT) were applied to simulated data containing different types and extents of MI violations, and to the empirical non-cognitive PISA 2015 data. Analyses are based on indices of the magnitude of local misfit (i.e., parameter-specific modification indices from MGCFA and group-specific item fit statistics from the IRT approach) and of its direction (i.e., standardized parameter change and mean deviation, respectively). Results indicate that all measures were sensitive to (some) MI violations and that they were more consistent with one another in identifying group differences in item difficulty parameters.
Keywords: item response theory, item fit, confirmatory factor analysis, modification indices, PISA
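A schematic view of the invariance constraints at issue (a sketch with assumed notation): in a multi-group two-parameter IRT formulation, the probability that respondent j in group g endorses item i is

\[ P(X_{ijg} = 1 \mid \theta_{jg}) = \frac{1}{1 + \exp\left[-a_{ig}(\theta_{jg} - b_{ig})\right]}, \]

and measurement invariance requires a_{ig} = a_i and b_{ig} = b_i for all groups g. The modification indices (MGCFA) and group-specific item fit statistics (IRT) compared in the article quantify, in their respective metrics, how strongly the data depart from such equality constraints.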
Janine Buchholz, PhD
DIPF | Leibniz Institute for Research
and Information in Education
Rostocker Straße 6
60323 Frankfurt, Germany
buchholz@dipf.de
Improving measurement properties of the PISA home possessions scale through partial invariance modeling
Selene S. Lee & Matthias von Davier
Abstract
This paper analyzes the longitudinal and cross-country measurement invariance of the home possessions scale, one of the three components used to measure socioeconomic status (SES) in the Programme for International Student Assessment (PISA). It finds that most items in the scale are invariant over time but not across countries. A further finding is that multiple-group concurrent calibration with partial invariance makes the home possessions scale more comparable over time and across countries than the original home possessions scales, which had been generated separately in each cycle. Moreover, the new method based on the two-parameter logistic (2PL) model and the generalized partial credit model (GPCM) maintained, and in some cases even improved, the within-country accuracy of the scores, as indicated by an increased regression coefficient when the home possessions scores were used to predict cognitive proficiency scores in reading, math, and science.
Keywords: Measurement invariance; Home possessions (HOMEPOS) scale; Socioeconomic status (SES); Programme for International Student Assessment (PISA); International large-scale assessment (ILSA)
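As an illustration of the partial invariance approach (a sketch under assumed notation, not the operational PISA specification): under a generalized partial credit model, the probability that a respondent with latent level \theta answers polytomous home possession item i in category k is

\[ P(X_i = k \mid \theta) = \frac{\exp \sum_{v=1}^{k} a_i(\theta - b_{iv})}{\sum_{c=0}^{m_i} \exp \sum_{v=1}^{c} a_i(\theta - b_{iv})}, \qquad \sum_{v=1}^{0}(\cdot) \equiv 0. \]

Multiple-group concurrent calibration with partial invariance constrains a_i and b_{iv} to common international values for items found to be invariant and frees country-specific parameters only for items flagged as non-invariant in a given country.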
Selene S. Lee, PhD
Graduate School of Education
University of Pennsylvania
3700 Walnut Street
Philadelphia, PA 19104, USA
selene_lee@alumni.upenn.edu
PISA reading: Mode effects unveiled in short text responses
Fabian Zehner, Ulf Kroehne, Carolin Hahnel & Frank Goldhammer
Abstract
Educational large-scale assessments risk their temporal comparability when shifting from paper- to computer-based assessment. A recent study showed how text responses changed alongside PISA’s mode change, indicating mode effects. Uncertainty remained, however, because that study compared students from 2012 and 2015. We aimed to reproduce the findings in an experimental setting in which n = 836 students answered PISA reading questions on computer, on paper, or both. Text response features for information quantity and relevance were extracted automatically. Results show a comprehensive recovery of the original findings. Students incorporated more information into their text responses on computer than on paper, with some items being more affected than others. Regarding information relevance, we found less mode effect variance across items than the original study. Hints of a relationship between mode effect and gender across items could also be reproduced. The study demonstrates the stability of linguistic feature extraction from text responses.
Keywords: Computer-based assessment, paper-based assessment, open-ended text responses, mode effect, automatic processing
Fabian Zehner, PhD
DIPF | Leibniz Institute for Research
and Information in Education
Rostocker Str. 6
60323 Frankfurt am Main, Germany
fabian.zehner@dipf.de
Comparability of response time scales in PISA
Hyo Jeong Shin, Emily Kerzabi, Seang-Hwane Joo, Frederic Robin & Kentaro Yamamoto
Abstract
The primary goal of this study was to explore the possibility of establishing common response time (RT) scales in the Programme for International Student Assessment (PISA) across participating countries and economies. We use categorized item-level RTs, which afford improved handling of the non-lognormal RT distributions with outliers observed in PISA RT data and of missing data stemming from PISA’s complex rotated booklet design. Categorized RT data were first analyzed using unidimensional multiple-group item response theory (IRT) models assuming a single latent trait underlying the RT data. Due to systematic patterns of misfit, the RT data were then analyzed using multidimensional multiple-group IRT models, in which RT scales were allowed to vary by item properties, specifically by item type or cognitive demand. Results indicate that PISA RT scales are multidimensional by item type (multiple-choice versus constructed-response). The present study has implications for analytical procedures involving RT in international large-scale assessments.
Keywords: Response time, process data, large-scale assessment, Programme for International Student Assessment (PISA), comparability, dimensionality, measurement invariance
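One way to picture the categorized-RT approach (a sketch under assumed notation, not the operational analysis): the response time t_{ij} of respondent j on item i is first mapped to an ordered category via item-specific cut points,

\[ K_{ij} = k \quad \text{if} \quad \tau_{i,k-1} \le \log t_{ij} < \tau_{i,k}, \]

and the resulting ordinal variables are then modeled with multiple-group polytomous IRT models in which the latent speed dimension is either common to all items (unidimensional) or allowed to differ between multiple-choice and constructed-response items (multidimensional by item type).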
Hyo Jeong Shin, PhD
Educational Testing Service
660 Rosedale Road, 13-E
Princeton, NJ 08541, USA
hshin@ets.org
Psychological Test and Assessment Modeling
Volume 62 · 2020 · Issue 1
Pabst, 2020
ISSN 2190-0493 (Print)
ISSN 2190-0507 (Internet)