An investigation into the usefulness of time-efficient item selection in computerized adaptive testing
Birk Diedenhofen & Jochen Musch

Measuring adolescents’ social goals during lower secondary school
Hanna-Riitta Ståhl, Niina Junttila & Päivi M. Niemi

Special Topic:
Advances in Educational Measurement (Part II)
Guest editors: Andreas Frey, Christoph König & Christian Spoden

Guest Editorial
Andreas Frey, Christoph König & Christian Spoden

A continuous calibration strategy for computerized adaptive testing
Aron Fink, Sebastian Born, Christian Spoden & Andreas Frey

Introducing multistage adaptive testing into international large-scale assessments designs using the example of PIAAC
Kentaro Yamamoto, Lale Khorramdel & Hyo Jeong Shin

The impact of ignoring the partially compensatory relation between ability dimensions on norm-referenced test scores
Janine Buchholz & Johannes Hartig

An investigation into the usefulness of time-efficient item selection in computerized adaptive testing
Birk Diedenhofen & Jochen Musch

Abstract
In computerized adaptive testing (CAT), item-selection algorithms generally attempt to maximize the information provided by each item. However, response times are usually ignored. To improve time efficiency, we established new time-efficient item-selection algorithms that maximize the information collected in a given amount of time. Simulations with 2PL data from the Amsterdam Chess Test (van der Maas & Wagenmakers, 2005) showed that time-efficient item-selection algorithms are indeed able to collect more information in the same amount of time. However, the gains in the amount of information turned out to be rather modest and came at the cost of an increase in measurement bias. For testing practice, our results suggest that item selection based on maximum information can and should be retained as the gold standard in CAT.
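
The contrast between the two selection rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact criterion, so the sketch assumes that "time-efficient" means maximizing 2PL item information divided by a fixed expected response time per item. The function names, the item tuple layout, and the example parameter values are all hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_item(theta, items, time_efficient=False):
    """Return the index of the item with the highest selection score.

    items: list of (a, b, expected_seconds) tuples (hypothetical layout).
    With time_efficient=False the score is the item information;
    with time_efficient=True it is information per expected second
    (an assumed criterion, not taken from the article).
    """
    def score(item):
        a, b, seconds = item
        info = item_information(theta, a, b)
        return info / seconds if time_efficient else info

    return max(range(len(items)), key=lambda i: score(items[i]))


# Made-up items: a highly informative but slow item and a weaker but fast one.
items = [(1.5, 0.0, 60.0), (1.0, 0.0, 20.0)]
print(select_item(0.0, items))                       # maximum-information rule
print(select_item(0.0, items, time_efficient=True))  # information-per-time rule
```

In this toy pool the maximum-information rule picks the slow, highly discriminating item, while the information-per-time rule prefers the faster one. A real CAT would also update the ability estimate after each response and would need a response-time model rather than fixed expected times.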

Keywords: computerized adaptive testing, item selection, time efficiency, response time, item response theory

Birk Diedenhofen
Department of Experimental Psychology,
University of Duesseldorf
Universitaetsstrasse 1
Building 23.03
40225 Duesseldorf, Germany
birk.diedenhofen@uni-duesseldorf.de

Measuring adolescents’ social goals during lower secondary school
Hanna-Riitta Ståhl, Niina Junttila & Päivi M. Niemi

Abstract
The purpose of this study was to investigate Finnish adolescents’ (n = 390) social goals during three years of lower secondary school at six time points (from age 12 to age 16). We intended to study the measurement validity and longitudinal stability of adolescents’ social goals as measured by the Interpersonal Goals Inventory for Children (IGI-C), developed by Ojanen, Grönroos and Salmivalli (2005). The interpersonal circumplex model is based on two pairs of factors: (1) agency and submission and (2) communion and separation. We aimed to test whether the phenomenon of social goals could be captured as individual factors using these four qualities instead of the standard two broader dimensions, Agentic and Communal. These dimensions are usually divided into eight sub-scales according to different combinations. This hypothesized four-factor model was modeled and confirmed using longitudinal confirmatory factor analysis (LCFA). According to the LCFA, the stability within each factor was at least moderate, and the interrelations between the factors varied over time. Acceptable concurrent and discriminant validity was shown by mostly stronger correlations within the social goal sum scores than between the social goals and social anxiety scores. Compared to the original IGI-C measurement tool, the tool utilized in this study, the Scale of Interpersonal Goals for Adolescents (SIG-A), provides a simpler measurement. This simplified measurement offers a new way to examine adolescents’ social goals in terms of four separate factors. Moreover, with this measurement tool, it is possible to study the social development of adolescents in a more detailed manner, one social goal at a time.

Keywords: social goals, interpersonal goals, measurement validity, longitudinal stability, adolescence

Hanna-Riitta Ståhl
Department of Teacher Education
University of Turku
FI-20014 Turku, Finland
hrlaak@utu.fi

A continuous calibration strategy for computerized adaptive testing
Aron Fink, Sebastian Born, Christian Spoden & Andreas Frey

Abstract
This paper presents a new continuous calibration strategy for using computerized adaptive testing in application areas where it is not feasible to conduct a separate calibration study and/or to construct the complete item pool before the operational phase of the test. This method enables a step-by-step build-up of the item pool across several test cycles. A combination of equating and linking is used to maintain the scale across these cycles. A simulation study was carried out to investigate the performance of the strategy regarding the precision of the ability estimates. The simulation study is based on a full factorial design with the factors IRT model, sample size and number of new uncalibrated items added to the item pool per test cycle. Precision of the ability estimates increased over the test cycles in all conditions. For the 2PL model, a better performance was reached when using a lower number of new uncalibrated items. The results support the application of the new method, especially with small sample sizes.

Keywords: computerized adaptive testing, item response theory, online calibration, item banks, test design

Aron Fink
Institute of Educational Science
Department of Research Methods in Education
Friedrich Schiller University Jena
Am Planetarium 4
07743 Jena, Germany
aron.fink@uni-jena.de

Introducing multistage adaptive testing into international large-scale assessments designs using the example of PIAAC
Kentaro Yamamoto, Lale Khorramdel & Hyo Jeong Shin

Abstract
PIAAC is one of the first international large-scale assessments that implemented a multistage adaptive testing (MST) design. The design consists of multiple layers of adaptation to administer the most relevant and efficient set of questions based on the estimated proficiency of respondents. The benefits of the MST design were evaluated in terms of the comparability of item parameters across countries and the test efficiency. To assess the comparability across countries, item-by-country interactions were examined using item response theory (IRT) models. The efficiency of the MST design was calculated and compared to a nonadaptive design with a fixed item format. Moreover, possible effects of the position of item sets on item difficulty, which would present a problem for implementing MST, were examined. Results show a higher test efficiency in the MST design, only small item position effects and a high comparability of item parameters across different countries and languages.

Keywords: multistage adaptive testing, IRT, item position, test efficiency, measurement invariance

Kentaro Yamamoto
Educational Testing Service
Research and Development
Center for Global Assessment
660 Rosedale Road
Princeton, NJ 08541, USA
kyamamoto@ets.org

The impact of ignoring the partially compensatory relation between ability dimensions on norm-referenced test scores
Janine Buchholz & Johannes Hartig

Abstract
The IRT models most commonly employed to estimate within-item multidimensionality are compensatory and suggest that some dimensions (e.g., traits or abilities) can make up for a lack in others. However, many assessment frameworks in educational large-scale assessments suggest partially compensatory relations among dimensions. In two Monte-Carlo simulation studies we varied the loading pattern, the latent correlation between dimensions and the ability distribution to evaluate the impact on test scores when a compensatory model is incorrectly applied to partially compensatory data. Findings imply only negligible effects when true abilities are bivariate normal. Assuming a uniform distribution, however, analyses of differences in test scores demonstrated systematic effects for specific patterns of true ability: High abilities are largely underestimated when the other ability required to solve some of the items was low. These findings highlight the necessity of applying the partially compensatory model under data conditions likely to occur in educational large-scale assessments.
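
The difference between the two model families can be made concrete with a small Python sketch. The abstract does not give the authors' parametrization, so the sketch assumes two common textbook forms: a compensatory model with a single logistic function of a weighted sum of abilities, and a partially compensatory model as a product of one logistic term per dimension. All function names and parameter values are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_compensatory(thetas, loadings, intercept):
    """Compensatory form (assumed): abilities are summed before the link,
    so a high ability on one dimension can offset a low one on another."""
    return logistic(sum(a * t for a, t in zip(loadings, thetas)) + intercept)


def p_partially_compensatory(thetas, loadings, difficulties):
    """Partially compensatory form (assumed): one logistic term per dimension,
    multiplied, so a low ability on any dimension caps the success probability."""
    p = 1.0
    for t, a, b in zip(thetas, loadings, difficulties):
        p *= logistic(a * (t - b))
    return p


# Made-up respondent: very high on the first ability, very low on the second.
thetas = (3.0, -3.0)
print(p_compensatory(thetas, (1.0, 1.0), 0.0))
print(p_partially_compensatory(thetas, (1.0, 1.0), (0.0, 0.0)))
```

For this respondent the compensatory form gives a success probability of 0.5, while the product form gives a probability below 0.05. This is the kind of ability pattern for which the abstract reports the largest discrepancies in test scores.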

Keywords: educational testing, test interpretation, testing programs, Monte Carlo methods, validity

Janine Buchholz
German Institute for International Educational Research (DIPF)
Frankfurt am Main
Schloßstraße 29
60486 Frankfurt, Germany
buchholz@dipf.de

Psychological Test and Assessment Modeling
Volume 60 · 2018 · Issue 3
Pabst, 2018
ISSN 2190-0493 (Print)
ISSN 2190-0507 (Internet)
Price: €15