Psychological Test and Assessment Modeling


Psychological Test and Assessment Modeling, Volume 55, 2013 (1)

A systematic review of the methodology for person fit research in Item Response Theory: Lessons about generalizability of inferences from the design of simulation studies
André A. Rupp
Abstract | PDF of the full article

Robustness and power of the parametric t test and the nonparametric Wilcoxon test under non-independence of observations
Wolfgang Wiedermann & Alexander von Eye
Abstract | PDF of the full article

The position effect in tests with a time limit: the consideration of interruption and working speed
Karl Schweizer & Xuezhu Ren
Abstract | PDF of the full article


Special topic:
Current issues in educational and psychological measurement: Design, calibration, and adaptive testing - Part II
Guest editors: Ulf Kröhne & Andreas Frey

Guest editorial
Ulf Kröhne & Andreas Frey
PDF of the full article

Effect of item order on item calibration and item bank construction for computer adaptive tests
Otto B. Walter & Matthias Rose
Abstract | PDF of the full article

Too hard, too easy, or just right? The relationship between effort or boredom and ability-difficulty fit
Regine Asseburg & Andreas Frey
Abstract | PDF of the full article

The sequential probability ratio test for multidimensional adaptive testing with between-item multidimensionality
Nicki-Nils Seitz & Andreas Frey
Abstract | PDF of the full article

 


A systematic review of the methodology for person fit research in Item Response Theory: Lessons about generalizability of inferences from the design of simulation studies
André A. Rupp

Abstract

This paper is a systematic review of the methodology for person fit research targeted specifically at methodologists in training. I analyze the ways in which researchers in the area of person fit have conducted simulation studies for parametric and nonparametric unidimensional IRT models since the seminal review paper by Meijer and Sijtsma (2001). I specifically review how researchers have operationalized different types of aberrant responding for particular testing conditions in order to compare these simulation design characteristics with features of the real-life testing situations for which person fit analyses are officially reported. I discuss the alignment between the theoretical and practical work and the implications for future simulation work and guidelines for best practice.
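As an illustrative sketch (not code from the article), one widely used person-fit measure in the unidimensional IRT literature the review covers is the standardized log-likelihood statistic l_z; a minimal implementation for the Rasch model, with illustrative item difficulties, might look like this:

```python
import math

def lz_person_fit(responses, theta, difficulties):
    """Standardized log-likelihood person-fit statistic l_z for the Rasch model.
    Large negative values indicate aberrant responding."""
    l0 = mean = var = 0.0
    for u, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # P(correct | theta, b)
        l0 += u * math.log(p) + (1 - u) * math.log(1 - p)
        mean += p * math.log(p) + (1 - p) * math.log(1 - p)
        var += p * (1 - p) * math.log(p / (1 - p)) ** 2
    return (l0 - mean) / math.sqrt(var)

# A Guttman-consistent pattern (correct on easy items, incorrect on hard ones)
# scores higher than the reversed, aberrant pattern.
b = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = lz_person_fit([1, 1, 1, 0, 0], 0.0, b)
aberrant = lz_person_fit([0, 0, 0, 1, 1], 0.0, b)
```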

Key words: Person fit, systematic review, aberrant responding, item response theory, simulation study, generalizability, experimental design.


André A. Rupp
Associate Professor
HDQM Department
EDMS Program
University of Maryland
1230-A Benjamin Building, College Park
MD 20742, USA
ruppandr@umd.edu



Robustness and power of the parametric t test and the nonparametric Wilcoxon test under non-independence of observations
Wolfgang Wiedermann & Alexander von Eye

Abstract

Much previous work has dealt with the robustness of parametric significance tests against non-normality, heteroscedasticity, or a combination of both. The behavior of tests under violations of the independence assumption has received comparatively little attention. In applications, researchers may therefore overlook that the robustness and power properties of tests can vary with the sign and magnitude of the correlation between samples. The common paired t test is known to be less powerful when between-group correlations are negative. In this case, Bortz and Schuster (2010) recommend the application of the nonparametric Wilcoxon test. Using Monte Carlo simulations, we analyzed the behavior of the t test and the Wilcoxon test for the one- and two-sample problems under various degrees of positive and negative correlation, population distributions, sample sizes, and true differences in location. It is shown that even minimal departures from independence heavily affect the Type I error rates of the two-sample tests. In addition, the results for the one-sample tests clearly suggest that the sign of the underlying correlation cannot be used as a basis for deciding whether to use the t test or the Wilcoxon test: both tests show a dramatic power loss when samples are negatively correlated. Finally, in these cases, the well-known power advantage of the Wilcoxon test diminishes when distributions are skewed and samples are small.
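The core Type I error phenomenon can be sketched in a few lines (an illustrative Monte Carlo check, not the authors' simulation design): two-sample tests that assume independence are applied to correlated bivariate normal samples, and the empirical rejection rate under a true null is recorded. The sample size, correlation values, and replication count below are assumptions chosen for speed.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

def type1_rate(rho, n=30, reps=2000, alpha=0.05, seed=0):
    """Empirical Type I error of the two-sample t test and the Mann-Whitney
    (two-sample Wilcoxon) test when the two samples are correlated with rho."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    t_rej = u_rej = 0
    for _ in range(reps):
        # correlated samples with identical means: the null hypothesis is true
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        t_rej += ttest_ind(x, y).pvalue < alpha
        u_rej += mannwhitneyu(x, y).pvalue < alpha
    return t_rej / reps, u_rej / reps

for rho in (-0.4, 0.0, 0.4):
    t_rate, u_rate = type1_rate(rho)
    print(f"rho={rho:+.1f}  t: {t_rate:.3f}  Mann-Whitney U: {u_rate:.3f}")
```

Negative correlation inflates the rejection rate above the nominal 5 percent level, while positive correlation makes both tests conservative, matching the abstract's point that even modest dependence distorts the two-sample tests.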

Key words: robustness, power, independence assumption, t test, Wilcoxon test


Wolfgang Wiedermann, PhD
University of Vienna
Unit of Research Methods
Liebiggasse 5
A-1010 Vienna, Austria
wolfgang.wiedermann@univie.ac.at



The position effect in tests with a time limit: the consideration of interruption and working speed
Karl Schweizer & Xuezhu Ren

Abstract

The position effect is a possible source of impairment of the structural validity of a test with respect to model fit. In tests with a time limit, the situation is further complicated by the decreasing number of participants who complete the last few items of the test. It is therefore assumed that an appropriate representation of the position effect must additionally consider interruption due to the time limit and the effect of working speed. Interruption can be represented by the same latent variable as the position effect, whereas the contribution of working speed requires an additional one. Confirmatory factor models including a representation of the position effect as a linear, quadratic, or logarithmic increase were compared with models that additionally considered interruption, either as a logistic decrease or as immediate interruption. Furthermore, there were models that additionally considered working speed. In a sample of 305 participants, the investigation of probability-based covariances showed that modeling interruption and also working speed substantially improved model fit. The best-fitting model was characterized by a linearly increasing representation of the position effect, combined with a logistic decrease in the more difficult items and a contribution due to working speed.
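The three candidate representations of the position effect amount to fixing the loadings of a position-effect factor as a function of item position. As an illustrative sketch (the 0-to-1 scaling is an assumption, not necessarily the authors' exact specification):

```python
import numpy as np

def position_loadings(k):
    """Fixed loadings of a position-effect factor over k item positions,
    scaled to run from 0 (first item) to 1 (last item)."""
    pos = np.arange(1, k + 1)
    linear = (pos - 1) / (k - 1)
    quadratic = linear ** 2                    # effect accelerates late in the test
    logarithmic = np.log(pos) / np.log(k)      # effect accumulates early, then flattens
    return linear, quadratic, logarithmic

lin, quad, log_ = position_loadings(20)
```

In a confirmatory factor model these vectors would be fixed (not estimated), so the competing models differ only in the assumed shape of the increase.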

Key words: position effect, confirmatory factor analysis, tau-equivalent model, method effect


Karl Schweizer, PhD
Department of Psychology
Goethe University Frankfurt
Mertonstr 17
60054 Frankfurt a. M., Germany
K.Schweizer@psych.uni-frankfurt.de



Effect of item order on item calibration and item bank construction for computer adaptive tests
Otto B. Walter & Matthias Rose

Abstract

Item banks are typically constructed from responses to items presented in one fixed order; order effects between subsequent items may therefore violate the independence assumption. We investigated the effect of item order on item bank construction, item calibration, and ability estimation. Fifteen polytomous items, similar to those used in a pilot version of a computer adaptive test for anxiety (Walter et al., 2005; Walter et al., 2007), were presented either in one fixed order or in an order randomly generated for each respondent. A total of n = 520 outpatients participated in the study. Item calibration (Generalized Partial Credit Model) yielded only small differences in slope and location parameters. Simulated test runs using either the full item bank or an adaptive algorithm produced very similar ability estimates (expected a posteriori estimation). These results indicate that item order had little impact on item calibration and ability estimation for this item set.
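The expected a posteriori (EAP) estimation mentioned above can be sketched on a quadrature grid. This is an illustrative stand-in only: it uses a dichotomous 2PL model rather than the polytomous Generalized Partial Credit Model of the study, and the item parameters are invented.

```python
import numpy as np

def eap_theta(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate for dichotomous 2PL responses,
    using a standard normal prior on a quadrature grid."""
    u = np.asarray(responses, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # P(correct) at each grid point (rows) for each item (columns)
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))
    likelihood = np.prod(np.where(u > 0, p, 1.0 - p), axis=1)
    posterior = likelihood * np.exp(-0.5 * grid ** 2)  # prior * likelihood
    return float(np.sum(grid * posterior) / np.sum(posterior))

# Illustrative parameters: three items of increasing difficulty
a_params, b_params = [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]
print(eap_theta([1, 1, 1], a_params, b_params))  # positive estimate
print(eap_theta([0, 0, 0], a_params, b_params))  # negative estimate
```

The same estimator is used whether the responses come from a fixed or a randomized presentation order, which is what makes the comparison in the study possible.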

Key words: item response theory; computer adaptive testing; local independence; item bank construction


Otto B. Walter, PhD
Universität Bielefeld
Fakultät für Psychologie und Sportwissenschaft
AE Psychologische Methodenlehre und Qualitätssicherung
Postfach 100131
33501 Bielefeld, Germany
otto.walter@charite.de



Too hard, too easy, or just right? The relationship between effort or boredom and ability-difficulty fit
Regine Asseburg & Andreas Frey

Abstract

Usually, it is assumed that achievement tests measure maximum performance. However, test performance is not only associated with ability but also with motivational and emotional aspects of test-taking. These aspects are influenced by individual success probability, which in turn depends on the ratio of individual ability to item difficulty (ability-difficulty fit). The impact of ability-difficulty fit on test-taking motivation and emotion is unknown and rarely considered when interpreting test results.
N = 9,452 ninth-graders in Germany (PISA 2006) completed a mathematics test and a questionnaire on test-taking effort (motivation) and boredom/daydreaming (emotion). Overall, mean item difficulty exceeded individual ability. Ability-difficulty fit was positively and linearly related to effort and boredom/daydreaming.
The results suggest that low-ability students may not show maximum performance in a sequential achievement test. Thus, test score interpretations for this subsample may be invalid. As a solution to this problem, the application of computerized adaptive testing is discussed.

Key words: achievement test, test-taking, effort, boredom, performance


Regine Asseburg, PhD
Leibniz Institute for Science and Mathematics
Education at the University of Kiel (IPN)
Germany
asseburg@ipn.uni-kiel.de



The sequential probability ratio test for multidimensional adaptive testing with between-item multidimensionality
Nicki-Nils Seitz & Andreas Frey

Abstract

It is examined whether the unidimensional Sequential Probability Ratio Test (SPRT) can be productively combined with multidimensional adaptive testing (MAT). In a simulation study, it is investigated whether this combination results in more accurate simultaneous classifications on two or three dimensions compared to several instances of unidimensional adaptive testing (UCAT) in combination with the SPRT. The number of cut scores and the correlation between the dimensions measured were varied. The average test length was mainly influenced by the number of cut scores (one, four) and the adaptive algorithm (MAT, UCAT). With MAT, a lower average test length was achieved than with the UCAT. It is concluded that MAT will result in a higher percentage of correct classifications than UCAT when more than two dimensions are measured.
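The unidimensional SPRT for a single cut score can be sketched as follows (an illustrative implementation under a Rasch model, not the authors' code; the indifference-region half-width delta and the error rates are assumed values):

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sprt_classify(responses, difficulties, cut=0.0, delta=0.5,
                  alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: theta = cut - delta vs H1: theta = cut + delta.
    Processes responses sequentially and stops as soon as the accumulated
    log-likelihood ratio crosses a decision bound.
    Returns ('above'/'below'/'undecided', number of items used)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = 0.0
    for k, (u, b) in enumerate(zip(responses, difficulties), start=1):
        p1 = rasch_p(cut + delta, b)  # response probability under H1
        p0 = rasch_p(cut - delta, b)  # response probability under H0
        llr += math.log(p1 / p0) if u else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "above", k
        if llr <= lower:
            return "below", k
    return "undecided", len(responses)

# A consistently correct examinee is classified above the cut after few items.
decision, used = sprt_classify([1] * 20, [0.0] * 20)
```

In the multidimensional setting studied in the article, one such test runs per dimension (or per cut score), and the gain from MAT comes from item selection that exploits the correlations between dimensions.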

Key words: classification, computerized adaptive testing, item response theory, multidimensional adaptive testing, sequential probability ratio test


Nicki-Nils Seitz
Institute of Educational Science
Department of Research Methods in Education
Friedrich-Schiller-University Jena
Am Planetarium 4
07737 Jena, Germany
nicki-nils.seitz@uni-jena.de


