



 






Psychological Test and Assessment Modeling

2016-4

SPECIAL ISSUE: CURRENT METHODOLOGICAL ISSUES IN EDUCATIONAL LARGE-SCALE ASSESSMENTS – PART I
GUEST EDITORS: MATTHIAS STADLER, SAMUEL GREIFF & SABINE KROLAK-SCHWERDT




The detection of heteroscedasticity in regression models for psychological data
Andreas G. Klein, Carla Gerhard, Rebecca D. Büchner, Stefan Diestel & Karin Schermelleh-Engel
Abstract | PDF of the full article

Special Issue:
Current Methodological Issues in Educational Large-Scale Assessments – Part I

Guest editors: Matthias Stadler, Samuel Greiff & Sabine Krolak-Schwerdt

Guest Editorial
Matthias Stadler, Samuel Greiff & Sabine Krolak-Schwerdt
PDF of the full article

The transition to computer-based testing in large-scale assessments: Investigating (partial) measurement invariance between modes
Sarah Buerger, Ulf Kroehne & Frank Goldhammer
Abstract | PDF of the full article

Differentiated assessment of mathematical competence with multidimensional adaptive testing
Anna Mikolajetz & Andreas Frey
Abstract | PDF of the full article

Modeling test context effects in longitudinal achievement data: Examining position effects in the longitudinal German PISA 2012 assessment
Gabriel Nagy, Oliver Lüdtke & Olaf Köller
Abstract | PDF of the full article

Using response time data to inform the coding of omitted responses
Jonathan P. Weeks, Matthias von Davier & Kentaro Yamamoto
Abstract | PDF of the full article

 


The detection of heteroscedasticity in regression models for psychological data
Andreas G. Klein, Carla Gerhard, Rebecca D. Büchner, Stefan Diestel & Karin Schermelleh-Engel

Abstract

One assumption of multiple regression analysis is homoscedasticity of errors. Heteroscedasticity, as often found in psychological or behavioral data, may result from misspecification due to overlooked nonlinear predictor terms or to unobserved predictors not included in the model. Although methods exist to test for heteroscedasticity, they require a parametric model for specifying the structure of heteroscedasticity. The aim of this article is to propose a simple measure of heteroscedasticity, which does not need a parametric model and is able to detect omitted nonlinear terms. This measure utilizes the dispersion of the squared regression residuals. Simulation studies show that the measure performs satisfactorily with regard to Type I error rates and power when sample size and effect size are large enough. It outperforms the Breusch-Pagan test when a nonlinear term is omitted in the analysis model. We also demonstrate the performance of the measure using a data set from industrial psychology.
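The authors' dispersion-based measure itself is not reproduced here, but the Breusch-Pagan test that serves as their comparison benchmark is easy to sketch. The following Python snippet is a minimal illustration (assuming NumPy; the function name and the simulated data are our own) of the studentized (Koenker) variant, which regresses the squared OLS residuals on the predictors and uses LM = n·R² of that auxiliary regression as the test statistic:

```python
import numpy as np

def breusch_pagan_lm(X, y):
    """Studentized Breusch-Pagan statistic: regress the squared OLS
    residuals on the predictors; LM = n * R^2 of that auxiliary
    regression, asymptotically chi-square with (k - 1) df for
    k columns of X (including the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid2 = (y - X @ beta) ** 2
    gamma, *_ = np.linalg.lstsq(X, resid2, rcond=None)
    ss_res = np.sum((resid2 - X @ gamma) ** 2)
    ss_tot = np.sum((resid2 - resid2.mean()) ** 2)
    return len(y) * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 3.0, n)
X = np.column_stack([np.ones(n), x])
y_hom = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)         # constant error variance
y_het = 1.0 + 0.5 * x + rng.normal(size=n) * (0.3 + 0.6 * x)  # error variance grows with x

lm_hom = breusch_pagan_lm(X, y_hom)
lm_het = breusch_pagan_lm(X, y_het)
# With 1 df, values above roughly 3.84 are significant at the 5% level.
```

Note that this parametric test can only detect heteroscedasticity whose form is captured by the auxiliary regression; a variance pattern induced by an omitted quadratic term that is symmetric in the predictor, for instance, can go unnoticed, which is the scenario the abstract's nonparametric measure targets.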

Keywords: Heteroscedasticity, Monte Carlo study, Regression, Interaction effect, Quadratic effect


Prof. Dr. Andreas G. Klein
Department of Psychology
Goethe University Frankfurt
Theodor-W.-Adorno-Platz 6
60629 Frankfurt, Germany
klein@psych.uni-frankfurt.de



The transition to computer-based testing in large-scale assessments: Investigating (partial) measurement invariance between modes
Sarah Buerger, Ulf Kroehne & Frank Goldhammer

Abstract

This paper provides an overview and recommendations on how to conduct a mode effect study in large-scale assessments by addressing criteria of equivalence between paper-based and computer-based tests. These criteria are selected according to the intended use of test scores and test score interpretations. A mode effect study can be implemented using experimental designs. The major benefit of combining experimental design considerations with the IRT methodology of mode effects is the possibility of investigating partial measurement invariance. This allows test scores from different modes to be used interchangeably, and means, mean differences, and correlations of latent variables to be compared at the population level even if some items differ in difficulty between modes. For this purpose, a multiple-group IRT model approach for analyzing mode effects at the test and item levels is presented. Instances where partial measurement invariance suffices to combine item parameters into one metric are reviewed in this paper. Furthermore, relevant study design requirements and potential sources of mode effects are discussed. Finally, an extension of the modelling approach that explains mode effects by means of item properties such as response format is presented.

Keywords: mode effect, equivalence, computer-based assessment, partial measurement invariance, anchor items


Sarah Buerger, PhD
German Institute for International Educational Research (DIPF)
Solmsstraße 73-75
60486 Frankfurt am Main, Germany
buerger@dipf.de



Differentiated assessment of mathematical competence with multidimensional adaptive testing
Anna Mikolajetz & Andreas Frey

Abstract

The theoretical frameworks of large-scale assessments (LSAs) typically describe complex competence constructs. However, due to restrictions in testing time, the complexity of these competence constructs is often reduced to one or a small number of dimensions in operational LSAs. Because of its very high measurement efficiency, multidimensional adaptive testing (MAT) offers a solution to overcome this shortcoming. The present study demonstrates the capability of MAT to measure the 11 subdimensions of mathematical competence that are described in the theoretical framework of the German Educational Standards in Mathematics with sufficient precision without increasing test length. The characteristics of an empirically derived 11-dimensional competence distribution of 9,577 students and the parameters of 253 operational test items were used to simulate the application of MAT. Typical restrictions, such as the use of testlets and the presence of open-response items in the item pool, were taken into account in the simulation. Although the item pool used was not constructed for adaptive testing, the results show substantially higher reliability estimates for MAT compared to non-adaptive testing, especially for the subdimensions of mathematical competence that are not yet reported in the assessment. The results underscore the capacity of MAT to precisely measure competence constructs with many dimensions without the need to increase test length. This research therefore closes the current gap between theoretical underpinnings and actual measures in LSAs.
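The measurement-efficiency gain of adaptive testing comes from always administering the most informative remaining item at the current ability estimate. As a hedged illustration (unidimensional rather than the article's 11-dimensional case; the function names and the item pool are invented), maximum-information item selection under a Rasch model can be sketched as:

```python
import numpy as np

def p_rasch(theta, b):
    """Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def fisher_info(theta, b):
    """Item information under the Rasch model: p * (1 - p),
    maximal when difficulty b matches ability theta."""
    p = p_rasch(theta, b)
    return p * (1.0 - p)

def select_item(theta_hat, b, administered):
    """Pick the unadministered item with maximum information at theta_hat."""
    info = fisher_info(theta_hat, b)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))

pool_b = np.array([-1.5, -0.5, 0.0, 0.7, 1.8])  # hypothetical item difficulties
first = select_item(0.0, pool_b, set())         # item whose difficulty is closest to theta_hat
```

In MAT proper, the scalar item information is replaced by the Fisher information matrix of the multidimensional model, and items are selected by a scalar criterion on that matrix (e.g., its determinant, D-optimality), which is what lets precision accumulate across correlated subdimensions.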

Keywords: computerized adaptive testing, item response theory, multidimensional IRT models, large-scale assessment


Anna Mikolajetz, Dipl.-Psych.
Friedrich Schiller University Jena
Department of Methodology and Evaluation Research
Am Steiger 3
07737 Jena, Germany
anna.mikolajetz@uni-jena.de



Modeling test context effects in longitudinal achievement data: Examining position effects in the longitudinal German PISA 2012 assessment
Gabriel Nagy, Oliver Lüdtke & Olaf Köller

Abstract

Position effects (PEs) in school achievement tests are a specific kind of test context effect (TCE); they refer to the phenomenon of items becoming more difficult the later they are positioned in a test. To date, PEs have been investigated mainly in cross-sectional settings, which means that little is known about how the size of PEs changes when students are retested. In the present article, we investigate TCEs in the longitudinal extension of the PISA 2012 assessment in Germany. To this end, we propose an extension of the two-dimensional one-parameter item response model, with one dimension per measurement occasion, that includes effects of booklets (i.e., test forms) on item clusters (i.e., item bundles) that are allowed to vary between assessment occasions and groups (school types). Results indicate that the TCEs uncovered in all tested domains (mathematics, science, and reading) are closely in line with PEs, with reading most strongly and mathematics least affected. The size of PEs increased at the second assessment, although the domains were affected differently. This pattern of effects was more pronounced in nonacademic school types. Finally, average achievement gains appeared to be underestimated by IRT models that neglected TCEs, with differences being largest in the domains most strongly affected by PEs (i.e., science and reading).
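The core idea — effective item difficulty growing with an item's position — can be written down directly. A minimal sketch (our own illustration; the linear effect size delta = 0.15 is an invented value, not an estimate from the article):

```python
import numpy as np

def p_correct(theta, b, delta, pos):
    """1PL (Rasch) response probability with a linear position effect:
    effective difficulty is b + delta * pos, so an item becomes
    harder the later it appears in the test."""
    return 1.0 / (1.0 + np.exp(-(theta - (b + delta * pos))))

# An item of difficulty 0 answered by an average student (theta = 0):
early = p_correct(0.0, 0.0, 0.15, 0)  # first cluster position
late = p_correct(0.0, 0.0, 0.15, 3)   # fourth cluster position
```

An IRT model that omits delta attributes the position-induced drop in solution probability to ability itself, which is the mechanism behind the underestimated achievement gains reported in the abstract; in the article's model the effect additionally varies by booklet, item cluster, occasion, and school type rather than being a single scalar.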

Keywords: PISA, Item Response Theory, Test Context Effects, Position Effects, Achievement Growth


Prof. Dr. Gabriel Nagy
Leibniz Institute for Science and Mathematics Education
Olshausenstraße 62
24118 Kiel, Germany
nagy@ipn.uni-kiel.de



Using response time data to inform the coding of omitted responses
Jonathan P. Weeks, Matthias von Davier & Kentaro Yamamoto

Abstract

Examinees may omit responses on a test for a variety of reasons, such as low ability, low motivation, lack of attention, or running out of time. Some decision must be made about how to treat these missing responses for the purpose of scoring and/or scaling the test, particularly if there is an indication that missingness is not skill related. The most common approaches are to treat the responses as either not reached/administered or incorrect. Depending on the total number of missing values, coding all omitted responses as incorrect is likely to introduce negative bias into estimates of item difficulty and examinee ability. On the other hand, if omitted responses are coded as not reached and excluded from the likelihood function, the precision of estimates of item and person parameters will be reduced. This study examines the use of response time information collected in many computer-based assessments to inform the coding of omitted responses. Empirical data from the Programme for the International Assessment of Adult Competencies (PIAAC) literacy and numeracy cognitive tests are used to identify item-specific timing thresholds via several logistic regression models that predict the propensity of responding rather than producing a missing data point. These thresholds can be used to inform the decision about whether an omitted response should be treated as not administered or as incorrect. The results suggest that for many items the timing thresholds (20 to 30 seconds on average) at a high expected probability level of observing a response are notably higher than thresholds used in the evaluation of rapid guessing of responses (e.g., 5 seconds).
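The threshold idea can be sketched with a logistic regression of response propensity on log response time: the time at which the fitted propensity reaches a target level (0.9 below) serves as the item's timing threshold. This is our own minimal Python illustration with invented simulated data (Newton-Raphson fit, NumPy only), not the authors' PIAAC models:

```python
import numpy as np

def fit_logistic(t, r, iters=25):
    """Newton-Raphson fit of P(response) = sigmoid(b0 + b1 * log t)."""
    X = np.column_stack([np.ones_like(t), np.log(t)])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (r - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        w += np.linalg.solve(hess, grad)
    return w

def timing_threshold(w, target=0.9):
    """Time (seconds) at which the fitted response propensity reaches target."""
    logit = np.log(target / (1.0 - target))
    return float(np.exp((logit - w[0]) / w[1]))

# Simulated data: responding becomes more likely the longer an examinee works.
rng = np.random.default_rng(7)
t = rng.lognormal(mean=2.5, sigma=0.8, size=2000)          # response times in seconds
p_true = 1.0 / (1.0 + np.exp(-(-4.0 + 1.6 * np.log(t))))   # true response propensity
r = (rng.random(2000) < p_true).astype(float)              # 1 = response, 0 = omitted

w = fit_logistic(t, r)
threshold = timing_threshold(w, target=0.9)
```

Because the fitted curve is monotone in time, a high-propensity threshold such as timing_threshold(w, 0.9) necessarily sits well above the short fixed cutoffs (around 5 seconds) used to flag rapid guessing, which is the comparison the abstract draws.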

Keywords: Response time data, omitted responses, timing thresholds


Jonathan P. Weeks, PhD
Educational Testing Service
225 Phillips Boulevard
Princeton, NJ 08628, United States
jweeks@ets.org










