GMX: Extended Graphical Model Checks. A Versatile Replacement of the plotGOF() Function of eRm
Rainer W. Alexandrowicz
Exploring Rater Quality in Rater-Mediated Assessment Using the Non-parametric Item Characteristic Curve Estimation
Farshad Effatpanah, Purya Baghaei
A hierarchical model for data showing an item-position effect
Karl Schweizer, Andreas Gold, and Dorothea Krampen
Meeting the Challenge of Assessing (Students') Text Quality: Are There Any Experts Teachers Can Learn from or Do We Face a More Fundamental Problem?
Ann-Kathrin Hennes, Barbara Maria Schmidt, Takuya Yanagida, Igor Osipov, Christian Rietz, Alfred Schabmann
Response time as an indicator of test-taking effort in PISA: country and item-type differences
Michalis P. Michaelides, Militsa Ivanova
In memoriam Kurt Pawlik (1934-2022)
Gebhard Sammer
GMX: Extended Graphical Model Checks. A Versatile Replacement of the plotGOF() Function of eRm
Rainer W. Alexandrowicz
The article introduces the R-package GMX, which extends the standard graphical model check of the eRm package. It supports the Rasch model, the PCM, and the RSM, providing multiple group splits and options for selecting items, split groups, or specific parameters. Along with several graphical features, the package may prove useful for psychometric analyses, extending the capabilities of eRm. It is freely available at osf.io/2ryd8.
Keywords: Rasch model, graphical model check, conditional maximum likelihood, multi-group split, R-package
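As a point of reference, the following minimal R sketch shows the standard two-group graphical model check in eRm that GMX generalizes; raschdat1 is an example data set shipped with eRm, and the GMX interface itself is not reproduced here (see osf.io/2ryd8).

library(eRm)                             # CML estimation and graphical model check
res <- RM(raschdat1)                     # fit the Rasch model to the eRm example data
lr  <- LRtest(res, splitcr = "median")   # Andersen LR test with a median raw-score split
plotGOF(lr, conf = list())               # two-group item-parameter plot with confidence ellipses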
Rainer W. Alexandrowicz
University of Klagenfurt
Institute of Psychology
Methods Department
Universitaetsstrasse 67–69
9020 Klagenfurt, Austria
rainer.alexandrowicz@aau.at
Exploring Rater Quality in Rater-Mediated Assessment Using the Non-parametric Item Characteristic Curve Estimation
Farshad Effatpanah & Purya Baghaei
A large number of researchers have explored the use of non-parametric item response theory (IRT) models, including Mokken scale analysis (Mokken, 1971), for inspecting rating quality in the context of performance assessment. Unlike parametric IRT models, such as the Many-Facet Rasch Model (Linacre, 1989), non-parametric IRT models neither entail logistic transformations of ordinal ratings into interval scales nor impose any constraints on the form of item response functions. A disregarded method for examining raters' scoring patterns is non-parametric item characteristic curve estimation using a kernel smoothing approach (Ramsay, 1991), which provides, without giving numerical values, graphical representations for identifying unsystematic patterns across various levels of the latent trait. The purpose of this study is to use the non-parametric item characteristic curve estimation method for modeling and examining the scoring patterns of raters. To this end, the writing performance of 217 English as a foreign language (EFL) examinees was analyzed. Rater characteristic curves, tetrahedron simplex plots, QQ-plots, and kernel density functions across gender sub-groups showed that the exploratory plots derived from the non-parametric estimation of item characteristic curves using a kernel smoothing approach can identify various rater effects and provide valuable diagnostic information for examining rating quality and exploring rating patterns, although the interpretation of some graphs is subjective. The implications of the findings for rater training and monitoring are discussed.
Keywords: Non-parametric estimation of item characteristic curves, kernel smoothing, rating patterns, scoring validity
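The core idea of the kernel smoothing approach can be illustrated in a few lines of R. The sketch below is purely illustrative (it is not the implementation used in the study, and all object names are hypothetical): examinees are ranked by their total score, the ranks are mapped to normal quantiles as a proficiency proxy, and a rater's scores are smoothed over a grid of proficiency values with a Gaussian kernel (a Nadaraya-Watson estimator).

# Illustrative only: kernel-smoothed characteristic curve for one rater
kernel_icc <- function(scores, total_scores, bandwidth = 0.3,
                       grid = seq(-3, 3, length.out = 61)) {
  n     <- length(total_scores)
  theta <- qnorm(rank(total_scores, ties.method = "average") / (n + 1))  # proficiency proxy
  sapply(grid, function(t) {
    w <- dnorm((theta - t) / bandwidth)   # Gaussian kernel weights around grid point t
    sum(w * scores) / sum(w)              # locally weighted mean score at t
  })
}

# Example call (hypothetical ratings matrix: examinees x raters):
# icc_rater1 <- kernel_icc(ratings[, 1], rowSums(ratings))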
Purya Baghaei
Islamic Azad University
Ostad Yusofi St. 9187147578,
Mashhad, Iran.
pbaghaei@mshdiau.ac.ir
A hierarchical model for data showing an item-position effect
Karl Schweizer, Andreas Gold & Dorothea Krampen
A hierarchical position-effect model for investigating whether data show an item-position effect is presented. The model includes a hierarchy of latent variables for representing such an effect: several lower-level latent variables, each associated with a subset of items originating from the segmentation of the item set, constitute the first level of the hierarchy, while the second level includes only the general position-effect latent variable. The model is proposed for situations where an item-position effect is not exactly monotonically increasing, as the customary position-effect model assumes. In an application to real data, model fit varied depending on the specification of the hierarchical model. The comparison with the customary position-effect model yielded similar outcomes, but the best model fit was achieved by the hierarchical position-effect model with linear effect specifications at both levels.
Keywords: item-position effect, hierarchical position-effect model, customary position-effect model, method effect, structural investigation
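To make the model structure concrete, the following R/lavaan sketch shows one possible fixed-links specification of this kind. It is an illustration under assumptions, not the article's exact model: twelve hypothetical items v1 to v12 are split into three consecutive subsets of four, the loading values encode a linear increase at both levels, and an ability factor with constant loadings is assumed to be orthogonal to the position effect.

library(lavaan)

model <- '
  # ability factor: constant loadings across all items
  ability =~ 1*v1 + 1*v2 + 1*v3 + 1*v4 + 1*v5 + 1*v6 +
             1*v7 + 1*v8 + 1*v9 + 1*v10 + 1*v11 + 1*v12

  # first level: position factors with linearly increasing fixed loadings per subset
  pos1 =~ 0*v1 + 0.33*v2 + 0.67*v3 + 1*v4
  pos2 =~ 0*v5 + 0.33*v6 + 0.67*v7 + 1*v8
  pos3 =~ 0*v9 + 0.33*v10 + 0.67*v11 + 1*v12

  # second level: general position-effect factor with a linear effect across subsets
  posgen =~ 0.33*pos1 + 0.67*pos2 + 1*pos3

  # position effect assumed independent of ability
  ability ~~ 0*posgen
'

fit <- cfa(model, data = item_data)   # item_data: hypothetical data frame of item scores
summary(fit, fit.measures = TRUE)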
Karl Schweizer
Goethe University Frankfurt
Faculty of Psychology and Sports Sciences
Institute of Psychology
Theodor-W.-Adorno-Platz 6
60323 Frankfurt, Germany
K.Schweizer@psych.uni-frankfurt.de
Meeting the Challenge of Assessing (Students') Text Quality: Are There Any Experts Teachers Can Learn from or Do We Face a More Fundamental Problem?
Ann-Kathrin Hennes, Barbara Maria Schmidt, Takuya Yanagida, Igor Osipov, Christian Rietz, Alfred Schabmann
Despite the importance of writing texts in school, teachers' competence in assessing the quality of students' texts seems to be limited with respect to interrater reliability, i.e., objectivity. However, it is unclear whether the reason lies in the challenging task itself (assessing text quality) or in teachers' lack of expertise (which could be remedied by better teacher training). In this study, groups of presumed experts, teachers, and novices rated the overall quality of 20 students' texts. In addition, they rated the importance of different component properties of texts for text quality assessments. Their ratings of text quality and of the importance of criteria were compared within the framework of the expert-novice paradigm. A many-facet Rasch model analysis indicated that neither the teachers nor any of the other expert groups met predefined expertise criteria; all groups' diagnostic competences were comparable to those of the novices. We argue that more effort must be undertaken to identify manifest criteria that define good texts and are suitable for use in school.
Keywords: assessing text quality, teachers’ competences, many-facet Rasch model analysis, composition competence, expert-novice paradigm
Ann-Kathrin Hennes
Department of Therapeutic Education
University of Cologne
Klosterstraße 79b, 50931 Cologne, Germany
ann-kathrin.hennes@uni-koeln.de
Response time as an indicator of test-taking effort in PISA: country and item-type differences
Michalis P. Michaelides & Militsa Ivanova
In low-stakes assessments, when test-takers do not invest adequate effort, test scores underestimate the individual's true ability, and ignoring the impact of test-taking effort may harm the validity of test outcomes. The study examined the level of examinees' test-taking effort and accuracy in the Programme for International Student Assessment (PISA) across countries and different item types. The 2015 PISA computerized assessment was administered in 59 jurisdictions. Behavioral measures of students' test-taking effort were constructed for the Mathematics and Reading assessments by applying a fixed and a normative threshold to item response times to identify rapid guessing. The proportion of rapid guessers on each item was small on average: about 3% under the normative threshold and 1% under a fixed five-second threshold. Rapid guessing was about twice as frequent on human-coded open-response items as on simple and complex multiple-choice items and computer-scored open-response items. Average performance of rapid guessers was much lower than that of test-takers engaged in solution behavior for all item types, and the difference was more pronounced in Reading than in Mathematics. Weighted response time effort indicators by country were very high and positively correlated with country mean PISA scores. No other robust correlates of response time effort were found at the country level. Computerized test administrations facilitate the use of response time as a proxy for examinee test-taking effort. Programs may monitor this behavior to identify cross-country differences prior to comparisons of performance and to develop interventions that promote engagement with the assessment.
Keywords: Test-taking behavior, rapid guessing, response time effort, PISA
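The basic computation behind such effort indicators is straightforward. The R sketch below is illustrative only (the object names are hypothetical and the thresholding is simplified relative to the study): responses faster than a threshold are flagged as rapid guesses, and a per-examinee response time effort index is the proportion of items answered with solution behavior.

# rt: hypothetical examinee-by-item matrix of response times in seconds
flag_rapid <- function(rt, threshold = 5) rt < threshold   # fixed five-second rule

# response time effort: share of items with response times at or above the threshold
rte <- function(rt, threshold = 5) {
  1 - rowMeans(flag_rapid(rt, threshold), na.rm = TRUE)
}

# item-level proportion of rapid guessers
prop_rapid_by_item <- function(rt, threshold = 5) {
  colMeans(flag_rapid(rt, threshold), na.rm = TRUE)
}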
Michalis P. Michaelides
Department of Psychology
University of Cyprus
1 Panepistimiou Avenue, 2109 Aglantzia
P.O. Box 20537, 1678 Nicosia, Cyprus
Michaelides.michalis@ucy.ac.cy
In memoriam Kurt Pawlik (1934-2022)
Gebhard Sammer
Centre for Psychiatry, and Department for Psychology
Justus Liebig University Giessen
Klinikstrasse 36, 35392 Gießen
Gebhard.Sammer@uni-giessen.de
Psychological Test and Assessment Modeling
Volume 64 · 2022 · Issue 3
Pabst, 2022
ISSN 2190-0493 (Print)
ISSN 2190-0507 (Internet)