**Contents, Volume 50, 2008, Issue 3**

KLAUS D. KUBINGER

On the revival of the Rasch model-based LLTM: From constructing tests using item generating rules to measuring item administration effects


SUSAN E. EMBRETSON & ROBERT C. DANIEL

Understanding and quantifying cognitive complexity level in mathematical problem solving items


PHILIPP SONNLEITNER

Using the LLTM to evaluate an item-generating system for reading comprehension


HEINZ HOLLING, HELEN BLANK, KAROLINE KUCHENBÄCKER & JÖRG-TOBIAS KUHN

Rule-based item design of statistical word problems: A review and first implementation


JULIA HAHNE

Analyzing position effects within reasoning items using the LLTM for structurally incomplete data


CHRISTINE HOHENSINN, KLAUS D. KUBINGER, MANUEL REIF, STEFANA HOLOCHER-ERTL, LALE KHORRAMDEL & MARTINA FREBORT

Examining item-position effects in large-scale assessment using the Linear Logistic Test Model


YIYU XIE & MARK WILSON

Investigating DIF and extensions using an LLTM approach and also an individual differences approach: an international testing context


KAREN DRANEY & MARK WILSON

An LLTM approach to the examination of teachers' ratings of classroom assessment tasks


RENATO MICELI, MICHELE SETTANNI & GIULIO VIDOTTO

Measuring change in training programs: An empirical illustration


**On the revival of the Rasch model-based LLTM: From constructing tests using item generating rules to measuring item administration effects**

KLAUS D. KUBINGER

*Abstract*

The linear logistic test model (LLTM) breaks down the item parameter of the Rasch model into a linear combination of certain hypothesized elementary parameters. Apart from the originally intended primary application of generating an indefinite number of items with whatever item difficulties the examiner chooses, there are many other potential applications. They all deal with measuring certain item administration effects. This paper illustrates several of these approaches, as well as how to design data sampling using the respective LLTM's structure matrix. These approaches deal with:

a) Rasch model item calibration using data sampled consecutively in time but partly from the same examinees; b) measuring position effects of item presentation, in particular learning and fatigue effects, whether specific to each position, linear, or non-linear; c) measuring content-specific learning effects; d) measuring warming-up effects; e) measuring effects of speeded item presentation; f) measuring effects of different item response formats. It is pointed out that the given LLTM approaches have the advantage of elegance, as a hierarchical system of concurrent (alternative) hypotheses can be tested.

*Key words:* Rasch model, LLTM, item generating rules, position effects, multiple choice format

*Klaus D. Kubinger, PhD, University of Vienna, Faculty of Psychology, Head of the Division of Psychological Assessment and Applied Psychometrics, Liebiggasse 5, A-1010 Vienna, Austria. E-Mail: klaus.kubinger@univie.ac.at*
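The decomposition described in the abstract can be illustrated numerically: the structure matrix weights a small set of elementary parameters into item difficulties, which then enter the Rasch model. A minimal sketch, with an entirely hypothetical structure matrix and parameter values not taken from the paper:

```python
import numpy as np

# LLTM: each item difficulty beta_i is a weighted sum of elementary
# operation difficulties eta_j, with weights q_ij from the structure
# matrix Q (how often operation j is needed to solve item i).
Q = np.array([[1, 0, 1],    # item 1 uses operations 1 and 3
              [1, 1, 0],    # item 2 uses operations 1 and 2
              [0, 2, 1]])   # item 3 uses operation 2 twice, operation 3 once
eta = np.array([0.5, -0.3, 1.2])   # hypothetical elementary parameters

beta = Q @ eta                     # reconstructed item difficulties

def p_correct(theta, beta):
    """Rasch model probability of solving an item of difficulty beta."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

print(beta)                        # → [1.7 0.2 0.6]
print(p_correct(theta=1.0, beta=beta))
```

With only three eta parameters instead of one parameter per item, the model is testable: the reconstructed difficulties must match a free Rasch calibration for the hypothesized rules to hold.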

**Understanding and quantifying cognitive complexity level in mathematical problem solving items**

SUSAN E. EMBRETSON & ROBERT C. DANIEL

*Abstract*

The linear logistic test model (LLTM; Fischer, 1973) has been applied to a wide variety of new tests. When the LLTM application involves item complexity variables that are both theoretically interesting and empirically supported, several advantages can result. These advantages include elaborating construct validity at the item level, defining variables for test design, predicting parameters of new items, item banking by sources of complexity, and providing a basis for item design and item generation. However, despite the many advantages of applying the LLTM to test items, it has been applied less often to understand the sources of complexity for large-scale operational test items. Instead, previously calibrated item parameters are modeled using regression techniques because raw item response data often cannot be made available. In the current study, both LLTM and regression modeling are applied to mathematical problem solving items from a widely used test. The findings from the two methods are compared and contrasted for their implications for the continued development of ability and achievement tests based on mathematical problem solving items.

*Key words:* Mathematical reasoning, LLTM, item design, mathematical problem solving

*Susan E. Embretson, School of Psychology, Georgia Institute of Technology, 654 Cherry Street, Atlanta, GA 30332-0170, USA. E-Mail: susan.embretson@psych.gatech.edu*
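The regression alternative described above, modeling previously calibrated item difficulties from complexity variables when raw response data are unavailable, can be sketched with ordinary least squares; the difficulties and complexity codings below are invented for illustration:

```python
import numpy as np

# Calibrated difficulties of six hypothetical items (e.g., from an
# earlier Rasch calibration) and their scores on two complexity variables.
beta = np.array([-1.2, -0.4, 0.1, 0.6, 1.3, 1.8])
X = np.array([[1, 0],
              [1, 1],
              [2, 0],
              [2, 1],
              [3, 1],
              [3, 2]], dtype=float)

# Add an intercept and solve beta ~ b0 + b1*x1 + b2*x2 by least squares,
# mirroring the LLTM decomposition but at the item-parameter level.
A = np.column_stack([np.ones(len(beta)), X])
coef, _, _, _ = np.linalg.lstsq(A, beta, rcond=None)

predicted = A @ coef
r2 = 1 - np.sum((beta - predicted) ** 2) / np.sum((beta - beta.mean()) ** 2)
print(coef, r2)
```

Unlike the LLTM proper, this regression treats the calibrated difficulties as error-free observations, which is one reason the two methods can yield different conclusions.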

**Using the LLTM to evaluate an item-generating system for reading comprehension**

PHILIPP SONNLEITNER

*Abstract*

Due to inconclusive findings concerning the components responsible for the difficulty of reading comprehension items, this paper attempts to set up an item-generating system using hypothesis-driven modeling of item complexity, applying Fischer's (1973) linear logistic test model (LLTM) to a German reading comprehension test. This approach guarantees an evaluation of the postulated item-generating system; moreover, the construct validity of the administered test is investigated. Previous findings in this field are considered; additionally, some text features are introduced to this debate and their impact on item difficulty is discussed. Results once more show a strong influence of formal components (e.g., the number of presented response options in a multiple-choice format), but also indicate how this effect can be minimized.

*Key words:* Rasch model, LLTM, item-generating rules, reading comprehension

*Philipp Sonnleitner, MSc., Center of Testing and Consulting, Division of Psychological Assessment and Applied Psychometrics, University of Vienna, Faculty of Psychology, Liebiggasse 5, A-1010 Vienna, Austria. E-Mail: philipp.sonnleitner@univie.ac.at*

**Rule-based item design of statistical word problems: A review and first implementation**

HEINZ HOLLING, HELEN BLANK, KAROLINE KUCHENBÄCKER & JÖRG-TOBIAS KUHN

*Abstract*

Although a large body of research has been published with respect to arithmetic word problems, few studies have investigated statistical word problems in detail. This article therefore pursues two goals: Firstly, a review of current design and analysis of statistical word problems is provided. Secondly, results from a pilot study with systematically designed statistical word problems are reported. Using the linear-logistic test model (LLTM) as well as the latent regression LLTM, we found that the postulated cognitive model fits the data well overall. The study provides evidence that statistical word problems can be designed and analysed in a systematic way, and that the proposed cognitive model of solving statistical word problems can be successfully used in future assessments.

*Key words:* Statistical word problems, rule-based item design, Rasch model, linear-logistic test model, latent regression LLTM

*Prof. Dr. H. Holling, Chair of Statistics and Methods, Westfälische Wilhelms-Universität Münster, Fliednerstr. 21, 48149 Münster, Germany. E-Mail: holling@uni-muenster.de*

**Analyzing position effects within reasoning items using the LLTM for structurally incomplete data**

JULIA HAHNE

*Abstract*

Position or transfer effects on an individual's ability while processing a series of test items are often ignored when tests are created. It is often implicitly assumed that such effects, if they occur, are a) the same for all persons and b) the same for all items, and thus do not contribute to information about person ability or item difficulty. Rasch model analyses cannot quantify position effects because they are invariably confounded with the item difficulty parameters. In the case of adaptive testing, where examinees are administered the same items at different positions, effects of the position of item presentation lead to unfair estimations of item (and, consequently, person) parameters, and are therefore absolutely unwarranted. This study applies the Linear Logistic Test Model (LLTM; Fischer, 1973) for structurally incomplete data to illustrate how a series of test items can be evaluated for position effects. The test material consists of the Viennese Matrices (WMT; Formann & Piswanger, 1979) presented in varying item order to six groups of examinees. The sample consisted of 405 high school students. The concept of virtual items is introduced and applied to different models. Several hypotheses are tested by means of Andersen's Likelihood Ratio tests, applied hierarchically. As a result of these analyses, no significant position effect can be found.

*Key words:* LLTM, position effects, Likelihood Ratio tests, hierarchical testing

*Mag. Julia Hahne, CEOPS - Center of Excellence for Orthopaedic Pain management Speising, Orthopaedical Hospital Speising, Speisinger Straße 109, A-1130 Vienna, Austria. E-Mail: julia.hahne@ceops.at*
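The virtual-item idea mentioned above can be sketched as an expanded LLTM structure matrix: each (item, position) combination becomes a virtual item whose row carries the item's indicator plus indicators for its presentation position. A hypothetical three-item, three-position design (parameter values are illustrative, not from the study):

```python
import numpy as np

n_items, n_positions = 3, 3

rows = []
for item in range(n_items):
    for pos in range(n_positions):
        item_part = np.eye(n_items)[item]
        # Position columns: positions 2..n get their own effect parameter;
        # position 1 serves as the reference and gets no extra column.
        pos_part = np.eye(n_positions)[pos][1:]
        rows.append(np.concatenate([item_part, pos_part]))

W = np.array(rows)   # 9 virtual items x (3 item + 2 position) parameters
print(W.shape)       # → (9, 5)

# Difficulty of each virtual item under hypothetical parameters:
# three item difficulties plus two position (e.g., fatigue) effects.
params = np.array([-0.5, 0.0, 0.8, 0.2, 0.4])
virtual_beta = W @ params
```

Testing the position-effect columns against zero (e.g., via a Likelihood Ratio test between the restricted and unrestricted models) is what decides whether position effects are present.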

**Examining item-position effects in large-scale assessment using the Linear Logistic Test Model**

CHRISTINE HOHENSINN, KLAUS D. KUBINGER, MANUEL REIF, STEFANA HOLOCHER-ERTL, LALE KHORRAMDEL & MARTINA FREBORT

*Abstract*

When administering large-scale assessments, item-position effects are of particular importance because the applied test designs very often contain several test booklets with the same items presented at different test positions. Establishing such position effects would be most critical; it would mean that the estimated item parameters do not depend exclusively on the items' difficulties due to content but also on their presentation positions. As a consequence, item calibration would be biased. By means of the linear logistic test model (LLTM), item-position effects can be tested. In this paper, the results of a simulation study demonstrating how the LLTM is indeed able to detect certain position effects in the framework of a large-scale assessment are presented first. Second, empirical item-position effects of a specific large-scale competence assessment in mathematics (4th grade students) are analyzed using the LLTM. The results indicate that a small fatigue effect seems to take place. The most important consequence of the given paper is that it is advisable to try pertinent simulation studies before an analysis of empirical data takes place; the reason is that, for the given example, the suggested Likelihood-Ratio test neither holds the nominal type-I risk nor qualifies as "robust", and furthermore occasionally shows very low power.

*Key words:* Rasch model, LLTM, large-scale assessment, item-position effects, item calibration

*Christine Hohensinn, M.Sc., Center of Testing and Consulting, Division of Psychological Assessment and Applied Psychometrics, Faculty of Psychology, University of Vienna, Liebiggasse 5, A-1010 Vienna, Austria. E-Mail: christine.hohensinn@univie.ac.at*

**Investigating DIF and extensions using an LLTM approach and also an individual differences approach: an international testing context**

YIYU XIE & MARK WILSON

*Abstract*

This study intends to investigate two ways to generalise differential item functioning (DIF) by grouping items that share a common feature, or an item property as in the Linear Logistic Test Model (LLTM). An item "facet" refers to this type of grouping, and DIF can be expressed in terms of more fundamental parameters that relate to the facet of items. Hence the differential facet functioning (DFF) model, a particular version of the LLTM, helps to explain DIF effects more substantively. Using the mathematics data from the Programme for International Student Assessment (PISA) 2003, this study shows that modeling the DFF effect through a group-by-facet interaction parameter, rather than DIF effects at the individual item level, can be handled easily with the NLMIXED procedure of SAS. We found that the results are more interpretable when bias is interpreted at the facet level rather than the item level. Analogous to the multidimensional DIF model, one natural extension of the DFF model is to make the model multidimensional, with DFF facets (i.e., LLTM facets) considered as dimensions. This extension, multidimensional DFF (MDFF), is also investigated. The MDFF model allows individual differences to be modeled on the dimension that exhibits a DFF effect. However, it is always recommended to check the individual DIF estimates and carry out a substantive analysis first before conducting DFF and MDFF analyses.

*Key words:* DIF, LLTM, differential facet functioning (DFF), multidimensional model, PISA

*Yiyu Xie, University of California, Berkeley, Berkeley, CA 94720, USA. E-Mail: yxie05@gmail.com*

**An LLTM approach to the examination of teachers' ratings of classroom assessment tasks**

KAREN DRANEY & MARK WILSON

*Abstract*

This paper investigates the use of a specific case of the Linear Logistic Test Model, known as the rating scale rater model, in which the item parameter is conceptualized to include an item difficulty parameter plus a rating severity parameter. Using this model, the severity of groups of teachers is investigated as they scored sets of 321 pretests and posttests designed to be congruent with an embedded assessment system. The items were included in a linked design involving multiple booklets randomly allocated to students. Individual teachers were found to differ in overall severity, but also showed a reasonable amount of consistency within two of the three district moderation groups. Teachers also showed some mean differences between districts. There is also evidence that the model may be too tightly constrained, and further exploration using a less constrained model is indicated.

*Key words:* IRT, LLTM, rater effects, teacher effects

*Karen Draney, EAEDU - School of Education, 4323 Tolman, Berkeley, CA 94720, USA. E-Mail: kdraney@berkeley.edu*
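The model named above treats rater severity as an additive component of the effective item parameter. A minimal sketch of that idea, restricted to the dichotomous case for brevity (the full rating scale model also carries step parameters; all values below are illustrative, not estimates from the study):

```python
import math

def p_correct(theta, item_beta, rater_severity):
    """LLTM special case: the effective item parameter is the item
    difficulty plus the severity of the rater who scored the response."""
    eta = theta - (item_beta + rater_severity)
    return 1.0 / (1.0 + math.exp(-eta))

# The same examinee and item, scored by a lenient vs. a severe rater.
lenient = p_correct(theta=0.5, item_beta=0.0, rater_severity=-0.5)
severe = p_correct(theta=0.5, item_beta=0.0, rater_severity=0.7)
print(round(lenient, 3), round(severe, 3))  # the severe rater lowers the probability
```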

**Measuring change in training programs: An empirical illustration**

RENATO MICELI, MICHELE SETTANNI & GIULIO VIDOTTO

*Abstract*

The implementation of training programs often requires a complex design if effectiveness is to be accurately evaluated. Part of the difficulty lies in the fact that trainees must be presented with a series of ever-changing tasks in order to avoid biases due to learning or carryover effects.

The aim of the present study is to apply and illustrate a simple procedure, based on a special case of the linear logistic test model (LLTM), used to evaluate the effectiveness of a training program. The procedure is empirically applied to a dataset derived from a moped riding skills training program. The sample is composed of 207 high school students who took part in three training sessions using a riding simulator. A different task presentation order was assigned to each subject, and the whole design was completely balanced. The procedure allowed us to obtain estimates of the overall change in ability that occurred over the course of the training process. Furthermore, we were able to obtain estimates of item and subject parameters unbiased by the influence of change in ability due to training. Implications of the results are discussed and suggestions for future research are presented.

*Key words:* Measuring change, Linear Logistic Test Model (LLTM), Multi-Facet Rasch model, Training effectiveness

*Renato Miceli, Department of Psychology, University of Torino, Via Verdi 10, 10124 Torino, Italy. E-Mail: renato.miceli@unito.it*