Psychological Test and Assessment Modeling

2016-1

SPECIAL TOPIC: MEASUREMENT EQUIVALENCE OF THE PATIENT REPORTED OUTCOMES MEASUREMENT INFORMATION SYSTEM® (PROMIS®) SHORT FORMS – PART I
GUEST EDITORS: BRYCE B. REEVE & JEANNE A. TERESI




Editorial:
Review and forecast on research in Psychological Test and Assessment Modeling
Klaus D. Kubinger (editor in chief)

Treating all rapid responses as errors (TARRE) improves estimates of ability (slightly)
Daniel B. Wright

Overview to the two-part series: Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short forms
Bryce B. Reeve & Jeanne A. Teresi

Methodological issues in examining measurement equivalence in patient reported outcomes measures: Methods overview to the two-part series, “Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short forms”
Jeanne A. Teresi & Richard N. Jones

Differential item functioning magnitude and impact measures from item response theory models
Marjorie Kleinman & Jeanne A. Teresi

The Measuring Your Health study: Leveraging community-based cancer registry recruitment to establish a large, diverse cohort of cancer survivors for analyses of measurement equivalence and validity of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short form items
Roxanne E. Jensen, Carol M. Moinpour, Theresa H.M. Keegan, Rosemary D. Cress, Xiao-Cheng Wu, Lisa E. Paddock, Antoinette M. Stroup & Arnold L. Potosky

Psychometric evaluation of the PROMIS® Fatigue measure in an ethnically and racially diverse population-based sample of cancer patients
Bryce B. Reeve, Laura C. Pinheiro, Roxanne E. Jensen, Jeanne A. Teresi, Arnold L. Potosky, Molly K. McFatrich, Mildred Ramirez & Wen-Hung Chen

Psychometric properties and performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Depression short forms in ethnically diverse groups
Jeanne A. Teresi, Katja Ocepek-Welikson, Marjorie Kleinman, Mildred Ramirez & Giyeon Kim

Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Anxiety short forms in ethnically diverse groups
Jeanne A. Teresi, Katja Ocepek-Welikson, Marjorie Kleinman, Mildred Ramirez & Giyeon Kim

 


Treating all rapid responses as errors (TARRE) improves estimates of ability (slightly)
Daniel B. Wright

Abstract

Response times can be modeled along with response accuracy to estimate ability. Models that do not use response times were compared with three models that do. The predictive accuracy of the models was assessed using leave-one-item-out cross-validation: for a k-item test, each method is applied k times, each time using k-1 items to create ability estimates, and these estimates are used to predict responses on the remaining item. The conceptually simplest method using response times, which treats all rapid responses as errors (TARRE), produced the most predictive values. However, the increase was less than would be achieved by having one extra item on the test. Possible effects of changing the scoring algorithm on student test-taking behavior need to be explored before implementing any such change.
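The rescoring rule and the leave-one-item-out procedure described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the 2-second threshold, the toy data, and the use of a mean rest score as the ability estimate (in place of a fitted IRT model) are all assumptions made for illustration.

```python
def tarre_rescore(correct, times, threshold):
    """Score every rapid response (time below threshold) as an error (0)."""
    return [
        [0 if t < threshold else c for c, t in zip(row_c, row_t)]
        for row_c, row_t in zip(correct, times)
    ]


def leave_one_item_out_rest_scores(scored):
    """For each held-out item j, return each person's mean score on the other k-1 items.

    The mean rest score is a simple stand-in for an ability estimate (assumption).
    """
    k = len(scored[0])
    return [[(sum(row) - row[j]) / (k - 1) for row in scored] for j in range(k)]


# Hypothetical data: 3 test takers, 4 items (1 = correct), response times in seconds.
correct = [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 0]]
times = [[12.0, 1.5, 9.0, 8.0], [7.0, 6.0, 0.8, 10.0], [5.0, 4.0, 3.5, 2.5]]

scored = tarre_rescore(correct, times, threshold=2.0)  # threshold is an assumed value
rest = leave_one_item_out_rest_scores(scored)
```

In a full analysis, each list in `rest` would be replaced by model-based ability estimates and used to predict responses on the held-out item.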

Keywords: response times, measurement, IRT


Daniel B. Wright, PhD
Public Education Department
300 Don Gaspar Ave.
Santa Fe, NM 87501, USA
dbrookswr@gmail.com



Overview to the two-part series: Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short forms
Bryce B. Reeve & Jeanne A. Teresi

Abstract

Measurement equivalence across differing socio-demographic groups is essential for valid assessment. This is one of two issues of Psychological Test and Assessment Modeling that contain articles describing methods and substantive findings related to establishing measurement equivalence in self-reported health, mental health and social functioning measures.
The articles in this two-part series describe analyses of items assessing eight domains: fatigue, depression, anxiety, sleep, pain, physical function, cognitive concerns and social function. Additionally, two overview articles describe the methods and sample characteristics of the data set used in these analyses. An additional article describes the important topic of assessing magnitude and impact of differential item functioning. These articles provide the first strong evidence supporting the measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short form measures in ethnically and socio-demographically diverse groups, and are a first step in meeting the international call for further study of their performance in such groups.

Key words: PROMIS, short form measures, patient-reported outcomes, measurement equivalence, differential item functioning


Bryce Reeve, PhD
Department of Health Policy and Management
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
1101-D McGavran-Greenberg Hall
135 Dauer Drive
CB 7411, Chapel Hill
NC 27599-7411, USA
bbreeve@email.unc.edu



Methodological issues in examining measurement equivalence in patient reported outcomes measures: Methods overview to the two-part series, “Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short forms”
Jeanne A. Teresi & Richard N. Jones

Abstract

The purpose of this article is to introduce the methods used and challenges confronted by the authors of this two-part series of articles describing the results of analyses of measurement equivalence of the short form scales from the Patient Reported Outcomes Measurement Information System® (PROMIS®). Qualitative and quantitative approaches used to examine differential item functioning (DIF) are reviewed briefly. Qualitative methods focused on generation of DIF hypotheses. The basic quantitative approaches used all rely on a latent variable model, and examine parameters derived either directly from item response theory (IRT) or from structural equation models (SEM). A key methods focus of these articles is to describe state-of-the-art approaches to examination of measurement equivalence in eight domains: physical health, pain, fatigue, sleep, depression, anxiety, cognition, and social function. These articles represent the first time that DIF has been examined systematically in the PROMIS short form measures, particularly among ethnically diverse groups. This is also the first set of analyses to examine the performance of PROMIS short forms in patients with cancer.
State-of-the-art latent variable methods for examining measurement equivalence are introduced briefly in this paper to orient readers to the approaches adopted in this set of papers. Several methodological challenges related to (DIF-free) anchor item selection and model assumption violations are presented as a backdrop for the articles in this two-part series on measurement equivalence of PROMIS measures.

Key Words: methods, PROMIS, measurement equivalence, differential item functioning, ethnic diversity


Jeanne A. Teresi, Ed.D, Ph.D.
Columbia University Stroud Center
at New York State Psychiatric Institute
1051 Riverside Drive, Box 42, Room 2714, New York
New York, 10032-3702, USA
Teresimeas@aol.com; jat61@columbia.edu



Differential item functioning magnitude and impact measures from item response theory models
Marjorie Kleinman & Jeanne A. Teresi

Abstract

Measures of magnitude and impact of differential item functioning (DIF), at the item and scale level respectively, are presented and reviewed in this paper. Most measures are based on item response theory models. Magnitude refers to item-level effect sizes, whereas impact refers to differences between groups at the scale score level. Reviewed are magnitude measures based on group differences in the expected item scores and impact measures based on differences in the expected scale scores. The similarities among these indices are demonstrated. Various software packages that provide magnitude and impact measures are described, and new software is presented that computes all of the available statistics conveniently in one program, with explanations of their relationships to one another.
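The distinction between magnitude (item level) and impact (scale level) can be made concrete with a small sketch. It assumes a graded response model, in which the expected score on an item scored 0..m is the sum of the cumulative category probabilities; the item parameters below are hypothetical, and the code is not the software described in the article.

```python
import math


def expected_item_score(theta, a, thresholds):
    """Expected score on an item scored 0..m under a graded response model.

    E[X | theta] = sum over k of P(X >= k | theta),
    with P(X >= k | theta) = 1 / (1 + exp(-a * (theta - b_k))).
    """
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def expected_scale_score(theta, items):
    """Expected scale score: the sum of expected item scores."""
    return sum(expected_item_score(theta, a, bs) for a, bs in items)


# Hypothetical (discrimination, thresholds) pairs for a two-item scale.
reference_items = [(1.5, [-1.0, 0.0, 1.0]), (1.2, [-0.5, 0.5])]
focal_items = [(1.5, [-0.7, 0.3, 1.3]), (1.2, [-0.5, 0.5])]  # item 1 shifted by 0.3

theta = 0.0
# Magnitude: group difference in the expected score of one item at a given theta.
item_magnitude = (expected_item_score(theta, *reference_items[0])
                  - expected_item_score(theta, *focal_items[0]))
# Impact: group difference in the expected scale score at the same theta.
scale_impact = (expected_scale_score(theta, reference_items)
                - expected_scale_score(theta, focal_items))
```

Because only the first item differs between the groups in this toy example, the scale-level difference equals the item-level difference; with several DIF items, differences can accumulate or cancel at the scale level.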

Key words: differential item functioning, magnitude, impact, item response theory


Marjorie Kleinman, M.S.
New York State Psychiatric Institute
1051 Riverside Drive, Unit 72
New York, NY, 10032, USA
kleinmam@nyspi.columbia.edu



The Measuring Your Health study: Leveraging community-based cancer registry recruitment to establish a large, diverse cohort of cancer survivors for analyses of measurement equivalence and validity of the Patient Reported Outcomes Measurement Information System® (PROMIS®) short form items
Roxanne E. Jensen, Carol M. Moinpour, Theresa H.M. Keegan, Rosemary D. Cress, Xiao-Cheng Wu, Lisa E. Paddock, Antoinette M. Stroup & Arnold L. Potosky

Abstract

The Measuring Your Health (MY-Health) study was designed to fill evidence gaps by validating eight Patient Reported Outcomes Measurement Information System® (PROMIS®) domains (Anxiety, Depression, Fatigue, Pain Interference, Physical Function, Sleep Disturbance, Applied Cognitive Function, and Ability to Participate in Social Roles and Activities) across multiple racial/ethnic and age groups in a diverse cohort of cancer patients. This paper provides detailed information on MY-Health study design, implementation, and participant cohort; it identifies key challenges and benefits of recruiting a diverse community-based cancer cohort. Between 2010 and 2012, we identified eligible patients for the MY-Health study in partnership with four Surveillance, Epidemiology, and End Results (SEER) program cancer registries located in California, Louisiana, and New Jersey. The overall response rate for the MY-Health cohort (n = 5,506) was 34%, with a median response time of 9.5 months after initial cancer diagnosis. The cohort represented meaningful diversity of age (22% under 49 years of age) and race/ethnicity (41% non-Hispanic White) across seven cancers. Challenges included lower response rates among racial/ethnic minority, younger, and advanced-stage cancer patients; use of non-final registry information for eligibility identification; and lower than expected use of translated surveys. The MY-Health cohort represents one of the largest efforts to measure the full range of patient-reported symptoms experienced after initial cancer treatment. It provides sufficient diversity in terms of sociodemographics, symptoms, and function to provide a meaningful validation of eight PROMIS measures.

Key words: PROMIS, MY-Health, measurement equivalence, validity, cancer


Roxanne E. Jensen, Ph.D.
Cancer Prevention and Control Program
Lombardi Comprehensive Cancer Center
Georgetown University
3300 Whitehaven Street NW, Suite 4100
Washington, DC 20007, USA
rj222@georgetown.edu



Psychometric evaluation of the PROMIS® Fatigue measure in an ethnically and racially diverse population-based sample of cancer patients
Bryce B. Reeve, Laura C. Pinheiro, Roxanne E. Jensen, Jeanne A. Teresi, Arnold L. Potosky, Molly K. McFatrich, Mildred Ramirez & Wen-Hung Chen

Abstract

Aims: Fatigue is the most prevalent and distressing symptom related to cancer and its treatment, affecting functioning and quality of life. In 2010, the National Cancer Institute’s Clinical Trials Planning Meeting on cancer-related fatigue adopted the PROMIS® Fatigue measure as the standard to use in clinical trials. This study evaluates the psychometric properties of the PROMIS Fatigue measure in an ethnically/racially diverse population-based sample of adult cancer patients.
Methods: Patients were recruited from four US cancer registries with oversampling of minorities. Participants completed a paper survey 6 to 13 months post-diagnosis. The 14 fatigue items (5-point Likert-type scale; English, Spanish, and Chinese versions) were selected from the PROMIS Fatigue short forms and larger item bank. Item response theory and factor analyses were used to evaluate item- and scale-level performance. Differential item functioning (DIF) was evaluated using the Wald test and ordinal logistic regression (OLR) methods. OLR-identified items with DIF were evaluated further for their effect on the scale scores (threshold r² > .13).
Results: The sample included 5,507 patients (2,278 non-Hispanic Whites, 1,122 non-Hispanic Blacks, 1,053 Hispanics, and 917 Asians/Pacific Islanders); 338 Hispanics were given the Spanish-language version of the survey and 134 Asians the Chinese version. One PROMIS item had poor discrimination as it was the only positively worded question in the fatigue measure. Among Hispanics, no DIF was found with the Wald test, while the OLR method identified five items with DIF comparing the English and Spanish versions; however, the effect of DIF on scores was negligible (r² ranged from .006 to .015). For the English and Chinese translations, no single item was consistently identified by both DIF tests. Minimal or no impact was observed on the overall scale score comparisons among Whites, Blacks, Hispanics, and Asians using the English language scales. However, greater numbers of items with DIF appeared when comparing Asians/Pacific Islanders with Whites, Blacks, and Hispanics. “How often were you too tired to think clearly” showed consistent DIF.
Conclusions: Twelve of 14 PROMIS fatigue items performed well across the ethnically/racially diverse samples with minimal findings of DIF that would have any effect on comparing or combining scores across cancer populations. Supporting evidence of the validity and reliability of the PROMIS measures will enhance the adoption of the measures in oncology clinical research.

Keywords: differential item functioning, cancer, PROMIS, fatigue, patient-reported outcomes


Bryce Reeve, PhD
Department of Health Policy and Management
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
1101-D McGavran-Greenberg Hall
135 Dauer Drive
CB 7411, Chapel Hill
NC 27599-7411, USA
bbreeve@email.unc.edu



Psychometric properties and performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Depression short forms in ethnically diverse groups
Jeanne A. Teresi, Katja Ocepek-Welikson, Marjorie Kleinman, Mildred Ramirez & Giyeon Kim

Abstract

Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer. DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups.
Methods: DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses. The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure. Magnitude was evaluated using expected item score functions, and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed through examination of differences in DIF adjusted and unadjusted depression estimates.
Results: Many items evidenced DIF; however, only a few had slightly elevated magnitude. No items evidenced salient DIF with respect to NCDIF and the scale-level impact was minimal for all group comparisons. The following short form items might be targeted for further study because they were also hypothesized to evidence DIF. One item, nothing to look forward to, showed slightly higher magnitude of DIF for age; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups as contrasted with the cohort aged 21 to 49. This item was also hypothesized to show age DIF. Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. This item was also hypothesized to show DIF for minority groups. The impact of DIF was negligible. Conditional on depression, the items worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education. While the magnitude and aggregate impact of DIF were small, in a few instances individual impact was observed.
Information provided was relatively high, particularly in the middle upper (depressed) tail of the distribution. Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method.
Conclusions: This was the first study to evaluate measurement equivalence of the PROMIS Depression short forms across large samples of ethnically diverse groups. There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS Depression short form measures across such groups. These results could be informative for those using the short forms in minority populations or clinicians evaluating individuals with the depression short forms.

Key Words: depression, PROMIS®, differential item functioning, item response theory, ethnic diversity


Jeanne A. Teresi, Ed.D, Ph.D.
Columbia University Stroud Center
at New York State Psychiatric Institute
1051 Riverside Drive, Box 42, Room 2714, New York
New York, 10032-3702, USA
Teresimeas@aol.com; jat61@columbia.edu



Measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Anxiety short forms in ethnically diverse groups
Jeanne A. Teresi, Katja Ocepek-Welikson, Marjorie Kleinman, Mildred Ramirez & Giyeon Kim

Abstract

This is the first study of the measurement equivalence of the Patient Reported Outcomes Measurement Information System® (PROMIS®) Anxiety short forms in a large ethnically diverse sample. The psychometric properties and differential item functioning (DIF) were examined across different racial/ethnic, educational, age, gender and language groups.
Methods: These data are from individuals selected from cancer registries in the United States. For the analyses of race/ethnicity the reference group was non-Hispanic Whites (n = 2,263), the studied groups were non-Hispanic Blacks (n = 1,117), Hispanics (n = 1,043) and Asians/Pacific Islanders (n = 907). Within the Hispanic subsample, there were 335 interviews conducted in Spanish and 703 in English. The 11 anxiety items were from the PROMIS emotional disturbance item bank.
DIF hypotheses were generated by content experts who rated whether or not they expected DIF to be present, and the direction of the DIF with respect to several comparison groups. The primary method used for DIF detection was the Wald test for examination of group differences in item response theory (IRT) item parameters accompanied by magnitude measures. Expected item scores were examined as measures of magnitude. The method used for quantification of the difference in the average expected item scores was the non-compensatory DIF (NCDIF) index. DIF impact was examined using expected scale score functions. Additionally, precision and reliabilities were examined using several methods.
Results: Although not hypothesized to show DIF for Asians/Pacific Islanders, every item evidenced DIF by at least one method. Two items showed DIF of higher magnitude for Asians/Pacific Islanders vs. Whites: “Many situations made me worry” and “I felt anxious”. However, the magnitude of DIF was small and the NCDIF statistics were not above threshold. The impact of DIF was negligible. For education, six items were identified with consistent DIF across methods: fearful, anxious, worried, hard to focus, uneasy and tense. However, the NCDIF was not above threshold and the impact of DIF on the scale was trivial. No items showed high magnitude DIF for gender. Two items showed slightly higher magnitude for age (although not above the cutoff): worried and fearful. The scale level impact was trivial. Only one item showed DIF with the Wald test after the Bonferroni correction for the language comparisons: “I felt fearful”. Two additional items were flagged in sensitivity analyses after Bonferroni correction, anxious and many situations made me worry. The latter item also showed DIF of higher magnitude, with an NCDIF value (0.144) above threshold. Individual impact was relatively small.
Conclusions: Although many items from the PROMIS short form anxiety measures were flagged with DIF, item-level magnitude was low and scale-level DIF impact was minimal; however, three items (anxious, worried, and many situations made me worry) might be singled out for further study. It is concluded that the PROMIS Anxiety short form evidenced good psychometric properties, was relatively invariant across the groups studied, and performed well among ethnically diverse subgroups of Blacks, Hispanics, non-Hispanic Whites and Asians/Pacific Islanders. In general, more research with the Asians/Pacific Islanders group is needed. Further study of subgroups within these broad categories is recommended.
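The NCDIF index referred to in this abstract is commonly defined as the mean, over focal-group respondents, of the squared difference between an item's expected score computed with focal-group parameters and with reference-group parameters. The sketch below follows that common definition under a graded response model; the parameter values and ability estimates are hypothetical, and no flagging threshold is assumed because the abstract does not state one.

```python
import math


def expected_item_score(theta, a, thresholds):
    """Expected item score under a graded response model (sum of cumulative probabilities)."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def ncdif(focal_thetas, ref_params, focal_params):
    """Mean squared difference in expected item scores, averaged over focal-group abilities."""
    a_ref, b_ref = ref_params
    a_foc, b_foc = focal_params
    diffs = [
        expected_item_score(t, a_foc, b_foc) - expected_item_score(t, a_ref, b_ref)
        for t in focal_thetas
    ]
    return sum(d * d for d in diffs) / len(diffs)


# Hypothetical ability estimates for focal-group respondents.
focal_thetas = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Hypothetical (discrimination, thresholds) for one five-category item in each group.
ref_params = (1.8, [-1.0, 0.0, 1.0, 2.0])
focal_params = (1.8, [-0.8, 0.2, 1.2, 2.2])

value = ncdif(focal_thetas, ref_params, focal_params)
```

Because the differences are squared, NCDIF does not allow DIF in opposite directions at different ability levels to cancel, which is why it is described as non-compensatory.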

Key words: anxiety, PROMIS, item response theory, differential item functioning, ethnic diversity


Jeanne A. Teresi, Ed.D, Ph.D.
Columbia University Stroud Center
at New York State Psychiatric Institute
1051 Riverside Drive, Box 42, Room 2714, New York
New York, 10032-3702, USA
Teresimeas@aol.com; jat61@columbia.edu




