
Open Access

Trauma

Are validated outcome measures used in distal radial fractures truly valid?

A critical assessment using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist




Abstract

Objectives

Patient-reported outcome measures (PROMs) are often used to evaluate the outcome of treatment in patients with distal radial fractures. Which PROM to select is often based on assessment of measurement properties, such as validity and reliability. Measurement properties are assessed in clinimetric studies, and results are often reviewed without considering the methodological quality of these studies. Our aim was to systematically review the methodological quality of clinimetric studies that evaluated measurement properties of PROMs used in patients with distal radial fractures, and to make recommendations for the selection of PROMs based on the level of evidence of each individual measurement property.

Methods

A systematic literature search was performed in PubMed, EMbase, CINAHL and PsycINFO databases to identify relevant clinimetric studies. Two reviewers independently assessed the methodological quality of the studies on measurement properties, using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. Level of evidence (strong / moderate / limited / lacking) for each measurement property per PROM was determined by combining the methodological quality and the results of the different clinimetric studies.

Results

In all, 19 of the 1508 unique studies identified were included, in which 12 PROMs were rated. The Patient-rated wrist evaluation (PRWE) and the Disabilities of Arm, Shoulder and Hand questionnaire (DASH) were evaluated on most measurement properties. For the PRWE, there is moderate evidence that its reliability, validity (content and hypothesis testing), and responsiveness are good. There is limited evidence that its internal consistency and cross-cultural validity are good, and that its measurement error is acceptable. There is no evidence for its structural and criterion validity. For the DASH, there is moderate evidence that its responsiveness is good, and limited evidence that its reliability and its validity on hypothesis testing are good. There is no evidence for its other measurement properties.

Conclusion

According to this systematic review, there is, at best, moderate evidence that the responsiveness of the PRWE and DASH is good, as are the reliability and validity of the PRWE. We recommend these PROMs in clinical studies in patients with distal radial fractures; however, more clinimetric studies of higher methodological quality are needed to adequately determine the other measurement properties.

Cite this article: Dr Y. V. Kleinlugtenbelt. Are validated outcome measures used in distal radial fractures truly valid?: A critical assessment using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist. Bone Joint Res 2016;5:153–161. DOI: 10.1302/2046-3758.54.2000462.

Article focus

  • The aim of this systematic review was to evaluate the methodological quality of the clinimetric studies that evaluated measurement properties of the available patient reported outcome measures (PROMs) used in patients with distal radial fractures.

  • To determine which PROM, based on the level of evidence of each individual measurement property, is most appropriate for the evaluation of patients with distal radial fractures.

Key findings

  • The two PROMs that were most extensively evaluated were the patient-rated wrist evaluation (PRWE) (with seven of nine measurement properties investigated) and the Disabilities of Arm, Shoulder and Hand (DASH) (with four of nine investigated). The methodological quality of these studies ranged from poor to, at best, good.

Key messages

  • Strong evidence supporting ‘good quality’ of any of the current available PROMs in patients with distal radial fractures is lacking.

  • The PRWE and DASH are the two most extensively evaluated PROMs. Their measurement properties were mainly good but the methodological quality of the clinimetric studies was low, which means that these results may be biased.

  • For now we recommend using the PRWE or DASH, but more clinimetric studies of higher methodological quality are needed to select PROMs in patients with distal radial fractures with greater confidence.

Strengths and limitations

  • Strength: This is the first study that has used the COnsensus-based Standards for the Selection of Health Measurement INstruments (COSMIN) checklist to systematically review the methodological quality of studies on the measurement properties of PROMs in the evaluation of treatment of distal radial fractures.

  • Strength: Our search was not just limited to English language studies, as both reviewers have a good knowledge of German and Dutch.

  • Limitation: It was not possible to distinguish between poor study reporting and poor methodological quality.

Introduction

Distal radial fractures account for approximately 17% of all fractures1 and the distal radius is the most common fracture site in the upper extremity.2-4 Despite this high incidence, there is no treatment consensus for these fractures.5 To conduct best evidence clinical trials in distal radial fracture treatment, and to properly compare trial results, there must be consensus on the use of outcome measures. Historically, outcome assessment after distal radial fractures focused on imaging and physical examination (e.g. grip strength and range of motion). These assessments, however, do not represent the patients’ perspective, as they do not take into account the patients’ feelings, opinions or wellbeing, which are likely to be more important to the patient.6

In the last two decades, outcomes assessment has shifted towards a patient-centred approach. This approach assesses the outcome based directly on the opinion of the patient. Outcomes such as pain and functional ability, which are highly relevant for patients, can be assessed by patient-reported outcome measures (PROMs).7

Currently, a wide variety of PROMs are available and are used to assess patient-reported functional outcomes for upper limb and wrist disorders.8-20 Several (non-)systematic studies have reviewed the existing literature in order to present available PROMs for assessing wrist and hand function in general.21-25 Over a period of 25 years, the two most extensively used PROMs for evaluating the treatment outcome of patients with distal radial fractures were the Disabilities of Arm, Shoulder and Hand (DASH), and the (original or modified) Gartland and Werley scoring system.26 However, the patient-rated wrist evaluation (PRWE) was found to have the best measurement properties, i.e. it was found to be the most reliable, valid and responsive instrument for these patients. This conclusion was based on the results of the available clinimetric studies.26 Clinimetrics is a scientific discipline that aims to develop methods of assessing the properties of health measurement instruments, with the aim of improving the quality of outcome measures. Although the measurement properties were found to be good, the authors did not incorporate the methodological quality of these clinimetric studies.

To understand this systematic review, it is important to distinguish between the ‘methodological quality’ of clinimetric studies on PROMs and the ‘quality’ (i.e. the measurement properties) of the PROMs themselves. Evidently, the evidence for a PROM is only as good as the methodological quality of the studies that evaluated it. In order to assess the methodological quality of clinimetric studies (i.e. studies on measurement properties) on PROMs, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group formulated a set of guidelines. First, the COSMIN group reached consensus on terminology, definitions and a taxonomy of measurement properties of PROMs in an international Delphi study. Next, the group developed a checklist containing standards for evaluating the methodological quality of studies on the measurement properties (e.g. reliability) of measurement instruments (e.g. DASH) (www.cosmin.nl).27 The best PROM should have a high level of evidence (i.e. as evaluated in high quality studies) supporting good quality on all measurement properties. The definitions and a description of the measurement properties are given in Table I.

Table I.

Definitions of the measurement properties.

| Measurement property | Definition | Clarification |
| --- | --- | --- |
| Internal consistency | The degree of the interrelatedness among the items | “Do the different questions in a PROM that are meant to measure the same general construct produce similar scores?” |
| Reliability | The proportion of the total variance in the measurements which is because of “true” differences among patients | “How close are repeated measurements?” |
| Measurement error | The systematic error and random error of a patient’s score that are not attributed to true changes in the construct to be measured | “What amount of change in a score cannot be considered a real or true change?” |
| Content validity | The degree to which the content of a health-related patient-reported outcomes (HR-PRO) instrument is an adequate reflection of the construct to be measured | “Are all items relevant for the specific population and have important activities been missed?” |
| Structural validity | The degree to which the scores of an HR-PRO instrument are an adequate reflection of the dimensionality of the construct to be measured | “Do all items in a PROM reflect single or multiple constructs?” |
| Hypotheses testing | The degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured | “What is the expected relationship with other PROMs assessing comparable constructs?” |
| Cross-cultural validity | The degree to which the performance of the items on a translated or culturally adapted HR-PRO instrument is an adequate reflection of the performance of the items of the original version of the HR-PRO instrument | “Has the PROM been correctly translated and retested in another language and cultural setting?” |
| Criterion validity | The degree to which the scores of an HR-PRO instrument are an adequate reflection of a “gold standard” | “Is the PROM tested against the benchmark PROM?” |
| Responsiveness | The ability of an HR-PRO instrument to detect change over time in the construct to be measured | “If patients improve or worsen over time does this change in the PROM accordingly?” |
| Interpretability* | The degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to an instrument’s quantitative scores or change in scores | “What do the scores or change in scores of a PROM mean?” |

* Not a real measurement property, but nevertheless a meaningful requirement for the applicability of PROMs in research

PROM, patient-reported outcome measure; HR-PRO, health-related patient-reported outcomes
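Several of the properties defined in Table I are reported later as statistics: Cronbach's alpha, the intraclass correlation coefficient (ICC), the smallest detectable change and the standardised response mean (Table VI). As an illustration only, the Python sketch below shows textbook formulas that are commonly used for these statistics. The review does not state which formulas the primary studies applied, so the formulas, the function names and the toy data are assumptions made for this example.

```python
import math
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha (internal consistency).

    item_scores: one list of patient scores per questionnaire item;
    every list has one entry per patient, in the same patient order.
    """
    k = len(item_scores)
    item_variances = [statistics.variance(scores) for scores in item_scores]
    # Total score per patient = sum of that patient's item scores.
    totals = [sum(patient) for patient in zip(*item_scores)]
    return k / (k - 1) * (1 - sum(item_variances) / statistics.variance(totals))


def standard_error_of_measurement(sd, icc):
    """SEM derived from the sample SD and a reliability coefficient (ICC)."""
    return sd * math.sqrt(1 - icc)


def smallest_detectable_change(sd, icc):
    """SDC at the individual level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * standard_error_of_measurement(sd, icc)


def standardised_response_mean(changes):
    """SRM (responsiveness): mean change divided by the SD of the change."""
    return statistics.mean(changes) / statistics.stdev(changes)


# Toy data (hypothetical): three items answered identically by four patients.
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(round(cronbach_alpha(items), 2))
print(round(smallest_detectable_change(sd=10, icc=0.91), 1))
print(round(standardised_response_mean([10, 20, 30]), 1))
```

With perfectly correlated items, alpha reaches its maximum of 1; real questionnaires yield lower values, such as the ranges reported in Table VI.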

The aim of this systematic review was to evaluate the methodological quality (using the COSMIN checklist) of the clinimetric studies that evaluated measurement properties of the available PROMs used in patients with distal radial fractures, and to make recommendations for the selection of PROMs based on the level of evidence of each individual measurement property. The results of this study might help us to determine which PROM is most appropriate for the evaluation of patients with distal radial fractures.

Materials and Methods

Literature search

We performed a literature search on November 13, 2015 to identify all published studies on the measurement properties of PROMs in the evaluation of treatment of distal radial fractures. The following databases were searched with specific index terms and derivatives of these terms: PubMed (1990 to 2015), EMbase (1990 to 2015), CINAHL (1990 to 2015), and PsycINFO (1990 to 2015). In PubMed we used a validated search filter for finding studies on measurement properties.28 We also added the names of all PROMs that are described for wrist disorders.29 The full search strategy is provided in the supplementary material. We restricted our search to studies published in English, German and Dutch because both reviewers are fluent in these languages. Reference lists were hand-searched to identify additional relevant studies.

Selection criteria

Two reviewers (YK and RN) independently assessed all titles and abstracts. We included studies with a description of the measurement properties of PROMs used in patients with a distal radial fracture. When in doubt about the applicability of a study, the full text article was retrieved and screened for eligibility. Afterwards, the reviewers discussed their assessments and consensus was reached. In cases where consensus could not be obtained, a third reviewer (VS) was consulted to achieve consensus.

Assessment of the quality of the studies

The same two reviewers independently rated the methodological quality of the studies using the COSMIN checklist (www.cosmin.nl).30

The COSMIN checklist consists of 11 separate checklists, called “boxes”. Nine boxes address the quality of studies on the nine measurement properties: a) internal consistency, b) reliability, c) measurement error, d) content validity, e) structural validity, f) hypotheses testing, g) cross-cultural validity, h) criterion validity and i) responsiveness. A tenth box, “j: interpretability”, does not concern a measurement property, but interpretability is nevertheless a meaningful requirement for the applicability of PROMs in research. The eleventh box is used to determine the generalisability of the results. The definitions of the measurement properties and interpretability are given in Table I.

In each box, the methodological quality is evaluated using a variety of items addressing adequate study design and statistical analysis. Each item in a box is rated as ‘excellent’, ‘good’, ‘fair’, ‘poor’ or ‘not applicable’, using the criteria set by the COSMIN group. To obtain a total score for the methodological quality of a box, “the worst score counts” algorithm was applied as set out by the COSMIN guidelines,31 meaning that the box received the lowest rating given to any of its relevant items; for example, the methodological quality of a measurement property was only rated ‘excellent’ if all relevant items in that box were scored as excellent. In all boxes, a small sample size was considered poor methodological quality. As a rule of thumb, a sample size of ⩾ 100 received a rating of ‘excellent’, 50 to 99 received ‘good’, 30 to 49 was rated ‘fair’, and less than 30 was rated as ‘poor’.31
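The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the COSMIN group's own software: the function names are hypothetical, and the sketch assumes that boundary sample sizes (50 and 100) fall into the higher category.

```python
# Ratings ordered from worst to best, so the list index can be used for ranking.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def sample_size_rating(n):
    """Rule-of-thumb rating for sample size (boundary handling is an assumption)."""
    if n >= 100:
        return "excellent"
    if n >= 50:
        return "good"
    if n >= 30:
        return "fair"
    return "poor"


def box_score(item_ratings):
    """'Worst score counts': a box gets the lowest rating among its applicable items.

    Items rated 'not applicable' are ignored; returns None if nothing remains.
    """
    applicable = [r for r in item_ratings if r != "not applicable"]
    if not applicable:
        return None
    return min(applicable, key=RATING_ORDER.index)


# Hypothetical reliability box for a study with 44 patients.
ratings = ["excellent", "good", "not applicable", sample_size_rating(44)]
print(box_score(ratings))
```

Because the sample size item is part of every box, a study with fewer than 30 patients is rated ‘poor’ on every measurement property it evaluates, however well the remaining items score.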

Level of evidence of the measurement properties per PROM

For each PROM, we determined the level of evidence by combining the results of the different studies for each measurement property, as described by Terwee et al.31 The following factors were taken into account: the number of studies (one or multiple), the methodological quality of the studies (excellent/good/fair/poor/not available), and consistency of the results (positive/negative). Based on these factors, the evidence for each measurement property per PROM could be ranked as strong, moderate, limited or conflicting. When only studies of poor methodological quality were available, the level of evidence was rated as ‘unknown’.
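As a hedged sketch of this synthesis step, the Python function below encodes the decision rules listed in the legend of Table VI. The function name and data layout are hypothetical, and the sketch assumes that studies of poor quality are set aside whenever at least one study of better quality is available; that assumption is this example's reading of the rules, not a statement from the authors.

```python
def level_of_evidence(studies):
    """Combine studies on one measurement property of one PROM.

    studies: list of (quality, result) tuples, where quality is one of
    'poor', 'fair', 'good', 'excellent' and result is '+' or '-'.
    """
    if not studies:
        return "no evidence"
    # Assumption: poor-quality studies do not contribute when better ones exist.
    usable = [(q, r) for q, r in studies if q != "poor"]
    if not usable:
        return "unknown"
    results = {r for _, r in usable}
    if len(results) > 1:
        return "conflicting"
    direction = results.pop()
    qualities = [q for q, _ in usable]
    n_good_or_better = sum(q in ("good", "excellent") for q in qualities)
    if "excellent" in qualities or n_good_or_better >= 2:
        strength = "strong"
    elif n_good_or_better == 1 or len(qualities) >= 2:
        strength = "moderate"
    else:
        strength = "limited"
    return f"{strength} ({direction})"


# PRWE reliability as described in the Results: three poor and four fair studies.
prwe_reliability = [("poor", "+")] * 3 + [("fair", "+")] * 4
print(level_of_evidence(prwe_reliability))
```

Under these assumptions the PRWE reliability example evaluates to moderate positive evidence, matching the ‘++’ rating shown for it in Table VI.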

Results

Included studies

A total of 2064 studies were retrieved by the electronic search performed in PubMed (n = 720), EMbase (n = 1075) and CINAHL/ PsycINFO (n = 269) (Fig. 1). After removing duplicates, 1508 unique studies were identified. The titles and abstracts were independently screened by two researchers, after which 27 studies were deemed potentially eligible. After retrieving and reading the full text, 19 studies were included. Reference evaluation of these 19 articles did not yield any additional relevant studies.
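The counts in this paragraph can be checked with simple arithmetic. The Python sketch below only re-derives numbers implied by the text; the number of duplicates and the numbers excluded at each stage are computed here rather than reported by the authors, and the variable names are illustrative.

```python
# Records retrieved per database, as reported in the text.
retrieved = {"PubMed": 720, "EMbase": 1075, "CINAHL/PsycINFO": 269}

total_retrieved = sum(retrieved.values())
unique_studies = 1508       # after removing duplicates
potentially_eligible = 27   # after title and abstract screening
included = 19               # after full-text reading

# Derived quantities (not stated explicitly in the text).
duplicates_removed = total_retrieved - unique_studies
excluded_on_title_abstract = unique_studies - potentially_eligible
excluded_on_full_text = potentially_eligible - included

print(total_retrieved, duplicates_removed)
print(excluded_on_title_abstract, excluded_on_full_text)
print(f"{included / unique_studies:.1%} of unique studies included")
```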

Fig. 1. Search strategy and selection of articles. *Nov 13, 2015. **CINAHL search includes the PsycINFO database. HR-PRO, health-related patient-reported outcomes; DRF, distal radial fracture.


Overall results

In the 19 included studies, a total of 12 PROMs were evaluated (Table II). In three papers, multiple PROMs were evaluated: three,32 three33 and five,34 respectively. Most studies (80%) evaluated more than one measurement property. None of the studies evaluated structural validity. Criterion validity was also not evaluated in any of the studies. However, this was expected given that there are no measurement instruments that can be used as a benchmark, which is a prerequisite of this measurement property. A complete overview of the study characteristics is shown in Table III.

Table II.

Patient-reported outcome instruments included in the review8-14,16-20

| Abbreviation | Full name | Original author |
| --- | --- | --- |
| PRWE | Patient-Rated Wrist Evaluation | MacDermid9 |
| DASH | Disabilities of Arm, Shoulder and Hand | Hudak8 |
| MHQ | Michigan Hand Questionnaire | Chung11 |
| SF-36 | Short Form-36 | Ware12 |
| PEM | Patient Evaluation Measure | Macey10 |
| AIMS2 | Arthritis Impact Measurement Scale | Meenan14 |
| BWH-CTQ | Brigham and Women’s Hospital Carpal Tunnel Questionnaire | Levine13 |
| IOF-WFQ | International Osteoporosis Foundation Wrist Fracture Questionnaire | Lips16 |
| PFW | Patient Focused Wrist Outcome Instrument | Bialocerkowski17 |
| TSK | Tampa Scale of Kinesiophobia | Kori18 |
| CAT | Catastrophizing Subscale of the Coping Strategies Questionnaire | Rosenstiel19 |
| SES | Self-Efficacy Scale | Altmaier20 |

Table III.

Study characteristics32-49

| Measurement instrument | Study | n | Mean age, years (range or SD) | Male (%) | Country | Language |
| --- | --- | --- | --- | --- | --- | --- |
| Patient-Rated Wrist Evaluation | Gabl37 | 133 | 62 (19 to 92) | 27 | Austria | German* |
| | Hemelaers38 | 44 | 56 (15) | 36 | Switzerland | German |
| | MacDermid39 | 36 / 101 | 45 (10) / 50 (16) | 33 / 31 | Canada | English* |
| | MacDermid32 | 59 | 53 (18) | 37 | Canada | English* |
| | Wilcke35 | 99 | 58 (18) | 20 | Sweden | Swedish |
| | Lovgren34 | 16 | 52 (12) | 19 | Sweden | Swedish |
| | Mehta40 | 50 | 46 (14) | 56 | India | Hindi |
| | Kim41 | 63 | 56 (19 to 83) | 27 | Rep. Korea | Korean |
| | Schonnemann42 | 60 / 29 | 55 (19 to 86) | 27 | Denmark | Danish |
| | Walenkamp43 | 102 | 59 (48 to 66) | 30 | Netherlands | Dutch |
| Disabilities of Arm, Shoulder and Hand | MacDermid32 | 59 | 53 (18) | 37 | Canada | English* |
| | Westphal36 | 107 | 59 (17 to 84) | 27 | Germany | German |
| | Westphal44 | 72 | 60 (16) | 29 | Germany | German |
| | Lovgren34 | 16 | 52 (12) | 19 | Sweden | Swedish |
| Michigan Hand Questionnaire | Kotsis45 | 47 / 37 | 48 (17) / 51 (16) | 32 / 38 | USA | English |
| | Shauver46 | 51 | 50 (19 to 83) | 37 | USA | English |
| | Waljee47 | 128 | 61 (9) | 27 | USA/UK | English* |
| Short Form-36 | Amadio33 | 21 | 57 (14 to 84) | 14 | USA | English* |
| | MacDermid32 | 59 | 53 (18) | 37 | Canada | English* |
| Patient Evaluation Measure | Forward48 | 200 | 54 (24 to 80) | 36 | UK | English* |
| Arthritis Impact Measurement Scale 2 | Amadio33 | 21 | 57 (14 to 84) | 14 | USA | English* |
| Brigham and Women’s Hospital Carpal Tunnel Questionnaire | Amadio33 | 21 | 57 (14 to 84) | 14 | USA | English* |
| International Osteoporosis Foundation Wrist Fracture Questionnaire | Lips16 | 105 | 63 (8) | 12 | UK/NL/Ita/BE | English/Dutch/Italian* |
| Patient Focused Wrist Outcome Instrument | Bialocerkowski49 | 26 | 62 (22 to 84) | 15 | Australia | English |
| Tampa Scale of Kinesiophobia | Lovgren34 | 16 | 52 (12) | 19 | Sweden | Swedish |
| Catastrophizing Subscale of the Coping Strategies Questionnaire | Lovgren34 | 16 | 52 (12) | 19 | Sweden | Swedish |
| Self-Efficacy Scale | Lovgren34 | 16 | 52 (12) | 19 | Sweden | Swedish |

* The country in which a study was performed and the language version of the measurement instrument that was used are often not mentioned explicitly; as per the COnsensus-based Standards for the selection of health Measurement INstruments guidelines, they were then deduced from the affiliation of the authors

SD, standard deviation

Of all PROMs, the PRWE has been studied most extensively, followed by the DASH. The eight studies evaluating the PRWE assessed almost all measurement properties: seven of the nine (Table IV). However, overall, the methodological quality of these studies was low, varying from poor to fair for internal consistency, reliability, measurement error, cross-cultural validity and responsiveness, and varying from poor to good for content validity and hypothesis testing. Interpretability was also assessed, but these studies were of poor methodological quality.

Table IV.

Summary of methodological quality of the studies on measurement properties of the PRWE and DASH32-44

PRWE37 PRWE38 PRWE39 PRWE32 PRWE35 PRWE34 PRWE40 PRWE42 PRWE41 PRWE43 DASH32 DASH36 DASH44 DASH34
Generalisability Fair Fair Fair Poor Fair Excel Poor Fair Fair Fair Poor Fair Good Excel
Internal Consistency Poor Poor Fair Poor Poor Poor Poor Poor Poor Poor Poor
Reliability Fair Poor Fair Poor Fair Poor Fair Fair Poor
Measurement Error Fair Poor
Content validity Fair Good
Structural validity
Hypotheses testing Fair Good Fair Fair Poor Fair
Cross-cultural Fair Poor Poor Poor
Criterion validity
Responsiveness Fair Fair Fair Fair Fair Poor Fair Fair
Interpretability Poor Poor Poor Poor
A full overview of all the scores is shown in the supplementary material

The four studies evaluating the DASH32,34,36,44 assessed fewer than half of the measurement properties: four of nine. The methodological quality of these studies was generally low: consistently poor for internal consistency, poor to fair for reliability, and consistently fair for responsiveness. Hypotheses testing was assessed in a single study of fair quality (Table VI). Measurement error, content validity, cross-cultural validity and interpretability were not assessed in any of the studies (Table IV).

For each of the other ten PROMs, one to three measurement properties were assessed. These were mostly internal consistency, reliability and responsiveness. Overall, the methodological quality of these clinimetric studies was poor to, at best, fair (Table V). This is mainly due to the small sample size in the majority of these studies, but can also be attributed to the large number of items that were scored as “not applicable”. Finally, the lack of description of the statistical methods that were used also contributed to the poor rating.

Table V.

Summary of methodological quality of the studies on measurement properties of the other measurement instruments. A full overview of all the scores is shown in the supplementary material16,32-34,45,46,48,49

MHQ45 MHQ46 MHQ 46 SF-3632 SF-3633 PEM48 IOF16 PFW49 AIMS233 BWH33 TSK34 CAT34 SES34
Generalisability Fair Fair Poor Poor Fair Poor Fair Fair Fair Fair Excellent Excellent Excellent
Internal Consistency Poor Poor Poor Poor Poor
Reliability Poor Poor Poor Poor
Measurement Error
Content validity
Structural validity
Hypotheses testing Poor Poor
Cross-cultural
Criterion validity
Responsiveness Fair Fair Fair Fair Poor Fair Poor Poor Poor
Interpretability Fair
MHQ, Michigan Hand Questionnaire; SF-36, Short Form-36; PEM, Patient Evaluation Measure; AIMS2, Arthritis Impact Measurement Scale; BWH-CTQ, Brigham and Women’s Hospital Carpal Tunnel Questionnaire; IOF-WFQ, International Osteoporosis Foundation Wrist Fracture Questionnaire; PFW, Patient Focused Wrist Outcome Instrument; TSK, Tampa Scale of Kinesiophobia; CAT, Catastrophizing Subscale of the Coping Strategies Questionnaire; SES, Self-Efficacy Scale

Level of evidence of the measurement properties per PROM

The synthesis of results per PROM and their accompanying level of evidence are presented in Table VI.

Table VI.

Ratings of measurement properties and interpretability of measurement instruments with level of evidence32-49

PRWE32,34,35,37-43 DASH32,34,36,44 MHQ45-47 SF-3632,33 PEM48 AIMS233 BWH33 IOF16 PFW49 TSK34 CAT34 SES34
Reliability
Internal consistency + ? ? ? ? ? ?
Cronbach’s alpha 0.89 to 0.97 0.93 to 0.98 0.94 0.96 0.68 to 0.82 0.88 to 0.97 0.79 to 0.95
Reliability ++ + ? ? ? ?
Intraclass correlation coefficient 0.81 to 0.97 0.78 to 0.95 NA 0.81 to 0.84 0.85 to 0.89 0.57 to 0.86
Measurement error +
Smallest detectable change 4.4 to 11.0
Validity
Content validity ++
Structural validity
Hypotheses testing ++ + ? +
Comparator instrument DASH Gartland NA NA
Cross-cultural +
Criterion validity
Responsiveness
Responsiveness ++ ++ ++ + ? ? + ?
Standardised response mean NA NA NA NA NA NA NA NA
INTERPRETABILITY
Interpretability ? -
Minimal important change 11.5
+++ or −−−, multiple studies of good quality OR 1 study of excellent quality: strong evidence of a positive/negative result

++ or −−, multiple studies of fair quality OR 1 study of good quality: moderate evidence of a positive/negative result

+ or −, 1 study of fair quality: limited evidence of a positive/negative result

+/−, conflicting findings

?, only studies of poor quality: unknown, due to poor methodological quality

NA, not available (not performed or described)

PRWE, Patient-Rated Wrist Evaluation; DASH, Disabilities of Arm, Shoulder and Hand; MHQ, Michigan Hand Questionnaire; SF-36, Short Form-36; PEM, Patient Evaluation Measure; AIMS2, Arthritis Impact Measurement Scale; BWH-CTQ, Brigham and Women’s Hospital Carpal Tunnel Questionnaire; IOF-WFQ, International Osteoporosis Foundation Wrist Fracture Questionnaire; PFW, Patient Focused Wrist Outcome Instrument; TSK, Tampa Scale of Kinesiophobia; CAT, Catastrophizing Subscale of the Coping Strategies Questionnaire; SES, Self-Efficacy Scale

The highest levels of evidence were found for the measurement properties of the PRWE. Nevertheless, the evidence is, at best, limited to moderate. For instance, the intraclass correlation coefficient for the reliability of the PRWE (assessed in 78% of the studies) ranged from 0.81 to 0.97 (Table VI). Three studies were of poor methodological quality, and four were of fair quality (Table IV). Therefore, the synthesis of these results is that there is moderate evidence supporting good reliability. There is also moderate evidence that the validity (content and hypothesis testing) and responsiveness are good. There is limited evidence that its internal consistency and cross-cultural validity are good, and that its measurement error is acceptable. There is no evidence for its structural and criterion validity. For the DASH, there is moderate evidence that its responsiveness is good, and limited evidence that its reliability and its validity on hypotheses testing are good. There is no evidence for its other measurement properties. The level of evidence for the other ten PROMs is mainly unknown, since the methodological quality of the studies that evaluated some of their measurement properties (mainly internal consistency, reliability and/or responsiveness) was mainly poor.

Discussion

The aim of this systematic review was to evaluate the methodological quality of the clinimetric studies that evaluated measurement properties of the available PROMs used in patients with distal radial fractures, and to make recommendations for the selection of PROMs based on the level of evidence of each individual measurement property.

Key findings

The two PROMs that were most extensively evaluated were the PRWE (with seven of nine measurement properties investigated) and the DASH (with four of nine investigated). The methodological quality of these studies ranged from poor to, at best, good. Therefore, after synthesis of the scores and incorporating the levels of evidence, the quality of these two PROMs is not supported by strong evidence on any of the measurement properties. For the PRWE, there is moderate evidence supporting good reliability, content validity, hypotheses testing and responsiveness. There is only limited evidence that the measurement error is acceptable and that the cross-cultural validity and internal consistency are good. Structural validity and criterion validity were never evaluated, so evidence for these is lacking. The evidence for interpretability, which is not a measurement property, is unknown, since this was only evaluated in three studies with poor methodological quality. The DASH showed at best moderate evidence for good responsiveness and limited evidence for good hypotheses testing and reliability. Evidence for all other measurement properties was lacking.

These findings do not mean that these and other PROMs have poor measurement properties and thus are of poor quality. However, because the measurement properties were, overall, good but the methodological quality of the clinimetric studies was low, these results may be biased. Therefore, the results of our review imply that studies of higher methodological quality are needed to properly assess their measurement properties. For instance, many PROMs are translated into multiple languages. The PRWE has been correctly translated into 14 languages, following the translation process described by Beaton et al.50 Nevertheless, we only found cross-cultural validity studies for the Swedish, Hindi, Korean and Danish versions; we found no adequate evaluation of the other translated versions. However, because our search was limited to English, German and Dutch, it is possible that the cross-cultural validity of other versions was evaluated but the results were not published in any of these languages.

Comparison of results with previous literature

Previous reviews described a variety of PROMs measuring wrist and/or hand disorders in general, but not PROMs specific to distal radial fractures. Goldhahn et al25 advise using a combination of a disease-specific PROM (PRWE), an extremity-specific PROM (DASH) and a generic PROM (SF-36). Changulani et al22 compared the measurement properties of four PROMs for wrist and hand disorders. They concluded that the PRWE is the most responsive instrument for evaluating outcomes in patients with a distal radial fracture. These conclusions were drawn before the COSMIN checklist was available. The methodological quality of the clinimetric studies was not taken into account and therefore these results may be biased, especially since in the current review we found that the methodological quality of these studies was, at best, fair. Therefore, we can only conclude that the good responsiveness of the DASH and PRWE is supported by moderate evidence.

Hoang-Kim et al21 assessed the quality of reviews published on currently used PROMs for assessing function of the hand and wrist joints. Although they used COSMIN’s taxonomy, terminology and definitions to define the different measurement properties, they did not systematically review the methodological quality of these studies. Nevertheless, they concluded that the PRWE has good construct validity and responsiveness, and found this to be only slightly better than the DASH for assessing patients with wrist injuries. Based on the results of our review we agree that the PRWE is slightly better investigated than the DASH, but disagree with their rating of “good” on some measurement properties. This difference may be due to the fact that we incorporated the methodological quality of these studies by using the COSMIN checklist instead of only using the COSMIN taxonomy.

Study strengths

To our knowledge, this is the first study that has used the COSMIN checklist to systematically review the methodological quality of studies on the measurement properties of PROMs in the evaluation of treatment of distal radial fractures. Furthermore, the quality of each study was assessed by two independent reviewers, as recommended by the COSMIN group, with a third reviewer consulted in cases of disagreement. Using these methods, we were able to minimise subjective judgement on the outcome. We searched for relevant articles from 1990 onwards, so we consider it unlikely that any relevant PROMs were missed, especially since most PROMs were developed after 1990. That only 19 of the 1508 unique studies were eligible shows that our search strategy was very broad and inclusive. Yet, it also demonstrates that the literature on this topic is somewhat lacking. Our search was not limited to the English language, as both reviewers have a good knowledge of German and Dutch.

Study weaknesses

There were some limitations to this review. As in all reviews, publication bias from unpublished studies may threaten the internal validity as unpublished studies are more likely to report negative or unfavourable results.51 Another limitation of this study was that it was not always clear to the reviewers if specific methodological aspects were not reported or not performed, making it impossible to distinguish between poor study reporting and poor methodological quality. We did not contact the authors of the studies to clarify these issues. It can be assumed that some studies have been executed properly but are not sufficiently well described according to the COSMIN criteria. This may have affected the quality ratings.

The shortcomings of outcome measurement research in distal radial fractures exposed by this review should not be generalised to all clinimetric research in orthopaedic surgery. However, it is known that strong evidence supporting good quality of multiple PROMs for various pathologies is lacking,52-54 so we advise the reader to be cautious when choosing a PROM based on the results of clinimetric studies without considering their methodological quality.

For future research, we believe that it is especially important to further evaluate the measurement properties and interpretability of the PRWE and DASH outcome measures in higher quality studies. Based on the results of the available clinimetric studies, there is no evidence that these PROMs are not useful in evaluating the treatment of distal radial fractures, and therefore we do not believe that it is necessary to develop new instruments. Currently, based on best available evidence, we recommend using the PRWE or DASH to evaluate the outcome of treatment in patients with distal radial fractures, but we cannot stress strongly enough that more clinimetric studies of higher methodological quality are needed to select appropriate PROMs with greater confidence.

According to this systematic review, strong evidence supporting ‘good quality’ of any of the currently available PROMs in patients with distal radial fractures is lacking. The evidence that the responsiveness of the PRWE and DASH is good is moderate, as is the evidence for good validity and reliability of the PRWE. We therefore recommend these PROMs in clinical studies in patients with distal radial fractures; however, more clinimetric studies of higher methodological quality are needed to adequately determine their other measurement properties. If the methodological quality of clinimetric studies continues to increase, PROMs can be selected with greater confidence.


Dr Y. V. Kleinlugtenbelt; e-mail:
Author Contributions

Y. V. Kleinlugtenbelt: Design of study; Acquisition, analysis and interpretation of data; Writing and revision of manuscript.

R. W. Nienhuis: Design of study; Acquisition, analysis and interpretation of data; Revision of manuscript.

M. Bhandari: Analysis and interpretation of data; Revision of manuscript.

J. C. Goslings: Analysis and interpretation of data; Revision of manuscript.

R. W. Poolman: Analysis and interpretation of data; Revision of manuscript.

V. A. B. Scholtes: Design of study; Analysis and interpretation of data; Revision of manuscript.


Supplementary material

The full search strategy is provided in supplementary material 1. A full overview of the scores of methodological quality of the studies on measurement properties of all PROMs is shown in Supplementary Tables i to iv.

Funding Statement

None declared.

ICMJE Conflict of Interest

M. Bhandari reports personal fees received from Smith & Nephew, Stryker, Amgen, Zimmer, Moximed, Bioventus, Merck, Eli Lilly, Sanofi, Ferring, Conmed, as well as grants from Smith & Nephew, DePuy, Eli Lilly, Bioventus, Stryker, Zimmer, Amgen, none of which is related to this article.

References

1 Court-Brown CM, Caesar B. Epidemiology of adult fractures: A review. Injury 2006;37:691-697.

2 Alffram PA, Bauer GC. Epidemiology of fractures of the forearm. A biomechanical investigation of bone strength. J Bone Joint Surg [Am] 1962;44-A:105-114.

3 Dóczi J, Renner A. Epidemiology of distal radius fractures in Budapest. A retrospective study of 2,241 cases in 1989. Acta Orthop Scand 1994;65:432-433.

4 Owen RA, Melton LJ III, Johnson KA, Ilstrup DM, Riggs BL. Incidence of Colles’ fracture in a North American community. Am J Public Health 1982;72:605-607.

5 Wei DH, Poolman RW, Bhandari M, Wolfe VM, Rosenwasser MP. External fixation versus internal fixation for unstable distal radius fractures: a systematic review and meta-analysis of comparative clinical trials. J Orthop Trauma 2012;26:386-394.

6 Darzi L. High Quality Care For All – NHS Next Stage Review Final Report. http://webarchive.nationalarchives.gov.uk/20130107105354/http:/www.dh.gov.uk/prod_consum_dh/groups/dh_digitalassets/@dh/@en/documents/digitalasset/dh_085828.pdf (date last accessed 26 January 2016).

7 Fitzpatrick R, Davey C, Buxton MJ, Jones DR. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess 1998;2:i-iv, 1-74.

8 Hudak PL, Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand) [corrected]. The Upper Extremity Collaborative Group (UECG). Am J Ind Med 1996;29:602-608.

9 MacDermid JC. Development of a scale for patient rating of wrist pain and disability. J Hand Ther 1996;9:178-183.

10 Macey AC, Burke FD, Abbott K, et al. Outcomes of hand surgery. British Society for Surgery of the Hand. J Hand Surg Br 1995;20:841-855.

11 Chung KC, Pillsbury MS, Walters MR, Hayward RA. Reliability and validity testing of the Michigan Hand Outcomes Questionnaire. J Hand Surg Am 1998;23:575-587.

12 Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30:473-483.

13 Levine DW, Simmons BP, Koris MJ, et al. A self-administered questionnaire for the assessment of severity of symptoms and functional status in carpal tunnel syndrome. J Bone Joint Surg [Am] 1993;75-A:1585-1592.

14 Meenan RF, Mason JH, Anderson JJ, Guccione AA, Kazis LE. AIMS2. The content and properties of a revised and expanded Arthritis Impact Measurement Scales Health Status Questionnaire. Arthritis Rheum 1992;35:1-10.

15 Smith HB. Smith hand function evaluation. Am J Occup Ther 1973;27:244-251.

16 Lips P, Cooper C, Agnusdei D, et al. Quality of life in patients with vertebral fractures: validation of the Quality of Life Questionnaire of the European Foundation for Osteoporosis (QUALEFFO). Working Party for Quality of Life of the European Foundation for Osteoporosis. Osteoporos Int 1999;10:150-160.

17 Bialocerkowski AE, Grimmer KA, Bain GI. Development of a patient-focused wrist outcome instrument. Hand Clin 2003;19:437-448.

18 Kori S, Miller R, Todd D. Kinesiophobia: a new view of chronic pain behavior. Pain Management 1990;3:35-43.

19 Rosenstiel AK, Keefe FJ. The use of coping strategies in chronic low back pain patients: relationship to patient characteristics and current adjustment. Pain 1983;17:33-44.

20 Altmaier EM, Russell DW, Kao CF, Lehmann TR, Weinstein JN. Role of self-efficacy in rehabilitation outcome among chronic low back pain patients. J Couns Psychol 1993;40:335-391.

21 Hoang-Kim A, Pegreffi F, Moroni A, Ladd A. Measuring wrist and hand function: common scales and checklists. Injury 2011;42:253-258.

22 Changulani M, Okonkwo U, Keswani T, Kalairajah Y. Outcome evaluation measures for wrist and hand: which one to choose? Int Orthop 2008;32:1-6.

23 Bialocerkowski AE, Grimmer KA, Bain GI. A systematic review of the content and quality of wrist outcome instruments. Int J Qual Health Care 2000;12:149-157.

24 Schuind FA, Mouraux D, Robert C, et al. Functional and outcome evaluation of the hand and wrist. Hand Clin 2003;19:361-369.

25 Goldhahn J, Angst F, Simmen BR. What counts: outcome assessment after distal radius fractures in aged patients. J Orthop Trauma 2008;22:S126-S130.

26 Goldhahn J, Beaton D, Ladd A, et al. Recommendation for measuring clinical outcome in distal radius fractures: a core set of domains for standardized reporting in clinical practice and research. Arch Orthop Trauma Surg 2014;134:197-205.

27 Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010;19:539-549.

28 Terwee CB, Jansma EP, Riphagen II, de Vet HC. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res 2009;18:1115-1123.

29 Suk M, Hanson BP, Norvell DC, Helfet DL. The AO Handbook of Musculoskeletal Outcomes Measures and Instruments. First edition, Thieme, 2005.

30 Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol 2010;63:737-745.

31 Terwee CB, Mokkink LB, Knol DL, et al. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res 2012;21:651-657.

32 MacDermid JC, Richards RS, Donner A, Bellamy N, Roth JH. Responsiveness of the short form-36, disability of the arm, shoulder, and hand questionnaire, patient-rated wrist evaluation, and physical impairment measurements in evaluating recovery after a distal radius fracture. J Hand Surg Am 2000;25:330-340.

33 Amadio PC, Silverstein MD, Ilstrup DM, Schleck CD, Jensen LM. Outcome after Colles fracture: the relative responsiveness of three questionnaires and physical examination measures. J Hand Surg Am 1996;21:781-787.

34 Lövgren A, Hellström K. Reliability and validity of measurement and associations between disability and behavioural factors in patients with Colles’ fracture. Physiother Theory Pract 2012;28:188-197.

35 Wilcke MT, Abbaszadegan H, Adolphson PY. Evaluation of a Swedish version of the patient-rated wrist evaluation outcome questionnaire: good responsiveness, validity, and reliability, in 99 patients recovering from a fracture of the distal radius. Scand J Plast Reconstr Surg Hand Surg 2009;43:94-101.

36 Westphal T, Piatek S, Schubert S, Schuschke T, Winckler S. Reliability and validity of the upper limb DASH questionnaire in patients with distal radius fractures. Z Orthop Ihre Grenzgeb 2002;140:447-451. (In German).

37 Gabl M, Krappinger D, Arora R, et al. Acceptance of patient-related evaluation of wrist function following distal radius fracture (DRF). Handchir Mikrochir Plast Chir 2007;39:68-72. (In German).

38 Hemelaers L, Angst F, Drerup S, Simmen BR, Wood-Dauphinee S. Reliability and validity of the German version of “the Patient-rated Wrist Evaluation (PRWE)” as an outcome measure of wrist pain and disability in patients with acute distal radius fractures. J Hand Ther 2008;21:366-376.

39 MacDermid JC, Turgeon T, Richards RS, Beadle M, Roth JH. Patient rating of wrist pain and disability: a reliable and valid measurement tool. J Orthop Trauma 1998;12:577-586.

40 Mehta SP, Mhatre B, MacDermid JC, Mehta A. Cross-cultural adaptation and psychometric testing of the Hindi version of the patient-rated wrist evaluation. J Hand Ther 2012;25:65-77.

41 Kim JK, Kang JS. Evaluation of the Korean version of the patient-rated wrist evaluation. J Hand Ther 2013;26:238-243.

42 Schønnemann JO, Hansen TB, Søballe K. Translation and validation of the Danish version of the Patient Rated Wrist Evaluation questionnaire. J Plast Surg Hand Surg 2013;47:489-492.

43 Walenkamp MM, de Muinck Keizer RJ, Goslings JC, et al. The Minimum Clinically Important Difference of the Patient-rated Wrist Evaluation Score for Patients With Distal Radius Fractures. Clin Orthop Relat Res 2015;473:3235-3241.

44 Westphal T. Reliability and responsiveness of the German version of the Disabilities of the Arm, Shoulder and Hand questionnaire (DASH). Unfallchirurg 2007;110:548-552. (In German).

45 Kotsis SV, Lau FH, Chung KC. Responsiveness of the Michigan Hand Outcomes Questionnaire and physical measurements in outcome studies of distal radius fracture treatment. J Hand Surg Am 2007;32:84-90.

46 Shauver MJ, Chung KC. The minimal clinically important difference of the Michigan hand outcomes questionnaire. J Hand Surg Am 2009;34:509-514.

47 Waljee JF, Kim HM, Burns PB, Chung KC. Development of a brief, 12-item version of the Michigan Hand Questionnaire. Plast Reconstr Surg 2011;128:208-220.

48 Forward DP, Sithole JS, Davis TR. The internal consistency and validity of the Patient Evaluation Measure for outcomes assessment in distal radius fractures. J Hand Surg Eur Vol 2007;32:262-267.

49 Bialocerkowski AE, Grimmer KA, Bain GI. Validity of the patient-focused wrist outcome instrument: do impairments represent functional ability? Hand Clin 2003;19:449-455.

50 Goldhahn J, Shisha T, MacDermid JC, Goldhahn S. Multilingual cross-cultural adaptation of the patient-rated wrist evaluation (PRWE) into Czech, French, Hungarian, Italian, Portuguese (Brazil), Russian and Ukrainian. Arch Orthop Trauma Surg 2013;133:589-593.

51 Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337:867-872.

52 Green A, Liles C, Rushton A, Kyte DG. Measurement properties of patient-reported outcome measures (PROMS) in Patellofemoral Pain Syndrome: a systematic review. Man Ther 2014;19:517-526.

53 Grevnerts HT, Terwee CB, Kvist J. The measurement properties of the IKDC-subjective knee form. Knee Surg Sports Traumatol Arthrosc 2015;23:3698-3706.

54 Kroman SL, Roos EM, Bennell KL, Hinman RS, Dobson F. Measurement properties of performance-based outcome measures to assess physical function in young and middle-aged people known to be at high risk of hip and/or knee osteoarthritis: a systematic review. Osteoarthritis Cartilage 2014;22:26-39.