Study assesses reliability of PROM score change in spine surgery research

Study author Catharina Parai

Changes in patient-reported outcome measure (PROM) scores must be considerable in order to distinguish a true change from random error in degenerative lumbar spine surgery research, a paper published in the European Spine Journal has found. Authored by Catharina Parai and colleagues (University of Gothenburg, Gothenburg, Sweden), the paper was awarded the International Society for the Study of the Lumbar Spine (ISSLS) prize in clinical science for 2020.

Parai and colleagues note that the clinically defined Minimal Important Change (MIC) must exceed the distribution-based Smallest Detectable Change (SDC) if a true change in outcome is to be distinguished from measurement error. In the study, they aimed to define the SDC of three common PROMs in degenerative lumbar spine surgery, the Numeric Rating Scale (NRS Back/Leg), the Oswestry Disability Index (ODI) and the Euroqol-5-Dimensions (EQ-5D Index), and to compare each SDC with the corresponding MIC.
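To illustrate the comparison described above, here is a minimal Python sketch of one common way to derive an SDC from test-retest data. The article does not say which formula the authors used; the convention assumed here is SDC = 1.96 × √2 × SEM, with the standard error of measurement (SEM) taken as the standard deviation of the T1–T2 differences divided by √2. The function names and the scores are hypothetical.

```python
import math
import statistics


def smallest_detectable_change(t1, t2, z=1.96):
    """SDC from paired test-retest scores (assumed convention, not from the paper).

    SEM = SD(differences) / sqrt(2); SDC = z * sqrt(2) * SEM.
    """
    if len(t1) != len(t2) or len(t1) < 2:
        raise ValueError("need two equally long score lists with at least two pairs")
    diffs = [b - a for a, b in zip(t1, t2)]
    sem = statistics.stdev(diffs) / math.sqrt(2)
    return z * math.sqrt(2) * sem


def mic_is_detectable(mic, sdc):
    """A MIC can only be told apart from measurement error if it exceeds the SDC."""
    return mic > sdc


# Hypothetical ODI-like scores for six stable patients at T1 and T2.
t1_scores = [40, 52, 30, 46, 38, 60]
t2_scores = [44, 48, 34, 40, 42, 58]

sdc = smallest_detectable_change(t1_scores, t2_scores)
print(f"SDC = {sdc:.1f}")
print("MIC of 15 detectable:", mic_is_detectable(15, sdc))
print("MIC of 5 detectable:", mic_is_detectable(5, sdc))
```

With this convention, a large spread in individual T1–T2 differences translates directly into a large SDC, even when the average difference is zero.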

MIC computations were based on the Swedish spinal register, Swespine, and included patients operated on between 1998 and 2017 (n=98,732). Adults with any of three degenerative diagnoses (lumbar disc herniation, lumbar spinal stenosis or degenerative disc disorder) were included in the cohort. The MIC values were obtained in ROC curve analyses, where the transition question Global Assessment (GA) was used as the reference criterion. Scores at or above the MIC corresponded to patients who had assessed themselves as either “pain free” or “much improved” at the one-year follow-up.
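As a rough illustration of an anchor-based ROC analysis of this kind, the Python sketch below picks the change-score cut-off that best separates "improved" from "not improved" patients. The article does not state which criterion the authors used to choose the cut-off; maximising Youden's J (sensitivity + specificity − 1) is an assumption made here, and the function name and data are hypothetical.

```python
def roc_cutoff(changes, improved):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    changes:  improvement in PROM score per patient (higher = better)
    improved: True if the patient's anchor answer counts as improved
    A patient is predicted "improved" when change >= cutoff.
    """
    pos = [c for c, flag in zip(changes, improved) if flag]
    neg = [c for c, flag in zip(changes, improved) if not flag]
    if not pos or not neg:
        raise ValueError("need both improved and non-improved patients")

    best = None
    for cut in sorted(set(changes)):
        sens = sum(c >= cut for c in pos) / len(pos)
        spec = sum(c < cut for c in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical change scores and anchor answers for eight patients.
change_scores = [2, 5, 8, 12, 15, 20, 25, 30]
anchor_improved = [False, False, True, False, True, True, True, True]

cutoff, sensitivity, specificity = roc_cutoff(change_scores, anchor_improved)
print(cutoff, sensitivity, specificity)
```

The returned cut-off plays the role of the MIC in this toy example.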

Study participants for a retest study, upon which the SDC calculations were based, were collected consecutively at Stockholm Spine Center and Spine Center Göteborg between November 2017 and May 2019, both from the waiting list (pre-op group) and from those followed up one year after surgery (post-op group).

The pre-op group filled out the first booklet (T1) at the clinic on the day they were listed for surgery. The second booklet (T2) was sent by mail one week later, and respondents were given five days to return the forms. In the post-op group, a request for study participation was added to the one-year Swespine follow-up booklet (T1). One week after the booklet was registered at the Swespine office, the second questionnaire (T2) was sent out by mail, with a request to return the form within five days. Inclusion in the pre-op group stopped once the number of participants exceeded 30 for each of the three diagnoses. For the analyses, the pre-op and post-op groups, as well as the diagnoses, were merged.

The time interval between the two points of estimation, T1 and T2, ranged from 10 to 35 days. The difference in PROM score between T1 and T2 for each participant was plotted against the time interval, and the two were correlated in Spearman rank analyses to check whether the number of days between T1 and T2 influenced the PROM score. The occurrence of systematic differences between T1 and T2 was examined using the sign test for categorical data and the Wilcoxon signed-rank test for continuous data.
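The two checks described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the authors' analysis code: it covers only the Spearman rank correlation and an exact two-sided sign test (the Wilcoxon signed-rank test is omitted for brevity), and all names and numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def sign_test_p(t1, t2):
    """Exact two-sided sign test on paired scores; tied pairs are dropped."""
    pos = sum(b > a for a, b in zip(t1, t2))
    neg = sum(b < a for a, b in zip(t1, t2))
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: days between T1 and T2, and the T2 - T1 score difference.
days_between = [10, 14, 20, 28, 35]
score_diff = [1, -2, 0, 2, -1]
rho = spearman_rho(days_between, score_diff)

# Hypothetical paired categorical answers coded as integers.
p_value = sign_test_p([3, 5, 4, 6, 2, 7], [4, 5, 3, 7, 1, 6])
print(rho, p_value)
```

A correlation near zero suggests the retest interval did not drive the score differences, and a large sign-test p-value gives no evidence of a systematic shift between T1 and T2.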

In total, 248 participants filled out the booklet at T1. Both questionnaires were returned by 182 (74.6%) participants, 83 from the pre-op group and 99 from the post-op group. The time interval between T1 and T2 was 20±8 days.

In terms of results, the study team noted that the number of days between T1 and T2 did not correlate with the PROM scores (Spearman rank correlation coefficient for ODI: −0.07; NRS BACK: 0.06; NRS LEG: −0.08; EQ-5D INDEX: −0.03; GA BACK: 0.045; GA LEG: −0.144; Satisfaction: 0.128).

There were no statistically significant systematic differences between T1 and T2 for any of the PROMs, the study team noted. The data were not found to be heteroscedastic, meaning that the measurement error appeared to be uniform across scale values. MIC calculations were based on the lumbar Swespine register population, stratified by diagnosis. For NRS BACK/LEG and ODI, the SDCs exceeded the MICs to some extent, Parai and colleagues found.

Comparing the SDCs to the MIC values for the entire lumbar Swespine population, the study team found that the SDC for both NRS scales exceeded the corresponding MICs, while the SDC and MIC were equal for the ODI. There was a considerable gap between the SDC and MIC for the EQ-5D Index.

Parai and colleagues write that the study found large SDCs, exceeding even tough MIC cut-off values, for some of the most commonly used PROMs in spine surgery research. The error, they write, was mainly due to large intra-individual variation between the two test occasions and not to systematic differences.

A consequence of large measurement errors in PROMs is that a considerable change in outcome is needed in order to distinguish true change from random error, the team concluded.
