Research Reports
D.L. Hart, PT, PhD, is Director of Consulting and Research, Focus On Therapeutic Outcomes, Inc, PO Box 11444, Knoxville, TN 37939 (USA).
M.W. Werneke, PT, MS, Dip MDT, is Physical Therapist, Spine Rehabilitation at CentraState Medical Center, Freehold, New Jersey.
S.Z. George, PT, PhD, is Associate Professor, Department of Physical Therapy, Center for Pain Research and Behavioral Health, Brooks Center for Rehabilitation Studies, University of Florida, Gainesville, Florida.
J.W. Matheson, PT, DPT, MS, SCS, OCS, CSCS, is Physical Therapist, Minnesota Sport and Spine Rehabilitation, Burnsville, Minnesota.
Y.-C. Wang, OT, PhD, is Research Assistant, Focus On Therapeutic Outcomes, Inc, Knoxville, Tennessee, and Postdoctoral Fellow, Rehabilitation Institute of Chicago, Chicago, Illinois.
K.F. Cook, PhD, is Research Associate Professor, Department of Rehabilitation Medicine, University of Washington, Seattle, Washington.
J.E. Mioduski, MS, is Programmer, Focus On Therapeutic Outcomes, Inc, Knoxville, Tennessee.
S.W. Choi, PhD, is Research Assistant Professor, Department of Medical Social Sciences and Center on Outcomes, Research and Education, Feinberg School of Medicine, Northwestern University, Chicago, Illinois.
Address all correspondence to Dr Hart at: hart{at}fotoinc.com
Submitted July 24, 2008;
Accepted April 10, 2009
Objective: The purpose of this study was to develop efficient yet accurate screening methods for identifying elevated levels of fear-avoidance beliefs regarding work or physical activities in people receiving outpatient rehabilitation.
Design: A secondary analysis of data collected prospectively from people with a variety of common neuromusculoskeletal diagnoses was conducted.
Methods: Intake Fear-Avoidance Beliefs Questionnaire (FABQ) data were collected from 17,804 people who had common neuromusculoskeletal conditions and were receiving outpatient rehabilitation in 121 clinics in 26 states (in the United States). Item response theory (IRT) methods were used to analyze the FABQ data, with particular emphasis on differential item functioning among clinically logical groups of subjects, and to identify screening items. The accuracy of screening items for identifying subjects with elevated levels of fear was assessed with receiver operating characteristic analyses.
Results: Three items for fear of physical activities and 10 items for fear of work activities represented unidimensional scales with adequate IRT model fit. Differential item functioning was negligible for variables known to affect functional status outcomes: sex, age, symptom acuity, surgical history, pain intensity, condition severity, and impairment. Items that provided maximum information at the median for the FABQ scales were selected as screening items to dichotomize subjects by high versus low levels of fear. The accuracy of the screening items was supported for both scales.
Limitations: This study represents a retrospective analysis, which should be replicated using prospective designs. Future prospective studies should assess the reliability and validity of using one FABQ item to screen people for high levels of fear-avoidance beliefs.
Conclusions: The lack of differential item functioning in the FABQ scales in the sample tested in this study suggested that FABQ screening could be useful in routine clinical practice and allowed the development of single-item screening for fear-avoidance beliefs that accurately identified subjects with elevated levels of fear. Because screening was accurate and efficient, single IRT-based FABQ screening items are recommended to facilitate improved evaluation and care of heterogeneous populations of people receiving outpatient rehabilitation.
On the basis of theories of fear and avoidance of activities, Waddell et al4 developed the Fear-Avoidance Beliefs Questionnaire (FABQ) to assess the association between fear-avoidance beliefs and work disability for people with chronic low back pain syndromes. The FABQ is a self-report questionnaire with 2 scales: 1 assessing fear-avoidance beliefs regarding work activities (FABQ-W) and 1 assessing fear-avoidance beliefs regarding physical activities (FABQ-PA).4 Evidence supported an association between fear-avoidance beliefs regarding work and absence from work because of low back pain.4 Thus, Waddell et al4 recommended that clinicians consider screening for fear-avoidance beliefs when managing low back pain. Subsequent studies indicated that elevated levels of fear were associated with10,11 and were predictive of12,13 disability and absence from work in people with low back and cervical spine pain syndromes. There is evidence that identifying people with elevated levels of fear-avoidance beliefs and managing those beliefs accordingly may reduce fear and predict or improve outcomes.1,4,5,10,12–22
Fear-avoidance beliefs may affect people with conditions other than low back pain. Evidence5,23 supported the possible existence of fear-avoidance beliefs or pain-related fear in people who have other impairments or who may not have pain, perhaps because of learned behavior after previous painful episodes or misconceptions about pain.24 Pain-related fear scales, including the FABQ scales, have been used to assess the levels of fear in people with acute16 and chronic4,10,11 low back pain syndromes, cervical spine pain syndromes,11,25–27 cervical spine and shoulder28 pain syndromes, hip impairments,29 knee impairments,29–31 chronic headache,32 fibromyalgia,33 and chronic fatigue syndrome.33 It is reasonable to believe that pain-related fear would be applicable to people with other conditions including, but not limited to, osteoarthritis,34 knee impairments,30,31 and neuropathic pain.35 These studies suggested that pain-related fear is not uncommon in people with a wide variety of neuromusculoskeletal conditions, with and without pain, and another study reported the prevalence of elevated levels of fear-avoidance beliefs to be more than 40% in specific samples.36
George17 described several screening methods designed to identify people with elevated levels of fear, including the FABQ. Despite the availability of these methods, therapists do not routinely screen for elevated levels of fear, a fact that may be attributable partly to the burden of collecting data or the difficulty in interpreting measures. In response to these concerns, George17 challenged clinicians and researchers to refine screening techniques by making them more efficient and accurate to try to improve acceptance and clinical use. Developing efficient and accurate screening methods is particularly important for therapists assuming first-contact roles in patient care,37 who need to identify confounding conditions that could reduce the effectiveness of their management strategies in diverse patient populations.38 Screening results indicating elevated levels of fear would alert therapists to the likelihood that patients might be fearful of activities that might be part of their therapeutic interventions; such a situation might portend worse outcomes.39 Because short tests commonly are associated with increased measurement error,40 definitive testing often is recommended to confirm the presence of the condition.41 Given that there is preliminary evidence of effective interventions for people with elevated levels of pain-related fear,19,42 the challenge appears to be relevant to improved patient care and outcomes.
In an effort to minimize the measurement error related to short tests, some authors have recommended modern psychometric techniques, such as item response theory (IRT) methods.43 Such methods are useful for assessing patient-report screening surveys because they facilitate both the evaluation of whether items mean the same thing to different respondents (ie, differential item functioning [DIF], described in the Method section)44 and the identification of screening items by use of item information functions (described in the Method section).45 The absence of DIF is important if FABQ scales are to be used to screen diverse populations for elevated levels of fear-avoidance beliefs. The use of item information functions facilitates the selection of single screening items associated with the lowest measurement error related to a given level of fear.45
The overall purpose of this study was to develop an efficient yet accurate screening method for identifying people who have elevated levels of fear-avoidance beliefs regarding work or physical activities and who are receiving outpatient rehabilitation. The specific purposes were: (1) to use IRT methods to analyze FABQ items, with particular emphasis on DIF among clinically logical groups of people, and to identify screening items for each FABQ scale and (2) to assess the accuracy of screening items for identifying people with elevated levels of fear-avoidance beliefs. If the results suggest that screening items can identify people with elevated levels of fear accurately, then more-precise fear-avoidance testing could be initiated or management strategies could be used to reduce fear and improve outcomes. In addition, accurate screening would reduce costly testing of people not likely to be at risk of having elevated levels of fear.
Setting and Participants
We analyzed data from 17,804 people (a sample of convenience) treated for common neuromusculoskeletal conditions in 121 outpatient rehabilitation clinics in 26 states (in the United States) between May 2002 and December 2006 (Tab. 1). Clinics were participating with Focus On Therapeutic Outcomes, Inc (Knoxville, Tennessee), an international medical rehabilitation database management company.46,47 People were selected from the database of Focus On Therapeutic Outcomes because they had answered the FABQ for physical or work activities (see below): 16,243 people had answered the FABQ for physical activities, 5,517 people had answered the FABQ for work activities, and 3,956 people had answered both the work activity and the physical activity surveys. Although diagnostic information was available for only 68% of the people, the most prevalent groupings of ICD-9-CM codes48 were related to soft-tissue disorders of muscle, synovium, tendon, or bursa (ICD-9-CM codes 725–729; 25% of people) and pathologies of the spine (ICD-9-CM codes 720–724; 18% of people). Most people were receiving payment benefits from health maintenance organizations (17%), preferred provider organizations (11%), workers' compensation (10%), and Medicare Part B (9%). Data on payers were missing for 38% of the people.
Table 1. Characteristics of Subjects at Rehabilitation Intake (N=17,804)
Fear-Avoidance Beliefs Items
The items in the FABQ describe the relationship between pain and physical activities or work activities; for example: "Physical activity might harm my back" or "I cannot do my normal work with my present pain." For each item, a scale with ratings of 0 to 6 (0="completely disagree," 3="unsure," and 6="completely agree") is used. There are no word descriptors for responses 1, 2, 4, and 5. Responses from 4 items are summed to produce a score representative of the level of fear of physical activities, and responses from 7 items are summed to produce a score representative of the level of fear of work activities.4 Research findings have supported good item internal consistency and reliability and the presence of 2 factors in the FABQ (fear of work activities and fear of physical activities),4 FABQ measure test-retest reliability,4 and an association of fear-avoidance beliefs with absence from work and disability.8,10,11,56 Because of interest in assessing the fear-avoidance beliefs of people receiving outpatient rehabilitation regardless of impairment, 2 items were reworded to eliminate references to the back (Appendix). We believed that the resulting scale was appropriate for anyone with pain or fear of pain, such as the people seeking outpatient rehabilitation.
Data Analyses
Distribution of response choices.
The frequency distribution of responses to each item was evaluated.
IRT analyses.
We used unidimensional IRT methods to analyze the data43,57–59 to determine how well the IRT model fit the data and how well IRT assumptions were met.58 For unidimensional IRT models to be appropriate for analyzing FABQ items, the items must measure only one construct; that is, the scale must be unidimensional.45,59 In addition, the items must be locally independent; that is, any 2 items must not be correlated when the latent trait is fixed.45 We used modern factor analytic methods50–52,60 to investigate unidimensionality and local independence assumptions. The presence of a dominant factor in the FABQ items was assessed with exploratory factor analysis (EFA) and then confirmatory factor analysis (CFA),61 eliminating items with factor loadings of less than 0.40.62 Pairs of items with absolute residual correlations of greater than 0.25 were considered locally dependent.62 All 16 FABQ items were used for the initial IRT analyses because we wanted to test the factor structure of all FABQ items.4 The EFA was used to explore the general structure of the FABQ items without the imposition of a preconceived structure to determine whether 1 or more factors were present in the data. The CFA was used to verify the factor structure once the factors were identified with the EFA.63 The CFA model fit was evaluated with the comparative fit index (CFI),64 the Tucker-Lewis index (TLI),65 and the root-mean-square error of approximation (RMSEA).63,66 The TLI and the CFI range from 0 (poor fit) to 1 (good fit). Values for the CFI and the TLI of greater than 0.90 are indicative of good model fit. Values for the RMSEA of less than 0.08 suggest adequate fit.64
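As an illustration of the cutoffs stated above, the following minimal Python sketch flags which fit indices meet the stated thresholds. The function name `check_fit` and its return format are hypothetical and are not part of the original analysis; the example values are made up.

```python
def check_fit(cfi, tli, rmsea):
    """Apply the cutoffs stated in the text to one set of fit statistics.

    Hypothetical helper for illustration only.
    """
    return {
        "CFI": cfi > 0.90,       # good fit if greater than 0.90
        "TLI": tli > 0.90,       # good fit if greater than 0.90
        "RMSEA": rmsea < 0.08,   # adequate fit if less than 0.08
    }


# Example call with made-up values (not study results).
print(check_fit(cfi=0.95, tli=0.96, rmsea=0.05))
```

A model can satisfy some indices and not others, so the sketch reports each index separately rather than returning a single pass/fail value.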
IRT model selection, item information function analysis, and item fit.
We fitted items remaining after unidimensionality and local independence testing to the graded-response IRT model (GRM)67,68 by using PARSCALE software (version 4.1).69 The GRM was chosen because it is appropriate for ordered responses (such as FABQ items), it allows item discrimination parameters to vary, and it can be used to estimate ability parameters (theta values) that represent a subject's level of fear. We used PARSCALE software to fit the data to the GRM and to estimate discrimination parameters and category response functions for each item.68,70 Category response functions represent the probability that an examinee will successfully complete a particular response category. The category characteristic curve for each item in a response category is used to estimate the operating characteristic curve for each item, which represents the probability of endorsing a response category for the item at a given subject's ability (theta value).70 The category response functions are resolved into an item location or difficulty parameter and a set of category parameters. Therefore, PARSCALE produces an item discrimination parameter, an item difficulty parameter, and a set of category parameters for each item. PARSCALE estimates a subject's level of fear (theta value), category characteristic curves and parameters, and item difficulty parameters, all of which are placed on the same normal (mean=0, SD=1) fear metric in logits.67
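To make the graded-response model concrete, the sketch below computes category probabilities for one item at a given theta value. It assumes the plain logistic form of Samejima's model with one slope and a set of ordered boundary locations per item; this is not PARSCALE's own parameterization (item location plus category parameters), and the function name and example values are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model (sketch).

    theta: person's level of the trait (here, fear), in logits
    a: item discrimination (slope)
    thresholds: ordered boundary locations b_1 < ... < b_m (m + 1 categories)
    """
    # Cumulative probability of responding in category k or higher;
    # by definition it is 1 for the lowest category and 0 past the highest.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-a * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Made-up item: slope 2.5, boundaries at -1, 0, and 1 logits.
probs = grm_category_probs(theta=0.0, a=2.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Because the probabilities are differences of adjacent cumulative curves, they always sum to 1, and a larger slope makes each category's curve more sharply peaked along the fear axis.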
PARSCALE (with the GRM) also estimates item information functions, which quantify the capability of a given item to adequately estimate a subject's ability across the fear-avoidance scale range.45,70 An item information function describes each item's contribution to overall test precision. The sum of the item information functions defines the ideal precision of the test (ie, test information function) at a given ability, facilitating evaluation of the expected standard error. The standard error of a subject's ability estimate is inversely proportional to the square root of the test information function: SE=1/√(test information function). For samples with an observed variance of 1, a standard error of less than 0.23 is comparable to a reliability of greater than 0.95 (reliability=1–SE²).71,72
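The relationship among item information, standard error, and reliability can be worked through with a short Python sketch. The function names and the item information values are hypothetical; the reliability formula assumes an observed variance of 1, as stated above.

```python
import math


def standard_error(item_informations):
    """SE of the ability estimate at one theta value.

    The test information is the sum of the item information values
    at that theta; SE is 1 over its square root.
    """
    test_information = sum(item_informations)
    return 1.0 / math.sqrt(test_information)


def reliability(se):
    """Reliability implied by a standard error (observed variance of 1 assumed)."""
    return 1.0 - se ** 2


# Made-up information values for 3 items at a single theta value.
se = standard_error([10.0, 9.0, 6.0])
print(se, reliability(se))
```

Adding items, or choosing items that are more informative at the theta value of interest, raises the test information and therefore lowers the standard error at that point on the scale.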
Item discrimination parameters and operating characteristic curves were assessed to determine how well the items were modeled with the GRM. Because there is no recognized best way to assess the fit of data to the GRM, particularly for samples exceeding 1,500,71 we used 3 basic approaches to assess the fit of our data to the GRM. First, we assessed empirical operating characteristic curves to ensure that they progressed from less difficult to more difficult along the fear-avoidance axis and that each curve reached a maximum at a unique interval of the scale.67,70 Second, we assessed item discrimination parameters (ie, slopes) for an estimation of the discrimination power for each item. Items with larger discrimination parameters (higher slopes) differentiate subjects with fear levels varying over the range of theta values appropriate for the item better than do items with lower slopes; therefore, items with slopes of greater than 0.70 are preferred.62 Third, we assessed theoretical versus empirical operating characteristic curves for a qualitative determination of the fit of items to the GRM. Visual inspection of empirical operating characteristic curves can reveal the extent and nature of item fit or misfit.
DIF.
The remaining items were assessed for DIF by selection of clinically logical groups of subjects: sex (male/female), surgical history (yes/no), acuity of symptoms (number of calendar days between date of onset of symptoms and date of initial evaluation: acute=21 days or less, subacute=22–<90 days, and chronic=90 days or more), age group (18–<45, 45–<65, 65–<75, and 75 years or older), number of comorbidities (0, 1, 2, 3, or more), pain intensity (below median, median, or above median, as indicated with a numeric rating of 0 ["no pain"] to 10 ["pain as bad as it can be"]), and impairment grouping (Tab. 1). Differential item functioning is present when the relationship between item responses and the trait measured by the test differs systematically between groups of subjects after the subjects' underlying abilities are controlled for.44 The variables selected for testing have been shown to affect functional status outcomes.53,54 Associations between fear and these independent variables have only begun to be investigated, and preliminary results suggest the need for further investigation.39
For FABQ measures to be used as screening tools regardless of a subject's impairment, DIF must be absent or negligible in as many independent variables as possible—but most importantly, in impairment. Because confirmable diagnoses for many subjects receiving outpatient rehabilitation often are not available,73 we elected to group subjects by impairment (ie, the problem directing patient management). Differential item functioning testing for impairment grouping was performed in 3 ways. First, all subjects were grouped by general impairment (ie, a medical, neurological, or orthopedic problem). Second, subjects with an orthopedic impairment were grouped by area treated (ie, upper extremity, spine, or lower extremity). Third, subjects with an orthopedic impairment were grouped by specific body part treated within each area treated (upper extremity: shoulder, elbow, and wrist or hand; spine: cervical and lumbar; lower extremity: hip, knee, and foot or ankle).
Each item was assessed for DIF with difwithpar software (version 1.0),74–79 which combines IRT calibration (estimated with the GRM67 in PARSCALE software69) with multiple ordinal logistic regression models for each item and demographic category by use of Stata software (version 9.2).80 Using methods described by Crane et al,75 we evaluated items for the presence of uniform DIF (ie, the interference related to demographic groups between ability and item responses is the same across the entire range measured by the test) by examining the relative difference between beta coefficients in the regression models and nonuniform DIF (ie, the interference varies at different levels of the trait being measured) by comparing the –2 log likelihoods of 2 of the regression models.79 For nonuniform DIF, we used Bonferroni adjustment for P values on the basis of the number of items in the scale. The process is sequential (ie, it starts with one independent variable and progresses to subsequent variables) and iterative (ie, decisions are made at each step during the difwithpar process). For example, when an item was identified with DIF, the software created a new item. Thus, items found to have DIF related to an independent variable, such as sex, were split into 2 new items. For the first new item, responses for women were coded as in the original data set, whereas for men, all responses were set to missing. For the second new item, responses for men were coded as in the original data set, whereas for women, all responses were set to missing. We thus calibrated item parameters independently in the 2 groups for items identified with DIF. Items free of DIF served as anchor items, ensuring that ability estimates (ie, levels of fear) were calibrated on the same metric for the 2 sexes. The presence of possible false-positive or false-negative DIF results was assessed.75
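The item-splitting step described above can be sketched in Python. This is only an illustration of the recoding logic, not difwithpar code (difwithpar runs in Stata); the function name, the group labels, and the use of `None` for missing responses are assumptions.

```python
def split_item_by_group(responses, groups, group_a, group_b):
    """Split one item's responses into 2 group-specific items.

    In each new item, responses from the other group are set to
    None (missing), so that item parameters can be calibrated
    separately for the 2 groups.
    """
    item_a = [r if g == group_a else None for r, g in zip(responses, groups)]
    item_b = [r if g == group_b else None for r, g in zip(responses, groups)]
    return item_a, item_b


# Made-up responses (0-6 scale) from 2 women (F) and 2 men (M).
women_item, men_item = split_item_by_group(
    responses=[0, 3, 6, 2],
    groups=["F", "M", "F", "M"],
    group_a="F",
    group_b="M",
)
print(women_item)
print(men_item)
```

Items without DIF are left unsplit, which is what lets them act as anchors that keep both groups on the same metric.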
In some samples, particularly large samples, DIF might be detected (significant) but might be of little practical importance.52,78 Therefore, before progressing sequentially to the next variable for DIF assessment, we assessed the correlation between unadjusted ability estimates and DIF-adjusted ability estimates, and we assessed the magnitude of the difference between unadjusted and DIF-adjusted ability estimates. We repeated the entire procedure for surgical history, severity, age group, and impairment grouping.
Screening item selection and accuracy of the screening items.
We wanted to select screening items that provided the most information for the center of the fear continuum. We expected FABQ measures not to be normally distributed14; therefore, we used the median for each fear scale as the measure of central tendency.
For each scale, we examined item information functions and selected 2 screening items that provided the most information (ie, the lowest measurement error) at the median fear level. Using the median, we dichotomized subjects by low versus high levels of fear of physical activities and fear of work activities with the IRT-based theta values estimated from all items for each scale.
We used nonparametric receiver operating characteristic (ROC) curve analyses to quantify the accuracy of the responses to the screening item or items (ie, 1 or 2 screening items per scale) for discriminating subjects with fear levels below the median (low) or above the median (high).81 Such analyses produce plots of sensitivity versus (1 – specificity) for the diagnostic test (ie, the screening items). For each ROC, a diagnostic cut score was identified by selecting the item response (or sum of 2 item responses) with the largest average of sensitivity and specificity. Positive likelihood ratios (+LRs) and negative likelihood ratios (–LRs)82 and the percentages of subjects correctly identified were produced for each cut score. Positive likelihood ratios were calculated as sensitivity/(1 – specificity), and negative likelihood ratios were calculated as (1 – sensitivity)/specificity.83 Likelihood ratios are summary measures of diagnostic test performance (ie, classification) that indicate how much a given classification will raise or lower the pretest probability of the target disorder of interest (ie, level of fear).83–85 Acceptable +LRs are 2 or higher, and acceptable –LRs are 0.5 or lower because they generate at least small but possibly important changes in the predictive value of the test.85 Areas under the ROC curves, standard errors, and 95% confidence intervals were used to describe the ROC results. To determine whether using 1 versus using 2 screening items was more accurate for discriminating subjects with low versus high levels of fear, we assessed the equality of the area under the curves by using an algorithm suggested by DeLong et al.86
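The likelihood ratio formulas above can be illustrated with a short Python sketch that evaluates one candidate cut score. The function name and the data are hypothetical, and treating a response at or above the cut score as a positive screen is an assumption of this sketch rather than a rule taken from the study.

```python
def screening_accuracy(item_scores, high_fear, cut_score):
    """Sensitivity, specificity, +LR, and -LR for one cut score.

    item_scores: screening item responses (eg, 0-6)
    high_fear: True if the subject is above the median theta value
    A response >= cut_score counts as a positive screen (assumption).
    Both groups (high and low fear) must be present in the data.
    """
    tp = fp = tn = fn = 0
    for score, is_high in zip(item_scores, high_fear):
        positive = score >= cut_score
        if positive and is_high:
            tp += 1
        elif positive:
            fp += 1
        elif is_high:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Guard against division by zero when a rate equals 0 or 1.
    pos_lr = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    neg_lr = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return sensitivity, specificity, pos_lr, neg_lr


# Made-up data for 10 subjects.
scores = [0, 1, 4, 5, 6, 2, 3, 6, 5, 1]
high = [False, False, False, True, True, True, False, True, True, False]
print(screening_accuracy(scores, high, cut_score=4))
```

Repeating the calculation for every possible cut score and keeping the one with the largest average of sensitivity and specificity mirrors the selection rule described above.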
Mapping IRT-based measures to original summative scores.
To assist clinicians in relating new IRT-based FABQ measures to original FABQ summative scores,4 we mapped the new IRT-based FABQ measures to the original FABQ summative scores4 by aggregating the original summative scores by each tenth of a logit of the IRT-based measures. Using the original 0 to 6 item responses,4 we summed the responses for items 2 through 5 (Appendix) to produce a summative score (0–24) for fear of physical activities, and we summed the responses for items 6, 7, 9, 10, 11, 12, and 15 to produce a summative score (0–42) for fear of work activities. At each tenth of a logit for each FABQ-PA and FABQ-W IRT-based theta value, the mean and 95% confidence interval for the original summative scores were calculated.
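A minimal Python sketch of the aggregation step follows. It assumes that "each tenth of a logit" means rounding theta to 1 decimal place, and it reports only the mean (the 95% confidence interval is omitted for brevity); the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def crosswalk(thetas, summative_scores):
    """Mean original summative score at each tenth of a logit of theta.

    Binning by rounding theta to 1 decimal place is an assumption.
    """
    bins = defaultdict(list)
    for theta, score in zip(thetas, summative_scores):
        bins[round(theta, 1)].append(score)
    # Return bins in ascending order of theta.
    return {t: mean(scores) for t, scores in sorted(bins.items())}


# Made-up theta values and summative scores for 4 subjects.
table = crosswalk(
    thetas=[-0.07, -0.12, 0.14, 0.11],
    summative_scores=[10, 12, 14, 16],
)
print(table)
```

Each row of the resulting table pairs an IRT-based theta value with the average original score observed at that level, which is what lets a clinician translate between the 2 scoring methods.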
IRT Analyses
The EFA results indicated that a 2-factor solution for the 16 FABQ items (n=3,956, CFI=0.92, TLI=0.97, RMSEA=0.19) fit the data well. Items were loaded on the FABQ-PA and FABQ-W scales originally described by Waddell et al,4 and the 2-factor solution accounted for 69.1% of the variance in the data. The 2-factor CFA results supported the presence of 2 factors (CFI=0.93, TLI=0.97, RMSEA=0.18). Items were separated into respective scales (for the 11-item FABQ-W, n=5,517; for the 5-item FABQ-PA, n=16,243), and separate 1-factor CFAs were run.
The CFA results for the 11-item FABQ-W suggested that 2 of the 3 fit statistics supported the fit of the 1-factor solution (CFI=0.94, TLI=0.97, RMSEA=0.25), all items were loaded on 1 factor (loadings of >.75), but the items "My pain was caused by my work or by an accident at work" and "I do not think that I will ever be able to go back to that work" had a residual correlation of –0.26. The former item was deleted because it was associated with the most pairs of items with higher absolute residual correlations, and another CFA was run on the 10 remaining items. The CFA results for the 10-item FABQ-W supported a slightly improved model fit (CFI=0.95, TLI=0.98, RMSEA=0.23), all items were loaded on 1 factor (loadings of 0.68), there was no absolute residual correlation of greater than 0.25, there was 1 residual correlation of less than –0.20, and there was a reduction in absolute residual correlations of greater than 0.10, from 36.4% (11-item scale) to 28.9% (10-item scale). With the exception of the RMSEA, the results supported a unidimensional scale with good local independence.
The CFA results for the 5-item FABQ-PA suggested that 2 of the 3 fit statistics supported the fit of the 1-factor solution (CFI=0.95, TLI=0.93, RMSEA=0.23), all items but 1 were loaded on 1 factor (for the item "My pain was caused by physical activity," the loading was 0.34), and all absolute residual correlations were less than 0.25. This item was deleted, and another CFA was run on the 4 remaining items. The CFA results for the 4-item FABQ-PA suggested a questionably improved model fit (CFI=0.96, TLI=0.95, RMSEA=0.26), all items were loaded on 1 factor (loadings of >0.60), there was no absolute residual correlation of greater than 0.25, there was 1 residual correlation of greater than 0.20, and there was a reduction in absolute residual correlations of greater than .10, from 15.0% (5-item scale) to 8.3% (4-item scale). With the exception of the RMSEA, the results supported a unidimensional scale with good local independence.
IRT Modeling, Item Information Function Analysis, and Item Fit
We fitted items from both FABQ scales separately to the GRM. Initial inspection of the operating characteristic curves demonstrated that there were no distinct maximum values of the item response curves for the second and third as well as the fifth and sixth response categories (ie, responses without word descriptors) for both scales. Therefore, these response categories (ie, second and third as well as fifth and sixth) were collapsed, and the data were refit to the GRM. Subsequent inspection of operating characteristic curves supported an improved shape for each curve, with clear maximum values. One item ("Physical activity makes my pain worse") had a discrimination parameter of less than 0.7 (actual value=0.67) and was deleted. Examination of empirical operating characteristic curve plots suggested that all items fit the GRM. The 3-item FABQ-PA data were refit to the GRM (Tab. 2).
Table 2. Fear-Avoidance Belief Item Banks and Item Parameter Estimates
Figure 1. (A) Test information function for fear-avoidance beliefs regarding physical activities. (B) Test information function for fear-avoidance beliefs regarding work activities. Information=test information function.
The DIF results for fear of work activities (10-item scale) provided similar results. No items with DIF were identified for the variables sex, age, number of comorbidities, level of pain, overall impairment grouping (medical, neurological, and orthopedic), and orthopedic impairments of the upper or lower extremity. Several items were shown to have nonuniform DIF for the variables symptom severity; surgical history; orthopedic impairment grouping by upper extremity, lower extremity, or spine; and orthopedic impairment grouping by cervical or lumbar spine. However, the unadjusted and adjusted levels of fear were highly correlated (r>.99), and the average differences between the unadjusted and adjusted measures ranged from –.12 to .02—values that represented a range of standard deviations of .01 to .11. Therefore, the identified DIF was considered to be of little practical importance.
Screening Item Selection and Accuracy of the Screening Items
Both the FABQ-PA and the FABQ-W were distributed nonnormally (Shapiro-Wilk W statistics, P<.05). The median for the FABQ-PA was –.07, and the median for the FABQ-W was –.02; these values were used to dichotomize subjects by level of fear.
We identified 2 items (Tab. 2) with the highest slopes as the first 2 screening items per scale: screening item 1 [SHLDNOT: "I should not do physical activities which (might) make my pain worse"] and screening item 2 [CANNOT: "I cannot do physical activities which (might) make my pain worse"] for fear of physical activities and screening item 1 (WRKCANT: "I cannot do my normal work with my present pain") and screening item 2 (WRKSHNT: "I should not do my normal work with my present pain") for fear of work activities. Although WRKSHNT provided slightly more information (slope=2.53) than WRKCANT (slope=2.52), the WRKCANT item information function provided more information at the median theta value and therefore was selected as the most informative for the work scale at the cut score for high levels of fear.
The ROC results describing the accuracy of using 1 or 2 items to predict subjects with high levels of fear of physical or work activities are shown in Table 3. Although the areas under the ROCs were similar when 1 or 2 screening items were used to identify high levels of fear, the use of 2 items produced larger areas (χ²=402.6, df=1, P<.001, for the FABQ-PA; χ²=139.6, df=1, P<.001, for the FABQ-W) (Fig. 2). However, because the use of 1 screening item produced strong values for areas under the curves, sensitivity, specificity, +LR, –LR, and percentages of subjects correctly classified and because the addition of a second screening item did not substantially improve all of these values over those obtained with 1 screening item, we decided to use only 1 item (ie, the most informative at the median theta value) as the screening item to identify subjects with high levels of fear for both scales.
Table 3. Diagnostic Accuracy of Using 1 or 2 Screening Items to Identify Subjects With High Levels of Fear-Avoidance Beliefs Regarding Physical Activities (PA) or Work Activities (WA)
|
![]() View larger version (15K): [in a new window] |
Figure 2. (A) Receiver operating characteristic (ROC) curves for use of 1 and 2 screening items to identify subjects with high levels of fear of physical activities. (B) ROC curves for use of 1 and 2 screening items to identify subjects with high levels of fear of work activities. One ROC area=1 screening item was used to estimate the area under the ROC curve, Two ROC area=2 screening items were used to estimate the area under the ROC curve.
|
|
View this table: [in a new window] |
Table 4. Cross-Walk Table for Scoring Fear-Avoidance Scales With Item Response Theory (IRT) and Original Summative Methodsa
|
|
|
|---|
The present study was performed in direct response to a challenge to refine current screening techniques for elevated levels of fear-avoidance beliefs, so that people can be accurately and efficiently classified and their conditions can be managed accordingly.17 Because our IRT-based screening requires only 1 item to accurately classify people, the method is efficient. Improved efficiency, that is, a reduced burden of collecting data, may be the catalyst for more widespread screening for fear-avoidance beliefs in routine outpatient therapy and may facilitate concurrent screenings of multiple psychosocial prognostic indicators, such as depression38 and pain-related fear.4 As more therapists assume a first-contact role,37 efficient yet accurate screening of multiple constructs will be developed to meet therapists' needs, which will allow rapid identification of people who may require certain types of help as early as possible. The use of IRT methods can facilitate such development because IRT methods are well suited to the development of new scales and the reassessment of existing scales, including the identification of single screening items and the assessment of measurement precision.
The results of the present study indicated that subjects selected the original FABQ item responses4 with word descriptors more often than responses without word descriptors. In addition, approximately 4% (floor) and 9% (ceiling) of the subjects selected "completely disagree" and "completely agree" responses for all items of the FABQ-PA scale, respectively; the corresponding values for the FABQ-W scale were 20% (floor) and 3% (ceiling). Therefore, FABQ scores tended not to be normally distributed, and subjects might cluster at the 2 scale extremes; these findings support the results of previous studies14,91 in which medians were used as measures of central tendency for both FABQ scales to dichotomize subjects.
The results of the factor analyses indicated that the FABQ items were unidimensional, with good local independence, in the original item format4 once the item CAUSED was deleted from the work scale and the item PHYSACTV was deleted from the physical activity scale because they did not load strongly on the respective scales. The loss of the item PHYSACTV because of low factor loading is consistent with the reports of Waddell et al4 and Staerkle et al.13 However, beyond the loss of the item PHYSACTV, our results cannot be compared directly with those of Waddell et al4 and Staerkle et al13 because we used factor analyses designed for categorical data, the samples differed in size and diversity, and we edited 2 FABQ items to eliminate references to the back (Appendix).
To our knowledge, no other research group has analyzed FABQ data by using IRT methods, which allowed subjects' FABQ responses to be described in probabilistic terms.43,57–59,92 Specifically, operating characteristic curves, which graphically depict the correspondence between the predicted responses to an item and the latent trait,59 demonstrated that subjects did not differentiate the original FABQ responses well for responses with no word descriptors; this finding supports the frequency distribution results and calls into question the use of response categories without word descriptors. When we collapsed the 7 responses to 5, the monotonic nature of the operating characteristic curves was restored; this finding implies that the original response anchoring adds error to the measurement of levels of fear-avoidance beliefs and that subjects may be able to better differentiate among 5 responses, thus supporting recent recommendations.93,94 A good fit of items to the 2-parameter IRT model was obtained with the 10-item work activity scale and the collapsed response choices, but a good item fit with the physical activity scale and the collapsed response choices was obtained only after the deletion of 1 more item for fear of physical activities: WORSE. Therefore, the final IRT-based FABQ scales contained 3 items for physical activities and 10 items for work activities. These results suggest that the FABQ, in its original format of 7 response categories for 4 items in the physical activity scale and 7 items in the work activity scale, could be improved through the use of IRT methods, which some authors suggest are more exacting than the classical test theory method43,58,92 originally used to analyze FABQ data and develop the original FABQ scales.
Once the FABQ data were analyzed with IRT methods, screening items that provided maximum information45 at the median for the FABQ scales could be easily selected. Item response theory methods are ideally suited to this task because plots of item information functions allow identification of the amount of information or discriminating ability of each item at any level of fear. We wanted to develop a test (ie, the screening items) that accurately dichotomized subjects into groups with low versus high levels of fear (ie, high levels of fear are disease positive),83 and selecting screening items that provided maximum information at the median fear level produced strong diagnostic test results.
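The idea of choosing the item with the most information at the cut score can be illustrated with the item information function of a graded response model. In the sketch below, the slopes 2.53 and 2.52 echo the values reported for the 2 work items, but the category thresholds are hypothetical and chosen only to show how an item with a marginally lower slope can still be more informative at a particular level of fear (here the FABQ-W median of –.02).

```python
import math


def grm_item_information(theta, slope, thresholds):
    """Item information at theta for a graded response model item.

    slope: item discrimination; thresholds: ascending category boundaries.
    """
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1.0 below the lowest category and 0.0 above the highest.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cum += [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # probability of category k
        # Derivative of the category probability with respect to theta.
        d_k = slope * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p_k > 0:
            info += d_k ** 2 / p_k
    return info


median_theta = -0.02  # FABQ-W median reported in the text
# Hypothetical thresholds: item A has a wide gap around the median, item B does not.
info_a = grm_item_information(median_theta, 2.53, [-1.5, -0.8, 0.9, 1.6])
info_b = grm_item_information(median_theta, 2.52, [-1.0, -0.3, 0.2, 1.0])
print(round(info_a, 2), round(info_b, 2))
```

Because information depends on where the category boundaries sit relative to theta, ranking items by slope alone and ranking them by information at the cut score can give different answers.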
According to Sackett et al,83 when a test with high specificity is positive, the result effectively rules in the diagnosis. The specificities for the physical activity and work activity scales with 1 screening item were strong, 0.98 and 0.93, respectively. In addition, the +LRs, which can be interpreted as the ratio of the true-positive rate to the false-positive rate,84 also were strong. Although the use of 2 screening items dramatically improved the +LR for the physical activity scale, the already strong specificity improved little; in addition, the work activity scale specificity and +LR did not improve appreciably with 2 screening items. The +LR can be interpreted as a cost-to-benefit ratio, in which the rate of true-positive results represents a benefit criterion and the rate of false-positive results represents a cost criterion.84 To minimize unnecessary testing and inappropriate treatment, +LR should be high. Here, we elected to use one screening item per scale, and this method was accurate and efficient and produced high +LRs. Therefore, with the IRT-based fear-avoidance belief scales, a subject who selected the unlabeled response "unsure" or higher for the SHLDNOT screening item on the FABQ-PA scale was about 35 times more likely to have high levels of fear; a subject who selected "unsure" or higher on the WRKCANT screening item on the FABQ-W scale was about 13 times more likely to have high levels of fear. High levels of fear represented FABQ scores higher than the median fear level, which has been associated with poorer functional status outcomes.14,19–21,35,88–90
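The diagnostic accuracy statistics discussed above follow directly from a 2×2 classification table. The sketch below uses hypothetical counts (not the study data) chosen so that specificity is 0.98 and the +LR is 35, and it also converts a +LR into a post-test probability under the assumption that a median split gives a pre-test probability of about 0.5.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, likelihood ratios, and accuracy from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # +LR: true-positive rate divided by false-positive rate.
        "positive_lr": sensitivity / (1 - specificity),
        # -LR: false-negative rate divided by true-negative rate.
        "negative_lr": (1 - sensitivity) / specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to pre-test odds and convert back to a probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical counts: 100 high-fear and 100 low-fear subjects.
stats = diagnostic_accuracy(tp=70, fp=2, fn=30, tn=98)
# Assumption: a median split implies a pre-test probability of about 0.5.
prob_after_positive = post_test_probability(0.5, 35)
print(stats, prob_after_positive)
```

Strictly, a likelihood ratio multiplies the odds rather than the probability of high fear, so the post-test probability depends on how common high fear is in the group being screened.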
The original FABQ scales scored with summative methods as described by Waddell et al4 are common. However, summative scoring of categorical data typically produces nonlinear scores, whereas IRT-based measures produce linear scores, as evidenced by the data in Table 4. Summative scores are easy to obtain in clinics, but scores from IRT-based measures require computer technology to obtain. The validity of using parametric statistical techniques for nonlinear summative measures has been questioned.95,96 Clinicians who wish to transform new IRT-based measures to original FABQ summative scores4 can use the cross-walk table. For example, if the IRT-based measure of fear of physical activities were 0.6 logits, then the original summative score, estimated from the cross-walk table, would be 17.5 (95% confidence interval=17.4–17.7), a value considered to be elevated.
Finally, predictions of elevated levels of fear relate to intake fear. Median intake FABQ scores have been used to classify subjects into groups with high versus low levels of fear.14,19 Findings from these randomized controlled trials14,19 suggested that modifications of management strategies designed to reduce the effects of fear-avoidance beliefs for subjects with elevated levels of fear tend to decrease disability (ie, improve functional status). However, as described by George et al,1 dichotomizing subjects on the basis of a median cut score at intake does not necessarily represent an increased probability of developing chronic symptoms. In addition, FABQ items demonstrated no DIF by level of pain intensity, a finding that could facilitate future studies examining the relationship between pain intensity and activity-related fear. Further studies with longitudinal designs and external criteria are recommended to test the predictive power of the cut scores identified with our data, as well as the use of screening items to assess improvements in functional status or quality of life associated with changes in fear-avoidance beliefs or even improvements in fear-avoidance beliefs as a treatment outcome.
Limitations and Future Studies
The present study is not without limitations. The RMSEA values were higher than desired for assessing the fit of the data to the CFAs. All other CFA fit indexes were strong, as were assessments of the fit of the data to the GRM. High RMSEA values imply that the data do not fit the CFA; therefore, further testing to validate the notion that FABQ items represent unidimensional scales worthy of IRT analyses is recommended. Subject grouping by impairment to assess fear-avoidance belief screening may not be as discriminating as other methods of grouping subjects, including grouping by diagnosis; however, the impairment data appeared to be clinically logical, and obtaining confirmable, reliable, and valid diagnoses for many people seeking outpatient rehabilitation is difficult. Other methods of grouping subjects should be explored.
The present study represents a retrospective analysis of an existing effectiveness database. The researchers had no control over which subjects were asked to complete the FABQ surveys; therefore, the potential for biased results exists. Because the sample was large, it can be argued that the results represented adequate estimates of the FABQ scores; nevertheless, it would be prudent to investigate the effect of not all subjects answering the FABQ surveys. Future prospective studies related to the reliability and validity of screening FABQ measures are encouraged. Future studies should consider screening for levels of fear with FABQ cut scores that are not based on a median split and should explore potentially informative associations between clinical variables and other psychosocial factors, including levels of fear, false-positive results, and floor and ceiling effects. The use of screening information by clinicians to modify management and interventions to improve outcomes is encouraged. Because IRT-based screening for elevated levels of fear-avoidance beliefs was accurate, the use of elevated levels of fear-avoidance beliefs as a risk adjustment variable in longitudinal studies of changes in functional status should be explored. Finally, efficient collection of data is facilitated by the use of computers. Exploration of the efficiency and accuracy of combining IRT-based FABQ screening with computerized adaptive testing of functional status is encouraged.
Appendix. Modified Fear-Avoidance Beliefs Questionnaire (a)
(a) Modified and reprinted with permission of the International Association for the Study of Pain from: Waddell G, Newton M, Henderson I, et al. A Fear-Avoidance Beliefs Questionnaire (FABQ) and the role of fear-avoidance beliefs in chronic low back pain and disability. Pain. 1993;52:157–168.
(b) Item was modified from the original wording by eliminating references to the back.

The Institutional Review Board for the Protection of Human Subjects, Focus On Therapeutic Outcomes, Inc, approved the project.
Part of this research was presented at the International Conference on Outcomes Measurement; September 11–13, 2008; Bethesda, Maryland; and at the Combined Sections Meeting of the American Physical Therapy Association; February 9–12, 2009; Las Vegas, Nevada.
* Focus On Therapeutic Outcomes, Inc, PO Box 11444, Knoxville, TN 37939-1444 (Web site: www.fotoinc.com).
Scientific Software International Inc, 7383 N Lincoln Ave, Suite 100, Lincolnwood, IL 60712-1747.
Crane P, Gibbons LE, Jolley L, van Belle G, University of Washington, Seattle, WA, 2005.
StataCorp LP, 4905 Lakeway Dr, College Station, TX 77845.