Conference Report: Medical Outcomes Trust Conference Presents Dramatic Advances in Patient-Based Outcomes Assessment and Potential Applications in Accreditation

Louise Kaegi, MA
Louise Kaegi, MA, is Consulting Editor, The Joint Commission Journal on Quality Improvement, and Executive Editor, Joint Commission Benchmark; phone 630/792-5433; fax 630/792-4433; e-mail lkaegi@jcaho.org. Reprinted with the permission of JCAHO.

Clinicians, quality professionals, clinical researchers, and health services managers at the Medical Outcomes Trust's fourth annual conference considered widespread applications of technologic achievements in patient-based outcomes assessment.
* Dr Tarlov, who serves as an unpaid volunteer president, is executive director of the Health Institute at the New England Medical Center (Boston) and holds professorships at Harvard University School of Public Health and Tufts University School of Medicine (Boston).

The Medical Outcomes Trust is a nonprofit public service membership organization dedicated to advancing the use of high-quality, standardized patient-based measures of health and functioning to improve health care outcomes. The Trust's Scientific Advisory Committee evaluates and approves health status assessment instruments, and it distributes, royalty free, generic and condition-specific outcome measures and manuals for their use, scoring, and interpretation. It also maintains a Projects Registry of ongoing health outcomes projects using Trust-approved instruments. Further information is available at www.outcomes-trust.org.

Keynote Address: Future Directions in Health Status Assessment

In his keynote address, "Future Directions in Health Status Assessment," John E. Ware, Jr, PhD, of QualityMetric (Lincoln, RI) and The Health Institute (New England Medical Center, Boston), identified what needs to happen next in order to put health care outcomes as defined by the patient into the databases used in medical decision making. Patient-based health assessment has proved to have value across three major applications:
He also described the new methodology known as Dynamic Health Assessment (DynHA), which uses a computerized interactive process to select questions specifically for each individual patient from pools of hundreds of questions from many surveys, a technique that yields a brief but more precise health status assessment.*

* On its Web site (www.qmetric.com), QualityMetric announces these plans and details on availability of the new survey technology.

Issues in Measurement

Dr Ware acknowledged that practical issues have come to be very important, noting that health status was not in the HEDIS (Health Plan Employer Data and Information Set; National Committee for Quality Assurance, Washington, DC) measures until health status could be measured with a one-page questionnaire. "Short forms are wonderful," he avowed, "but they're not up to the next generation of assessments." Their chief limitation is the very thing that makes them popular--their brevity. This is a disadvantage, explained Dr Ware, when it comes to another important requirement--precision. To manage the risk and monitor the change in health for an individual patient (as opposed to a population), a much higher degree of precision is required. One of the biggest problems with health measures up until now, observed Dr Ware, is that we measure health basically at the bottom of the population distribution and not at the top, where most people are or where we want to restore most people to. So we really don't have a good conceptualization of measurement of health for purposes of monitoring outcomes. "We also have to remind ourselves that we have different goals for health for different segments of the population and different circumstances and that we need a measurement system for each of those segments."* That means that
Standardizing Health Status Measures

Dr Ware described a major recalibration project under way in which the SF-36 and SF-12 instruments from the Medical Outcomes Study (MOS) will offer new norm-based scoring. The average adult in the United States is assigned a score of 50 (on a 100-point scale) in physical health according to the physical component summary (PCS) measure.* The average well person with the same sociodemographic profile but no chronic disease will score one-half of a standard deviation (SD) above that--at 55. Patients with very serious physical morbidities (about 100 conditions on this continuum now) are about 2 SDs below an equivalent well person, and they are about 1 1/2 SDs below the average person with the same sociodemographic characteristics. The SD is 10 for this particular metric. This permits making comparisons, estimating the burden of disease, and estimating the benefits of treatment.

* The SF-36 measures eight health concepts, which are relevant across age, disease, and treatment groups (see Figure 1, p 6). Often health status is assessed using summary scales for the physical component (primarily the first four scales) and the mental component (primarily the second four scales).
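The norm-based metric described above (population mean of 50, SD of 10) is a linear rescaling of a raw scale score. The following is a minimal sketch in Python of that rescaling, not the Trust's or QualityMetric's scoring algorithm; the function names and the raw-scale norm values (mean 80, SD 20) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def norm_based_score(raw, pop_mean, pop_sd):
    """Rescale a raw score so the reference population has mean 50 and SD 10."""
    if pop_sd <= 0:
        raise ValueError("pop_sd must be positive")
    z = (raw - pop_mean) / pop_sd  # distance from the population mean, in SD units
    return 50.0 + 10.0 * z


def sd_units_from_norm(score):
    """Number of population SDs a norm-based score lies above (+) or below (-) 50."""
    return (score - 50.0) / 10.0


# Hypothetical norms for illustration only: mean 80 and SD 20 on a raw 0-100 scale.
average_adult = norm_based_score(80, pop_mean=80, pop_sd=20)  # at the population mean
well_adult = norm_based_score(90, pop_mean=80, pop_sd=20)     # one-half SD above the mean
seriously_ill = norm_based_score(50, pop_mean=80, pop_sd=20)  # 1 1/2 SDs below the mean

print(average_adult, well_adult, seriously_ill)
```

Because every scale is put on the same metric, a difference of 10 points always means 1 SD, whichever scale is being read.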
Standardizing outcomes across treatments also allows looking at different strategies for spending health care dollars. Treatments most beneficial in terms of restoring patients to normal functional health can be rank ordered. A new hip, a new knee, a new heart, a new heart valve, and a new kidney have been linked to 1 to 1 1/2 SD average improvement--a huge effect. A number of other therapies, such as medications for arthritis, for asthma, and for migraines, are shown to bring a substantial improvement. Norming and standardizing offers a new way to score and interpret measures, a development that came from trying to help clinicians understand profiles, explained Dr Ware. It used to be that the scales reflected the highly variable ceilings and floors of the measures, making the health profile look like a mountain range with peaks and valleys. "When clinicians asked for some other way to illustrate the normed scores so that they wouldn't have to memorize all those means and SDs, what we did is basically what has been done in psychological testing for a hundred years: We just scored all those scales to have the same mean and the same standard deviation." Any time somebody scores above 50 it is above the mean, and every point is 1/10 of an SD because the SD is 10. This makes it easy to grasp the burden of the disease by just looking at how much of the profile is below the norm and to see the benefits of treatment visually with norm-based scoring in a before-and-after outcomes research project (Figure 1, p 6). In contrast to norming for a population, clinical practice requires outcomes monitoring efforts that yield precise scores for each individual patient. "We want to look at the patient's own norm and from that to understand what's going on in care and services, and we want narrow confidence intervals of 95% around score estimates for each patient," explained Dr Ware. Short-form measures do not give us narrow enough confidence intervals, he asserted.
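How wide a 95% confidence interval is around one patient's score depends on the reliability of the measure. A standard approximation from classical test theory (an assumption introduced here, not a method attributed to Dr Ware) takes the standard error of measurement as SD x sqrt(1 - reliability) and the interval as the observed score plus or minus 1.96 standard errors. The sketch below applies it to the norm-based metric with an SD of 10; the reliability values are illustrative, not figures reported at the conference.

```python
import math


def ci95_half_width(reliability, sd=10.0):
    """Half-width of an approximate 95% confidence interval around one observed score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    return 1.96 * sem


def ci95(score, reliability, sd=10.0):
    """Approximate 95% confidence interval (low, high) for an individual's score."""
    half = ci95_half_width(reliability, sd)
    return (score - half, score + half)


# Illustrative reliabilities only; compare the interval width at each level.
for r in (0.80, 0.90, 0.95):
    low, high = ci95(50.0, r)
    print(f"reliability {r:.2f}: 50 +/- {ci95_half_width(r):.1f}  ({low:.1f} to {high:.1f})")
```

Under these assumptions, raising reliability from 0.80 to 0.95 halves the interval, from roughly plus or minus 8.8 points to roughly plus or minus 4.4 points on a metric where 10 points is a full SD.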
Small differences in reliability have huge implications for confidence intervals for an individual patient. Dr Ware said that what was once thought an impossible ideal is now possible--that is, a real evenly marked ruler, based on a mean at 50, that measures health throughout the entire range with a great deal of precision at any level. Questionnaire item responses can now be located on a ruler in such a way that the results are reproducible. This can be seen by comparing the scoring on a single ruler for different questionnaires that evoke responses down to the floor and up to the ceiling (see Figures 2-4, for different items on physical functioning rulers, pp 7-8). Although such widely used instruments as the Nottingham Health Profile (NHP), the Sickness Impact Profile (SIP), and the activities of daily living (ADL) are useful in many clinical circumstances when used alone, they are not precise outcomes measures for people who score high or have moved up from a low to a high score.

Advances in Psychometrics
Choosing Hardware and Software for Health Status Assessment

Nelda Johnson, PhD, PharmD, of Outcomes Research & Design, Inc (Chicago), stated that many of those looking for a new computer system take the "what is your price for that (an advertised product)" approach. It is better, however, to tell the vendors in detail what you need and then let them come back to you with ideas. Dr Johnson outlined sample categories with questions to serve as a needs assessment to guide this process (see Sidebar 1, p 9). The next steps are to
Dr Johnson directed her suggestions to an audience with various needs. Very simple software can help with data design and database analysis, she noted. Others need to get data entered faster with a shorter turnaround time. Other possibilities to consider include sending out questionnaires to a fax-back service that can send scores back and using a touch-screen system on which patients punch in their responses. Once the volume of use gets high enough, it will offset the cost of the computer, Dr Johnson pointed out. Low volume may make it more cost-effective to collect data manually, but that method is not good for individual decision making. "I would put in some dummy data and do my own validity testing," she suggested. "In conclusion, communicate with the vendor about what you want to accomplish--not just where you are today, but where you want to be in the future." That may well include an integrated online system where clinical data are integrated with cost and outcomes data.

Outcomes in Clinical Trials
Dr Townsend, the moderator of this session, reviewed "Subjective Measures in Clinical Trials," by Kate Meaker, MS, of the Center for Drug Evaluation and Research in the U.S. Food and Drug Administration (FDA).* A patient-based instrument selected for use in clinical trials should be validated before the trial begins for the desired patient population and must be able to detect a clinically meaningful difference--a difference (or change) in a score that is perceived as beneficial or that would warrant a change in the patient's management. Historically, the FDA has looked at clinical efficacy (results under ideal circumstances), not clinical effectiveness (results in everyday medical practice), so it is especially concerned about internal validity (for example, does the instrument really measure what it is purported to measure?). That means that people from industry collecting information from clinical trials also have to be concerned about internal validity so that the FDA will allow them to disseminate information they are gathering in clinical trials. Also particularly important in the case of patient-based measurement instruments, with their great susceptibility to some forms of bias, is the use of blinding. Patient-based measurements can be useful in clinical trials if they address relevant questions, are used appropriately, and meet requirements for rigorous science. The FDA has moved a long way, asserted Dr Townsend, but it still has a long way to go in defining what rigorous science is from a humanistic perspective in clinical practice.

* Ed note: Ms Meaker was a scheduled speaker who was unable to make it to the meeting.

Use of Quality of Life (QOL) Measures in Clinical Trials
Just a good idea? Assigned to you? You're responsible for measuring outcomes and now you have to? Or are you looking at long-term practice improvement or a quality improvement initiative? Try to figure out the types of activities involved and communicate that to the vendor.

What are the logistics? Single site? Single clinic? Single hospital? Whole system where you have to get data from site to site? How many computers do you have? How many do you need? What kind of support is available for those different systems?

What do you want your results to look like? Do you want the system to give you back raw data because you have a good information system and because people are going to process it there and they are already set up to do that? Or do you need to get back a final product that looks like reports? Or do you want something in between so you can manipulate some of the aggregate data because you don't fully trust what the final report looks like and you want to be able to play around with the numbers?

How will the results be used? Are you going to be preparing final reports and sending them out on a monthly or an annual basis? What exactly is the report going to look like? Do you need good graphics in it? Are you going to be able to do those yourself, or is that something you are going to require of the software? What are the specific requirements for tables?

Who are the stakeholders that are going to be involved in selecting this software? Do you need to have somebody on board right away?

West Haven-Yale multidimensional pain inventory
MOS-SF-36

Dr Osterhaus reported general success in measuring migraine through use of the SF-36 and the Migraine-Specific Quality of Life (MSQOL) Questionnaire.
In trials for patients with osteoarthritis (where pain is related to movement), in which QOL and clinical considerations are intertwined, multiple measures are often used, such as the disease-specific QOL measures, the SF-36, and the composite WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index; assesses pain, stiffness, and functioning) or HAQ (Health Assessment Questionnaire). Chief conclusions about measurement of pain in clinical trials are that it is quite variable between conditions but more consistent within conditions, and that the relationship between QOL and pain (which is under review) engages different perspectives (clinical, regulatory, and methodologic).

Patient Health-Related Quality of Life (HRQL) Measures in Randomized Clinical Trials of Antiretroviral Therapies for HIV Disease

Dennis A. Revicki, PhD, of MEDTAP International (Bethesda, Md), discussed similar methodologic issues. The standard of care now for HIV infection is combination therapy, and HIV disease has changed from an acute to a chronic disease; thus, patient adherence to combination therapy regimens may be related to HRQL. The challenges for measuring HRQL in HIV disease include the following:
Dr Revicki reviewed HIV-disease-specific instruments that have recently come into use (see Sidebar 3, p 10) and described his work with Albert Wu (Johns Hopkins University, Baltimore) in developing the HIV Health Survey from the MOS items and scales most widely used in clinical trials, with 35 items from across more than 10 domains. Dr Revicki reported generally favorable findings on the psychometric characteristics of HIV-disease-specific instruments, including the following:
In conclusion, these HRQL instruments assess the combined impact of disease progression and treatment, and they provide health outcomes from the patient's perspective. Questions from the audience to the presenters highlighted several key issues on the use of patient-based outcomes measures in clinical trials (see Sidebar 4, p 11).

Outcomes Assessment in Specific Diseases

Three presentations showed how to put together outcomes measurement for three high-prevalence, high-cost chronic diseases: depression, low back pain, and congestive heart failure (CHF).

Depression

"Twenty years ago in mental health assessment, we turned to professionals, but now we turn to patients, because that is now really the standard for understanding outcomes in mental health," noted Barbara Dickey, PhD, of Harvard Medical School (Boston) and McLean Hospital (Belmont, Mass). The ideal system of assessment
Psychometric analysis has shown, Dr Dickey stated, that very disordered patients can use the SF-36 and give responses just as valid as those of the general public. Many of the instruments most widely used for depression (for example, the Beck Depression Inventory, the Zung Self-Rating Depression Scale) are not full outcomes measurement instruments because they measure only depression and have no upside. Dr Dickey gave high marks to a generic mental health measure known as BASIS-32 (Behavior and Symptom Identification Scale), which she uses along with SF-36 or SF-12. The BASIS-32 instrument is based on 32 specific functions and symptoms taken from the patient perspective and clustered into 5 areas of problems experienced by patients: relation to self and others, depression/anxiety, daily living skills, impulsive/addictive, and psychosis. Widely used in many settings, especially for inpatients, BASIS-32 has strong psychometric properties (benchmark values are not yet published).

Low Back Pain in the Older Adult

Maura D. Iversen, PT, MPH, SD, of Northeastern University (Boston) and Brigham and Women's Hospital, compared selected outcomes measures for assessing low back pain (LBP) in adults aged 65 years and older (see Sidebar 5, p 216). Dr Iversen approved of the VAS and the McGill instruments for pain assessment and of the Roland and the Oswestry instruments for disability. Most disease-specific measures of LBP are designed for specific populations (for example, the surgical cohort for the LSS [Lumbar Spinal Stenosis Questionnaire] and the younger working adult for the NASS [North American Spine Society instrument]). In general, Dr Iversen observed, the scales vary in quality and psychometric properties, and no scale is best for all purposes.
Special consideration is needed when assessing outcomes in older adults, with particular attention to the length of the survey (since fatigue is a factor), the specificity of items (disability and function), and the design of response sets (patients have to read through every item, which can be cumbersome).

CHF Disease Management Program
A: No, these were randomized trials so physicians did not have outcomes measures available. Albert Wu has introduced these measures in the Johns Hopkins clinic, and other physicians around the country are trying them out to see how these measures can help with clinical management.

Q: What happens if patients have stress, comorbidities, and so forth that can affect their pain threshold?

A: This is an issue in pain measurement, but in clinical trials the randomization should take care of this. If you are talking about an individual patient, then those things must be taken into consideration in deciding how to manage pain.

Q: While we are measuring QOL in trials, we aren't seeing this used much by pharmaceutical companies in promotion and dissemination of their products--why not?

A: There are some examples of doing so, but not a huge number of them. The FDA is trying to come up with guidelines on what quality of evidence would suffice to support a QOL benefit. This will have a lot to do with the demonstrated validity of the measures, evidence of what is a clinically meaningful difference, or clinically significant effect, so that the FDA and external consumers can judge whether there was a clinically significant impact and not just a statistically significant impact. A number of issues related to statistical data, such as missing information and the construction of summary scores, must be addressed.

Q: What does the literature show about initial data, baseline data, and dropouts?

A: We have found that people with lower scores have a higher probability for dropping out of the study for all kinds of reasons--for instance, they are more likely to have a clinical event or die. You do have to take this into account in your trial because it could look as though the patients were getting better and better over time when what is really happening is that you are losing the sicker people from your dataset.
Outcomes in Accreditation

Health policies requiring the use of patient-based outcomes measures will serve as catalysts for implementation in virtually all facets of health care systems today. The major role of accreditation was reflected in a session featuring presentations by the nation's key accreditors--the American Medical Association's American Medical Accreditation Program (AMAP; Chicago), the Joint Commission on Accreditation of Healthcare Organizations (Oakbrook Terrace, Ill), and the National Committee for Quality Assurance (NCQA; Washington, DC)--which have been working together since mid-1998 on performance measure identification and development through the Performance Measurement Coordinating Council (PMCC).

AMAP

William F. Jessee, MD, described AMAP's three-part mission:
AMAP's five components for evaluation are credentials, personal qualifications, environment of care, clinical performance, and patient care results. AMAP currently has 22 standards, of which 12 are required and 10 are supplemental. To be accredited, a physician must meet all required standards and score 11 out of a possible 22 points. An outcomes standard (Standard 22S), which is still evolving, calls for "current participation in an ongoing process that evaluates clinical performance and/or patient care results of the applicant and of other physicians." Physicians cannot be accredited through AMAP if they do not practice in some setting in which they can be evaluated by peers. Some 4,000 physicians had applied to date, reported Dr Jessee. A study of 2,500 of these physicians showed that 98% were board certified and that 75% to 80% were getting accredited the first time around. AMAP supports the development and testing of performance measures and the standardization of core measures. Dr Jessee likened the current state of outcomes measurement to the state of disease nosology before the advent of the ICD (International Classification of Diseases) system. Getting some standardization for some measures would be a step forward. In addition to ongoing efforts such as work on the PMCC, AMAP has a great potential for forming a strong private-sector network for standard setting and quality improvement activity, concluded Dr Jessee.

The Joint Commission

"The introduction of outcomes measures is fundamental to what all of us are about, and that is to drive the quality improvement process," observed Dennis O'Leary, MD. "If you don't measure, you won't know what needs to be improved and you will have no ability to set priorities. Further, once you have intervened you won't know whether you have succeeded unless you have an ability to measure," he cautioned, stressing the linkage between standards and measurement.
"The Joint Commission believes that outcomes and other performance measures must be integral parts of the accreditation process," explained Dr O'Leary. "Performance measurement should not be a stand-alone activity; rather, performance data should literally drive the accreditation process and our expectations of organizations. Performance data should be telling us where to look in evaluating organization performance," he added. Thus, organizations should be using data to meet standards expectations and eventually be able to provide tangible real-world proof that their outcomes have improved. "That's where the public expectations and our expectations are going," asserted Dr O'Leary. Addressing some other issues in the outcomes and performance measurement arenas, he noted that investment has been virtually nil in development of ways to translate data into actionable information. Citing the Joint Commission's experience since launching its Agenda for Change in 1986, Dr O'Leary reviewed the requirements of a sound methodology for developing performance measures that included rigorous field testing. Another issue facing the outcomes and performance measurement initiative is the effect of presenting a new cost in an already cost-restrained environment. "Everyone is talking about the importance of performance measurement, but no one is putting money on the table for it. We're not close to having what is going to be needed to deploy well-tested measures out in the field," noted Dr O'Leary. He concluded with an update on the milestones reached and the time lines to come for the Joint Commission's ORYX initiative. A two-stage process, based on sound reality testing, is under way to phase in the Joint Commission's seven accreditation programs, starting with hospitals and long-term care organizations. The goal is to get everyone on board by implementing minimal measure requirements, and then make the train go faster by introducing core measures and gathering true comparative data.
"It also became clear to us that we would have to work together with others to bring some coherence to standardizing core measures," noted Dr O'Leary in commenting on the formation of the collaborative PMCC. "If we are not successful with this, we will have a continuing Tower of Babel. Success will be marked by the ability to demonstrate that performance measurement really has led to definitive improvements in the quality of health care and that the information yielded from this effort has really helped consumers inform their health care decisions," Dr O'Leary concluded.

NCQA
Disability and Impairment Measures
Disease-Specific Measures
*LBP, low back pain.

Problems arising in connection with use of outcomes in health plans include the following:
This means that any dataset, such as HEDIS, reflects only the state of the art existing at the time it was developed, but it improves incrementally. Dr Sennett described a new evaluation system under development at NCQA which may address some of the limits that have impeded progress to date. NCQA's results-driven accreditation, planned for July 1999, is intended to combine performance data with on-site review to create a more complete, robust, and consumer-friendly picture of a health plan. NCQA uses the HEDIS measures to define the areas in which performance matters and then gives a score depending on results (not just improvement over time). The new evaluation mechanism will use AHCPR's CAHPS (Consumer Assessments of Health Plans Study) 2.0H clinical performance data and member satisfaction data, which together will comprise 25% of the overall accreditation score; the remaining 75% will include health plan systems data.

Outcomes in Policy and Practice: Health of Seniors (HOS)/Medicare Health Outcomes Survey*

Health of Seniors (HOS) is a precedent-setting initiative that integrates the use of patient-based outcomes measures into the process of accrediting managed care organizations that serve an elderly Medicare population. Sharon Sokoloff, PhD, of the Medical Outcomes Trust, reviewed the goals and development of the program. HOS is the first patient-based outcomes measure to assess the quality of care provided to the Medicare population (ages 65 and older) in managed care organizations (MCOs). HCFA requires MCOs serving the Medicare population to collect this information. HOS is a part of HEDIS 3.0, NCQA's set of standardized performance measures providing standardized information to assess and compare the performance of MCOs. A global, longitudinal, self-administered outcomes measure based on the SF-36 will track outcomes using two of its summary scores, the physical component summary (PCS) and the mental component summary (MCS).
An outcomes score will be calculated to indicate the percentage of Medicare plan members who got better during the two-year period, those who stayed the same, and those who got worse. MCOs will be compared on the basis of this two-year change score. Other features of the HOS project include case-mix adjusters and several subscale scores potentially useful for focusing improvement activity.

*Ed note: Now called the Medicare Health Outcomes Survey, the questionnaire is scheduled to go out with the wave of Medicare managed health care surveys in March 1999. The first cohort--1,000 randomly sampled Medicare beneficiaries from each of 268 Medicare Managed Care Plans (questionnaire mailed to 279,135 and completed by 167,093, or 60% response rate)--was surveyed in 1998 and will be resurveyed in spring 2000. Cohort two will be administered beginning in March 1999. All managed care plans with Medicare+Choice contracts are participating. Source: www.hcfa.gov/quality/qlty-3e.htm (Jan 27, 1999).

Closing Remarks

Closing the meeting, Dr Tarlov reflected on major technologic innovations and practical applications of outcomes measures, with a forward look at the Trust's next state-of-the-art conference (Baltimore, October 25, 1999). "All of us have been excited to learn of the new developments in norm-based scoring, which will help in standardization across measures, facilitate the development of benchmarks for quality improvement, and allow comparison of scores among groups, plans, diseases, and populations," remarked Dr Tarlov. Using dynamic systems for assessing health has other benefits, too, he noted: It shortens respondent time, elevates precision, enhances cost-effectiveness, and allows health status assessment to be applied routinely in the monitoring of individual patients.
"As for practical applications," continued Dr Tarlov, "new therapeutic approaches using outcomes data from clinical trials in migraine, osteoarthritis, and HIV infection demonstrated just how powerful assessment of functioning and well-being can be in seeking added effectiveness of new treatments. Use of outcomes assessment in evaluating treatment effects in depression, low back pain, and congestive heart failure, and effectiveness in seniors provided all we need to know to build confidence in the measures as an important tool in disease management." Dr Tarlov reminded the group of the significance of using outcomes assessment in accreditation, as described by the Joint Commission, the NCQA, and the AMA. "Their formation of a coordinating council (the PMCC) to seek standardization to reduce duplication of effort on the part of physicians, plans, hospitals, and other institutions promises to improve care and ease burden at the same time," he predicted. "Finally, all the innovations and works in progress presented at the meeting will facilitate achievement of the ultimate purpose of health care, to preserve the health and improve the well-being of all patients," concluded Dr Tarlov.