July 1999 - Volume 4 - Issue 1 A Publication for Members of Medical Outcomes Trust

Conference Report:

Medical Outcomes Trust Conference Presents Dramatic Advances in Patient-Based Outcomes Assessment and Potential Applications in Accreditation

Louise Kaegi, MA

Louise Kaegi, MA, is Consulting Editor, The Joint Commission Journal on Quality Improvement, and Executive Editor, Joint Commission Benchmark; phone 630/792-5433; fax 630/792-4433; e-mail lkaegi@jcaho.org. Reprinted with the permission of JCAHO.

Clinicians, quality professionals, clinical researchers, and health services managers at the Medical Outcomes Trust's fourth annual conference considered widespread applications of technologic achievements in patient-based outcomes assessment.

Conference-at-a-Glance

Background: At its fourth annual State-of-the-Art Health Outcomes Conference, November 2, 1998, the Medical Outcomes Trust (Boston) convened experts to review advances in outcomes assessment technology and potential applications in clinical trials, clinical practice, and accreditation.

Keynote address: "Future Directions in Health Status Assessment" identified what needs to happen next in order to put patient-defined outcomes into the databases used in medical decision making. Advances include a major recalibration of the SF-36 and SF-12 instruments from the Medical Outcomes Study (MOS) offering new norm-based scoring and the new methodology known as Dynamic Health Assessment (DynHA™), which uses a computerized interactive process to select questions to produce a briefer but more precise assessment.

Choosing computer software: A detailed needs assessment should be made and submitted to vendors to identify the best software for outcomes management in a particular organization.

Outcomes in clinical trials: Scientific and regulatory requirements differ between clinical trials and clinical practice, as seen in health status measurement of pain (migraine and osteoarthritis) and in antiretroviral therapies for patients with HIV (human immunodeficiency virus) disease.

Outcomes assessment in specific diseases: Similarities and distinctive challenges are identified in outcomes measurement of depression, low back pain, and congestive heart failure.

Outcomes in accreditation: Efforts are ongoing in integrating outcomes measures into the accreditation process for physicians, health care organizations, and health care plans.

Health of Seniors/Medicare Health Outcomes Survey (HOS): The Health Care Financing Administration is rolling out the first patient-based outcomes measure to assess the quality of care provided to the Medicare population in managed care organizations.

At its Fourth Annual State-of-the-Art Health Outcomes Conference, November 2, 1998, the Medical Outcomes Trust of Boston brought together the "outcomes community" of clinicians, quality professionals, clinical researchers, and health services managers to review the current status of patient-based assessment in outcomes management and to consider advances in outcomes assessment technology and potential applications. The Trust's president, Alvin R. Tarlov, MD, announced that the conference was focused on the important high-profile action occurring now in the use of outcomes in the real world of health care, and he called for more explicit consideration by the private sector of its use of outcomes.* The one-day program offered a keynote address, four one-hour sessions on uses of outcomes in different conditions and settings, and a separate session on choosing outcomes-related computer hardware and software. The outcomes sessions were followed by reports from the Agency for Health Care Policy and Research (AHCPR) and the Trust's Scientific Advisory Committee.

    * Dr Tarlov, who serves as an unpaid volunteer president, is executive director of the Health Institute at the New England Medical Center (Boston) and holds professorships at Harvard University School of Public Health and Tufts University School of Medicine (Boston). The Medical Outcomes Trust is a nonprofit public service membership organization dedicated to advancing the use of high-quality, standardized patient-based measures of health and functioning to improve health care outcomes. The Trust's Scientific Advisory Committee evaluates and approves health status assessment instruments, and it distributes, royalty free, generic and condition-specific outcome measures and manuals for their use, scoring, and interpretation. It also maintains a Projects Registry of ongoing health outcomes projects using Trust-approved instruments. Further information is available at www.outcomes-trust.org.

Keynote Address: Future Directions in Health Status Assessment

In his keynote address, "Future Directions in Health Status Assessment," John E. Ware, Jr, PhD, of QualityMetric (Lincoln, RI) and The Health Institute (New England Medical Center, Boston), identified what needs to happen next in order to put health care outcomes as defined by the patient into the databases used in medical decision making. Patient-based health assessment has proved to have value across three major applications:

  • Population monitoring--to define what is normal;
  • Outcomes research--to determine what works; and
  • Everyday clinical practice--to improve medical decision making.

He also described the new methodology known as Dynamic Health Assessment (DynHA™), which uses a computerized interactive process to select questions specifically for each individual patient from pools of hundreds of questions from many surveys, a technique that yields a brief but more precise health status assessment.*

    * On its Web site (www.qmetric.com), QualityMetric announces these plans and details on availability of the new survey technology.

Issues in Measurement

Dr Ware acknowledged that practical issues have come to be very important, noting that health status was not in the HEDIS (Health Plan Employer Data and Information Set; National Committee for Quality Assurance, Washington, DC) measures until health status could be measured with a one-page questionnaire. "Short forms are wonderful," he avowed, "but they're not up to the next generation of assessments." Their chief limitation is the very thing that makes them popular--their brevity. This is a disadvantage, explained Dr Ware, when it comes to another important requirement--precision. To manage the risk and monitor the change in health for an individual patient (as opposed to a population), a much higher degree of precision is required.

One of the biggest problems with health measures up until now, observed Dr Ware, is that we measure health basically at the bottom of the population distribution and not at the top, where most people are or where we want to restore most people to. So we really don't have a good conceptualization of measurement of health for purposes of monitoring outcomes. "We also have to remind ourselves that we have different goals for health for different segments of the population and different circumstances and that we need a measurement system for each of those segments."* That means that

  • for the great majority of us who are healthy, the goal is to keep us healthy;
  • for those of us who are acutely ill, the goal is to restore us to normal functioning and well-being; and
  • with the increasing number of people who have one or more chronic conditions, we want to give them as good a life as they can possibly have.

    * These distinctions are further discussed on the Foundation for Accountability (FACCT) Web site (www.FACCT.org).

Standardizing Health Status Measures

Dr Ware described a major recalibration project under way in which the SF-36 and SF-12 instruments from the Medical Outcomes Study (MOS) will offer new norm-based scoring. The average adult in the United States is assigned a score of 50 (on a 100-point scale) in physical health according to the physical component summary (PCS) measure.* The average well person with the same sociodemographic profile but no chronic disease will score one-half of a standard deviation (SD) above that--at 55. Patients with very serious physical morbidities (about 100 conditions on this continuum now) are about 2 SDs below an equivalent well person, and they are about 1 1/2 SDs below the average person with the same sociodemographic characteristics. The SD is 10 for this particular metric. This permits making comparisons, estimating the burden of disease, and estimating the benefits of treatment.

    * The SF-36 measures eight health concepts, which are relevant across age, disease, and treatment groups (see Figure 1, p 6). Often health status is assessed using summary scales for the physical component (primarily the first four scales) and the mental component (primarily the second four scales).

By measuring health outcomes one treatment at a time, in the same way that we define and standardize normal health, we can look at the extent to which a treatment restores someone to his or her usual zone of function. For instance, more than 30 studies have documented the burden of asthma on this metric. Some have shown that today’s state-of-the-art treatment for asthma improves physical functioning by about 0.50 SD, which raises the score close to what it would be for an average person who has the same sociodemographic characteristics but without asthma.

Standardizing outcomes across treatments also allows looking at different strategies for spending health care dollars. Treatments most beneficial in terms of restoring patients to normal functional health can be rank ordered. A new hip, a new knee, a new heart, a new heart valve, and a new kidney have been linked to 1 to 1 1/2 SD average improvement—a huge effect. A number of other therapies, such as medications for arthritis, for asthma, and for migraines, are shown to bring a substantial improvement.

“Norming and standardizing offers a new way to score and interpret measures, a development that came from trying to help clinicians understand profiles,” explained Dr Ware. It used to be that the scales reflected the highly variable ceilings and floors of the measures, making the health profile look like a mountain range with peaks and valleys. When clinicians asked for some other way to illustrate the normed scores so that they wouldn’t have to memorize all those means and SDs, “what we did is basically what has been done in psychological testing for a hundred years: We just scored all those scales to have the same mean and the same standard deviation.” Any time somebody scores above 50 it is above the mean, and every point is 1/10 of an SD because the SD is 10. This makes it easy to grasp the burden of the disease by just looking at how much of the profile is below the norm and see the benefits of treatment visually with norm-based scoring in a before-and-after outcomes research project (Figure 1, p 6).
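
The rescaling Dr Ware describes can be illustrated with a short sketch. It assumes a plain linear transformation to a mean of 50 and an SD of 10; the function names and the raw norm values (a mean of 70 and an SD of 20) are hypothetical, and the actual SF-36 and SF-12 scoring procedures involve further steps not shown here.

```python
def norm_based_score(raw, norm_mean, norm_sd):
    """Rescale a raw scale score so the reference population has mean 50, SD 10."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd


def sd_units_from_norm(score):
    """Express a norm-based score as SD units above (+) or below (-) the norm."""
    return (score - 50.0) / 10.0


# Hypothetical raw norms, used only for illustration.
NORM_MEAN, NORM_SD = 70.0, 20.0

average_adult = norm_based_score(70.0, NORM_MEAN, NORM_SD)   # at the norm
well_person = norm_based_score(80.0, NORM_MEAN, NORM_SD)     # half an SD above
print(average_adult, well_person, sd_units_from_norm(35.0))
```

Read this way, the figures reported in the talk agree with one another: a patient 2 SDs below a well person at 55 scores 35, which is also 1 1/2 SDs below the norm of 50.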

In contrast to norming for a population, clinical practice requires outcomes monitoring efforts that yield precise scores for each individual patient. “We want to look at the patient’s own norm and from that to understand what’s going on in care and services, and we want narrow confidence intervals of 95% around score estimates for each patient,” explained Dr Ware. “Short-form measures do not give us narrow enough confidence intervals,” he asserted. Small differences in reliability have huge implications for confidence intervals for an individual patient. Dr Ware said that what was once thought an impossible ideal is now possible—that is, a real evenly marked “ruler,” based on a mean at 50, that measures health throughout the entire range with a great deal of precision at any level. Questionnaire item responses can now be located on a ruler in such a way that the results are reproducible. This can be seen by comparing the scoring on a single ruler for different questionnaires that evoke responses down to the “floor” and up to the “ceiling” (see Figures 2–4, for different items on physical functioning rulers, pp 7-8). Although such widely used instruments as the Nottingham Health Profile (NHP), the Sickness Impact Profile (SIP), and the activities of daily living (ADL) measures are useful in many clinical circumstances, used alone they are not precise outcomes measures for people who score high or have moved up from a low to a high score.
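
One way to see why reliability matters so much for an individual patient is the classical standard error of measurement, SEM = SD x sqrt(1 - reliability). The sketch below assumes that textbook formula and a metric with an SD of 10; the reliability values of 0.80 and 0.95 are hypothetical and are not figures from the talk.

```python
import math


def individual_ci(score, reliability, sd=10.0, z=1.96):
    """Return an approximate 95% confidence interval around one patient's score.

    Uses the classical test theory standard error of measurement,
    SEM = sd * sqrt(1 - reliability). This is an illustrative assumption,
    not necessarily the method used for the instruments discussed.
    """
    sem = sd * math.sqrt(1.0 - reliability)
    half_width = z * sem
    return score - half_width, score + half_width


# Hypothetical reliabilities for a shorter and a longer (or adaptive) form.
for reliability in (0.80, 0.95):
    low, high = individual_ci(50.0, reliability)
    print(f"reliability {reliability:.2f}: {low:.1f} to {high:.1f}")
```

Under these assumptions, raising reliability from 0.80 to 0.95 cuts the width of the interval roughly in half, which is the kind of difference that matters when a decision depends on a single patient's score.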

Advances in Psychometrics

Dr Ware reviewed the advances in psychometrics with DynHA and computerized adaptive assessment, which makes use of the same technology recently introduced by the Graduate Record Examinations (GRE), in which questions are selected according to a subject’s responses to earlier easy and hard questions. Demonstrating with real cases, drawn from 3,000 patients who filled out questionnaires during a four-year period in the MOS, he showed how to work with some patients being screened for possible depression, by asking pointed questions that could precisely locate a subject who first appeared close to endpoints or thresholds on the ruler. And now, because all questions from many different surveys are put in a pool all calibrated on the same ruler, one can compare a high-scoring person with a low-scoring person even though the two answered no questions in common. “In all of these examples, we’ve never found an instance where administering more than ten items increased our precision or changed our decision on these scores,” noted Dr Ware. Many laboratories around the country and around the world are now contributing item pools to this calibration project with the understanding that the estimates of marks on the ruler will be put on the World Wide Web in the public domain. Patient respondent burden (time and effort) is anticipated to be reduced by 50% or more with the new technology compared with the full-length forms, and as a result data collection costs will be lowered. But the most important reasons for using these methods, concluded Dr Ware, are that they will eliminate the ceiling and floor effects of the old methods and that they will increase precision. He predicts that it will be possible to do a health assessment on a single patient for less than $1 per assessment any time, 24 hours a day, using the electronic database and the Internet.

He challenged all the seminar participants to add health status to their patient and system databases since the practical barriers to doing so are being removed.
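
The general idea behind computerized adaptive assessment can be sketched in a few lines. The example below is not DynHA itself, whose internals are not described in this report; it assumes a simple Rasch (one-parameter) item response model, a hypothetical item bank calibrated on one common ruler, and a stopping rule of at most ten items. All names and values are illustrative.

```python
import math


def p_endorse(theta, difficulty):
    """Rasch model: probability that a person at level theta endorses an item."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def summarize(grid, weights):
    """Posterior mean and standard deviation over a grid of candidate levels."""
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def adaptive_assessment(item_bank, answer, max_items=10, se_target=0.3):
    """Ask the most informative remaining item until precise enough or out of items.

    item_bank maps item ids to difficulties on the common ruler;
    answer is a callable that returns 1 (endorsed) or 0 for an item id.
    """
    grid = [g / 10.0 for g in range(-40, 41)]
    weights = [math.exp(-0.5 * t * t) for t in grid]  # standard normal prior
    remaining = dict(item_bank)
    asked = []
    theta, se = summarize(grid, weights)
    while remaining and len(asked) < max_items and se > se_target:
        def information(item_id):
            p = p_endorse(theta, remaining[item_id])
            return p * (1.0 - p)  # Rasch item information at the current estimate
        item_id = max(remaining, key=information)
        difficulty = remaining.pop(item_id)
        response = answer(item_id)
        asked.append(item_id)
        # Bayesian update: multiply each grid weight by the response likelihood.
        weights = [
            w * (p_endorse(t, difficulty) if response else 1.0 - p_endorse(t, difficulty))
            for t, w in zip(grid, weights)
        ]
        theta, se = summarize(grid, weights)
    return theta, se, asked


# Hypothetical bank of 41 items spread evenly along the ruler.
bank = {f"item{i}": round(-2.0 + 0.1 * i, 1) for i in range(41)}
# Simulated respondent who endorses every item easier than a true level of 1.0.
estimate, se, asked = adaptive_assessment(bank, lambda item_id: int(bank[item_id] < 1.0))
print(round(estimate, 2), round(se, 2), len(asked))
```

Because every item sits on the same ruler, two respondents can be compared even when the items they were asked do not overlap, which is the property Dr Ware emphasized.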

Choosing Hardware and Software for Health Status Assessment

Nelda Johnson, PhD, PharmD, of Outcomes Research & Design, Inc (Chicago), stated that many of those looking for a new computer system take the "what is your price for that (an advertised product)" approach. It is better, however, to tell the vendors in detail what you need and then let them come back to you with ideas. Dr Johnson outlined sample categories with questions to serve as a needs assessment to guide this process (see Sidebar 1, p 9).

The next steps are to

  • define the scope of your outcomes assessment (research project, accreditation requirement, patient care decisions, benchmarking/performance assessments, systemwide outcomes management);
  • determine the features needed (form design, database design, data collection, data entry, data analysis, report and graphics generation, integration with other systems); and
  • compare technologies (features available, documentation and validity, support services, flexibility/customization, future applications).

Dr Johnson directed her suggestions to an audience with various needs. Very simple software can help with data design and database analysis, she noted. Others need to get data entered faster with a shorter turnaround time. Other possibilities to consider include sending out questionnaires to a fax-back service that can send scores back and using a touch-screen system on which patients punch in their responses. Once the volume of use gets high enough, it will offset the cost of the computer, Dr Johnson pointed out. Low volume may make it more cost-effective to collect data manually, but that method is not good for individual decision making. "I would put in some dummy data and do my own validity testing," she suggested. "In conclusion, communicate with the vendor about what you want to accomplish—not just where you are today, but where you want to be in the future." That well may include an integrated online system where clinical data are integrated with cost and outcomes data.

Outcomes in Clinical Trials

According to Ray Townsend, PharmD, of the University of North Carolina (Chapel Hill) and Strategic Outcomes Services in Research Triangle Park (NC), the pharmaceutical industry has contributed to the scientific development of patient-based health status measures. Much of the valuable information that pharmacists pass on to patients comes from clinical trials, a venue that differs from clinical practice in the scientific and regulatory requirements it must satisfy. Thus capturing the “lessons learned” for clinical practice sometimes involves some translation.

Dr Townsend, the moderator of this session, reviewed “Subjective Measures in Clinical Trials,” by Kate Meaker, MS, of the Center for Drug Evaluation and Research in the U.S. Food and Drug Administration (FDA).* A patient-based instrument selected for use in clinical trials should be validated for the desired patient population before the trial begins and must be able to detect a clinically meaningful difference: “a difference (or change) in a score that is perceived as beneficial or that would warrant a change in the patient’s management.” Historically, the FDA has looked at clinical efficacy (results under ideal circumstances), not clinical effectiveness (results in everyday medical practice), so it is especially concerned about internal validity (for example, does the instrument really measure what it is purported to measure?). That means that people from industry collecting information from clinical trials also have to be concerned about internal validity so that the FDA will allow them to disseminate information they are gathering in clinical trials. Also particularly important in the case of patient-based measurement instruments, with their great susceptibility to some forms of bias, is the use of blinding. Patient-based measurements can be useful in clinical trials if they address relevant questions, are used appropriately, and meet requirements for rigorous science. The FDA has moved a long way, asserted Dr Townsend, but it still has a long way to go in defining what rigorous science is from a “humanistic” perspective in clinical practice.

    * Ed note: Ms Meaker was a scheduled speaker who was unable to make it to the meeting.

Use of Quality of Life (QOL) Measures in Clinical Trials

Sidebar 1. Needs Assessment with Questions to Guide Choice of Hardware and Software
Why do you want to measure outcomes?

Just a good idea? Assigned to you? You’re responsible for measuring outcomes and now you have to? Or are you looking at long-term practice improvement or a quality improvement initiative? Try to figure out the types of activities involved and communicate that to the vendor.

What are the logistics?

Single site? Single clinic? Single hospital? Whole system where you have to get data from site to site? How many computers do you have? How many do you need? What kind of support is available for those different systems?

What do you want your results to look like?

Do you want the system to give you back raw data because you have a good information system and because people are going to process it there and they are already set up to do that? Or do you need to get back a final product that looks like reports? Or do you want something in between so you can manipulate some of the aggregate data because you don’t fully trust what the final report looks like and you want to be able to play around with the numbers?

How will the results be used?

Are you going to be preparing final reports and sending them out on a monthly or an annual basis? What exactly is the report going to look like? Do you need good graphics in it? Are you going to be able to do those yourself, or is that something you are going to require of the software? What are the specific requirements for tables?

Who are the stakeholders that are going to be involved in selecting this software?

Do you need to have somebody on board right away?

Sidebar 2. Instruments for Measuring Pain

Unidimensional Instruments
  • Numeric rating scales
  • Visual Analog Scale (VAS)

Multidimensional Instruments
  • McGill Pain Questionnaire (MPQ)
  • Dartmouth Pain Questionnaire (adjunct to MPQ)
  • West Haven-Yale multidimensional pain inventory
  • Brief pain inventory (cancer pain)
  • Memorial Pain Assessment (cancer)
  • American Pain Society questionnaire

Instruments Measuring Impact of Pain on Activity and Functioning
  • MOS-SF-36
  • Sickness Impact Profile (SIP)
  • Karnofsky Performance Status
  • Nottingham Health Profile (NHP)

Jane T. Osterhaus, PhD, of Searle’s Global Health Outcomes Department (Skokie, Ill), discussed the issues encountered in trying to measure pain and QOL. Pain presents challenges to measurement because it has no biologic markers and because pain measures are quite variable, covering frequency, duration, and severity. As with QOL, measurement of pain relies on the patient’s perception. Both unidimensional and multidimensional instruments and instruments specific to a type of pain are available (Sidebar 2, p 9).

Dr Osterhaus reported general success in measuring migraine through use of the SF-36 and the Migraine-Specific Quality of Life (MSQOL) Questionnaire. In trials for patients with osteoarthritis (where pain is related to movement), in which QOL and clinical considerations are intertwined, multiple measures are often used, such as the disease-specific QOL measures, the SF-36, and the composite WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index; assesses pain, stiffness, and functioning) or HAQ (Health Assessment Questionnaire).

Chief conclusions about measurement of pain in clinical trials are that it is quite variable between conditions but more consistent within conditions, and the relationship between QOL and pain (which is under review) engages different perspectives (clinical, regulatory, and methodologic).

Patient Health-Related Quality of Life (HRQL) Measures in Randomized Clinical Trials of Antiretroviral Therapies for HIV Disease

Dennis A. Revicki, PhD, of MEDTAP International (Bethesda, Md), discussed similar methodologic issues. The standard of care now for HIV infection is combination therapy, and HIV disease has changed from an acute to a chronic disease; thus, patient adherence to combination therapy regimens may be related to HRQL.

The challenges for measuring HRQL in HIV disease include the following:

  • Applications within medical practice for clinical management of patients with HIV disease.
  • Measuring HRQL across HIV disease stages from asymptomatic to end-stage AIDS (acquired immunodeficiency syndrome)—are the same domains of functioning and well-being important across disease stages? And can existing instruments be applied across disease stages? (Newer techniques, such as item response theory and computer adaptive testing, may make this easier to accomplish.)
  • Measuring HRQL and being able to compare results across culturally diverse populations (both in the United States and internationally, and in developed and developing countries).

Dr Revicki reviewed HIV-disease–specific instruments that have recently come into use (see Sidebar 3, p 10) and described his work with Albert Wu (Johns Hopkins University, Baltimore) in developing the HIV Health Survey from the MOS items and scales most widely used in clinical trials, with 35 items from across more than 10 domains.

Dr Revicki reported generally favorable findings on the psychometric characteristics of HIV-disease–specific instruments, including the following:

  • The HIV-disease–specific instruments have demonstrated good to excellent reliability (test–retest);
  • Summary and domain scale scores have evidence of construct validity (related to HIV disease severity and stage, clinical status, and other health measures);
  • Summary and domain scale scores have evidence of predictive validity (related to mortality, clinical endpoints, and study dropout); and
  • Summary and domain scale scores have evidence of responsiveness to clinically meaningful changes.

In conclusion, these HRQL instruments assess the combined impact of disease progression and treatment, and they provide health outcomes from the patient’s perspective.

Questions from the audience to the presenters highlighted several key issues on the use of patient-based outcomes measures in clinical trials (see Sidebar 4, p 11).

Outcomes Assessment in Specific Diseases

Three presentations showed how to put together outcomes measurement for three high-prevalence, high-cost chronic diseases: depression, low back pain, and congestive heart failure (CHF).

Depression

“Twenty years ago in mental health assessment, we turned to professionals, but now we turn to patients, because that is now really the standard for understanding outcomes in mental health,” noted Barbara Dickey, PhD, of Harvard Medical School (Boston) and McLean Hospital (Belmont, Mass). The ideal system of assessment

  • is clinically relevant,
  • is sensitive to change (not all instruments are),
  • is culturally sensitive (for example, in a study of the homeless and mentally ill how do you separate depression from realistic discouragement?),
  • has low patient burden,
  • involves the patient/consumer,
  • is integrated into standard operating procedures, and
  • meets CQI (continuous quality improvement), regulatory, and managed care/payer requirements.

Psychometric analysis has shown, Dr Dickey stated, that very disordered patients can use the SF-36 and give responses just as valid as those of the general public. Many of the instruments most widely used for depression (for example, the Beck Depression Inventory, the Zung Self-Rating Depression Scale) are not full outcomes measurement instruments because they measure only depression and have no “up side.” Dr Dickey gave high marks to a generic mental health measure known as BASIS-32 (Behavior and Symptom Identification Scale), which she uses along with SF-36 or SF-12. The BASIS-32 instrument is based on 32 specific functions and symptoms taken from the patient perspective and clustered into 5 areas of problems experienced by patients: relation to self and others, depression/anxiety, daily living skills, impulsive/addictive, and psychosis. Widely used in many settings, especially for inpatients, BASIS-32 has strong psychometric properties (benchmark values are not yet published).

Low Back Pain in the Older Adult

Maura D. Iversen, PT, MPH, SD, of Northeastern University (Boston) and Brigham and Women’s Hospital, compared selected outcomes measures for assessing low back pain (LBP) in adults aged 65 years and older (see Sidebar 5, p 216). Dr Iversen approved of the VAS and the McGill instruments for pain assessment and of the Roland and the Oswestry instruments for disability. Most disease-specific measures of LBP are designed for specific populations (for example, the surgical cohort for the LSS [Lumbar Spinal Stenosis Questionnaire] and the younger working adult for the NASS [North American Spine Society instrument]). In general, Dr Iversen observed, the scales vary in quality and psychometric properties, and no scale is best for all purposes. Special consideration is needed when assessing outcomes in older adults, with particular attention to the length of the survey (since fatigue is a factor), the specificity of items (disability and function), and the design of response sets (patients have to read through every item, which can be cumbersome).

CHF Disease Management Program

Sidebar 3. HIV-Disease-Specific Instruments Developed During the Past Five Years
  • MOS HIV Health Survey
  • General Health Self Assessment (developed for use in the AIDS Clinical Trials Group studies)
  • HIV Patient Reported Status and Experience (PARSE) Survey
  • HIV Cost and Service Utilization Study (HCSUS)
  • Multidimensional Quality of Life Questionnaire (for HIV/AIDS [MQoL-HIV])
  • Revised Functional Assessment of Human Immunodeficiency Virus Infection (FAHI) quality of life instrument

Sidebar 4. Q & A on the Use of Patient-Based Outcomes Measures in Clinical Trials
Q: Were health outcomes measures in HIV trials available to the clinicians at the time they were treating the patients?

A: No, these were randomized trials so physicians did not have outcomes measures available. Albert Wu has introduced these measures in the Johns Hopkins clinic, and other physicians around the country are trying them out to see how these measures can help with clinical management.

Q: What happens if patients have stress, comorbidities, and so forth that can affect their pain threshold?

A: This is an issue in pain measurement, but in clinical trials the randomization should take care of this. If you are talking about an individual patient, then those things must be taken into consideration in deciding how to manage pain.

Q: While we are measuring QOL in trials, we aren’t seeing this used much by pharmaceutical companies in promotion and dissemination of their products—why not?

A: There are some examples of doing so, but not a huge number of them. The FDA is trying to come up with guidelines on what quality of evidence would suffice to support a QOL benefit. This will have a lot to do with the demonstrated validity of the measures, evidence of what is a clinically meaningful difference, or clinically significant effect, so that the FDA and external consumers can judge whether there was a clinically significant impact and not just a statistically significant impact. A number of issues related to statistical data, such as missing information and the construction of summary scores, must be addressed.

Q: What does the literature show about initial data, baseline data, and dropouts?

A: We have found that people with lower scores have a higher probability for dropping out of the study for all kinds of reasons—for instance, they are more likely to have a clinical event or die. You do have to take this into account in your trial because it could look as though the patients were getting better and better over time when what is really happening is that you are losing from your dataset the sicker people.
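
The dropout effect described in the last answer can be shown with a small worked example. The scores below are made up for illustration, and the rule that everyone below a baseline score of 40 drops out is an assumption chosen to keep the arithmetic simple.

```python
def mean(values):
    return sum(values) / len(values)


def completer_bias(baseline, followup, dropout_threshold):
    """Compare a naive group-mean change with the paired change among completers.

    Patients whose baseline score is below dropout_threshold are treated
    as lost to follow-up (a simplifying assumption).
    """
    completers = [(b, f) for b, f in zip(baseline, followup) if b >= dropout_threshold]
    naive_change = mean([f for _, f in completers]) - mean(baseline)
    paired_change = mean([f - b for b, f in completers])
    return naive_change, paired_change


# Hypothetical scores: nobody's health actually changes between visits.
baseline = [30, 35, 40, 45, 50, 55, 60]
followup = list(baseline)
naive, paired = completer_bias(baseline, followup, dropout_threshold=40)
print(naive, paired)
```

In this example the group mean appears to rise by 5 points only because the two sickest patients left the dataset; comparing each remaining patient with his or her own baseline shows no change at all.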

Pat O’Mara, of the Fallon Clinic, Inc (Worcester, Mass), recounted the development and implementation of Fallon’s CHF Disease Management Program, which was intended to reduce preventable complications from CHF, improve the quality of care and outcomes, increase patient satisfaction, and reduce the cost of care. Through use of a multidisciplinary approach, patient contact with the systems was streamlined to provide "one-stop shopping," with ambulatory and acute care pathways ensuring attention to care of the patient across the continuum of care. A unique part of the program is a computerized database that allows tracking, analyzing, and reporting of clinical information—both process and outcomes data. A disease management identifier screen on the mainframe can be accessed at the clinic as well as the hospital. Patients get a program tag that lets it be known that information can be found online (which is especially important when patients show up in the emergency department).

Outcomes in Accreditation

Health policies requiring the use of patient-based outcomes measures will serve as catalysts for implementation in virtually all facets of health care systems today. The major role of accreditation was reflected in a session featuring presentations by the nation’s key accreditors—the American Medical Association’s American Medical Accreditation Program (AMAP; Chicago), the Joint Commission on Accreditation of Healthcare Organizations (Oakbrook Terrace, Ill), and the National Committee for Quality Assurance (NCQA; Washington, DC)—which have been working together since mid-1998 on performance measure identification and development through the Performance Measurement Coordinating Council (PMCC).

AMAP

William F. Jessee, MD, described AMAP’s three-part mission:

  • To continuously improve the quality of medical care by establishing standards for physician quality; evaluating the performance of individual physicians; and providing information to support and encourage physician performance improvement;
  • To improve the efficiency of physician credentialing and performance measurement activities by eliminating unnecessary duplication and redundancy in data collection and verification; and
  • To support patient, physician, and purchaser choice.

AMAP’s five components for evaluation are credentials, personal qualifications, environment of care, clinical performance, and patient care results. AMAP currently has 22 standards, of which 12 are required and 10 are supplemental. To be accredited, a physician must meet all required standards and score 11 out of a possible 22 points. An outcomes standard (Standard 22S), which is still evolving, calls for "current participation in an ongoing process that evaluates clinical performance and/or patient care results of the applicant and of other physicians." Physicians cannot be accredited through AMAP if they do not practice in some setting in which they can be evaluated by peers.

Some 4,000 physicians had applied to date, reported Dr Jessee. A study of 2,500 of these physicians showed that 98% were board certified and that 75% to 80% were getting accredited the first time around. AMAP supports the development and testing of performance measures and the standardization of core measures. Dr Jessee likened the current state of outcomes measurement to the state of disease nosology before the advent of the ICD (International Classification of Diseases) system. Getting some standardization for some measures would be a step forward. In addition to ongoing efforts such as work on the PMCC, AMAP has great potential for forming a strong private-sector network for standard setting and quality improvement activity, concluded Dr Jessee.

The Joint Commission

“The introduction of outcomes measures is fundamental to what all of us are about, and that is to drive the quality improvement process,” observed Dennis O’Leary, MD. “If you don’t measure, you won’t know what needs to be improved and you will have no ability to set priorities. Further, once you have intervened you won’t know whether you have succeeded unless you have an ability to measure,” he cautioned, stressing the linkage between standards and measurement. The Joint Commission believes that outcomes and other performance measures must be integral parts of the accreditation process, explained Dr O’Leary. “Performance measurement should not be a stand-alone activity; rather, performance data should literally drive the accreditation process and our expectations of organizations. Performance data should be telling us where to look in evaluating organization performance,” he added. Thus, organizations should be using data to meet standards expectations and eventually be able to provide tangible real-world proof that their outcomes have improved. “That’s where the public expectations and our expectations are going,” asserted Dr O’Leary.

Addressing some other issues in the outcomes and performance measurement arenas, he noted that investment has been “virtually nil” in development of ways to translate data into actionable information. Citing the Joint Commission’s experience since launching its Agenda for Change in 1986, Dr O’Leary reviewed the requirements of a sound methodology for developing performance measures that included rigorous field testing. Another issue facing the outcomes and performance measurement initiative is the effect of presenting a new cost in an already cost-restrained environment. “Everyone is talking about the importance of performance measurement, but no one is putting money on the table for it. We’re not close to having what is going to be needed to deploy well-tested measures out in the field,” noted Dr O’Leary.

He concluded with an update on the milestones reached and the time lines to come for the Joint Commission’s ORYX initiative. A two-stage process, based on sound reality testing, is under way to phase in the Joint Commission’s seven accreditation programs, starting with hospitals and long-term care organizations. The goal is to get everyone on board by implementing minimal measure requirements, and then make “the train” go faster by introducing core measures and gathering true comparative data. “It also became clear to us that we would have to work together with others to bring some coherence to standardizing core measures,” noted Dr O’Leary in commenting on the formation of the collaborative PMCC. “If we are not successful with this, we will have a continuing Tower of Babel.” Success will be marked by the ability to demonstrate that performance measurement really has led to definitive improvements in the quality of health care and that the information yielded from this effort has really helped consumers inform their health care decisions, Dr O’Leary concluded.

NCQA

Sidebar 5. Instruments Used in Outcomes Measurement of Low Back Pain
Pain Measures
  • Visual Analog Scale (VAS; has lines with anchors to show a continuum of pain)
  • Pain Drawing (used as a screening tool)
  • Dallas Pain Questionnaire (developed primarily for patients with chronic LBP, 16 items with a VAS to assess the impact of pain on daily, work, and leisure activities)
  • Low Back Pain Rating Scale (assesses pain, disability, physical impairment)
  • McGill Pain Questionnaire (assesses sensory, evaluative, and affective components of pain)
  • Observation of Pain Behaviors

Disability and Impairment Measures

  • The Roland, or the Disability Questionnaire (a modification of the Sickness Impact Profile)
  • Oswestry Disability Index
  • Quebec Back Pain Disability Scale (20 items on level of disability: upper extremity)
  • Low Back Pain Rating Scale (both acute and chronic; pain, disability, and impairment)

Disease-Specific Measures

  • Lumbar Spinal Stenosis (LSS) Questionnaire
  • NASS (North American Spine Society) Lumbar Spine Outcomes Assessment Instrument

*LBP, low back pain.

Cary Sennett, MD, PhD, explained how NCQA is bringing performance data into accreditation. “Our real mission is to change the rules of engagement in the health care marketplace—to try to move from a marketplace in which we see competition—but really intense price competition—to a marketplace where competition is based on value and organizations motivated to improve value, not simply to cut price.” He seconded Dr O’Leary’s emphasis on providing “actionable information” and not just data, and added, “but we just aren’t there yet.”

Problems arising in connection with use of outcomes in health plans include the following:

  • Typically, outcomes are jointly determined, making it difficult to ascertain the marginal effect of a health plan;
  • They may proceed over long time lines (and thus appear to be out of date);
  • They are data intensive and therefore often costly; and
  • They run up against the problems arising with small numbers (such as inability to produce statistically meaningful results).

This means that any dataset, such as HEDIS, reflects only the state of the art existing at the time it was developed, but it improves incrementally.

Dr Sennett described a new evaluation system under development at NCQA that may address some of the limits that have impeded progress to date. NCQA’s results-driven accreditation, planned for July 1999, is intended to combine performance data with on-site review to create a more complete, robust, and consumer-friendly picture of a health plan. NCQA uses the HEDIS measures to define the areas in which performance matters and then gives a score depending on results (not just improvement over time). The new evaluation mechanism will use AHCPR’s CAHPS (Consumer Assessment of Health Plans Study) 2.0H clinical performance data and member satisfaction data, which together will make up 25% of the overall accreditation score; the remaining 75% will include health plan systems data.
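The 25%/75% split can be read as a simple weighted sum. The sketch below is only an illustration of that arithmetic, not NCQA's scoring method: the function name, the 0 to 100 scale for both inputs, and the example values are assumptions made for this example.

```python
def overall_accreditation_score(performance_score, systems_score,
                                performance_weight=0.25):
    """Combine two component scores into one weighted overall score.

    Assumptions (not from NCQA): both inputs are on the same 0-100
    scale, and the overall score is a plain weighted sum in which
    performance data (clinical performance plus member satisfaction)
    carries 25% and health plan systems data carries the remaining 75%.
    """
    if not 0.0 <= performance_weight <= 1.0:
        raise ValueError("performance_weight must be between 0 and 1")
    systems_weight = 1.0 - performance_weight
    return (performance_weight * performance_score
            + systems_weight * systems_score)


# Hypothetical plan: 80 on performance data, 90 on systems review.
print(overall_accreditation_score(80, 90))
```

Under these assumptions a plan scoring 80 on performance data and 90 on systems review would receive 0.25 × 80 + 0.75 × 90 = 87.5 overall, so the systems review still dominates the result.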

Outcomes in Policy and Practice: Health of Seniors (HOS)/Medicare Health Outcomes Survey*

Health of Seniors (HOS) is a precedent-setting initiative that integrates the use of patient-based outcomes measures into the process of accrediting managed care organizations that serve an elderly Medicare population. Sharon Sokoloff, PhD, of the Medical Outcomes Trust, reviewed the goals and development of the program.

HOS is the first patient-based outcomes measure to assess the quality of care provided to the Medicare population (ages 65 and older) in managed care organizations (MCOs). HCFA requires MCOs serving the Medicare population to collect this information. HOS is a part of HEDIS 3.0, NCQA’s set of standardized performance measures providing standardized information to assess and compare the performance of MCOs. A global, longitudinal, self-administered outcomes measure based on the SF-36 will track outcomes using two of its summary scores, the physical component summary (PCS) and the mental component summary (MCS). An outcomes score will be calculated to indicate the percentage of Medicare plan members who got better during the two-year period, those who stayed the same, and those who got worse. MCOs will be compared on the basis of this two-year change score. Other features of the HOS project include case-mix adjusters and several subscale scores potentially useful for focusing improvement activity.
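The two-year change score can be sketched as follows. This is a hypothetical illustration only: the report does not say how "better," "same," and "worse" are defined, so the fixed threshold of 5 points, the function names, and the sample scores are all assumptions, and the sketch omits the case-mix adjustment mentioned above.

```python
from collections import Counter


def classify_change(baseline, follow_up, threshold=5.0):
    """Label one member's change in a summary score (e.g., PCS or MCS).

    The threshold is an assumed cutoff for a meaningful change; it is
    not taken from the HOS specification.
    """
    delta = follow_up - baseline
    if delta > threshold:
        return "better"
    if delta < -threshold:
        return "worse"
    return "same"


def change_percentages(score_pairs, threshold=5.0):
    """Return the percentage of members who got better, stayed the same, or got worse."""
    if not score_pairs:
        raise ValueError("score_pairs must not be empty")
    counts = Counter(classify_change(b, f, threshold) for b, f in score_pairs)
    total = len(score_pairs)
    return {label: 100.0 * counts[label] / total
            for label in ("better", "same", "worse")}


# Made-up (baseline, two-year follow-up) scores for four members.
members = [(40, 50), (50, 50), (45, 30), (60, 62)]
print(change_percentages(members))
```

With these made-up scores, one member improves, two stay within the threshold, and one declines, so the plan would report 25% better, 50% same, and 25% worse.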

    *Ed note: Now called the Medicare Health Outcomes Survey, the questionnaire is scheduled to go out with the wave of Medicare managed health care surveys in March 1999. The first cohort—1,000 randomly sampled Medicare beneficiaries from each of 268 Medicare Managed Care Plans (questionnaire mailed to 279,135 and completed by 167,093, or 60% response rate)—was surveyed in 1998 and will be resurveyed in spring 2000. Cohort two will be administered beginning in March 1999. All managed care plans with Medicare+Choice contracts are participating. Source: www.hcfa.gov/quality/qlty-3e.htm (Jan 27, 1999).

Closing Remarks

Closing the meeting, Dr Tarlov reflected on major technologic innovations and practical applications of outcomes measures, with a forward look at the Trust’s next state-of-the-art conference (Baltimore, October 25, 1999).

“All of us have been excited to learn of the new developments in norm-based scoring, which will help in standardization across measures, facilitate the development of benchmarks for quality improvement, and allow comparison of scores among groups, plans, diseases, and populations,” remarked Dr Tarlov. Using dynamic systems for assessing health has other benefits, too, he noted: “It shortens respondent time, elevates precision, enhances cost-effectiveness, and allows health status assessment to be applied routinely in the monitoring of individual patients.”

As for practical applications, continued Dr Tarlov, new therapeutic approaches using outcomes data from clinical trials in migraine, osteoarthritis, and HIV infection demonstrated just how powerful assessment of functioning and well-being can be in seeking added effectiveness of new treatments. Use of outcomes assessment in evaluating treatment effects in depression, low back pain, and congestive heart failure, and effectiveness in seniors provided all we need to know to build confidence in the measures as an important tool in disease management.

Dr Tarlov reminded the group of the significance of using outcomes assessment in accreditation, as described by the Joint Commission, the NCQA, and the AMA. “Their formation of a coordinating council (the PMCC) to seek standardization to reduce duplication of effort on the part of physicians, plans, hospitals, and other institutions promises to improve care and ease burden at the same time,” he predicted.

Finally, all the innovations and works in progress presented at the meeting will facilitate achievement of the ultimate purpose of health care, to preserve the health and improve the well-being of all patients, concluded Dr Tarlov.