Limitations of big data

In this article by Tom Emerick and David Toomey, founders of Thera Advisors, we describe good uses of health plan data and discuss its limitations.

Claims data captures the services a healthcare provider delivers to a patient, whether for preventive care or for the diagnosis or treatment of a condition. This information can be grouped into cohorts: those getting preventive exams, those receiving specific categories of care, or those seeing specific physicians and/or hospitals for their conditions. Claims involving a hospital stay, for example, can be grouped by diagnosis into what is called a diagnosis related group (DRG). However, all claims data is just a collection of medical bills. Medical bills do not give a complete picture of the patient; they omit important information such as a patient’s prognosis. That’s a gap. Thus, it is important to set appropriate expectations on the use of the data.

Number 1 (one of the most important): Avoid the averages
Most claims data sets are not normally distributed, so averages do not provide relevant information. In most discussions today, employers evaluate the average cost of employees with specific conditions, e.g., diabetes or high blood pressure. This is a flawed approach because spending by employees with various chronic conditions is skewed, thus not really “averageable”. For example, if 90% of an employee population with diabetes spends $10,000/year each and 10% spends $250,000/year each, the average is a meaningless $34,000/year, a figure that describes nobody in the group. All too often, a wild goose chase ensues, when in fact the focus should be on the $250,000 cohort to understand why they were so much more expensive.
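To make the arithmetic concrete, here is a minimal sketch of the example above in Python. The cohort of 100 members is hypothetical and simply scales the 90%/10% split; it is an illustration, not real plan data.

```python
from statistics import mean, median

# Hypothetical cohort of 100 employees with diabetes, using the figures
# from the example: 90 members at $10,000/year, 10 at $250,000/year.
costs = [10_000] * 90 + [250_000] * 10

average_cost = mean(costs)    # pulled upward by the ten high-cost members
median_cost = median(costs)   # what the typical member actually spends

# Share of total dollars attributable to the $250,000 cohort.
high_cost_total = sum(c for c in costs if c >= 250_000)
high_cost_share = high_cost_total / sum(costs)

print(f"mean:   ${average_cost:,.0f}")
print(f"median: ${median_cost:,.0f}")
print(f"share of dollars from the top 10%: {high_cost_share:.1%}")
```

The mean comes out at $34,000 while the median is $10,000, and the ten high-cost members account for roughly 73.5% of the dollars. That gap between mean and median is the signal to look at the distribution rather than the average.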

Number 2: Follow the money
A superior use of claims data is to look at distributions of spending. In most plans today, roughly 8% of enrollees are consuming 80% of plan dollars, and these 8% typically change every twelve to eighteen months. (We still run into benefit managers who are unaware of that.) The future belongs to micro-managing these “outliers”, rather than the 92% who spend only 20% of the dollars. If you study those outliers carefully, you will find that only about 7% of their spending possibly would have been preventable, and then only if they had faithfully done what their doctors told them to do decades earlier. A cardiologist recently told me that of the patients he has seen with a significant acute blockage, about 25% had no known health risks of any kind…no high blood pressure, cholesterol, diabetes, obesity, no smoking, no genetic predisposition, etc. As such, there is a component of randomness in who gets blocked arteries. The same holds true for cancer. For the other 75%, their physicians have usually counseled them on the importance of exercise and nutrition and the dangers of tobacco use, but to no avail.
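One way to follow the money is to measure how concentrated spending is in your own plan. The sketch below is an assumption about how such a check could be written, not a description of any particular vendor's tool. The function name `share_of_members_for_spend` and the toy cost figures are hypothetical, chosen so the result matches the 8%/80% pattern described above.

```python
def share_of_members_for_spend(costs, spend_share=0.80):
    """Return the smallest fraction of members whose combined spending
    reaches `spend_share` of total plan dollars."""
    ordered = sorted(costs, reverse=True)   # highest spenders first
    target = spend_share * sum(ordered)
    running = 0.0
    for count, cost in enumerate(ordered, start=1):
        running += cost
        if running >= target:
            return count / len(ordered)
    return 1.0  # only reached for an empty list


# Hypothetical annual spending per member (toy data, not real claims):
# 8 outliers at $50,000 each and 92 members at $1,000 each.
annual_costs = [50_000] * 8 + [1_000] * 92

outlier_fraction = share_of_members_for_spend(annual_costs)
print(f"{outlier_fraction:.0%} of members account for 80% of spending")
```

Because the outliers typically change every twelve to eighteen months, a check like this is most useful when rerun on each new period of claims rather than computed once.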

Number 3: Realize the limitations for quality designations
Yet another big error is trying to use claims data to determine the highest-quality doctors. You had better be really, really talented to try that one. Why? We are in an era in which many doctors are making their “quality” and “outcomes” look better by referring their most complex and risky patients to someone else. (Much has been written about this.) On the other hand, there are highly effective doctors who take responsibility for their riskiest patients but, as a consequence, score poorly on so-called “quality measures”. The real travesty is that the low-scoring doctors ironically may be the most cost-effective and provide the best care.

Number 4: Misdiagnoses are a real cost driver
Another huge shortcoming of claims data is one that readers of Cracking Health Costs know about. Namely, a large number of patients with complex health problems are simply misdiagnosed – today, that’s about 20% of the outliers in benefit plans, accounting for 18% of claim dollars. Thus, you cannot rely on diagnoses in claims data, and you cannot tell who is getting diagnoses right or wrong – this takes detective work beyond claims data. The Mayo Clinic has published a good article on rates of misdiagnoses. We have sent hundreds of people to the Mayo Clinic for second opinions and can verify by personal experience the truth in that article…same for other clinics we have used for employers. Our first rule in selecting a Center of Excellence is its success in correctly diagnosing patients with complex health problems. Huge amounts of claim dollars are spent on treatments or surgeries that are either completely erroneous or clearly suboptimal. An executive at a Fortune 100 company once said to me that the biggest quality failure in healthcare is to misdiagnose a patient…everything that follows harms the patient.

Number 5: Coding can impact the data analysis
During a data analysis for a very large employer with over 250,000 covered lives, the benefit team told me they had not paid for a solid organ transplant in a number of years. Based on their size, they should have been paying for about 25 a year. After further detective work, we discovered their consultant was using a DRG grouper that coded all transplants as ventilator cases…who knows why…but a huge error. The benefit team had no idea they were really paying for about 25 a year at an average cost over five years of about $1,500,000 each.
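A simple sanity check can catch this kind of coding error: compare what the report shows against what a population of that size would be expected to generate. The sketch below is illustrative only. The function names are hypothetical, and the rate of one transplant per 10,000 covered lives is just the ratio implied by the round figures in this story (about 25 a year for roughly 250,000 lives), not a published benchmark.

```python
def expected_cases(covered_lives, cases_per_10k_lives):
    """Expected annual case count for a population of the given size."""
    return covered_lives * cases_per_10k_lives / 10_000


def flag_missing_categories(observed, expected, min_expected=5):
    """Return categories that show zero observed claims even though the
    expected count is at least `min_expected`."""
    return [
        name
        for name, exp in expected.items()
        if exp >= min_expected and observed.get(name, 0) == 0
    ]


covered_lives = 250_000
# Assumed rate, derived from the story's round numbers; not a benchmark.
transplants_per_10k = 1.0

expected = {
    "solid organ transplant": expected_cases(covered_lives, transplants_per_10k)
}
observed = {"solid organ transplant": 0}  # what the mis-grouped report showed

flags = flag_missing_categories(observed, expected)
print(flags)  # categories that deserve detective work
```

A category that is expected about 25 times a year and appears zero times is far more likely to be a grouping or coding problem than good luck.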

Number 6: Reversion to the mean
One thing we’ve learned from years of claims analysis of big companies’ benefit programs is that if you have enough life years of data, it all looks about the same, i.e., it reverts to the mean. If the workforce is comparatively older, they will have somewhat more high cost claims.


Tom Emerick and David Toomey are founders of Thera Advisors. Their focus is to help employers maximize their role as the purchaser of healthcare services in working with suppliers to impact their population’s health and to lower costs.




