The devil’s in the detail: how big data could be camouflaging implant outcomes

Knee replacement is a highly effective treatment for arthritis and a range of other disorders of the knee. There is a wide range of implant brands available, and in the UK they are, quite rightly, highly regulated, including through a national registry. The options available within many brand portfolios have grown substantially over the past few years. A recent ground-breaking study published in The Bone & Joint Journal has investigated the effect of this expansion of implant brand portfolios and highlighted where there may be a lack of transparency in the big data registries that report on them.

In this interview, Andrew Duckworth is joined by Editor-in-Chief at The Bone & Joint Journal, Professor Fares Haddad, and the authors of this paper from the October 2021 issue of the journal, ‘Implant brand portfolios, the potential for camouflage of data, and the role of the Orthopaedic Data Evaluation Panel in total knee arthroplasty’. The authors are Mr John Phillips, consultant orthopaedic surgeon at the Exeter Knee Reconstruction Unit and a member of the Orthopaedic Data Evaluation Panel (ODEP), and Mr Keith Tucker, Chair of the ODEP and Beyond Compliance advisory group.

Can you give a brief overview of the role of the National Joint Registry (NJR), how it compares with ODEP in this context, and a little about ODEP ratings and assessment?

ODEP and the NJR were set up around 2002 in the wake of the problems with the 3M Capital hip system. A registry was something many of us had been asking for for years, but it was never going to be a quick fix. That's why NICE created ODEP, with a requirement that manufacturers of hip implants submit data to support the use of their implants. NICE started off by telling us that we should have two ratings: a ten rating for products with ten years of data, and a three rating for new products. We realised that this was never going to work or be adequate, so we introduced the five- and seven-year benchmarks, as well as the principle that once an implant was enrolled in the process, manufacturers had to keep climbing up through the benchmarks as time went by, so nothing could stand still.

In 2012 the British Association for Surgery of the Knee (BASK) came to us to initiate ODEP for knees. Shoulders followed in 2017, and we're now moving ahead with spine, ankle, wrist, and hand. We have strongly encouraged manufacturers to look at the data on their implants, and in effect it is now difficult for manufacturers to market their implants, certainly in this country, without an ODEP rating. I should also note that we're all unpaid and we're proud of our independence.

As for the NJR, as I said, it was recognised that a registry was an investment for the longer term. Nowadays the NJR has wide-ranging responsibilities for monitoring surgeons, hospitals and implants, as well as all the research that goes along with it. But it mustn't be forgotten that the original remit of the NJR was to keep records of patients with implants so that, if there was ever a problem, we could trace them. With the 3M implant, we didn't know who had received it, so it was difficult to get patients back to the clinics. When we had the metal-on-metal problem, the NJR was very quick at identifying patients and getting them back to their hospitals for check-ups.

The NJR is the largest registry in the world and certainly the one most quoted and used, especially by manufacturers. But what we also have now is big data, and we can see that there is the potential for camouflage within big data. That's what we're trying to address.

Can you expand on the vast range of options that can now exist within a single brand, and the potential issues that creates for ODEP and the NJR?

When hips were introduced into ODEP, femoral components and acetabular components were introduced separately, and each got its own rating. When knees were introduced, the rating applied to a whole construct. A tibial base plate couldn't receive a rating on its own, any more than a femoral component could. You had to have a certain base plate with a certain patella, and a certain femoral component with a certain insert.

If you consider a basic knee replacement system, you have a base plate, a femoral component and the corresponding inserts, used with or without a patella. You end up with a grid of four different options. That would be a very basic system, but if only life were that simple. As everyone who has performed replacement surgery knows, there are variants available within certain brands. The options potentially available include uncemented versions; a modular tibial base plate with a stem; a monoblock, where the insert is attached to the base plate; an allergy version with an alternative bearing; a mobile-bearing base plate; or a fixed-bearing base plate. You could also have a different material, such as cobalt-chrome or titanium, and different shapes. Then consider that you could have an equal number of different femoral components and inserts, in different makes, materials, shapes and sizes. This led us to ask: how many potential combinations could you have within a system?
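The counting question can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method, and it assumes every option in every slot is compatible with every other option (real portfolios have compatibility rules that this ignores). The slot names and option labels in the hypothetical basic_system are invented for illustration; the count follows the product rule, multiplying the number of choices available for each component.

```python
from itertools import product
from math import prod

# Hypothetical portfolio: each key is a component "slot" and each value lists
# the interchangeable options a surgeon could choose for that slot.
basic_system = {
    "design": ["CR", "PS"],             # two femoral/insert designs
    "patella": ["none", "resurfaced"],  # with or without a patellar component
}

def count_constructs(portfolio):
    """Number of distinct constructs = product of the option counts per slot."""
    return prod(len(options) for options in portfolio.values())

def list_constructs(portfolio):
    """Enumerate every construct as a dict mapping slot -> chosen option."""
    slots = list(portfolio)
    return [dict(zip(slots, combo)) for combo in product(*portfolio.values())]

print(count_constructs(basic_system))  # 4
for construct in list_constructs(basic_system):
    print(construct)
```

Adding a new slot, or a new option to an existing slot, only changes the dictionary; the count and the enumeration follow automatically.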

Prof Haddad, what are your thoughts on these potential issues?

This is a great paper because it articulates something we'd been thinking about, worrying about and talking about in the journal for a long time. We've been pushing extremely hard for registries to clean up and improve their reporting. We recognise the immense value of these registries, but as early as 2013 we published a couple of editorials on the problems, as well as the benefits, of big data, and another on how it should be interpreted. This is another reminder that we can generate hypotheses from big data, but then we need to take a step back and drill down. This isn't to do down or denigrate big data; it's an important part of what we do. But whenever we look at these things, we must bear in mind that the devil is going to be in the detail. I really welcome this concept of ‘camouflage’ now being in the public eye, and the great work this paper has done from that perspective.

Can you give a very brief overview of how you performed the assessment to address the study aims?

We simply did the maths and worked out what happens if you add a second femoral component and a second tibial component. The study doesn't evaluate any specific implant or implant brand data; it's hypothetical. It highlights that there are 30 commonly used mainstream brands of knee replacement in the UK, and the NJR reports further subdivide these major brand names into about 49 groups for analysis. We break the results down to demonstrate the effect of adding alternative implant options within a brand, using a hypothetical brand portfolio.

If we take the first example, which looks at cruciate-retaining (CR) and posterior-stabilised (PS) options, what does that break down into?

That breaks down into the simple grid: you have a CR and a PS option, each with or without a patella, on the same tibia, so you end up with four different options. Therefore, if you have 30 brands, you potentially have four different options within each brand. But when we moved on to our next examples, which look at adding one to three further options, it all started to become very interesting. Every time you add a second tibial option, perhaps a modular tibia, you double the number of options, so you go from four to eight. Add a second type of insert, perhaps a cross-linked polyethylene, and it doubles again to 16. Add a second femur and you end up with 32 different options. So every time you add a second option to a component, with or without a patella, the total doubles, taking you to 32. When you start adding a third option, the total is multiplied by 1.5, and by about 1.3 when you add a fourth. That takes us into the situation where you could have three different femurs and two different tibias, patellae and inserts, and you end up with 42 or 48 different options.
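The doubling described above follows from the product rule, and a short Python sketch can reproduce it. The slot names and starting counts are hypothetical, and the sketch again assumes every option is compatible with every other; under those assumptions a third femur takes the total from 32 to 48.

```python
from math import prod

def construct_count(option_counts):
    """Product rule: total constructs = product of the option count for each slot."""
    return prod(option_counts.values())

# Hypothetical starting grid: a CR or PS design, each with or without a patella,
# with one tibia, one insert type and one femur model.
counts = {"design": 2, "patella": 2, "tibia": 1, "insert": 1, "femur": 1}
print("start", construct_count(counts))  # start 4

# Adding a second option to one slot at a time doubles the total each time.
for slot in ("tibia", "insert", "femur"):
    counts[slot] = 2
    print(slot, construct_count(counts))  # tibia 8, then insert 16, then femur 32

# A third femur multiplies the total by 3/2; a fourth would multiply it by 4/3.
counts["femur"] = 3
print("third femur", construct_count(counts))  # third femur 48
print(3 / 2, round(4 / 3, 2))  # 1.5 1.33
```

The multiplier for going from n to n + 1 options in one slot is (n + 1)/n, which is why the growth per added option shrinks (2, then 1.5, then about 1.33) even though the total keeps rising.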

When you add in the uncemented options the numbers go crazy, because you then have the option of hybridising them all: you could have an uncemented femur with a cemented tibia. If you take your standard hypothetical group of four and add uncemented options, you quadruple the numbers, going from four to 16 options.

If you have two different versions of everything and then add an uncemented option, you get to 128 compatible variants. If you add an uncemented patella on top of that, you go to 192 different options, and it gets crazier and crazier. In the paper we put together a hypothetical brand with essentially three different options for each component and came up with 750 compatible variants. I did some quick maths yesterday and worked out that if you have six, because there are some brands out there with six different tibias, femurs and patellae, then there are over 10,000 different compatible options that a surgeon could potentially use.
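The fixation arithmetic can be sketched the same way. This is a minimal Python illustration under the hypothetical assumption that the femur and tibia can each be cemented or uncemented and that every hybrid mix is compatible; the constants base_grid and two_of_everything simply restate the four- and 32-option totals from the interview.

```python
from itertools import product

# Hypothetical assumption: each component can be cemented or uncemented and
# every mixed (hybrid) combination is compatible.
FIXATION = ("cemented", "uncemented")

def fixation_mixes(components):
    """All fixation combinations for the given components."""
    return list(product(FIXATION, repeat=len(components)))

femur_tibia = fixation_mixes(["femur", "tibia"])
print(len(femur_tibia))  # 4: fully cemented, fully uncemented and two hybrids

base_grid = 4           # CR or PS, with or without a patella
two_of_everything = 32  # second tibia, insert and femur added to the base grid
with_fixation = two_of_everything * len(femur_tibia)
print(base_grid * len(femur_tibia))  # 16
print(with_fixation)                 # 128

# Patella states grow from two (none, cemented) to three (none, cemented,
# uncemented), so the total is multiplied by 3/2.
print(with_fixation * 3 // 2)        # 192
```

Because fixation is a choice for each component rather than a single extra option, it multiplies the total by four here, which is why uncemented versions inflate the numbers faster than adding one more insert type does.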

What do you feel are the key take-home messages from this, and are there any caveats that go with them?

Surgeons should now be aware that there are multiple different variations within a brand, even one that has been labelled a good brand. If they start using more niche options within a brand, they must be aware that there may not be data available to support the use of that particular construct.

Within these big datasets, many niche combinations, and their results, may well be camouflaged by the larger dataset for the brand as a whole. Through working with ODEP and looking at NJR research, we know that not all knees are the same, not all hips are the same, and not all implants are the same. Some implants may perform better than others and some worse, but if you perform surgery with niche implants that have been used in small numbers, you won't know the results, and they might be terrible. A certain niche combination that has only been used four times may have failed four times, but within the tens of thousands of cases in the bigger NJR dataset, those results would be camouflaged.
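To make the arithmetic of camouflage concrete, here is a minimal Python sketch using invented figures: 20,000 mainstream cases with a 2% revision proportion, plus a niche combination used four times and revised four times. These numbers are hypothetical, not NJR data. The sketch uses crude proportions, whereas registries typically report time-dependent cumulative revision estimates, so it shows only the dilution effect.

```python
def revision_rate(revisions, procedures):
    """Crude proportion of procedures revised (ignores follow-up time)."""
    return revisions / procedures

# Hypothetical figures for illustration only (not NJR data).
mainstream_procedures, mainstream_revisions = 20_000, 400  # 2% revised
niche_procedures, niche_revisions = 4, 4                   # every case revised

pooled = revision_rate(mainstream_revisions + niche_revisions,
                       mainstream_procedures + niche_procedures)

print(f"mainstream: {revision_rate(mainstream_revisions, mainstream_procedures):.2%}")
print(f"niche:      {revision_rate(niche_revisions, niche_procedures):.2%}")
print(f"pooled:     {pooled:.2%}")
```

With these figures the mainstream proportion prints as 2.00%, the niche combination as 100.00%, and the pooled brand-level figure as 2.02%, so a construct that failed every time barely moves the brand result.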

Surgeons can protect themselves by referring to ODEP, which independently analyses data on complete constructs. Surgeons can check that the specific construct they use, with the femur, patella, insert and tibia they prefer, has independent verification at five, seven, ten and 15 years. We want surgeons to open their eyes to the implants they are using.

Prof Haddad, how do you think we move forward, not only in implant regulation, but also in the management of innovation? How do we do that safely and make the best use of big data?

There are some great messages here. The other caveat I'd add is that, beyond the implant constructs described here, you've also got different alignment philosophies and their results, because what you end up with may not be what you aimed for, and on top of that, different enhanced technologies. We’ve had some reports in the journal this year, for example, of an implant that had a high failure rate in certain hands, but the defence is that it does well in the registry. If you follow that a little further, it may well be that a particular construct, used in a particular way, is the problem, and the implant manufacturer has since changed that tray. There must have been a signal that was noticed internally, and yet there are patients walking around with that tibial tray. For our profession, there are implications for how we react to that and what we do. We can't just hide behind the registry data.

Moving forward, this is a really important point: when surgeons are going to change practice, they need to take a step back and ask: do I know how well I'm doing with my current constructs? Do I have my own data? Do I have my own outcomes? How do those compare with the outcomes reported in the registries and in the peer-reviewed literature? If I'm going to change, how am I going to measure my own outcomes? And if my numbers aren't going to be big enough, where am I going to get that external data? The overall registry curve is not going to give you all the data you need. That's a really key message from John and Keith that everybody needs to take away. This whole concept of camouflage means we need to break down exactly what people are doing, and this is what RCTs do beautifully. Big data is valuable at one end of the scale, but we should also step back and recognise that small mechanistic studies, comparing A versus B, remain a very valuable part of what we do in orthopaedics.

The full paper is available in The Bone & Joint Journal, and a podcast version of this interview is also available.