
Digging deep - artificial intelligence and deep learning in orthopaedic research


Artificial intelligence (AI) technology, from Alexa to online product recommendations, and from predictive search engines to self-driving cars, is already a huge part of our daily lives. Recently, there has been a surge in orthopaedic deep learning research as a result of the increasing accessibility of AI tools for researchers and health systems.

In this interview, Andrew Duckworth is joined by Prof Fares Haddad, Editor-in-Chief of The Bone & Joint Journal, and Jonathan Vigdorchik from the Hospital for Special Surgery in New York, who recently wrote an article on this topic (‘Deep learning in orthopaedic research’) for the journal.

Could you give us a brief overview of what AI and deep learning are, and what potential roles they have, or even already have, in medicine?

AI is a catch-all term. Artificial intelligence really refers to computers learning to do what humans can already do: problem solving, decision making, speech, language, and so on. We see it in our everyday lives in things like Siri or Alexa on our phones, or facial recognition software. My phone can organise every single photo of me just by facial recognition.

Do any of you use Instagram? Instagram will bring you ads based on your searches. That's AI. If you use Amazon, it's suggesting products based on your previous activity; the computers are learning how you think and trying to execute the things you're trying to do before you do them. They are mimicking human intelligence. AI has actually been around since World War II, and even in the 1950s they were teaching computers how to do things.

In terms of medicine now, do you know what they call two orthopaedic surgeons reading an EKG? A double-blinded study! So nowadays every time you get an EKG, it's got a little report on it saying whether it's a normal rhythm or if there are any abnormalities; that's the very earliest version of AI in medicine. Another more recent example is vaccine development. AI can actually simulate how a virus is going to mutate so that vaccines can be created for new strains.

In your editorial, you talk about the methodologies of AI, in terms of deep learning and computer vision. What are those?

Computer vision is when a computer learns to do what a human eye can do; recognising things from images or videos. Machine learning is another subset of AI; when a computer is actually learning from the tasks that it's already doing. Every time it executes a command, it learns how it does it and when it executes that command again, it takes its prior knowledge and keeps learning. Deep learning goes even further. It's where we're simulating what a brain does, aka neural networks. It's almost like there are a hundred different computers and each one of them has a different minuscule task that it executes, and then it combines and meshes all this information together in a kind of neural network architecture, just like our human brain can do.
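To make the neural-network idea a little more concrete, here is a minimal sketch (not part of the interview, and trained on invented data) of a tiny network: each hidden unit performs one small calculation, the layers pass their results forward, and the weights are nudged after every pass based on the error they produced.

```python
import numpy as np

# Purely illustrative: a tiny two-layer neural network on made-up data.
# Layer sizes, labels, and learning rate are all invented for this sketch.

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy inputs: 4 "patients", each described by 3 numeric features.
X = rng.normal(size=(4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])  # made-up binary labels

# One hidden layer of 5 units, one output unit.
W1 = rng.normal(scale=0.5, size=(3, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))

learning_rate = 0.1
for step in range(1000):
    # Forward pass: each hidden unit computes a weighted sum then a non-linearity.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Backward pass: propagate the prediction error back through the layers
    # and adjust every weight a little (gradient descent).
    delta_out = (output - y) * output * (1 - output)
    grad_W2 = hidden.T @ delta_out
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)
    grad_W1 = X.T @ delta_hidden
    W2 -= learning_rate * grad_W2
    W1 -= learning_rate * grad_W1

print(np.round(output, 2))  # predictions drift towards the toy labels
```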

As you both point out in your editorial, there's been a real surge in this area in the literature and orthopaedic deep learning literature in particular. What has been your own experience of it as editor of the journal, and also as a researcher and clinician?

From the editorial perspective, like any new methodology, we've seen a huge upsurge in submissions that reflect some form of AI or machine learning. Some of these are outstanding, great ideas and important questions. We saw a similar surge previously with systematic reviews and meta-analyses; as soon as people got hold of big data in its various forms, including the registries, it just became an exercise in which people threw it at just about anything. Right now, we're in that phase. We are seeing much more poorly carried out work by units that are just doing it in big volumes because they think it's a great idea and that it's going to raise their profile. I do think it's something that's here to stay but, as an editor, it's something that we're seeing a little bit too much of at the moment. In particular, we're seeing it with that mystique of the ‘black box’, whereby people are creating and writing these reports that other people can't interpret or understand, which is a problem.

In my current practice, computer-assisted planning and computer-assisted arthroplasty surgery are just obvious examples. We've got preoperative imaging data, both plain radiographs and cross-sectional imaging, then we've got a tremendous amount of intraoperative data from these computers and robots, and we're starting to put all that together with all the process data that we have. We've also got electronic medical records now, or many of us have. So, in my own little world there seems to be an endless opportunity to try and make sense of that and to try and create some patterns for understanding what we should be doing at an early stage, and also for prognosticating. I think it's going to be a rich seam of work over the next few years; the question is how we apply it and the rigour with which we look at it. It's an answer to some questions, but it's not an answer to all the questions.

What's your experience Jonathan?

It is very similar. I think people are using the term AI for just about anything, but just looking at bigger sets of data with a computer and doing larger multivariate analyses isn't AI. It's also prone to the same errors that are possible with a multivariate analysis, like confounding. There's a really good example where the authors looked at hundreds of thousands of medical records of people who were admitted with pneumonia, trying to predict death risk. They found that if you have asthma or if you are greater than a hundred years old, those are actually your best predictors of not dying. The reason for this is that if you're a hundred, people are really aggressive with your care, and if you have asthma, you already have a doctor, you know what the symptoms of shortness of breath are and you get earlier access to care. Confounders in the data sets are prevalent, so you have to be careful applying these big models to bigger data sets.
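The pneumonia example can be reproduced in miniature. The sketch below uses entirely invented numbers to show how a confounder (early access to care) can make asthma look protective in a naive model, while a model that adjusts for the confounder recovers its true, slightly harmful effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical simulation of the confounding problem described above:
# asthma patients present earlier, earlier care reduces mortality,
# so a naive model "learns" that asthma itself is protective.
rng = np.random.default_rng(42)
n = 100_000

asthma = rng.random(n) < 0.1                              # 10% of patients have asthma
early_care = rng.random(n) < np.where(asthma, 0.9, 0.4)   # asthma -> much more early care
# True mortality: early care lowers risk, asthma itself slightly raises it.
p_death = 0.20 - 0.12 * early_care + 0.03 * asthma
death = rng.random(n) < p_death

# Naive model: asthma only, no adjustment for access to care.
naive = LogisticRegression().fit(asthma.reshape(-1, 1), death)
# Adjusted model: include the confounder as a second feature.
adjusted = LogisticRegression().fit(np.column_stack([asthma, early_care]), death)

print("naive asthma coefficient:   ", round(naive.coef_[0][0], 2))     # negative: looks protective
print("adjusted asthma coefficient:", round(adjusted.coef_[0][0], 2))  # positive: true harmful effect
```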

Could you highlight some of the important limitations of this methodology?

It's prone to a lot of the errors that we see with any type of statistics. You can overfit your model, for example; in a lot of these models, when you look at big data sets, you see these scatter plots, and the model can actually predict an entire scatter plot perfectly rather than capturing the trends of the data over time. Sometimes the models are too good and lose their applicability or generalisability to other data sets.
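As a rough illustration of that point (not taken from the interview, and using an invented data set), the sketch below fits a straight line and a very high-degree polynomial to a noisy linear trend: the flexible model reproduces the training scatter almost perfectly but does far worse on unseen data.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Toy data: a linear trend with noise, split into a training and a test sample.
rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(0, 10, 15))
y_train = 2 * x_train + rng.normal(scale=3, size=15)
x_test = np.sort(rng.uniform(0, 10, 15))
y_test = 2 * x_test + rng.normal(scale=3, size=15)

def mse(model, x, y):
    # Mean squared error of the fitted polynomial on a given sample.
    return float(np.mean((model(x) - y) ** 2))

for degree in (1, 14):
    model = Polynomial.fit(x_train, y_train, deg=degree)
    print(f"degree {degree:2d}: train MSE = {mse(model, x_train, y_train):.3f}, "
          f"test MSE = {mse(model, x_test, y_test):.3f}")

# The degree-14 fit passes through the training points almost exactly (tiny
# train MSE) but generalises far worse to the test data than the straight line.
```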

We're also starting to look at how you report these types of studies. What's come out recently is something called TRIPOD, the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis. It's a list of 22 questions that the author needs to go through to show exactly how they did the study in detail. It's almost like what PRISMA is for systematic reviews and meta-analyses. It's a very proper way to do it so that everybody knows the science and integrity that you're putting into that study, and I think that's really important. In the examination of big data, data quality is the important thing; if you put rubbish in, you get rubbish out.

We also talk about having data from external, multi-institutional settings with large, heterogeneous groups of patients for cross-validation and independent testing; that's going to be key. We are getting the data sets that maybe we can do that with; the Mayo Clinic just published several hundred thousand hip x-rays in their AI database for hip radiography. These are data sets that would take an army of a hundred medical students ten years to measure, and we can do it in a matter of hours.
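A hedged sketch of that validation pattern, using synthetic data standing in for two institutions' cohorts (all feature names and numbers are invented), might look like this: internal cross-validation first, then independent testing on an external cohort the model never saw during development.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

def make_cohort(n, shift=0.0):
    # Three made-up numeric "radiographic" features per patient; the external
    # cohort is slightly shifted to mimic a different population or scanner.
    X = rng.normal(loc=shift, size=(n, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n) > 0).astype(int)
    return X, y

X_internal, y_internal = make_cohort(2000)             # developing institution
X_external, y_external = make_cohort(500, shift=0.3)   # independent external cohort

model = RandomForestClassifier(n_estimators=200, random_state=0)

# Internal 5-fold cross-validation: a necessary but often optimistic first check.
cv_auc = cross_val_score(model, X_internal, y_internal, cv=5, scoring="roc_auc")
print("internal CV AUC:", round(cv_auc.mean(), 3))

# External validation: fit on all internal data, test on the unseen cohort.
model.fit(X_internal, y_internal)
ext_auc = roc_auc_score(y_external, model.predict_proba(X_external)[:, 1])
print("external AUC:   ", round(ext_auc, 3))
```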

We presented at the AAOS about 20,000 leg-length discrepancy measurements for hip replacements done in an hour; massive data sets looking at the automation of imaging can be analysed, as can thousands of readmissions for risk prediction after joint replacement, or any of the medical factors. We can now do these much larger-scale studies, if we do them correctly and interpret the data correctly.

How do you feel we should harness the power and the potential of AI and what's the role of the journal in guiding that type of research?

I think we need to recognise that it is a powerful tool that's now at our disposal. We need to apply it correctly because there are situations, there are data sets, there are questions where this is going to be our best bet for getting an answer and for being able to improve healthcare.

From a journal perspective, what we've tried to do initially with Bayliss and Jones' annotation, and then last year with Luke Farrow and Dominic Meek's paper, is try to announce (like we did with SEARCH) that we want to look at these papers, but that when we are looking at them we really want to understand what the aims of the study are, and that the methodology needs to be transparent. What is the data, what are its limits, how is it validated? And how has it all been looked at so that somebody else could repeat that piece of work? We're going to continue to encourage this stream of work, but we're only going to encourage it if it is high quality and for the right reasons. As I've said many a time now, there are so many things that are unanswered in our specialty, and there will be different techniques to answer many of those questions; AI and machine learning, and this ability to work through massive data sets, will be a key part of what we're using over the next few years. There's going to have to be, ironically, a level of human intelligence in the interpretation of what all this data means. I don't think we're obsolete just yet!

If you’d like to read the full paper you can do so here. You can listen to the podcast version of this interview here.