From AI model to software medical device: Why the algorithm is only a fraction of the work

Our own experience developing AI medical solutions confirms a good rule of thumb:

“For every $1 you spend developing an algorithm, you must spend $100 to deploy and support it.”

If you’re not familiar with our industry, this may sound counterintuitive. Isn’t the AI algorithm most of the work when building an AI-based product? Not exactly. Developing an AI clinical solution involves much more than modelling. It is a long and challenging process: gathering and curating medical data; training, testing, validating, and certifying the model; deploying it in hospitals’ complex IT landscapes; and maintaining and improving its performance.

In this article, I zoom in on the development of a ‘complete’ AI solution, based on our approach with Veye Lung Nodules, a medical device currently used in over 70 European sites. The story of Veye is, in many ways, the story of building Aidence, from founding to our recent acquisition.

The clinical challenge

In 2015, Mark-Jan, Tim (a former co-founder), and I were brought together by a shared ambition. We wanted to tackle healthcare challenges by doing what we knew best: building technology. Specifically, we sought to be among the first to leverage a new and very promising application of artificial intelligence (deep learning, more precisely).

AI had proven to be a suitable method of analysing and interpreting patterns in images. So, we decided to focus on medical imaging. We started talking to radiologists, trying to understand their workflow. We looked for simple challenges we could solve with automation. Several opportunities came up: MRI and X-rays of the knee, lumbar spine MRI, and lung nodules on chest CTs.

All were compelling use cases, but we had to be pragmatic. As a startup, we needed to make our business sustainable as quickly as possible. We therefore looked for the application with the highest volume of cases and high reimbursement. Our goal was to develop a minimum viable product (MVP), generate revenue, and then use that revenue to fund further clinical applications.

Pulmonary nodule management on chest CT scans seemed to fit the brief. Lung cancer is the biggest cancer killer worldwide, and high-volume screening programmes are increasingly being considered as a way to diagnose it in time. But early detection requires radiologists to perform tedious and error-prone (semi-)manual tasks. For instance, they search for millimetre-sized lesions with the naked eye, then count, segment, and measure them.

We resolved to work on an AI solution automating some lung nodule management tasks. It was both a practical choice and a leap of faith, as developments in screening were uncertain.

QMS before anything else

How do you start building an AI-based clinical application? Not with a line of code, as you may expect, but with a quality management system (QMS). This is very different from building, say, a consumer mobile app. For a medical device, there is a patient behind everything you do. A robust quality management system is the ticket to the healthcare market.

For an AI solution to solve real clinical problems, it must be certified throughout its development and pass all the regulatory standards for Software as a Medical Device (SaMD). Without registration, documentation, and certification showing that the solution is developed according to the law, all the work is in vain.

To build the QMS, we reached out to consultants, who guided us to the correct templates. We then went ahead and set up the standard operating procedures and systems ourselves, a detailed, cumbersome task.

How to develop an AI medical device

With a clear problem to solve in mind and a QMS in place, the development may begin:

Data collection

To build a well-performing model, we needed large, diverse, and high-quality datasets from various scanners, hospitals, and countries. The quality of our algorithms – and therefore the value of our company – depended on access to data.

In all honesty, we initially aimed for applications other than lung nodules on chest CTs, but getting relevant data was too complicated and time-consuming. For chest CTs, we could use a publicly available dataset: 45,000 scans from the American National Lung Screening Trial (NLST). This is the data on which we trained Veye’s detection feature.

Annotations

Medical imaging AI ‘learns’ from radiologists by ‘seeing’ large volumes of manually annotated scans. Annotations are the radiologist’s findings on a scan: an outline of a lung nodule, its classification, size, and so on. At this stage, we recruited and trained experienced radiologists to perform the annotations on the NLST dataset.

To streamline the process, we created a simple in-house interface, the ‘annotator’, that made it easier to collect their input. Since then, this tool has gone through several iterations. It is now mature enough for us to scale what we call an ‘annotations factory’.

Modelling

Modelling consists of three main tasks:

Training the model to recognise the patterns in the annotated scans;
Defining the outcome of the algorithm, based on recommendations from medical societies and input from radiologists;
Deciding how results are presented to the user.

The first model in our medical device was for lung nodule detection. At the time, AI-enabled detection was the topic of several publications and competitions, such as the Luna challenge, in which our algorithm ranked high on accuracy. Healthcare professionals were beginning to understand the value of AI in detection. We felt that we had the capabilities not only to build a device enabling it but also to go beyond research and put it in the hands of physicians.

We officially founded Aidence shortly after this realisation, in November 2015. In early 2016, we recruited two machine learning students to help with the modelling and a clinician to advise us.

Building algorithms was also our hobby, and we kept experimenting. We created a model that could detect osteoarthritis with accuracy comparable to that of radiologists; it took only three weeks! In 2017, with a slightly bigger data science team, we took on the Kaggle challenge, where our model predicted lung cancer from a single scan and ranked third out of over 1,500 entries. With the prize money, we bought a first-rate GPU server.

Clinical validation

The performance of the AI model must be validated on a test set. This dataset is independent of the one used to train the algorithm and contains the ground truth (i.e. the radiologists’ final assessment).
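A common way to break that independence by accident is to let scans from the same patient end up in both the training and the test set. The following is a minimal sketch of a patient-level split, not Aidence’s actual pipeline: it assumes each scan record carries a `patient_id` field, and the `Scan`, `is_test_patient`, and `split_by_patient` names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """A CT scan record; the field names are hypothetical."""
    scan_id: str
    patient_id: str


def is_test_patient(patient_id: str, test_fraction: float = 0.2) -> bool:
    """Deterministically assign a patient to the test set by hashing its ID.

    Hashing (instead of random sampling) keeps the assignment stable
    when new scans for an existing patient are added later.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < test_fraction


def split_by_patient(scans, test_fraction: float = 0.2):
    """Split scans so that no patient appears in both train and test."""
    train, test = [], []
    for scan in scans:
        target = test if is_test_patient(scan.patient_id, test_fraction) else train
        target.append(scan)
    return train, test


if __name__ == "__main__":
    # Three scans per hypothetical patient, e.g. baseline and follow-ups.
    scans = [Scan(f"scan-{i}", f"patient-{i // 3}") for i in range(30)]
    train, test = split_by_patient(scans)
    assert {s.patient_id for s in train}.isdisjoint({s.patient_id for s in test})
    print(f"{len(train)} training scans, {len(test)} test scans")
```

Splitting by patient rather than by scan matters most for follow-up imaging, where the same nodules appear on several scans of one patient.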

We validated the clinical performance of Veye Lung Nodules using two databases:

the LIDC/IDRI, a database designed to support the evaluation of CAD software;
a database developed in cooperation with the University of Edinburgh during a clinical trial funded by the NHS.

We demonstrated that the system performed well on both datasets, with an overall accuracy of 95.96%. Readings aided by Veye Lung Nodules yielded a 10% higher sensitivity than unaided readings while hardly affecting the false positive rate. To read the entire clinical validation study, visit this page.
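Sensitivity and false positive rate are simple ratios once findings have been matched to the ground truth. The sketch below is illustrative only. It assumes lesion-level counts where each reported finding has already been matched to a ground-truth nodule (that matching step is not shown), and it expresses the false positive rate per scan. The study may define these metrics differently. The class and function names are hypothetical, and the counts in the usage block are made up, not results from the Veye study. Because a "10% higher sensitivity" can be read as an absolute or a relative difference, the sketch returns both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadingResult:
    """Lesion-level counts for one reading condition (aided or unaided).

    Assumes each reported finding has already been matched to the
    ground-truth nodules; that matching step is not shown here.
    """
    true_positives: int   # ground-truth nodules the reader reported
    false_negatives: int  # ground-truth nodules the reader missed
    false_positives: int  # reported findings with no matching nodule
    num_scans: int

    @property
    def sensitivity(self) -> float:
        """Share of ground-truth nodules that were detected."""
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def false_positives_per_scan(self) -> float:
        """Average number of false positive findings per scan."""
        return self.false_positives / self.num_scans


def sensitivity_gain(aided: ReadingResult, unaided: ReadingResult) -> tuple[float, float]:
    """Return the (absolute, relative) sensitivity gain of aided over unaided reading."""
    absolute = aided.sensitivity - unaided.sensitivity
    relative = absolute / unaided.sensitivity
    return absolute, relative


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the Veye study.
    unaided = ReadingResult(true_positives=80, false_negatives=20, false_positives=10, num_scans=50)
    aided = ReadingResult(true_positives=88, false_negatives=12, false_positives=11, num_scans=50)
    absolute, relative = sensitivity_gain(aided, unaided)
    print(f"absolute gain: {absolute:.2f}, relative gain: {relative:.0%}")
    print(f"FP/scan: {unaided.false_positives_per_scan:.2f} -> {aided.false_positives_per_scan:.2f}")
```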

Integration

With modelling and validation completed, there is still a big ‘if’ when it comes to the algorithm’s usefulness in daily clinical practice: workflow integration.

Radiologists are reluctant to leave their regular working window (or even room) to open a new application, upload a scan, wait for the analysis, and only then sit down to report. Interruptions take time, decrease efficiency, and cause frustration. AI can only provide diagnostic decision support if it is integrated into the radiologist’s Picture Archiving and Communication System (PACS). Integration is also a prime driver of AI adoption: radiologists are only interested in starting to use AI if it does not disrupt their daily workflow.

The international standard for communicating and managing medical images and related data is DICOM, so Veye had to integrate with it. However, hospitals are notorious for their complex IT infrastructure and differing DICOM implementations. This meant that we needed to build software ‘around’ the AI to deliver Veye’s results directly in the analysed study. So much for a standard, one-size-fits-all method.
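One way to picture "delivering results directly in the analysed study" is through DICOM identifiers. This is a minimal sketch, not Veye’s integration code. It assumes DICOM headers are represented as plain Python dicts keyed by standard DICOM attribute keywords. The result series copies the patient and study identifiers from the analysed scan, so the PACS files it under the same study, and it gets freshly generated series and instance UIDs. The function names are hypothetical. Writing an actual DICOM object, and handling each hospital’s PACS quirks, would require a DICOM library and far more work than shown here.

```python
import uuid


def new_dicom_uid() -> str:
    """Generate a DICOM UID from a random UUID.

    Uses the '2.25.' root that DICOM defines for UUID-derived UIDs, so no
    registered organisation root is needed. The result stays well under
    the 64-character UID limit.
    """
    return f"2.25.{uuid.uuid4().int}"


def result_series_header(source: dict, series_description: str) -> dict:
    """Build header attributes for a hypothetical AI result series.

    Patient and study attributes are copied from the analysed scan so the
    PACS files the result under the same study; the series and instance
    get fresh UIDs so they never collide with the original images.
    """
    copied = ("PatientID", "PatientName", "StudyInstanceUID", "AccessionNumber")
    header = {key: source[key] for key in copied if key in source}
    header.update(
        SeriesInstanceUID=new_dicom_uid(),
        SOPInstanceUID=new_dicom_uid(),
        SeriesDescription=series_description,
    )
    return header


if __name__ == "__main__":
    ct_header = {
        "PatientID": "P001",
        "StudyInstanceUID": "1.2.826.0.1.3680043.2.1125.1",
        "SeriesInstanceUID": "1.2.826.0.1.3680043.2.1125.2",
        "Modality": "CT",
    }
    print(result_series_header(ct_header, "AI nodule results (example)"))
```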

We ensured that our AI solution is seamlessly embedded in the radiology workflow by closely collaborating with users, PACS vendors, and IT managers.

Overall, most of our work as an AI company is software engineering: the critical capability to deploy and maintain algorithms within hospitals’ systems. Yet software does not get the attention it deserves. I called for that to change in one of my previous articles.

Certification

All stages came together in the technical documentation submitted to our Notified Body (i.e. an organisation designated to assess the conformity of certain products before market release). The submission contained the clinical evaluation report and the risk assessment for the device.

We obtained the CE certification of Veye Lung Nodules in December 2017, after an initial seed funding round and Leon Doorn joining as a regulatory expert. Since then, we have built a strong quality assurance/compliance team. In 2020, we were one of the first AI companies to obtain a class IIb certification under the new EU Medical Device Regulation (MDR).

Changes to the algorithm (i.e. product upgrades) might require new submissions to the Notified Body, slowing down the process of releasing improved models into clinical practice. Leon delved into the legislation around algorithmic changes in this article.

Contracting, deployment, and service

Getting AI into the hospital touches upon many specialities: data processing, radiology decision-making, technical integration, etc. On occasion, we sat across from as many as ten hospital representatives, including the department head, two radiologists, an internal AI champion, a legal expert, a data protection expert, the project manager, the IT manager, and the PACS administrator. Reaching an agreement takes three to six months on average. Shortcuts are not an option, as they would undermine our QMS.

Once all contracts are signed, deploying the AI solution in clinical practice requires site integration, piloting, testing, training, and continuous support. We argue that the safest and most efficient way to integrate AI solutions into hospitals’ infrastructure is via the cloud. Our tech experts explained why in a separate piece.

The story is far from over when the AI solution analyses the first patient scan. It is essential to correctly and continually monitor the safety and performance of the device in clinical practice. At Aidence, we are transparent about our quality improvement and post-market surveillance processes and advocate for the industry to do the same.

Repeat as needed

As we continued our conversations with radiologists, we understood that a lung nodule detection capability alone was insufficient. The radiology report needs to include more information on the found nodules, such as their type and diameter.

So, after building the detection model, we went back to the drawing board. We repeated the steps in the above process for every extra feature of Veye. Most also required their own AI models and specific data.

Apart from detection, Veye Lung Nodules now also automatically classifies, measures and assesses the growth of lung nodules, all in one click.

Time and funds

The development of our AI medical solution started with us founders investing our savings in our idea; we ran through all we had and were nowhere near done. The first full-feature prototype of Veye Lung Nodules was made possible by a Dutch government loan. I vividly recall leaving the room after our pitch, sure we would not get it, only to receive a positive response two hours later. It was one of the early confirmations that we were on to something.

Although the technology was clearly working, integrating AI into a medical product added many layers of complexity. Consequently, raising the seed and Series A rounds that came later was trickier. Investors sometimes asked for more certainty than we could deliver or imposed strict terms and conditions. But once clinicians started saying that they were impressed with Veye, investors began extending their support.

To put it mildly, finances were very tight right before each funding round. At least three times while building our company and product, we were on the verge of stopping. What kept us going was the proof that our solution made a real difference in clinical practice: our AI was detecting cancers that humans would have missed. Within the first 75 cases at the first hospital where we installed it, Veye detected a growing nodule that four radiologists had overlooked. Immense impact on the patient!

The hard work was more than worth it. Veye now analyses thousands of patient scans each week across Europe in lung cancer screening and routine practice. We regularly hear about cases in which it saved the day by detecting nodules that were nearly impossible to find (for example, located behind an artery). Some radiologists don’t want to report without it anymore.

Afterthought: In-house AI

Hospitals or research groups sometimes choose to develop their own algorithms, but we do not know of any success stories. This may be due to the many things to consider with an AI clinical application. Following the legal standards and regulations for software medical devices is perhaps the most challenging. It is also of vital importance to ensure safety and quality.

In my opinion, completing every development step is a stretch for hospitals. Here is what it took us:

Four years of product development;
A team of 60+ with an emphasis on software and delivery;
€12.5m in funding to validate the solution at scale.

It makes sense to team up with AI vendors who have the time and resources to do everything by the book. Of course, this remains an organisational decision.

Intelligent software

An AI medical solution is much more than an algorithm. At Aidence, we think a better way to describe our products is ‘intelligent software’. And we’ve come a long way building this software: from the founders doing everything themselves, to a team of seven working from a basement, to over 60 people today. This was no overnight success but a seven-year-long obstacle course.

With each step and funding round, we grew our team with clinical, regulatory, operations, and business experts. Yet our data science team still consists of fewer than five people. This reflects how small the algorithm work is compared with all the other skills and capabilities required. In other words, it shows where the $100 go for every $1 spent on the algorithm.