At first sight, one may say that new solutions driven by Artificial Intelligence align perfectly with the current regulation of software as a stand-alone medical device. Both the FDA and the European Commission recognise ‘software as a stand-alone medical device’ (or ‘SaMD’) in their regulatory frameworks, but do these frameworks still hold when the technology and abilities of AI evolve?
How different are AI algorithms from traditional diagnostic software?
Computer-Aided Detection or Diagnosis (CAD) software existed before it started to make use of Artificial Intelligence (‘AI’) based solutions. Traditional CAD software devices rely on rules that engineers code manually, for example to detect lesions. Such rules are rigid and require manual adjustment and tweaking to obtain optimal performance. AI-driven CAD software, by contrast, is ‘self-learning’, which means that the algorithms are capable of adjusting themselves based on the input that is fed into them. One should note that this ‘self-learning’ applies to the development of the algorithm prior to releasing it for use in clinical practice. Today an algorithm doesn’t continue learning once it is released.
The self-learning capacity of AI algorithms allows them to be developed on large datasets and achieve a higher level of performance compared to traditional software devices.
Veye Chest, our deep learning solution for pulmonary nodules
Our solution, Veye Chest, automates detection and characterisation of pulmonary nodules to support radiologists in their review of chest CT-scans. Veye Chest is intended for use as a second or concurrent reader tool. But what does this actually mean?
- Second reader tool – The radiologist reads the full chest CT-scan, documents their findings and then reads the results provided by Veye Chest to determine if anything was missed during the initial read;
- Concurrent reader tool – The radiologist reads the full chest CT-scan while the results of Veye Chest are visible in the chest CT-scan. This allows the radiologist to immediately note the Veye Chest results in the radiology report.
To qualify a reader tool, the European Medical Device Directives (soon to be replaced by the Medical Device Regulation) set out the relevant requirements. Such requirements and how these have been applied to Veye Chest are explained in the following sections.
The certification process of Veye Chest
At the start of developing Veye Chest we determined, in cooperation with our Notified Body (1), the applicable classification of Veye Chest (Class IIa). Classification determines the regulatory route to follow and the level of oversight provided by the Notified Body. In Europe, medical devices are classified as Class I, Class IIa, Class IIb, or Class III. In parallel to actual development (coding), we prepared the full suite of required technical documentation and set up our company-wide quality management system.
During development the algorithm requires training. For training, Aidence used large training datasets (e.g. over 45,000 images). Part of the training dataset was enhanced with new labeling done by a team of Dutch radiologists.
Validation of Veye Chest is a two-step approach. First, we internally validated the performance on an independent dataset that was not used for product development. Think of this internal validation as in-house testing. This validation dataset should be representative of the intended use environment. Second, a team at the University of Edinburgh and NHS Lothian performed a clinical performance assessment on a locally compiled dataset. This included a ground truth based on the majority consensus of three thoracic radiologists from the NHS Lothian.
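To make the idea of a majority-consensus ground truth concrete, here is a minimal sketch in Python. It is an illustration only and not the method used in the Edinburgh assessment: the function names, the finding ids, the votes, and the choice of sensitivity and false-positive count as metrics are all assumptions made for this example.

```python
def majority_consensus(reader_votes):
    """True when more than half of the readers marked the finding as a nodule."""
    return sum(reader_votes) * 2 > len(reader_votes)


def evaluate(reader_annotations, flagged_by_software):
    """Compare software detections with a majority-consensus ground truth.

    reader_annotations: dict mapping a finding id to one boolean vote per reader.
    flagged_by_software: set of finding ids the software reported as nodules.
    """
    truth = {
        finding_id
        for finding_id, votes in reader_annotations.items()
        if majority_consensus(votes)
    }
    true_positives = truth & flagged_by_software
    false_positives = flagged_by_software - truth
    sensitivity = len(true_positives) / len(truth) if truth else float("nan")
    return {
        "ground_truth": truth,
        "sensitivity": sensitivity,
        "false_positives": len(false_positives),
    }


# Made-up votes from three readers for four candidate findings (hypothetical data).
annotations = {
    "c1": [True, True, False],    # two of three readers agree: nodule
    "c2": [True, False, False],   # one of three: not a nodule
    "c3": [True, True, True],     # unanimous: nodule
    "c4": [False, False, False],  # unanimous: not a nodule
}
result = evaluate(annotations, flagged_by_software={"c1", "c2"})
print(result["sensitivity"], result["false_positives"])
```

In this toy example the consensus ground truth contains two nodules, the software finds one of them and flags one finding the readers rejected, so the sketch reports a sensitivity of 0.5 and one false positive.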
To complete the CE submission we conducted an in-depth risk analysis. In our risk analysis process, we engage with external, independent radiologists and medical physicists to help Aidence identify and evaluate risks and evaluate the usability of Veye Chest. The role of radiologists and medical physicists is crucial for us to understand whether or not the use of Veye Chest would introduce unacceptable risks to a patient. The outcomes of the risk analysis have been documented and submitted to the Notified Body as part of the CE submission.
Once development was completed and CE marking received, Veye Chest was released onto the market with a set performance level. Since its release, we have continued to improve the device (both from a technical and a performance perspective). Changes to the product, such as performance improvements, require re-verification and validation. In some instances, re-evaluation or certification with the Notified Body is also required. As a result, the current regulatory framework doesn’t support continuous learning in clinical practice.
At the moment, we are also preparing ourselves for the US market, which brings additional validation and control requirements. We will update on that experience at a later stage.
As people develop, so do AI solutions
We believe that deep learning technology has a bright future and will move beyond being a support tool to performing many routine tasks autonomously. The question is, however, what impact such changes would have on physicians and how regulators will tackle these challenges.
Before giving an outlook on fully autonomous programs, let’s first consider ‘first reader tools’.
- First reader tool – A first reader tool is used to fully analyse the image for any anomalies (for Veye Chest that would be pulmonary nodules). Subsequently, a physician (a radiologist in the case of Veye Chest) uses those outcomes to determine the treatment or diagnostic pathway.
Under the current EU legislation, there are no set rules that distinguish requirements for second, concurrent, or first reader tools. With the MDR coming into force in 2020, there will be a more stringent framework in place overall, but it will not directly introduce specific second, concurrent, or first reader tool requirements (2).
However, it will introduce a new classification rule specific to software programs (3), which considers the level of information provided by the software and the use of that information in clinical practice. This applies both to software developed by manufacturers and to software developed in-house by hospitals. The intent here, in my opinion, is that the more impact the information generated by the software has on clinical decision making and on risk to the patient, the higher the device classification will be. Under this new rule, software may be classified as Class IIa, Class IIb, or even Class III. A higher classification results in more stringent oversight, as would be expected for devices with a higher level of autonomy.
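As a rough illustration of how Rule 11 scales classification with impact, the sketch below encodes a simplified reading of the rule in Python. The enum, the function name, and its parameters are hypothetical and exist only for this example. The wording of Regulation (EU) 2017/745 and the assessment of a Notified Body remain decisive for any real device.

```python
from enum import Enum


class Impact(Enum):
    """Worst-case consequence of a decision based on the software's output."""
    DEATH_OR_IRREVERSIBLE = "death or irreversible deterioration of health"
    SERIOUS_OR_SURGICAL = "serious deterioration of health or surgical intervention"
    OTHER = "other"


def classify_rule_11(informs_decisions,
                     worst_case_impact=Impact.OTHER,
                     monitors_physiology=False,
                     vital_parameters_at_risk=False):
    """Simplified, illustrative reading of MDR Annex VIII, Rule 11."""
    if informs_decisions:
        # Software providing information used for diagnostic or therapeutic decisions.
        if worst_case_impact is Impact.DEATH_OR_IRREVERSIBLE:
            return "III"
        if worst_case_impact is Impact.SERIOUS_OR_SURGICAL:
            return "IIb"
        return "IIa"
    if monitors_physiology:
        # Software intended to monitor physiological processes.
        return "IIb" if vital_parameters_at_risk else "IIa"
    # All other software.
    return "I"


print(classify_rule_11(True))
print(classify_rule_11(True, Impact.SERIOUS_OR_SURGICAL))
print(classify_rule_11(True, Impact.DEATH_OR_IRREVERSIBLE))
```

The sketch shows why a tool with more influence on clinical decisions tends to land in a higher class: the deciding input is the worst-case consequence of acting on its output, not the technology behind it.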
What is not clarified, however, and remains a gap in the current framework that leaves all stakeholders in the dark, is the level of (clinical) validation that would be required for each level. What type of clinical data and performance would be acceptable for meeting regulatory requirements, e.g. for first reader tools? Would this require prospective rather than retrospective studies? The current EU regulatory framework leaves these questions for a Notified Body to interpret, which might not ensure a level playing field in the market (considering the multitude of Notified Bodies).
In theory, it allows Notified Bodies to set their own expectations, of which manufacturers may not even be aware. We believe clear standards (e.g. ISO standards) need to be agreed upon by the relevant parties (e.g. industry, clinical practice, governments). Such standards are also needed for radiologists and other users to be able to understand and assess the output of an AI solution.
AI or humans, who should take the driver’s seat?
The regulatory framework is not the only aspect that requires consideration when responsibilities shift from radiologist to a software tool. The EU recently published ethics guidelines for trustworthy AI (4). These guidelines lay down 7 key high-level requirements that AI systems must meet to be deemed trustworthy. The first requirement relates to human agency and oversight, effectively meaning that AI systems should empower human beings to allow them to make informed decisions. Proper oversight mechanisms need to be in place, which can be ensured through having a human-in-the-loop, human-on-the-loop, or human-in-command approaches.
Transparency is another requirement addressed by the guidelines: “operators should be able to explain the decisions their AI systems make”. Ideally, this is embedded in the design of AI solutions. However, explaining how an image-based AI model comes to its conclusions is not trivial and will require some thought, if it is possible at all for some aspects.
Although high level, some of these requirements go hand-in-hand with needs from clinical practice, from my point of view. Both regulators and clinicians need to acknowledge that first reader AI tools are on the horizon.
In conclusion, the development and use of AI today is in its infancy. Changes to the medical environment will come but will take time. When humans start transferring parts of their responsibilities to AI-driven systems, control will be relevant both from regulators, to ensure that a level playing field for manufacturers exists, and from physicians, to ensure that quality of care is guaranteed. These are interesting times with challenges ahead. With proper industry and regulator alignment, physicians should be able to provide, and patients to receive, a high level of care.
This blog is limited to regulatory requirements around AI solutions and their role in clinical practice. However, there are many more aspects to consider as this technology evolves, for example the need for patient consent, or the complexities around continuous learning. The latter especially leaves us with additional regulatory questions, which we will address in future blog posts.
(1) The bodies in the European Union accredited for granting CE marks to medical devices.
(2) In the US, the Food and Drug Administration (FDA) is responsible for the regulation of medical devices (510(k)
(3) Regulation (EU) 2017/745, Annex VIII, Rule 11