What is software testing and QA for medical devices?
Software testing and quality assurance for medical devices is the discipline of systematically verifying that software behaves correctly across the full range of inputs it will encounter in clinical use, and producing the documented evidence that a regulatory submission requires. Unlike consumer software testing, which focuses on finding bugs before release, medical device software testing is a formal process tied to a documented specification: each test case has a defined expected output, and every deviation is investigated and either accepted within tolerance or resolved before the software is considered validated. IEC 62304, the international standard for medical device software lifecycle processes, defines the verification activities required at unit, integration, and system level. The output is not just a passing test suite but a traceable record linking each requirement to the test that verifies it and the result that confirms it. Devsort has produced IEC 62304 test documentation as part of the PKG Health clinical algorithm validation programme.
How does clinical software testing differ from standard QA?
Standard QA finds defects. Clinical software QA produces evidence. The distinction matters because the test record is a regulatory artefact, not just an internal quality signal. A passing test suite is necessary but not sufficient: the suite must be complete (covering the full input specification), the methodology must be documented, and each test must be traceable to the requirement it verifies.
This changes what a test plan looks like. Test cases are derived from the software requirements specification, not from exploratory knowledge of the system. Edge cases and boundary conditions are identified from the clinical input specification (the formal definition of the inputs the device will encounter in real use), not from developer intuition. Deviation handling is also formal: any test case that does not produce the expected output is investigated at the code level and either accepted with documented rationale or corrected before the record is closed.
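As an illustration of deriving boundary conditions from a specification rather than from intuition, the sketch below generates test values around both limits of a specified input range. The `InputRange` class, the `boundary_cases` function, and the sample-rate limits are hypothetical and chosen only for the example; real limits come from the clinical input specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRange:
    """Hypothetical entry in a clinical input specification."""
    name: str
    minimum: int
    maximum: int
    step: int = 1  # smallest increment the input can take


def boundary_cases(spec: InputRange) -> list[tuple[int, bool]]:
    """Return (value, should_be_accepted) pairs around both range limits."""
    candidates = [
        spec.minimum - spec.step,  # just below the lower limit
        spec.minimum,
        spec.minimum + spec.step,
        spec.maximum - spec.step,
        spec.maximum,
        spec.maximum + spec.step,  # just above the upper limit
    ]
    return [(v, spec.minimum <= v <= spec.maximum) for v in candidates]


# Illustrative limits only (assumption, not taken from any real device).
sample_rate = InputRange(name="sample_rate_hz", minimum=10, maximum=100)
cases = boundary_cases(sample_rate)
for value, accepted in cases:
    print(f"{sample_rate.name}={value}: {'valid' if accepted else 'invalid'}")
```

Because the cases are computed from the specification entry, a change to the specified range changes the test values with it, which keeps the test plan and the specification aligned.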
Devsort's algorithm validation work for PKG Health was exactly this kind of testing. Every algorithm output was compared against the validated reference on clinical datasets, every deviation was investigated, and the complete record formed the evidence base for regulatory change submissions.
How do we structure a testing and QA engagement?
1. Review the specification and define test scope
We begin by reviewing the software requirements specification and identifying the input domain, meaning the full range of inputs the software must handle. We produce a test plan that maps each requirement to the test cases that verify it, identifies the datasets or input generators needed, and specifies the expected outputs and the tolerance criteria for comparison.
2. Build the test suite
We implement test cases at the appropriate level: unit tests for individual algorithm components, integration tests for the full processing pipeline, and system tests on the target hardware or deployment environment. For clinical algorithm testing, this typically means building a comparison framework that runs both the reference and the target implementation on clinical datasets and computes the deviation at each output.
3. Execute and record
We run the full test suite and produce the execution record: inputs used, outputs observed, expected outputs, comparison methodology, and the disposition of every deviation. The record is structured to be directly usable as a regulatory submission artefact, so the regulatory affairs team does not need to reformat it or add supplementary explanation.
4. Deliver the validation package
We deliver the complete validation package: test plan, test case specifications, execution records, deviation log, and an equivalence or validation conclusion. We support integration into your quality management system and remain available to answer technical questions from regulatory reviewers during the submission process.
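The steps above can be sketched in a few lines of Python: a test plan that maps requirements to test cases, a check for requirements with no test, and an execution record with a disposition for every case. This is a minimal sketch under stated assumptions, not the format of any actual submission; the class names, field names, requirement IDs, and the `double` stand-in for the software under test are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PlannedCase:
    """One test case in the plan, traced to the requirement it verifies."""
    case_id: str
    requirement_id: str
    value: float
    expected: float
    tolerance: float


@dataclass(frozen=True)
class ExecutionRecord:
    """What was run, what was expected, what was observed."""
    case_id: str
    requirement_id: str
    value: float
    expected: float
    observed: float
    deviation: float
    disposition: str  # "pass", or "open" until investigated and closed


def untraced(requirements: list[str], plan: list[PlannedCase]) -> list[str]:
    """Return requirement IDs that no test case in the plan verifies."""
    covered = {case.requirement_id for case in plan}
    return [req for req in requirements if req not in covered]


def execute(plan: list[PlannedCase],
            software_under_test: Callable[[float], float]) -> list[ExecutionRecord]:
    """Run every planned case and record the outcome, including failures."""
    records = []
    for case in plan:
        observed = software_under_test(case.value)
        deviation = abs(observed - case.expected)
        disposition = "pass" if deviation <= case.tolerance else "open"
        records.append(ExecutionRecord(
            case.case_id, case.requirement_id, case.value,
            case.expected, observed, deviation, disposition,
        ))
    return records


def double(x: float) -> float:
    """Hypothetical stand-in for the software under test."""
    return 2.0 * x


requirements = ["REQ-001", "REQ-002", "REQ-003"]
plan = [
    PlannedCase("TC-001", "REQ-001", value=1.5, expected=3.0, tolerance=0.0),
    PlannedCase("TC-002", "REQ-002", value=2.0, expected=4.5, tolerance=0.1),
]

print("Requirements without a test:", untraced(requirements, plan))
records = execute(plan, double)
for record in records:
    print(record.case_id, record.requirement_id, record.disposition)
```

In this example `REQ-003` is reported as untraced and `TC-002` is left with an "open" disposition, which corresponds to a deviation that still has to be investigated and either corrected or accepted with documented rationale.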
Algorithm validation testing: the PKG Health programme
The PKG Health algorithm refactoring programme required systematic testing of six clinical algorithms (bradykinesia, dyskinesia, tremor, off-wrist detection, sleep, and walking patterns), each ported from legacy implementations in C/C++, Julia, and Tcl to a validated Python implementation.
For each algorithm, we built a comparison framework that ran both the original and the refactored implementation on the same clinical datasets, computed the output deviation at every point in the time series, and logged every deviation that exceeded the predefined tolerance. Deviations were investigated at the code level, traced to their cause, and either corrected or accepted within the documented clinical tolerance.
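A point-by-point comparison of this kind can be sketched as follows. This is an illustrative outline rather than the framework used on the programme: the `Deviation` class, the `compare_series` function, the sample values, and the tolerance are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    """One point where the candidate output is outside tolerance."""
    index: int
    reference: float
    candidate: float
    abs_diff: float


def compare_series(reference: list[float],
                   candidate: list[float],
                   tolerance: float) -> list[Deviation]:
    """Compare two output time series and log every out-of-tolerance point."""
    if len(reference) != len(candidate):
        raise ValueError("series lengths differ; outputs cannot be compared point by point")
    deviations = []
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        diff = abs(ref - cand)
        # Written as "not (diff <= tolerance)" so that a NaN output is
        # flagged as a deviation instead of passing silently.
        if not (diff <= tolerance):
            deviations.append(Deviation(index, ref, cand, diff))
    return deviations


# Illustrative values only; real runs use clinical datasets.
reference_output = [0.0, 1.0, 2.0, 3.0]
refactored_output = [0.0, 1.0, 2.5, 3.0]
deviation_log = compare_series(reference_output, refactored_output, tolerance=0.25)
for entry in deviation_log:
    print(f"index {entry.index}: reference={entry.reference}, "
          f"candidate={entry.candidate}, |diff|={entry.abs_diff}")
```

Each logged entry carries enough information to locate the point in the dataset, which is the starting point for tracing a deviation to its cause in the code.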
The complete test record for each algorithm formed the validation evidence submitted as part of the FDA 510(k), CE, and TGA regulatory change assessments for the PKG monitoring system.
Frequently asked questions
What does IEC 62304 require for software testing?
IEC 62304 requires that software verification activities be planned, executed, and documented for each software safety class. For Class B and Class C software, this includes unit testing of software units, integration testing of software items, and system testing of the complete software. Each test must have a defined expected result, and deviations must be investigated and resolved before the verification activity is closed. The output is a set of test records that form part of the software lifecycle documentation submitted to regulators.
Can you build automated test suites for clinical software?
Yes. We build automated test suites in Python (pytest) for algorithm and pipeline testing, and in the appropriate framework for the target platform (Jest for TypeScript, XCTest for Swift, JUnit for Kotlin). For clinical algorithm validation, automated test suites are essential: they allow the full test suite to be re-run against any new implementation or any new dataset without manual effort, and the execution log is automatically produced in a format suitable for the validation record.
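As a rough illustration, a pytest-style regression test that compares a new implementation against a reference might look like the sketch below. The two mean functions are hypothetical stand-ins for real algorithm implementations, and the inputs and tolerance are placeholders. pytest collects functions whose names start with `test_`, so the file needs no pytest import to run.

```python
import math


def reference_mean(samples):
    """Hypothetical stand-in for the validated reference implementation."""
    return sum(samples) / len(samples)


def target_mean(samples):
    """Hypothetical stand-in for the new implementation under test."""
    total = 0.0
    for sample in samples:
        total += sample
    return total / len(samples)


# Illustrative inputs; a real suite would load clinical datasets.
CASES = [
    [1.0, 2.0, 3.0],
    [0.5, 0.5, 0.5, 0.5],
    [10.0],
]

ABS_TOLERANCE = 1e-9  # placeholder; the real value comes from the test plan


def test_target_matches_reference():
    """Every case must match the reference within the stated tolerance."""
    for samples in CASES:
        expected = reference_mean(samples)
        observed = target_mean(samples)
        assert math.isclose(observed, expected, rel_tol=0.0, abs_tol=ABS_TOLERANCE), (
            f"deviation for input {samples}: expected {expected}, observed {observed}"
        )
```

In a real suite, `pytest.mark.parametrize` would report each case separately, and pytest's `--junitxml` option can write a machine-readable result file that feeds the execution record.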
Do you test embedded firmware?
Yes. Embedded firmware testing involves both host-side testing of algorithm components against the reference implementation, and hardware-in-the-loop testing of the full firmware on the actual target device. Host-side testing validates the numerical output of the ported algorithm. Hardware-in-the-loop testing validates that real-time execution, memory layout, and timing constraints are met on the physical hardware. Both levels produce comparison records that form the embedded validation evidence.
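One way to approach the host-side part is to emulate the target's arithmetic on the host and compare the result with the reference. The sketch below assumes, purely for illustration, that the firmware computes in single precision while the reference uses double precision; the function names, the input data, and the tolerance are hypothetical.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def mean_float32(samples):
    """Mean accumulated in emulated single precision (stand-in for firmware arithmetic)."""
    total = 0.0
    for sample in samples:
        total = to_float32(total + to_float32(sample))
    return to_float32(total / len(samples))


def mean_reference(samples):
    """Double-precision reference (hypothetical stand-in)."""
    return sum(samples) / len(samples)


# Illustrative input; 0.1 is not exactly representable in binary floating point,
# so rounding error accumulates differently at the two precisions.
samples = [0.1] * 1000
deviation = abs(mean_float32(samples) - mean_reference(samples))
TOLERANCE = 1e-4  # placeholder; the real value comes from the documented clinical tolerance

print(f"deviation = {deviation:.3e}, within tolerance: {deviation <= TOLERANCE}")
```

A host-side check like this only covers numerical behaviour. Timing, memory layout, and real-time execution still need to be verified on the physical device.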
Can you test software that we built internally?
Yes. We work as an independent QA function on software built by client teams, providing the external verification that a regulatory submission often requires to demonstrate independence. We review the existing specification, build the test suite against it, execute it, and produce the record. We work within your QMS and produce documentation in the format your regulatory affairs team requires.
How do you handle proprietary algorithm IP during testing?
We work under NDA before reviewing any algorithm implementation. Our test records document inputs, expected outputs, observed outputs, and deviations — not the internal logic of the algorithm. The record demonstrates that the system behaves correctly; it does not expose the mechanism by which it does so. This is consistent with the approach we take in all clinical algorithm engagements.