Clinical Algorithm Development · 2.5 years

How we refactored and validated a clinical algorithm stack for PKG Health (now Empatica).

A 2.5-year engagement to make certified movement disorder algorithms maintainable, portable, and deployable on five wearable platforms — without invalidating the FDA 510(k) clearance.

A certified algorithm stack that no one could touch.

PKG Health had built something genuinely valuable: a suite of algorithms for monitoring bradykinesia, dyskinesia, and tremor in Parkinson's disease patients, validated through clinical studies and cleared by the FDA through the 510(k) pathway. The algorithms were running in production. They were generating clinical data for real patients.

The problem was the code they ran on. The algorithm implementations were spread across legacy codebases in C/C++, Julia, TCL, and Java — each written by different teams at different times, with different conventions, minimal documentation, and no unified architecture. Nobody on the current team fully understood all of it.

The FDA clearance was tied to the existing implementation. Any change to the algorithm code, even a bug fix or a refactoring that produced identical outputs, had to be formally assessed against the regulatory documentation before it could be deployed. In practice, this meant the algorithms were frozen. PKG could not add new hardware support, extend the algorithms to new device platforms, or fix known issues without triggering a regulatory review process whose outcome was uncertain.

That was the situation when Devsort engaged.

Refactor one algorithm at a time. Match the output exactly. Document everything.

Our methodology was defined by a single constraint: the refactored code had to produce outputs numerically equivalent to the certified original on every clinical dataset. Not equivalent on average or in aggregate. Equivalent at each data point, within a documented tolerance, for every input the algorithm would encounter in clinical use.

We worked one algorithm at a time. For each one, we began by reverse-engineering the existing implementation: reading the code, understanding the computation it performed, documenting its behaviour in language that a clinical reviewer — not just a developer — could follow. This documentation step was not optional. It was how we confirmed that we understood what we were about to reimplement.

We reimplemented each algorithm in Python, with full documentation and a test framework that ran both the original and the new implementation on the same clinical datasets and compared every output. Any deviation from the expected output was investigated at the code level. If the deviation was within the documented clinical tolerance, it was recorded and accepted with rationale. If it was not, the code was corrected before moving forward.
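A minimal sketch of the kind of point-by-point comparison described above. It is an illustration, not the framework used on the engagement: the names `compare_outputs`, `Deviation`, and `blocking`, and the absolute-tolerance parameter `abs_tol`, are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Deviation:
    index: int              # position of the value in the output series
    reference: float        # value from the original (certified) implementation
    candidate: float        # value from the refactored implementation
    within_tolerance: bool  # True if the difference is inside abs_tol


def compare_outputs(reference, candidate, abs_tol=0.0):
    """Compare two output series value by value.

    Returns one Deviation for every index where the values are not
    identical, flagged by whether the difference is inside abs_tol
    (a hypothetical stand-in for the documented tolerance).
    """
    if len(reference) != len(candidate):
        raise ValueError("output series differ in length")
    deviations = []
    for i, (ref, new) in enumerate(zip(reference, candidate)):
        if ref == new:
            continue
        ok = math.isclose(ref, new, rel_tol=0.0, abs_tol=abs_tol)
        deviations.append(Deviation(i, ref, new, ok))
    return deviations


def blocking(deviations):
    """Deviations outside tolerance, which must be fixed before moving on."""
    return [d for d in deviations if not d.within_tolerance]
```

In this sketch every non-identical value is reported, so deviations inside the tolerance can still be recorded with a rationale, while anything returned by `blocking` stops the algorithm from being accepted.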

The algorithms we refactored include the bradykinesia score and its associated statistics, the dyskinesia score and statistics, the tremor score and statistics, the off-wrist detection model, sleep scores, and early morning bradykinesia severity.

The tremor algorithm required particular attention. The original implementation took approximately five minutes to process a single recording. We reduced this to thirty seconds — a tenfold improvement — while maintaining full numerical equivalence with the validated output. This was achieved through algorithmic restructuring rather than approximation: the same computation, organised to minimise redundant work.
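To illustrate what restructuring without approximation can mean in general, the sketch below computes the same sliding-window sums two ways. This is a generic example and not the tremor algorithm itself: the function names are hypothetical, and the samples are assumed to be integer raw counts so that both versions agree exactly.

```python
from itertools import accumulate


def window_sums_naive(samples, width):
    """Sum every window from scratch: O(n * width) additions."""
    return [sum(samples[i:i + width]) for i in range(len(samples) - width + 1)]


def window_sums_prefix(samples, width):
    """Same sums from a single pass of running totals: O(n) additions."""
    prefix = [0, *accumulate(samples)]  # prefix[k] = sum of the first k samples
    return [prefix[i + width] - prefix[i] for i in range(len(samples) - width + 1)]
```

With floating-point inputs the two orderings of additions can differ in the last bits, which is one reason any restructured version still has to pass through the same equivalence checks as every other change.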

6

Algorithms refactored and validated

30s

Tremor processing time (down from 5 min)

>90%

Off-wrist detection accuracy

Regulatory standards: FDA, CE, TGA

Extending to five device platforms.

Validating the refactored algorithms in Python was step one. Step two was making them hardware-agnostic — able to run on data from any wearable device, not just the platform the original validation studies used.

PKG Health wanted the algorithm suite to run on Apple Watch, Samsung Galaxy Watch, Sony SmartWatch, Empatica E4, and ActiGraph devices. Five fundamentally different accelerometers, each with different axis orientation conventions, different sampling rates, different filter characteristics baked into the hardware, and different raw data output formats.

We built a preprocessing pipeline that normalises these differences. The pipeline detects the device type from the data file, applies the appropriate transformation — axis remapping, unit conversion, resampling to the algorithm's expected input frequency — and outputs a standardised signal that the clinical algorithms treat identically regardless of source device. Every transformation was validated against clinical datasets to confirm that algorithm outputs were equivalent across platforms.
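The normalisation steps named above (axis remapping, unit conversion, resampling) can be sketched as follows. Everything here is an assumption for illustration: `DeviceProfile`, its fields, and the function names are hypothetical, no real device's parameters are shown, and the resampler uses plain linear interpolation, where a production pipeline would normally also filter before downsampling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    axis_order: tuple      # source axis index feeding each standard axis (x, y, z)
    axis_sign: tuple       # +1 or -1 applied to each standard axis
    counts_per_g: float    # raw units per 1 g
    sample_rate_hz: float  # native sampling rate of the device


def to_standard_axes(sample, profile):
    """Remap and flip axes, then convert raw counts to g."""
    return tuple(
        profile.axis_sign[k] * sample[profile.axis_order[k]] / profile.counts_per_g
        for k in range(3)
    )


def resample_linear(values, source_hz, target_hz):
    """Resample one channel to target_hz by linear interpolation."""
    if len(values) < 2:
        return list(values)
    n_out = int((len(values) - 1) * target_hz / source_hz) + 1
    out = []
    for j in range(n_out):
        pos = j * source_hz / target_hz      # fractional index into the source
        lo = min(int(pos), len(values) - 2)  # left neighbour, clamped at the end
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


def normalise(samples, profile, target_hz):
    """Turn raw (a, b, c) samples into standard-axis g values at target_hz."""
    standard = [to_standard_axes(s, profile) for s in samples]
    channels = [
        resample_linear([s[k] for s in standard], profile.sample_rate_hz, target_hz)
        for k in range(3)
    ]
    return list(zip(*channels))
```

Keeping the per-device differences in a data structure rather than in branching code means each new platform is a new profile plus a validation run, and the clinical algorithms downstream never see which device produced the signal.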

A subset of the validated Python algorithms was then reimplemented in embedded C for deployment on the Empatica wrist device itself — enabling on-device processing with the same numerical behaviour as the cloud-based Python implementation, validated on hardware.

What the work made possible.

Before the engagement: the algorithm stack was hardware-locked to its original platform, implemented in four legacy languages, undocumented, and effectively unmaintainable. Any change required a regulatory assessment whose outcome and timeline were uncertain.

After the engagement: the same clinical algorithms ran in production-quality Python, fully documented, with a test framework that could validate any future change against the certified reference. They ran on five device platforms. A subset ran on the Empatica device hardware. The validation documentation for each algorithm and each platform transition was complete and structured for regulatory submission.

PKG Health was acquired by Empatica during the engagement period. The work Devsort had done, extending the algorithm suite's reach to new device partners and making the codebase maintainable and documented, strengthened the technical capability of the portfolio that changed hands. We do not claim to have caused the acquisition. We note only that the work made the platform significantly more valuable as a technical asset.

The engagement ran for 2.5 years, covering six algorithms across five device platforms, under FDA 510(k), CE, and TGA regulatory frameworks throughout.

Working on a similar problem?

If your clinical algorithm system has accumulated complexity, lacks documentation, or needs to be validated across new hardware — we have done this work before. Tell us about your project.
