Clinical assay report consolidation

Tissue sample measurements for the same panel of clinical assays are submitted by multiple external partners as tabular report files whose structure is inconsistent: layouts, header rows, and data column names vary from report to report. In this project, Saber Informatics was asked to set up an automated data consolidation pipeline so that the hundreds of reports regularly submitted by a dozen different external partners are translated and consolidated into a single data table with minimal or no manual intervention.

Unintentional creativity in data reporting causes problems

In an ideal world, all partners would use exactly the same locked report template. In practice, however, that is not always achievable. Assay parameters may change, and the assays within the panel may change often, so distributing an updated locked template each time would not be practical. Partner scientists also sometimes treat the guidelines provided as just that, guidelines, and make slight modifications dictated by their workflows to make reporting easier.

Reports might contain an extra header row or two, merged cells, or column names that differ from (but are synonymous with) those the sponsor uses. Unfortunately, these deviations from the recommended format do not stay constant either; they slowly shift over time. To a human reader all of this is barely noticeable, because each individual report makes perfect sense. Only when the number of reports grows and automated processing is introduced do these issues add up to critical data quality problems.
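To illustrate why such small deviations break naive parsing, here is a minimal Python sketch of one common mitigation for extra header rows: instead of assuming the header is on row one, scan for the first row that contains enough recognizable column names. It assumes reports have been exported to CSV, where merged header cells typically show up as empty cells. The set `KNOWN_NAMES`, the threshold `min_hits`, and the function names are hypothetical and not taken from the project itself.

```python
from __future__ import annotations

import csv
import io

# Hypothetical: normalized column names (any partner synonym) that the mapping table recognizes.
KNOWN_NAMES = {"sample id", "specimen", "assay", "test", "result", "value", "units"}


def normalize(cell: str) -> str:
    """Lower-case and collapse whitespace so ' Sample  ID' matches 'sample id'."""
    return " ".join(cell.strip().lower().split())


def find_header_row(rows: list[list[str]], min_hits: int = 2) -> int:
    """Return the index of the first row with at least `min_hits` recognizable column names."""
    for i, row in enumerate(rows):
        hits = sum(normalize(cell) in KNOWN_NAMES for cell in row)
        if hits >= min_hits:
            return i
    raise ValueError("no recognizable header row found")


def read_report(text: str) -> list[dict[str, str]]:
    """Parse CSV text, skipping title rows above the real header and blank spacer rows."""
    rows = list(csv.reader(io.StringIO(text)))
    h = find_header_row(rows)
    # Blank header cells (e.g. left over from merged cells) get placeholder names
    # so they do not collide as duplicate "" keys.
    header = [normalize(c) or f"unnamed_{j}" for j, c in enumerate(rows[h])]
    records = []
    for row in rows[h + 1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank spacer rows
        records.append(dict(zip(header, row)))
    return records
```

For example, a report with a title line and an empty row above the real header still yields one record per sample row, keyed by normalized column names; a file with no recognizable header raises an error that a pipeline could log for review instead of silently producing misaligned data.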

Automated data pipelining saves valuable resources

Our Client requested assistance with setting up an automated data pipeline to translate and consolidate incoming clinical assay reports of tissue sample measurements. Working together with local subject matter experts (SMEs), Saber Informatics designed a mapping table that maps the data column names used by vendors to those used by the Client. We then built a data pipeline that scans a file share for new report files, parses them, logs data warnings and errors, sends automated notifications to the SMEs, and finally consolidates the data into a single table per run. The pipeline runs nightly.
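The pipeline steps described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the production implementation: reports are assumed to be CSV files whose first row is the header, the mapping table is a hard-coded dict (`COLUMN_MAP`) rather than an SME-maintained file, a "new" report is one whose file name is not yet in the `processed` set, and a log message stands in for the SME notification. All names here are hypothetical.

```python
from __future__ import annotations

import csv
import logging
from pathlib import Path

log = logging.getLogger("report_pipeline")

# Hypothetical mapping table: partner column name (normalized) -> Client column name.
COLUMN_MAP = {
    "sample id": "sample_id",
    "specimen": "sample_id",
    "assay": "assay",
    "test": "assay",
    "result": "value",
    "value": "value",
}
OUTPUT_COLUMNS = ["source_file", "sample_id", "assay", "value"]


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so 'Sample  ID ' matches 'sample id'."""
    return " ".join(name.strip().lower().split())


def map_header(header: list[str], source: str) -> tuple[dict[str, str], list[str]]:
    """Map a report's column names to Client names; unmapped names become warnings."""
    mapping: dict[str, str] = {}
    warnings: list[str] = []
    for name in header:
        target = COLUMN_MAP.get(normalize(name))
        if target is None:
            warnings.append(f"{source}: unmapped column {name!r}")
        else:
            mapping[name] = target
    return mapping, warnings


def run_pipeline(share: Path, processed: set[str], out_file: Path) -> list[str]:
    """One run: parse new CSV reports in `share` and write their mapped rows to `out_file`.

    Files with warnings are not marked as processed, so they are retried on the
    next run once the mapping table has been updated.
    """
    rows: list[dict[str, str]] = []
    all_warnings: list[str] = []
    for path in sorted(share.glob("*.csv")):
        if path.name in processed:
            continue  # already consolidated in an earlier run
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            mapping, warnings = map_header(reader.fieldnames or [], path.name)
            file_rows = [
                {"source_file": path.name, **{mapping[k]: v for k, v in rec.items() if k in mapping}}
                for rec in reader
            ]
        if warnings:
            all_warnings.extend(warnings)  # hold the whole file back for SME review
            continue
        rows.extend(file_rows)
        processed.add(path.name)

    # Single consolidated table for this run.
    with out_file.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=OUTPUT_COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)

    for w in all_warnings:
        log.warning(w)  # stand-in for the automated SME notification
    return all_warnings
```

One design choice worth noting: a file that produces mapping warnings is held back rather than partially consolidated, so once SMEs add the missing synonym to the mapping table, the next run picks the file up automatically. The output file should live outside the scanned share so that it is not mistaken for a new report.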

As a result of this project, measurably (and substantially) more data is processed correctly into the final consolidated report in a timely fashion. SMEs no longer need to process any reports manually: when a run notification contains warnings, they update the mapping and restart the run to regenerate the consolidated report. As an added benefit, any other issues are automatically logged and brought to the reviewers' attention as well. Automated mapping reduces errors, logs issues, measurably improves data quality, and frees up SMEs for more valuable tasks.

About Us

Saber Informatics is a US data science consultancy founded in 2012.

Our focus is on pharmaceutical R&D, specifically data preparation for ML/AI initiatives.

info@saberinformatics.com
