Analytical Data Rescue

Our client reorganized some of its R&D operations and, as a consequence, decided to divest a preclinical research facility.

Review and Register

As part of the process, historical analytical data on all compounds synthesized and tested at the site was reviewed. Some of the data (both raw and processed) had already been registered in the corporate databases and only needed to be archived. The remaining analytical data, however, had not been registered: it had to be thoroughly reviewed, organized by compound and lot, and registered in order to support future patent applications relevant to those compounds.

The data amounted to hundreds of gigabytes, spread across more than half a million files in a somewhat chaotic, deeply nested folder hierarchy. Saber Informatics was asked to help "make sense" of the analytical data, both raw files and processed reports.

Interview Scientists

We interviewed the scientists most familiar with the types of experiments and codified a set of rules for annotating individual files and their content. Then, using high-performance parsing algorithms, we annotated each file with the information relevant to it according to the codified rules. The result was a detailed virtual index of the entire dataset. Note that the annotation process was not fully automated: the computerized tasks had to be re-run multiple times, with the rules refined between runs, to arrive at a complete and efficient data index.
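The rule-based annotation described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the field names, the `CMP-` compound prefix, the folder naming conventions, and the functions `annotate` and `build_index` are hypothetical and do not reflect the client's actual rules or the parsers used on the project.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rules: each metadata field is extracted by a regular expression
# applied to the file path. Real rules came from interviews with the scientists;
# these patterns are illustrative assumptions only.
RULES = {
    "compound_id": re.compile(r"(CMP-\d+)", re.IGNORECASE),
    "lot": re.compile(r"lot[_-]?(\d+)", re.IGNORECASE),
    "technique": re.compile(r"(?<![a-z])(nmr|lcms|hplc)(?![a-z])", re.IGNORECASE),
}


def annotate(path: str) -> dict:
    """Apply every rule to one path and collect the fields that matched."""
    record = {"path": path, "extension": PurePosixPath(path).suffix.lower()}
    for field, pattern in RULES.items():
        match = pattern.search(path)
        if match:
            record[field] = match.group(1).upper()
    return record


def build_index(paths):
    """Annotate all paths; also return the paths that no compound rule matched."""
    index = [annotate(p) for p in paths]
    unmatched = [r["path"] for r in index if "compound_id" not in r]
    return index, unmatched


# Made-up example paths, not taken from the real dataset.
paths = [
    "archive/2009/CMP-00123/lot_7/nmr/scan01.fid",
    "misc/old stuff/report_final.pdf",
]
index, unmatched = build_index(paths)
```

In a sketch like this, the `unmatched` list is what drives the iteration: files that no rule could assign are reviewed by hand, the rules are extended, and the pass is run again.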

With the index in place, we could proceed to data registration using workflows already established for other data. In addition, we set up a mini-website on the intranet where each compound was given a page linking all of its data files and relevant metadata. The website included a deep text search, so that any attribute, such as a scientist's name, could be used to look up relevant data.
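As a rough illustration of how an index can back both per-compound pages and an attribute search, here is a small Python sketch. The record fields (`compound_id`, `scientist`, `path`), the sample values, and the functions `group_by_compound` and `search` are hypothetical; the actual website and its search engine are not described in enough detail to reproduce.

```python
from collections import defaultdict

# Made-up index records for illustration only.
sample_index = [
    {"path": "CMP-00123/lot_7/nmr/scan01.fid", "compound_id": "CMP-00123", "scientist": "J. Doe"},
    {"path": "CMP-00123/lot_7/report.pdf", "compound_id": "CMP-00123", "scientist": "A. Smith"},
    {"path": "CMP-00456/lot_1/hplc/run3.raw", "compound_id": "CMP-00456", "scientist": "J. Doe"},
]


def group_by_compound(index):
    """Collect records per compound, i.e. the content of one page per compound."""
    pages = defaultdict(list)
    for record in index:
        pages[record.get("compound_id", "UNASSIGNED")].append(record)
    return dict(pages)


def search(index, term):
    """Case-insensitive substring search across every attribute of every record."""
    needle = term.casefold()
    return [
        record
        for record in index
        if any(needle in str(value).casefold() for value in record.values())
    ]


pages = group_by_compound(sample_index)
hits = search(sample_index, "j. doe")
```

A linear scan like this is adequate only for small examples; at half a million files a real deployment would more plausibly rely on a dedicated full-text index.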

Structure and Meaning over Quantity

Lessons learned: parsing the data had a substantial manual component, because its structure and meaning had to be understood before an efficient and complete index could be generated. The amount of data mattered less than its structure.

Prevented Data Loss

Working collaboratively with the client's scientists, Saber Informatics helped prevent the loss of important analytical data that will be needed to support future patent applications.

About Us

Saber Informatics is a US data science consultancy founded in 2012.

Our focus is on pharmaceutical R&D, specifically data preparation for ML/AI initiatives.

  info@saberinformatics.com
