Collate preclinical PK/PD rodent study data


The Client had run several hundred preclinical rodent safety studies over several years. The results (tissue gene expression data) were captured in Excel files, categorized by toxicity endpoint. Each study comprised multiple Excel files: one file per endpoint, with each worksheet covering several compounds. The Client needed to extract and clean the data to build a database for a predictive modeling initiative.

An extensive internal review showed that scientists captured data consistently within a team for certain time periods, but not across teams. The data files therefore could not simply be imported into a database: they came in a variety of formats, with minor inconsistencies within the file collection for each format.

We realized that a programmatic strategy would let us collate data sets with similar or modified experimental designs in a short period of time. We designed and developed multiple data parsers, each including validation steps. Approximately eighty percent of the files were parsed in fully automated mode. A further fifteen percent required small parser modifications and iterative development to account for minor template variations and missing data sections. The remaining five percent of the file collection contained data entry errors, which a pharmacologist corrected before the files were processed by the automated data reader.
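
For illustration only, the parser-per-template approach with validation might be sketched in Python roughly as follows. The template names, column headers, record fields, and status labels are hypothetical and are not taken from the Client's files. Worksheets are represented here as plain lists of rows rather than being read from Excel.

```python
from dataclasses import dataclass


@dataclass
class Record:
    compound: str
    gene: str
    value: float


@dataclass
class TemplateParser:
    """Parser for one worksheet layout, identified by its header row (assumed)."""
    name: str
    header: tuple  # expected header cells, lower-case

    def matches(self, rows):
        # A sheet belongs to this template if its first row equals the header.
        if not rows:
            return False
        return tuple(str(c).strip().lower() for c in rows[0]) == self.header

    def parse(self, rows):
        idx = {col: i for i, col in enumerate(self.header)}
        return [
            Record(
                compound=str(row[idx["compound"]]).strip(),
                gene=str(row[idx["gene"]]).strip(),
                value=float(row[idx["value"]]),  # ValueError on a bad entry
            )
            for row in rows[1:]
        ]


def validate(records):
    """Return a list of human-readable problems; empty means the sheet is clean."""
    errors = []
    for i, rec in enumerate(records, start=1):
        if not rec.compound:
            errors.append(f"row {i}: missing compound")
        if not rec.gene:
            errors.append(f"row {i}: missing gene")
    return errors


# Hypothetical templates, one per team-specific layout.
PARSERS = [
    TemplateParser("team_a", ("compound", "gene", "value")),
    TemplateParser("team_b", ("gene", "compound", "value")),
]


def process_sheet(rows):
    """Return (status, payload): parsed records, or problems for manual review."""
    for parser in PARSERS:
        if parser.matches(rows):
            try:
                records = parser.parse(rows)
            except (ValueError, IndexError) as exc:
                return "manual_review", [f"{parser.name}: {exc}"]
            errors = validate(records)
            if errors:
                return "manual_review", errors
            return "parsed", records
    return "unknown_template", []
```

In a sketch like this, sheets that come back as "unknown_template" would prompt a new or modified parser, while "manual_review" sheets would go to a domain expert for correction before being re-run.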

About Us

Saber Informatics is a US data science consultancy founded in 2012.

Our focus is on pharmaceutical R&D, specifically data preparation for ML/AI initiatives.

  info@saberinformatics.com
