Status View dashboards for data workflow managers

In the past year we have been engaged in several projects with the same requirement: a live "status view" or "catalog" that always contains up-to-date information about the files processed in a data flow, the data records that were corrected, and so on.

Complex data flows need oversight.

When a (typically complex) data flow runs nightly and ingests thousands of data records in every run, manual oversight becomes a burden. When the inevitable questions about data provenance come up, it can take hours to trace the answers. For example: "For study A, do we have everything we need to start a review?" or "Exactly which files with subject X data did we run last week? We'll need to unload them and reload newer vendor files." It is not unusual to see a data scientist swamped with requests like these.

Reusable templates.

We created a reusable software pack, built entirely with open-source software, for quickly standing up "status view" intranet mini-sites. It is designed for environments where requirements keep changing, and that flexibility makes all the difference to end users, who still get a professional-quality product.

Precise functionality.

We start from a site template (a 50 MB zip file), configure it, add project-specific metadata, and we are done. The resulting mini-site is lean and precise: it does exactly what is needed, and nothing more. It takes a single command to start or stop it. Once the site is running, a data flow can add or delete content via a standard web services API. We successfully used Pipeline Pilot to call the API and keep "status view" content up to date.
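As an illustration, the sketch below shows how a data flow step might call such an API to register or remove a catalog entry. The base URL, endpoint path, and payload fields are assumptions made for the example, not the exact interface of our software pack; in our projects the equivalent calls were made from Pipeline Pilot.

```python
# Illustrative sketch only: the endpoint paths and payload fields below are
# hypothetical, not the exact "status view" API.
import requests

BASE_URL = "http://statusview.intranet.example.com/api"  # hypothetical mini-site URL

def add_entry(entity_type, entity_id, metadata):
    """Create or update a catalog entry (e.g. a processed file or a study)."""
    resp = requests.post(
        f"{BASE_URL}/entries",
        json={"type": entity_type, "id": entity_id, "metadata": metadata},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def delete_entry(entity_type, entity_id):
    """Remove an entry, e.g. when its source file is unloaded."""
    resp = requests.delete(f"{BASE_URL}/entries/{entity_type}/{entity_id}", timeout=30)
    resp.raise_for_status()

# Example: register a vendor file processed in last night's run.
add_entry(
    "file",
    "VENDOR_20240101_subjectX.csv",
    {"study": "A", "records_loaded": 1234, "run_date": "2024-01-01"},
)
```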

Use cases.

In one of the projects we were engaged by a Client company to catalog all analytical reports for all compounds synthesized and tested at one site over a period of several years. The data was contained in several hundred thousand files in a somewhat chaotic, deeply nested folder hierarchy. We set up a mini-website on the intranet where each compound was given a page with all of its data files and relevant metadata linked to it. The website (which contained many thousands of automatically created pages) included a deep text search, so any attribute, such as a scientist's name, could be used to look up relevant data.
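A simplified sketch of the cataloging step is shown below: it crawls a nested folder tree, groups report files by compound ID, and collects basic file metadata for each compound's page. The root path and compound-ID pattern are hypothetical placeholders; the real project used the Client's own naming conventions.

```python
# Hypothetical sketch of the cataloging step. The share path and the
# compound-ID pattern are assumptions for illustration only.
import re
from pathlib import Path
from collections import defaultdict

ROOT = Path(r"\\fileserver\analytical_reports")   # assumed network share
COMPOUND_ID = re.compile(r"CMPD-\d{6}")           # assumed compound ID format

def build_catalog(root):
    """Map each compound ID to the report files that mention it in their name."""
    catalog = defaultdict(list)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        match = COMPOUND_ID.search(path.name)
        if match:
            stat = path.stat()
            catalog[match.group(0)].append({
                "file": str(path),
                "size_bytes": stat.st_size,
                "modified": stat.st_mtime,
            })
    return catalog  # one entry per compound -> one automatically created page

catalog = build_catalog(ROOT)
```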

In another project, for a different Client, we set up an automatically refreshed "status view" mini-site for a data workflow that processes incoming CRO data files. During processing, the workflow cross-checks patient metadata in the files against internal Client databases and generates detailed log statistics. Processed data is organized by patient, study, and other parameters. After each nightly run the workflow refreshes the "status view" mini-site with the most recent statistics, so business users can see which subjects and studies were processed and exactly how many records came from which file. The site contains roughly a thousand interlinked pages, one per entity such as a data file or study.
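The sketch below illustrates what such an end-of-run refresh might look like: the night's processed records are aggregated per source file and per study, and one update per entity is pushed to the mini-site. As before, the URL, endpoint, and field names are illustrative assumptions rather than the production API.

```python
# Hypothetical end-of-run refresh step: aggregate the night's load statistics
# and push one update per file and per study. Endpoint and fields are assumed.
from collections import Counter
import requests

BASE_URL = "http://statusview.intranet.example.com/api"  # hypothetical mini-site URL

def refresh_status_view(processed_records):
    """processed_records: iterable of dicts such as
    {"file": "CRO_batch_07.csv", "study": "STUDY-A", "subject": "S-101"}."""
    per_file = Counter(r["file"] for r in processed_records)
    per_study = Counter(r["study"] for r in processed_records)

    for entity_type, counts in (("file", per_file), ("study", per_study)):
        for name, count in counts.items():
            resp = requests.post(
                f"{BASE_URL}/entries",
                json={"type": entity_type, "id": name,
                      "metadata": {"records_loaded": count}},
                timeout=30,
            )
            resp.raise_for_status()
```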

We can work together.

Call us to discuss whether Saber Informatics can help with your projects. Try us: we deliver verifiable results.

About Us

Saber Informatics is a US data science consultancy founded in 2012.

Our focus is on pharmaceutical R&D, specifically data preparation for ML/AI initiatives.

info@saberinformatics.com
