
Scientists at a global pharma company's site here in Massachusetts relied on a sequence analytics toolkit that the company had acquired along with another business a decade earlier. The original authors of the code had long since moved on, and eventually even the IT infrastructure on which the algorithms ran was scheduled for replacement.
Modernize aging code without disrupting production work
Saber Informatics was asked to modernize the sequence analysis toolkit, bringing it in line with current computing standards and fitting it into the company's present-day research infrastructure. The problem was that there was very little documentation and no source code. We had to reconstruct the entire toolkit, with all its methods, from a binary R package. Fortunately, such packages can be reverse-engineered to recover most of their source code in human-readable form. Our target was Python and, given the nature of sequence analytics, its high-level computational biology routines. Unsurprisingly, the most difficult task in the entire project was ensuring that the Python version produced exactly the same results as the legacy version. This is where Saber Informatics' accumulated experience in data quality control (QC) came in handy.
QC is key to success in computational methods
We reconstructed every step of the calculations in Python instead of R. Using test sets provided by our client, we compared the output of the original toolkit with that of the new Python version at every intermediate step. One complicating factor was the sheer size of the sequence data. Drawing on our proprietary techniques and experience, we approached the problem with step-wise benchmarking and unit tests, and then scaled up. In addition to the working toolkit, we delivered complete source code (in both R and Python) and documentation to the client's scientists.
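The step-wise comparison described above can be sketched in Python as follows. This is a minimal illustration, not the toolkit's actual code: it assumes each legacy step's output has already been exported as a list aligned with the input sequences, and the step names, the `gc_content` example step, the tolerances, the `sizes` schedule and the test data are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# (record index, legacy value, Python value)
Mismatch = Tuple[int, object, object]


def compare_outputs(legacy: Sequence, modern: Sequence,
                    rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> List[Mismatch]:
    """Compare one step's outputs element by element.

    Two floats match if both are NaN or they agree within the tolerances, since
    R and Python results can differ in the last bits or in printed precision.
    Everything else (integers, strings, labels) must match exactly.
    """
    if len(legacy) != len(modern):
        raise ValueError(f"length mismatch: {len(legacy)} vs {len(modern)}")
    mismatches: List[Mismatch] = []
    for i, (a, b) in enumerate(zip(legacy, modern)):
        if isinstance(a, float) and isinstance(b, float):
            same = (math.isnan(a) and math.isnan(b)) or math.isclose(
                a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        else:
            same = a == b
        if not same:
            mismatches.append((i, a, b))
    return mismatches


def run_parity(inputs: Sequence[str],
               legacy: Dict[str, Sequence],
               modern: Dict[str, Callable[[str], object]],
               sizes: Sequence[Optional[int]] = (10, 100, None),
               ) -> Optional[Tuple[Optional[int], str, List[Mismatch]]]:
    """Check every step, in order, on progressively larger slices of the inputs.

    Returns (slice size, step name, mismatches) for the first failure, or None
    if all steps agree at all sizes. A size of None means the full test set.
    """
    for n in sizes:
        subset = inputs if n is None else inputs[:n]
        for step, fn in modern.items():
            expected = legacy[step][:len(subset)]
            actual = [fn(seq) for seq in subset]
            mismatches = compare_outputs(expected, actual)
            if mismatches:
                return n, step, mismatches
    return None


# Hypothetical Python ports of two legacy steps.
def seq_length(seq: str) -> int:
    return len(seq)


def gc_content(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else float("nan")


# Hypothetical test set and legacy outputs; 0.285714285714286 mimics a legacy
# value for 2/7 printed to 15 significant digits.
inputs = ["ATGC", "GGCC", "ATAT", "GATTACA"]
legacy = {"length": [4, 4, 4, 7],
          "gc_content": [0.5, 1.0, 0.0, 0.285714285714286]}
modern = {"length": seq_length, "gc_content": gc_content}

print(run_parity(inputs, legacy, modern, sizes=(2, None)))  # → None
```

The small slice runs first, so cheap checks catch most discrepancies before the expensive full-size comparison. In a real pipeline each step would consume the previous step's output, so reporting the first divergent step localizes a discrepancy instead of flagging every downstream step; the sketch keeps the steps independent for brevity.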
Our client's scientists now have an up-to-date computational toolkit that they can update themselves in the future. By documenting every step and running each one as a script, we ensured the quality of the output and, with it, the reliability of our client's future calculations.
Contact us (sales_at_saberinformatics_dot_com) or call us to discuss, in confidence, the challenges your organization is facing and how we can address them together.