
The Client's scientists and laboratory technicians run experiments on instruments at multiple locations in the US and overseas. The instruments automatically write their result data to files as each experiment completes, in formats set by each instrument vendor. The files are deposited, automatically or manually depending on the instrument, into predefined network folders. An automated software task monitors these instrument data folders across the network and reads the data into a database.
In this project we designed and implemented 25 data file readers covering all of the instrument file formats. Each reader applies multiple validation checks designed to catch errors in the data as well as instrument operator mistakes. All readers are hosted on a server and run unattended.
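To illustrate the shape of such a reader, here is a minimal sketch in Python. The file format, field names (`sample_id`, `result`), and validation rules (required ID, numeric result, plausible range) are hypothetical stand-ins, not the client's actual formats or checks:

```python
import csv
import io

def read_instrument_file(text, min_val=0.0, max_val=1000.0):
    """Parse a hypothetical CSV instrument export; return (records, errors).

    A file with any errors would be rejected and routed to its owner
    for review rather than loaded into the database.
    """
    records, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for lineno, row in enumerate(reader, start=2):  # data starts on line 2
        sample_id = (row.get("sample_id") or "").strip()
        if not sample_id:
            errors.append(f"line {lineno}: missing sample ID")
            continue
        try:
            value = float(row.get("result") or "")
        except ValueError:
            errors.append(f"line {lineno}: non-numeric result")
            continue
        if not (min_val <= value <= max_val):
            errors.append(f"line {lineno}: result {value} out of range")
            continue
        records.append({"sample_id": sample_id, "result": value})
    return records, errors
```

A real reader for a hierarchical vendor format would be considerably more involved, but the pattern is the same: parse, validate each record, and accumulate per-line errors instead of failing on the first problem.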
The challenge was not only to read so many different file formats containing hierarchical data, but also to ensure that newly generated files were parsed as soon as they appeared, that network outages or bursts of files arriving at once did not break the readers, and that rejected files triggered automated notifications to the instrument owners for review. We set up a cascade of scheduled readers and loggers to queue and automatically process the 150 new files generated every business day by all the instruments at all lab locations.
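The scan-queue-notify cascade can be sketched as follows. This is an illustrative Python outline under assumed names (`scan_folder`, `process_queue`, `notify_owner` are all hypothetical), not the actual implementation:

```python
import queue
from pathlib import Path

def scan_folder(folder, seen, work_queue):
    """Scheduled step: enqueue files that have not been seen before.

    Tracking already-seen paths means a burst of new files, or files
    left behind during a network outage, are simply picked up on the
    next scheduled scan.
    """
    for path in sorted(Path(folder).glob("*")):
        if path.is_file() and path not in seen:
            seen.add(path)
            work_queue.put(path)

def process_queue(work_queue, parse, notify_owner):
    """Drain the queue; a rejected file triggers an owner notification."""
    while True:
        try:
            path = work_queue.get_nowait()
        except queue.Empty:
            return
        try:
            parse(path)  # reader raises if the file fails validation
        except Exception as exc:
            notify_owner(path, str(exc))
```

Decoupling the scan from the processing step is what lets many files appearing at once simply lengthen the queue rather than overwhelm a reader, and lets a failed scan (e.g. during an outage) be retried on the next schedule with no files lost.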