Embracing random walks in machine learning


Anyone developing predictive models in Keras with a TensorFlow backend will sooner or later want to demonstrate them - to their research group, company colleagues, or customers. It is a chance to get your efforts recognized, prove a hypothesis, and move the research forward. Sounds great, right?

But a surprise awaits. It is nearly impossible to rebuild the same model again. Like a cloud of mist, you never get exactly the same trained model weights twice. They vary quite a bit whenever anyone attempts to reproduce your calculation, undermining trust in the model. Thousands of people have tried to find a way to reliably reproduce the training process. The commonly suggested solution is to fix the random seeds.
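The seed-fixing recipe pins every random draw the run depends on. Here is a minimal sketch of the principle using only the standard library; the framework-level equivalents for a Keras/TensorFlow 2.x setup - `random.seed`, `numpy.random.seed`, `tf.random.set_seed` - are noted in comments as assumptions, not as a complete or guaranteed recipe:

```python
import random

def noisy_run(seed=None):
    """Stand-in for a training run whose result depends on random draws
    (weight initialization, data shuffling, dropout masks)."""
    if seed is not None:
        # In a Keras/TF setup one would also call numpy.random.seed()
        # and tf.random.set_seed() to cover the other generators.
        random.seed(seed)
    return sum(random.gauss(0.0, 1.0) for _ in range(100))

# Unseeded runs differ from call to call; fixing the seed reproduces
# one specific random walk bit for bit.
assert noisy_run(seed=42) == noisy_run(seed=42)
```

Note what this buys you: exact repetition of one particular walk - and nothing about how representative that walk is, which is the point the rest of this post makes.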

When I researched various ways of fixing the seeds, the problem gradually started to look very familiar. I had seen this before! Fifteen years ago, Miklos Feher (the scientist, not the soccer player of the same name) and Chris Williams (then my boss at CCG) spent many hours trying to come to grips with non-reproducibility (http://dx.doi.org/10.1007/s10822-007-9154-7) of conformation minimization and molecular docking runs.

Miklos and Chris did not look for an immediate solution that would allow them to finish a specific calculation; being scientists, they instead wanted to understand the problem in depth. They found that even when randomness is eliminated from the algorithm itself, it is still present in the initialization of atomic xyz coordinates when files are read and hydrogens are added. Since the optimization hypersurface is complex, with many minima, the result is extremely sensitive to the slightest differences in starting coordinates. It is a bit like being strapped to a snowboard sliding sideways on a sharp rail - a gust of wind will send you down either side.
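That sensitivity is easy to reproduce with a toy one-dimensional double well - a hypothetical stand-in for the real minimization, not their actual code. Plain gradient descent started a hair's breadth to either side of the ridge ends up in opposite minima:

```python
def minimize(x, lr=0.01, steps=2000):
    """Gradient descent on the double well f(x) = x**4 - 2*x**2,
    which has minima at x = -1 and x = +1 and a ridge at x = 0."""
    for _ in range(steps):
        grad = 4 * x**3 - 4 * x  # f'(x)
        x -= lr * grad
    return x

# Starting coordinates differing by only 2e-9 land in different minima.
left = minimize(-1e-9)
right = minimize(+1e-9)
```

On a hypersurface with millions of dimensions instead of one, there are vastly more such ridges, and file-reading order or hydrogen placement supplies the gust of wind.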

The randomness inherent in training Keras model weights only highlights the high dimensionality already present in the problem. Fixing a seed won't solve the problem; it only provides an illusion of a solution. Any change in parameters creates a new hypersurface, and a single walk on it is insufficient. This is, by analogy, why tennis matches consist of multiple sets, why card games are played over multiple hands before the best player can be determined, why archers are given multiple tries, and so on. The answer is to create an ensemble of runs in order to negate the inherent randomness - to embrace it.

Returning to Keras reproducibility, building an ensemble of models appears to be the more reliable solution - similar to how athletes train by repeating the same jump or the same shot from the same position. This would mean training a Keras model several times on the same training data with identical parameters, and then ensembling these runs into one model that fully shows the effect of the chosen hyperparameters and layer structure. More complex or bigger models would require more replicates.
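As a sketch of that idea - a toy one-parameter model rather than actual Keras code, assuming only that the same two noise sources (random initialization and shuffle order) are in play:

```python
import random

def train_once(data, rng, lr=0.05, epochs=30):
    """One training run: random init + shuffled sample order,
    the two noise sources that make replicate runs differ."""
    w = rng.uniform(-1.0, 1.0)  # random weight initialization
    for _ in range(epochs):
        rng.shuffle(data)       # random presentation order
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # SGD step on squared error
    return w

# Ground truth y = 2x with a little observation noise.
data_rng = random.Random(0)
data = [(x / 10, 2 * (x / 10) + data_rng.gauss(0, 0.05))
        for x in range(1, 11)]

# Replicate the run with different seeds, then ensemble by averaging.
weights = [train_once(list(data), random.Random(s)) for s in range(10)]
ensemble_w = sum(weights) / len(weights)
```

Each replicate wanders a little; the ensemble average washes the run-to-run randomness out rather than hiding it behind one frozen seed. With real Keras models the analogue would be averaging predictions of the replicate models rather than their weights.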

About Us

Saber Informatics is a US data science consultancy founded in 2012.

Our focus is on pharmaceutical R&D, specifically data preparation for ML/AI initiatives.

  info@saberinformatics.com
