Theories, Models and Data

In the modern world, a theory is a mathematical model, and a mathematical model is a theory. A theory described in words is not a theory, it is an explanation or an opinion.

The interesting thing about mathematical models is that they go far beyond data reproduction. A theoretical model of a biological structure or process may be entirely hypothetical, or it may use a certain amount of quantitative data from experiments, integrate it into a theoretical framework and ask questions that result from the combined model.

A Bayesian model in contrast is a purely data-driven construct which usually requires additional quantitative values (‘priors’) which have to be estimated. A dynamical model of metabolic or protein signaling processes in the cell assumes only a simple theoretical structure, kinetic rate equations, and then proceeds to fill the model with data (many estimated) and analyses the results. A neural network model takes a set of data and performs a statistical analysis to cluster the patterns for similarity, or to assign new patterns to previously established categories. Similarly, high-throughput or other proteomic data are usually analysed for outliers and variance with statistical significance with respect to a control data set. Graph analysis of large-scale datasets for a cell type, brain regions, neural connections etc. also aim to reproduce the dataset, to visualize it, and to provide quantitative and qualitative measures of the resulting natural graph.
All these methods primarily attempt to reproduce the data, and possibly make predictions concerning missing data or the behavior of a system that is created from the dataset.

Theoretical models can do more.

A theoretical model can introduce a hypothesis on how a biological system functions, or even, how it ought to function. It may not even need detailed experimental data, i.e. experiments and measurements, but it certainly needs observations and outcomes. It should be specific enough to spur new experiments in order to verify the hypothesis.
In contrast to Popper, a hypothetical model should not be easily falsifiable. If that were the case, it would probably be an uninteresting, highly specific model, for which experiments can be easily performed to falsify the model. A theoretical model should be general enough to explain many previous observations and open up possibilities for many new experiments, which support, modify and refine the model. The model may still be wrong, but at least it is interesting.
It should not be easy to decide which of several hypothetical models covers the complex biological reality best. But if we do not have models of this kind, and level of generality, we cannot guide our research towards progress in answering pressing needs in society, such as in medicine. We then have to work with old, outdated models and are condemned to accumulate larger and larger amounts of individual facts for which there is no use. Those facts form a continuum without a clear hierarchy, and they become quickly obsolete and repetitive, unless they are stored in machine-readable format, where they become part of data-driven analysis, no matter their quality and significance. In principle, such data can be accumulated and rediscovered by theoreticians which look for confirmation of a model. But they only have significance after the model exists.

Theories are created, they cannot be deduced from data.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s