What’s wrong with modeling?

A typical task in theoretical neuroscience is “modeling”, for instance, building and analysing network models of cerebellar cortex that incorporate the diversity of cell types and synaptic connections observed in this structure. The goal is to better understand the functions of these diverse cell types and connections. Approaches from statistical physics, nonlinear dynamics and machine learning can be used, and models should be “constrained” by electrophysiological, transcriptomic and connectomic data.

This all sounds very well and quite innocuous. It’s “state of the art”. So what is wrong with this approach that dominates current computational neuroscience?

What is wrong is the belief that a detailed, bottom-up model can deliver an understanding of the function of the structure it describes. Such a model is only a way to synthesize existing information. Many, indeed most, aspects of the model are selected in advance, and much existing information is left out because it is irrelevant for the model. Irrelevant for the model does not mean that it is not crucial for the function of the real biological object, such as a neuron. For instance, for decades the fact that many neurons adapt their ion channels under behavioral learning was simply ignored. Then the notion of whole-neuron learning became acceptable, and under the term “intrinsic excitability” it slowly became part of computational modeling. Now various functions are being discovered for changes that were first dismissed as “homeostatic”, i.e. only house-keeping functions were accepted.

If we had started with a top-down model in terms of what makes sense (what would evolution prefer) and what is logically required or useful for the neuron to operate, we would have realized a long time ago that whole-neuron (intrinsic) excitability is a crucial part of learning. This paper was first published on arXiv in 2005, but accepted for publication only in 2013, when intrinsic excitability had become more widely known.

The next paradigm: AI, Neural Networks, the Symbolic Brain

I don’t think neural networks are still in their infancy; I think they are moribund. Some say they have hit a roadblock. AI used to be based on rules, symbols and logic. Then probability and statistics came along, and neural networks were created as an efficient statistical paradigm. Unfortunately, the old AI (GOFAI) was discontinued and replaced by statistics. Since then ever more powerful computers and ever more data have come along, so statistics, neural networks (NN) and machine learning seemed to create value, even though they were based on simple concepts from the late eighties, early nineties or even earlier. We failed to develop because success was easy. Some argue for hybrid models, combining GOFAI and NN. But that was tried early on, and it wasn’t very successful. What we need now is a new and deeper understanding of what the brain actually does, because it obviously does symbol manipulation, it does logic, it does math. Most importantly, we humans learned to speak using nothing better than a mammalian brain (with a few specializations). I believe there is a new paradigm out there which can fulfill these needs: language, robotics, cognition, knowledge creation. I call it the vertical-horizontal model: a model of neuron interaction where the neuron is a complete microprocessor of a very special kind. This makes it possible to build a symbolic brain. There will be a trilogy of papers to describe this new paradigm, and a small company to build the necessary concepts. At the present time, here is a link to an early draft. It is hard to read, not a paper, more a collection right now, but ready for feedback at my email! I’ll soon post a summary here as well.

Heavy-tailed distributions and hierarchical cell assemblies

In earlier work, we meticulously documented the distribution of synaptic weights and the gain (or activation function) in many different brain areas. We found a remarkable consistency of heavy-tailed, specifically lognormal, distributions for firing rates, synaptic weights and gains (Scheler2017).
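
What "heavy-tailed" means in practice can be illustrated with a small sketch. It draws lognormal "weights" and compares them with a narrow, Gaussian-like set by asking how much of the total is carried by the strongest 10% of elements. The parameters (mu = 0, sigma = 1, a Gaussian with mean 1 and standard deviation 0.2) and the function names are illustrative assumptions, not values fitted to the data in Scheler2017.

```python
import random

def lognormal_weights(n, sigma=1.0, seed=0):
    """Draw n lognormal values; mu=0 and sigma are assumed, not fitted to data."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n)]

def narrow_weights(n, sd=0.2, seed=0):
    """Comparison set: Gaussian around 1.0, clipped at zero (an assumed baseline)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(1.0, sd)) for _ in range(n)]

def top_share(values, fraction=0.1):
    """Fraction of the total sum carried by the largest `fraction` of the values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

if __name__ == "__main__":
    heavy = lognormal_weights(10_000)
    narrow = narrow_weights(10_000)
    print("share of total weight in the top 10% (lognormal):", top_share(heavy))
    print("share of total weight in the top 10% (narrow):   ", top_share(narrow))
```

In the lognormal case a small number of strong elements carries a disproportionate share of the total, which is the property the hierarchy arguments below rely on.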

Why are biological neural networks heavy-tailed (lognormal)?

Cell assemblies: Lognormal networks support models of hierarchically organized cell assemblies (ensembles). Individual neurons can activate or suppress a whole cell assembly if they are the strongest neuron or directly connect to the strongest neurons (TeramaeJetal2012).
Storage: Sparse strong synapses store stable information and provide a backbone of information processing. More frequent weak synapses are more flexible and add changeable detail to the backbone. Heavy-tailed distributions allow a hierarchy of stability and importance.
Time delay: The delay of activation is reduced because strong synapses quickly activate a whole assembly (IyerRetal2013). This reduces the initial response time, which depends on the synaptic and intrinsic distributions. Heavy-tailed distributions activate fastest.
Noise response: Under additional input, whether noise or patterned, the pattern stability of the existing ensemble is higher (IyerRetal2013, see also KirstCetal2016). This is a side effect of the integration of all computations within a cell assembly.

Why hierarchical computations in a neural network?

Calculations which depend on interactions between many discrete points (N-body problems, Barnes and Hut 1986), such as particle-particle methods, where every point depends on all others, lead to an O(N^2) calculation. If we replace this with hierarchical methods, which combine the information from multiple distant points into a single summary, we can reduce the computational complexity to O(N log N) or O(N).
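
The saving can be made concrete with a minimal one-dimensional sketch in the spirit of Barnes and Hut. It is not their algorithm as published (which uses spatial trees in two or three dimensions); the binary tree over sorted points, the 1/distance interaction, the opening parameter theta and all names are assumptions chosen for illustration. A distant group of points is replaced by its total mass at its center, so a single evaluation touches far fewer terms than the direct sum.

```python
import random

class Node:
    """Binary tree over points sorted by position; caches total mass and center."""
    def __init__(self, points):
        self.lo, self.hi = points[0][0], points[-1][0]
        self.mass = sum(m for _, m in points)
        self.center = sum(x * m for x, m in points) / self.mass
        self.point = points[0] if len(points) == 1 else None
        self.children = []
        if len(points) > 1:
            mid = len(points) // 2
            self.children = [Node(points[:mid]), Node(points[mid:])]

def direct_sum(points, xi):
    """Particle-particle evaluation at xi: one term per other point."""
    total, terms = 0.0, 0
    for xj, mj in points:
        if xj != xi:  # skip the point itself (positions assumed distinct)
            total += mj / abs(xi - xj)
            terms += 1
    return total, terms

def tree_sum(node, xi, theta=0.5):
    """Hierarchical evaluation at xi: distant groups count as a single term."""
    if node.point is not None:
        xj, mj = node.point
        if xj == xi:
            return 0.0, 0
        return mj / abs(xi - xj), 1
    dist = abs(xi - node.center)
    if dist > 0 and (node.hi - node.lo) / dist < theta:
        return node.mass / dist, 1  # group is far away: use its summary
    total, terms = 0.0, 0
    for child in node.children:
        t, c = tree_sum(child, xi, theta)
        total += t
        terms += c
    return total, terms

def make_points(n, seed=0):
    """Random positions in [0, 1) with unit mass, sorted by position."""
    rng = random.Random(seed)
    return sorted((rng.random(), 1.0) for _ in range(n))

if __name__ == "__main__":
    pts = make_points(2000)
    root = Node(pts)
    xi = pts[1000][0]
    exact, n_direct = direct_sum(pts, xi)
    approx, n_tree = tree_sum(root, xi)
    print("terms (direct, tree):", n_direct, n_tree)
    print("relative error:", abs(approx - exact) / exact)
```

Nearby points are still treated individually, so the strongest interactions stay exact; only the many weak, distant contributions are pooled.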

Since biological neural networks are not feedforward but connect in both forward and backward directions, they have a different structure from ANNs (artificial neural networks) – they consist of hierarchically organised ensembles with few wide-range excitability ‘hub’ neurons and many ‘leaf’ neurons with low connectivity and small-range excitability. Patterns are stored in these ensembles, and get accessed by a fit to an incoming pattern that could be expressed by low mutual information as a measure of similarity. Patterns are modified by similar access patterns, but typically only in their weak connections (else the accessing pattern would not fit).

Ion channel expression is not regulated by spiking behavior

An important topic for understanding intrinsic excitability is the distribution and activation of ion channels. In this respect the co-regulation between ion channels is of significant interest. MacLean et al. (2003) showed that overexpression of an A-type potassium channel by shal-RNA injection in neurons of the stomatogastric ganglion of the lobster is compensated by upregulation of Ih, such that the spiking behavior remained unaltered.

A non-functional shal mutant, whose overexpression did not by itself affect spiking, had the same effect on Ih. This shows that the regulation does not happen at the membrane by sensing the spiking behavior: Ih was upregulated even though IA activity was unaltered, and as a result spiking behavior was increased. (This is in contrast to e.g. O’Leary et al., 2013, who assume homeostatic regulation of ion channel expression at the membrane, by spiking behavior.)

In Drosophila motoneurons the expression of shal and shaker, both responsible for IA, is reciprocally coupled. If one is reduced, the other is upregulated to keep IA activity at the membrane at a constant level. Other ion channels, such as INap and IM, are antagonistic in their effect, which means their expression correlates positively: if one is reduced, the other is reduced as well to achieve the same level of effect (Golowasch2014). A number of publications have documented similar effects, e.g. MacLean et al., 2005; Schulz et al., 2007; Tobin et al., 2009; O’Leary et al., 2013.

We must assume that the expression level of ion channels is regulated and sensed inside the cell and that the levels of genes for different ion channels are coupled – by genetic regulation or on the level of RNA regulation.

To summarize: When there is high IA expression, Ih is also upregulated. When one gene responsible for IA is suppressed, the other gene is more highly expressed, to achieve the same level of IA expression. When INap, a persistent sodium channel, is reduced, IM, a potassium channel, is also reduced.

It is important to note that these ion channels may compensate for each other in terms of overall spiking behavior, but they have subtly different properties of activation, e.g. by the pattern of spiking or by neuromodulation. For instance, if cell A reduces ion channel currents like INap and IM, compensating to achieve the same spiking behavior, once we apply neuromodulation to muscarinic receptors on A, this will affect IM, but not INap. The behavior of cell A, crudely the same, is now altered under certain conditions.
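
The point can be shown with a deliberately crude sketch. It assumes that the net effect of the two currents is simply the depolarizing INap minus the hyperpolarizing IM, in arbitrary units, and that muscarinic modulation scales IM by a factor. The function name and all numbers are hypothetical.

```python
def net_current(inap, im, muscarinic_block=0.0):
    """Toy readout (assumed): INap minus the part of IM not blocked by modulation."""
    return inap - (1.0 - muscarinic_block) * im

# Two cells with the same baseline balance but different channel levels.
control = {"inap": 1.0, "im": 1.0}
compensated = {"inap": 0.5, "im": 0.5}  # both currents reduced together

if __name__ == "__main__":
    for name, cell in (("control", control), ("compensated", compensated)):
        baseline = net_current(**cell)
        modulated = net_current(**cell, muscarinic_block=0.5)
        print(name, "baseline:", baseline, "under modulation:", modulated)
```

At baseline the two cells are indistinguishable in this readout. Once IM is partially blocked, the cell with the higher channel levels shows the larger shift, which is the sense in which compensated cells are only crudely the same.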

To model this – other than by a full internal cell model – requires internal state variables which guide ion channel expression, and therefore regulate intrinsic excitability. These variables would model ion channel proteins and their respective interaction, and in this way guarantee acceptable spiking behavior of the cell. This could lead to the idea of an internal module which sets the parameters necessary for the neuron to function. Such an internal module that self-organizes its state variables according to specified objective functions could greatly simplify systems design. Instead of tuning systems parameters by outside methods – which is necessary for ion-channel based models – each neuronal unit itself would be responsible for its ion channels and be able to self-tune them separately from the whole system.
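
A minimal sketch of such an internal module might look as follows. It is only an illustration under strong assumptions: the three coupling rules are taken from the summary above but merged into one hypothetical cell, although they were observed in different preparations; expression levels are in arbitrary units; and the set point, coupling ratios, relaxation rate and all names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ChannelModule:
    """Hypothetical internal state variables for ion channel expression."""
    shal: float = 1.0         # IA gene 1 (treated as the perturbed variable)
    shaker: float = 1.0       # IA gene 2
    ih: float = 1.0
    inap: float = 1.0         # treated as the perturbed variable
    im: float = 1.0
    ia_target: float = 2.0    # assumed set point for total IA expression
    ih_per_shal: float = 1.0  # assumed coupling ratio Ih : shal
    im_per_inap: float = 1.0  # assumed coupling ratio IM : INap
    rate: float = 0.2         # assumed relaxation rate per step

    def total_ia(self):
        return self.shal + self.shaker

    def step(self):
        # Reciprocal coupling: shaker fills the gap to the IA set point.
        shaker_goal = max(0.0, self.ia_target - self.shal)
        self.shaker += self.rate * (shaker_goal - self.shaker)
        # Ih follows shal expression inside the cell, not spiking at the membrane.
        self.ih += self.rate * (self.ih_per_shal * self.shal - self.ih)
        # Positive correlation: IM follows INap.
        self.im += self.rate * (self.im_per_inap * self.inap - self.im)

    def run(self, steps=100):
        for _ in range(steps):
            self.step()
        return self

if __name__ == "__main__":
    knockdown = ChannelModule(shal=0.5).run()
    print("shal reduced: shaker =", knockdown.shaker, "total IA =", knockdown.total_ia())
    overexpressed = ChannelModule(shal=3.0).run()
    print("shal overexpressed: Ih =", overexpressed.ih)
```

The design choice that matters here is that no rule reads the spiking output: each unit tunes its own channel levels from internal variables alone, so no outside parameter search is needed.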

Linked to the idea of internal state variables is the idea of internal memory, which I have referred to several times in this blog. If I have an internal module of co-regulated variables which set external parameters for each unit, then this module may serve as a latent memory for variables which are not expressed at the membrane at the present time (see Er81). The time course of expression and activation at the membrane and of internal co-regulation need not be the same. This offers an opportunity for memory inside the cell, separated from information processing within a network of neurons.

Memory and the Volatility of Spines

Memory has a physical presence in the brain, but there are no elements which permanently code for it.

Memory is located – among other places – in dendritic spines. The number of spines increases during learning, and they carry stimulus- or task-specific information. Ablation of spines destroys this information (Hayashi-Takagi A2015). Astrocytes have filopodia which are also extended and retracted and make contact with neuronal synapses. The presence of memory in the spine fits a neuron-centric view: spine protrusion and retraction are guided by cellular programs. A strict causality, such that x synaptic inputs cause a new spine, is not necessarily true. As a matter of fact, highly conditional principles of spine formation or dissolution could hold, where the internal state of the neuron and the neuron’s history matter. The rules for spine formation need not be identical to the rules for synapse formation and weight updating (which depend on at least two neurons making contact).

A spine needs to be there for a synapse to exist (in spiny neurons), but once it is there, clearly not all synapses are alike. They differ in the amount of AMPA receptor presence and integration, and in other receptors and ion channels as well. For instance, SK channels serve to block off a synapse from further change, and may be regarded as a form of overwrite protection. Therefore, the existence or lack of a spine is the first-order adaptation in a spiny neuron; the second-order adaptation involves the synapses themselves.

However, spines are also subject to high variability, on the order of several hours to a few days. Some elements may have very long persistence, months in the mouse, but they are few. MongilloGetal2017 point out the fragility of the synapse and the dendritic spine in pyramidal neurons and ask what this means for the physical basis of memory. Given what we know about neural networks, is it necessary for the same spines to remain in order for memory to be permanent? Learning can operate with many random elements, but memory has, prima facie, no need for volatility.

It is most likely that memory is a secondary, ‘emergent’ property of volatile and highly adaptive structures. From this perspective it is sufficient to keep the information alive, among the information-carrying units, which will recreate it in some form.

The argument is that the information is redundantly coded. If part of the coding is missing, the rest still carries enough information to inform the system, which recruits new parts to carry the information. The information is never lost, because not all synapses, spines and neurons are degraded at the same time, and because internal reentrant processing keeps the information alive and recreates new redundant parts at the same time as other parts are lost. It is a dynamic cycling of information. There are difficulties if synapses are supposed to carry the whole information. The main difficulty is this: if all patterns at all times are stored in synaptic values, without undue interference, and with all the complex processes of memory, forgetting, retrieval, reconsolidation etc., can this be reconciled with a situation where the response to a simple visual stimulus already involves processing in 30-40% of the cortical area? I have no quantitative model for this. I think the model only works if we use all the multiple, redundant forms of plasticity that the neuron possesses: internal states, intrinsic properties, synaptic and morphological properties, axonal growth, presynaptic plasticity.
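
The redundancy argument itself (not the open quantitative question about cortical load) can be illustrated with a toy simulation. A bit pattern is held by several volatile "carriers"; in every step each carrier may be lost, the survivors are read out by majority vote, and new carriers are recruited and written from that readout with a small copying error. Carrier count, loss probability, noise level and all names are assumptions for illustration, not measured values.

```python
import random

def majority(carriers):
    """Bitwise majority vote over a non-empty list of carriers (ties give 0)."""
    return [1 if 2 * sum(bits) > len(carriers) else 0 for bits in zip(*carriers)]

def simulate_turnover(pattern, copies=15, loss_prob=0.2, noise=0.02,
                      steps=200, seed=0):
    """Return (final readout or None if all carriers were lost at once, replaced count)."""
    rng = random.Random(seed)
    store = [list(pattern) for _ in range(copies)]
    replaced = 0
    for _ in range(steps):
        # Volatility: every carrier is lost independently with probability loss_prob.
        alive = [c for c in store if rng.random() >= loss_prob]
        if not alive:
            return None, replaced
        # Reentrant readout from whatever survived.
        consensus = majority(alive)
        # Recruit new carriers and write the readout into them, with copying noise.
        lost = copies - len(alive)
        fresh = [[b ^ (rng.random() < noise) for b in consensus] for _ in range(lost)]
        replaced += lost
        store = alive + fresh
    return majority(store), replaced

if __name__ == "__main__":
    pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    readout, replaced = simulate_turnover(pattern)
    print("pattern intact:", readout == pattern, "| carriers replaced:", replaced)
    single, _ = simulate_turnover(pattern, copies=1)
    print("single carrier readout:", single)
```

With many carriers the pattern outlives every individual element that once held it, while a single carrier loses it as soon as that one element disappears.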

Theories, Models and Data

In the modern world, a theory is a mathematical model, and a mathematical model is a theory. A theory described in words is not a theory, it is an explanation or an opinion.

The interesting thing about mathematical models is that they go far beyond data reproduction. A theoretical model of a biological structure or process may be entirely hypothetical, or it may use a certain amount of quantitative data from experiments, integrate it into a theoretical framework and ask questions that result from the combined model.

A Bayesian model, in contrast, is a purely data-driven construct which usually requires additional quantitative values (‘priors’) which have to be estimated. A dynamical model of metabolic or protein signaling processes in the cell assumes only a simple theoretical structure, kinetic rate equations, and then proceeds to fill the model with data (many estimated) and analyses the results. A neural network model takes a set of data and performs a statistical analysis to cluster the patterns for similarity, or to assign new patterns to previously established categories. Similarly, high-throughput or other proteomic data are usually analysed for outliers and variance, with statistical significance assessed against a control data set. Graph analysis of large-scale datasets for a cell type, brain regions, neural connections etc. also aims to reproduce the dataset, to visualize it, and to provide quantitative and qualitative measures of the resulting natural graph.
All these methods primarily attempt to reproduce the data, and possibly make predictions concerning missing data or the behavior of a system that is created from the dataset.

Theoretical models can do more.

A theoretical model can introduce a hypothesis on how a biological system functions, or even, how it ought to function. It may not even need detailed experimental data, i.e. experiments and measurements, but it certainly needs observations and outcomes. It should be specific enough to spur new experiments in order to verify the hypothesis.
In contrast to Popper, a hypothetical model should not be easily falsifiable. If that were the case, it would probably be an uninteresting, highly specific model, for which experiments can be easily performed to falsify the model. A theoretical model should be general enough to explain many previous observations and open up possibilities for many new experiments, which support, modify and refine the model. The model may still be wrong, but at least it is interesting.
It should not be easy to decide which of several hypothetical models covers the complex biological reality best. But if we do not have models of this kind, and at this level of generality, we cannot guide our research towards meeting pressing needs in society, such as in medicine. We then have to work with old, outdated models and are condemned to accumulate larger and larger amounts of individual facts for which there is no use. Those facts form a continuum without a clear hierarchy, and they quickly become obsolete and repetitive, unless they are stored in machine-readable format, where they become part of data-driven analysis, no matter their quality and significance. In principle, such data can be accumulated and rediscovered by theoreticians who look for confirmation of a model. But they only have significance after the model exists.

Theories are created, they cannot be deduced from data.