What’s wrong with modeling?

A typical task in theoretical neuroscience is “modeling”: for instance, building and analysing network models of cerebellar cortex that incorporate the diversity of cell types and synaptic connections observed in this structure. The goal is to better understand the functions of these diverse cell types and connections. Approaches from statistical physics, nonlinear dynamics and machine learning can be used, and models should be “constrained” by electrophysiological, transcriptomic and connectomic data.

This all sounds very well and quite innocuous. It’s “state of the art”. So what is wrong with this approach, which dominates current computational neuroscience?

What is wrong is the belief that a detailed, bottom-up model could fulfill the goal of understanding the function of the structure it models. Such a model is only a way to synthesize existing information. Most aspects of the model are selected in advance, and much existing information is left out because it is irrelevant for the model. But being irrelevant for the model does not mean that it is not crucial for the function of the real biological object, such as a neuron. For instance, for decades the fact that many neurons adapt their ion channels during behavioral learning was simply ignored. Then the notion of whole-neuron learning became acceptable; under the term “intrinsic excitability” it slowly became part of computational modeling, and now various functions are being discovered for changes that were first dismissed as “homeostatic”, i.e. only house-keeping functions were accepted.

If we had started with a top-down model based on what makes sense (what evolution would prefer) and on what is logically required or useful for the neuron to operate, we would have realized long ago that whole-neuron (intrinsic) excitability is a crucial part of learning. This paper was first published on arXiv in 2005, but accepted for publication only in 2013, when intrinsic excitability had become more widely known.

The next paradigm: AI, Neural Networks, the Symbolic Brain

I don’t think neural networks are still in their infancy; I think they are moribund. Some say they have hit a roadblock. AI used to be based on rules, symbols and logic. Then probability and statistics came along, and neural networks were created as an efficient statistical paradigm. Unfortunately, the old AI (GOFAI) was discontinued and replaced by statistics. Since then, ever more powerful computers and ever more data have become available, so statistics, neural networks (NN) and machine learning seemed to create value, even though they were based on simple concepts from the late eighties and early nineties, or even earlier. Success was easy, so we failed to develop further. Some argue for hybrid models combining GOFAI and NN, but that was tried early on, and it wasn’t very successful.

What we need now is a new and deeper understanding of what the brain actually does, because it obviously does symbol manipulation, logic and math. Most importantly, we humans learned to speak using nothing better than a mammalian brain (with a few specializations). I believe there is a new paradigm out there which can fulfill these needs: language, robotics, cognition, knowledge creation. I call it the vertical-horizontal model: a model of neuron interaction in which the neuron is a complete microprocessor of a very special kind. This makes it possible to build a symbolic brain. There will be a trilogy of papers describing this new paradigm, and a small company to build the necessary concepts. For now, here is a link to an early draft: hard to read, not a paper but more of a collection at this point, yet ready for feedback at my email! I’ll soon post a summary here as well.

Learning in the Brain: Difference learning vs. Associative learning

The feedforward/feedback learning and interaction in the visual system has been analysed as a case of “predictive coding”, the “free energy principle” or “Bayesian perception”. The general principle is very simple, so I will call it “difference learning”. I believe this is directly comparable (biology doesn’t invent, it re-invents) to what happens at the cell membrane between external (membrane) and internal (signaling) parameters.

It is about difference modulation: there is an existing or quiet state, and then new input arrives, through signaling (at the membrane) or through perception (in the case of vision). Now the system has to adapt to the new input. The feedback connections transfer back the old categorization of the new input. This gets added to the perception, so that a percept evolves which uses the old categorization together with the new input to quickly achieve an adequate categorization for any perceptual input. There will of course be a bias in favor of existing knowledge, but that makes sense in a behavioral context.

The same thing happens at the membrane. An input signal activates membrane receptors (parameters). The internal parameters (the control structure) transfer back the stored response to the external membrane parameters. The signal then generates a suitable neuronal response according to its effect on the external parameters (bottom-up) together with the internal control structure (top-down). The response is now biased in favor of an existing structure, but this also means that all signals can be interpreted quickly.

If a signal overcomes a filter, new adaptation and learning of the parameters can happen.

The general principle is difference learning: adaptation on the basis of a difference between encoded information and a new input. This general principle underlies all membrane adaptation, whether at the synapse, the spine or the dendrite, and for all types of receptors, whether AMPA, GABA or GPCR.
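A minimal sketch of the difference-learning principle, assuming a single scalar stored parameter and a fixed threshold acting as the “filter”. The class `DifferenceUnit` and its parameters (`threshold`, `rate`, `bias`) are hypothetical illustrations, not a published model: the response mixes stored (top-down) and new (bottom-up) values, and adaptation uses only the difference between them.

```python
from dataclasses import dataclass


@dataclass
class DifferenceUnit:
    """Hypothetical unit: a stored (internal) parameter compared with new input."""
    stored: float           # encoded information (internal set-point)
    threshold: float = 0.2  # filter: differences at or below this cause no adaptation
    rate: float = 0.5       # fraction of the difference absorbed per adaptation
    bias: float = 0.5       # weight of the stored (top-down) vs. new (bottom-up) signal

    def respond(self, x: float) -> float:
        # The response mixes the new input with the stored categorization,
        # so it is biased toward existing knowledge but available immediately.
        return self.bias * self.stored + (1.0 - self.bias) * x

    def adapt(self, x: float) -> bool:
        # Difference learning: only the difference between input and stored
        # value drives change, and only if it passes the filter.
        diff = x - self.stored
        if abs(diff) <= self.threshold:
            return False
        self.stored += self.rate * diff
        return True
```

For example, with a stored value of 1.0, an input of 1.1 stays below the filter and changes nothing, while an input of 2.0 moves the stored value halfway, to 1.5.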

We are used to believing that the general principle of neural plasticity is associative learning. This is an entirely different principle, and merely a derivative of difference learning in certain contexts. Associative learning as the basis of synaptic plasticity goes back more than 100 years. The idea was that through exposure to different ideas or objects, the connection between them in the mind is strengthened. It was then conjectured that when two neurons, A and B, are both activated, their connection (from A to B) is strengthened. More precisely, as was often found later, A needs to fire earlier than B in order to encode a sequential relation.

What would difference learning predict? An existing connection would encode the strength of synaptic activation at that site. As long as the actual signal matches, there is no need for adaptation. If the signal becomes stronger, the synapse may acquire additional receptors by using its internal control structure. This control structure may have requirements regarding sequentiality. The control structure may also be updated to make the new strength permanent, as a new set-point parameter. On the other hand, a signal weaker than the memorized one will ultimately cause the synapse to wither and die.

Similar outcomes, entirely different principles. Association is encoded by any synapse, and since membrane receptors are plastic, associative learning is a restricted derivative of difference learning.
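The two rules can be contrasted in a toy sketch for a single synapse. Both functions are hypothetical simplifications written for illustration: `hebbian_update` implements only the timing condition that A fires before B, and `difference_update` follows the verbal description above (no change on a match, growth toward a stronger signal, gradual withering and eventual elimination under a weaker one). All parameter values, including the pruning level `prune_below`, are assumptions.

```python
import math


def hebbian_update(w, t_pre, t_post, eta=0.1):
    """Associative rule (simplified): strengthen A->B only if A fires before B."""
    if t_pre < t_post:
        return w + eta
    return w


def difference_update(w, signal, tol=0.05, eta=0.5, decay=0.1, prune_below=0.1):
    """Difference rule (hypothetical): compare the signal with the stored strength w."""
    diff = signal - w
    if abs(diff) <= tol:
        return w                      # signal matches memory: no adaptation
    if diff > 0:
        return w + eta * diff         # stronger: add receptors, move the set-point up
    w_new = w * (1.0 - decay)         # weaker: the synapse gradually withers
    return 0.0 if w_new < prune_below else w_new  # ... and is eventually eliminated


# A persistently absent signal (disuse) eventually removes the synapse.
w = 1.0
for _ in range(50):
    w = difference_update(w, 0.0)
```

Note that the timing of pre- and postsynaptic spikes plays no role in `difference_update`; in the text, sequentiality would be a requirement imposed by the internal control structure, which this sketch omits.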

Functional Principles of Neural Plasticity

It is really difficult to conceptualize things in a novel way when one has been conditioned for decades to believe in the synaptic plasticity theory of memory.
I offer a new type of theory, first outlined in the vision statement linked above, which I call a horizontal-vertical integration theory. There could be several such theories. All of them would agree that each neuron has internal and external parameters, and that only the external parameters influence its horizontal interactions with other neurons (mostly electrophysiological). The exchange of information between the membrane (external) and internal zones is the vertical integration. The horizontal integration uses contact points (synapses) on the membrane, in highly plastic environments with spines as compartmentalized vertical integration sites.
In addition to fitting the biological facts better, this theory will also be friendly for computation. Synaptic plasticity doesn’t have much stability or permanence, and with it we can’t build structure: all information is at the membrane, where it is constantly changed. That is a huge problem.
The vision statement above already contains a very specific theory. In the background, it is understood that spatiotemporal spike timing patterns are the representations where patterns, perceptions and thoughts reside. But those representations use the existing neurons with their own plasticity. Thus representations are not only signal-driven; they integrate perceptions with existing knowledge, and the neurons’ individual memories influence the patterns that result from ongoing sensations.
Ion exchange, especially of calcium, is quite important for linking external (membrane) and internal (submembrane, cytosolic) parameters. It is quite apparent from the experimental literature that this is a major gateway, probably the fastest. There is also exchange between the core (nuclear) parameters and the internal parameters in the cytosol (including the spine). These are slower exchanges involving proteins such as transcription factors. Notably, we have a system with fast plasticity on the outside and slower, lasting plasticity distant from external signaling. Such a system offers both stability and fast reactivity, and its properties are unique compared to existing computing architectures.
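The fast/slow layering can be illustrated with a toy three-compartment sketch. This is an assumption-laden illustration, not the theory itself: one external (membrane) value, one internal (cytosolic) value and one nuclear value, each relaxing toward its neighbor with a made-up time constant (`tau_ext`, `tau_int`, `tau_nuc`). A brief input pulse should mostly move the external parameter, while only sustained input should shift the internal and nuclear ones.

```python
def simulate_vertical_exchange(inputs, tau_ext=2.0, tau_int=50.0, tau_nuc=500.0):
    """Hypothetical sketch: return (external, internal, nuclear) after each step."""
    ext = internal = nuclear = 0.0
    trace = []
    for x in inputs:
        # Fast external (membrane) parameter: driven by the input (bottom-up)
        # and pulled toward the internal set-point (top-down).
        ext += ((x + internal) / 2.0 - ext) / tau_ext
        # Slower vertical exchange into the cytosol (standing in for e.g. calcium).
        internal += (ext - internal) / tau_int
        # Slowest exchange with the nuclear (core) parameters.
        nuclear += (internal - nuclear) / tau_nuc
        trace.append((ext, internal, nuclear))
    return trace


pulse = simulate_vertical_exchange([1.0] * 5 + [0.0] * 200)
sustained = simulate_vertical_exchange([1.0] * 3000)
```

A five-step pulse drives the external value close to 0.5 while the internal value stays below 0.05; a few thousand steps of sustained input move all three values toward the input level.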
It will be more work to flesh this out.
Theories are not true; they are useful. Only individual facts can be established as true. But theories need to anchor individual facts, and a horizontal-vertical integration theory has the potential to cover a large amount of what is known. At the same time, its computing abilities are fascinating. LTP/LTD has outlived its usefulness.

Two parts of the brain

Here is an image from the 10th Art of Neuroscience contest. It is striking in that it gives a view of the brain with two major parts: the cerebrum and the cerebellum. Of course, many interesting structures are hidden within the cerebrum, but still: is this a von Neumann architecture, memory and processor? No, we already know that memory is a ubiquitous phenomenon and not localized to any brain structure (notwithstanding the function of the hippocampus). How about a CPU and a GPU? That is closer, since the cerebellum (a) is structurally remarkably homogeneous, suited to performing many routine operations; (b) is fast, by an order of magnitude faster than the cortex; (c) is considered to perform ‘supportive’ functions in smooth motor movements and complex perception, but also in standard thought processes, always improving fast coordination and integration of information; (d) mostly speeds up, improves and calibrates brain processes, so that an organism is able to survive without it; (e) contains very many processing units (50% of all neurons of the brain), even though its function is subsidiary to the cortex; and (f) has a very constrained feedback modality, a bottleneck in information transfer through the deep cerebellar nuclei, even though it receives large amounts of information from the cortex. Although the cerebellum does not do graphical displays, it may “image” cortical information, i.e. represent it in a specific, simple format, and use this to guide motor movements, while providing only limited feedback about its performance. Food for thought.

Why a large cortex?

If we compare a small mouse cortex with a large human cortex, the connectivity per neuron is approximately the same (about 10^4 synapses per neuron; Schüz and Palm, 1989). So why did humans add so many neurons, and why did the connectivity remain constant? As for the latter question, we may conjecture that the maximal connectivity per neuron is already reached in the mouse. Our superior human cognitive skills thus rest on the increased number of neurons in the cortex, which means that the number of modules (cortical microcolumns) went up, not the synaptic connectivity as such.

Epigenetics and memory

Epigenetic modification is a powerful mechanism for the induction, expression and persistence of long-term memory.

For long-term memory, we need to consider diverse cellular processes. These occur in neurons from different brain regions (in particular hippocampus, cortex and amygdala) during memory consolidation and recall. For instance, all of the following play a role: long-term changes in kinase expression in the proteome, changes in receptor subunit composition and localization at synaptic/dendritic membranes, epigenetic modifications of chromatin such as DNA methylation and histone methylation in the nucleus, and posttranslational modifications of histones, including phosphorylation and acetylation. Histone acetylation is of particular interest because a number of known medications function as histone deacetylase (HDAC) inhibitors, i.e. have the potential to increase DNA transcription and memory (more on this in a later post).

Epigenetic changes are important because they define the internal conditions for plasticity in the individual neuron. For instance, they underlie kinase- or phosphatase-mediated (de)activation of enzymatic proteins, and therefore influence the probability that membrane proteins become altered by synaptic activation.

Among epigenetic changes, DNA methylation typically acts to alter, and often to repress, DNA transcription; in vertebrates it occurs at cytosines, mainly in CpG dinucleotides, which cluster in CpG islands. DNA methylation is carried out by DNA methyltransferases (DNMTs), while enzymes such as Tet3 catalyse an important step in the demethylation of DNA. In the dentate gyrus of live rats, it was shown that the expression of Tet3 is greatly increased by LTP (synaptically induced memorization), suggesting that certain DNA stretches were demethylated [5] and presumably activated. During induction of LTP by high-frequency electrical stimulation, DNA methylation is changed specifically for certain genes known for their role in neural plasticity [1]. The expression of neural plasticity genes is widely correlated with the methylation status of the corresponding DNA.

So there is interesting evidence for filtering the induction of plasticity via the epigenetic landscape and modifiers of gene expression, such as HDACs. Substances which act as HDAC inhibitors increase histone acetylation. Interesting results from research on fear suggest that HDAC inhibitors increase some DNA transcription and specifically enhance fear extinction memories [2], [3], [4].

Transmission is not Adaptation

Current synaptic plasticity models have one decisive property which may not be biologically adequate, and which has important repercussions for the types of memory and learning algorithms that can be implemented in general: each processing or transmission event is an adaptive learning event.

In biology, in contrast, there are many pathways that may act as filters between the use of a synapse and the adaptation of its strength. In LTP/LTD, typically 20 minutes are necessary to observe the effects. This requires the activation of intracellular pathways, often the co-occurrence of GPCR activation, and even nuclear read-out.

Therefore we have suggested a different model, greatly simplified at first to test its algorithmic properties. We include intrinsic excitability in learning (LTP-IE, LTD-IE). The main innovation is that we separate learning or adaptation from processing or transmission. Transmission events leave traces at synapses and neurons that disappear over time (short-term plasticity), unless they add up to unusually high (low) neural activations, which can be determined by threshold parameters. Only if a neuron engages in a high (low) activation-plasticity event do we get long-term plasticity at both neurons and synapses, in a localized way. Such a model is in principle capable of operating in a sandpile fashion. We do not yet know what properties the model may exhibit. Certain hypotheses exist concerning abstraction and compression of a sequence of related inputs, and the development of individual knowledge.
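A minimal sketch of separating transmission from adaptation, assuming one neuron with binary presynaptic inputs. The class name, the exponential decay of traces and every parameter value are hypothetical and not taken from the model described above; only the high-activation branch (LTP, LTP-IE) is shown, and a low-activation branch (LTD, LTD-IE) would be its mirror image. The sketch does not attempt sandpile dynamics; it only shows the threshold filter between use and adaptation.

```python
class ThresholdPlasticityNeuron:
    """Hypothetical sketch: transmission leaves decaying traces; only an
    unusually high accumulated activation triggers long-term plasticity."""

    def __init__(self, n_syn, w0=0.5, decay=0.8, theta_high=3.0, eta=0.1, eta_ie=0.05):
        self.w = [w0] * n_syn         # synaptic weights (long-term)
        self.excitability = 1.0       # intrinsic excitability (long-term, "IE")
        self.trace = [0.0] * n_syn    # short-term synaptic traces
        self.activation = 0.0         # short-term accumulated neural activation
        self.decay = decay            # per-step decay of all short-term traces
        self.theta_high = theta_high  # threshold for a high activation-plasticity event
        self.eta = eta                # synaptic learning rate (LTP)
        self.eta_ie = eta_ie          # intrinsic learning rate (LTP-IE)

    def step(self, spikes):
        """Process one time step of input (0/1 per synapse).
        Returns True if a long-term plasticity event occurred."""
        # Transmission: update short-term traces only; weights stay untouched.
        self.trace = [t * self.decay + s for t, s in zip(self.trace, spikes)]
        drive = sum(w * s for w, s in zip(self.w, spikes))
        self.activation = self.activation * self.decay + self.excitability * drive
        # Adaptation happens only if short-term activity adds up past the threshold.
        if self.activation > self.theta_high:
            # Localized LTP at recently active synapses, plus LTP-IE at the neuron.
            self.w = [w + self.eta * t for w, t in zip(self.w, self.trace)]
            self.excitability += self.eta_ie
            self.activation = 0.0  # reset after the plasticity event
            return True
        return False
```

With these values, one synapse firing alone never drives the accumulated activation past the threshold (it saturates at 2.5), so no weight ever changes; the same synapse firing together with its neighbor triggers a localized plasticity event within a few steps.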

Memory and the Volatility of Spines

Memory has a physical presence in the brain, but there are no elements which permanently code for it.

Memory is located, among other places, in dendritic spines. Spines increase in number during learning, and they carry stimulus- or task-specific information. Ablation of spines destroys this information (Hayashi-Takagi et al., 2015). Astrocytes have filopodia which are also extended and retracted and make contact with neuronal synapses. The presence of memory in the spine fits a neuron-centric view: spine protrusion and retraction are guided by cellular programs. A strict causality, such that x synaptic inputs cause a new spine, need not hold; in fact, highly conditional principles of spine formation or dissolution could apply, where the internal state of the neuron and the neuron’s history matter. The rules for spine formation need not be identical to the rules for synapse formation and weight updating (which depend on at least two neurons making contact).

A spine needs to be there for a synapse to exist (in spiny neurons), but once it is there, clearly not all synapses are alike. They differ in the number and integration of AMPA receptors, and in other receptors and ion channels as well. For instance, SK channels serve to block off a synapse from further change, and may be regarded as a form of overwrite protection. Therefore, the existence or lack of a spine is the first-order adaptation in a spiny neuron; the second-order adaptation involves the synapses themselves.

However, spines are also highly variable, turning over on a time scale of several hours to a few days. Some elements may persist for a very long time, months in the mouse, but they are few. Mongillo et al. (2017) point out the fragility of synapses and dendritic spines in pyramidal neurons and ask what this means for the physical basis of memory. Given what we know about neural networks, is it necessary for the same spines to remain in order for memory to be permanent? Learning can operate with many random elements, but memory has prima facie no need for volatility.

It is most likely that memory is a secondary, ‘emergent’ property of volatile and highly adaptive structures. From this perspective, it is sufficient to keep the information alive among the information-carrying units, which will recreate it in some form.

The argument is that the information is redundantly coded. So if part of the coding is missing, the rest still carries enough information to inform the system, which recruits new parts to carry the information. The information is never lost, because not all synapses, spines and neurons are degraded at the same time, and because internal reentrant processing keeps the information alive and creates new redundant parts at the same time as other parts are lost. It is a dynamic cycling of information. There are difficulties if synapses alone are supposed to carry all of the information. The main difficulty is this: if all patterns are stored in synaptic values at all times, without undue interference, and with all the complex processes of memory, forgetting, retrieval, reconsolidation etc., can this be reconciled with a situation where the response to a simple visual stimulus already involves 30-40% of the cortical area in which processing takes place? I have no quantitative model for this. I think the model only works if we use all the multiple, redundant forms of plasticity that the neuron possesses: internal states, intrinsic properties, synaptic and morphological properties, axonal growth and presynaptic plasticity.
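The redundancy argument can be illustrated with a toy simulation, assuming that a memory item is held by a fixed number of interchangeable carriers (spines, synapses, ...), each lost independently at every step, and that reentrant activity restores full redundancy as long as enough carriers remain to reactivate the pattern. The function `surviving_carriers` and its parameters are hypothetical; the point is only the qualitative contrast, not a quantitative model.

```python
import random


def surviving_carriers(n_carriers=20, p_loss=0.05, min_to_recall=5, steps=1000,
                       reentrant=True, seed=0):
    """Hypothetical sketch: number of carriers left for one item after `steps`."""
    rng = random.Random(seed)  # seeded for reproducibility
    alive = n_carriers
    for _ in range(steps):
        # Each carrier is lost independently with probability p_loss.
        alive = sum(1 for _ in range(alive) if rng.random() >= p_loss)
        # Reentrant processing: if enough carriers remain to reactivate the
        # pattern, new carriers are recruited to restore full redundancy.
        if reentrant and alive >= min_to_recall:
            alive = n_carriers
    return alive
```

With the default values, the item survives 1000 steps when reentrant recruitment is on, while without it every carrier is eventually lost, even though each step removes only about one carrier on average.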

Theories, Models and Data

In the modern world, a theory is a mathematical model, and a mathematical model is a theory. A theory described in words is not a theory; it is an explanation or an opinion.

The interesting thing about mathematical models is that they go far beyond data reproduction. A theoretical model of a biological structure or process may be entirely hypothetical, or it may use a certain amount of quantitative data from experiments, integrate it into a theoretical framework and ask questions that result from the combined model.

A Bayesian model, in contrast, is a purely data-driven construct which usually requires additional quantitative values (‘priors’) that have to be estimated. A dynamical model of metabolic or protein signaling processes in the cell assumes only a simple theoretical structure, kinetic rate equations, and then proceeds to fill the model with data (many of them estimated) and analyses the results. A neural network model takes a set of data and performs a statistical analysis to cluster the patterns by similarity, or to assign new patterns to previously established categories. Similarly, high-throughput or other proteomic data are usually analysed for outliers and for statistically significant variance relative to a control data set. Graph analysis of large-scale datasets for a cell type, brain regions, neural connections etc. also aims to reproduce the dataset, to visualize it, and to provide quantitative and qualitative measures of the resulting natural graph.
All these methods primarily attempt to reproduce the data, and possibly make predictions concerning missing data or the behavior of a system that is created from the dataset.
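To make concrete what a dynamical model built on “a simple theoretical structure, kinetic rate equations” looks like, here is a minimal sketch of one such equation: a hypothetical signaling protein A activated by a signal S with mass-action kinetics, dA/dt = k_on·S·(A_tot − A) − k_off·A. The rate constants are made-up placeholders standing in for the (often estimated) data that would fill a real model.

```python
def simulate_activation(signal, k_on=0.5, k_off=0.1, a_total=1.0, dt=0.01, n_steps=10_000):
    """Forward-Euler integration of dA/dt = k_on*S*(A_tot - A) - k_off*A."""
    a = 0.0  # active amount starts at zero
    for _ in range(n_steps):
        da_dt = k_on * signal * (a_total - a) - k_off * a
        a += da_dt * dt
    return a


def steady_state(signal, k_on=0.5, k_off=0.1, a_total=1.0):
    """Analytical fixed point of the same rate equation."""
    return k_on * signal * a_total / (k_on * signal + k_off)
```

The structure is fixed by the rate law; what the model predicts depends entirely on the parameter values plugged into it.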

Theoretical models can do more.

A theoretical model can introduce a hypothesis on how a biological system functions, or even, how it ought to function. It may not even need detailed experimental data, i.e. experiments and measurements, but it certainly needs observations and outcomes. It should be specific enough to spur new experiments in order to verify the hypothesis.
In contrast to Popper, a hypothetical model should not be easily falsifiable. If it were, it would probably be an uninteresting, highly specific model for which falsifying experiments can easily be performed. A theoretical model should be general enough to explain many previous observations and open up possibilities for many new experiments, which support, modify and refine the model. The model may still be wrong, but at least it is interesting.
It should not be easy to decide which of several hypothetical models covers the complex biological reality best. But if we do not have models of this kind and level of generality, we cannot guide our research towards progress in answering pressing needs in society, such as in medicine. We then have to work with old, outdated models and are condemned to accumulate larger and larger amounts of individual facts for which there is no use. Those facts form a continuum without a clear hierarchy, and they quickly become obsolete and repetitive, unless they are stored in machine-readable format, where they become part of data-driven analysis, no matter their quality and significance. In principle, such data can be accumulated and rediscovered by theoreticians who look for confirmation of a model. But they only have significance after the model exists.

Theories are created, they cannot be deduced from data.