What’s wrong with modeling?

A typical task in theoretical neuroscience is “modeling”: for instance, building and analysing network models of cerebellar cortex that incorporate the diversity of cell types and synaptic connections observed in this structure. The goal is to better understand the functions of these diverse cell types and connections. Approaches from statistical physics, nonlinear dynamics and machine learning can be used, and models should be “constrained” by electrophysiological, transcriptomic and connectomic data.

This all sounds very well and quite innocuous. It’s “state of the art”. So what is wrong with this approach, which dominates current computational neuroscience?

What is wrong is the belief that a detailed, bottom-up model could fulfill the goal of understanding function. Such a model is only a way to synthesize existing information. Many, perhaps most, aspects of the model are selected in advance, and much existing information is left out because it is irrelevant for the model. But irrelevant for the model does not mean irrelevant for the function of the real biological object, such as a neuron. For instance, for decades the fact that many neurons adapt their ion channels under behavioral learning was simply ignored. Then the notion of whole-neuron learning became acceptable, and under the term “intrinsic excitability” it slowly became part of computational modeling. Now various functions are being discovered for changes that were first dismissed as “homeostatic”, i.e. accepted only as house-keeping.

If we had started with a top-down model in terms of what makes sense (what would evolution prefer) and what is logically required or useful for the neuron to operate, then we would have realized a long time ago that whole-neuron (intrinsic) excitability is a crucial part of learning. This paper was first published on arXiv in 2005, but accepted for publication only in 2013, when intrinsic excitability had become more widely known.

Language in the brain

We need quite different functionality than statistics to model language. We may need this functionality in other areas of intelligence as well, but with language it is obvious. Or, for the DL community: of course we can model anything with statistics. It will just not be a very good model …

What follows is that the synaptic plasticity that produces statistical learning does not allow us to build a human language model. Weight adjustment of connections in a graph is simply not sufficient – under any possible model of language – to capture language competency.

This is where it becomes interesting. We have just stated that the synaptic plasticity hypothesis of memory is wrong, or else our mammalian brains would be incapable of producing novel sentences and novel text, something we have not memorized before.

The next paradigm: AI, Neural Networks, the Symbolic Brain

I don’t think neural networks are still in their infancy; I think they are moribund. Some say they hit a roadblock. AI used to be based on rules, symbols and logic. Then probability and statistics came along, and neural networks were created as an efficient statistical paradigm. But unfortunately the old AI (GOFAI) was discontinued; it was replaced by statistics. Since then ever more powerful computers and ever more data came along, so statistics, neural networks (NN) and machine learning seemed to create value, even though they were based on simple concepts from the late eighties, early nineties or even earlier. We failed to develop; success was easy. Some argue for hybrid models, combining GOFAI and NN. But that was tried early on, and it wasn’t very successful.

What we now need is a new and deeper understanding of what the brain actually does. Because it obviously does symbol manipulation, it does logic, it does math. Most importantly, we humans learned to speak, using nothing better than a mammalian brain (with a few specializations). I believe there is a new paradigm out there which can fulfill these needs: language, robotics, cognition, knowledge creation. I call it the vertical-horizontal model: a model of neuron interaction, where the neuron is a complete microprocessor of a very special kind. This allows us to build a symbolic brain.

There will be a trilogy of papers to describe this new paradigm, and a small company to build the necessary concepts. At the present time, here is a link to an early draft – hard to read, not a paper, more a collection right now, but ready for feedback at my email! I’ll soon post a summary here as well.

Learning in the Brain: Difference learning vs. Associative learning

The feedforward/feedback learning and interaction in the visual system has been analysed as a case of “predictive coding”, the “free energy principle” or “Bayesian perception”. The general principle is very simple, so I will call it “difference learning”. I believe that this is directly comparable (biology doesn’t invent, it re-invents) to what is happening at the cell membrane between external (membrane) and internal (signaling) parameters.

It is about difference modulation: an existing or quiet state, followed by new signaling (at the membrane) or a new percept (in the case of vision). Now the system has to adapt to the new input. The feedback connections transfer back the old categorization of the new input. This is added to the perception, so that a percept evolves which uses the old categorization together with the new input to quickly achieve an adequate categorization for any perceptual input. There will of course be a bias in favor of existing knowledge, but that makes sense in a behavioral context.

The same thing happens at the membrane. An input signal activates membrane receptors (parameters). The internal parameters – the control structure – transfer back the stored response to the external membrane parameters. The signal then generates a suitable neuronal response according to its effect on the external parameters (bottom-up) together with the internal control structure (top-down). The response is biased in favor of existing structure, but this also means all signals can quickly be interpreted.

If a signal overcomes a filter, new adaptation and learning of the parameters can happen.

The general principle is difference learning, adaptation on the basis of a difference between encoded information and a new input. This general principle underlies all membrane adaptation, whether at the synapse or the spine, or the dendrite, and all types of receptors, whether AMPA, GABA or GPCR.

We are used to believing that the general principle of neural plasticity is associative learning. But this is an entirely different principle, and merely derivative of difference learning in certain contexts. Associative learning as the basis of synaptic plasticity goes back more than 100 years. The idea was that by exposure to different ideas or objects, the connection between them in the mind was strengthened. It was then conjectured that two neurons (A and B) which are both activated would strengthen their connection (from A to B). More precisely, as was later often found, A needs to fire earlier than B in order to encode a sequential relation.

What would be predicted by difference learning? An existing connection would encode the strength of synaptic activation at that site. As long as the actual signal matches, there is no need for adaptation. If it becomes stronger, the synapse may acquire additional receptors by using its internal control structure. This control structure may have requirements about sequentiality. The control structure may also be updated to make the new strength permanent, a new set-point parameter. On the other hand, a weaker than memorized signal will ultimately lead the synapse to wither and die.

Similar outcomes, entirely different principles. Association is encoded by any synapse, and since membrane receptors are plastic, associative learning is a restricted derivative of difference learning.
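
The contrast can be sketched in a few lines of Python (a toy formulation with arbitrary parameters, not a claim about any particular biophysical model): a Hebbian weight under constant co-activity keeps growing, while a difference-rule weight converges to its stored set-point and then stops adapting once the mismatch falls below a filter threshold.

```python
def hebbian_update(w, pre, post, lr=0.01):
    """Associative (Hebbian) rule: co-activity strengthens the weight."""
    return w + lr * pre * post

def difference_update(w, signal, lr=0.1, filter_threshold=0.2):
    """Difference rule: adapt only when the signal deviates from the
    stored set-point by more than a filter threshold."""
    error = signal - w
    if abs(error) < filter_threshold:
        return w           # signal matches stored strength: no adaptation
    return w + lr * error  # move the stored set-point toward the input

# Constant co-activity / constant signal strength of 1.0:
w_hebb = w_diff = 0.5
for _ in range(200):
    w_hebb = hebbian_update(w_hebb, pre=1.0, post=1.0)
    w_diff = difference_update(w_diff, signal=1.0)

# The Hebbian weight grows without bound under constant co-activity,
# while the difference-rule weight converges near the signal strength
# and then stops adapting.
```

The outcomes look similar while activity is novel, but only the difference rule has a stable stopping condition: once encoded information matches the input, nothing changes.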

Functional Principles of Neural Plasticity

It is really difficult to conceptualize things in a novel way – when one has been conditioned for decades to believe in the synaptic plasticity theory of memory.
I offer a new type of theory, first outlined in the vision statement linked above, which is called a horizontal-vertical integration theory. There could be several such theories. All such theories would agree that each neuron has internal and external parameters, and only external parameters influence their horizontal interactions with other neurons (mostly by electrophysiology). The exchange of information between the membrane (external) and internal zones is the vertical integration. The horizontal integration uses contact points (synapses) on the membrane, in highly plastic environments with spines as compartmentalized vertical integration sites.
In addition to being more adequate to the biological facts, this will be friendly for computation as well. Synaptic plasticity doesn’t offer much stability or permanence, and we can’t build structure with it: all information sits at the membrane, where it is constantly changed. That is a huge problem.
The vision statement above already contains a very specific theory. In the background it is understood that spatiotemporal spike timing patterns are the representations, which is where patterns, perceptions and thoughts reside. But those representations use the existing neurons with their own plasticity. Thus representations are not only signal-driven, they integrate perceptions with existing knowledge. And the neurons’ individual memories influence the patterns that result from ongoing sensations.
Ion exchange, especially calcium, is quite important for linking external (membrane) and internal (submembrane, cytosolic) parameters. It is quite apparent from the experimental literature that this is a major gateway, probably the fastest. There is also exchange between the core, nuclear parameters, and the internal parameters in the cytosol (including the spine). These are slower exchanges involving proteins like transcription factors. Noticeably we have a system with fast plasticity on the outside and slower, lasting plasticity distant from external signaling. Such a system offers both stability and fast reactivity, and is unique in its properties compared to existing computing architectures.
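
As a toy sketch (arbitrary learning rates, purely illustrative), the fast-outside/slow-inside arrangement can be written as two coupled parameters: the external one reacts quickly to the signal, while the internal one consolidates slowly and retains a trace after the signal is gone.

```python
def vertical_step(external, internal, signal, fast_lr=0.5, slow_lr=0.01):
    """One step of the vertical exchange: the external (membrane)
    parameter tracks the signal quickly but is pulled toward the
    internal set-point; the internal (cytosolic/nuclear) parameter
    consolidates slowly."""
    external += fast_lr * (signal - external) + slow_lr * (internal - external)
    internal += slow_lr * (external - internal)
    return external, internal

ext, core = 0.0, 0.0
# A brief input transient...
for _ in range(5):
    ext, core = vertical_step(ext, core, signal=1.0)
transient_ext, transient_core = ext, core  # fast response, slow consolidation
# ...followed by a return to baseline.
for _ in range(50):
    ext, core = vertical_step(ext, core, signal=0.0)
# The external parameter has decayed back down, but the internal
# parameter still retains a trace of the transient:
# fast reactivity outside, lasting storage inside.
```

The point is only the separation of timescales: the outside is fast and volatile, the inside slow and lasting, which no single-parameter synaptic weight can provide.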
It will be more work to flesh this out.
Theories are not true, they are useful. Only the individual fact can be established as true. But theories need to anchor individual facts, and a horizontal-vertical integration theory has the potential to cover a large amount of what is known. At the same time, its computing abilities are fascinating. LTP/LTD has outlived its usefulness.


Two parts of the brain

Here is an image from the 10th art of neuroscience contest. It is striking in that it gives a view of the brain with two major parts: the cerebrum and the cerebellum. Of course there are a lot of interesting structures hidden within the cerebrum, but still: Is this a von Neumann architecture, memory and processor? No: we already know that memory is a ubiquitous phenomenon, not localized to any brain structure (notwithstanding the function of the hippocampus). How about a CPU and a GPU? Closer, since the cerebellum (a) is structurally remarkably homogeneous, suitable for performing many routine operations, (b) is fast, by an order of magnitude faster than cortex, (c) is considered to perform ‘supportive’ functions for smooth motor movements, complex perception, but also standard thought processes, always improving fast coordination and integration of information, (d) mostly speeds up, improves and calibrates brain processes, so that an organism is able to survive without it, (e) contains very many processing units (50% of all neurons of the brain), even though its function is subsidiary to cortex, and (f) has a very constrained feedback modality, a bottleneck in information transfer through the deep cerebellar nuclei, even though it receives large amounts of information from the cortex. Although the cerebellum does not do graphical displays, it may “image” cortical information, i.e. represent it in a specific, simple format, and use this to guide motor movements, while providing only limited feedback about its performance. Food for thought.

Cocaine Dependency and restricted learning

Substantial work (NasifFJetal2005a, NasifFJetal2005b, DongYetal2005, HuXT2004, Marinellietal2006) has shown that repeated cocaine administration changes the intrinsic excitability of prefrontal cortical (PFC) neurons (in rats) by altering the expression of ion channels. It downregulates voltage-gated K+ channels and increases membrane excitability in principal (excitatory) PFC neurons.

An important consequence of this result is the following: by restricting expression levels of major ion channels, the capacity of the neuron to undergo intrinsic plasticity (IP) is limited, and therefore its learning or storage capacity is reduced.

Why is IP important?

It is often assumed that the “amount” of information that can be stored by the whole neuron is small compared to what its synapses together can store, and that IP therefore cannot have a large role in neural computation. This view is based on a number of assumptions, namely (a) that IP is expressed by only a single parameter, such as a firing threshold or a bit value indicating internal calcium release; (b) that IP could be replaced by a “bias term” for each neuron, essentially another parameter on a par with its synaptic parameters and trainable along with these; (c) at most, that this bias term is multiplicative rather than additive like synaptic parameters, but still just one learnable parameter; and (d) that synapses are independently trainable, on the basis of associative activation, without any requirement for the whole neuron to undergo plasticity. Since the biology of intrinsic excitability and plasticity is very complex, there are very many aspects of it which could be relevant in a neural circuit (or tissue), and it is challenging to extract the plausible components most significant for IP – certainly a fruitful area for further study.
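
To see how thin assumptions (b) and (c) are, here is a minimal point-neuron caricature (my own illustrative code, with made-up numbers): the entire intrinsic state is compressed into one additive bias or one multiplicative gain on top of the synaptic drive.

```python
def neuron_output(x, w, gain=1.0, bias=0.0):
    """Point-neuron caricature underlying assumptions (b) and (c):
    synaptic weights w plus a single intrinsic parameter, either an
    additive bias (b) or a multiplicative gain (c)."""
    drive = sum(wi * xi for wi, xi in zip(w, x))
    return gain * drive + bias

x = [0.5, 1.0, 0.2]   # presynaptic activity (arbitrary values)
w = [0.3, -0.1, 0.8]  # synaptic weights (arbitrary values)

base = neuron_output(x, w)
# An additive bias shifts every response by the same amount,
# while a gain rescales responses in proportion to the drive:
shifted = neuron_output(x, w, bias=0.2)
scaled = neuron_output(x, w, gain=2.0)
```

Whatever one thinks of IP, a single scalar per neuron is clearly a very coarse summary of a cell-wide program of ion channel regulation.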

In our latest paper we challenge mostly (d): we advocate a model where IP implies localist SP (synaptic plasticity), so that the occurrence of SP is tied to the occurrence of IP in a neuron. In this sense the whole neuron extends control of plasticity over its synapses – in this particular model, over its dendritic synapses. It is well known that some neurons, such as those in dentate gyrus, exhibit primarily presynaptic plasticity, i.e. control over axonal synapses (mossy fiber contacts onto hippocampal CA3 neurons), but we have not focused on this question for cortex from the biological point of view. In any case, if this model captures an important generalization, then cocaine dependency leads to reduced IP and, as a consequence, reduced SP at a principal neuron’s site. If the neuron is reduced in its ability to learn, i.e. to adjust its voltage-gated K+ channels, such that it operates with heightened membrane excitability, then its dendritic synapses should also be restricted in their capacity to learn (for instance, to undergo LTD).

As a matter of fact, a more recent paper (Otisetal2018) shows that blocking intrinsic excitability during recall in a specific area of the PFC (prelimbic PFC) actually prevents memories encoded in this area from becoming activated.

Why a large cortex?


If we compare a small mouse cortex with a large human cortex, the connectivity per neuron is approximately the same (10^4 synapses per neuron, SchuezPalm1989). So why did humans add so many neurons, and why did the connectivity remain constant? For the latter question we may conjecture that a maximal size was already reached in the mouse. Our superior human cognitive skills thus rest on the increased number of neurons in cortex: the number of modules (cortical microcolumns) went up, not the synaptic connectivity as such.
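
As a back-of-the-envelope calculation (the neuron counts below are order-of-magnitude assumptions for illustration, not figures from the text; only the ~10^4 synapses per neuron is cited):

```python
# Order-of-magnitude sketch. Neuron counts are illustrative assumptions;
# the ~10^4 synapses/neuron figure is from SchuezPalm1989.
synapses_per_neuron = 1e4   # roughly constant across mouse and human
mouse_neurons = 1e7         # assumed order of magnitude, cortex
human_neurons = 1.6e10      # assumed order of magnitude, cortex

mouse_synapses = mouse_neurons * synapses_per_neuron
human_synapses = human_neurons * synapses_per_neuron

# The human advantage is ~1000x more neurons (more modules),
# not more connections per neuron.
ratio = human_neurons / mouse_neurons
```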

Soft coded Synapses

A new preprint by Filipovicetal2009* shows that striatal projection neurons (MSNs) receive different amounts of input, depending on whether they are D2-modulated and part of the indirect pathway, or D1-modulated and part of the direct pathway. In particular, membrane fluctuations are higher in the D1-modulated neurons (mostly at higher frequencies): they receive both more inhibitory and more excitatory input. This also means that they are activated faster.

The open question is: what drives the difference in input? Do they have stronger synapses or more synapses? If the distribution of synaptic strengths is indeed universal, they could have stronger synapses overall (a different peak of the distribution) or more synapses (a larger area under the curve).

Assuming that synapses adapt to the level of input they receive, having stronger synapses would be equivalent to being connected to higher-frequency neurons; but there would be a difference in terms of fluctuations of input. Weak synapses produce low fluctuations of input, while strong synapses – assuming they are sent out from neurons with a higher frequency range – produce larger fluctuations in input to the postsynaptic neuron.
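
One way to see the fluctuation argument is a toy simulation (made-up numbers, with the mean drive of the two configurations matched): few strong synapses produce a noisier summed input than many weak ones.

```python
import random
import statistics

random.seed(0)

def summed_input(n_syn, w, rate, steps=2000):
    """Total input per time step from n_syn synapses of weight w,
    each driven by an independent Bernoulli spike train."""
    totals = []
    for _ in range(steps):
        totals.append(sum(w for _ in range(n_syn) if random.random() < rate))
    return totals

# Matched mean drive: many weak synapses vs few strong ones.
weak = summed_input(n_syn=100, w=0.1, rate=0.2)   # mean = 100*0.1*0.2 = 2.0
strong = summed_input(n_syn=10, w=1.0, rate=0.2)  # mean = 10*1.0*0.2 = 2.0

# Same mean input, but the strong-synapse configuration
# fluctuates considerably more from step to step.
```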

It is also possible that the effect results from a higher amount of correlation in synaptic input to D1-modulated neurons than to D2-modulated neurons. However, since correlations are an adaptive feature of neural processing, an overall higher level of correlation to one of two similar neuronal groups would be unusual: it would be difficult to maintain concurrently with the fluctuations in correlation that are meaningful for processing (attention).

An additional observation is that dopamine depletion reduces the difference between D2- and D1-modulated MSNs. Since membrane fluctuations are due to differences in synaptic input (AMPA- and GABA-A-driven), but the evidence that D1 receptors modulate these receptors is conflicting (except for NMDA receptors), one would postulate a presynaptic effect. So possibly the effect is located at indirect-pathway, D2-modulated neurons, which receive less input when dopamine is present and adjust to a lower level of synaptic input. (Alternatively, reduction of D1 activation could result in less NMDA/AMPA and more GABA-A input, i.e. less synaptic input, in a D1 dopamine-dependent way.) In the dopamine-depleted mouse, both pathways would then receive approximately similar input. Under this hypothesis, it is not primarily differences in structural plasticity which result in different synaptic input levels, but a “soft-coded” (dopamine-coded) difference, which depends on dopamine levels and is realized by presynaptic/postsynaptic dopamine receptors. Further results will clarify this question.

*Thanks to Marko Filipovic for his input. The interpretations are my own.

Heavy-tailed distributions and hierarchical cell assemblies

In earlier work, we meticulously documented the distribution of synaptic weights and the gain (or activation function) in many different brain areas. We found a remarkable consistency of heavy-tailed, specifically lognormal, distributions for firing rates, synaptic weights and gains (Scheler2017).

Why are biological neural networks heavy-tailed (lognormal)?

Cell assemblies: Lognormal networks support models of hierarchically organized cell assemblies (ensembles). An individual neuron can activate or suppress a whole cell assembly if it is the strongest neuron or directly connects to the strongest neurons (TeramaeJetal2012).
Storage: Sparse strong synapses store stable information and provide a backbone of information processing. More frequent weak synapses are more flexible and add changeable detail to the backbone. Heavy-tailed distributions allow a hierarchy of stability and importance.
Time delay of activation is reduced because strong synapses quickly activate a whole assembly (IyerRetal2013). This reduces the initial response time, which depends on the synaptic and intrinsic distributions. Heavy-tailed distributions activate fastest.
Noise response: Under additional input, noise or patterned, the pattern stability of the existing ensemble is higher (IyerRetal2013, see also KirstCetal2016). This is a side effect of integration of all computations within a cell assembly.
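
A quick sketch of the “backbone” point (illustrative parameters; the lognormal mu and sigma below are not fitted to any data set): a small minority of strong synapses carries a disproportionate share of the total weight.

```python
import random

random.seed(1)

# Sample synaptic weights from a lognormal distribution (heavy-tailed).
n = 10_000
weights = sorted((random.lognormvariate(0.0, 1.0) for _ in range(n)),
                 reverse=True)

total = sum(weights)
top_share = sum(weights[: n // 10]) / total  # share carried by strongest 10%
# For sigma = 1, the strongest 10% of synapses carry roughly 40% of the
# total weight: a sparse strong backbone over many weak connections.
```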

Why hierarchical computations in a neural network?

Calculations which depend on interactions between many discrete points (N-body problems, Barnes and Hut 1986), such as particle-particle methods where every point depends on all others, lead to an O(N^2) calculation. If we replace this with hierarchical methods, combining information from multiple points, we can reduce the computational complexity to O(N log N) or O(N).
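
A counting sketch makes the saving concrete (toy code, not an implementation of Barnes–Hut itself): direct evaluation touches every pair, while a two-level grouping lets each point interact with distant groups only through one summary each.

```python
def direct_interactions(n):
    """Every point interacts with every other point: O(N^2)."""
    return n * (n - 1)

def grouped_interactions(n, group_size):
    """Points interact exactly within their own group, and with every
    other group only through its summary (e.g. center of mass)."""
    groups = n // group_size
    within = n * (group_size - 1)  # local, exact interactions
    between = n * (groups - 1)     # one interaction per distant group
    return within + between

n = 10_000
direct = direct_interactions(n)                    # 99,990,000
grouped = grouped_interactions(n, group_size=100)  # 1,980,000: ~50x fewer
```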

Since biological neural networks are not feedforward but connect in both forward and backward directions, they have a different structure from ANNs (artificial neural networks): they consist of hierarchically organised ensembles with a few wide-range excitability ‘hub’ neurons and many ‘leaf’ neurons with low connectivity and small-range excitability. Patterns are stored in these ensembles and are accessed by a fit to an incoming pattern, which could be expressed by low mutual information as a measure of similarity. Patterns are modified by similar access patterns, but typically only in their weak connections (otherwise the accessing pattern would not fit).