Learning in the Brain: Difference learning vs. Associative learning

The feedforward/feedback learning and interaction in the visual system has been analysed as a case of "predictive coding", the "free energy principle", or "Bayesian perception". The general principle is very simple, so I will call it "difference learning". I believe that it is directly comparable (biology doesn't invent, it re-invents) to what happens at the cell membrane between external (membrane) and internal (signaling) parameters.

It is about difference modulation: there is an existing or quiet state, and then new input arrives, by signaling (at the membrane) or by perception (in the case of vision). Now the system has to adapt to the new input. The feedback connections transfer back the old categorization of the new input. This is added to the perception, so that a percept evolves which combines the old categorization with the new input to quickly achieve an adequate categorization for any perceptual input. There will of course be a bias in favor of existing knowledge, but that makes sense in a behavioral context.

The same thing happens at the membrane. An input signal activates membrane receptors (external parameters). The internal parameters – the control structure – transfer back the stored response to the external membrane parameters. The signal then generates a suitable neuronal response according to its effect on the external parameters (bottom-up) together with the internal control structure (top-down). The response is biased in favor of the existing structure, but this also means that all signals can quickly be interpreted.

If a signal overcomes a filter, new adaptation and learning of the parameters can happen.

The general principle is difference learning: adaptation on the basis of a difference between encoded information and a new input. This general principle underlies all membrane adaptation, whether at the synapse, the spine, or the dendrite, and for all types of receptors, whether AMPA, GABA, or GPCRs.

We are used to believing that the general principle of neural plasticity is associative learning. This is an entirely different principle, and merely derivative of difference learning in certain contexts. Associative learning as the basis of synaptic plasticity goes back more than 100 years. The idea was that by exposure to different ideas or objects, the connection between them in the mind was strengthened. It was then conjectured that two neurons (A and B) which are both activated would strengthen their connection (from A to B). More precisely, as was later often found, A needs to fire earlier than B in order to encode a sequential relation.
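In code, the associative rule is nearly a one-liner. A minimal sketch, where the variable names and the 20 ms timing window are my own illustrative choices:

```python
# Associative (Hebbian) rule with the sequential condition: strengthen
# the A->B connection only when pre (A) fires shortly before post (B).
def hebbian_update(w, pre_t, post_t, lr=0.01, window=0.02):
    """Potentiate w if pre fires before post within a 20 ms window."""
    if 0 < post_t - pre_t <= window:
        w += lr
    return w

w = 0.5
w = hebbian_update(w, pre_t=0.100, post_t=0.110)  # A leads B -> potentiate
print(w)  # 0.51
```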

What would be predicted by difference learning? An existing connection encodes the strength of synaptic activation at that site. As long as the actual signal matches, there is no need for adaptation. If the signal becomes stronger, the synapse may acquire additional receptors by using its internal control structure. This control structure may have requirements about sequentiality. The control structure may also be updated to make the new strength permanent, as a new set-point parameter. A weaker-than-memorized signal, on the other hand, will ultimately lead the synapse to wither and die.
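By contrast, here is a minimal sketch of the difference learning prediction as I read it: the synapse keeps an internal set point for expected activation, adapts only on a mismatch, and updates the set point to make a new strength permanent. All numbers are illustrative:

```python
# Difference learning at a synapse: adaptation is driven by the gap
# between the memorized set point and the actual signal.
def difference_update(w, set_point, signal, lr=0.1, tol=0.05):
    """Adapt w and its set point only when the signal deviates."""
    diff = signal - set_point
    if abs(diff) <= tol:
        return w, set_point          # match: no adaptation needed
    w += lr * diff                   # acquire or lose receptors
    set_point += lr * diff           # update the internal set point
    return w, set_point

w, sp = 1.0, 1.0
w, sp = difference_update(w, sp, signal=1.0)  # match: nothing happens
w, sp = difference_update(w, sp, signal=1.5)  # stronger: w and set point grow
print(w, sp)  # 1.05 1.05
```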

Similar outcomes, entirely different principles. Association is encoded by any synapse, and since membrane receptors are plastic, associative learning is a restricted derivative of difference learning.

Soft-coded Synapses

A new preprint by Filipović et al. (2019)* shows that striatal projection neurons (MSNs) receive different amounts of input, depending on whether they are D2-modulated, and part of the indirect pathway, or D1-modulated, and part of the direct pathway. In particular, membrane fluctuations are higher in the D1-modulated neurons (mostly at higher frequencies): they receive both more inhibitory and more excitatory input. This also means that they are activated faster.

The open question is: what drives the difference in input? Do these neurons have stronger synapses or more synapses? If the distribution of synaptic strength is indeed universal, they could have stronger synapses overall (a different peak of the distribution) or more synapses (a larger area under the curve).

Assuming that synapses adapt to the level of input they receive, having stronger synapses would be equivalent to being connected to higher-frequency neurons; but there would be a difference in terms of the fluctuations of the input. Weak synapses produce low input fluctuations, while strong synapses, assuming they originate from neurons with a higher frequency range, produce larger fluctuations in the input to the postsynaptic neuron.
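A back-of-the-envelope check, under the assumption (mine, for illustration) that inputs are independent Poisson spike trains: at matched mean input, the variance scales with the square of the synaptic weight, so a "few strong synapses" configuration fluctuates more than a "many weak synapses" one:

```python
# Input fluctuations for "many weak" vs. "few strong" synapses.
# Mean input scales with N*w*rate, variance with N*w^2*rate.
import numpy as np

rng = np.random.default_rng(0)

def summed_input(n_syn, weight, rate, dt=0.001, T=10.0):
    """Total synaptic input per time bin from n_syn Poisson inputs."""
    n_bins = int(T / dt)
    spikes = rng.poisson(rate * dt, size=(n_syn, n_bins))
    return weight * spikes.sum(axis=0)

weak   = summed_input(n_syn=1000, weight=0.1, rate=5.0)  # many weak
strong = summed_input(n_syn=100,  weight=1.0, rate=5.0)  # few strong

print(f"weak:   mean={weak.mean():.3f}  std={weak.std():.3f}")
print(f"strong: mean={strong.mean():.3f}  std={strong.std():.3f}")
# Same mean input, but the std is ~sqrt(10) times larger for the
# strong-synapse configuration.
```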

It is also possible that the effect results from a higher amount of correlation in the synaptic input to D1-modulated neurons than to D2-modulated neurons. However, since correlations are an adaptive feature in neural processing, it would be unusual to have an overall higher level of correlation to one of two similar neuronal groups: it would be difficult to maintain alongside the fluctuations in correlation which are meaningful for processing (attention).

An additional observation is that dopamine depletion reduces the difference between D2- and D1-modulated MSNs. Since membrane fluctuations are due to differences in synaptic input (AMPA- and GABA-A-driven), but there is only conflicting evidence that D1 receptors modulate these receptors (except for NMDA receptors), one would postulate a presynaptic effect. So, possibly the effect is located at indirect-pathway, D2-modulated neurons, which receive less input when dopamine is present, and adjust to a lower level of synaptic input. (Alternatively, reduction of D1 activation could result in less NMDA/AMPA and more GABA-A activation, i.e. less synaptic input, in a D1 dopamine-dependent way.) In the dopamine-depleted mouse, both pathways would receive approximately similar input. Under this hypothesis, it is not primarily differences in structural plasticity which result in different synaptic input levels, but instead a "soft-coded" (dopamine-coded) difference, which depends on dopamine levels and is realized by presynaptic/postsynaptic dopamine receptors. Further results will clarify this question.

*Thanks to Marko Filipovic for his input. The interpretations are my own.

Ion channel expression is not regulated by spiking behavior

An important topic in understanding intrinsic excitability is the distribution and activation of ion channels. In this respect, the co-regulation between ion channels is of significant interest. MacLean et al. (2003) showed that overexpression of an A-type potassium channel by shal-RNA injection in neurons of the stomatogastric ganglion of the lobster is compensated by an upregulation of Ih such that the spiking behavior remains unaltered.

A non-functional shal mutant whose overexpression did not affect spiking had the same effect, which shows that the regulation does not operate at the membrane by measuring spiking behavior. In this case, Ih was upregulated even though IA activity was unaltered, and spiking behavior increased. (This contrasts with, e.g., O'Leary et al., 2013, who assume homeostatic regulation of ion channel expression at the membrane, driven by spiking behavior.)

In Drosophila motoneurons the expression of shal and shaker – both responsible for IA – is reciprocally coupled: if one is reduced, the other is upregulated to maintain a constant level of IA activity at the membrane. Other ion channels, like INap and IM, are antagonistic in their effect, which means their expression correlates positively: if one is reduced, the other is reduced as well to achieve the same net effect (Golowasch, 2014). A number of publications have documented similar effects, e.g. MacLean et al., 2005; Schulz et al., 2007; Tobin et al., 2009; O'Leary et al., 2013.

We must assume that the expression level of ion channels is regulated and sensed inside the cell, and that the expression levels of genes for different ion channels are coupled – by genetic regulation or at the level of RNA regulation.

To summarize: When there is high IA expression, Ih is also upregulated. When one gene responsible for IA is suppressed, the other gene is more highly expressed, to achieve the same level of IA expression. When INap, a persistent sodium current, is reduced, IM, a potassium current, is also reduced.

It is important to note that these ion channels may compensate for each other in terms of overall spiking behavior, but they have subtly different activation properties, e.g. with respect to the pattern of spiking or to neuromodulation. For instance, if cell A reduces ion channel currents like INap and IM, compensating so as to achieve the same spiking behavior, then once we apply neuromodulation to muscarinic receptors on A, this will affect IM, but not INap. The behavior of cell A, crudely the same, is now altered under certain conditions.

To model this – other than by a full internal cell model – requires internal state variables which guide ion channel expression and therefore regulate intrinsic excitability. These variables would model ion channel proteins and their respective interactions, and in this way guarantee acceptable spiking behavior of the cell. This leads to the idea of an internal module which sets the parameters necessary for the neuron to function. Such an internal module that self-organizes its state variables according to specified objective functions could greatly simplify systems design. Instead of tuning system parameters by outside methods – which is necessary for ion-channel-based models – each neuronal unit would itself be responsible for its ion channels and be able to self-tune them separately from the whole system.
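A minimal sketch of such a module, with the coupling rules taken from the observations above (shal/shaker reciprocally coupled toward an IA set point, Ih tracking total IA, IM tracking INap); the dynamics, learning rates, and numbers are my own illustrative assumptions, not a published model:

```python
# Internal state variables co-regulating ion channel expression,
# without reference to spiking behavior at the membrane.
class ChannelRegulator:
    def __init__(self, ia_target=1.0):
        self.ia_target = ia_target            # internal set point for total IA
        self.state = {"shal": 0.5, "shaker": 0.5, "ih": 0.5,
                      "inap": 0.5, "im": 0.5}

    def step(self, lr=0.1):
        s = self.state
        ia_total = s["shal"] + s["shaker"]
        # Reciprocal coupling: hold total IA at the set point.
        err = self.ia_target - ia_total
        s["shal"]   += lr * err / 2
        s["shaker"] += lr * err / 2
        # Positive coupling: Ih tracks total IA.
        s["ih"] += lr * (ia_total - s["ih"])
        # Positive coupling: IM tracks INap.
        s["im"] += lr * (s["inap"] - s["im"])

reg = ChannelRegulator()
reg.state["shal"] = 0.1      # knock down shal expression
for _ in range(100):
    reg.step()
print(reg.state)             # shaker has risen; total IA is back near 1.0
```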

Linked to the idea of internal state variables is the idea of internal memory, which I have referred to several times in this blog. If I have an internal module of co-regulated variables, which set external parameters for each unit, then this module may serve as a latent memory for variables which are not expressed at the membrane at the present time (see Er81). The time course of expression and activation at the membrane and of internal co-regulation need not be the same. This offers an opportunity for memory inside the cell, separate from information processing within a network of neurons.

Egocentric representations for general input

The individual neuron’s state need not be determined only by the inputs received.

(a) It may additionally be seeded with a probability for adaptation that is distributed with respect to the graph properties of the neuron (such as betweenness centrality, choke points, etc.), as well as the neuron's current intrinsic excitability (IE) (the two are related). This seeded probability would correspond to a sensitivity of the neuron to the representation that is produced by the subnetwork. The input representation is transformed by the properties of the subnetwork; a sketch follows below.
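A minimal sketch of such seeding, using networkx; the toy network, the IE values, and the mapping from centrality to probability are all illustrative assumptions of mine:

```python
# Seed each neuron's adaptation probability from its graph properties
# (here betweenness centrality) and its intrinsic excitability (IE).
import networkx as nx
import numpy as np

G = nx.watts_strogatz_graph(n=100, k=6, p=0.1, seed=1)  # toy network
bc = nx.betweenness_centrality(G)

rng = np.random.default_rng(0)
ie = {n: rng.uniform(0.5, 1.5) for n in G}              # toy IE values

raw = {n: bc[n] * ie[n] for n in G}
max_raw = max(raw.values())
p_adapt = {n: 0.05 + 0.9 * raw[n] / max_raw for n in G}  # in [0.05, 0.95]

# During learning, neuron n would then update only with probability
# p_adapt[n], e.g.: if rng.random() < p_adapt[n]: apply_update(n)
```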

(b) Another way to influence neurons independently of their input is to link them together. This can be done by simulating neuromodulators (NMs) which influence adaptivity for a subset of neurons within the network. There are then neurons which are linked together and increase or turn on their adaptivity because they share the same NM receptors. Different sets of neurons can become activated and increase their adaptivity whenever a sufficient level of an NM is reached. An additional learning task is then to identify suitable sets of neurons. For instance, neurons may encode aspects of the input representation that result from additional, i.e. attentional, signals co-occurring with the input. A sketch of such NM gating follows below.
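A toy version of this gating, with all names, thresholds, and gains my own:

```python
# A simulated neuromodulator boosts adaptivity only for the subset of
# neurons that carry its receptor, once its level crosses a threshold.
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 100
has_nm_receptor = rng.random(n_neurons) < 0.2   # ~20% share the receptor
base_lr = np.full(n_neurons, 0.01)

def effective_lr(nm_level, threshold=0.5, gain=10.0):
    """Per-neuron learning rate; NM boosts only receptor-bearing neurons."""
    lr = base_lr.copy()
    if nm_level > threshold:
        lr[has_nm_receptor] *= gain
    return lr

print(effective_lr(nm_level=0.8)[:10])  # boosted entries mark the linked set
```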

(c) Finally, both E and I neurons are known to comprise morphologically and genetically distinct types. This opens up additional ways of creating heterogeneous networks from these neuron types, with distinct adaptation rules for each. Some of the neurons may not be adaptive at all, or barely adaptive, while others may be adaptive only once (write once, read only), or be capable only of upregulation until they reach their limit. (This applies to synaptic and intrinsic adaptation.) Certain neurons may have to follow the idea of unlimited adaptation in both directions in order to make such models viable. A sketch of such a rule set follows below.
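One way to express such type-specific rules; the rule names and numbers are mine, for illustration:

```python
# Fixed genetic neuron types with distinct adaptation rules:
# non-adaptive, write-once, upregulation-only with a cap, unrestricted.
def adapt(w, dw, rule, written=False, cap=2.0):
    """Apply an update dw to parameter w under a type-specific rule."""
    if rule == "fixed":                  # not adaptive at all
        return w, written
    if rule == "write_once":             # adapt once, then read-only
        return (w + dw, True) if not written else (w, written)
    if rule == "up_only":                # only upregulation, up to a cap
        return min(w + max(dw, 0.0), cap), written
    if rule == "free":                   # unlimited, both directions
        return w + dw, written
    raise ValueError(rule)

w, written = 1.0, False
for dw in [0.3, -0.2, 0.5]:
    w, written = adapt(w, dw, rule="up_only")
print(w)   # 1.8: the negative update was ignored
```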

Similar variants in neuron behavior are known from technical applications of ANNs: hyperparameters that link individual parameters into groups ('weight sharing') have been used; 'bypassing' means that some neurons do not adjust, only transmit; and 'gating' means that neurons may regulate the extent of transmission of a signal (cf. LSTM; Scardapane et al., 2018). Separately, the optimizer ADAM (or ADAMW) has been proposed, which computes adaptive learning rates for each parameter and achieves fast convergence.

A neuron-centric biological network model (a 'neuronal automaton') offers a systematic approach to such differences in adaptation. As suggested, biological neurons have different capacities for adaptation, and this may extend to their synaptic connections as well. The model would allow learning different activation functions and different adaptivity for each neuron, helped by linking neurons into groups and by using fixed genetic types in the setup of the network. In each specific case the input is represented by the structural and functional constraints of the network and therefore transformed into an internal, egocentric representation.

Dendritic computation

A new paper, "Universal features of dendrites through centripetal branch ordering" (published July 3, 2017), shows more or less the opposite of what it cites as common wisdom: "neuronal computation is known to depend on the morphology of dendrites".

Namely, since all dendrites follow the same general topological principles, it is probably not dendritic morphology that matters in a functional sense. To make a dendrite functional, i.e. to let it participate in adaptive information processing, we have to refer to the ion channels and GPCRs that populate the spines and shafts and shape the generation of action potentials.

Compare:
Stuart GJ, Spruston N. Dendritic integration: 60 years of progress. Nat Neurosci. 2015 Dec;18(12):1713-21. doi: 10.1038/nn.4157. PMID: 26605882.

Magee JC, Johnston D. Plasticity of dendritic function. Curr Opin Neurobiol. 2005 Jun;15(3):334-42. PMID: 15922583.

Scheler G. BMC Neurosci. 2013;14(Suppl 1):P344. doi: 10.1186/1471-2202-14-S1-P344. PMCID: PMC3704850.

Marder E, O'Leary T, Shruti S. Neuromodulation of circuits with variable parameters: single neurons and small circuits reveal principles of state-dependent and robust neuromodulation. Annu Rev Neurosci. 2014;37:329-46. doi: 10.1146/annurev-neuro-071013-013958.

Dopamine and Neuromodulation

Some time ago, I suggested that equating dopamine with reward learning was a bad idea. Why?
First of all, because it is a myopic view of the role of neuromodulation in the brain (and also in invertebrate animals). There are at least four centrally released neuromodulators; they all act on G-protein-coupled receptors (some not exclusively), and they all have effects on neural processing as well as memory. Furthermore, there are myriad locally released neuromodulators which have similar effects, all acting through different receptors, but on the same internal pathways, activating G-proteins.

Reward learning means that reward increases dopamine release, and that increased dopamine availability will increase synaptic plasticity.

That was always simplistic and, like any half-truth, misleading.

Any neuromodulator is variable in its release properties. This results, first, from the activity of its NM-producing neurons, such as in the locus coeruleus, dorsal raphe, VTA, medulla, etc., which receive input, including from each other, and secondly from the control of axonal and presynaptic release, which is independent of the central signal. So there is local modulation of release. Given a signal which increases, e.g., firing in the VTA, we still need to know which target areas are responsive at the present time, and at which synapses precisely the signal is directed. How the global signal is interpreted depends on the local state of the network.

Secondly, the activation of G-protein-coupled receptors is definitely an important ingredient in activating the intracellular pathways that are necessary for the expression of plasticity. Roughly, a concurrent activation of calcium and cAMP/PKA (within 10 s or so) has been found to be supportive of, or necessary for, inducing synaptic plasticity. However, dopamine, like the other centrally released neuromodulators, acts through antagonistic receptors, increasing or decreasing PKA, increasing or reducing plasticity. It is again local computation which decides the outcome of NM signaling at each site.
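A toy version of this coincidence gate, with antagonistic receptor drive (e.g. Gs-coupled D1 raising cAMP/PKA, Gi-coupled D2 lowering it); all constants are illustrative, not measured values:

```python
# Plasticity gate: a calcium event and a cAMP/PKA event must coincide
# within a ~10 s window, and net PKA must be high enough.
def pka_level(base, d1_drive, d2_drive):
    """Net PKA activity from antagonistic receptor drive."""
    return base + d1_drive - d2_drive

def plasticity_induced(t_calcium, t_pka, pka, window=10.0, pka_thresh=0.5):
    """Coincidence detector: Ca and PKA signals within the window."""
    return abs(t_calcium - t_pka) <= window and pka >= pka_thresh

pka = pka_level(base=0.3, d1_drive=0.4, d2_drive=0.1)          # net 0.6
print(plasticity_induced(t_calcium=2.0, t_pka=8.0, pka=pka))   # True
print(plasticity_induced(t_calcium=2.0, t_pka=20.0, pka=pka))  # False
```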

So, is there a take-home message, rivaling the simplicity of dopamine=reward?

NMs alter representations (=thought) and memorize them (=memory) but the interpretation is flexible at local sites (=learn and re-learn).

Dopamine alters thought and memory in a way that can be learned and re-learned.

Back in 1995 I came up with the idea of analysing neuromodulators like dopamine as a method of introducing global parameters into neural networks, which were considered at the time to admit only local, distributed computations. It seemed to me then, as now, that the capacity for global control of huge brain areas (the serotonergic, cholinergic, dopaminergic and noradrenergic systems) was really what set neuromodulation apart from the neurotransmitters glutamate and GABA. There is no need to single out dopamine as the one central signal which induces simple increases in its target areas, when in reality changes happen through antagonistic receptors, and there are many central signals. Also, the concept of hedonistic reward is badly defined and essentially restricted to Pavlovian conditioning in animals and addiction in humans.

Since the only known global parameter in neural networks at the time occurred in reinforcement learning, some people created a match, using dopamine as the missing global reinforcement signal (Schultz W, Dayan P, Montague PR. A neural substrate of prediction and reward. Science. 1997). That could not work, because reinforcement learning requires proper discounting within a decision tree. But the idea stuck. Ever since I have been upset at this primitive oversimplification. Bad ideas in neuroscience.
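For reference, the rule behind that proposal is the temporal-difference prediction error, in which the global scalar is the discounted error, not raw reward. A generic textbook sketch (not the paper's exact model):

```python
# TD(0): delta is the 'dopamine-like' global signal in the
# dopamine-as-reward-prediction-error proposal.
def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """One TD(0) step; returns the prediction error delta."""
    delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * delta
    return delta

V = {}
for _ in range(50):                     # chain s0 -> s1 -> reward
    td_update(V, "s1", "end", r=1.0)
    td_update(V, "s0", "s1", r=0.0)
print(V)   # value propagates backwards via the discounted delta
```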

Scheler G, Fellous J-M: Dopamine modulation of prefrontal delay activity – reverberatory activity and sharpness of tuning curves. Neurocomputing, 2001.

Scheler G, Schumann J: Presynaptic modulation as fast synaptic switching: state-dependent modulation of task performance. Proceedings of the International Joint Conference on Neural Networks 2003, Volume 1. DOI: 10.1109/IJCNN.2003.1223347.