Learning in the Brain: Difference learning vs. Associative learning

Feedforward/feedback learning and interaction in the visual system have been analysed as a case of “predictive coding”, the “free energy principle”, or “Bayesian perception”. The general principle is very simple, so I will call it “difference learning”. I believe that this is directly comparable (biology doesn’t invent, it re-invents) to what is happening at the cell membrane between external (membrane) and internal (signaling) parameters.

It is about difference modulation: there is an existing or quiescent state, and then new input arrives, by signaling (at the membrane) or by perception (in the case of vision). Now the system has to adapt to the new input. The feedback connections transfer back the old categorization of the new input. This is added to the perception, so that a percept evolves which uses the old categorization together with the new input to quickly achieve an adequate categorization for any perceptual input. There will of course be a bias in favor of existing knowledge, but that makes sense in a behavioral context.

The same thing happens at the membrane. An input signal activates membrane receptors (parameters). The internal parameters – the control structure – transfer back the stored response to the external membrane parameters. The signal then generates a suitable neuronal response according to its effect on the external parameters (bottom-up) together with the internal control structure (top-down). The response is biased in favor of the existing structure, but it also means that all signals can be quickly interpreted.

If a signal overcomes a filter, new adaptation and learning of the parameters can happen.

The general principle is difference learning: adaptation on the basis of a difference between encoded information and a new input. This general principle underlies all membrane adaptation, whether at the synapse, the spine, or the dendrite, and for all types of receptors, whether AMPA, GABA, or GPCR.

We are used to believing that the general principle of neural plasticity is associative learning. This is an entirely different principle, and merely a derivative of difference learning in certain contexts. Associative learning as the basis of synaptic plasticity goes back more than 100 years. The idea was that by exposure to different ideas or objects, the connection between them in the mind was strengthened. It was then conjectured that two neurons (A and B) which are both activated would strengthen their connection (from A to B). More precisely, as was often found later, A needs to fire earlier than B in order to encode a sequential relation.

What would be predicted by difference learning? An existing connection encodes the strength of synaptic activation at that site. As long as the actual signal matches, there is no need for adaptation. If the signal becomes stronger, the synapse may acquire additional receptors by using its internal control structure. This control structure may have requirements about sequentiality. The control structure may also be updated to make the new strength permanent, as a new set-point parameter. On the other hand, a signal that is weaker than the memorized one will ultimately lead the synapse to wither and die.

Similar outcomes, entirely different principles. Association is encoded by any synapse, and since membrane receptors are plastic, associative learning is a restricted derivative of difference learning.
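To make the contrast concrete, here is a minimal sketch (in Python) of the two update rules, with a scalar weight standing in for a synapse; the function names, learning rates, and numbers are illustrative choices, not taken from any specific model:

# Minimal sketch contrasting the two update principles (illustrative only).

def associative_update(w, pre, post, lr=0.01):
    # Hebbian / associative rule: co-activity of pre- and postsynaptic
    # neurons strengthens the connection, regardless of the stored value.
    return w + lr * pre * post

def difference_update(w, setpoint, signal, lr=0.1):
    # Difference rule: the synapse adapts only when the actual signal
    # deviates from the stored (memorized) strength; a persistent match
    # leaves the weight untouched.
    return w + lr * (signal - setpoint)

w = 0.5
print(associative_update(w, pre=1.0, post=1.0))         # grows whenever both are active
print(difference_update(w, setpoint=0.5, signal=0.8))   # grows only on a mismatch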

Antagonistic regulation for cellular intelligence

Cellular intelligence refers to information processing in single cells, i.e. genetic regulation, protein signaling and metabolic processing, all tightly integrated with each other. The goal is to uncover general ‘rules of life’ with respect to, e.g., the transmission of information, homeostatic and multistable regulation, and learning and memory (habituation, sensitization, etc.). These principles extend from unicellular organisms like bacteria to specialized cells that are part of a multicellular organism.

A prominent example is the ubiquitous role of feedback cycles in cellular information processing. These are often nested, or connected to a central hub, as a set of negative feedback cycles, sometimes interspersed with positive feedback cycles as well. Starting from Norbert Wiener’s work on cybernetics, we have gained a deeper understanding of this regulatory motif, and of the complex modules that can be built from a multitude of these cycles, by modeling as well as by mathematical analysis.

Another motif of similar significance and ubiquity is antagonistic interaction. A prototypical antagonistic interaction consists of a signal, two pathways (one positive, one negative), and a target. The signal connects to the target by both pathways. No further parts are required.

On the face of it, this interaction seems redundant. When a signal is connected to a target by a positive and a negative connection, the amount of change is the sum of both connections, and for that, one connection should be sufficient. But this motif is actually very widespread and powerful, and there are two main aspects to it:

A. Gearshifting, scale-invariance, or digitalization of the input: for an input signal that can occur at different strengths, antagonistic transmission allows the system to shift the signal to a lower level or gear, with a limited bandwidth compared to the input range. This can also be described as scale-invariance or standardization of the input, or, in the extreme case, digitalization of an analog input signal.

B. Fast onset – slow offset response curves: in this case the two transmission lines are used with a time delay. The positive interaction is fast, the negative interaction is slow. The result is a fast peak response with a slower relaxation time – useful in many biological contexts where fast reaction times are crucial.

Negative feedback cycles can achieve similar effects by acting on the signal itself: the positive signal is counteracted by a negative input which reduces the input signal. The result is again a fast peak response followed by downregulation to an equilibrium value. The advantage of antagonistic interactions is that the original signal is left intact, which is useful because the same signal may act on other targets unchanged. In a feedback cycle, the signal itself is consumed by the feedback interaction. The characteristic shape of the response – fast peak followed by slower downregulation – may therefore arise from different structures.
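As a toy illustration of aspect B, here is a small numerical sketch of the motif: one step input, a fast positive arm and a slow negative arm converging on a target. The rate constants are arbitrary; the point is only the fast-peak/slow-relaxation shape, and that the input itself is never consumed:

# Toy simulation of the antagonistic motif (illustrative rate constants).
import numpy as np

dt, T = 0.01, 20.0
t = np.arange(0, T, dt)
signal = (t > 1.0).astype(float)           # step input at t = 1

pos, neg, target = 0.0, 0.0, 0.0
trace = []
for s in signal:
    pos    += dt * (5.0 * s - 5.0 * pos)   # fast positive arm
    neg    += dt * (0.5 * s - 0.5 * neg)   # slow negative arm
    target += dt * (pos - neg - 0.2 * target)
    trace.append(target)

# 'trace' rises quickly after the step and relaxes again as the slow
# negative arm catches up; 'signal' is left intact throughout.
print(max(trace), trace[-1])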

The types of modules that can be built from both antagonistic interactions and feedback have not been explored systematically. However, one example is morphogenetic patterning, often referred to as ‘Turing patterns’, which relies on a positive feedback cycle for the activator, plus an antagonistic interaction (activator/inhibitor) with a time delay for the inhibitor.

[Figure: a Turing pattern]

Internal Memory: An Example

The protein Er81, which is present in about 60% of parvalbumin interneurons in layer II/III of mouse cortex (parvalbumin is a fast calcium buffer, in contrast to calbindin), has been found to affect the spike latency of these interneurons. (Er81 is also found in layer V pyramidal cells; there are publications about that as well.) The effect is mediated by the expression of the Kv1.1 potassium channel. Neurons with low Er81 expression have less Kv1.1, and these neurons, fast-spiking basket cells, respond without latency. They receive both E and I input. Neurons with high Er81 expression have more Kv1.1 channels, and these neurons (again primarily basket cells) show noticeable latencies. In slices it was found that cells of this kind receive mostly E input and much less I input.
It was then shown that these ‘types’ of neurons actually undergo adult plasticity. A simple experiment – stimulation with kainate, and inhibition with nifedipine, an L-type calcium channel blocker – showed that Er81 expression was regulated inversely with total network activity, and that this was observable after approximately two hours. So this is a kind of internal plasticity on the same time scale as LTP/LTD.
Additional experiments showed that Er81 plasticity is mediated by calcium entry into the cell (as are so many other forms of plasticity), so we have evidence for a cell-specific regulation of Er81.
More precisely, the internal memory is the level of Er81. This can be a long-term storage element and remain constant over long time periods. The plasticity is intrinsic, i.e. it lies in the expression of ion channels. The internal memory sets a parameter on the membrane (µKAs, cf. Scheler 2013). When the internal memory changes – a new value emerges and the old value is overwritten – there is a read-out at the membrane in terms of the µKAs parameter. In this particular case it may seem as if the internal value is superfluous and µKAs is identical to epsilon(Er81). But this is a mistake: in reality, µKAs is set by a number of factors, and epsilon(Er81) very probably has other effects in the system as well.
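A schematic way to express this separation of internal memory and membrane read-out, purely illustrative: the functional forms, constants, and the sign and shape of the calcium dependence are placeholders, and only the chain Er81 → Kv1.1 → latency follows the paper:

# Sketch of the internal-memory read-out discussed above (hypothetical numbers).

class Interneuron:
    def __init__(self, er81=1.0):
        self.er81 = er81                 # internal memory: epsilon(Er81)

    def write(self, calcium_signal, lr=0.05):
        # Plasticity: calcium entry updates the stored Er81 level
        # (observed on a ~2 h time scale); the exact dependence is left abstract.
        self.er81 += lr * calcium_signal

    @property
    def kv11_density(self):
        # Read-out at the membrane: Er81 sets the Kv1.1 density, one of
        # several factors contributing to the membrane parameter muKAs.
        return 2.0 * self.er81

    @property
    def spike_latency(self):
        # More Kv1.1 -> longer delay to the first spike (arbitrary units).
        return 1.0 + self.kv11_density

cell = Interneuron()
cell.write(calcium_signal=0.5)           # one episode of plasticity
print(cell.kv11_density, cell.spike_latency)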
It is not clear from this work why the innervation by E and I neurons differs between the two types, nor how and whether this changes – on the same time scale, or at all.
A surprising observation from this paper is also that high activity causes latencies of interneurons to appear, while low activity abolishes them. One might have expected the opposite: with less latency there is more inhibition in the network, so high activity should abolish latencies in order to upregulate inhibition. That is not the case.
Without a simulation, I’d guess that inhibitory latencies reduce excitatory pressure: activation is stored in the membrane potential of I neurons without letting them spike. There is then reduced spiking of I neurons, but still a reduction of overall excitation in the network, since the capacity of the I neuron to buffer synaptic input is enhanced. These neurons receive mostly E input because they have this buffer capacity; no-latency neurons, in contrast, participate in disinhibition – they respond to the level of inhibition as well and adjust their activity. With longer latencies, more activity is stored in the network, but there is less spiking. This is just a guess concerning the behavior of a real network.
Summarizing: A cytoplasmic protein Er81 regulates the density of Kv1.1 channels, which is a form of intrinsic plasticity that is set by a cell-internal calcium-related parameter. Neuronal activation of course increases calcium entry, so the internal parameter is influenced by external signals. The density of Kv1.1 channels influences spike latency and overall spike frequency. There is no synaptic plasticity in this scenario.
Tuning of fast-spiking interneuron properties by an activity-dependent transcriptional switch
Nathalie Dehorter et al.
Science, 11 Sep 2015: Vol. 349, Issue 6253, pp. 1216–1220
DOI: 10.1126/science.aab3415

Local Adjustment of a Biochemical Reaction System

This is an explanation that refers to Fig. 4 of the paper
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0055762.

Since the explanation there was brief, here is a better way to explain it:


The elementary psf results from using the kinetic parameters and executing a single reaction complex, i.e. one forward and one backward reaction. This is the minimal unit we need. For binding reactions this is A + B <-> AB (forward kon, backward koff); for enzymatic reactions it is A + E <-> AE -> A* + E, and A* -> A (forward kon, backward koff, kcat, and kcatd).
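For a binding reaction, the elementary psf can be computed directly from mass conservation and the steady-state condition. A small sketch with illustrative numbers (note that at steady state only the ratio koff/kon enters):

# Elementary psf of a binding reaction A + B <-> AB, computed from the
# kinetic parameters alone, with no surrounding system. Values are illustrative.
import numpy as np

kon, koff = 1.0, 10.0          # nM^-1 s^-1 and s^-1 (hypothetical)
Kd = koff / kon                # only the ratio matters at steady state
B_total = 100.0                # nM

def elementary_psf(A_total, B_total, Kd):
    # Steady-state [AB] from mass conservation and kon*A*B = koff*AB,
    # i.e. the physically meaningful root of the resulting quadratic.
    s = A_total + B_total + Kd
    return (s - np.sqrt(s**2 - 4.0 * A_total * B_total)) / 2.0

A_range = np.linspace(0.0, 50.0, 11)          # input range of A
print(elementary_psf(A_range, B_total, Kd))   # the elementary psf curve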

But in a system, every reaction is embedded. Therefore the elementary psf is changed. Example: one species participates in two reactions and binds to two partners. The kinetic rate parameters for the binding reaction are the same, but some amount of the species is sucked up by the other reaction.
Therefore, if we look at the psf, its curve will be different, and we call this the systemic psf. What this psf looks like obviously depends on the collection of reactions – in fact, on the collection of ALL connected reactions.

Now in practice, only a limited number of “neighboring reactions” will have an effect. This has also been found in other papers, i.e. the observation that local changes at one spot do not “travel” far.
Therefore we can now do a neat trick:

We look at a whole system and focus on a single psf, i.e. a systemic psf. Example:
GoaGTP binds to AC5Ca and produces GoaGTPAC5Ca. In this system, the binding reaction is very weak. The curve over the range of GoaGTP (~10–30 nM) goes from near 0 to maybe 5 nM at most. We may decide, or may have measured, that we want to improve the model at this point. We may use data that indicate a curve going from about 10 nM to about 50 nM for the same input of GoaGTP (~10–30 nM). The good thing is that we can define just such a curve using hyperbolic parameters. We have measured, or want to place the curve such, that ymax = 220, C = 78, n = 1.
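Assuming the usual hyperbolic (Hill-type) form y = ymax * x^n / (C^n + x^n), the target systemic psf can then be written down directly with the parameters quoted above:

# The target systemic psf as a hyperbolic (Hill-type) curve. The functional
# form is my assumption here; the parameter values are the ones quoted above.
import numpy as np

def hill_psf(x, ymax=220.0, C=78.0, n=1.0):
    return ymax * x**n / (C**n + x**n)

goa_gtp = np.linspace(10.0, 30.0, 5)     # nM, the relevant input range
print(hill_psf(goa_gtp))                 # intended systemic psf values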

So now we know what the systemic psf should be, but how do we get there? We adjust the underlying kinetic rate parameters for this reaction and any neighboring reactions such that this systemic psf results (and the others do not change, or change only very little).
This can obviously be done by an iterative process (a toy sketch follows the list):

  • adjust the reaction itself first (change kinetic rates),
  • then adjust every other reaction which has changed (change kinetic rates),
  • and continue until the new goal is met and all other psfs are still the same.
  • Use reasonable error ranges to define “goal is met” and “psfs are the same”.
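Here is a toy sketch of this loop on a deliberately small system: one focal binding reaction plus one neighboring reaction competing for the same species. The species names, concentrations, the “measured” target, and the grid search over Kd are all hypothetical stand-ins; only the loop structure (adjust, recompute systemic psfs, check error bounds) mirrors the procedure above:

# Toy version of the local-adjustment procedure (all numbers hypothetical).
import numpy as np

B_tot, C_tot, Kd_AC = 100.0, 50.0, 20.0    # nM; the neighboring reaction A + C <-> AC
A_inputs = np.linspace(10.0, 30.0, 5)      # nM, input range of total A

def free_A(A_tot, Kd_AB):
    # Bisection on the mass balance A_free + [AB] + [AC] = A_total,
    # with [AB] = A_free*B_tot/(Kd_AB + A_free) (B conservation folded in).
    lo, hi = 0.0, A_tot
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        bound = mid * B_tot / (Kd_AB + mid) + mid * C_tot / (Kd_AC + mid)
        if mid + bound > A_tot:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def psf_AB(Kd_AB):
    # Systemic psf of the focal reaction: [AB] as a function of total A.
    out = []
    for a_tot in A_inputs:
        a = free_A(a_tot, Kd_AB)
        out.append(a * B_tot / (Kd_AB + a))
    return np.array(out)

def psf_AC(Kd_AB):
    # Systemic psf of the neighboring reaction.
    out = []
    for a_tot in A_inputs:
        a = free_A(a_tot, Kd_AB)
        out.append(a * C_tot / (Kd_AC + a))
    return np.array(out)

# Pretend the "measured" systemic psf corresponds to stronger binding than the
# current model (Kd = 5 nM instead of 50 nM), and record the neighbor's psf.
target, old_neighbor = psf_AB(5.0), psf_AC(50.0)

# Step 1: adjust the focal reaction until its systemic psf matches the target
# within an error bound (a coarse grid search over Kd stands in for any
# smarter update of kon/koff).
grid = np.logspace(-1, 3, 400)
residuals = [np.max(np.abs(psf_AB(k) - target)) for k in grid]
Kd_fit = grid[int(np.argmin(residuals))]

# Step 2: check whether the neighbor's psf has drifted outside its own error
# bound; if so, it would be adjusted next, and so on until all psfs are stable.
drift = np.max(np.abs(psf_AC(Kd_fit) - old_neighbor))
print("fitted Kd:", Kd_fit)
print("residual on target psf:", min(residuals))
print("neighbor psf drift:", drift)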

Without error ranges, I do not offer a proof that such a procedure will always converge. As a matter of fact, I suspect it may NOT always be possible. Therefore we need reasonable error ranges.
In practice, I believe that in most cases only 2, 3, maybe 4 reactions are affected at all; everything else will have such small adjustments that it is not worth touching. These effects remain very local. In the example given, only one other reaction was changed at all.

The decisive point is that we can often measure such a systemic psf, i.e. such a transfer function, somewhere in the system, and therefore calibrate the system independently.

We measure the systemic psf, and we now have a procedure to force the system into matching this new measurement by adjusting kinetic rates, using the psf parameters to define the intended, adjusted local transfer function.

In many cases, as in the given example, this allows us to locally and specifically test and improve the system – this is novel, and it only works because we made a clear conceptual distinction between kinetic rate parameters (which are elementary) and systemic psf parameters.

We do not derive kon vs. koff or the precise dynamics in this way. For a binding reaction, only the ratio koff/kon (= Kd) matters; for an enzymatic reaction it is koff/kon and kcat/kcatd. There are multiple solutions. Dynamic matching may filter out which ones match not only the transfer function but also the timing. This has not been addressed, because it would only be another filtering step.

The procedure outlined for local adjustment of a biochemical reaction system still needs to be implemented, and more experience needs to be gained on the spread of local adjustments and on reasonable error bounds.

AMP and cAMP

When the steady-state level of cAMP rises, the AMP:ATP ratio in a cell also increases.

“In cardiomyocytes, β2-AR stimulation resulted in a reduction in ATP production but was accompanied by a rise in its precursor, AMP … The AMP/ATP ratio was enhanced …, which subsequently led to the activation of AMP-activated kinase (AMPK) …” (Li et al. 2010, J Physiol).

This activates AMP kinase (AMPK), which phosphorylates TSC2 and RAPTOR, a component of mTORC1, and thereby deactivates mTORC1. mTORC1 is a protein complex that is activated by nutrients and growth factors, and it is of importance in neurodegeneration. Together with PDK1, it activates S6K1, which stimulates protein synthesis via the ribosomal protein S6. S6K1 and mTORC1 are caught in a positive feedback loop.

In other words, we have a complex integration of signals that converge on the ribosome to influence protein synthesis by sensing energy levels in the cell. Basically, AMPK decreases protein synthesis (via mTORC1).
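As a purely qualitative sketch of this signal chain (boolean abstraction, made-up threshold; not a quantitative model):

# Qualitative sketch of the chain AMP/ATP -> AMPK -> mTORC1 -> S6K1/S6.
# The threshold and the boolean abstraction are illustrative only.

def protein_synthesis_drive(amp_atp_ratio, nutrients=True, growth_factors=True,
                            threshold=0.01):
    ampk_active = amp_atp_ratio > threshold        # energy stress sensor
    mtorc1_active = nutrients and growth_factors and not ampk_active
    s6k1_active = mtorc1_active                    # (PDK1 assumed permissive)
    return s6k1_active                             # ~ stimulation of ribosomal S6

print(protein_synthesis_drive(0.01))   # baseline ratio: synthesis on
print(protein_synthesis_drive(0.05))   # energy stress: AMPK on, synthesis off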

Under optimal physiological conditions, the AMP-to-ATP ratio is maintained at a level of 0.01 (*).

(*) Hardie DG and Hawley SA. AMP-activated protein kinase: the energy charge hypothesis revisited. Bioessays 23: 1112–1119, 2001.

And here is something entirely different: sensing pH levels.

“Intracellular acidification, another stimulator of in vivo cAMP synthesis, but not glucose, caused an increase in the GTP/GDP ratio on the Ras proteins.” (Rolland F et al. 2002)

So there is a lot that is very interesting about cAMP’s connection to cellular state sensing, and about its role in mediating between cellular state and protein synthesis.

Homeostatic Regulation – LDL Receptors

A recent news story covered the development of drugs targeting PCSK9, a “pro-protein” that decreases the density of LDL receptors (e.g. in the liver). The interesting part for the computational biologist is the regulation of LDL receptors: the density of LDL receptors depends on the amount of LDL available in the bloodstream. With more LDL, their density increases. Another way to increase LDLR density is statins (drugs). However, statins also activate PCSK9, and PCSK9 acts to decrease LDLR. In other words, it looks as if we have a classic homeostatic regulation, and by interfering with it at one point we may activate processes that counteract the desired drug effect. If we now add PCSK9 inhibitors (by monoclonal antibodies, or as in this case by RNA interference), we believe we may have a more radical effect on keeping LDLR active. In any case, it tells us that we need to understand the system we interfere with.
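A minimal sketch of this counteracting regulation, with illustrative (not fitted) constants, showing how induction of PCSK9 blunts the statin effect on LDLR and how inhibiting PCSK9 releases it:

# Toy steady-state model of the statin / PCSK9 / LDLR interaction described
# above. All constants are illustrative placeholders.

def steady_state_ldlr(statin=0.0, pcsk9_inhibition=0.0,
                      base_expr=1.0, statin_gain=1.0,
                      pcsk9_base=1.0, pcsk9_gain=1.0, degradation=0.5):
    expression = base_expr + statin_gain * statin               # statin raises LDLR expression
    pcsk9 = (pcsk9_base + pcsk9_gain * statin) * (1.0 - pcsk9_inhibition)
    # LDLR balance: synthesis / (basal turnover + PCSK9-mediated removal)
    return expression / (degradation + pcsk9)

print(steady_state_ldlr())                                   # baseline
print(steady_state_ldlr(statin=1.0))                         # statin alone: gain is blunted
print(steady_state_ldlr(statin=1.0, pcsk9_inhibition=0.9))   # statin + anti-PCSK9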