Here is an image from the 10th Art of Neuroscience contest. It is striking in that it gives a view of the brain in which there are two major parts: the cerebrum and the cerebellum. Of course there are a lot of interesting structures hidden within the cerebrum, but still: Is this a von Neumann architecture, memory and processor? No, we already know that memory is a ubiquitous phenomenon and not localized to any brain structure (notwithstanding the function of the hippocampus). How about a CPU and a GPU? Closer, since the cerebellum is (a) structurally remarkably homogeneous, suitable for performing many routine operations, (b) fast, an order of magnitude faster than the cortex, (c) considered to perform ‘supportive’ functions in terms of smooth motor movements, complex perception, but also standard thought processes, always improving fast coordination and integration of information, (d) not required for survival, since it mostly speeds up, improves and calibrates brain processes, (e) rich in processing units (50% of all neurons of the brain), even though its function is subsidiary to cortex, and (f) limited by a very constrained feedback modality, a bottleneck in information transfer through the deep cerebellar nuclei, even though it receives large amounts of information from the cortex. Although the cerebellum does not do graphical displays, it may “image” cortical information, i.e. represent it in a specific, simple format, and use this to guide motor movements, while providing only limited feedback about its performance. Food for thought.
Substantial work (NasifFJetal2005a, NasifFJetal2005b, DongYetal2005, HuXT2004, Marinellietal2006) has shown that repeated cocaine administration changes the intrinsic excitability of prefrontal cortical (PFC) neurons (in rats) by altering the expression of ion channels. It downregulates voltage-gated K+ channels and increases membrane excitability in principal (excitatory) PFC neurons.
An important consequence of this result is the following: by restricting expression levels of major ion channels, the capacity of the neuron to undergo intrinsic plasticity (IP) is limited, and therefore its learning or storage capacity is reduced.
Why is IP important?
It is often assumed that the “amount” of information that can be stored by the whole neuron is restricted compared to each of its synapses, and therefore IP cannot have a large role in neural computation. This view is based on a number of assumptions, namely (a) that IP is only expressed by a single parameter such as a firing threshold or a bit value indicating internal calcium release, (b) that IP could be replaced by a “bias term” for each neuron, essentially another parameter on a par with its synaptic parameters and trainable along with these, (c) at most, that this bias term is multiplicative, not additive like synaptic parameters, but still just one learnable parameter, and (d) that synapses are independently trainable, on the basis of associative activation, without a requirement of the whole neuron to undergo plasticity. Since the biology of intrinsic excitability and plasticity is very complex, many of its aspects could be relevant in a neural circuit (or tissue), and it is challenging to extract the plausible components which could be most significant for IP – it is certainly a fruitful area for further study.
In our latest paper we challenge mostly (d), i.e. we advocate a model, where IP implies localist SP (synaptic plasticity), and therefore the occurrence of SP is tied to the occurrence of IP in a neuron. In this sense the whole neuron extends control of plasticity over its synapses, in this particular model, over its dendritic synapses. It is well-known that some neurons, such as in dentate gyrus, exhibit primarily presynaptic plasticity, i.e. control over axonal synapses (mossy fiber contacts onto hippocampal CA3 neurons), but we have not focused on this question concerning cortex from the biological point of view. In any case, if this model captures an important generalization, then cocaine dependency leads to reduced IP, and, as a consequence reduced SP, at a principal neuron’s site. If the neuron is reduced in its ability to learn, i.e. to adjust its voltage-gated K+ channels, such that it operates with heightened membrane excitability, then its dendritic synapses should also be restricted in their capacity to learn (for instance to undergo LTD).
As a matter of fact, a more recent paper (Otisetal2018) shows that if we block intrinsic excitability during recall in a specific area of the PFC (prelimbic PFC), memories encoded in this area are actually prevented from becoming activated.
If we compare a small mouse cortex with a large human cortex, the connectivity per neuron is approximately the same (10^4/neuron SchuezPalm1989). So why did humans add so many neurons, and why did the connectivity remain constant? For the latter question we may conjecture that a maximal size is already reached in the mouse. Our superior human cognitive skills thus rest on the increased number of neurons in cortex, which means the number of modules (cortical microcolumns) went up, not the synaptic connectivity as such.
A new preprint by Filipovicetal2009* shows that striatal projection neurons (MSNs) receive different amounts of input, dependent on whether they are D2-modulated, and part of the indirect pathway, or D1-modulated, and part of the direct pathway. In particular membrane fluctuations are higher in the D1-modulated neurons (mostly at higher frequencies): they receive both more inhibitory and excitatory input. This also means that they are activated faster.
The open question is: what drives the difference in input? Do they have stronger synapses or more synapses? If the distribution of synaptic strength is indeed universal, they could have stronger synapses overall (a different peak of the distribution), or more synapses (a larger area under the curve).
Assuming that synapses adapt to the level of input they receive, having stronger synapses would be equivalent to being connected to higher frequency neurons; but there would be a difference in terms of fluctuations of input. Weak synapses have low fluctuations of input, while strong synapses, assuming they are sent out from neurons with a higher frequency range, have larger fluctuations in input to the postsynaptic neuron.
It is also possible that the effect results from a higher amount of correlation in synaptic input to D1-modulated neurons than D2-modulated neurons. However, since correlations are an adaptive feature in neural processing, it would be unusual to have an overall higher level of correlation to one of two similar neuronal groups: it would be difficult to maintain concurrently with fluctuations in correlation which are meaningful to processing (attention).
An additional observation is that dopamine depletion reduces the difference between D2- and D1-modulated MSNs. Since membrane fluctuations are due to differences of synaptic input (AMPA and GABA-A driven), but there is only conflicting evidence that D1 receptors modulate these receptors (except NMDA receptors), one would postulate a presynaptic effect. So, possibly the effect is located at the indirect pathway, D2-modulated neurons, which receive less input when dopamine is present, and adjust to a lower level of synaptic input. (Alternatively, reduction of D1 activation could result in less NMDA/AMPA, more GABA-A, i.e. less synaptic input in a D1 dopamine-dependent way.) In the dopamine-depleted mouse, both pathways would receive approximately similar input. Under this hypothesis, it is not primarily differences in structural plasticity which result in different synaptic input levels, but instead a “soft-coded” (dopamine-coded) difference, which depends on dopamine levels and is realized by presynaptic/postsynaptic dopamine receptors. Further results will clarify this question.
*Thanks to Marko Filipovic for his input. The interpretations are my own.
In earlier work, we meticulously documented the distribution of synaptic weights and the gain (or activation function) in many different brain areas. We found a remarkable consistency of heavy-tailed, specifically lognormal, distributions for firing rates, synaptic weights and gains (Scheler2017).
Why are biological neural networks heavy-tailed (lognormal)?
Cell assemblies: Lognormal networks support models of a hierarchically organized cell assembly (ensembles). Individual neurons can activate or suppress a whole cell assembly if they are the strongest neuron or directly connect to the strongest neurons (TeramaeJetal2012).
Storage: Sparse strong synapses store stable information and provide a backbone of information processing. More frequent weak synapses are more flexible and add changeable detail to the backbone. Heavy-tailed distributions allow a hierarchy of stability and importance.
Time delay: The time delay of activation is reduced because strong synapses quickly activate a whole assembly (IyerRetal2013). This reduces the initial response time, which is dependent on the synaptic and intrinsic distribution. Heavy-tailed distributions activate fastest.
Noise response: Under additional input, noise or patterned, the pattern stability of the existing ensemble is higher (IyerRetal2013, see also KirstCetal2016). This is a side effect of integration of all computations within a cell assembly.
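The storage argument above rests on the idea that a few strong synapses form a backbone. As a rough illustration, the following sketch compares the share of total synaptic weight carried by the strongest 10% of synapses in a lognormal sample and in a narrow Gaussian sample. The parameters (mu = 0, sigma = 1 for the lognormal, mean 1 and standard deviation 0.2 for the Gaussian) and the helper name `top_share` are assumptions chosen for illustration, not values fitted to data.

```python
import random

def top_share(weights, fraction=0.1):
    """Share of total weight carried by the strongest `fraction` of synapses."""
    ranked = sorted(weights, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

rng = random.Random(0)
n = 10_000

# Heavy-tailed weights (assumed parameters mu=0, sigma=1).
lognormal = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
# Comparison: narrow Gaussian weights, clipped to stay positive.
gaussian = [max(1e-9, rng.gauss(1.0, 0.2)) for _ in range(n)]

lognormal_share = top_share(lognormal)
gaussian_share = top_share(gaussian)
print(f"top 10% share, lognormal: {lognormal_share:.2f}")
print(f"top 10% share, gaussian:  {gaussian_share:.2f}")
```

With these assumed parameters the strongest tenth of the lognormal synapses carries a much larger share of the total weight than in the Gaussian case, which is the sense in which sparse strong synapses can act as a backbone.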
Why hierarchical computations in a neural network?
Calculations which depend on interactions between many discrete points (N-body problems, Barnes and Hut 1986), such as particle-particle methods, where every point depends on all others, lead to an O(N^2) calculation. If we supplant this by hierarchical methods, and combine information from multiple points, we can reduce the computational complexity to O(N log N) or O(N).
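To make the complexity argument concrete, here is a minimal one-dimensional sketch in the spirit of Barnes and Hut: distant groups of points are replaced by their total mass at the center of mass, so one target needs far fewer interaction terms than the direct sum. The 1/r interaction, the opening parameter `theta = 0.3` and all names are assumptions for illustration; this is a toy, not the original algorithm in full.

```python
import random

class Node:
    """Binary tree over sorted 1D points; each node summarises its points."""
    def __init__(self, pts):                      # pts: sorted list of (position, mass)
        self.mass = sum(m for _, m in pts)
        self.com = sum(x * m for x, m in pts) / self.mass   # center of mass
        self.width = pts[-1][0] - pts[0][0]
        self.point = pts[0] if len(pts) == 1 else None
        mid = len(pts) // 2
        self.children = [] if len(pts) == 1 else [Node(pts[:mid]), Node(pts[mid:])]

def direct_sum(pts, x):
    """Particle-particle method: one term per other point."""
    return sum(m / abs(x - px) for px, m in pts if px != x)

def tree_sum(node, x, theta, counter):
    """Hierarchical method: far-away nodes contribute a single combined term."""
    if node.point is not None:
        px, m = node.point
        if px == x:                               # skip the target itself
            return 0.0
        counter[0] += 1
        return m / abs(x - px)
    dist = abs(x - node.com)
    if dist > 0 and node.width < theta * dist:    # node is small relative to its distance
        counter[0] += 1
        return node.mass / dist
    return sum(tree_sum(c, x, theta, counter) for c in node.children)

rng = random.Random(1)
pts = sorted((rng.random(), 1.0) for _ in range(2000))
root = Node(pts)
target = pts[1000][0]

counter = [0]                                     # counts interaction terms used by the tree
approx = tree_sum(root, target, theta=0.3, counter=counter)
exact = direct_sum(pts, target)
print(f"direct terms: {len(pts) - 1}, tree terms: {counter[0]}")
print(f"relative error: {abs(approx - exact) / exact:.4f}")
```

The direct sum uses N - 1 terms per target, whereas the tree uses a number of terms that grows roughly with log N, at the price of a small approximation error controlled by `theta`.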
Since biological neural networks are not feedforward but connect in both forward and backward directions, they have a different structure from ANNs (artificial neural networks) – they consist of hierarchically organised ensembles with few wide-range excitability ‘hub’ neurons and many ‘leaf’ neurons with low connectivity and small-range excitability. Patterns are stored in these ensembles, and get accessed by a fit to an incoming pattern that could be expressed by low mutual information as a measure of similarity. Patterns are modified by similar access patterns, but typically only in their weak connections (else the accessing pattern would not fit).
Epigenetic modification is a powerful mechanism for the induction, the expression and persistence of long-term memory.
For long-term memory, we need to consider diverse cellular processes. These occur in neurons from different brain regions (in particular hippocampus, cortex, amygdala) during memory consolidation and recall. For instance, long-term changes in kinase expression in the proteome, changes in receptor subunit composition and localization at synaptic/dendritic membranes, epigenetic modifications of chromatin such as DNA methylation and histone methylation in the nucleus, and the posttranslational modifications of histones, including phosphorylation and acetylation, all play a role. Histone acetylation is of particular interest because a number of known medications function as histone deacetylase (HDAC) inhibitors, i.e. they have the potential to increase DNA transcription and memory (more on this in a later post).
Epigenetic changes are important because they define the internal conditions for plasticity for the individual neuron. They underlie, for instance, kinase- or phosphatase-mediated (de)activations of enzymatic proteins and therefore influence the probability that membrane proteins are altered by synaptic activation.
Among epigenetic changes, DNA methylation typically acts to alter, often to repress, DNA transcription at cytosines, or at CpG islands in vertebrates. The methylation status of DNA is regulated by enzymes such as Tet3, which catalyses an important step in the demethylation of DNA. In the dentate gyrus of live rats, it was shown that the expression of Tet3 is greatly increased by LTP (synaptically induced memorization), suggesting that certain DNA stretches were demethylated, and presumably activated. During induction of LTP by high frequency electrical stimulation, DNA methylation is changed specifically for certain genes known for their role in neural plasticity. The expression of neural plasticity genes is widely correlated with the methylation status of the corresponding DNA.
So there is interesting evidence for filtering the induction of plasticity via the epigenetic landscape and modifiers of gene expression, such as HDAC inhibitors. Substances which act as histone deacetylase (HDAC) inhibitors increase histone acetylation. An interesting result from research on fear suggests that HDAC inhibitors increase some DNA transcription, and specifically enhance fear extinction memories.
An important topic to understand intrinsic excitability is the distribution and activation of ion channels. In this respect the co-regulations between ion channels are of significant interest. MacLean et al. (2003) could show that overexpression of an A-type potassium channel by shal-RNA-injection in neurons of the stomatogastric ganglion of the lobster is compensated by upregulation of Ih such that the spiking behavior remained unaltered.
A non-functional shal mutant, whose overexpression did not affect spiking, had the same effect, which shows that the regulation does not happen at the membrane by measuring the spiking behavior. In this case, Ih was upregulated even though IA activity was unaltered, and spiking behavior was increased. (This is in contrast to e.g. O’Leary et al., 2013, who assume homeostatic regulation of ion channel expression at the membrane, by spiking behavior.)
In Drosophila motoneurons the expression of shal and shaker – both responsible for IA – is reciprocally coupled. If one is reduced, the other is upregulated to maintain a constant level of IA activity at the membrane. Other ion channel currents, like INap and IM, are antagonistic in their effects, which means that their expression correlates positively: if one is reduced, the other is reduced as well to achieve the same level of effect (Golowasch2014). A number of publications have documented similar effects, e.g. MacLean et al., 2005; Schulz et al., 2007; Tobin et al., 2009; O’Leary et al., 2013.
We must assume that the expression level of ion channels is regulated and sensed inside the cell and that the levels of genes for different ion channels are coupled – by genetic regulation or on the level of RNA regulation.
To summarize: When there is high IA expression, Ih is also upregulated. When one gene responsible for IA is suppressed, the other gene is more highly expressed, to achieve the same level of IA expression. When INap, a persistent sodium current, is reduced, IM, a potassium current, is also reduced.
It is important to note that these ion channels may compensate for each other in terms of overall spiking behavior, but they have subtly different properties of activation, e.g. by the pattern of spiking or by neuromodulation. For instance, if cell A reduces ion channel currents like INap and IM, compensating to achieve the same spiking behavior, once we apply neuromodulation to muscarinic receptors on A, this will affect IM, but not INap. The behavior of cell A, crudely the same, is now altered under certain conditions.
To model this – other than by a full internal cell model – requires internal state variables which guide ion channel expression, and therefore regulate intrinsic excitability. These variables would model ion channel proteins and their respective interaction, and in this way guarantee acceptable spiking behavior of the cell. This could lead to the idea of an internal module which sets the parameters necessary for the neuron to function. Such an internal module that self-organizes its state variables according to specified objective functions could greatly simplify systems design. Instead of tuning systems parameters by outside methods – which is necessary for ion-channel based models – each neuronal unit itself would be responsible for its ion channels and be able to self-tune them separately from the whole system.
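A minimal sketch of such an internal module might look as follows. It is a toy that merges the three couplings summarized above into one cell, which the experiments (in different species) do not claim; the variable names, set points, relaxation rule and rate constant are all assumptions made for illustration.

```python
def coregulate(state, targets, rate=0.1, steps=200):
    """Relax internal expression levels toward coupled set points.

    `state` holds expression levels; shal and INap are treated as externally
    manipulated, the other variables adjust to them (an assumed simplification).
    """
    s = dict(state)
    for _ in range(steps):
        # Reciprocal coupling: shaker compensates so that shal + shaker stays at the IA target.
        ia_error = targets["IA"] - (s["shal"] + s["shaker"])
        s["shaker"] = max(0.0, s["shaker"] + rate * ia_error)
        # Positive coupling: Ih follows shal expression, not IA activity at the membrane.
        s["Ih"] += rate * (targets["Ih_per_shal"] * s["shal"] - s["Ih"])
        # Positive coupling: IM follows INap.
        s["IM"] += rate * (targets["IM_per_INap"] * s["INap"] - s["IM"])
    return s

targets = {"IA": 2.0, "Ih_per_shal": 0.5, "IM_per_INap": 1.0}
baseline = {"shal": 1.0, "shaker": 1.0, "Ih": 0.5, "INap": 1.0, "IM": 1.0}

# Hypothetical manipulations: suppress shal and reduce INap.
knockdown = dict(baseline, shal=0.4, INap=0.5)
result = coregulate(knockdown, targets)
print({k: round(v, 3) for k, v in result.items()})
```

In this toy, suppressing shal drives shaker up until the summed IA level is restored, while IM falls together with INap. No spiking is simulated at all: the regulation acts purely on internal variables, which is the point of the argument above.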
Linked to the idea of internal state variables is the idea of internal memory, which I have referred to several times in this blog. If I have an internal module of co-regulated variables, which set external parameters for each unit, then this module may serve as a latent memory for variables which are not expressed at the membrane at the present time (see Er81). The time course of expression and activation at the membrane and of internal co-regulation need not be the same. This offers an opportunity for memory inside the cell, separated from information processing within a network of neurons.
The individual neuron’s state need not be determined only by the inputs received.
(a). It may additionally be seeded with a probability for adaptation that is distributed wrt the graph properties of the neuron (like betweenness centrality, choke points etc.), as well as the neuron’s current intrinsic excitability (IE) (which are related). This seeded probability would correspond to a sensitivity of the neuron to the representation that is produced by the subnetwork. The input representation is transformed by the properties of the subnetwork.
(b). Another way to influence neurons independent of their input is to link them together. This can be done by simulating neuromodulators (NMs) which influence adaptivity for a subset of neurons within the network. There are then neurons which are linked together and increase or turn on their adaptivity because they share the same NM receptors. Different sets of neurons can become activated and increase their adaptivity whenever a sufficient level of a NM is reached. An additional learning task is then to identify suitable sets of neurons. For instance, neurons may encode aspects of the input representation that result from additional, i.e. attentional, signals co-occurring with the input.
(c). Finally, both E and I neurons are known to consist of morphologically and genetically distinct types. This opens up additional ways of creating heterogeneous networks from these neuron types and have distinct adaptation rules for them. Some of the neurons may not even be adaptive, or barely adaptive, while others may be adaptive only once, (write once, read-only), or be capable only of upregulation, until they have received their limit. (This applies to synaptic and intrinsic adaptation). Certain neurons may have to follow the idea of unlimited adaptation in both directions in order to make such models viable.
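Points (b) and (c) can be sketched together in a few lines. In this sketch a unit only adapts when a neuromodulator it has receptors for is above a threshold, and its type then decides which changes are allowed. The type labels (`"fixed"`, `"write_once"`, `"up_only"`, `"free"`), the neuromodulator names, the threshold and the function `adapt` are hypothetical choices for illustration, not part of any existing model.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    kind: str                            # "fixed", "write_once", "up_only" or "free"
    receptors: frozenset = frozenset()   # names of NMs this unit responds to
    excitability: float = 1.0
    written: bool = False                # used by write-once units only

def adapt(unit, delta, nm_levels, threshold=0.5):
    """Apply an intrinsic change `delta` only if a matching NM gates adaptivity on."""
    gated_on = any(nm_levels.get(nm, 0.0) >= threshold for nm in unit.receptors)
    if not gated_on or unit.kind == "fixed":
        return unit.excitability
    if unit.kind == "write_once":
        if unit.written:                 # already written: now read-only
            return unit.excitability
        unit.written = True
    if unit.kind == "up_only" and delta < 0:
        return unit.excitability         # downregulation is not allowed
    unit.excitability += delta
    return unit.excitability

nm_levels = {"dopamine": 0.8, "acetylcholine": 0.1}
free_da = Unit("free", frozenset({"dopamine"}))
free_ach = Unit("free", frozenset({"acetylcholine"}))
up_only = Unit("up_only", frozenset({"dopamine"}))
once = Unit("write_once", frozenset({"dopamine"}))

r_free_da = adapt(free_da, 0.2, nm_levels)     # gated on by dopamine
r_free_ach = adapt(free_ach, 0.2, nm_levels)   # acetylcholine below threshold
r_up_only = adapt(up_only, -0.2, nm_levels)    # decrease refused
r_once_first = adapt(once, 0.2, nm_levels)     # first write succeeds
r_once_second = adapt(once, 0.2, nm_levels)    # second write ignored
```

The units sharing the dopamine receptor form a linked group whose adaptivity is switched on together, while the unit type restricts what each member of the group may do.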
Similar variants in neuron behavior are known from technical applications of ANNs: hyperparameters that link individual parameters into groups (‘weight sharing’) have been used, terms like ‘bypassing’ mean that some neurons do not adjust, only transmit, and ‘gating’ means that neurons may regulate the extent of transmission of a signal (cf. LSTM, ScardapaneSetal2018). Separately, the optimizer Adam (or AdamW) has been proposed, which computes adaptive learning rates for each parameter and achieves fast convergence.
A neuron-centric biological network model (‘neuronal automaton’) offers a systematic approach to such differences in adaptation. As suggested, biological neurons have different capacities for adaptation and this may extend to their synaptic connections as well. The model would allow learning different activation functions and different adaptivity for each neuron, helped by linking neurons into groups and using fixed genetic types in the setup of the network. In each specific case the input is represented by the structural and functional constraints of the network and therefore transformed into an internal, egocentric representation.
Cellular intelligence refers to information processing in single cells, i.e. genetic regulation, protein signaling and metabolic processing, all tightly integrated with each other. The goal is to uncover general ‘rules of life’ wrt e.g. the transmission of information, homeostatic and multistable regulation, learning and memory (habituation, sensitization etc.). These principles extend from unicellular organisms like bacteria to specialized cells, which are parts of a multicellular organism.
A prominent example is the ubiquitous role of feedback cycles in cellular information processing. These are often nested, or connected to a central hub, as a set of negative feedback cycles, sometimes interspersed with positive feedback cycles as well. Starting from Norbert Wiener’s work on cybernetics, we have gained a deeper understanding of this regulatory motif, and the complex modules that can be built from a multitude of these cycles by modeling as well as mathematical analysis.
Another motif that is similar in significance and ubiquity is antagonistic interaction. A prototypical antagonistic interaction consists of a signal, two pathways, one negative, one positive, and a target. The signal connects to the target by both pathways. No further parts are required.
On the face of it, this interaction seems redundant. When you connect a signal to a target by a positive and a negative connection, the amount of change is a sum of both connections, and for this, one connection should be sufficient. But this motif is actually very widespread and powerful, and there are two main aspects to this:
A. Gearshifting, scale-invariance or digitalization of input: for an input signal that can occur at different strengths, the antagonistic transmission allows the signal to be shifted to a lower level/gear with a limited bandwidth compared to the input range. This can also be described as scale-invariance or standardization of the input, or in the extreme case, digitalization of an analog input signal.
B. Fast onset-slow offset response curves: in this case the double transmission lines are used with a time delay. The positive interaction is fast, the negative interaction is slow. Therefore there is a fast peak response with a slower relaxation time, which is useful in many biological contexts where fast reaction times are crucial.
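Aspect B can be illustrated with a very small simulation. In the sketch below the signal reaches the target through a fast positive pathway and a slow negative pathway, each modeled as a first-order low-pass filter and integrated with the Euler method. The time constants, the gain of the negative pathway and the function name are assumptions picked to make the effect visible.

```python
def antagonistic_response(signal, dt=0.01, tau_fast=0.1, tau_slow=2.0, gain_neg=0.8):
    """Target response to `signal` via a fast positive and a slow negative pathway."""
    pos = neg = 0.0
    out = []
    for s in signal:
        pos += dt * (s - pos) / tau_fast    # fast positive pathway
        neg += dt * (s - neg) / tau_slow    # slow negative pathway
        out.append(pos - gain_neg * neg)    # both pathways converge on the target
    return out

dt = 0.01
signal = [1.0] * 2000                       # step input, 20 time units long
response = antagonistic_response(signal, dt=dt)

peak = max(response)
peak_time = response.index(peak) * dt
steady = response[-1]
print(f"peak {peak:.2f} at t={peak_time:.2f}, steady state {steady:.2f}")
```

The response rises quickly to a peak and then relaxes slowly to a lower plateau set by `1 - gain_neg`. Note that `signal` itself is never modified, so the same signal could drive other targets unchanged.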
Negative feedback cycles can achieve similar effects by acting on the signal itself: the positive signal is counteracted by a negative input which reduces the input signal. The result is again a fast peak response followed by a downregulation to an equilibrium value. The advantage of antagonistic interactions is that the original signal is left intact, which is useful because the same signal may act on other targets unchanged. In a feedback cycle the signal itself is consumed by the feedback interaction. The characteristic shape of the signal, a fast peak response with a slower downregulation, may therefore arise from different structures.
The type of modules that can be built from both antagonistic interactions and feedback have not been explored systematically. However, one example is morphogenetic patterning, often referred to as ‘Turing patterns’, which relies on a positive feedback cycle by an activator, plus antagonistic interactions (activator/inhibitor) with a time delay for the inhibitor.
Current synaptic plasticity models have one decisive property which may not be biologically adequate, and which has important repercussions on the type of memory and learning algorithms in general that can be implemented: Each processing or transmission event is an adaptive learning event.
In contrast, in biology, there are many pathways that may act as filters between the use of a synapse and the adaptation of its strength. In LTP/LTD, typically 20 minutes are necessary to observe the effects. This requires the activation of intracellular pathways, often the co-occurrence of a GPCR activation, and even nuclear read-out.
Therefore we have suggested a different model, greatly simplified at first to test its algorithmic properties. We include intrinsic excitability in learning (LTP-IE, LTD-IE). The main innovation is that we separate learning or adaptation from processing or transmission. Transmission events leave traces at synapses and neurons that disappear over time (short-term plasticity), unless they add up over time to unusually high (low) neural activations, something that can be determined by threshold parameters. Only if a neuron engages in a high (low) activation-plasticity event do we get long-term plasticity at both neurons and synapses, in a localized way. Such a model is in principle capable of operating in a sandpile fashion. We do not know yet what properties the model may exhibit. Certain hypotheses exist concerning abstraction and compression of a sequence of related inputs, and the development of individual knowledge.
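The separation of transmission from learning can be sketched for a single neuron. This is not the model from the paper, only a toy under stated assumptions: a single decaying trace per neuron, fixed high and low thresholds, a scalar `gain` standing in for intrinsic excitability, and a simple Hebbian-style weight change applied only when the neuron itself undergoes a plasticity event. All names and constants are hypothetical.

```python
def run(inputs, weights, decay=0.9, high=3.0, low=-3.0, step=0.1):
    """Process a sequence of input vectors.

    Returns (weights, gain, events), where `events` counts long-term plasticity events.
    """
    weights = list(weights)
    gain = 1.0       # stand-in for intrinsic excitability
    trace = 0.0      # short-term trace left by transmission events
    events = 0
    for x in inputs:
        activation = gain * sum(w * xi for w, xi in zip(weights, x))
        trace = decay * trace + activation          # transmission only updates the trace
        if trace > high or trace < low:             # unusually high (low) activation
            sign = 1.0 if trace > high else -1.0
            gain += sign * step                     # LTP-IE / LTD-IE at the neuron
            # Synapses change only together with the neuron (localized plasticity).
            weights = [w + sign * step * xi for w, xi in zip(weights, x)]
            trace = 0.0
            events += 1
    return weights, gain, events

start = [0.5, 0.5]
# Weak input, repeated often: traces decay and never reach the threshold.
weak_w, weak_gain, weak_events = run([[0.2, 0.2]] * 50, start)
# Strong input, repeated briefly: the trace accumulates and crosses the threshold.
strong_w, strong_gain, strong_events = run([[1.0, 1.0]] * 4, start)
print(weak_events, strong_events, strong_w, strong_gain)
```

In this toy, fifty weak transmission events leave weights and gain untouched, whereas four strong events in a row trigger one long-term change at both the neuron and its synapses.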