What’s wrong with modeling?

A typical task in theoretical neuroscience is “modeling”, for instance building and analysing network models of cerebellar cortex that incorporate the diversity of cell types and synaptic connections observed in this structure. The goal is to better understand the functions of these diverse cell types and connections. Approaches from statistical physics, nonlinear dynamics and machine learning can be used, and the models should be “constrained” by electrophysiological, transcriptomic and connectomic data.

This all sounds very well and quite innocuous. It’s “state of the art”. So what is wrong with this approach, which dominates current computational neuroscience?

What is wrong is the belief that a detailed, bottom-up model could fulfill the goal of understanding function. Such a model is only a way to synthesize existing information. Many, indeed most, aspects of the model are selected in advance, and much existing information is left out because it is irrelevant for the model. Irrelevant for the model does not mean that it is not crucial for the function of the real biological object, such as a neuron. For instance, for decades the fact that many neurons adapt their ion channels under behavioral learning was simply ignored. Then the notion of whole-neuron learning became acceptable, and under the term “intrinsic excitability” it slowly became part of computational modeling; now various functions are being discovered for changes that were first dismissed as “homeostatic”, i.e. accepted only as house-keeping.

If we had started from a top-down model, in terms of what makes sense (what would evolution prefer) and what is logically required or useful for the neuron to operate, then we would have realized a long time ago that whole-neuron (intrinsic) excitability is a crucial part of learning. This paper was first posted on arXiv in 2005, but accepted for publication only in 2013, when intrinsic excitability had become more widely known.

Language in the brain

We need quite different functionality from statistics to model language. We may need this functionality in other areas of intelligence as well, but with language it is obvious. Or, for the DL community – we can model anything with statistics, of course. It will just not be a very good model …

What follows is that the synaptic plasticity that produces statistical learning does not allow us to build a human language model. Weight adjustment of connections in a graph is simply not sufficient – under any possible model of language – to capture language competence.

This is where it becomes interesting. We have just stated that the synaptic plasticity hypothesis of memory is wrong, or else our mammalian brains would be incapable of producing novel sentences and novel text, something we have not memorized before.

The next paradigm: AI, Neural Networks, the Symbolic Brain

I don’t think neural networks are still in their infancy, I think they are moribund. Some say they have hit a roadblock. AI used to be based on rules, symbols and logic. Then probability and statistics came along, and neural networks were created as an efficient statistical paradigm. But unfortunately, the old AI (GOFAI) was discontinued; it was replaced by statistics. Since then, ever more powerful computers and ever more data have come along, so statistics, neural networks (NN) and machine learning seemed to create value, even though they were based on simple concepts from the late eighties and early nineties, or even earlier. We failed to develop further, because success came easy. Some argue for hybrid models, combining GOFAI and NN. But that was tried early on, and it wasn’t very successful.

What we now need is a new and deeper understanding of what the brain actually does. Because it obviously does symbol manipulation, it does logic, it does math. Most importantly, we humans learned to speak, using nothing better than a mammalian brain (with a few specializations). I believe there is a new paradigm out there which can fulfill these needs: language, robotics, cognition, knowledge creation. I call it the vertical-horizontal model: a model of neuron interaction in which the neuron is a complete microprocessor of a very special kind. This allows us to build a symbolic brain. There will be a trilogy of papers to describe this new paradigm, and a small company to build the necessary concepts. At the present time, here is a link to an early draft – hard to read, not a paper, more a collection right now, but ready for feedback at my email! I’ll soon post a summary here as well.

Two parts of the brain

Here is an image from the 10th Art of Neuroscience contest. It is striking in that it gives a view of the brain in which there are two major parts: the cerebrum and the cerebellum. Of course there are a lot of interesting structures hidden within the cerebrum, but still: Is this a von Neumann architecture, memory and processor? No, we already know that memory is a ubiquitous phenomenon and not localized to any brain structure (notwithstanding the function of the hippocampus). How about a CPU and a GPU? Closer, since the cerebellum
(a) is structurally remarkably homogeneous, suitable for performing many routine operations,
(b) is fast, by an order of magnitude faster than the cortex,
(c) is considered to perform ‘supportive’ functions in terms of smooth motor movements, complex perception, but also standard thought processes, always improving fast coordination and integration of information,
(d) mostly speeds up, improves and calibrates brain processes, so that an organism is able to survive without it,
(e) contains very many processing units (the majority of all neurons of the brain), even though its function is subsidiary to cortex, and
(f) has a very constrained feedback modality, a bottleneck in information transfer through the deep cerebellar nuclei, even though it receives large amounts of information from the cortex.
Although the cerebellum does not do graphical displays, it may “image” cortical information, i.e. represent it in a specific, simple format, and use this to guide motor movements, while providing only limited feedback about its performance. Food for thought.

Why a large cortex?

[Image: mouse]

If we compare a small mouse cortex with a large human cortex, the connectivity per neuron is approximately the same (about 10^4 synapses per neuron; Schüz & Palm 1989). So why did humans add so many neurons, and why did the connectivity remain constant? For the latter question we may conjecture that the maximal connectivity per neuron is already reached in the mouse. Our superior human cognitive skills thus rest on the increased number of neurons in cortex, which means the number of modules (cortical microcolumns) went up, not the synaptic connectivity as such.
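A rough back-of-the-envelope sketch of this argument, using rounded literature figures as assumptions (cortical neuron counts of roughly 1.4×10^7 for the mouse and 1.6×10^10 for the human, ~10^4 synapses per neuron, and the common but debated estimate of ~100 neurons per microcolumn):

```python
# Back-of-the-envelope comparison with rounded, assumed literature values:
#   ~1e4 synapses per cortical neuron in both species (Schuez & Palm 1989)
#   cortical neuron counts: mouse ~1.4e7, human ~1.6e10 (approximate)
#   ~1e2 neurons per microcolumn (a common but debated estimate)
syn_per_neuron = 1e4
neurons_per_column = 1e2
neurons = {"mouse": 1.4e7, "human": 1.6e10}

for species, n in neurons.items():
    print(f"{species}: ~{n * syn_per_neuron:.1e} synapses, "
          f"~{n / neurons_per_column:.1e} microcolumns")
```

On these numbers the per-neuron connectivity is identical by assumption, while the number of modules differs by roughly three orders of magnitude – which is the point of the argument.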

Antagonistic regulation for cellular intelligence

Cellular intelligence refers to information processing in single cells, i.e. genetic regulation, protein signaling and metabolic processing, all tightly integrated with each other. The goal is to uncover general ‘rules of life’ with respect to, e.g., the transmission of information, homeostatic and multistable regulation, and learning and memory (habituation, sensitization etc.). These principles extend from unicellular organisms like bacteria to specialized cells that are part of a multicellular organism.

A prominent example is the ubiquitous role of feedback cycles in cellular information processing. These are often nested, or connected to a central hub, as a set of negative feedback cycles, sometimes interspersed with positive feedback cycles as well. Starting from Norbert Wiener’s work on cybernetics, we have gained a deeper understanding of this regulatory motif, and the complex modules that can be built from a multitude of these cycles by modeling as well as mathematical analysis.

Another motif that is similar in significance and ubiquity is antagonistic interaction. A prototypical antagonistic interaction consists of a signal, two pathways, one negative, one positive, and a target. The signal connects to the target by both pathways. No further parts are required.

On the face of it, this interaction seems redundant. When you connect a signal to a target by a positive and a negative connection, the amount of change is the sum of both contributions, and a single connection should be sufficient to produce it. But this motif is actually very widespread and powerful, and there are two main aspects to this:

A. Gearshifting, scale-invariance or digitalization of input: for an input signal that can occur at different strengths, the antagonistic transmission makes it possible to shift the signal to a lower level/gear, with a limited bandwidth compared to the input range. This can also be described as scale-invariance or standardization of the input, or, in the extreme case, digitalization of an analog input signal.

B. Fast onset-slow offset response curves: in this case the double transmission lines are used with a time delay. The positive interaction is fast, the negative interaction is slow. The result is a fast peak response with a slower relaxation time – useful in many biological contexts where fast reaction times are crucial (a minimal simulation of this is sketched below).

Negative feedback cycles can achieve similar effects by acting on the signal itself: the positive signal is counteracted by a negative input which reduces the input signal. The result is again a fast peak response followed by a downregulation to an equilibrium value. The advantage of antagonistic interactions is that the original signal is left intact, which is useful because the same signal may act on other targets unchanged. In a feedback cycle the signal itself is consumed by the feedback interaction. The characteristic shape of the signal – fast peak response with a slower downregulation – may therefore arise from different structures.
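To make point B concrete, here is a minimal sketch (in Python, with purely illustrative gains and time constants, not fitted to any particular biological system) of a step input driving a target through a fast positive and a slow negative pathway:

```python
import numpy as np

# Antagonistic motif, point B: one step-like input S drives the target through
# a fast positive pathway and a slow negative pathway. Time constants are
# illustrative assumptions.
dt, T = 0.01, 20.0
t = np.arange(0, T, dt)
S = (t >= 2.0).astype(float)        # step input switched on at t = 2

tau_fast, tau_slow = 0.2, 3.0       # positive path is ~15x faster here
x_pos = np.zeros_like(t)            # fast excitatory pathway activity
x_neg = np.zeros_like(t)            # slow inhibitory pathway activity

for i in range(1, t.size):
    x_pos[i] = x_pos[i-1] + dt * (S[i-1] - x_pos[i-1]) / tau_fast
    x_neg[i] = x_neg[i-1] + dt * (S[i-1] - x_neg[i-1]) / tau_slow

y = x_pos - x_neg                   # target response: fast peak, slow relaxation

print(f"peak response {y.max():.2f} at t = {t[y.argmax()]:.1f}, "
      f"response near t = {T:.0f}: {y[-1]:.2f}")
```

The target shows a rapid peak and a slow return toward baseline, while the input signal itself is never modified and remains available for other targets – in contrast to a negative feedback loop, which shapes the response by reducing the signal.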

The types of modules that can be built from antagonistic interactions together with feedback have not been explored systematically. However, one example is morphogenetic patterning, often referred to as ‘Turing patterns’, which relies on a positive feedback cycle of an activator, plus an antagonistic interaction (activator/inhibitor) with a time delay for the inhibitor.

[Image: Turing pattern]
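As a hedged illustration of this combination of motifs, the following sketch integrates a standard Gierer-Meinhardt-type activator-inhibitor system in one dimension; all parameter values are illustrative, and the only essential ingredients are activator self-amplification, antagonism by the inhibitor, and a much faster-spreading inhibitor:

```python
import numpy as np

# 1-D activator-inhibitor (Gierer-Meinhardt-type) sketch, illustrative parameters.
# The activator a amplifies itself and drives the inhibitor h; h antagonizes a
# and diffuses much faster, which destabilizes the uniform state into peaks.
n, L = 200, 10.0                  # grid points, domain length (arbitrary units)
dx = L / n
Da, Dh = 0.02, 1.0                # inhibitor diffuses 50x faster
mu_a, mu_h = 1.0, 2.0             # decay rates (mu_h > mu_a for a stable base state)
dt, steps = 0.0005, 200_000       # explicit Euler, small step for stability

rng = np.random.default_rng(0)
a = 2.0 + 0.02 * rng.standard_normal(n)   # uniform steady state a = h = 2, plus noise
h = 2.0 * np.ones(n)

def lap(u):
    """Discrete Laplacian with periodic boundaries."""
    return (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2

for _ in range(steps):
    da = Da * lap(a) + a * a / h - mu_a * a
    dh = Dh * lap(h) + a * a - mu_h * h
    a, h = a + dt * da, h + dt * dh

# After integration the activator shows regularly spaced peaks.
print("activator min/max:", round(float(a.min()), 2), round(float(a.max()), 2))
```

Starting from a nearly uniform state, the activator concentration self-organizes into a spatially periodic arrangement of peaks, the hallmark of a Turing pattern.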

Consciousness made easy

[Image: IMS Photo Contest 2017]

For a long time I didn’t know what research on consciousness was supposed to be about. Was it being able to feel and think? Was it perceptual awareness (as in ‘did you hear that sound?’)? What did attention have to do with it (the searchlight hypothesis), i.e. the fact that lots of stored information is present but not ‘in consciousness’ at any given moment in time?
Finally, while discussing the issue that no one, to my knowledge, has a good theory of anesthesia (i.e. how it happens and why it works), it occurred to me that we can simplify the question and make it solvable in a fairly easy way:
Consciousness (C) made easy is just the difference between the awake state W and anesthesia/slow-wave sleep SWS/A.

C = W – SWS/A

The difference is what makes up consciousness. We can measure this difference in a number of ways – brain imaging, neuronal spiking behavior, EEG/ECoG, LFPs, voltammetry of neurochemicals, possibly gene expression – and quantify it. Sure, it is not a simple task, and people may disagree on how to integrate measurements into a solid theory of what is happening, but conceptually it is at least clearly defined.
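As a toy illustration of what “quantify it” could mean on a single EEG-like measure, here is a sketch that uses the fraction of spectral power below 4 Hz as a crude slow-wave index on two synthetic signals; the signals, the 4 Hz cutoff and the resulting “C-index” are illustrative choices, not an established measure:

```python
import numpy as np

def delta_fraction(x, fs):
    """Fraction of spectral power below 4 Hz (a crude slow-wave index)."""
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    return power[freqs < 4.0].sum() / power.sum()

fs = 250.0                                   # sampling rate (Hz)
t = np.arange(0, 30.0, 1.0 / fs)             # 30 s of signal
rng = np.random.default_rng(1)

# Toy "awake" signal: broadband, desynchronized activity.
awake = rng.standard_normal(t.size)
# Toy "SWS/anesthesia" signal: dominant ~1 Hz slow oscillation plus weak noise.
sws = 3.0 * np.sin(2.0 * np.pi * 1.0 * t) + 0.3 * rng.standard_normal(t.size)

# One crude operationalization of C = W - SWS/A on this single measure:
c_index = (1.0 - delta_fraction(awake, fs)) - (1.0 - delta_fraction(sws, fs))
print(f"delta fraction awake = {delta_fraction(awake, fs):.2f}, "
      f"SWS/A = {delta_fraction(sws, fs):.2f}, C-index = {c_index:.2f}")
```

Real quantification would of course combine many such measures across modalities; the point is only that the difference W – SWS/A can be turned into numbers.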

[Figure: Cortical up and down states. Charles Wilson (2008), Scholarpedia, 3(6):1410. doi:10.4249/scholarpedia.1410]

An important difference is the appearance of up and down states when unconscious. Possibly in this state only the purely mechanical coupling of the neuronal mass remains, and the fine-tuned interactions through chemical receptors and channels are simplified such that the high-entropy asynchronous spiking is abolished.

It would be interesting to further investigate the soliton theory for this question.


Some thoughts about Language Evolution

Judging by ontogenetic development, language is derived from two separate streams (and a third one later on).
One of these is the development of object and action concepts through visuomotor handling, as opposed to background, space or situation. This may well be specific to humans, with infants developing eye-hand coordination, in contrast to other species, which do not undergo such a phase and may not develop strong concepts of objects. Infants also form a concept of actions when they experience their own agency and extend it to other agents.
The other is the sound-making ability, which appears innately pleasurable to human infants, and contributes to a long period of babbling.
At around 10-12 months of age these developments combine (the ‘naming insight’ of the child language literature) and articulated words that refer to simple concepts arise. These refer to objects at first, sometimes to actions, and are realized as phonological sequences. The one-word and two-word (‘pivot grammar’) stages follow.
I see no reason to assume that phylogenetic development should have been different – the development of articulatory abilities, considered beautiful for their own sake (‘music/song’), running in parallel with a solid conceptual structuring of the environment before one- and two-word communication became commonplace.
What about communication and communicative needs? I believe they popped up after the first concept-naming skills took root. Suddenly there must have been a drift towards messages with higher information content rather than just strings of words denoting objects and actions. Communication, then, is what must have driven grammar. Note that grammar remains a highly social accomplishment – like phonological sophistication it even requires a critical period. Grammar, like phonology, probably has a strong striatal component – habit learning – and it spontaneously appears in sign language as well. To go further, we would need a more detailed theory of how grammar arose from communication, given that each grammatical system is distinct from any other.

Theories, Models and Data

In the modern world, a theory is a mathematical model, and a mathematical model is a theory. A theory described in words is not a theory; it is an explanation or an opinion.

The interesting thing about mathematical models is that they go far beyond data reproduction. A theoretical model of a biological structure or process may be entirely hypothetical, or it may use a certain amount of quantitative data from experiments, integrate it into a theoretical framework and ask questions that result from the combined model.

A Bayesian model, in contrast, is a purely data-driven construct that usually requires additional quantitative values (‘priors’), which have to be estimated. A dynamical model of metabolic or protein signaling processes in the cell assumes only a simple theoretical structure, kinetic rate equations, and then proceeds to fill the model with data (many of them estimated) and analyses the results. A neural network model takes a set of data and performs a statistical analysis to cluster the patterns by similarity, or to assign new patterns to previously established categories. Similarly, high-throughput or other proteomic data are usually analysed for outliers and for variance that is statistically significant with respect to a control data set. Graph analysis of large-scale datasets for cell types, brain regions, neural connections etc. also aims to reproduce the dataset, to visualize it, and to provide quantitative and qualitative measures of the resulting natural graph.
All these methods primarily attempt to reproduce the data, and possibly make predictions concerning missing data or the behavior of a system that is created from the dataset.

Theoretical models can do more.

A theoretical model can introduce a hypothesis on how a biological system functions, or even how it ought to function. It may not even need detailed experimental data, i.e. experiments and measurements, but it certainly needs observations and outcomes. It should be specific enough to spur new experiments in order to verify the hypothesis.
In contrast to Popper, a hypothetical model should not be easily falsifiable. If that were the case, it would probably be an uninteresting, highly specific model, for which experiments to falsify it could easily be performed. A theoretical model should be general enough to explain many previous observations and to open up possibilities for many new experiments, which support, modify and refine the model. The model may still be wrong, but at least it is interesting.
It should not be easy to decide which of several hypothetical models covers the complex biological reality best. But if we do not have models of this kind, and of this level of generality, we cannot guide our research towards progress in answering pressing needs in society, such as in medicine. We then have to work with old, outdated models and are condemned to accumulate larger and larger amounts of individual facts for which there is no use. Those facts form a continuum without a clear hierarchy, and they quickly become obsolete and repetitive, unless they are stored in machine-readable format, where they become part of data-driven analysis, no matter their quality and significance. In principle, such data can be accumulated and rediscovered by theoreticians who look for confirmation of a model. But the data only gain significance after the model exists.

Theories are created; they cannot be deduced from data.