Brian Castle
Timeline Revisited


So far we've talked about neurons and synapses, and we've seen one example each of a sensory and a motor system. At this point we should keep in mind that the purpose of the brain is to optimize the behavior of the organism in real time, and that the brain is a computational window moving through physical time for exactly this purpose. In doing so it obeys physical laws, but it is nonlinear, it has memory, and it often operates far from equilibrium, meaning that we cannot adequately model it with simple linear systems.


Timeline Organization

Let's begin by organizing some of the systems we've already talked about into a neural timeline centered on "now". We talked about the visual system and how the retina sends it information, and we talked about the oculomotor system and how it brings targets to the fovea. The point of control between these two systems is out in space somewhere; that is to say, the eyes focus on a point in visual space, and then the information from that point is fed into the brain. How does the brain know whether the point in focus is actually the point that was desired?

Fixation is a rather complex reflex, as it involves the acquisition of an object whose coordinates have been determined relative to the current eye position. Therefore, the visual image must be processed first, the boundaries of objects detected and stored, the objects identified (or at least indexed) for target selection, and then the target information encoded and transmitted to the oculomotor system. Machines can do this, but they require sophisticated programming. Human beings self-organize: the connections form all by themselves under very limited genetic guidance, and the biggest part of self-organization is the neural system coming to represent the properties of the data itself. The organism can best align its movements with a changing environment by learning the properties of the environment and adapting to them. Such adaptation happens in both space and time, since the fundamental piece of input is a moving stimulus.

Historically, one of the most important concepts to develop around the turn of the 20th century is that neurons that fire together somehow strengthen their mutual connectivity. This idea was formalized by the psychologist Donald Hebb in 1949, in the form of a temporal correlation window between the firing of a neuron and the synaptic activity that immediately precedes it. In a discrete-time setting the equation might look something like:

W(i,j) ← W(i,j) + α A(i) A(j)

where α is a learning rate and A(i) is the activity of neuron i. When both neuron i and neuron j are firing (1) at the same time, the strength of the synapse from i to j is increased by the amount α. We can get the synaptic strength to decrease instead by making one of the neurons inhibitory or by changing the sign of α.
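
A minimal sketch of this update in Python with NumPy (the network size, learning rate, and binary activity pattern below are assumptions for illustration, not values from any particular model):

import numpy as np

rng = np.random.default_rng(seed=1)

n = 8                      # number of neurons (assumed)
alpha = 0.1                # learning rate (assumed)
W = np.zeros((n, n))       # W[i, j]: strength of the synapse from i to j
A = rng.integers(0, 2, n)  # binary activities: 1 = firing, 0 = silent

# Hebbian step: every co-active pair (A[i] = A[j] = 1) gains alpha;
# self-connections are included here only for simplicity.
W += alpha * np.outer(A, A)

Flipping the sign of alpha gives the weakening case described above.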

In practice, we'd like to make sure that the learning algorithm actually converges, and a common way of doing this is to adjust the amount of lateral inhibitory influence in the network. Recall that network topography can be conveniently specified in terms of convergences and divergences along the principal geometric axes. Whereas the retina is topographic (as are the cortex and the oculomotor targeting system), the oculomotor integrator is not, and at some point there must be a translation between the two-dimensional visual topography and the one-dimensional code that determines eye muscle contraction.

We can begin in a simple way by considering the self-organization of the oculomotor integrator, which is non-topographic. It takes a burst input from an EBN (excitatory burst neuron), and converts it into a target position which it holds and forwards to the motor neuron. The incoming code is a burst rate: the amplitude of the saccade is linearly related to the firing rate within the burst. We need to control three things precisely: the beginning and ending of the saccade, the amplitude of the saccade, and the acquisition of the target as determined by an error between target position and the visual axis of the fovea. From a control systems standpoint, this translates into a set point and a gain, along with some start/stop logic. The gain is to be set "such that" the target is as close as possible to the fovea after the eye movement.
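
In discrete time the integrator can be sketched as a running sum that holds between bursts. A toy version (all constants are assumed for illustration):

import numpy as np

dt = 0.001      # time step: 1 msec (assumed)
gain = 0.05     # degrees of eye movement per spike (assumed)
position = 0.0  # integrator output, forwarded to the motor neuron

# EBN input: a 10 msec burst at 400 spikes/sec, then silence (assumed)
rate = np.concatenate([np.full(10, 400.0), np.zeros(40)])

for r in rate:
    position += gain * r * dt  # tally the burst

# Once the burst ends the input is 0, so position simply holds. Doubling
# the firing rate within the burst doubles the final position, matching
# the linear burst-rate code described above.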

We can represent this arrangement from a control system standpoint, as shown in the figure. We want to optimize eye position at time t=T=0, and the control loop as shown has an internal delay of 3 units on each side, so it will optimize at t=+3 based on information from t=-3. There are three inherent behaviors that must be fulfilled: fixation, tracking, and saccades (including their respective reflexes). At this level, though, all we have to do is self-organize the integrator so it correctly tallies the incoming bursts and holds the designated eye position. This is just setting the gain. The gain is allowed to self-organize "such that" the amplitude of the muscle contraction exactly corresponds to the target position. This is just a negative feedback loop, the only difficulty being that we have to translate from retinal coordinates to muscle coordinates.
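
Setting the gain "such that" the movement lands on target can itself be sketched as a toy negative feedback rule: after each saccade, nudge the gain in proportion to the leftover retinal error. All numbers here are assumptions, and the delays just described are omitted:

gain = 0.5  # initial integrator gain, deliberately wrong (assumed)
eta = 0.2   # adaptation rate (assumed)

for trial in range(100):
    target = 10.0                         # target eccentricity in degrees (assumed)
    amplitude = gain * target             # movement produced by the current gain
    retinal_error = target - amplitude    # post-saccade distance from the fovea
    gain += eta * retinal_error / target  # negative feedback on the gain

# gain converges to 1.0, i.e. the muscle contraction comes to correspond
# exactly to the target position.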



In a real human, the horizontal integrator lives in the nucleus prepositus hypoglossi (NPH), which works in conjunction with the medial vestibular nucleus (MVN). There is a separate integrator for vertical eye movements, in the interstitial nucleus of Cajal (INC). These systems develop after birth, in the first few weeks after the initial exposure to light. Since the targeting system in the superior colliculus is retinotopic, the eye movement has to be separated into its principal axial components. The two well-known ways of doing this are with dot products and cosine angles (which turn out to be the same thing in Euclidean geometries), and neural networks are especially adept at calculating dot products (as they are just multiplications of the input vector by the weight matrix). What remains, then, is a way of translating the topographic coordinate to a burst amount. The idea is that the farther the target is from the fovea, the bigger the burst amount will be, because a bigger muscle contraction (eye movement) is needed to foveate an eccentric target.
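
The axial separation can be sketched as a pair of dot products. A minimal version (the orthogonal axes and the linear burst scaling are simplifying assumptions; real muscle axes are only approximately orthogonal):

import numpy as np

target = np.array([6.0, 8.0])      # retinotopic target, degrees from fovea (assumed)

horizontal = np.array([1.0, 0.0])  # idealized horizontal axis (NPH/MVN channel)
vertical = np.array([0.0, 1.0])    # idealized vertical axis (INC channel)

h = target @ horizontal            # dot product: horizontal component
v = target @ vertical              # dot product: vertical component

# The farther from the fovea, the bigger the burst (linear scaling assumed)
spikes_per_degree = 40.0           # assumed conversion factor
h_burst = spikes_per_degree * abs(h)
v_burst = spikes_per_degree * abs(v)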



In humans, the timing around the oculomotor loop is very tight. EBN activity begins 11-12 msec before a saccade, which is just enough time for two synapses (presumably the one from the EBN to the abducens motor neuron, and the one from the motor neuron to the lateral rectus muscle). However, there's still an issue with the control loop. To derive an appropriate error signal, we need the amount of mismatch between the target and the fovea. How do we get that? On the retina is an image consisting of many possible targets, including the one we want. How is the desired target separated from all the rest?

Ultimately this question can only be answered by the visual system. First the object needs to be given a bounding box, then the center of mass needs to be identified, then the coordinates of interest relative to the center of mass can be specified, and so on. Part of this information is used to select the target, but once the target is selected, another part of the object-related information is used to focus precisely within the target area. Sometimes it doesn't matter exactly where the active focus begins, but sometimes it does. This places stringent requirements on the targeting system, since it's not necessarily true that peripheral targets are less important.
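
For a segmented binary mask, the first two steps are simple reductions. A sketch (the mask is invented for illustration; a real visual system computes these with distributed populations rather than explicit arithmetic):

import numpy as np

mask = np.zeros((32, 32), dtype=bool)  # toy object mask (assumed)
mask[10:20, 14:22] = True

rows, cols = np.nonzero(mask)
bbox = (rows.min(), rows.max(), cols.min(), cols.max())  # bounding box
com = (rows.mean(), cols.mean())                         # center of mass

# Coordinates of interest can then be specified relative to the center of
# mass, e.g. the offset of the top-left bounding-box corner:
rel = (bbox[0] - com[0], bbox[2] - com[1])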

Once we have a unique set of target coordinates, though, we still have the issue of translating those into muscle contractions. The projection onto coordinate axes is straightforward; however, these axes are only needed at and below the level of the EBN burst neurons. There are schemes that allow the simultaneous calculation of dot product and target vector, as long as a closed-loop control output from the oculomotor muscles is not needed. This latter issue is somewhat thorny insofar as it involves the palisade endings, which are still controversial. Ultimately, however, in humans there is no behavioral need to move the eyes to a specific angle in the absence of a visual stimulus, and in fact many people find that learning this behavior is quite difficult. The control loop seems to be closed by the visual image itself, which drives both saccades and pursuit movements.

Therefore we can have an open-loop ballistic phase at the beginning of a saccade (which dovetails with human physiology), until such time as the visual control loop can activate. One of the closed loop's most important functions is turning off the saccade, which should be done precisely on target; the loop should therefore be predictive of the target location at the time it's turned off. Mapping this onto the timeline, we can clearly see the need for several types of control loops, and in a real brain the existence and timing of these loops can be verified with white noise analysis.



Now let us consider two different kinds of self-organization that occur at a higher level. First, there is the topographic organization of the optic radiation and its modular organization at the level of the cerebral cortex. Then, there is the mapping between the cerebral cortex and the oculomotor targeting system in the superior colliculus. Much is known about the former, and less about the latter. To understand topographic mapping, we first need to take a look at the mathematics underlying self-organizing maps.


Topographic Networks

At this juncture in the discussion, topographic networks can be loosely equated with CNNs (convolutional neural networks), at least in the context of the retinotopic visual system. In this analogy the convolutional kernel is related to the convergence and divergence of the neural connections, and could in many cases be considered hard-wired.

In a digital monitor each pixel has a coordinate and an RGB value (a word). At the output of the retina, however, each "pixel" has attached to it four colors, on and off times, velocities, spatial and temporal frequency information and dynamics, and possibly systemic information related to ongoing oscillations. Instead of three values per pixel, the retina outputs several dozen. These are then further integrated in different ways in the LGN. Each of the six layers leaving the LGN (plus the koniocellular strands between the layers) contains multiple channels (on cells, off cells, etc.) which then multiplex into various layers in the cerebral cortex.

In the primary visual cortex there are 140 million neurons on each side, compared to just 1 million leaving the retina and 6 million in the LGN. The topography of the cortex is such that each mini-column of about 60-80 neurons corresponds to approximately one point on the retina, and thus we expect a complete set of orientation columns and ocular dominance columns, with their accompanying blobs, to cover a retinal territory equivalent to about 80 photoreceptors. This is approximately correct given what we know about the connections between photoreceptors and bipolar cells, bipolar cells and retinal ganglion cells, and ganglion cells and LGN relay neurons.

The salient feature of a topographic map is that its coordinate system is maintained through the mapping. In general there will be both neural gradients and macroscopic brain topography overlaid on top of the connection map; however, the underlying coordinate transformation can always be described by the synaptic weight matrix W(i,j). If the input to a layer is X(t) and the output is Y(t), then

Y(i,t) = f ( Σ_j W(i,j,t) X(j,t) )

where f is a nonlinear threshold function (usually sigmoidal, logistic, ReLU, etc.), and unused or non-existent connections are assigned a weight of 0. In a 1:1 point-to-point topography W(i,j) = 0 for all i ≠ j, whereas in an omniconnected topography we must sum over all inputs. In a convolutional architecture the kernel usually covers some combination k of nearest neighbors, so we might have a function like [ d(i,j) < k ] multiplied by a weighting factor based on the distance d between output unit i and input unit j. In all cases, the maps X and Y must be aligned in order to calculate d. If we have source and destination manifolds we must convert to a common coordinate system first, do the math, and convert back. Since these networks mainly work on correlation, it is important that we be able to map and parametrize non-Euclidean distributions. This topic is treated by information geometry, which we'll consider momentarily.
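
A sketch of this equation for a pair of one-dimensional maps already aligned in a common coordinate system, using the [ d < k ] convolutional form (the sizes, kernel shape, and choice of ReLU are assumptions):

import numpy as np

n = 16  # units per map (assumed)
k = 2   # kernel radius in nearest neighbors (assumed)

# d(i,j): distance between output unit i and input unit j
i = np.arange(n)[:, None]
j = np.arange(n)[None, :]
d = np.abs(i - j)

# W(i,j) = [ d(i,j) < k ] times a distance-based weighting factor;
# connections outside the kernel get weight 0
W = np.where(d < k, 1.0 / (1.0 + d), 0.0)

def f(u):
    return np.maximum(u, 0.0)  # ReLU as the nonlinear threshold function

X = np.random.default_rng(0).normal(size=n)  # input map activity
Y = f(W @ X)                                 # Y(i,t) = f( Σ_j W(i,j,t) X(j,t) )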


Self-Organizing Maps

In self-organizing maps, it's the coordinate system that changes, not the underlying data: we're going to find the coordinate mapping that best describes the data. Self-organizing maps work on correlation. Imagine there is a retina R and a visual cortex C. The nerve fibers from R have to find their targets in C, and the topography has to be maintained in the mapping. This will be a two-dimensional mapping, in retinal coordinates, and we'd like to be able to use Cartesian coordinates or polar coordinates, whichever happens to be more convenient at the time. We'd like the topography to self-organize, and we'd also like the synaptic strengths to self-organize. (Sometimes we get lucky and those are the same thing, but sometimes they have to be separated.)

We need to approach this scenario from two different directions simultaneously. First, we can stipulate that each map (source and destination) has a pair of marker molecules whose gradients "approximately" determine the position of each neuron. Second, we need to specify the learning rule, and the mechanism by which changes in synaptic strength occur. In this network, learning will occur by competition rather than by error gradient; later, however, we'll see how we can accomplish the same thing with gradient descent.

Before we look at connecting two maps, let's look at what happens within a single map. Imagine we have a neural network layer we'll call C, which initially has some kind of regular arrangement (specified by its pair of marker molecules, which together form a coordinate system). We'll feed another layer into it, an input layer we'll call R. R will also have a coordinate system specified by two marker molecules, but for this first demonstration we won't align anything; we'll just say there is "some mapping" between source and destination.



To set up the initial synaptic weights, we can take a random distribution with a small positive mean and a small variance. Now we begin to stimulate R in three specific locations, as shown in the diagram. What we notice is that the synaptic connections are strengthened in such a way that all the weights move towards the stimulation sites: they cluster around the places where the input is.
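
A minimal competitive sketch of this demonstration, in the style of a Kohonen winner-take-all update with a neighborhood function (the map size, learning rate, and site positions are assumptions):

import numpy as np

rng = np.random.default_rng(0)

n_c = 25                                    # units in the map C (assumed)
W = 0.1 + 0.02 * rng.normal(size=(n_c, 2))  # small positive mean, small variance

# Three stimulation sites in R's 2D coordinate system (assumed positions)
sites = np.array([[0.2, 0.2], [0.8, 0.3], [0.5, 0.9]])

eta, sigma = 0.2, 2.0                       # learning rate, neighborhood width
for step in range(2000):
    x = sites[rng.integers(3)] + 0.02 * rng.normal(size=2)  # jittered stimulus
    winner = np.argmin(np.linalg.norm(W - x, axis=1))       # competition in C
    dist = np.abs(np.arange(n_c) - winner)                  # distance within C
    h = np.exp(-dist**2 / (2 * sigma**2))                   # neighborhood function
    W += eta * h[:, None] * (x - W)                         # weights move toward x

# The rows of W end up clustered around the three stimulation sites.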



The network is responding to the simultaneous occurrence of signals at two different points in space, which in this case corresponds to the statistics of the input signal. We never show the network random dot stereograms; we always show it real objects - things with edges, contours, and surfaces. If we were to show it random (white) noise instead, there would be no correlations among the inputs, and the synaptic weights would fail to differentiate.

We can extend this paradigm to the case of an associative memory by simply splitting the input into two parts. If we have X(a) and X(b), and there is no autocorrelation, then the network will come to respond to the correlation between a and b. If there is autocorrelation, even within a single image, then the autocorrelation will be represented in the network.
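
A sketch of the split-input case as a Hebbian outer-product store, where presenting X(a) alone recalls X(b) (the ±1 patterns and the sign threshold are assumptions of this toy version):

import numpy as np

Xa = np.array([1, -1, 1, -1])  # first half of the input (assumed pattern)
Xb = np.array([1, 1, -1, -1])  # second half, presented simultaneously

W = np.outer(Xb, Xa)           # the weights store the correlation between a and b

recall = np.sign(W @ Xa)       # presenting X(a) alone reconstructs X(b)
assert np.array_equal(recall, Xb)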

In this way, a self-organizing map based on correlation will find the coordinate system that best matches the organization of the input statistics. In the language of statistics, we say that the network learns the distribution of the input. For this to happen, it is important that the representation of the image in the network "approximately match" the real statistics; in other words, if the network is processing the distance between edges, then that distance should fall within the boundaries characteristically displayed by the data. In a human environment, walls can typically be at opposite edges of the visual field, whereas the legs of a fly are very close together, so the visual system must handle a broad range of spatial frequencies. On the other hand, there are rarely abrupt discontinuities in the time domain, unless they are due to occlusions or other interactions between objects, and thus the visual system can safely process the normal range of environmental velocities and refer anything else to short-term memory (since it is likely there will be intervening saccades).


Buffer Memory

So there is one type of self-organization that requires movements to be precisely timed, and there is another type that doesn't care much about timing and instead extracts invariances. Are these the same? Is there a common underlying mechanism? The good news is that yes, there probably is. The bad news is that it uses a library of dozens of different kinds of plasticity. Each variety is good for something specific, and the brain uses whichever one is best suited to the task at hand.

Visual scene reconstruction requires a buffer memory to hold previous scenes belonging to the same episode. This is useful for both contextual recall and episodic storage. One of the interesting aspects of associative memory is how to recall a "subspace" without recalling the whole information field. For example if there are two clusters in memory, one called "table" and another called "chair", and we are shown an image of a chair, we'd like to be able to recall the relevant subspace without having it cluttered by irrelevant information about tables. This means our associative memory should have some advanced dynamic properties, and it also means that there should be a temporary memory that can hold one or more subspaces as long as they're in active use. We'll see how we can accomplish this, but first we need to take a look at how a network finds the right memory, and what happens when it does.


Next: Synaptic Plasticity



(c) 2026 Brian Castle
All Rights Reserved
webmaster@briancastle.com