Brian Castle
Optimization


We've looked at the concept of plasticity at the synaptic level, but when we look at neural populations instead, things get more complicated. Most machine learning models of neural networks use an "optimization" procedure: they typically either minimize an error term or maximize a likelihood term. In a biological network, how can we obtain global information, such as a Hamiltonian, a Lagrangian, or any other useful measure of the population as a whole (as distinct from single neurons or single synapses)?

The classic optimization paradigm minimizes an error, usually the mean squared error, which is the mean of the squared differences between the predicted and actual values. The prediction comes from a model: the internal model that the network has of the world. This model is parametrized, meaning there are one or more parameters (usually associated with hidden layers in the network) that are adjusted for best fit. In the classic treatment of least-squares regression, the global error is differentiated with respect to the parameters to obtain an adjustment that reduces the error. In machine learning, this procedure is known as "gradient descent".
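As a concrete illustration, here is a minimal gradient-descent fit of a two-parameter linear model to synthetic data. Everything here (the data, the learning rate, the iteration count) is invented for the sketch:

```python
import numpy as np

# Invented data: "actual" values generated from a known line plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, size=100)

w, b = 0.0, 0.0   # model parameters, adjusted for best fit
lr = 0.1          # learning rate: step size along the gradient

for _ in range(500):
    pred = w * x + b                 # the model's prediction
    err = pred - y
    mse = np.mean(err ** 2)          # global error (mean squared error)
    # Differentiate the MSE with respect to each parameter...
    dw = 2 * np.mean(err * x)
    db = 2 * np.mean(err)
    # ...and adjust each parameter to reduce the error.
    w -= lr * dw
    b -= lr * db

# w and b end up close to the generating slope (3.0) and intercept (0.5)
```

The gradient step is the whole trick: the derivative of the global error points uphill, so subtracting it moves the parameters downhill toward a better fit.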

In a biological system, how does one determine the global error? That turns out to be a complicated question, because in addition to the "global" error, it may be useful to calculate errors "regionally", say, per column or hypercolumn in the cerebral cortex (or whatever regional modularity makes sense). One approach is to place an electrode in the extracellular space and pick up local field potentials, which represent the aggregated activity of hundreds of neurons. Another is to introduce neurons into the circuit that sum all the local errors (unfortunately this pathway necessarily involves further delay, a complication we can set aside for the moment). A third is to use an external system, perhaps one involving glial cells.
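The pooling idea can be sketched in a few lines: local errors are summed per region, and the regional errors in turn sum to the global error. The region sizes and error values below are invented for illustration:

```python
import numpy as np

# Invented local errors for 12 "neurons" grouped into 3 "columns" of 4.
rng = np.random.default_rng(1)
local_err = rng.normal(0, 1, size=12) ** 2       # per-neuron squared errors
region = np.repeat(np.arange(3), 4)              # region label per neuron

# Pool local errors into one error per region (column)...
regional_err = np.array([local_err[region == r].sum() for r in range(3)])
# ...and the regional errors pool into the global error.
global_err = regional_err.sum()

# Same total error, viewed at two scales: regional and global.
```

The point of the two-stage sum is that learning rules can then read the error at whichever scale is locally available, without needing the global quantity directly.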

The regional error model is especially interesting because it dovetails with predictive coding, Kuramoto dynamics, and regions of criticality. Let's quickly examine each of these areas in relation to optimization.

In the case of predictive coding, we immediately note that this is a local model: it does not depend, for example, on a global free energy. However, if we add a term representing local free energy to the error calculation, we can generate new kinds of learning behavior.
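A toy sketch of that locality, with all quantities invented: a single predictive-coding unit performs gradient descent on a purely local free energy, here just the sum of its squared prediction error and a prior penalty. No global quantity appears anywhere in the update:

```python
# Invented toy values: one observation x, one weight w, one latent mu.
x, w, prior = 1.5, 2.0, 0.0
mu, lr = 0.0, 0.05

def free_energy(mu):
    eps = x - w * mu  # local prediction error
    # local free energy: prediction error term + prior (complexity) term
    return 0.5 * eps ** 2 + 0.5 * (mu - prior) ** 2

# Inference is gradient descent on the LOCAL free energy alone:
# dF/dmu = -w*eps + (mu - prior)
for _ in range(200):
    eps = x - w * mu
    mu -= lr * (-w * eps + (mu - prior))

# mu settles where the two pressures balance: w*eps = mu - prior,
# i.e. mu = w*x / (1 + w**2) = 0.6 for these toy numbers.
```

Changing the penalty term changes the fixed point of inference, which is the sense in which an added local free-energy term yields new learning behavior.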

In the case of Kuramoto dynamics, the phases of local oscillators are regional by definition, since they involve local circuits. The behavior depends on the shape of the coupling function between neighboring oscillators; in the human brain this shape varies from approximately Gaussian to distinctly multipeaked. Instead of a single coupling constant, it is useful to think in terms of a coupling tensor, analogous to a spatial field whose value at any given point is an oscillator phase.
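A minimal Kuramoto simulation makes the role of the coupling function concrete. The population size, natural frequencies, and coupling strength below are invented for the sketch; the coupling function here is the standard sinusoid, and the order parameter r measures how synchronized the phases are (r = 1 means full phase locking):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, dt = 50, 2.0, 0.05
omega = rng.normal(0, 0.2, N)          # natural frequencies (invented)
theta = rng.uniform(0, 2 * np.pi, N)   # initial phases, fully scattered

def order_parameter(theta):
    # magnitude of the mean phase vector: 0 = incoherent, 1 = locked
    return abs(np.exp(1j * theta).mean())

r_before = order_parameter(theta)
for _ in range(400):
    # each oscillator feels the sinusoidal coupling function applied
    # to its phase difference with every other oscillator
    diff = theta[None, :] - theta[:, None]   # diff[i, j] = theta_j - theta_i
    theta = theta + dt * (omega + (K / N) * np.sin(diff).sum(axis=1))
r_after = order_parameter(theta)

# with coupling well above the critical value, r_after approaches 1
```

Swapping np.sin for a multipeaked coupling function, or replacing the scalar K with a per-pair coupling array, is exactly the generalization the paragraph above gestures at.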

And finally, in the case of critical regions, we can guess that these will be closely related to the phases of oscillators. Coupled oscillators tend to recruit more coupled oscillators, and these are the regions we expect to exhibit criticality first. However, this linkage is not guaranteed, and determining the ways in which it can become uncoupled is an active area of current research.


Energy Function


Gradient Descent


Fitting A Model


Non-Euclidean Manifolds


Entropy


Next: Learning Mechanisms

Back to Visual System

Back to the Console


(c) 2026 Brian Castle
All Rights Reserved
webmaster@briancastle.com