There has been a continuous bleed of biological, philosophical, linguistic, and psychological concepts into computer science since the 1950s. Artificial neural networks were inspired by real ones. Simulated evolution was designed around metaphorical patterns of natural evolution. Philosophical, linguistic, and psychological ideas transferred as knowledge representation and grammars, both natural and formal.
Since computer science is a uniquely synthetic kind of science and not quite a natural one, borrowing and applying metaphors seems to be part of the normal mode of advancement in this field. There is a purely mathematical component to the field in the fundamental questions around classes of algorithms and what is computable, but there are also highly synthetic issues that arise from architectures that are contingent on physical realizations. Finally, the application to simulating intelligent behavior relies largely on three separate modes of operation:
- Hypothesize about how intelligent beings perform such tasks
- Import metaphors based on those hypotheses
- Given initial success, improve on the imported metaphors by considering statistical features and their mappings (and, more rarely, by drawing on additional biological insights)
So, for instance, we import a simplified model of neural networks: connected sets of weights representing variable activation or inhibition potentials, combined with sudden synaptic firing. Abstractly, we already have an interesting kind of transfer function, one that takes a set of input variables and maps them nonlinearly to the output variables. It’s interesting because nonlinearity means it can potentially compute very difficult relationships between inputs and outputs.
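To make the metaphor concrete, here is a minimal sketch of such a transfer function in Python; the tanh activation and the particular numbers are illustrative choices of mine, not anything drawn from a specific system.

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of inputs (the "activation potential"), then a nonlinear
    # squashing function standing in for the sudden synaptic firing.
    return np.tanh(np.dot(weights, inputs) + bias)

x = np.array([0.5, -1.2, 0.3])   # input variables
w = np.array([0.8, 0.1, -0.4])   # weights, excitatory or inhibitory
print(neuron(x, w, bias=0.1))
```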
But we see limitations immediately, and they are borne out in the history of the field. A single layer of these simulated neurons can only compute a limited class of functions (it famously cannot even represent XOR), so we add a few layers, and then more and more. But now there are a great number of weights to adjust, so we have to wait for computers to get faster to see what these systems can really do. Meanwhile, some folks tied outputs back into inputs to create recurrent neural networks, which were even harder to train but seemed like they might give us a temporal memory of sorts, where previous stimuli are not simply forgotten but feed into the computation of the new state. That might help with things like conversational dialog or planning, where past states influence future decisions.
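Here is a rough sketch of those two ideas, stacking layers for expressiveness and feeding state back in for temporal memory; the shapes, names, and random weights are illustrative assumptions.

```python
import numpy as np

def layer(x, W, b):
    # One layer: weighted sums followed by a nonlinear squashing function.
    return np.tanh(W @ x + b)

def feedforward(x, params):
    # Stack layers; every (W, b) pair adds more weights that must be adjusted.
    for W, b in params:
        x = layer(x, W, b)
    return x

def recurrent_step(x_t, h_prev, W_x, W_h, b):
    # Feed the previous state back in, so earlier stimuli are not simply
    # forgotten but contribute to the computation of the new state.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
params = [(rng.normal(size=(5, 3)), np.zeros(5)),
          (rng.normal(size=(2, 5)), np.zeros(2))]
print(feedforward(np.array([1.0, -0.5, 0.2]), params))
```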
And then, fairly recently, it was hypothesized that pre-training the networks on large-scale features, followed by fine-tuning, might help avoid getting trapped in local minima and also speed up convergence toward a solution. And that needed to wait until compute caught up, with gaming video cards (GPUs) and related hardware making large-scale parallel training of these new networks far more efficient.
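A compressed sketch of that two-phase recipe, assuming a toy one-layer model and made-up proxy and target tasks, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(W, X, Y, lr=0.01):
    # One gradient step on a tiny model H = tanh(X W) with squared-error loss;
    # just enough machinery to show the two-phase recipe.
    H = np.tanh(X @ W)
    grad = X.T @ ((H - Y) * (1 - H ** 2)) / len(X)
    return W - lr * grad

W = rng.normal(size=(4, 4)) * 0.1

# Phase 1: "pre-train" on a large, cheap proxy task (here, reproducing a
# squashed copy of the input) to reach a better-than-random starting point.
X_big = rng.normal(size=(1000, 4))
for _ in range(200):
    W = step(W, X_big, np.tanh(X_big))

# Phase 2: fine-tune on the smaller target task, starting from those weights
# rather than from scratch.
X_small, Y_small = rng.normal(size=(50, 4)), rng.normal(size=(50, 4))
for _ in range(50):
    W = step(W, X_small, Y_small)
```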
But the problem of temporal memory over sequences of inputs remains, so variations on the theme of recurrent networks arise again, this time in a more refined, targeted, and trainable form: long short-term memory networks (LSTMs).
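The standard LSTM step, sketched here with my own parameter names, shows how gating makes the recurrence more targeted and easier to train:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    # Gates decide what to forget, what to write, and what to expose, which
    # is what lets the network carry memory over longer spans.
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(p["Wf"] @ z + p["bf"])        # forget gate
    i = sigmoid(p["Wi"] @ z + p["bi"])        # input gate
    o = sigmoid(p["Wo"] @ z + p["bo"])        # output gate
    c_hat = np.tanh(p["Wc"] @ z + p["bc"])    # candidate cell state
    c_t = f * c_prev + i * c_hat              # selectively keep old memory
    h_t = o * np.tanh(c_t)                    # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
p = {k: rng.normal(size=(n_hid, n_in + n_hid)) for k in ("Wf", "Wi", "Wo", "Wc")}
p.update({k: np.zeros(n_hid) for k in ("bf", "bi", "bo", "bc")})
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):        # run over a short sequence
    h, c = lstm_step(x_t, h, c, p)
```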
And what does any of this have to do with biological neural networks at this point? Not a whole lot. There are efforts to simulate such things, but real networks are extremely complex compared to these simplified logical models, involving varying types of inhibition, activation, mediating neurotransmitters, types of neurons, and other kinds of cells in different areas of the body. The January 29th issue of AAAS’s Science has an incredible overview of mapping the natal “connectomics” (a new word for me, too) of mice.
But we are operating on a typically unstated assumption: that, given the same inputs and outputs, refining the system to get closer to some approximation of intelligent behavior is both useful in itself and might give us a better understanding of human intelligence. Perhaps the most relevant parallel to this borrowing is in theoretical physics, where mathematical advances and methods are used to try to align theories internally and with experimental results. There is a parallel in CS, where purely mathematical approaches to describing optimal inductive reasoning over data have relied on both computational and mathematical insights (singular value decompositions for Latent Semantic Analysis, hidden Markov models, support vector machines, and, of course, ideas from Kolmogorov complexity).
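As an example of that purely mathematical style, here is a sketch of Latent Semantic Analysis as a truncated singular value decomposition of a toy term-document matrix; the matrix and the chosen rank are illustrative assumptions.

```python
import numpy as np

A = np.array([[2., 0., 1.],     # rows: terms, columns: documents
              [1., 1., 0.],
              [0., 3., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                          # keep the top-k latent "topics"
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # low-rank approximation
print(A_k)
```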
What improvements can we expect in the near future? Perhaps the most interesting is quantum computing applied to things like gradient descent, the standard training approach for many of these systems. Annealing-style quantum computing, like D-Wave’s much-scrutinized approach, might be effective in speeding up the training of neural networks. But that has little to do with the algorithms themselves; it changes only the compute substrate, not the method of intelligent behavior itself.
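For reference, gradient descent itself is simple to state; this sketch minimizes a made-up quadratic loss, with the step size and loss chosen purely for illustration.

```python
import numpy as np

def grad_descent(grad, w0, lr=0.1, steps=100):
    # Repeatedly move against the gradient of the loss.
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Minimize f(w) = ||w - 3||^2, whose gradient is 2 * (w - 3).
print(grad_descent(lambda w: 2 * (w - 3.0), w0=[0.0, 0.0]))
```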
But there is little doubt that we will keep borrowing and computing.