Why Brain-like Computers Are Hard
Computers built on the Von Neumann architecture store their programs and data in the same memory bank.
Since both have to travel the same road to and from the CPU, the system is ultimately limited not by the CPU's or GPU's raw compute, but by that road.
This is the famous Von Neumann bottleneck.
In a previous video I talked about in-memory computing as a way to bring a computer’s memory closer to the compute.
But making a computer that thinks like the brain - a neuromorphic system as it is called - entails far more than just memory.
For this video, a look at these brain-inspired systems and the fundamental differences between computer and brain.
Why the Brain
First things first. The brain.
Computer scientists have long had the desire to replicate the brain. But why? What is so special about the brain? Aren't computers just better?
What they have long admired most is the brain's ability to function on very little energy. The brain operates at about 12 to 20 watts of power - roughly 20% of the body's metabolic rate.
A desktop computer, on the other hand, draws about 175 watts. Leading-edge AI accelerators like the Nvidia H100 use anything from 300 to 700 watts.
The brain also does not operate at a very fast pace - the closest thing it has to a clock frequency is about 10 hertz. Though this varies depending on what the person is doing at the time and their mental state.
So the brain does not use a lot of power and doesn't operate very quickly. And yet it is capable of so much.
Imagine a bee. A bee's brain has less than a million neurons and runs on less than a watt of power. And yet it can fly. It can navigate to flowers and back home. It can socialize and maybe even calculate things.
A bee's brain is arguably as capable as an 18-billion-transistor system-on-chip, and it does all that with just a million neurons and virtually no power.
Perhaps we should start using brains and not silicon chips for learning? Oh wait ...
The biological brain's powerful capabilities - achieved at low power - are the single most significant motivation for building neuromorphic hardware today.
Insane Parallelism
The brain accomplishes all this with massive parallelism.
Your brain's 86 billion neurons operate without a central clock. By which I mean that they fire their signals - referred to as "spikes" - on their own time, based only on the spikes they receive from their neighbor neurons.
Measured in floating-point operations per second - an admittedly awkward metric for a biological organ - the brain is estimated to be an exaflop-level compute device, on par with the most advanced supercomputers.
An Nvidia H100 by comparison can only do something like 60 teraflops - depending on the variant and the precision being used.
The brain’s lack of synchronization is in contrast to digital circuits like a CPU. A CPU relies on signals from a central clock to coordinate its calculations. That lets it crunch certain tasks very quickly, but synchronicity has costs.
Distributing clock signals across the system costs time and energy. And there is waste, as each component sits idle until the central clock tells it to execute its task.
Chaos as a Feature
No coordination implies that neural activity is chaotic, to say the least.
No doubt about that. But brains make that chaos a feature not a bug. The neural environment is full of noise - spikes firing from neuron to neuron but going nowhere.
When a neuron receives a spike from a synapse, the majority of the time it does nothing - noise. But there are so many extensive connections - with so many synapses - that enough spikes get through to the right neurons to carry on.
Again, in huge contrast to the digital computer, which works hard to make sure that every signal matters. A modern 3.2 gigahertz Intel CPU sends a "noise" signal once every 24 hours.
But as always, there are tradeoffs. You use a lot more power to achieve such a high signal-to-noise ratio. Imagine the work you need to put in to synchronize billions of transistors.
That is how the brain achieves its efficiency. Neurons essentially throw things at the wall and are not afraid to make mistakes. In doing so, they find something that works. I hope that made sense.
And by living amidst chaos, brains also become shockingly resilient and flexible - even in situations of massive damage. It is not hard to see the value of this for computer hardware systems.
Neuron
People who want to copy the brain often start with its fundamental unit - the neuron.
Neurons are cells in the brain and the rest of the nervous system. When they receive spikes from their neighbors, they can - if they so choose - send their own spikes on to their neighbors.
All neurons share several common features - dendrites, a soma, and an axon. Spikes enter the neuron's cell body - formally called the soma - through the dendrites.
Dendrites are the neuron's input pathways. A typical neuron in the outer layer of the cerebrum - the largest part of the brain - has 10,000 inputs.
A typical neuron in the cerebellum - the second largest part of the brain, but trying hard to get to number one - has up to a quarter of a million inputs.
If sufficiently stimulated, the neuron sends a spike to its neighbors. Or a series of spikes - a neural code, though how the code works remains somewhat of a mystery.
This spike is sent through an output path called the axon. The axon's primary goal is to ensure the signal is forwarded faithfully - though it does not always do this.
And then from there, it enters the neighboring neurons' dendrites through what we call "synapses" - electrochemical junctions that connect two neurons.
Such a description understates the synapse's role in computation. Variations in synaptic structure over long time periods - referred to as "synaptic plasticity" - can subtly change the neuron's spikes and how they are received. We should not ignore them.
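To make synaptic plasticity a bit more concrete, here is a minimal sketch in Python of a Hebbian-style weight update - the textbook "neurons that fire together wire together" rule. This is a loose illustration, not how biological synapses or any particular chip actually implement plasticity, and the learning rate and bounds here are made up.

```python
import numpy as np

def hebbian_update(weights, pre_spikes, post_spikes, lr=0.01):
    """Strengthen synapses whose pre- and post-synaptic neurons spiked together.

    weights:     (n_post, n_pre) array of synaptic strengths
    pre_spikes:  (n_pre,)  binary vector, 1 = presynaptic neuron spiked
    post_spikes: (n_post,) binary vector, 1 = postsynaptic neuron spiked
    """
    # Outer product: a weight grows only where both sides were active together.
    weights += lr * np.outer(post_spikes, pre_spikes)
    # Keep weights bounded, loosely mimicking the saturation of a real synapse.
    np.clip(weights, 0.0, 1.0, out=weights)
    return weights
```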
Diversity
So those are the common structures, but the brain's 86 billion neurons exhibit incredible diversity. As you might expect with 86 billion of anything.
Some neurons - due to the complicated chemistries of their axons - can send signals faster than the Taiwan High Speed Rail. Other neurons, slower than a Taiwan sea turtle.
Some motor neurons can stretch up to a meter long. The ones in a giraffe can be multiple meters long! Other neurons on the other hand might just be 0.1 millimeters long.
And neurons - just like us - have their own preferred stimuli. In 1981, David Hubel and Torsten Wiesel won half of the Nobel Prize for Medicine for showing that some neurons fired most rapidly when shown lines at one particular orientation over others.
As I mentioned above, the majority of the time a neuron receives a spike from a neighbor neuron, things get left on read.
So if we are to sum it up enough to get us to the next section of the video, the brain works by propagating huge, random waves of spikes throughout its billions of diverse neurons. Many of these spikes end up as wasted noise, but many are relevant too.
Neurons and synapses merge the work of computation and memory - doing a bit of both. Memories are stored in the relative strength of the synapses between neurons. But those synapses can do calculations as well.
That is why the brain does not suffer the Von Neumann bottleneck. That's their secret, Cap. They aren't separate.
Going Neuromorphic
So in order to create a proper neuromorphic computer, we not only have to implement artificial versions of neurons and synapses, but also the way they so tightly bind together memory, compute, and the communication between the two.
Now, you might be thinking, "What about artificial neural networks like those running in ChatGPT? Can we call those neuromorphic?"
It is true that these neural networks started from our understanding of how the human brain works. So many of the concepts overlap.
The perceptron, for instance, is a simplified mathematical model of the meat neuron. It approximates the neuron's behavior by taking a weighted sum of inputs, applying an "activation function" to mimic the neuron's stimulation, and firing off an output to its neighbors.
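As a rough sketch of that description, here is a perceptron in a few lines of Python. The weights, bias, and step activation are illustrative placeholders rather than values from any real network.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """A perceptron: weighted sum of inputs, then an activation function."""
    stimulation = np.dot(weights, inputs) + bias  # weighted sum, akin to summed synaptic input
    return 1 if stimulation >= 0 else 0           # step activation: fire, or stay silent

# Hand-picked example: two inputs, arbitrary weights.
print(perceptron(np.array([1.0, 0.0]), np.array([0.6, 0.4]), bias=-0.5))  # fires (outputs 1)
```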
But virtually all of these artificial neural networks - especially the ones running out there in the real world - run on Von Neumann hardware - which means dealing with the bottleneck.
Changes can be made to the hardware in order to improve performance and power consumption, of course. That is why we have GPUs and TPUs. But the tantalizing possibility remains of getting game-changing benefits by running this neural "software" on actual neural "hardware".
The neural software - often referred to as "spiking neural networks" to differentiate it from conventional ANNs and deep neural networks - shall be discussed some other day. Let's talk hardware.
Silicon Neurons
Many industrial and academic players have demoed neuromorphic hardware created with traditional CMOS transistors.
So normal semiconductor manufacturing processes, like those used for Von Neumann computers. Let me note a few of these "silicon neurons". Many are - like today's artificial neural networks - programmatic approximations of the neuron's behavior.
We have IBM's TrueNorth project from 2014 - the first widely distributed neuromorphic chip. It is capable of running inference - recognizing what a person is doing in a video, say, or controlling a robot.
TrueNorth is a special CMOS integrated circuit with 4,096 cores - each with 256 programmable neurons. The whole chip has 256 million programmable synapses.
A big point that IBM made was how it uses far less power than most computer systems - the chip's 5.4 billion transistors consume about 70 milliwatts.
Other semiconductor companies have shipped silicon neurons of their own too. Intel Labs has its Loihi and Loihi 2 neuromorphic AI hardware.
The European Human Brain Project was a massive ten-year science project initiated in 2013, looking to do groundbreaking work in the neural sciences.
One of its efforts was BrainScaleS, first released in 2011. A second version was released in 2020 with improved local learning capabilities.
It is a mixed-signal ASIC that uses analog electronic circuits to mimic the spiking neurons of the brain. Very interesting, though its analog neurons are not as flexible as their digital counterparts.
And then in 2019 we have the Tianjic project, a hybrid chip created by scientists at China's elite Tsinghua University. It is a platform that attempts to run various types of neural networks - including those designed for neuromorphic hardware.
Crossbar
How do we implement a network of neurons in hardware? Let us briefly look at how IBM's TrueNorth does it.
TrueNorth is fabbed on a 28 nanometer CMOS process. As I mentioned, it has 4,096 of what they call neurosynaptic cores. It is capable of running widely adopted convolutional neural networks.
Each basic TrueNorth core has 256 axons, 256 neurons, and synapses fully connecting them in a 256x256 crossbar structure, implemented using SRAM.
Each axon is given a synaptic weight depending on the neuron it is connected to. And each neuron has a state - its "membrane potential" - as well as a particular threshold for sending on a spike.
The system works in cycles. During each cycle, spikes travel to the neurons through the axons, each spike scaled by the synaptic weight at its crossbar junction. The neurons collect the incoming spikes into buffers and then evaluate them.
They then update their membrane potentials accordingly. Having done that, they compare the potentials against their thresholds. If a neuron's membrane potential meets or beats its threshold, it sends a spike of its own. Spikes can be sent to local neighbors or outside the core itself.
Just like the real brain, TrueNorth does not have a global clock. The elements in each of the cores work asynchronously in cycles, doing things only in response to events. They also operate at low frequency - 10 hertz.
And to implement the neuron's inherent randomness and noise, each core has a random number generator that can raise the threshold for creating a spike, or randomly delay or even halt a spike's transmission through a synapse.
It's like that episode of It's Always Sunny in Philadelphia. Wildcard!
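To pull that all together, here is a toy Python sketch of one cycle of a TrueNorth-like core: a 256x256 weight crossbar, membrane potentials, thresholds, and a dash of injected randomness. This is my own loose illustration of the scheme described above - not IBM's actual implementation - and every constant in it is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AXONS, N_NEURONS = 256, 256
weights = rng.integers(-1, 2, size=(N_NEURONS, N_AXONS))  # crossbar of synaptic weights (-1, 0, or +1)
potential = np.zeros(N_NEURONS)                            # each neuron's membrane potential
threshold = np.full(N_NEURONS, 10.0)                       # per-neuron firing thresholds

def core_cycle(input_spikes):
    """One cycle: integrate weighted input spikes, jitter the threshold, fire, reset."""
    global potential
    # Integrate: each neuron adds up the weights of the axons that spiked this cycle.
    potential += weights @ input_spikes
    # Pseudo-random jitter standing in for the core's random number generator.
    jitter = rng.normal(0.0, 0.5, size=N_NEURONS)
    fired = potential >= threshold + jitter
    potential[fired] = 0.0           # neurons that fired reset their membrane potential
    return fired.astype(int)         # outgoing spikes, routed on- or off-core

out_spikes = core_cycle(rng.integers(0, 2, size=N_AXONS))
print(out_spikes.sum(), "of", N_NEURONS, "neurons fired this cycle")
```

Run over many cycles, with trained synaptic weights instead of random ones, a loop like this is roughly what each of TrueNorth's 4,096 cores does in parallel.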
CMOS Shortcomings
A significant benefit of building these "silicon neuron" chips using CMOS processes is that we get to draft in the wake of the chip giants.
They share some of the benefits of the brain. TrueNorth for instance looks to be very scalable, sips relatively little power compared to GPUs, and can run widely used neural network software to get accurate results.
But there are drawbacks. A test of TrueNorth's performance found tradeoffs between energy efficiency and accuracy. If we want the model to give us competitively accurate results - which can make a big difference in a commercial context - we need to use more power.
These circuits are also quite large, since each core has to have its own memory. Von Neumann machines benefit from having a single, very dense central memory bank. That denser, higher-capacity memory allows us to accommodate the larger models that are so in vogue.
And lastly, the brain is an analog device - digital devices are an ill fit for replicating its behavior. So we need to incorporate analog elements, which limits the system's flexibility.
These disadvantages have pushed scientists to look beyond CMOS for new ways to implement neuromorphic devices. The most popular such approach is the memristor.
Memristors
In 1971, Leon Chua - then a professor of Electrical Engineering and Computer Sciences at UC Berkeley - published an article proposing a new type of circuit.
By looking at the mathematical relations between the fundamental circuit variables - the relations that define the three classic circuit elements, the resistor, capacitor, and inductor - he proposed the existence of a fourth: the "memory resistor," or memristor.
This paper was very difficult to read and I will freely admit that I did not get it. So I am not going to explain any more than that. Anyway, the idea fell by the wayside until 2008, when scientists at Hewlett-Packard announced a physical implementation of the memristor.
The original 2008 memristor was a simple metal-dielectric-metal sandwich with two terminals, or points of connection. In this case, the metal electrodes were made from platinum and the dielectric was titanium oxide.
If we apply a voltage pulse to the memristor, the film can switch its electrical resistance - flipping between an insulating and a conducting state in a non-linear fashion.
What is it about the memristor that makes it so suitable for neuromorphic computing? Its key characteristic is that the value of that electrical resistance depends on the history of the voltage applied across it - ergo the "memory" in the name.
Even better, it can remember that history even when the power goes off. This makes the memristor a form of "non-volatile memory" like flash memory or a hard drive.
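To make that history-dependent resistance concrete, here is a toy Python sketch loosely based on the linear ion-drift model HP used to describe its device. All the constants and the pulse length are illustrative, and real devices behave far less cleanly than this.

```python
R_ON, R_OFF = 100.0, 16_000.0   # resistance when fully conducting vs. fully insulating (ohms)
MU, D = 1e-14, 1e-8             # illustrative dopant mobility (m^2/V*s) and film thickness (m)

w = 0.5 * D  # position of the doped/undoped boundary in the film - the device's "memory"

def apply_voltage(v, dt=0.1):
    """Apply a voltage pulse of length dt; the boundary drifts and the resistance changes."""
    global w
    r = R_ON * (w / D) + R_OFF * (1 - w / D)  # resistance: a mix of conducting and insulating regions
    i = v / r                                  # current through the device during the pulse
    w += MU * (R_ON / D) * i * dt              # linear drift of the boundary, driven by the current
    w = min(max(w, 0.0), D)                    # the boundary cannot leave the film
    return r

# Repeated positive pulses push the resistance down - and it stays there when the voltage is removed.
for _ in range(5):
    print(round(apply_voltage(1.0)), "ohms")
```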
People quickly drew connections between memristor behavior and that of biological synapses. Scientists envisioned them as a non-volatile memory for in-memory computing. For the past 15 years, such devices have been at the center of neuromorphic study.
But there are a few challenges. The first has to do with manufacturability. It can be very difficult to produce enough of these memristors uniformly and at scale. Can they handle many cycles of resistance switches? And how long can they store their data? Engineers are still working through these questions.
Also, the changes in the memristor's resistance are non-linear, which makes it somewhat challenging to program for. New software paradigms in neural network programming may be needed for memristor-based neuromorphic computing to work. The work goes on.
Conclusion
I think the biggest challenge with these neuromorphic systems though is the competition.
Nvidia and other adherents of "Huang's Law" are leading the way. According to Huang's Law, the silicon chips powering AI more than double in performance every two years.
In the past 10 years, AI inference performance on GPUs has improved 1,000-fold. It might be hard to compete against this. But maybe we don't have to. Considering each system's advantages, we might start seeing more hybrid systems.
Perhaps we can put neuromorphic and Von Neumann chiplets together so as to give us the best of both worlds. Like a mullet. Business in the front. Party in the back.