Building Analog Neuromorphic Compute Units to Better Perform MAC Operations
June 14, 2024
Blog
The presence of neural networks in our lives is becoming increasingly widespread, from voice assistants to professional programs. While the range of their applications is growing, their potential is still severely limited by existing computing capabilities and related power consumption.
Neural networks are “programmed” through the connections between neurons (synapses), or rather the strengths of those connections. A signal supplied to the system's input, passing through a synapse, is multiplied by the synapse's “weight.” As a rule, several synapses feed one neuron: their output values are summed, and a nonlinear activation function is applied to the sum. Thus, most of a neural network's computation comes down to multiply-accumulate (MAC) operations, and MAC throughput is one of the most critical parameters for determining the performance of neural processors.
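The weighted-sum-plus-activation behavior described above can be sketched in a few lines of Python (the tanh activation is just one common choice, not something the text prescribes):

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """One neuron: multiply each input by its synapse weight,
    accumulate the products (the MAC loop), then apply a
    nonlinear activation (here, tanh)."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w          # one MAC operation per synapse
    return math.tanh(acc)

out = neuron_output([0.5, -1.0, 2.0], [0.2, 0.4, 0.1])
```

Everything a neural processor accelerates is, at its core, that inner `acc += x * w` line repeated billions of times.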
The MAC operation is used almost everywhere in machine learning. Neural networks can be implemented on both digital and analog platforms. For years, digital circuits have dominated the computational landscape. However, digital multiplication, a crucial operation in machine learning, is resource intensive: a simple 4-bit CMOS MAC circuit requires several hundred transistors, and an 8-bit one needs a few thousand. Even at precisions not exceeding 8 bits, the cost of multiplication is exacerbated by the need to handle large volumes of data: inputs and weights. The weights, being static values, necessitate local memory storage near the computational unit. This approach works best with numerous local computational cores, each with its own weight storage. Many GPUs and TPUs are built on this principle.
The analog approach was considered obsolete for decades, and for good reasons: analog circuits face multiple physical limitations. Manufacturing technology is known for producing considerable mismatch between individual analog elements, which was one of the reasons for the rapid growth of digital technology. In analog, achieving more than 4-bit precision is not easy, and going beyond 8 bits is very challenging.
However, analog implementations can be very attractive in use cases where roughly 8-bit precision is sufficient, thanks to their simplicity and power efficiency, and that is why chip designers are considering analog again. Limited-precision networks are useful for Edge AI, enabling sensor fusion and industrial IoT and dramatically improving privacy and security by processing data locally instead of sending it to powerful cloud servers.
So, how exactly could analog MAC work?
It is based on Ohm’s law, which says that current equals voltage divided by resistance – or voltage multiplied by the conductance of a resistor. Multiple currents can be summed by simply connecting them to a common node, so a MAC needs just one resistor per product instead of hundreds or thousands of transistors.
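As a numerical sketch of this idea: each input voltage drives its own conductance (1/R), and the resulting currents add on a shared node by Kirchhoff's current law. The values below are illustrative, not from the article.

```python
def analog_mac(voltages, conductances):
    """Ohm's-law MAC: each input voltage V_i through conductance
    G_i = 1/R_i produces a current I_i = V_i * G_i; tying all the
    branches to one node sums the currents (Kirchhoff's current law)."""
    return sum(v * g for v, g in zip(voltages, conductances))

# Two inputs: 1 V through 1 mS and 2 V through 0.5 mS
i_out = analog_mac([1.0, 2.0], [1e-3, 0.5e-3])  # total current in amps
```

The multiply is done by physics (V times G), and the accumulate is done by a wire.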
Using Ohm’s law, one can implement MAC in many possible ways, as the resistor is only one of the four fundamental passive components. These four fundamental linear building blocks (the resistor, the capacitor, the inductor, and the memristor) obey similar equations connecting voltage, current, charge, and flux.
Multiple designs are based on this approach. The one based on resistors is obvious: voltage is your input, the resistor realizes weight, and output is the current.
In the so-called “in-memory computing” architecture, weight resistors are typically implemented as non-volatile memory cells. A flash memory cell is a transistor with a floating gate; the charge stored on this gate can be programmed with some precision, modulating the transistor’s transconductance so that it imitates a resistor. The only requirement is that each array element have a variable resistance.
As shown in Figure 4, each output current I_N is a weighted sum of the input voltages V_X. The weights are represented by the conductances of the resistive elements, G_NX; some additional parasitic resistances are also shown.
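An idealized model of such a crossbar is a plain matrix-vector product: row n of the conductance matrix produces output current I_n. This sketch ignores the parasitic wire resistances mentioned above.

```python
def crossbar_outputs(v_in, G):
    """Ideal crossbar: I_n = sum over x of G[n][x] * V[x].
    G is a list of rows (one per output line); parasitic
    resistances are neglected in this model."""
    return [sum(g * v for g, v in zip(row, v_in)) for row in G]

# Two inputs, two output lines
currents = crossbar_outputs([1.0, 2.0], [[1.0, 0.5],
                                         [0.0, 2.0]])
```

This is exactly why crossbar arrays map so naturally onto the matrix multiplications at the heart of neural network inference.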
The in-memory approach has two disadvantages: flash memory cells are bulky, and not every network needs all-to-all connections. So, the area overhead of a flash memory computational array is relatively high.
To reduce that overhead, scientists are trying to develop more compact, innovative memory cells, and quite a few of them are, in fact, variable resistors. Still, these cells rarely achieve resistance control more precise than the two “low/high” states needed for digital operation.
Another potential improvement targets sparsity-related overhead. However, a “not-all-to-all” memory array negates the tight layout optimization and will likely inflate the area instead of saving it.
Now, let’s look at how other passive blocks can be used for MAC. Ohm’s Law says I = V / R, or dI = dV / R.
For a capacitor, dQ = C * dV, and for a constant current, Q = I * T. So, to generate a voltage on the capacitor, we can use V = I * T / C. The difference is that the capacitor’s equation involves four quantities, giving the circuit designer both more complexity and more flexibility.
There are several design options for a capacitor. The output will be the voltage, but the input could be either a current or a time, and the weight could be any combination of the two other parameters. For example, a circuit could obtain the output voltage by charging a fixed capacitor for an input-defined time with a weight-defined current. Alternatively, the output voltage could be obtained by charging a weight-defined capacitor for a fixed time with an input-defined current.
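The V = I * T / C relation and the two design options above can be checked numerically; the component values here are illustrative only.

```python
def cap_voltage(current, time, capacitance):
    """V = I * T / C: voltage reached by charging a capacitor
    with a constant current for a given time."""
    return current * time / capacitance

# Option A: fixed C, weight-defined current, input-defined time
v_a = cap_voltage(current=2e-6, time=1e-6, capacitance=1e-12)   # 2 V

# Option B: fixed time, weight-defined capacitor, input-defined current
v_b = cap_voltage(current=1e-6, time=1e-6, capacitance=2e-12)   # 0.5 V
```

Either way, one of the three right-hand quantities carries the input, another carries the weight, and the product appears as a voltage.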
Figure 5 shows a circuit where the capacitor is fixed, and the weight is defined by time. The time-setting circuit is complex and probably prone to PVT (Process-Voltage-Temperature) variation. It’s also most likely power hungry, as any precise timing manipulation typically requires a high-frequency clock.
The circuit designer can move to the charge domain to eliminate precise timing but still use capacitors. In this case, the main equation will be Q = C * V, and the charge summation will be relatively trivial.
A circuit in Figure 6 utilizes a DAC (digital-to-analog converter) with a capacitive matrix to generate a voltage proportional to the input voltage multiplied by the capacitance. The capacitance is set by a digital code, which is the most convenient way to program a capacitor. Typically, a DAC has a fixed reference voltage and generates an output according to the digital input; here, however, the digital input is treated as the weight, and the usually stable reference is used as the input voltage. This approach eliminates precise timing and high frequencies, making the output stable and accurate. However, even a very simple DAC is a complicated and large circuit, to the point where it could become impossible to fit thousands or millions of them onto a chip.
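A simple behavioral model of this charge-domain multiplier, assuming a binary-weighted capacitor array with a hypothetical unit capacitance: Q = C * V, where the digital code sets C and the "reference" carries the analog input. The unit size and bit width below are made-up parameters for illustration.

```python
def cdac_charge(code, v_in, c_unit=1e-15, bits=4):
    """Charge-domain multiply: a binary-weighted capacitor array
    sets C = code * c_unit, so the sampled charge
    Q = C * V_in is proportional to (digital weight) * (analog input)."""
    assert 0 <= code < 2 ** bits, "code must fit in the DAC's bit width"
    return code * c_unit * v_in

# Accumulation is trivial: charges from several multipliers
# simply add up on a shared summing node.
q_total = cdac_charge(5, 0.8) + cdac_charge(12, 0.3)
```

The accumulate step again costs nothing but a wire, which is what makes the charge domain attractive despite the bulky DACs.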
Next on the list is the memristor. It was conceived much later than the other three elements and represents a leap forward in electrical engineering, with multiple potential applications. However, the most common practical use of a memristor is as a charge-programmable resistor.
The main advantage of a memristor in integrated form is that it can be compact and positioned among top metals, not consuming precious silicon area. The same is true for various kinds of MRAM and ReRAM, which are similar to memristors in functionality.
The last element is the inductor. An inductor is only useful at high frequencies and currents, restricting its potential for low-power applications, and it can be quite difficult to integrate. In contrast to almost any other integrated device, inductors don’t scale down, so each one is usually at least two orders of magnitude larger than resistive or capacitive elements; inductors also interfere with each other, so they can’t be placed close together. The combination of these factors makes inductors unsuitable for use in analog MAC.
Conclusion
We have examined different possibilities for implementing the multiply-and-accumulate operation in analog form: the physical foundations behind different approaches, their similarities and differences, their upsides and downsides. Resistors, capacitors, memristors, and other types of non-volatile memory are all being investigated by academia and industry to create various neuromorphic circuits, and the choice of technology depends on the application. Capacitive implementations are known, but they are held back by complex support circuitry and the need for time manipulation. Academia and industry seem to favor resistance-based designs, with varying implementations in the works: memristors, flash memory cells, and ReRAM.
Chip design must also account for microelectronic manufacturing constraints, such as component mismatch and production yield. Most analog circuits must be designed so that their performance depends not on any single element’s parameters but on parameter ratios, or on some parameters being “sufficiently large.” For example, memristors are very compact and can hold many clearly distinguishable states after programming, but their initial post-manufacturing resistance is much less controllable. This means every memristor must be programmed before it becomes functional. That is not a downside for a memory circuit or for on-chip neural network training, but it does add cost to chip testing and validation.