


In-Memory Computing Technology Overview

Author: Apogeeweb
Date: 30 Nov 2020


In-memory computing (IMC) stores data in RAM and runs calculations entirely within computer memory. With the rise of the big data era, faster data-processing capability is required: memory and storage capacity are growing rapidly to support large-scale data collection and complex data analysis, which in turn drives the development of AI (artificial intelligence). In-memory computing has emerged against this background.

In-memory Computing (IMC) Explained



Ⅰ Memory Wall: Processor /Memory Performance Gap

Ⅱ Developing Requirement

Ⅲ What Is In-memory Computing?

3.1 In-memory Computing Definition

3.2 Four Realization Methods

Ⅳ Driving Force of In-memory Computing and Market Prospects

4.1 In-memory Computing for AI

4.2 In-memory Computing Product Outlook

4.3 In-memory Computing Market and Prospect

Ⅴ Conclusion

Ⅰ Memory Wall: Processor / Memory Performance Gap

The von Neumann architecture has dominated computer systems since the computer was invented. Under this model, data is first stored in main memory; at run time, the processor fetches instructions and data from main memory and executes them in order. If the memory cannot keep up with the performance of the CPU, overall computing is limited: this is the "memory wall". The von Neumann architecture also has an obvious efficiency shortcoming: reading and writing data consumes more energy than performing a single calculation.


Figure 1. Von Neumann Architecture Diagram

Driven by Moore's Law, the performance of computer processors has improved rapidly since the invention of the transistor. A computer's main memory uses DRAM, a high-density storage technology based on charging and discharging capacitors. Its performance (speed) depends on two factors: the read/write speed of the capacitor cells, and the interface bandwidth between the memory and the processor. The cell read/write speed has improved with Moore's Law, but not as fast as processors have. In addition, the interface between DRAM and the processor is a mixed-signal circuit whose bandwidth growth is mainly limited by the signal integrity of the traces on the PCB. As a result, DRAM performance has improved far more slowly than processor performance. Today DRAM has become a major bottleneck for overall computer performance, the so-called "memory wall" that blocks further gains in computing performance.


Figure 2. Moore's Law Effect


Ⅱ Developing Requirement

In today's AI workloads, the growing volume of data and computation puts the original von Neumann architecture under increasing strain. Simply adding CPUs does not deliver the required amount of computation, and simply enlarging storage within the old architecture is equally ill-suited to AI. When memory capacity must grow to such an extent, it is a sign that the underlying technology needs innovation. To solve the "memory wall" problem, future computers will not move data to the compute, but will compute in the memory itself, thereby reducing the cost of data access during calculation.


Figure 3. Conventional Computing vs In-memory Computing


Ⅲ What Is In-memory Computing?

3.1 In-memory Computing Definition

In-memory computing (or in-memory computation) is a technique based on computing directly within RAM storage. One early demonstration, proposed by an MIT research group, aims to accelerate convolution. A convolution can be expanded into a weighted accumulation; viewed another way, it is a weighted average of several numbers. The circuit therefore implements the weighted average in the charge domain. Each 1-bit weight is stored in SRAM, and the 7-bit digital input is converted to an analog signal by a DAC. Depending on the stored weight, the input is multiplied by 1 or -1 in the analog domain, the products are averaged in the analog domain, and the result is finally read out as a digital signal by an ADC. Specifically, since the weight is only 1-bit (1 or -1), the multiplication can be implemented simply with a switch and a differential line: if the weight is 1, the capacitor on one side of the differential line is charged to the required output value; otherwise, the other side of the differential line is charged to that value. Averaging is achieved by connecting several differential lines together in the charge domain.
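The charge-domain scheme just described can be sketched as a toy numeric model (this is an illustration of the principle, not the actual circuit; the function name, bit widths, and scaling are assumptions):

```python
# Toy numeric model of the 1-bit-weight charge-domain scheme described above.
# All names, bit widths, and scalings here are illustrative assumptions.

def charge_domain_average(inputs_7bit, weights_pm1, adc_bits=7):
    """Multiply each 7-bit input by a +/-1 weight and average 'in analog'."""
    assert all(0 <= x < 2**7 for x in inputs_7bit)
    assert all(w in (1, -1) for w in weights_pm1)
    # DAC: treat each 7-bit code as an analog value in [0, 1)
    analog = [x / 2**7 for x in inputs_7bit]
    # Differential lines: weight +1 charges one side, -1 the other (sign flip)
    signed = [v * w for v, w in zip(analog, weights_pm1)]
    # Charge sharing across connected differential lines = averaging
    avg = sum(signed) / len(signed)
    # ADC: quantize the averaged value back to a signed digital code
    return round(avg * 2**(adc_bits - 1))

print(charge_domain_average([100, 50, 80, 20], [1, -1, 1, -1]))  # prints 14
```

The key point the sketch captures is that the multiply (sign flip) and the accumulate (charge sharing) both happen before the ADC, so only one conversion per output is needed.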

Of course, this is not the only circuit for in-memory calculation, and the calculation precision is not limited to 1-bit. From the example above, however, the core idea is visible: convert the calculation into a weighted accumulation, store the weights in the memory array, and modify the memory's core circuits (such as the readout path) so that a read operation becomes a process in which the input data and the weights are multiplied in the analog domain, that is, a convolution. Because convolution is a core operation in AI and similar workloads, in-memory computing can be widely applied there. Its use of analog circuits for calculation is the key difference from traditional digital logic.

More traditional architectures contain multiply-accumulate (MAC) circuits for tensor math, especially matrix multiplication, and try to arrange the MACs so that weights and activations are moved to the right place. Activations are the outputs of the previous neural-network layer; a multiplication usually involves an activation and a weight, and both must be moved to wherever the multiplier is. In-memory computing exploits this structure: if the weights are stored in the memory array, then applying the activations to the array yields the multiplications and the accumulation directly. The only difference from an ordinary memory is that in-memory computing activates all word lines at once, instead of decoding an address to select a single word line.
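The contrast with a normal memory read can be sketched as follows (an illustrative model where the memory array is just a matrix of stored weights; rows play the role of word lines and columns the role of bit lines):

```python
# Contrast between a conventional memory read and the in-memory MAC described
# above. 'array' is a matrix of stored weights (rows = word lines,
# columns = bit lines). Names are assumptions for this sketch.

def normal_read(array, address):
    """A conventional read: decode the address, activate ONE word line."""
    return array[address]  # one row comes out

def in_memory_mac(array, activations):
    """Drive ALL word lines with activations; each bit line sums its column."""
    n_cols = len(array[0])
    return [sum(a * row[c] for a, row in zip(activations, array))
            for c in range(n_cols)]

weights = [[1, 2],
           [3, 4],
           [5, 6]]
print(normal_read(weights, 1))            # prints [3, 4]
print(in_memory_mac(weights, [1, 0, 2]))  # prints [11, 14]
```

Each bit-line output is one element of the matrix-vector product, which is exactly the operation a neural-network layer needs.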


Figure 4. In-memory Computing Diagram

3.2 Four Realization Methods

The attempt is to enter the analog domain and treat the storage cell as an analog unit rather than a digital one, in order to reduce power consumption. One way to use analog circuits at the front end of an inference engine is exactly this in-memory computing: take digital data, convert it to analog values with a DAC, drive the memory with those analog signals to obtain an analog bit-line output, and finally convert the result back to digital with an ADC. In-memory computing is still at an exploratory stage, however, and many implementation approaches are under study. Currently there are four main types, based on RRAM, Flash, SRAM, and DRAM.

  • Based on RRAM

RRAM is the most natural medium for this approach, because Ohm's law applied to an array of resistors performs the multiplication directly. It still has problems of its own, though: the relationship between programming and the resulting resistance is non-linear, so considerable work remains before commercially viable calculation circuits can be built in RRAM memory. For now it is an idea, and concrete designs are still under study.
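The Ohm's-law idea can be sketched numerically: each weight is stored as a conductance G, an input voltage V is applied to each cell, and the currents sum on a shared bit line according to Kirchhoff's current law, I = Σ V·G. The nonlinearity term below is a made-up illustration of programming error, not real device data:

```python
# Sketch of an RRAM-style analog dot product: per-cell Ohm's law, then
# current summation on a shared bit line, I = sum(V_i * G_i).
# The 'nonlinearity' model is a hypothetical illustration only.

def rram_dot_product(voltages, target_conductances, nonlinearity=0.0):
    # Programming error: the conductance actually stored deviates from the
    # target in an assumed quadratic way when nonlinearity > 0.
    actual = [g + nonlinearity * g * g for g in target_conductances]
    # Ohm's law per cell, then bit-line current summation
    return sum(v * g for v, g in zip(voltages, actual))

ideal = rram_dot_product([1.0, 0.5, 2.0], [0.2, 0.4, 0.1])
skewed = rram_dot_product([1.0, 0.5, 2.0], [0.2, 0.4, 0.1], nonlinearity=0.1)
print(ideal)   # ~0.6: the exact weighted sum
print(skewed)  # slightly larger, due to the modeled programming error
```

This is why non-linear programming matters: the computed dot product drifts away from the intended one unless the conductance can be set precisely or calibrated.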


  • Based on Flash

NOR Flash memory has a more traditional word-line/bit-line structure, and it is both resistive and capacitive. Normally a memory cell is a transistor that is either fully on or fully off; if it is only partially conductive, however, it can act as a resistor. The resistance depends on the amount of charge on the cell's floating gate (a capacitor). Fully programmed, the cell conducts at its maximum; fully erased, it does not conduct at all; in between, it can be partially programmed to intermediate values. One problem is that the number of electrons cannot be controlled precisely. Moreover, the response for any given charge varies with process, temperature, and other variables.

Two companies are pursuing this method. Microchip owns the memBrain array, thanks to its acquisition of SST, and Mythic is a start-up dedicated to an inference engine that uses flash-based in-memory computing. Both companies say they use extensive calibration techniques to deal with this variation.

Another issue is that flash cells lose electrons over time. Electrons leak away, which raises an interesting question: what will data retention and endurance look like on this kind of memory array?

From an application standpoint, it depends on whether the device is used for cloud computing or as an edge inference engine. At the edge, the device may perform one fixed inference function for its entire life. If the arrays are large enough, the weights are loaded once and never reprogrammed (barring an update), because flash memory is non-volatile. Activations still have to be moved, but the weights stay permanently in the array. In that case endurance (the number of times the device can be programmed before cumulative damage accelerates electron leakage to an unacceptable level) hardly matters, since the device only needs to be programmed once.

In contrast, in cloud applications the device is likely to be shared as a general-purpose computing resource, which requires reprogramming for each new application. This means endurance matters much more in the cloud. Mythic claims a 10K write-cycle endurance, and observes that even with daily reprogramming the device would last more than 10 years.

If an analog value is programmed into the cell and used as an analog quantity, then in theory every electron matters. If enough electrons migrate, the storage cell must be refreshed, or the change compensated for in some other way, because the same analog input would otherwise produce a different result than it did a year earlier. Calibration circuits can also handle some aging effects. Regarding retention, Mythic says it periodically refreshes the weight values stored in flash, which makes endurance, rather than retention, the main wear-out mechanism. Microchip states that its retention figures are TBD, but that devices will likely be reprogrammed quarterly or annually to restore the cells.

These designs therefore need many high-quality ADCs and DACs to keep the signal-to-noise ratio (SNR) within the range required for accurate inference, and this is a focus of the design work. Mythic claims a novel ADC design that can be shared to reduce the number of converters required. Although the ADCs do consume energy, the approach still greatly reduces overall system power.
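Why ADC resolution dominates the SNR budget can be illustrated with a toy quantization model (assumptions: an ideal uniform quantizer over [-1, 1], no circuit noise; the input value is made up for the sketch):

```python
# Toy illustration of why ADC resolution matters for analog in-memory
# results: an ideal uniform quantizer over [-1, 1] adds up to half a step of
# error, and the step shrinks by 2x per extra bit. Purely illustrative.

def quantize(x, bits):
    """Ideal uniform quantizer over [-1, 1] with 2**bits levels."""
    step = 2.0 / (2 ** bits)
    code = round(x / step)
    return max(-1.0, min(1.0, code * step))

analog_result = 0.437  # a hypothetical analog bit-line result
for bits in (4, 6, 8):
    q = quantize(analog_result, bits)
    print(bits, q, abs(q - analog_result))  # error shrinks as bits increase
```

Every extra ADC bit halves the quantization step but roughly doubles converter cost, which is the trade-off the calibration and ADC-sharing work is trying to soften.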


  • Based on SRAM

This idea came from a Princeton University presentation at Hot Chips. By definition, an SRAM cell is bistable, so it cannot hold an intermediate state. How, then, can it be used for analog computation? And the DACs and ADCs involved cost more than the array itself in both area and power.

The answer comes down to how the analog part is done. This method uses more than one bit line per calculation: since each cell still holds a digital value, several bit lines together perform one computation. The bit lines can be split into groups, with different groups performing different multiplications, as the following figure illustrates.


Figure 5. Bit Line

Eight inputs are applied at a time, so the input vector is sliced and several consecutive multiplications are carried out to obtain the final result. The bit-line charge is deposited on a capacitor; when it is time to read, the charge is read out and sent to the ADC for conversion back to the digital domain. The basic cell structure is as follows:


Figure 6. Bit Cell
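The bit-slicing idea can be sketched in software: since each SRAM cell holds a single digital bit, a multi-bit weight is spread over several bit lines, each bit line computes a partial product, and the partials are combined with the appropriate power-of-two shift (names and bit widths below are illustrative assumptions):

```python
# Sketch of bit-sliced multiplication: each weight bit lives on its own
# bit-line group, each group computes a partial dot product, and the partials
# are shift-accumulated by bit significance. Illustrative names and widths.

def bit_sliced_dot(activations, weights, weight_bits=4):
    total = 0
    for b in range(weight_bits):                 # one bit-line group per bit
        bit_plane = [(w >> b) & 1 for w in weights]
        partial = sum(a * wb for a, wb in zip(activations, bit_plane))
        total += partial << b                    # shift by bit significance
    return total

acts = [3, 1, 2]
wts = [5, 6, 7]                        # 4-bit weights
print(bit_sliced_dot(acts, wts))       # prints 35, same as 3*5 + 1*6 + 2*7
```

Because the per-bit-line operation stays digital (0 or 1 weights), the analog step is confined to summing charge, which is why this scheme tolerates noise better than fully analog cells.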

These capacitors could affect chip area, but the team notes that the metal layers above the cell can be used for them. Even so, one cell is about 80% larger than a standard 6T SRAM cell (not counting the capacitors), yet they say the overall circuit is still much smaller than an equivalent standard digital implementation. In addition, since the basic array operations remain digital, they are less sensitive to noise and variation, which means the ADCs can be simpler and consume less power.


Figure 7. Chip Size

  • Based on DRAM

This idea is about avoiding the large amount of power spent moving DRAM contents to the CPU or other compute structures: instead, the computation runs directly on the DRAM die, which is what UPMEM does. A simple processor is built on the DRAM die; the architecture is not meant to compete with Xeon chips. UPMEM calls this approach "processing in memory", or PIM.


Figure 8. PIM Chip

Instead of bringing data to the calculations, PIM brings the calculations to the data. The work is performed by the processor on the DRAM chip, so the data never has to leave the DRAM chip; only the computed results are sent back to the host system. And since ML calculations usually involve heavy reductions, far less data needs to move. This does require some minor changes to the DRAM design, but not to the manufacturing process. In this scheme, a standard DRAM module offers many opportunities for distributed computing. At the same time, writing programs that exploit this capability becomes more complicated.
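The "bring calculations to the data" pattern can be sketched as a two-phase reduction (a software model of the idea, not UPMEM's actual API; the chip count and function names are assumptions):

```python
# Sketch of the PIM offload pattern: each (hypothetical) PIM-equipped DRAM
# chip reduces its own slice of the data locally, and only the small
# per-chip partial results cross the memory bus to the host.

def pim_sum(data, n_chips=4):
    chunk = (len(data) + n_chips - 1) // n_chips
    # Phase 1: each DRAM chip's on-die processor reduces its local slice
    partials = [sum(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    # Phase 2: only len(partials) values travel back to the host
    return sum(partials), len(partials)

total, transferred = pim_sum(list(range(1000)))
print(total, transferred)   # prints 499500 4: a 1000-element reduction
                            # moved only 4 values over the bus
```

The ratio of elements reduced in place to values transferred is what produces the energy-efficiency advantage, despite the extra power of the on-die processors.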

UPMEM says a server using PIM offload consumes twice as much power as a standard server with non-PIM DRAM modules, but with 20 times the throughput, which still yields a 10x energy-efficiency advantage. In addition, this approach can help defend against side-channel attacks: a group of computing threads that would otherwise run on one or a few CPUs is now spread across the DRAM, so an attacker would have to examine all the DRAM chips and somehow figure out where each thread is running, which is a difficult task.


Ⅳ Driving Force of In-memory Computing and Market Prospects

4.1 In-memory Computing for AI

People have recognized the "memory wall" problem for a long time, so why has in-memory computing only taken off in the past two years? It is worth analyzing the forces behind its rise.

The first driving force is the rise of neural-network-based AI, especially the hope of bringing AI to mobile and embedded devices, which makes the high energy-efficiency ratio of in-memory computing attractive. In addition, neural networks are highly tolerant of small errors in calculation accuracy, so the errors introduced by the analog calculations of in-memory computing can usually be accepted. In other words, in-memory computing and AI are good partners for each other.

The second driving force is new memory technology. The characteristics of the memory largely determine the efficiency of in-memory computing, so advances in new memories tend to drive its development. For example, the recently popular ReRAM stores data by modulating resistance, so each bit is read out as a current signal rather than a traditional charge signal. Accumulating currents is then a very natural operation: simply joining several current paths sums the currents, with no additional circuitry. In other words, ReRAM is very well suited to in-memory calculation. From the memory makers' perspective, new memories are also eager to ride the AI trend, so their manufacturers are happy to see in-memory computing built on their devices accelerate AI, which broadens the memory market.


4.2 In-memory Computing Product Outlook

Chip products for in-memory computing are expected to come in two forms. The first is a memory IP with built-in computing functions. Such memory IP may be traditional SRAM, or a new memory such as eFlash, ReRAM, MRAM, or PCM.

The second form is an AI acceleration chip built directly on in-memory calculation. For example, Mythic plans to make flash-based PCIe accelerator cards that exchange data with the host CPU through the PCIe interface. The weight data is stored on the Mythic memory chip, so when input data is sent to the Mythic IPU, the results can be read out directly; the step of fetching the weight data is eliminated.


Figure 9. Mythic PCIe Accelerator


4.3 In-memory Computing Market and Prospect

What impact will in-memory computing have on the AI chip market? First, because in-memory computing relies on analog calculation, its accuracy is limited by the low signal-to-noise ratio: the practical upper limit is around 8-bit, and it can only do fixed-point rather than floating-point calculations. In-memory computing is therefore unsuitable for the AI training market, which demands high calculation precision; its main battlefield is the AI inference market. It is especially suitable for embedded artificial intelligence, which values energy efficiency over accuracy. In fact, in-memory computing fits best where a large memory is needed anyway. Flash, for instance, is inherently required in IoT and similar scenarios, so adding in-memory computing to that flash is quite natural; conversely, introducing a large memory just for the sake of in-memory computing may not be appropriate. Based on this analysis, we believe in-memory computing may become an important part of embedded AI (such as smart IoT) in the future.
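The 8-bit fixed-point limit mentioned above can be made concrete with a small sketch: weights are scaled into signed 8-bit integers, the dot product runs in integers, and the result is rescaled (the symmetric per-tensor scaling below is a simplistic assumption, not any vendor's actual scheme):

```python
# Sketch of 8-bit fixed-point inference arithmetic: scale float weights into
# signed 8-bit integers, compute in integers, rescale the result. The
# symmetric scaling rule is a simplifying assumption for illustration.

def quantize_weights(weights, bits=8):
    max_w = max(abs(w) for w in weights)
    scale = (2 ** (bits - 1) - 1) / max_w       # map into [-127, 127]
    return [round(w * scale) for w in weights], scale

def int8_dot(acts, weights):
    q, scale = quantize_weights(weights)
    return sum(a * wq for a, wq in zip(acts, q)) / scale

acts = [0.5, -1.0, 2.0]
wts = [0.31, -0.70, 0.12]
exact = sum(a * w for a, w in zip(acts, wts))
approx = int8_dot(acts, wts)
print(exact, approx)   # close but not bit-exact: tolerable for inference,
                       # not for gradient-based training
```

The small residual error is exactly the kind of loss that inference tolerates and training does not, which is why the article places in-memory computing in the inference market.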


Ⅴ Conclusion

With the rise of AI and new memories, in-memory computing has become a new hot spot. It exploits the memory's unique characteristics to perform analog calculations inside the memory itself, thereby greatly reducing the memory reads and writes in AI workloads. Although its calculation precision is limited by the analog circuitry, it is well suited to embedded AI applications that prize energy efficiency above all and can accept a certain loss of accuracy.


Frequently Asked Questions about In-Memory Computing Technology

1. Why do we need in memory computing?
In-memory computing provides super-fast performance (thousands of times faster), scales to ever-growing quantities of data, and simplifies access to an increasing number of data sources.


2. What does in memory mean?
An in-memory database is a type of purpose-built database that relies primarily on memory for data storage, in contrast to databases that store data on disk or SSDs. ... Because all data is stored and managed exclusively in main memory, it is at risk of being lost upon a process or server failure.


3. How does in memory computing work?
In-memory computing means using a type of middleware software that allows one to store data in RAM, across a cluster of computers, and process it in parallel. Consider operational datasets typically stored in a centralized database which you can now store in “connected” RAM across multiple computers.


4. What is in memory computing in SAP HANA?
An In-Memory database means all the data from source system is stored in a RAM memory. In a conventional Database system, all data is stored in hard disk. It provides faster access of data to multicore CPUs for information processing and analysis.


5. How is data stored in memory?
Normally memory is described as a storage facility where data can be stored and retrieved by the use of an address. This is accurate but incomplete. A computer memory is a mechanism whereby if you supply it with an address it delivers up for you the data that you previously stored using that address.


6. What is in memory data processing?
In-memory processing is the practice of taking action on data entirely in computer memory (e.g., in RAM). ... Since the storage appears as one big, single allocation of RAM, large data sets can be processed all at once, versus processing data sets that only fit into the RAM of a single computer.


7. What is in memory database processing and what advantages does it provide?
The major advantage of systems using in-memory databases vs traditional database systems is performance speed. ... Source data is loaded into system memory in a compressed format. Therefore, in-memory processing reduces the disk seek time for accessing data and streamlines the work involved in processing queries.


8. What is big data computing?
Big data computing is an emerging data-science paradigm of multi-dimensional information mining for scientific discovery and business analytics over large-scale infrastructure. ... Big data is characterized by the 5 Vs: volume, velocity, variety, veracity, and value.
