
Jan 8 2019

The Function, Structure and Working Principle of Cache Memory

Note: This article contains about 4,000 words, and the reading time is about 18 minutes.

Introduction

In the hierarchy of computer storage systems, the cache is a small, high-speed memory placed between the central processor and the main memory; together with the main memory it forms the first level of the memory system. The scheduling and transfer of information between the cache and the main memory are handled automatically by hardware, and the programmer does not perceive the cache at all, so it is transparent to the programmer. Throughout the development of computer technology, main-memory access speed has remained much slower than central-processor operation speed, so the high-speed processing capability of the CPU cannot be fully utilized and the working efficiency of the entire computer system suffers; the cache memory exists to bridge this gap.


Article Core

Cache memory

Purpose

Introduce the function, structure and working principle of cache memory.

Application

Semiconductor industry.

Keywords

Cache memory

Catalog

Introduction
Principle of CACHE Implementation
Key Indicators for Evaluating CACHE Performance
Basic Operation Principle of CACHE
1. Composition of CACHE Storage Unit
2. The Principle of Cache Memory
Three Mapping Modes of CACHE
1. Fully-associative Mapping
2. Direct Mapping
3. Multi-way (Two-way) Set-associative Mapping
The Role of Cache Memory
Several Problems in the Practical Use of CACHE Memory
Read Hit Rate



Principle of CACHE Implementation 

A small amount of the information (data or instructions) most likely to be used by the CPU is copied from main memory into the CACHE. When the CPU needs this information again, it does not have to access the slow main memory but fetches it directly from the fast CACHE, which increases speed.


Key Indicators for Evaluating CACHE Performance 

The key indicator is a sufficiently high hit rate: when the CPU needs data from main memory, it should in most cases be able to obtain it directly from the CACHE, reading main memory as rarely as possible. The hit rate is the ratio of accesses served by the CACHE to all memory accesses.
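As a minimal sketch of this metric (the counts below are hypothetical, not from the article):

```python
# Hit rate = accesses served by the CACHE / all memory accesses.
def hit_rate(cache_hits: int, total_accesses: int) -> float:
    if total_accesses == 0:
        return 0.0
    return cache_hits / total_accesses

# Example: 920 of 1000 accesses were satisfied directly from the CACHE.
print(f"hit rate = {hit_rate(920, 1000):.0%}")  # hit rate = 92%
```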


Basic Operation Principle of CACHE 

1. Composition of CACHE Storage Unit 

The storage unit of CACHE is composed of three parts:

(1) Valid bit: "0" means the unit is not yet used; "1" means the data it holds is valid.

(2) Flag (tag) field: the high-order part of the main-memory address of the data held in the unit.

(3) Data field: the cached data itself, one cache line in size.

[Figure: Composition of a CACHE storage unit]

Two points are worth noting. First, a CACHE unit does not necessarily correspond to a single main-memory word: storing a complete main-memory address in the flag field would occupy too many bits, so only the high-order part of the address is kept. Second, when the CACHE exchanges information with the main memory, it does not do so one main-memory word at a time; data transfer is usually performed in units of one cache line.
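A minimal sketch of such a storage unit, assuming illustrative field names and a 16-byte line size (the article only says lines are typically 4 to 32 bytes):

```python
from dataclasses import dataclass, field

LINE_SIZE = 16  # bytes per cache line; the article cites 4-32 bytes as typical

@dataclass
class CacheLine:
    valid: bool = False  # "0": unit not yet used; "1": data is valid
    tag: int = 0         # high-order part of the main-memory address, not the full address
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))
```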


2. The Principle of Cache Memory 

The cache memory typically consists of a high-speed memory, an associative memory, replacement logic circuitry, and the corresponding control circuitry. In a computer system with a cache, the address used by the central processor to access main memory is divided into three fields: a row number, a column number, and an intra-group address. The main memory is thus logically divided into rows; each row is divided into a number of memory cell groups; each group contains several or tens of words. The high-speed memory is likewise divided into rows and columns of memory cell groups. The number of columns and the group size are the same, but the high-speed memory has far fewer rows than the main memory.

The associative memory is used for address lookup and has the same number of rows and columns as the high-speed memory. When a memory cell group from some row and column of the main memory is transferred into an empty group in the same column of the high-speed memory, the corresponding cell of the associative memory records the row number that the loaded group occupies in the main memory.

When the central processor accesses the main memory, the hardware first automatically decodes the column-number field of the address and compares all the row numbers stored in that column of the associative memory with the row-number field of the address. If one of them matches, the main-memory unit to be accessed is already in the high-speed memory; this is called a hit, and the hardware maps the main-memory address to a high-speed-memory address and performs the access there. If none matches, the unit is not in the high-speed memory; this is called a miss, and the hardware performs the access to main memory and automatically transfers the memory cell group containing that unit into a group in the same column of the high-speed memory, while the group's row number in the main memory is stored in the corresponding cell of the associative memory.

When a miss occurs and there is no empty position in the corresponding column of the high-speed memory, some group in that column must be evicted to make room for the newly transferred group. This is called replacement, and the rule that determines which group to evict is called the replacement algorithm. Commonly used replacement algorithms are least recently used (LRU), first in first out (FIFO), and random (RAND); the replacement logic circuitry performs this function. In addition, to keep the contents of the main memory and the high-speed memory consistent when performing a write operation, hits and misses must be handled separately:

(1) When a write operation hits, either the write-through method (writing the main memory and the high-speed memory simultaneously) or the write-back method (writing only the high-speed memory and marking the group as modified, so it is flushed to main memory when replaced) can be used.

(2) When a write operation misses, either the write-allocate method (writing the main memory and also transferring the group into the high-speed memory) or the no-write-allocate method (writing only the main memory, without loading the group into the high-speed memory) can be used.
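A sketch of the four policies just named, using hypothetical dictionaries in place of the real hardware (which operates on whole cache lines rather than single words):

```python
cache, memory, dirty = {}, {}, set()

def write(addr: int, value: int, write_back: bool, write_allocate: bool) -> None:
    if addr in cache:                 # write hit
        cache[addr] = value
        if write_back:
            dirty.add(addr)           # mark modified; main memory updated on eviction
        else:
            memory[addr] = value      # write-through: update both at once
    else:                             # write miss
        memory[addr] = value
        if write_allocate:
            cache[addr] = value       # also bring the data into the CACHE
```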

The performance of a cache is usually measured by its hit rate. The factors affecting the hit rate include the capacity of the high-speed memory, the size of the storage cell groups, the number of groups, the method of address association and comparison, the replacement algorithm, the write policy, and the characteristics of the program.
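Tying the hit, miss, and replacement behavior of the preceding paragraphs together, a minimal sketch (all structures are hypothetical, and the RAND policy is used for brevity):

```python
import random

def access(cache: dict, memory: dict, capacity: int, addr: int):
    if addr in cache:                          # hit: serve from the fast memory
        return cache[addr]
    value = memory.get(addr, 0)                # miss: read the slow main memory
    if len(cache) >= capacity:                 # no empty position: replace
        del cache[random.choice(list(cache))]  # RAND replacement policy
    cache[addr] = value                        # load the newly fetched group
    return value
```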

Computers using cache technology are quite common. Some computers also use multiple cache memories, such as a system cache, an instruction cache, and an address-translation cache, to improve system performance. As main-memory capacity keeps increasing, cache capacity is increasing as well.


Three Mapping Modes of CACHE

Address mapping: when data at a main-memory address is copied into the cache, the main-memory address is transformed by some functional relationship into a CACHE flag (tag) field. This process is called CACHE address mapping.

Address translation: when the program executes, main-memory addresses are converted into CACHE access addresses. This process is called CACHE address translation.

The two processes are closely related.

1. Fully-associative Mapping

[Figure: Fully-associative mapping]

Address mapping: when a block is written into the CACHE, the complete main-memory address is written into the CACHE flag field.

Address translation: the complete main-memory address is compared with the flag field of every unit in the CACHE.

Advantages: The hit rate is relatively high, and the Cache storage space utilization is high.

Disadvantages: the address must be compared with the flag field of every unit in the entire CACHE, so the circuitry is complicated, the cost is high, and it is difficult to implement; it is therefore suitable only for small-capacity CACHEs.
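A sketch of the fully-associative lookup, assuming a simple Line record (hypothetical names): the full block address serves as the tag and must be compared with every entry, which is what makes the circuitry costly.

```python
from collections import namedtuple

Line = namedtuple("Line", "valid tag data")

def fully_associative_lookup(lines: list, block_addr: int):
    for line in lines:                    # compare against *every* CACHE entry
        if line.valid and line.tag == block_addr:
            return line
    return None                           # miss
```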


2. Direct Mapping

[Figure: Direct mapping]

Address mapping: when writing into the CACHE, only the main-memory section number is written into the CACHE flag field.

Address translation: the intra-section offset of the main-memory address selects a CACHE unit directly; it is then only necessary to compare the section number of the main-memory address with the contents of that unit's flag field.

Advantages: the address mapping is simple. On an access it is only necessary to check whether the section numbers are equal, so a relatively fast access speed is obtained, and the hardware is simple.

Disadvantages: replacement operations are frequent, and the hit rate is low.
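A sketch of direct-mapped translation with the same hypothetical Line record: the block address picks exactly one CACHE line, so only a single flag-field comparison is needed (NUM_LINES is illustrative).

```python
from collections import namedtuple

Line = namedtuple("Line", "valid tag data")
NUM_LINES = 1024

def direct_mapped_lookup(lines: list, block_addr: int):
    index = block_addr % NUM_LINES    # the one CACHE line this block may occupy
    tag = block_addr // NUM_LINES     # "section number" kept in the flag field
    line = lines[index]
    return line if line.valid and line.tag == tag else None
```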


3. Multi-way (Two-way) Set-associative Mapping

[Figure: Multi-way (two-way) set-associative mapping]

The CACHE memory is organized as several banks of equal capacity, for example two banks. The main memory is still divided into sections whose capacity equals that of one CACHE bank.

The main memory address format is as follows:

Section number | Offset within the section


Address mapping: when writing into the CACHE, only the main-memory section number is written into the CACHE flag field.

Address translation: the intra-section offset of the main-memory address selects one unit in each CACHE bank; it is then only necessary to compare the section number of the main-memory address with the flag fields of those units.

Advantages: the probability of block collisions is relatively low, block utilization is greatly improved, and the block miss rate is significantly reduced.

Disadvantages: implementation difficulty and cost are higher than for direct mapping.
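A sketch of the two-way lookup with the same hypothetical Line record: the address selects one set, and the tag is compared against both ways of that set, so each block has two candidate locations instead of one (sizes illustrative).

```python
from collections import namedtuple

Line = namedtuple("Line", "valid tag data")
NUM_SETS = 512  # each set holds two lines (two ways)

def two_way_lookup(sets: list, block_addr: int):
    index = block_addr % NUM_SETS     # which set to search
    tag = block_addr // NUM_SETS
    for line in sets[index]:          # only two comparisons per access
        if line.valid and line.tag == tag:
            return line
    return None                       # miss: either way of the set may be replaced
```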


The Role of Cache Memory 

In the development of computer technology, main-memory access speed has remained much slower than that of the central processing unit, so the high-speed processing capability of the CPU cannot be fully utilized and the working efficiency of the entire computer system is affected. There are many ways to mitigate the speed mismatch between the central processor and the main memory, such as using multiple general-purpose registers or multi-bank interleaving, and using a cache at the storage level is also a common method. Many large and medium-sized computers, as well as some recent minicomputers and microcomputers, use cache memories.

The size of the cache is typically only a few hundredths of that of the main memory, but its access speed matches that of the central processor. According to the principle of program locality, the units in the vicinity of a main-memory unit currently in use are highly likely to be used soon. Thus, when the central processor accesses a unit of main memory, the computer hardware automatically transfers the whole group of units containing it into the cache, and the next main-memory unit the processor accesses is likely to be in the group just loaded, so the processor can access the cache directly. If, across the whole run, most of the central processor's accesses to main memory can be replaced by accesses to the cache, the processing speed of the computer system is significantly improved.


Several Problems In the Practical Use of CACHE Memory

The most important technical indicator of a CACHE memory is its hit rate. The factors affecting the CACHE hit rate are:

1. The relationship between CACHE capacity and hit rate

In general, the larger the capacity, the better; but once the CACHE capacity reaches a certain size, increasing it further barely raises the hit rate.


2. The relationship between cache line size (the amount of information the CACHE exchanges with memory each time) and the hit rate:

The amount of information exchanged per transfer should be moderate: information moves between main memory and the CACHE not one word at a time but several words at a time (the CACHE line size, usually 4 to 32 bytes).


3. The relationship between multi-level CACHE structure and hit rate:

(Multi-level structures are discussed under "Read Hit Rate" below: an L2 or L3 Cache catches much of what the previous level misses.)


4. The relationship between different mapping methods of CACHE and the hit rate:

Fully-associative mapping is impractical (the comparison circuitry is too costly).

Direct mapping gives a low hit rate.

Multi-way set-associative mapping offers the best performance/price ratio.

For example, in direct mapping, suppose the CACHE capacity is 8K words, divided into 1024 groups of 8 words each. The main memory is likewise divided into groups of 8 words, and every 1024 groups constitute one page. Main-memory group 0 can only be mapped to CACHE group 0, main-memory group 1 only to CACHE group 1, and so on.
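A worked version of this example (a sketch; only the 8K-word/1024-group figures come from the text):

```python
WORDS_PER_GROUP = 8
CACHE_GROUPS = 1024  # 1024 groups x 8 words = 8K words

def cache_group(main_word_addr: int) -> int:
    group = main_word_addr // WORDS_PER_GROUP  # main-memory group number
    return group % CACHE_GROUPS                # the only CACHE group it may use

print(cache_group(0))         # main-memory group 0    -> CACHE group 0
print(cache_group(8 * 1024))  # group 1024 (next page) -> CACHE group 0 again
```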


5. The CACHE write strategy and its impact on the system

(1) When a peripheral writes data to the main memory, the copy of that main-memory unit held in the CACHE becomes inconsistent. The simplest remedy is to clear the valid bit of the corresponding CACHE unit: when the CPU next needs that main-memory unit, it can only fetch it from main memory and cannot use the stale value in the CACHE.


(2) Strategy for rewriting main memory

If the CPU rewrites the contents of a CACHE unit without changing the contents of the corresponding main-memory unit, the data become inconsistent.

Two solutions:

Solution 1: immediately rewrite the contents of the corresponding main-memory unit as well. This is simple and easy, but it can make the system less efficient, since the rewritten data may never be read again.

Solution 2: defer rewriting the main-memory unit until some other device tries to read the now-outdated unit. That read operation is first stalled, the main-memory contents are updated, and the stalled read is then restarted; if no such read occurs, the main memory need not be rewritten at all.

The difficulty in Solution 2 is detecting when a rewrite is required. This is done by monitoring the address bus and recording the addresses of stale units for comparison. The control is more complicated, but it provides higher system operating efficiency.


Read Hit Rate 

The CPU finding the data it needs in the Cache is called a hit; when the required data is not in the Cache (called a miss), the CPU accesses main memory instead. In theory, in a CPU with a two-level Cache, the hit rate for reading the L1 Cache is about 80%; that is, useful data found by the CPU in the L1 Cache accounts for about 80% of all data, and the remaining 20% is read from the L2 Cache. Since the data a program will need cannot be predicted exactly, the hit rate for reading L2 is also about 80%, so useful data read from L2 accounts for about 16% of the total. Data that still has to be fetched from main memory is then only a fairly small proportion. In some high-end CPUs we also find an L3 Cache, designed to catch data after an L2 miss; in a CPU with an L3 Cache, only about 5% of the data needs to be fetched from main memory, which further increases the efficiency of the CPU.
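Checking the arithmetic of these figures (a sketch; the 80% rates are the article's rough estimates):

```python
l1_rate = l2_rate = 0.80

from_l1 = l1_rate                    # 80% of all accesses hit in L1
from_l2 = (1 - l1_rate) * l2_rate    # 80% of the remaining 20% = 16%
from_memory = 1 - from_l1 - from_l2  # what neither level can serve

print(f"L1: {from_l1:.0%}, L2: {from_l2:.0%}, memory: {from_memory:.0%}")
# L1: 80%, L2: 16%, memory: 4%
```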

To keep the hit rate high as the CPU runs, the contents of the Cache must be replaced according to some algorithm. One of the more commonly used is the least-recently-used (LRU) algorithm, which evicts the rows that have been accessed least in the recent period. It therefore requires a counter for each row: the LRU algorithm clears the counter of the row that is hit and increments the counters of all other rows by one; when a replacement is required, the row with the largest counter value is evicted. This is an efficient and well-founded algorithm, since the counter-clearing process gradually pushes data that is no longer being used out of the Cache and so improves Cache utilization.
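A sketch of this counter scheme (hypothetical helper names): on a hit every counter is incremented and the hit row's counter is cleared; on replacement the row with the largest count is evicted.

```python
def lru_touch(counters: list, hit_index: int) -> None:
    for i in range(len(counters)):
        counters[i] += 1          # age every row by one
    counters[hit_index] = 0       # the hit row becomes the most recently used

def lru_victim(counters: list) -> int:
    # the row with the largest count is the least recently used
    return max(range(len(counters)), key=counters.__getitem__)
```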

The Cache replacement algorithm directly affects the hit rate. When a new main-memory block must be brought into the Cache and all of its candidate locations are occupied, existing Cache data has to be replaced, which raises the problem of a replacement strategy (algorithm). Program locality tells us that a running program tends to keep using the instructions and data it has used most recently, and this provides the theoretical basis for replacement strategies. The goal of a replacement algorithm is to give the Cache the highest possible hit rate; the replacement algorithm is an important factor in cache-system performance, and a good one yields a high hit rate. The common algorithms are as follows:


(1) Random (RAND) method. The random replacement algorithm uses a random-number generator to produce the number of the block to be replaced and replaces that block. The algorithm is simple and easy to implement, but it takes no account of how a Cache block was used in the past, is used now, or will be used, and it makes no use of the "history information" of memory accesses; since it is not based on the locality principle, it cannot improve the Cache hit rate, which stays low.


(2) First-in, first-out (FIFO) method. The FIFO algorithm replaces the information block that entered the Cache earliest, determining the eviction order purely by the order in which blocks were loaded. It does not need to record how each block has been used, so it is relatively easy to implement and has low system overhead. Its disadvantage is that a block that entered the Cache first may still be needed frequently, a loop program for example, yet it is replaced anyway simply because it arrived first. So although the method is simple and uses some "history information" (the load order) of the main memory, arriving first does not mean a block is no longer in frequent use; FIFO cannot correctly reflect the locality principle of programs, its hit rate is not high, and anomalous behavior can occur.


(3) Least-recently-used (LRU) method. This method replaces the block in the Cache that has gone unused for the longest time, and it performs better than the first-in, first-out algorithm, although it still cannot guarantee that a block unused recently will not be needed soon. The LRU method is based on the usage of each block, always selecting the least recently used block for replacement. It reflects program locality well, but it must record the usage of every block in the Cache at all times in order to determine which block is least recently used. The LRU algorithm is relatively sound, but it is complicated to implement and has a large system overhead, usually requiring a hardware or software counter for each block to record its use.

