《英文版存储系统(参考)ppt课件.ppt》由会员分享,可在线阅读,更多相关《英文版存储系统(参考)ppt课件.ppt(53页珍藏版)》请在三一办公上搜索。
1、,Chapter 6,Memory,2,Chapter 6 Objectives,Master the concepts of hierarchical memory organization.Understand how each level of memory contributes to system performance, and how the performance is measured.Master the concepts behind cache memory, virtual memory, memory segmentation, paging and address
2、 translation., 分类信息, 新闻,3,6.1 Introduction,Memory lies at the heart of the stored-program computer.In previous chapters, we studied the components from which memory is built and the ways in which memory is accessed by various ISAs.In this chapter, we focus on memory organization. A clear understandi
3、ng of these ideas is essential for the analysis of system performance., 分类信息, 新闻,4,6.2 Types of Memory,There are two kinds of main memory: random access memory, RAM, and read-only-memory, ROM.There are two types of RAM, dynamic RAM (DRAM) and static RAM (SRAM).Dynamic RAM consists of capacitors that
4、 slowly leak their charge over time. Thus they must be refreshed every few milliseconds to prevent data loss.DRAM is “cheap” memory owing to its simple design., 分类信息, 新闻,5,6.2 Types of Memory,SRAM consists of circuits similar to the D flip-flop that we studied in Chapter 3.SRAM is very fast memory a
5、nd it doesnt need to be refreshed like DRAM does. It is used to build cache memory, which we will discuss in detail later.ROM also does not need to be refreshed, either. In fact, it needs very little charge to retain its memory.ROM is used to store permanent, or semi-permanent data that persists eve
6、n while the system is turned off., 分类信息, 新闻,6,6.3 The Memory Hierarchy,Generally speaking, faster memory is more expensive than slower memory.To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion.Small, fast storage elements are kept in the CPU, larger, sl
7、ower main memory is accessed through the data bus.Larger, (almost) permanent storage in the form of disk and tape drives is still further from the CPU., 分类信息, 新闻,7,6.3 The Memory Hierarchy,This storage organization can be thought of as a pyramid:,8,6.3 The Memory Hierarchy,To access a particular pie
8、ce of data, the CPU first sends a request to its nearest memory, usually cache. If the data is not in cache, then main memory is queried. If the data is not in main memory, then the request goes to disk.Once the data is located, then the data, and a number of its nearby data elements are fetched int
9、o cache memory., 分类信息, 新闻,9,6.3 The Memory Hierarchy,This leads us to some definitions.A hit is when data is found at a given memory level.A miss is when it is not found.The hit rate is the percentage of time data is found at a given memory level.The miss rate is the percentage of time it is not. Mi
10、ss rate = 1 - hit rate.The hit time is the time required to access data at a given memory level.The miss penalty is the time required to process a miss, including the time that it takes to replace a block of memory plus the time it takes to deliver the data to the processor.,10,6.3 The Memory Hierar
11、chy,An entire blocks of data is copied after a hit because the principle of locality tells us that once a byte is accessed, it is likely that a nearby data element will be needed soon.There are three forms of locality:Temporal locality- Recently-accessed data elements tend to be accessed again.Spati
12、al locality - Accesses tend to cluster.Sequential locality - Instructions tend to be accessed sequentially.,11,6.4 Cache Memory,The purpose of cache memory is to speed up accesses by storing recently used data closer to the CPU, instead of storing it in main memory.Although cache is much smaller tha
13、n main memory, its access time is a fraction of that of main memory.Unlike main memory, which is accessed by address, cache is typically accessed by content; hence, it is often called content addressable memory.Because of this, a single large cache memory isnt always desirable- it takes longer to se
14、arch.,12,6.4 Cache Memory,The “content” that is addressed in content addressable cache memory is a subset of the bits of a main memory address called a field.The fields into which a memory address is divided provide a many-to-one mapping between larger main memory and the smaller cache memory.Many b
15、locks of main memory map to a single block of cache. A tag field in the cache block distinguishes one cached memory block from another., 分类信息, 新闻,13,6.4 Cache Memory,The simplest cache mapping scheme is direct mapped cache.In a direct mapped cache consisting of N blocks of cache, block X of main mem
16、ory maps to cache block Y = X mod N.Thus, if we have 10 blocks of cache, block 7 of cache may hold blocks 7, 17, 27, 37, . . . of main memory.Once a block of memory is copied into its slot in cache, a valid bit is set for the cache block to let the system know that the block contains valid data.,Wha
17、t could happen without having a valid bit?,14,6.4 Cache Memory,The diagram below is a schematic of what cache looks like.Block 0 contains multiple words from main memory, identified with the tag 00000000. Block 1 contains words identified with the tag 11110101.The other two blocks are not valid.,15,
18、6.4 Cache Memory,The size of each field into which a memory address is divided depends on the size of the cache.Suppose our memory consists of 214 words, cache has 16 = 24 blocks, and each block holds 8 words.Thus memory is divided into 214 / 2 8 = 211 blocks.For our field sizes, we know we need 4 b
19、its for the block, 3 bits for the word, and the tag is whats left over:,16,6.4 Cache Memory,As an example, suppose a program generates the address 1AA. In 14-bit binary, this number is: 00000110101010.The first 7 bits of this address go in the tag field, the next 4 bits go in the block field, and th
20、e final 3 bits indicate the word within the block.,17,6.4 Cache Memory,If subsequently the program generates the address 1AB, it will find the data it is looking for in block 0101, word 011.However, if the program generates the address, 3AB, instead, the block loaded for address 1AA would be evicted
21、 from the cache, and replaced by the blocks associated with the 3AB reference.,18,6.4 Cache Memory,Suppose a program generates a series of memory references such as: 1AB, 3AB, 1AB, 3AB, . . . The cache will continually evict and replace blocks.The theoretical advantage offered by the cache is lost i
22、n this extreme case.This is the main disadvantage of direct mapped cache.Other cache mapping schemes are designed to prevent this kind of thrashing.,19,6.4 Cache Memory,Instead of placing memory blocks in specific cache locations based on memory address, we could allow a block to go anywhere in cach
23、e.In this way, cache would have to fill up before any blocks are evicted.This is how fully associative cache works. A memory address is partitioned into only two fields: the tag and the word.,20,6.4 Cache Memory,Suppose, as before, we have 14-bit memory addresses and a cache with 16 blocks, each blo
24、ck of size 8. The field format of a memory reference is:When the cache is searched, all tags are searched in parallel to retrieve the data quickly.This requires special, costly hardware.,21,6.4 Cache Memory,You will recall that direct mapped cache evicts a block whenever another memory reference nee
25、ds that block.With fully associative cache, we have no such mapping, thus we must devise an algorithm to determine which block to evict from the cache.The block that is evicted is the victim block.There are a number of ways to pick a victim, we will discuss them shortly.,22,6.4 Cache Memory,Set asso
26、ciative cache combines the ideas of direct mapped cache and fully associative cache.An N-way set associative cache mapping is like direct mapped cache in that a memory reference maps to a particular location in cache.Unlike direct mapped cache, a memory reference maps to a set of several cache block
27、s, similar to the way in which fully associative cache works.Instead of mapping anywhere in the entire cache, a memory reference can map only to the subset of cache slots.,23,6.4 Cache Memory,The number of cache blocks per set in set associative cache varies according to overall system design.For ex
28、ample, a 2-way set associative cache can be conceptualized as shown in the schematic below.Each set contains two different memory blocks.,24,6.4 Cache Memory,In set associative cache mapping, a memory reference is divided into three fields: tag, set, and word, as shown below.As with direct-mapped ca
29、che, the word field chooses the word within the cache block, and the tag field uniquely identifies the memory address.The set field determines the set to which the memory block maps.,25,6.4 Cache Memory,Suppose we have a main memory of 214 bytes.This memory is mapped to a 2-way set associative cache
30、 having 16 blocks where each block contains 8 words.Since this is a 2-way cache, each set consists of 2 blocks, and there are 8 sets.Thus, we need 3 bits for the set, 3 bits for the word, giving 8 leftover bits for the tag:,26,6.4 Cache Memory,With fully associative and set associative cache, a repl
31、acement policy is invoked when it becomes necessary to evict a block from cache.An optimal replacement policy would be able to look into the future to see which blocks wont be needed for the longest period of time.Although it is impossible to implement an optimal replacement algorithm, it is instruc
32、tive to use it as a benchmark for assessing the efficiency of any other scheme we come up with.,27,6.4 Cache Memory,The replacement policy that we choose depends upon the locality that we are trying to optimize- usually, we are interested in temporal locality.A least recently used (LRU) algorithm ke
33、eps track of the last time that a block was assessed and evicts the block that has been unused for the longest period of time.The disadvantage of this approach is its complexity: LRU has to maintain an access history for each block, which ultimately slows down the cache.,28,6.4 Cache Memory,First-in
34、, first-out (FIFO) is a popular cache replacement policy.In FIFO, the block that has been in the cache the longest, regardless of when it was last used.A random replacement policy does what its name implies: It picks a block at random and replaces it with a new block.Random replacement can certainly
35、 evict a block that will be needed often or needed soon, but it never thrashes.,29,6.4 Cache Memory,The performance of hierarchical memory is measured by its effective access time (EAT).EAT is a weighted average that takes into account the hit ratio and relative access times of successive levels of
36、memory.The EAT for a two-level memory is given by:EAT = H AccessC + (1-H) AccessMM.where H is the cache hit rate and AccessC and AccessMM are the access times for cache and main memory, respectively.,30,6.4 Cache Memory,For example, consider a system with a main memory access time of 200ns supported
37、 by a cache having a 10ns access time and a hit rate of 99%.The EAT is:0.99(10ns) + 0.01(200ns) = 9.9ns + 2ns = 11ns.This equation for determining the effective access time can be extended to any number of memory levels, as we will see in later sections.,31,6.4 Cache Memory,Cache replacement policie
38、s must also take into account dirty blocks, those blocks that have been updated while they were in the cache.Dirty blocks must be written back to memory. A write policy determines how this will be done.There are two types of write policies,write through and write back.Write through updates cache and
39、 main memory simultaneously on every write.,32,6.4 Cache Memory,Write back (also called copyback) updates memory only when the block is selected for replacement.The disadvantage of write through is that memory must be updated with each cache write, which slows down the access time on updates. This s
40、lowdown is usually negligible, because the majority of accesses tend to be reads, not writes.The advantage of write back is that memory traffic is minimized, but its disadvantage is that memory does not always agree with the value in cache, causing problems in systems with many concurrent users.,33,
41、6.5 Virtual Memory,Cache memory enhances performance by providing faster memory access speed.Virtual memory enhances performance by providing greater memory capacity, without the expense of adding main memory.Instead, a portion of a disk drive serves as an extension of main memory.If a system uses p
42、aging, virtual memory partitions main memory into individually managed page frames, that are written (or paged) to disk when they are not immediately needed.,34,6.5 Virtual Memory,A physical address is the actual memory address of physical memory.Programs create virtual addresses that are mapped to
43、physical addresses by the memory manager.Page faults occur when a logical address requires that a page be brought in from disk.Memory fragmentation occurs when the paging process results in the creation of small, unusable clusters of memory addresses.,35,6.5 Virtual Memory,Main memory and virtual me
44、mory are divided into equal sized pages.The entire address space required by a process need not be in memory at once. Some parts can be on disk, while others are in main memory.Further, the pages allocated to a process do not need to be stored contiguously- either on disk or in memory.In this way, o
45、nly the needed pages are in memory at any time, the unnecessary pages are in slower disk storage.,36,6.5 Virtual Memory,Information concerning the location of each page, whether on disk or in memory, is maintained in a data structure called a page table (shown below).There is one page table for each
46、 active process.,37,6.5 Virtual Memory,When a process generates a virtual address, the operating system translates it into a physical memory address.To accomplish this, the virtual address is divided into two fields: A page field, and an offset field.The page field determines the page location of th
47、e address, and the offset indicates the location of the address within the page.The logical page number is translated into a physical page frame through a lookup in the page table.,38,6.5 Virtual Memory,If the valid bit is zero in the page table entry for the logical address, this means that the pag
48、e is not in memory and must be fetched from disk.This is a page fault.If necessary, a page is evicted from memory and is replaced by the page retrieved from disk, and the valid bit is set to 1.If the valid bit is 1, the virtual page number is replaced by the physical frame number.The data is then ac
49、cessed by adding the offset to the physical frame number.,39,6.5 Virtual Memory,As an example, suppose a system has a virtual address space of 8K and a physical address space of 4K, and the system uses byte addressing.We have 213/210 = 23 virtual pages.A virtual address has 13 bits (8K = 213) with 3
50、 bits for the page field and 10 for the offset, because the page size is 1024.A physical memory address requires 12 bits, the first two bits for the page frame and the trailing 10 bits the offset.,40,6.5 Virtual Memory,Suppose we have the page table shown below.What happens when CPU generates addres