Computer Architecture (Graduate): Course Key Points and Review (体系结构-研课程要点与复习.ppt)

Shared by a member of 三一办公. The original slide deck, 体系结构-研课程要点与复习.ppt, runs to 48 pages.

Advanced Computer Architecture: Course Key Points

Chapter 1: Introduction
  Why computer technology has advanced so quickly
    Technology progress: the development of Moore's law
    Progress in architecture
  How computer architecture has evolved
  Components of a modern computer architecture
  Scope of advanced computer architecture research

Classification of computer architectures
  Flynn's taxonomy (qualitative)
  Feng's taxonomy (quantitative)

Chapter 3: Instruction-Level Parallelism and Its Dynamic Exploitation
  What is pipelining?
  How is pipelining implemented?
  What makes pipelining hard to implement?
  How does CPI decrease? Compare CPI = 1 with CPI above and below 1.

Ideal Performance for Pipelining
  Ideal speedup equals the number of pipe stages
  MIPS instruction format
  How instructions work in the MIPS 5-stage pipeline
  The MIPS pipeline
  Pipeline hazards: the major hurdle
    Structural hazards
    Data hazards
    Control hazards
    All of them can be resolved by stalling

Possible solutions for structural hazards
  "Double bump"
  Insert a stall
  Provide another memory port
  Split instruction memory and data memory
  Use an instruction buffer
  Fully pipelined functional units
  Why allow a machine with structural hazards?

Possible solutions for data hazards
  Interlock: insert stalls
  Detect: data hazard logic
  Forwarding: reduce data hazard stalls
  Compiler scheduling to avoid load stalls

Possible solutions for control hazards
  Move the branch computation forward
  Simple solutions
    Freeze or flush the pipeline
    Predict-not-taken (predict-untaken): treat every branch as not taken
    Predict-taken: treat every branch as taken
    Delayed branch: cases a, b, c
    Cancelling function

Extending the MIPS Pipeline to Handle ...
  Complex pipeline structure
  Pipeline timing parameters
    Latency
    Initiation interval
  Out of order
  New types of data hazards
    RAW: stalls arising
    WAW

Instruction-Level Parallelism
  Pipelined CPI = Ideal pipeline CPI + pipeline stall cycles per instruction
                = 1 + Structural stalls + RAW stalls + WAR stalls + WAW stalls + Control stalls
  ILP within a basic block is quite small
  Data dependences and hazards
    True data dependence: RAW (read after write)
    Name dependence
      Antidependence: WAR (write after read)
      Output dependence: WAW (write after write)

Some properties
  Dependences are a property of programs
  Whether a hazard occurs, and the length of any stall, is a property of the pipeline (hardware)
  Control dependences
  Branch behavior

Overcoming Data Hazards with Dynamic Scheduling
  Key idea: allow instructions behind a stall to proceed
    In-order issue
    Out-of-order execution
    Out-of-order completion

Dynamic Scheduling with a Scoreboard
  Issue: an instruction is issued when the functional unit is available and no other active instruction has the same destination register. This avoids structural hazards and WAW hazards.
  Read operands (RD): the read is delayed until the operands are available, meaning that no previously issued but uncompleted instruction has the operand as its destination. This resolves RAW hazards dynamically.
  Execution (EX): the unit notifies the scoreboard when it completes so that the functional unit can be reused.
  Write result (WB): the scoreboard checks for WAR hazards and stalls the completing instruction if necessary.

Dynamic Scheduling with Tomasulo's Algorithm (renaming in hardware!)
  Control avoids WAR and WAW hazards
  Results go to the FUs from the reservation stations (RS), not through registers, over a Common Data Bus that broadcasts results to all FUs

Three Stages of Tomasulo's Algorithm
  Issue: get an instruction from the FP op queue. If a reservation station is free (no structural hazard), control issues the instruction.
  Mark reservation station available

Reservation Station Components
  Op: operation to perform in the unit
  Vj, Vk: values of the source operands. Store buffers have a V field holding the result to be stored.
  Qj, Qk: reservation stations producing the source registers (the value to be written). Note: Qj, Qk = 0 means ready. Store buffers only have Qi, for the RS producing the result.
  A: holds information for the memory address calculation
  Busy: indicates that the reservation station or FU is busy
  Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

Reducing Branch Costs with Dynamic Hardware Prediction (3.4)
  1-bit branch-prediction buffer
  2-bit branch-prediction buffer
  Correlating branch-prediction buffer
  Tournament branch predictor
  Branch target buffer
  Trace cache

Hardware-Based Speculation (3.7)
  Basic concept: hardware-based speculation is essentially an integration of three techniques:
    dynamic branch prediction, to choose which instructions to speculate on;
    speculation, to execute those instructions before the control dependences are resolved;
    dynamic scheduling, to schedule across different combinations of basic blocks.

Hardware speculation built on Tomasulo dynamic scheduling
  Out-of-order execution, in-order completion
  Added pipeline stage: Commit
  Added pipeline component: Reorder Buffer
  Role of the reorder buffer

Functions of the four steps of speculative instruction execution
  Issue: in order
  Execute: out of order
  Write result: out of order
  Commit: in order

Multiple-issue techniques
  Issuing several instructions in one clock cycle is called multiple issue.
  Two basic flavors of multiple issue:
    Superscalar
    VLIW (very long instruction word)
  Scheduling
    A superscalar processor has dynamic issue capability
    A VLIW processor has static issue capability
  Dual-issue Tomasulo pipeline
  Multiple issue without speculation
  Multiple issue with speculation

Chapter 4: Exploiting Instruction-Level Parallelism with Software Approaches
  Loop unrolling
  Using loop unrolling and pipeline scheduling with static multiple issue
  Static multiple issue: the VLIW approach

Dependence concepts
  Loop-carried dependence: data accesses in later iterations depend on data values produced in earlier iterations
  Dependence distance: when iteration i uses data from iteration i+n, the distance is n
  Circular dependence: a dependence inside the loop body that is also loop-carried

How to analyze array dependences
  Use the greatest common divisor (GCD) test
  Determine the dependence type

Software pipelining: symbolic loop unrolling

Global code scheduling: trace scheduling
  Trace selection
  Trace compaction
  Misprediction compensation (compensation code for the paths not taken)

Chapter 6: Multiprocessors and Thread-Level Parallelism
  Definition of parallel techniques
  Meaning of parallelism
  Difficulties of parallelism
  Centralized shared-memory architecture: SMP (symmetric (shared-memory) multiprocessors), UMA (uniform memory access)
  Distributed-memory architecture
    Interconnection network
    Message-passing nodes
    Distributed-memory structure models
      Distributed shared memory (DSM, or scalable shared memory)
      Multiple computers

Communication models
  Shared-memory communication model
  Message-passing model
  Two memory organizations
    Single address space (distributed shared memory)
    Private address spaces (multiple computers)
  Two communication models
    Shared memory
    Message passing
  Correspondence between the organizations and the models

What Is Multiprocessor Cache Coherence?
  Cache incoherence in multiprocessors
  Write back
  Write through
  Memory, cache, another cache, bus events
  Definition of cache coherence
  The coherence problem
  The consistency problem
  A correct definition of coherence covers three aspects

Two classes of coherence protocols
  1. Directory based: the sharing status of a block of physical memory is kept in a structure called the directory (covered in Section 6.5).
  2. Snooping: the sharing status of a block is distributed, kept in every cache that holds a copy of the block, so no centralized structure holds it. Because caches are usually attached to the shared-memory bus, all cache controllers monitor (snoop) the bus to find out whether they hold a copy of the block requested on the bus.

Two snooping protocols
  1. Write invalidate protocol
  2. Write update (write broadcast) protocol
  How shared cache blocks change under write back and write through
  Snooping protocols example

Requests and actions of the coherence mechanism (write invalidate): processor and bus
  Cache state transitions based on requests (Ep558)
    1. from the CPU
    2. from the bus

Implementing the directory protocol
  Possible states of a cache block under the directory protocol
    Shared: copies of the block exist in the caches of one or more processors.
    Uncached: no processor has a copy of the block in its cache.
    Exclusive: exactly one processor holds a copy and has updated it, so the data in memory is out of date. That processor is the owner of the block.

Operating conventions of the directory protocol
  A write to non-exclusive data always causes a cache write miss, and the processor stalls until the access completes.
  How a directory-based coherence protocol differs from snooping
    Snooping uses the bus (the interconnect) as the point of decision, where it acts as the arbiter.
    A directory protocol cannot use the interconnection network as the point of decision.
    A directory protocol is message-oriented (unlike a bus, which is transaction-oriented and can use interrupts), so every message must be explicitly acknowledged.

Kinds of messages passed between processors and directories (write invalidate): three kinds of nodes

Relationship among the three kinds of nodes
  Local node: the node where the access request originates
  Home node: the node that holds the memory location and directory entry for the requested address (the home of the requested data)
  Remote node: a node that holds a copy of the requested data
  Relationship and messages, taking a read by the local node as the example:
    local node -> home node -> remote node -> home node -> local node
    Read miss, Fetch, Data value reply

Directory protocol example
  State transition diagram for a data block in a cache under the directory protocol
  State transition diagram for the record of a data block in the directory

Synchronization: purpose and implementation
  Function of hardware primitives
  Several typical hardware primitives
  Implementing locks using coherence
  Performance analysis and improvement of the various spin locks
  Spin-lock synchronization and lock contention analysis
  Barrier synchronization: implementation and improvement
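
Worked example: pipelined CPI and ideal speedup. The Chapter 3 notes give the pipelined CPI as the ideal CPI plus stall cycles per instruction, broken down by cause, and state that the ideal speedup equals the number of pipe stages. The sketch below simply evaluates those two relations. The function names and the stall figures are illustrative assumptions rather than course data, and the speedup formula assumes perfectly balanced stages.

```python
def pipelined_cpi(structural=0.0, raw=0.0, war=0.0, waw=0.0, control=0.0, ideal=1.0):
    """Pipelined CPI = ideal pipeline CPI + stall cycles per instruction, by cause."""
    return ideal + structural + raw + war + waw + control


def pipeline_speedup(stages, stall_cycles_per_instr):
    """Speedup over the unpipelined machine, assuming balanced stages and ideal CPI 1.

    With no stalls this reduces to the number of pipe stages.
    """
    return stages / (1.0 + stall_cycles_per_instr)


# Hypothetical stall profile (made-up numbers, for illustration only).
cpi = pipelined_cpi(raw=0.25, control=0.25)
print("pipelined CPI:", cpi)
print("speedup on 5 stages:", pipeline_speedup(5, cpi - 1.0))
```

Any nonzero stall term pushes the CPI above 1 and the speedup below the stage count, which is why the later sections spend so much effort removing RAW, WAR, WAW, and control stalls.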
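
Worked example: 1-bit versus 2-bit branch-prediction buffers. The dynamic hardware prediction section lists both schemes without showing how they behave. The sketch below models a single buffer entry for one branch. The state encoding (0 and 1 predict not taken, 2 and 3 predict taken) and the function names are assumptions made for this illustration, and the loop-branch trace is synthetic.

```python
def simulate_one_bit(outcomes, initial_taken=False):
    """Predict that the branch repeats its last outcome; return correct predictions."""
    last = initial_taken
    correct = 0
    for taken in outcomes:
        if last == taken:
            correct += 1
        last = taken  # the single bit always records the latest outcome
    return correct


def simulate_two_bit(outcomes, initial_state=0):
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""
    state = initial_state
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# A loop branch taken 9 times and then falling through, executed 3 times.
trace = ([True] * 9 + [False]) * 3
print("1-bit correct:", simulate_one_bit(trace), "of", len(trace))
print("2-bit correct:", simulate_two_bit(trace), "of", len(trace))
```

After warm-up, the 1-bit scheme mispredicts twice per loop execution (the exit and the next entry), while the 2-bit counter mispredicts only the exit, because a single not-taken outcome does not flip its prediction.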
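
Worked example: the GCD test for array dependences. The Chapter 4 notes name the GCD test without stating it. In its usual textbook form, for a store to x[a*i + b] and a load from x[c*i + d] inside the same loop, a loop-carried dependence is possible only if GCD(c, a) divides (d - b). The sketch below checks that condition. The function name is hypothetical, and a True result means only that the test cannot rule a dependence out, since loop bounds are ignored.

```python
from math import gcd


def gcd_test_may_depend(a, b, c, d):
    """GCD test for a store to x[a*i + b] and a load from x[c*i + d].

    Returns False when the test proves that no dependence exists,
    True when a dependence cannot be ruled out.
    """
    g = gcd(a, c)
    if g == 0:
        # Both indices are constants: they conflict only if they are equal.
        return b == d
    return (d - b) % g == 0


# x[2*i + 3] = x[2*i] * 5.0  -> GCD(2, 2) = 2 does not divide -3
print(gcd_test_may_depend(2, 3, 2, 0))
# x[i + 1] = x[i] + y[i]     -> GCD(1, 1) = 1 divides -1
print(gcd_test_may_depend(1, 1, 1, 0))
```

The first loop writes only odd indices and reads only even ones, so the test proves independence. The second is the classic loop-carried dependence with distance 1.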
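
Worked example: directory state for one memory block. The directory protocol notes list three block states (uncached, shared, exclusive) and the messages Read miss, Fetch, and Data value reply. The sketch below models only the home node's directory entry for a write-invalidate protocol. The class, its method names, and the fetch_invalidate message are assumptions added for illustration; acknowledgements and the cache-side state machine are left out.

```python
class DirectoryEntry:
    """Home-node record for one block: its state plus the set of sharing nodes."""

    def __init__(self):
        self.state = "uncached"
        self.sharers = set()

    def read_miss(self, node):
        """Handle a read miss from `node`; return the messages the home node sends."""
        msgs = []
        if self.state == "exclusive":
            owner = next(iter(self.sharers))
            msgs.append(("fetch", owner))  # owner sends the up-to-date block home
        self.state = "shared"
        self.sharers.add(node)  # a former owner keeps a shared copy
        msgs.append(("data_value_reply", node))
        return msgs

    def write_miss(self, node):
        """Handle a write miss from `node`; the block ends up exclusive to it."""
        msgs = []
        if self.state == "exclusive":
            owner = next(iter(self.sharers))
            msgs.append(("fetch_invalidate", owner))  # assumed message name
        elif self.state == "shared":
            for other in sorted(self.sharers - {node}):
                msgs.append(("invalidate", other))
        self.state = "exclusive"
        self.sharers = {node}
        msgs.append(("data_value_reply", node))
        return msgs


entry = DirectoryEntry()
print(entry.read_miss("P1"), entry.state)
print(entry.read_miss("P2"), entry.state)
print(entry.write_miss("P1"), entry.state)
print(entry.read_miss("P2"), entry.state)
```

The last call shows the local -> home -> remote -> home -> local path from the notes: the home node fetches the block from the remote owner before replying to the requester.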
