Modern Computer Architecture.ppt

Uploaded by: 小飞机 | Document ID: 5791039 | Upload date: 2023-08-20 | Format: PPT | Pages: 54 | Size: 1.62 MB

Modern Computer Architecture

Instructor: Prof. 张钢 (Zhang Gang), School of Computer Science, Tianjin University
Contact email:
Assignment submission email:
2016

The Main Contents
- Chapter 1. Fundamentals of Quantitative Design and Analysis
- Chapter 2. Memory Hierarchy Design
- Chapter 3. Instruction-Level Parallelism and Its Exploitation
- Chapter 4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
- Chapter 5. Thread-Level Parallelism
- Chapter 6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
- Appendix A. Pipelining: Basic and Intermediate Concepts

3.4 / 3.5 / 3.6
Class discussion

Advantages of Dynamic Scheduling
- Dynamic scheduling: hardware rearranges instruction execution to reduce stalls while maintaining data flow and exception behavior.
- Question: What does it mean to maintain data flow and exception behavior?

Advantages of Dynamic Scheduling
Advantages:
- It handles cases where dependences are unknown at compile time.
- It allows the processor to tolerate unpredictable delays, such as cache misses, by executing other code while waiting for the miss to resolve.
- It allows code that was compiled for one pipeline to run efficiently on a different pipeline.
- It simplifies the compiler.
- Question: Why?

HW Schemes: Instruction Parallelism
- Key idea: allow instructions behind a stall to proceed.
    DIVD  F0,F2,F4
    ADDD  F10,F0,F8
    SUBD  F12,F8,F14
- Enables out-of-order execution and allows out-of-order completion (e.g., SUBD).
- In a dynamically scheduled pipeline, all instructions still pass through the issue stage in order (in-order issue).
- Question: What do in-order issue, out-of-order execution, and out-of-order completion mean?

HW Schemes: Instruction Parallelism
- We distinguish when an instruction begins execution from when it completes execution; between the two times, the instruction is in execution.
- Question: When and where in a pipeline?
- Note: dynamic execution creates WAR and WAW hazards and makes exceptions harder.
- Question: Why can it create WAR and WAW hazards?

Dynamic Scheduling Step 1
- The simple pipeline had one stage to check both structural and data hazards: Instruction Decode (ID), also called Instruction Issue.
- Split the ID pipe stage of the simple 5-stage pipeline into 2 stages:
  - Issue: decode instructions, check for structural hazards.
  - Read operands: wait until there are no data hazards, then read operands.
- Understand?
- [Figure: pipeline stage diagram; surviving labels: Ex, IS, Ex, Wb]

A Dynamic Algorithm: Tomasulo's
- For the IBM 360/91 (before caches!)
  - Long memory latency
- Goal: high performance without special compilers
- The small number of floating-point registers (4 in the 360) prevented interesting compiler scheduling of operations.
  - This led Tomasulo to try to figure out how to get more effective registers: renaming in hardware!
- Why study a 1966 computer?
- The descendants of this design have flourished!
  - Alpha 21264, Pentium 4, AMD Opteron, Power 5

Tomasulo Algorithm
- Control
- Renaming avoids WAR and WAW hazards.
- There are more reservation stations than registers, so the hardware can do optimizations that compilers can't.

Tomasulo Algorithm
- Results go to the FUs from the RSs, not through registers, over a Common Data Bus that broadcasts results to all FUs.
- Avoids RAW hazards by executing an instruction only when its operands are available.
- Loads and stores are treated as FUs with RSs as well.
- Integer instructions can go past branches (predict taken), allowing FP ops beyond the basic block in the FP queue.

Tomasulo Organization
[Figure: Tomasulo organization. Labels: FP Op Queue; FP Registers; From Mem; Load Buffers (Load1, Load2, Load3, Load4, Load5, Load6); Store Buffers; To Mem; Reservation Stations (Add1, Add2, Add3; Mult1, Mult2); FP adders; FP multipliers; Common Data Bus (CDB)]

Reservation Station Components
- Op: operation to perform in the unit (e.g., + or -)
- Vj, Vk: values of the source operands
  - Store buffers have a V field: the result to be stored.
- Qj, Qk: reservation stations producing the source registers (value to be written)
  - Note: Qj, Qk = 0 means ready.
  - Store buffers only have Qi, for the RS producing the result.
- Busy: indicates that the reservation station or FU is busy
- Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

Three Stages of Tomasulo Algorithm
1. Issue: get an instruction from the FP Op Queue
   - If a reservation station is free (no structural hazard), control issues the instruction.
2. Execute
3. Write result
   - Mark the reservation station available.

Three Stages of Tomasulo Algorithm
- Normal data bus: data + destination ("go to" bus)
- Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of Functional Unit source address
  - Write if it matches the expected Functional Unit (the one that produces the result)
  - Does the broadcast
- Example speed: 3 clocks for floating-point +, -; 10 clocks for floating-point *; 40 clocks for floating-point /

Tomasulo Example

Tomasulo Example Cycle 1

Tomasulo Example Cycle 2
- Note: can have multiple loads outstanding.

Tomasulo Example Cycle 3
- Note: register names are removed ("renamed") in the reservation stations; MULT issued.
- Load1 completing; what is waiting for Load1?

Tomasulo Example Cycle 4
- Load2 completing; what is waiting for Load2?

Tomasulo Example Cycle 5
- Timer starts counting down for Add1, Mult1.

Tomasulo Example Cycle 6
- Issue ADDD here despite the name dependency on F6?

Tomasulo Example Cycle 7
- Add1 (SUBD) completing; what is waiting for it?

Tomasulo Example Cycle 8

Tomasulo Example Cycle 9

Tomasulo Example Cycle 10
- Add2 (ADDD) completing; what is waiting for it?

Tomasulo Example Cycle 11
- Write the result of ADDD here?
- All quick instructions complete in this cycle!

Tomasulo Example Cycle 12

Tomasulo Example Cycle 13

Tomasulo Example Cycle 14

Tomasulo Example Cycle 15
- Mult1 (MULTD) completing; what is waiting for it?

Tomasulo Example Cycle 16
- Just waiting for Mult2 (DIVD) to complete.

Faster than light computation (skip a couple of cycles)
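The bookkeeping traced in the cycles above (reservation-station fields Vj/Vk/Qj/Qk, the register result status, and the "come from" broadcast on the CDB) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the 360/91 design: the class and method names (`Station`, `Tomasulo`, `issue`, `execute`, `broadcast`) are hypothetical, latencies are not modeled, loads and stores are omitted, and the register values in the demo are made up. It uses the three-instruction sequence from the "HW Schemes" slide.

```python
from dataclasses import dataclass
from typing import Optional

# Arithmetic for each opcode (assumed semantics for illustration only).
OPS = {"ADDD": lambda a, b: a + b, "SUBD": lambda a, b: a - b,
       "MULTD": lambda a, b: a * b, "DIVD": lambda a, b: a / b}


@dataclass
class Station:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None   # None plays the role of "Qj = 0" (operand ready)
    qk: Optional[str] = None
    dest: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


class Tomasulo:
    def __init__(self, regs):
        self.regs = dict(regs)   # FP register values
        self.status = {}         # register result status: register -> producing station
        names = ("Add1", "Add2", "Add3", "Mult1", "Mult2")
        self.stations = {n: Station(n) for n in names}

    def _read(self, reg):
        """Return (value, None) if the register is ready, else (None, producer)."""
        producer = self.status.get(reg)
        return (None, producer) if producer else (self.regs[reg], None)

    def issue(self, op, dest, src1, src2):
        prefix = "Mult" if op in ("MULTD", "DIVD") else "Add"
        for st in self.stations.values():
            if st.name.startswith(prefix) and not st.busy:
                st.busy, st.op, st.dest = True, op, dest
                st.vj, st.qj = self._read(src1)   # sources are read before renaming dest
                st.vk, st.qk = self._read(src2)
                self.status[dest] = st.name       # rename: dest now comes from this station
                return st.name
        return None   # structural hazard: no free station, issue stalls

    def execute(self, name):
        st = self.stations[name]
        assert st.ready(), "operands not yet available (RAW hazard)"
        return OPS[st.op](st.vj, st.vk)

    def broadcast(self, name, value):
        """Write result: put (source station, value) on the CDB."""
        for st in self.stations.values():
            if st.qj == name:
                st.vj, st.qj = value, None
            if st.qk == name:
                st.vk, st.qk = value, None
        done = self.stations[name]
        if self.status.get(done.dest) == name:    # only the latest writer updates the register
            self.regs[done.dest] = value
            del self.status[done.dest]
        self.stations[name] = Station(name)       # mark reservation station available


if __name__ == "__main__":
    m = Tomasulo({"F0": 0.0, "F2": 8.0, "F4": 2.0, "F8": 1.0,
                  "F10": 0.0, "F12": 0.0, "F14": 3.0})
    div = m.issue("DIVD", "F0", "F2", "F4")
    add = m.issue("ADDD", "F10", "F0", "F8")     # waits on the DIVD station
    sub = m.issue("SUBD", "F12", "F8", "F14")    # independent, can run first
    m.broadcast(sub, m.execute(sub))             # out-of-order completion
    m.broadcast(div, m.execute(div))             # wakes up ADDD
    m.broadcast(add, m.execute(add))
    print(m.regs)
```

In the demo, ADDD is issued holding the tag of the DIVD station in Qj instead of a value, while SUBD has both operands and completes first: in-order issue, out-of-order execution and completion.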

Tomasulo Example Cycle 55

Tomasulo Example Cycle 56
- Mult2 (DIVD) is completing; what is waiting for it?

Tomasulo Example Cycle 57
- Once again: in-order issue, out-of-order execution, and out-of-order completion.

Why can Tomasulo overlap iterations of loops?
- Register renaming
  - Multiple iterations use different physical destinations for registers (dynamic loop unrolling).
- Reservation stations
  - Permit instruction issue to advance past integer control flow operations.
  - Also buffer old values of registers, totally avoiding the WAR stall.
- Other perspective: Tomasulo builds the data flow dependency graph on the fly.

Tomasulo's scheme offers 2 major advantages
- Distribution of the hazard detection logic
  - Distributed reservation stations and the CDB
  - If multiple instructions are waiting on a single result, and each instruction already has its other operand, then the instructions can be released simultaneously by the broadcast on the CDB.
  - If a centralized register file were used, the units would have to read their results from the registers when register buses are available.
- Elimination of stalls for WAW and WAR hazards

Hardware-Based Speculation
- Problem
  - A wide-issue processor may need to execute a branch every clock cycle to maintain maximum performance.
  - Just predicting branches accurately may not be sufficient to generate the desired amount of ILP.
- Solution
  - Speculate on the outcome of branches and execute the program as if our guesses were correct.
  - Fetch, issue, and execute instructions as if branch predictions were always correct.
  - Provide mechanisms to handle the situation where the speculation is incorrect.

Hardware-Based Speculation
- 3 key ideas
  - Dynamic branch prediction to choose which instructions to execute
  - Speculation to allow the execution of instructions before the control dependences are resolved, with the ability to undo the effects of an incorrectly speculated sequence
  - Dynamic scheduling to deal with the scheduling of different combinations of basic blocks

Hardware-Based Speculation
- Tomasulo's algorithm can be extended to support speculation.
  - Separate the bypassing of results among instructions from the actual completion of an instruction.
  - Allow an instruction to execute and to bypass its results to other instructions, without allowing the instruction to perform any updates that cannot be undone.
  - Instructions using speculated results become speculative.
  - When an instruction is no longer speculative, we allow it to update the register file or memory: instruction commit.

Hardware-Based Speculation
- The key idea
  - Allow instructions to execute out of order but force them to commit in order.
  - Prevent any irrevocable action, such as updating state or taking an exception, until an instruction commits.
  - Separate the process of completing execution from instruction commit.
  - Use an additional set of hardware buffers that hold the results of instructions before they are committed: the reorder buffer (ROB).
  - The ROB is a source of operands for instructions: it supplies operands in the interval between completion of instruction execution and instruction commit.

Hardware-Based Speculation
- ROB
  - A circular buffer
  - Entries are allocated and deallocated by two revolving pointers.
  - Entries are allocated to each instruction strictly in program order.
  - Keeps track of the execution status of the instruction

Hardware-Based Speculation
- ROB fields
  - Instruction type: opcode
    - branch (has no destination result)
    - store (has a memory address destination)
    - register operation (has register destinations)
  - Destination: the register number or the memory address
  - Value: the instruction result
  - Ready: the value is ready
  - Address: for load/store operations
- The ROB replaces the store buffer.
- [Figure: ROB entry layout; labels: ROB.Instruction (type), ROB.Dest (dest), ROB.Value (value), ROB.Ready (ready), ROB.A]

Hardware-Based Speculation
- Register fields
  - Busy: RegisterState.Busy
  - Reorder: RegisterState.Reorder (instruction sequence number)
  - Qi: RegisterState.Qi
  - Value: RegisterState.Value
- [Figure: register entry layout; labels: Busy, Reorder, Qi, Value]

Hardware-Based Speculation
- 4 steps
  - Issue
  - Execute
  - Write result
  - Commit

Hardware-Based Speculation
- 4 steps
  - Issue
    - Issue if there is an empty reservation station and an empty slot in the ROB.
    - Mark the reservation station and the ROB entry as busy.
    - Send the operands to the reservation station if they are available in the registers or the ROB.
  - Execute
  - Write result
    - Write the result on the CDB, and from the CDB into the ROB.
    - Mark the reservation station as empty.
  - Commit
    - Happens when an instruction reaches the head of the ROB.
    - Mark the ROB entry as empty.
    - When the instruction is a branch with an incorrect prediction, indicate that the speculation was wrong: the ROB is flushed and execution is restarted at the correct successor of the branch.

Hardware-Based Speculation
- Advantages of speculation
  - Precise interrupts
    - The processor with the ROB can dynamically execute code while maintaining a precise interrupt model.
    - This is done by flushing any pending instructions in the ROB.
- [Figure: pipeline timing diagram for four instructions; stage labels: Is (issue), Ex (execute), W (write result), C (commit)]

Hardware-Based Speculation
- Advantages of speculation
  - Early recovery from branch misprediction
    - The processor can easily undo its speculative actions when a branch is found to be mispredicted.
    - This is done by clearing the ROB of all entries that appear after the mispredicted branch,
    - while allowing those that are before the branch in the ROB to continue.
    - Performance is more sensitive to the branch prediction mechanism.
- [Figure: labels: prefetch, pre-execution, commit]

Hardware-Based Speculation
- Advantages of speculation
  - Load and store hazards
    - A store updates memory only when it reaches the head of the ROB.
    - WAW and WAR hazards through memory are eliminated with speculation, because the actual updating of memory occurs in order.
    - RAW hazards through memory are maintained by not allowing a load to initiate the second step of its execution if any store in the ROB has a Destination field that matches the address (A field) of the load.
        store r1,100(r2)
        load  r3,100(r2)

Homework 6: Exercise 3.14 (English edition, 5th edition)

Homework 7: Exercise 3.15 (English edition, 5th edition)
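The commit rules described on the speculation slides (in-order commit from the head of the ROB, a store updating memory only at commit, a flush on a mispredicted branch, and the load/store address check) can be sketched as follows. This is a simplified model under stated assumptions, not a full speculative Tomasulo pipeline: `Entry`, `ReorderBuffer`, and their method names are hypothetical, a `deque` stands in for the circular buffer with two revolving pointers, reservation stations are left out, and the address 100 and value 7.0 in the demo are made up to mirror the `store r1,100(r2)` / `load r3,100(r2)` pair.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Entry:
    kind: str                                # "reg", "load", "store", or "branch"
    dest: Optional[Union[str, int]] = None   # register name, or memory address for a store
    addr: Optional[int] = None               # A field: memory address of a load
    value: Optional[float] = None
    ready: bool = False
    mispredicted: bool = False


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()   # head of the ROB is entries[0]
        self.regs = {}           # committed register state
        self.mem = {}            # committed memory state

    def issue(self, entry):
        """Allocate an entry in program order; False means no empty slot (stall)."""
        if len(self.entries) == self.size:
            return False
        self.entries.append(entry)
        return True

    def write_result(self, entry, value=None, mispredicted=False):
        """Record a completed result in the ROB; nothing irrevocable happens yet."""
        entry.value, entry.ready, entry.mispredicted = value, True, mispredicted

    def load_may_read_memory(self, load):
        """RAW through memory: block the load while an earlier store to its address is pending."""
        for e in self.entries:               # oldest first, i.e. program order
            if e is load:
                return True
            if e.kind == "store" and e.dest == load.addr:
                return False
        return True

    def commit(self):
        """Commit at most one instruction, and only from the head of the ROB."""
        if not self.entries or not self.entries[0].ready:
            return None
        e = self.entries.popleft()           # mark the ROB entry as empty
        if e.kind in ("reg", "load"):
            self.regs[e.dest] = e.value
        elif e.kind == "store":
            self.mem[e.dest] = e.value       # memory is updated only at commit
        elif e.kind == "branch" and e.mispredicted:
            self.entries.clear()             # flush everything after the branch
        return e


if __name__ == "__main__":
    rob = ReorderBuffer(size=4)
    store = Entry("store", dest=100)             # store r1,100(r2), assuming r2 = 0
    load = Entry("load", dest="r3", addr=100)    # load r3,100(r2)
    rob.issue(store)
    rob.issue(load)
    rob.write_result(store, 7.0)
    print(rob.load_may_read_memory(load))        # store still pending in the ROB
    rob.commit()
    print(rob.load_may_read_memory(load))        # store has committed
    rob.write_result(load, rob.mem[load.addr])
    rob.commit()
    print(rob.regs, rob.mem)
```

Because every architectural update goes through `commit`, an instruction that finishes early simply waits in the buffer; that is what keeps interrupts precise and lets a misprediction be undone by discarding entries.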
