深入理解计算机系统复习清华ppt课件.ppt

资源描述

《深入理解计算机系统复习清华ppt课件.ppt》由会员分享，可在线阅读，更多相关《深入理解计算机系统复习清华ppt课件.ppt（70页珍藏版）》请在三一办公上搜索。

1、计算机组成原理,Zhang,Youhui(张悠慧),2010 秋季,课程回顾,Topics计算机系统结构等相关概念与范畴数的表示汇编语言与C语言代码优化,计算机系统结构等相关概念与范畴,概念计算机系统结构,编写出能够在机器上正确运行的系统程序所必须了解到的计算机系统的属性研究计算机系统软件与硬件的功能分配，确定计算机系统软件与硬件的分界面研究计算机系统的外部特性，即程序员所看到的计算机系统属性,程序员看到的计算机系统属性数据表示：硬件直接认别和处理的数据类型寻址技术：编址方式、寻址方式和定位方式寄存器定义：寄存器定义、数量和使用规则指令系统：指令的操作类型、格式、排序等存储系统：要求速度

2、高、容量大、价格便宜中断系统：中断类型、中断级别和响应方式输入输出系统：数据交换方式、交换过程控制机器工作状态：定义和切换方式，如内核态、执行态、管理态和用户态等,概念计算机组成,计算机系统的逻辑实现设计功能部件：处理器，主存储器等数据通路的宽度各种操作对功能部件的共享程度确定功能部件的并行度设计缓冲和排队策略设计控制机构采用何种可靠性技术,概念汇编语言,用符号表示的机器语言，可包括宏构造,概念冯诺依曼计算机,特点:存储程序、运算器为中心、集中控制存储器是字长固定的、顺序线性编址的一维结构，每个地址是唯一定义的由指令形式的低级机器语言驱动指令顺序执行，一般按照指令在存储器中存放的顺

3、序执行，程序分支由转移指令实现运算器为中心，输入输出设备与存储器之间的数据传送都途经运算器集中控制，运算器、存储器、输入输出设备的操作以及它们之间的联系都由控制器控制,现代处理器运算速度计算公式：P Fz X IPC X TPC其中:Fz为处理机的工作主频IPC(Instruction Per Cycle)指令级并行度TPC(Threading Per Cycle)线程级并行度例如：主频3GHz，4核Pentium4处理器的最高运算速度为：P 3GHz X 4IPC X 4TPC=48GIPS 即：每秒钟480亿次,概念处理器运算速度,提高处理器性能的主要途径(1)提高主频Fz：增加流水线级数

4、，依靠计算机系统结构缩短门电路延迟时间，依靠电子技术(2)提高指令级并行度IPC依靠并行算法和计算机系统结构(3)提高线程级并行度TPC依靠并行算法、程序设计和计算机系统结构,近期出现的新问题：线延迟大于门延迟漏电流很大功耗惊人近期提高计算机性能的途径只能依靠并行算法、程序设计和计算机系统结构，不能指望电子技术不仅对计算机系统结构，而且对并行算法、软件技术和计算机应用技术都将产生深远的影响,概念指令执行速度,平均速度,概念Amdahl定律,数的表示,Bits,Bytes,and Integers,Sizes of C Objects(in Bytes)C Data TypeTypica

5、l 32-bitIntel IA32x86-64char111short222int444long448long long888float444double888long double810/1210/16char*448Or any other pointer,Bit-Level Operations in COperations&,|,Available in CLogic Operations in C&,|,!View 0 as“False”Anything nonzero as“True”Always return 0 or 1Early terminationShift Opera

6、tionsLogical vs.ArithmeticShift amount 0 or word size,Signed vs.Unsigned in C,ConstantsBy default are considered to be signed integersUnsigned if have“U”as suffix0U,4294967259UCastingExplicit casting between signed Unsigned is dangerous!,Integer C Puzzles Revisited,x=0 x&7=7(x-1x y-x=0 x 0&y 0 x+y 0

7、 x=0-x=0(x|-x)31=-1ux 3=ux/8x 3=x/8x&(x-1)!=0,int x=foo();int y=bar();unsigned ux=x;unsigned uy=y;,Initialization,Floating Point,RepresentationBits to right of“binary point”represent fractional powers of 2Represents rational number:,1,2,4,2i1,2i,1/2,1/4,1/8,2j,Numerical Form1s M 2ESign bit s determi

8、nes whether number is negative or positiveSignificand M normally a fractional value in range 1.0,2.0).Exponent E weights value by power of twoEncodingMSB is sign bitexp field encodes Efrac field encodes MSizesSingle precision:8 exp bits,23 frac bitsDouble precision:11 exp bits,52 frac bitsExtended p

9、recision:15 exp bits,63 frac bits,Floating Point Representation,s,exp,frac,“Normalized”Numeric Values,Conditionexp 0000 and exp 1111Exponent coded as biased valueE=Exp BiasExp:unsigned value denoted by exp Bias:Bias valueSingle precision:127(Exp:1254,E:-126127)Double precision:1023(Exp:12046,E:-1022

10、1023)in general:Bias=2e-1-1,where e is number of exponent bitsSignificand coded with implied leading 1M=1.xxxx2xxxx:bits of fracMinimum when 0000(M=1.0)Maximum when 1111(M=2.0)Get extra leading bit for“free”,Denormalized Values,Conditionexp=0000ValueExponent value E=Bias+1Significand value M=0.xxxx2

11、xxxx:bits of fracCases exp=0000,frac=0000Note that have distinct values+0 and 0exp=0000,frac 0000,Conditionexp=1111Cases exp=1111,frac=0000Represents value(infinity)Operation that overflowsBoth positive and negativeexp=1111,frac 0000Not-a-Number(NaN),s exp fracEValue0 0000 000-600 0000 001-61/8*1/64

12、=1/5120 0000 010-62/8*1/64=2/5120 0000 110-66/8*1/64=6/5120 0000 111-67/8*1/64=7/5120 0001000-68/8*1/64=8/5120 0001 001-69/8*1/64=9/5120 0110 110-114/8*1/2=14/160 0110 111-115/8*1/2=15/160 0111 00008/8*1=10 0111 00109/8*1=9/80 0111 010010/8*1=10/80 1110110714/8*128=2240 1110 111715/8*128=2400 1111 0

13、00n/ainf,closest to zero,largest denorm,smallest norm,closest to 1 below,closest to 1 above,largest norm,Denormalizednumbers,Normalizednumbers,Round-To-Even,Binary Fractional Numbers“Even”when least significant bit is 0Half way when bits to right of rounding position=1002ExamplesRound to nearest 1/4

14、(2 bits right of binary point)ValueBinaryRoundedActionRounded Value2 3/3210.00011210.002(1/2up)2 1/42 7/810.11100211.002(1/2up)32 5/810.10100210.102(1/2down)2 1/2,Floating Point in C,C Guarantees Two Levelsfloatsingle precisiondoubledouble precisionConversionsCasting between int,float,and double cha

15、nges numeric values Double or float to intTruncates fractional partLike rounding toward zeroNot defined when out of range or NaNGenerally sets to Tmin or Tmax int to doubleExact conversion,as long as int has 53 bit word size int to floatWill round according to rounding mode,Floating Point Puzzles,Fo

16、r each of the following C expressions,either:Argue that it is true for all argument valuesExplain why not true,x=(int)(float)xx=(int)(double)xf=(float)(double)fd=(float)df=-(-f);2/3=2/3.0d f-f-dd*d=0.0(d+f)-d=f,int x=;float f=;double d=;,Assume neitherd nor f is NaN,汇编与C语言,movl Operand Combinations,

17、Cannot do memory-memory transfer with a single instruction,movl,Imm,Reg,Mem,Reg,Mem,Reg,Mem,Reg,Source,Dest,C Analog,Src,Dest,Indexed Addressing Modes,Most General FormD(Rb,Ri,S)MemRegRb+S*RegRi+DD:Constant“displacement”Rb:Base register:Any of 8 integer registersRi:Index register:Any,except for%espU

18、nlikely youd use%ebp,eitherS:Scale:1,2,4,or 8Special Cases(Rb,Ri)MemRegRb+RegRiD(Rb,Ri)MemRegRb+RegRi+D(Rb,Ri,S)MemRegRb+S*RegRi,Address Computation Instruction,leal Src,DestSrc is address mode expressionSet Dest to address denoted by expressionUsesComputing addresses without a memory referenceE.g.,

19、translation of p=Computing arithmetic expressions of the form x+k*yk=1,2,4,or 8.,%rax,%rdx,%rcx,%rbx,%rsi,%rdi,%rsp,%rbp,x86-64 General Purpose Registers,Extend existing registers.Add 8 new ones.Make%ebp/%rbp general purpose,%eax,%edx,%ecx,%ebx,%esi,%edi,%esp,%ebp,%r8,%r9,%r10,%r11,%r12,%r13,%r14,%r

20、15,%r8d,%r9d,%r10d,%r11d,%r12d,%r13d,%r14d,%r15d,Swap in 64-bit Mode,Operands passed in registersFirst(xp)in%rdi,second(yp)in%rsi64-bit pointersNo stack operations required32-bit dataData held in registers%eax and%edx movl operation,void swap(int*xp,int*yp)int t0=*xp;int t1=*yp;*xp=t1;*yp=t0;,swap:m

21、ovl(%rdi),%edxmovl(%rsi),%eaxmovl%eax,(%rdi)movl%edx,(%rsi)retq,Reading Condition Codes,SetX InstructionsSet single byte based on combinations of condition codes,Conditional Branch Example,int absdiff(int x,int y)int result;if(x y)result=x-y;else result=y-x;return result;,absdiff:pushl%ebpmovl%esp,%

22、ebpmovl 8(%ebp),%edxmovl 12(%ebp),%eaxcmpl%eax,%edxjle.L7subl%eax,%edxmovl%edx,%eax.L8:leaveret.L7:subl%edx,%eaxjmp.L8,Body1,SetUp,Finish,Body2,pushl%ebpmovl%esp,%ebppushl%ebxmovl8(%ebp),%ecxmovl12(%ebp),%edxmovl%ecx,%ebxsubl%edx,%ebxmovl%edx,%eaxsubl%ecx,%eaxcmpl%edx,%ecxcmovg%ebx,%eaxpopl%ebxpopl%

23、ebpret,int absdiff(int x,int y)int result;if(x y)result=x-y;else result=y-x;return result;,Gcc 4.3.4,New Conditional Branch Example,Implementing Loops,IA32All loops translated into form based on“do-while”x86-64Also make use of“jump to middle”Why the DifferenceIA32 compiler developed for machine wher

24、e all operations costlyx86-64 compiler developed for machine where unconditional branches incur(almost)no overhead,“For”“While”“Do-While”,for(Init;Test;Update)Body,Init;while(Test)Body Update;,Goto Version,Init;if(!Test)goto done;loop:Body Update;if(Test)goto loop;done:,While Version,For Version,Do-

25、While Version,Init;if(!Test)goto done;do Body Update;while(Test)done:,“For”“While”(Jump-to-Middle),for(Init;Test;Update)Body,Init;while(Test)Body Update;,Init;goto middle;loop:Body Update;middle:if(Test)goto loop;done:,While Version,For Version,Goto Version,Switch Statements,Implementation OptionsSe

26、ries of conditionalsOrganize in tree structureLogarithmic performanceJump TableLookup branch targetConstant timePossible when cases are small integer constantsGCCPicks one based on case structure,yoo,who,proc,Stack“Top”,IA32-Stack Frames,ContentsLocal variablesReturn informationTemporary spaceManage

27、mentSpace allocated when enter procedure“Set-up”codeDeallocated when return“Finish”codePointersStack pointer%esp indicates stack topFrame pointer%ebp indicates start of current frame,amI,IA32/Linux Stack Frame,Current Stack Frame(“Top”to Bottom)Parameters for function about to call“Argument build”Lo

28、cal variablesIf cant keep in registersSaved register contextOld frame pointerCaller Stack FrameReturn addressPushed by call instructionArguments for this call,Stack Pointer(%esp),Frame Pointer(%ebp),Return Addr,SavedRegisters+LocalVariables,ArgumentBuild,Old%ebp,Arguments,CallerFrame,IA32/Linux Regi

29、ster Usage,Integer RegistersTwo have special uses%ebp,%espThree managed as callee-save%ebx,%esi,%ediOld values saved on stack prior to usingThree managed as caller-save%eax,%edx,%ecxDo what you please,but expect any callee to do so,as wellRegister%eax also stores returned value,%eax,%edx,%ecx,%ebx,%

30、esi,%edi,%esp,%ebp,Caller-SaveTemporaries,Callee-SaveTemporaries,Special,%rax,%rbx,%rcx,%rdx,%rsi,%rdi,%rsp,%rbp,x86-64 Register Conventions,%r8,%r9,%r10,%r11,%r12,%r13,%r14,%r15,Return Value,Callee Saved,Argument#4,Argument#3,Argument#2,Argument#1,Stack Pointer,Callee Saved,Argument#5,Argument#6,Ca

31、llee Saved,Used for linking,C:Callee Saved,Callee Saved,Callee Saved,Callee Saved,x86-64 Registers,Arguments passed to functions via registersIf more than 6 integral parameters,then pass rest on stackThese registers can be used as caller-saved as wellAll References to Stack Frame via Stack PointerEl

32、iminates need to update%ebpOther Registers6+1 callee saved2 or 3 have special uses,x86-64 Locals in the Red Zone,Avoiding Stack Pointer ChangeCan hold all information within small window beyond stack pointer,/*Swap,using local array*/void swap_a(long*xp,long*yp)volatile long loc2;loc0=*xp;loc1=*yp;*

33、xp=loc1;*yp=loc0;,swap_a:movq(%rdi),%rax movq%rax,-24(%rsp)movq(%rsi),%rax movq%rax,-16(%rsp)movq-16(%rsp),%rax movq%rax,(%rdi)movq-24(%rsp),%rax movq%rax,(%rsi)ret,rtn Ptr,unused,%rsp,8,loc1,loc0,16,24,x86-64 NonLeaf without Stack Frame,No values held while swap being invokedNo callee save register

34、s needed,long scount=0;/*Swap ai,swap_ele_se:movslq%esi,%rsi#Sign extend i leaq(%rdi,%rsi,8),%rdi#ret,x86-64 Call using Jump,When swap executes ret,it will return from swap_elePossible since swap is a“tail call”,long scount=0;/*Swap ai,swap_ele:movslq%esi,%rsi#Sign extend i leaq(%rdi,%rsi,8),%rdi#&a

35、i leaq 8(%rdi),%rsi#&ai+1 jmp swap#swap(),Interesting Features of Stack Frame,Allocate Entire Frame at OnceAll stack accesses can be relative to%rspDo by decrementing stack pointerCan delay allocation,since safe to temporarily use red zoneSimple DeallocationIncrement stack pointer,Basic Data Types,I

36、ntegralStored&operated on in general registersSigned vs.unsigned depends on instructions usedIntelGASBytesCbyteb1unsigned charwordw2unsigned shortdouble wordl4unsigned intquad wordq8unsigned long int(x86-64)Floating PointStored&operated on in floating point registersIntelGASBytesCSingles4floatDouble

37、l8doubleExtendedt10/12/16long double,Array Allocation,Basic PrincipleT AL;Array of data type T and length LContiguously allocated region of L*sizeof(T)bytesIdentifier A can be used as a pointer to array element 0Type T*,Viewing as Multidimensional Array,DeclarationT ARC;2D array of data type TR rows

38、,C columnsType T element requires K bytesArray SizeR*C*K bytesArrangementRow-Major Ordering,int ARC;,4*R*C Bytes,Nested Array Row Access,Row Vectors Ai is array of C elementsEach element of type T requires K bytesStarting address A+i*(C*K),A,int ARC;,A+i*C*4,A+(R-1)*C*4,Nested Array Element Access,A

39、rray Elements Aij is element of type TAddress A+i*(C*K)+j*K=A+(i*C+j)*K,Aij,Aij,Ai,A,int ARC;,A+i*C*4,A+(R-1)*C*4,A+(i*C+j)*4,Nested Array Element Access Code,Array Elements pghindexdig is intAddress:pgh+20*index+4*digIA32 CodeComputes addresspgh+4*dig+4*(index+4*index)movl performs memory reference

40、,int get_pgh_digit(int index,int dig)return pghindexdig;,#%ecx=dig#%eax=indexleal 0(,%ecx,4),%edx#4*digleal(%eax,%eax,4),%eax#5*indexmovl pgh(%edx,%eax,4),%eax#*(pgh+4*dig+20*index),Array Element Accesses,Similar C referencesNested ArrayElement atMempgh+20*index+4*dig,Different address computationMu

41、lti-Level ArrayElement atMemMemuniv+4*index+4*dig,int get_pgh_digit(int index,int dig)return pghindexdig;,int get_univ_digit(int index,int dig)return univindexdig;,Using Nested Arrays,StrengthsC compiler handles doubly subscripted arraysGenerates very efficient codeAvoids multiply in index computati

42、onLimitationOnly works if have fixed array size,#define N 16typedef int fix_matrixNN;,/*Compute element i,k of fixed matrix product*/int fix_prod_ele(fix_matrix a,fix_matrix b,int i,int k)int j;int result=0;for(j=0;j N;j+)result+=aij*bjk;return result;,Dynamic Nested Arrays,StrengthCan create matrix

43、 of arbitrary sizeProgrammingMust do index computation explicitlyPerformanceAccessing single element costlyMust do multiplication,int*new_var_matrix(int n)return(int*)calloc(sizeof(int),n*n);,int var_ele(int*a,int i,int j,int n)return ai*n+j;,movl 12(%ebp),%eax#imovl 8(%ebp),%edx#aimull 20(%ebp),%ea

44、x#n*iaddl 16(%ebp),%eax#n*i+jmovl(%edx,%eax,4),%eax#Mema+4*(i*n+j),Optimizing Dynamic Array Mult.,OptimizationsPerformed when set optimization level to-O2Code MotionExpression i*n can be computed outside loopStrength ReductionIncrementing j has effect of incrementing j*n+k by nPerformanceCompiler ca

45、n optimize regular access patterns,int j;int result=0;for(j=0;j n;j+)result+=ai*n+j*bj*n+k;return result;,int j;int result=0;int iTn=i*n;int jTnPk=k;for(j=0;j n;j+)result+=aiTn+j*bjTnPk;jTnPk+=n;return result;,struct S1 char c;int i2;double v;*p;,Satisfying Alignment with Structures,Offsets Within S

46、tructureMust satisfy elements alignment requirementOverall Structure PlacementEach structure has alignment requirement KLargest alignment of any elementInitial address&structure length must be multiples of KExample(under Windows or x86-64):K=8,due to double element,c,i0,i1,v,p+0,p+4,p+8,p+16,p+24,Mu

47、ltiple of 4,Multiple of 8,Multiple of 8,Multiple of 8,Specific Cases of Alignment(IA32),Size of Primitive Data Type:1 byte(e.g.,char)no restrictions on address2 bytes(e.g.,short)lowest 1 bit of address must be 024 bytes(e.g.,int,float,char*,etc.)lowest 2 bits of address must be 0028 bytes(e.g.,doubl

48、e)Windows(and most other OSs&instruction sets):lowest 3 bits of address must be 0002Linux:lowest 2 bits of address must be 002i.e.,treated the same as a 4-byte primitive data type12 bytes(long double)Windows,Linux:lowest 2 bits of address must be 002i.e.,treated the same as a 4-byte primitive data t

49、ype,Specific Cases of Alignment(x86-64),Size of Primitive Data Type:1 byte(e.g.,char)no restrictions on address2 bytes(e.g.,short)lowest 1 bit of address must be 024 bytes(e.g.,int,float)lowest 2 bits of address must be 0028 bytes(e.g.,double,char*)Windows&Linux:lowest 3 bits of address must be 0002

50、16 bytes(long double)Linux:lowest 3 bits of address must be 0002i.e.,treated the same as a 8-byte primitive data type,Union Allocation,PrinciplesOverlay union elementsAllocate according to largest elementCan only use one field at a time,union U1 char c;int i2;double v;*up;,struct S1 char c;int i2;do

展开阅读全文

深入理解计算机系统复习 清华ppt课件.ppt

深入理解计算机系统复习清华ppt课件.ppt