DSP Principles and Applications - Chapter 2: CPU Structure and Instruction Set.ppt

Uploaded by: 小飞机 | Document ID: 5428121 | Uploaded: 2023-07-05 | Format: PPT | Pages: 129 | Size: 1.25 MB

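Chapter 2 CPU Structure and Instruction Set

2.1 CPU structure
2.2 C67x instruction set
2.3 Pipeline
2.4 Interrupts

2.1 CPU structure

General structure of a DSP (block diagram): peripherals, CPU, internal memory, internal buses, external memory.
Memory hierarchy (figure).

2.1.1 CPU block diagram

The block diagram shows program fetch, instruction dispatch, instruction decode, the program execution units, the program bus and the data bus. Memory is mapped into one unified address space.

2.1.2 CPU data paths

2 general-purpose register files (A and B), 32 registers in total
8 functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, .D2)
2 data load paths (LD1 and LD2), with two 32-bit load buses per side
2 data store paths (ST1 and ST2), with one 32-bit store bus per side
2 register file cross paths (1X and 2X)
2 data address paths (DA1 and DA2)

1. Functions of the general-purpose register files
(1) Hold data, serving as the source and destination operands of instructions. Pay attention to the transfer direction and the data word length.
(2) Serve as address pointers for indirect addressing. Registers A4-A7 and B4-B7 can additionally operate in circular addressing mode.
(3) A1, A2, B0, B1 and B2 can be used as condition registers.

40-bit / 64-bit register pairs
The slide lists all valid combinations of 40-bit register pairs, subject to these rules:
The registers must be from the same side.
The first register must be even and the second odd.
The registers must be consecutive.

2. Functional units (table)

3. Register file cross paths
.L1, .S1, .D1 and .M1 can read and write register file A directly.
.L2, .S2, .D2 and .M2 can read and write register file B directly.
The 1X cross path lets the functional units of data path A read a source operand from register file B.
The 2X cross path lets the functional units of data path B read a source operand from register file A.

4. Data memory and the load/store paths
Load instructions: register file A loads through LD1 (two 32-bit buses) and register file B loads through LD2 (two 32-bit buses). An LDDW instruction reads 64 bits at once into the A or B registers.
Store instructions: register file A stores through ST1 and register file B stores through ST2.

5. Data address paths
The data address paths DA1 and DA2 come from the .D functional units. The data paths are written T1 and T2.
    LDW   .D1T2   *A0[3],B1
Here .D1 generates the address, and the data is read into register B1 over the LD2 data path.

6. Control register file
The slides show the control registers, including the addressing mode register (AMR) and the control status register (CSR).

Instruction set overview

1. Mapping between instructions and functional units
Multiply-related instructions all execute on the .M units.
Instructions that generate data memory addresses execute on the .D units.
Most arithmetic and logic operations execute on the .L and .S units.

2. Delay slots
The number of delay slots equals the number of instruction cycles from the moment an instruction's source operands are read until its result can be accessed.

3. Instruction opcode map (figure)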

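4. Parallel operations
The CPU always fetches eight instructions at a time; together they form a fetch packet. All instructions that execute in parallel form an execute packet.

5. Conditional operations
A 3-bit opcode field, creg, specifies the condition register, and a 1-bit field, z, specifies whether the test is for zero or for nonzero.
z = 1: test for zero; the condition is true when the condition register contains 0.
z = 0: test for nonzero; the condition is true when the condition register is not 0.
creg = 0 and z = 0: the instruction executes unconditionally.
Condition registers: A1, A2, B0, B1 and B2.
        [B0]   ADD   .L1   A1,A2,A3
    ||  [!B0]  ADD   .L2   B1,B2,B3
The two instructions above are mutually exclusive.

6. Addressing modes
All data accesses use indirect addressing.
Any register can serve as the address pointer for linear addressing.
Address pointers for circular addressing: A4-A7 and B4-B7.
The AMR selects how the address is modified: linear mode or circular mode.
In circular addressing, the block size is related to the 5-bit value N held in BK0/BK1 by: block size = 2^(N+1) bytes. For example, if N is binary 10000 (decimal 16), the block size is 2^(16+1) = 131072 bytes.

Assembly syntax for data memory addresses in load/store instructions (table):
ucst5: unsigned 5-bit constant offset
ucst15: unsigned 15-bit constant offset

Endianness (byte order)
Little-endian ordering: bytes are ordered from right to left, the most significant byte having the highest address.
Big-endian ordering: bytes are ordered from left to right, the most significant byte having the lowest address.
The endian mode is set by the DSP's LENDIAN pin.

Byte addresses (low two address bits) for the word BA 98 76 54h:
    Byte              BA   98   76   54
    Little-endian     11   10   01   00
    Big-endian        00   01   10   11

Classification of C6000 DSP assembly instructions
Load/store instructions
Arithmetic instructions
Logic and bit-manipulation instructions
Move instructions
Program branch instructions
No-operation instructions
Floating-point instructions

2.2.2 Load/store instructions

Load instructions: LDB, LDBU, LDH, LDHU, LDW, LDDW
Store instructions: STB, STH, STW
The instructions differ in access width: one byte, two bytes (halfword) or four bytes (word).
The signed and unsigned (U) forms differ in how the sign bit is extended.
Address offset scaling: LDB(U), LDH(U) and LDW load a byte, a halfword and a word respectively, so the offset is multiplied by the corresponding scale factor 1, 2 or 4.

Example 2-1 Index calculation under linear addressing
    LDW   .D1   *++A4[1],A6
The address is modified first, and the offset is computed as 1 × 4. The slide shows the result (Example 0201).

Example 2-2 Address calculation under circular addressing
    LDW   .D1   *++A4[9],A1
Assume the addressing mode register AMR = 0x00030001, so that A4 is set to circular addressing with a block size of 2^4 = 16 = 10h bytes (N = 3).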

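Because the load is word-sized, the index offset is 9 × 4 = 36 = 24h. Under linear addressing the address would be 00000124h. Under circular addressing only the low 4 address bits (bits 0-3) may change, so 24h is taken modulo 10h, which leaves 4, and the address actually accessed is 00000104h (Example 0202).

2.2.3 Arithmetic instructions

1. Add and subtract instructions
(1) ADD/ADDU/SUB/SUBU: operands are integers (32-bit) or long integers (40-bit).
(2) ADD2/SUB2: operands are halfwords (16-bit). ADD2 and SUB2 add or subtract two pairs of 16-bit two's-complement numbers at once; no carry or borrow passes between the high and low halfwords, which are processed independently.
(3) SADD/SSUB: saturating signed add and subtract; operands are 32-bit or 40-bit signed numbers.
(4) ADDK: adds a 16-bit constant.
(5) ADDAB/ADDAH/ADDAW/ADDAD and SUBAB/SUBAH/SUBAW: add and subtract according to the addressing mode.

The overflow problem
If a result exceeds the range that the destination word length can represent, its high-order bits are lost and the stored result is wrong. This is called overflow. There are usually three ways to deal with it:
(1) Store the result in a longer word: 16 → 32 bits is practical, 32 → 40 bits costs time.
(2) Use the saturating add/subtract instructions for two's-complement arithmetic: the sign is preserved and an indicator bit is set.
(3) Multiply the whole system by a scale factor smaller than 1: the most common approach in practice.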
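The three remedies above can be compared with a small Python model. This is only a sketch: the helper names `wrap32`, `sub32`, `ssub32` and `sub40` are invented for illustration, and the model reproduces just the numeric behaviour (wrap-around, clamping, extra guard bits), not the saturation indicator bit.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def wrap32(value):
    """Keep the low 32 bits and reinterpret them as a signed two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sub32(a, b):
    """Plain 32-bit subtraction: on overflow the high bits are lost (wrap-around)."""
    return wrap32(a - b)


def ssub32(a, b):
    """Saturating 32-bit subtraction: the result is clamped to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, a - b))


def sub40(a, b):
    """Subtraction into a 40-bit long result: 8 extra guard bits absorb the overflow."""
    value = (a - b) & ((1 << 40) - 1)
    return value - (1 << 40) if value & (1 << 39) else value


a, b = INT32_MAX, -1          # a - b = 2**31, one more than a 32-bit register can hold
wrapped = sub32(a, b)         # sign flips: the stored result is wrong
saturated = ssub32(a, b)      # sign kept, value clamped
long_result = sub40(a, b)     # exact, thanks to the guard bits
```

With these inputs the plain subtraction flips the sign, the saturating version stays at the largest positive value, and the 40-bit version holds the exact result.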

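Example 2-3 Subtraction
    SSUB  .L2   B1,B2,B3
    SUB   .L2   B1,B2,B3
    SUB   .L2   B1,B2,B5:B4
Slide annotations (Example 0203): the 32-bit result is marked "Overflow!", while the 40-bit result is marked "The sign bit is here: no overflow!"

Example 2-4 A program that computes a running sum, storing the sum as a long value with 8 guard bits (Example 0204)
    Loop:   LDW   .D1   *A4++,A0
            NOP   4
            ADD   .L1   A3:A2,A0,A3:A2
            SUB   .L2   B1,1,B1
     [B1]   B     .S1   Loop
            NOP   5

Example 2-5 Addition according to the addressing mode
    ADDAH .D1   A4,A2,A4
The offset is Bh × 2 = 16h; taken modulo 8 it leaves 6 (Example 0205).

Example 2-6 Subtraction according to the addressing mode
    SUBAB .D1   A5,A0,A5
The offset is -4 × 1 = -4h; taken modulo 10h it leaves Ch (Example 0205).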
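The circular-addressing arithmetic used in Examples 2-2, 2-5 and 2-6 can be checked with a short Python sketch. The function names are hypothetical, and the model assumes, as the text describes, that only the low N+1 address bits change while the upper bits of the pointer stay fixed. A base address of 0 is assumed for Examples 2-5 and 2-6 purely to isolate the remainder, because the slides' register values are not in the extracted text.

```python
def block_size(n):
    """Circular block size in bytes for the 5-bit BK field value n: 2^(n+1)."""
    return 1 << (n + 1)


def circular_address(base, byte_offset, n):
    """Add byte_offset to base, letting only the low n+1 address bits change."""
    mask = block_size(n) - 1
    return (base & ~mask) | ((base + byte_offset) & mask)


# Example 2-2: A4 = 0x100 (implied by the linear result 0x124), word offset 9, N = 3
linear = 0x100 + 9 * 4
circular = circular_address(0x100, 9 * 4, 3)

# Example 2-5: halfword offset 0xB * 2 = 0x16 in an 8-byte block (N = 2)
ex_2_5 = circular_address(0, 0xB * 2, 2)

# Example 2-6: byte offset -4 * 1 in a 16-byte block (N = 3)
ex_2_6 = circular_address(0, -4 * 1, 3)
```

Masking with `block_size(n) - 1` is what "taking the remainder" means in the examples; it also handles the negative offset of Example 2-6 without special cases.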

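2. Multiply instructions
(1) Instructions suited to integer multiplication: the 22 instructions whose names begin with MPY.
(2) Three instructions suited to multiplying Q-format numbers: SMPY, SMPYLH, SMPYHL.
The instructions are built on a 16 × 16-bit hardware multiplier. Both source operands of an integer multiply are 16 bits wide and the destination is a 32-bit register, so overflow cannot occur.

Fixed-point formats
A fixed-point format is a convention that the binary point of every number in the machine sits at a fixed position. Two simple conventions are common: the binary point is fixed either before the most significant data bit or after the least significant bit. The former is usually called a fixed-point fraction and the latter a fixed-point integer.
Q format: when the binary point lies to the right of bit n, the format is called Qn.

A fixed-point fraction is a pure fraction: the binary point is taken to lie after the sign bit and before the most significant bit of the magnitude. If x has the form x = x0.x1x2...xn (where x0 is the sign bit, x1 to xn are the significant bits, also called the mantissa, and x1 is the most significant bit), the representable range is:
    2^(-n) ≤ |x| ≤ 1 - 2^(-n)

A fixed-point integer is a pure integer: the binary point is taken to lie after the least significant bit. If x has the form x = x0x1x2...xn (where x0 is the sign bit, x1 to xn are the mantissa and xn is the least significant bit), the representable range is:
    1 ≤ |x| ≤ 2^n - 1

The slides label the two cases as Q0-format and Q15-format numbers. With bits numbered from the least significant bit, the definition above makes the 16-bit integer Q0 and the 16-bit fraction Q15.

Example 2-7 Integer multiplication (Example 0207)
    (1) MPYH   .M1   A1,A2,A3     ; signed operands, sign-extended
    (2) MPYHU  .M1   A1,A2,A3     ; unsigned operands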
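A short Python sketch can make the Q-format multiply concrete. It assumes the usual Q15 convention (16-bit signed integers scaled by 2^15) and the usual fractional multiply rule: the 32-bit product is shifted left by one bit to give a Q31 result, and the single overflow case (-1 × -1) is clamped. The function names are invented for illustration, and this is a model of the arithmetic, not a cycle-accurate description of SMPY.

```python
def to_q15(x):
    """Quantise a real number in [-1, 1) to a 16-bit Q15 integer (clamped)."""
    return max(-32768, min(32767, int(round(x * 32768))))


def from_q31(v):
    """Convert a Q31 integer back to a real number."""
    return v / 2147483648.0


def int_mul16(a, b):
    """Integer multiply of two 16-bit signed values; the product always fits in 32 bits."""
    return a * b


def q15_mul(a, b):
    """Fractional multiply: Q15 x Q15 -> Q31, shifted left by one and saturated."""
    product = (a * b) << 1
    return 0x7FFFFFFF if product == 0x80000000 else product


half = to_q15(0.5)
quarter_q31 = q15_mul(half, half)                  # 0.5 * 0.5 = 0.25
worst_case = q15_mul(to_q15(-1.0), to_q15(-1.0))   # +1.0 is not representable
```

The integer multiply never overflows because the largest magnitude product, (-32768) × (-32768) = 2^30, still fits in a signed 32-bit register. Only the extra left shift of the fractional form can overflow, which is why that case is clamped.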

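2.2.4 Logic and bit-field instructions

1. Logic instructions: AND, OR, XOR, NEG (two's-complement negation).
2. Shift instructions: arithmetic shift left SHL, arithmetic shift right SHR, logical shift right (shift right without sign extension) SHRU, and saturating arithmetic shift left SSHL.
    SHR   src2,src1,dst     ; the low 6 bits of src1 give the shift count
3. Bit-manipulation instructions, commonly used when controlling registers:
Bit-field clear and set: CLR/SET.
Bit-field extract with and without sign extension: EXT/EXTU.
LMBD: finds the position of the highest bit in src2 that equals the least significant bit (LSB) of src1.
NORM: counts the redundant sign bits.
4. Compare instructions, used for loop-condition tests: CMPEQ, CMPGT(U) and CMPLT(U) compare two signed or unsigned numbers for equal, greater than and less than. If the comparison is true the destination register is set to 1; otherwise it is set to 0.

2.2.5 Move instructions

MV: moves data between general-purpose registers.
MVC: moves data between a general-purpose register and a control register; this instruction can use only the .S2 functional unit.
MVK: moves a 16-bit constant into a general-purpose register.
MVKH/MVKLH: combined with MVKL to build a 32-bit constant.

Program branch instructions

Branch to a target address given by a label:
    B   (.S)   label
Branch to a target address held in a register:
    B   .S2    src2
Branch to the target address taken from the maskable-interrupt register:
    B   .S2    IRP
Branch to the target address taken from the nonmaskable-interrupt register:
    B   .S2    NRP
A branch instruction has five instruction cycles of delay slots. The five execute packets that follow the branch all enter the CPU pipeline and execute in turn.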
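The two bit-search instructions in the bit-manipulation list above, NORM and LMBD, are easiest to understand from a bit-by-bit model. The Python sketch below is an illustration under assumptions: the function names are invented, the operands are taken as 32-bit values, and the return convention for LMBD (the number of bits to the left of the first match, or 32 when no bit matches) is assumed rather than taken from the slides.

```python
def norm32(value):
    """Count redundant sign bits: leading copies of the sign bit, not counting the sign bit itself."""
    value &= 0xFFFFFFFF
    sign = value >> 31
    count = 0
    for bit in range(30, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count


def lmbd(search_bit, value):
    """Search from the MSB for the first bit equal to the LSB of search_bit.

    Returns how many bits lie to the left of that bit, or 32 if no bit matches.
    """
    value &= 0xFFFFFFFF
    wanted = search_bit & 1
    for position in range(31, -1, -1):
        if (value >> position) & 1 == wanted:
            return 31 - position
    return 32


small_positive = norm32(0x02A3469F)   # six leading zeros, five of them redundant
all_ones = norm32(0xFFFFFFFF)
first_one = lmbd(1, 0x009E3A81)       # leftmost 1 is bit 23
```

NORM tells how far a value can be shifted left without changing its sign, which is what makes it useful for normalising fixed-point data.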

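2.2.7 Floating-point instructions

(1) Floating-point add and subtract: ADDSP, ADDDP, SUBSP, SUBDP
(2) Data type conversion (10 instructions): INTSP(U), INTDP(U), SPINT, DPINT, SPDP, SPTRUNC, DPSP, DPTRUNC
(3) Floating-point multiply and 32-bit integer multiply (6 instructions): MPYSP, MPYSPDP, MPYSP2DP, MPYDP, MPYI, MPYID
(4) Special floating-point operations (6 instructions): ABSSP, ABSDP, RCPSP, RCPDP (reciprocal), RSQRSP, RSQRDP (reciprocal square root)
(5) Single- and double-precision compare (6 instructions): CMPLTSP, CMPLTDP, CMPGTSP, CMPGTDP, CMPEQSP, CMPEQDP
(6) Double-precision load and store: LDDW, STDW

1. IEEE standard floating-point representation
s is the sign of the number: 0 is positive, 1 is negative.
e is the exponent code, treated as an unsigned number (0 < e < 255).
f is the fractional part of the mantissa.
The slides show the 32-bit single-precision format (float) and the 64-bit double-precision format (double), each with a table of the encodings of some special values.

2. C672x floating-point control registers
Purpose:
(1) Set the floating-point rounding mode for operations on the .L, .S and .M units.
(2) Provide fields that record problems met during instruction execution so that they can be checked, including: whether source operand src1 or src2 is an invalid number (NaN) or a denormalised number; whether the result overflowed, underflowed, is inexact, is infinite or is invalid; whether a division by zero was performed; and whether a NaN source operand was used in a comparison.
The slide shows the FADCR register.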
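The s, e and f fields of the single-precision format described above can be pulled apart with a few shifts and masks. The Python sketch below assumes the standard IEEE 754 layout (1 sign bit, 8 exponent bits with a bias of 127, 23 fraction bits with an implicit leading 1); the bias and the implicit 1 are standard facts that the extracted slides do not spell out. The function names are hypothetical.

```python
import struct


def decode_single(bits):
    """Split a 32-bit pattern into the sign s, exponent code e and fraction f fields."""
    s = (bits >> 31) & 0x1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    return s, e, f


def single_value(bits):
    """Value of a normalised single-precision number (requires 0 < e < 255)."""
    s, e, f = decode_single(bits)
    if not 0 < e < 255:
        raise ValueError("zero, denormalised, infinite and NaN encodings are not handled")
    return (-1.0) ** s * (1.0 + f / 2.0 ** 23) * 2.0 ** (e - 127)


def bits_of(x):
    """32-bit pattern that the host machine uses for x as a single-precision float."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


fields = decode_single(0xC0D00000)
value = single_value(0xC0D00000)
```

The exponent codes e = 0 and e = 255 are excluded because they are reserved for zero and denormalised numbers, and for infinity and NaN, which is why the text restricts e to 0 < e < 255.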

Implementation of Sum of Products (SOP)

SOP is the key element of most DSP algorithms, so let's write the code for this algorithm and learn the C6000 architecture at the same time.

    Y = a1*x1 + a2*x2 + ... + aN*xN

The implementation in this module is done in assembly.

Multiply (MPY)
The multiplication of a1 by x1 is done in assembly by the following instruction:
    MPY   a1,x1,Y
This instruction is performed by a multiplier unit called ".M".

Multiply (.M unit)
The .M unit performs multiplications in hardware:
    MPY   .M   a1,x1,Y
Note: a 16-bit by 16-bit multiply provides a 32-bit result; a 32-bit by 32-bit multiply provides a 64-bit result.

Addition (.?)
    MPY   .M   a1,x1,prod
    ADD   .?   Y,prod,Y

Add (.L unit)
    MPY   .M   a1,x1,prod
    ADD   .L   Y,prod,Y

Register file A
RISC processors such as the C6000 use registers to hold the operands, so let's change this code by replacing a, x, prod and Y with registers.

Specifying register names
    MPY   .M   A0,A1,A3
    ADD   .L   A4,A3,A4
The registers A0, A1, A3 and A4 contain the values used by the instructions.

Register file A contains 16 registers (A0-A15), each 32 bits wide.

Data loading
Q: How do we load the operands into the registers?
A: The operands are loaded from memory into the registers using the .D unit.
Q: Which instruction(s) can be used for loading operands from memory into the registers?
A: The load instructions.

Load instructions (LDB, LDH, LDW, LDDW)
Before using the load unit, be aware that this processor is byte addressable, which means that each byte has a unique address. The addresses are 32 bits wide.

(Memory figure: a 32-bit-wide data memory with word addresses 00000000h, 00000004h, 00000008h, 0000000Ch, 00000010h, ... up to FFFFFFFFh, holding a1, x1, prod and Y.)

The syntax of the load instruction is:
    LDx   *Rn,Rm
where Rn is a register that contains the address of the operand to be loaded and Rm is the destination register.

How many bytes are loaded into the destination register? That depends on the instruction chosen:
    LDB:   loads one byte (8 bits)
    LDH:   loads a halfword (16 bits)
    LDW:   loads a word (32 bits)
    LDDW:  loads a doubleword (64 bits)
Note: LD on its own does not exist.

Example: assume A5 = 0x4 and the memory contents shown on the slide (rows as extracted: D C B A / 4 3 2 1 / 8 7 6 5 / F E 9 0). Then:
    (1) LDB    *A5,A7        ; gives A7 = 0x00000001
    (2) LDH    *A5,A7        ; gives A7 = 0x00000201
    (3) LDW    *A5,A7        ; gives A7 = 0x04030201
    (4) LDDW   *A5,A7:A6     ; gives A7:A6 = 0x0807060504030201
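The four results in the example above follow directly from byte addressing with little-endian ordering, which a few lines of Python can reproduce. This is a sketch under stated assumptions: the memory image is reconstructed from the example's results (bytes 01h to 08h at addresses 4 to 11), the bytes at addresses 0 to 3 are placeholders, and `load` is a hypothetical helper. Sign extension is not modelled because every value loaded here is positive.

```python
# Bytes at addresses 0..3 are placeholders; addresses 4..11 hold 01h..08h.
MEMORY = bytes(4) + bytes(range(1, 9))


def load(address, size):
    """Read `size` bytes starting at `address` and assemble them little-endian."""
    return int.from_bytes(MEMORY[address:address + size], "little")


A5 = 0x4
ldb = load(A5, 1)     # LDB:  one byte
ldh = load(A5, 2)     # LDH:  halfword
ldw = load(A5, 4)     # LDW:  word
lddw = load(A5, 8)    # LDDW: doubleword

# A doubleword lands in a register pair: A6 takes the low word, A7 the high word.
A6 = lddw & 0xFFFFFFFF
A7 = lddw >> 32
```

The byte at the lowest address always ends up in the least significant position, so widening the access only adds higher-order bytes on the left.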

Loading the pointer Rn

Question: if data can only be accessed by the load instructions and the .D unit, how can we load the register pointer Rn in the first place?

The instruction MVKL moves a 16-bit constant into a register:
    MVKL  .?   a,A5        ; a is a constant or label
How many bits represent a full address? 32 bits.
So why does the instruction not allow a 32-bit move? All instructions are 32 bits wide (see the instruction opcode map), so a 32-bit constant cannot fit inside one instruction.

To solve this problem another instruction is available:
    MVKH  .?   a,A5
Finally, to move a 32-bit address into a register we can use:
    MVKL  a,A5
    MVKH  a,A5

Always use MVKL first and then MVKH. Look at the following examples.

Example 1 (A5 = 0x87654321)
    MVKL  0x1234FABC,A5    ; A5 = 0xFFFFFABC (sign extension)
    MVKH  0x1234FABC,A5    ; A5 = 0x1234FABC, OK

Example 2
    MVKH  0x1234FABC,A5    ; A5 = 0x12344321
    MVKL  0x1234FABC,A5    ; A5 = 0xFFFFFABC, wrong

LDH, MVKL and MVKH (code figure)
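The ordering rule for MVKL and MVKH can be verified with a small register model. The Python sketch below takes its behaviour from the examples above: MVKL writes the low 16 bits of the constant and sign-extends them over the whole register, while MVKH replaces only the upper 16 bits. The function names are illustrative, not an API.

```python
def mvkl(constant):
    """Load the low 16 bits of the constant, sign-extended to 32 bits."""
    low = constant & 0xFFFF
    return low | 0xFFFF0000 if low & 0x8000 else low


def mvkh(constant, register):
    """Replace the upper 16 bits of the register, keeping its lower 16 bits."""
    return (constant & 0xFFFF0000) | (register & 0x0000FFFF)


ADDRESS = 0x1234FABC

# Correct order: MVKL, then MVKH.
a5 = mvkl(ADDRESS)
after_mvkl = a5
a5 = mvkh(ADDRESS, a5)
correct = a5

# Wrong order: MVKH, then MVKL (starting from A5 = 0x87654321).
a5 = mvkh(ADDRESS, 0x87654321)
after_mvkh = a5
a5 = mvkl(ADDRESS)
wrong = a5
```

In the wrong order the sign extension performed by MVKL overwrites the upper half that MVKH has just set, which is why MVKL has to come first.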

Creating a loop

With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.

So far we have implemented the SOP for one tap only, i.e. Y = a1*x1. So let's create a loop so that we can implement the SOP for N taps.

What are the steps for creating a loop?
1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.

Notes from the step-by-step slides:
Which unit is used by the B instruction? The diagram adds the .S unit next to the .D, .M and .L units.
The loop counter is kept in a B register; the B registers will be introduced later.

What is the syntax for making an instruction conditional?
    [condition]   Instruction   Label
    e.g.
    [B0]   B   loop
(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
(2) Any instruction can be conditional.

The condition can be inverted by adding the exclamation symbol "!" as follows:
    [!condition]   Instruction   Label
    e.g.
    [!B0]  B   loop     ; branch if B0 = 0
    [B0]   B   loop     ; branch if B0 != 0

More on the branch instruction (1)
Case 1:
    B   .S1   label
Relative branch. The label is limited to an offset of +/- 2^20.
With this processor all instructions are encoded in 32 bits. The label must therefore have a dynamic range of less than 32 bits, because the B instruction itself has to be encoded as well.

More on the branch instruction (2)
By specifying a register as the operand instead of a label, it is possible to have an absolute branch. This allows a dynamic range of 2^32.
Case 2:
    B   .S2   register
Absolute branch. Operates on .S2 ONLY!
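The conditional syntax above amounts to a simple predicate on one of five registers. The Python sketch below models that predicate and uses it to drive a count-down loop. It is a simplification: the names are invented, the register file is a plain dictionary, and branch delay slots are ignored.

```python
CONDITION_REGISTERS = {"A1", "A2", "B0", "B1", "B2"}


def executes(registers, condition=None, invert=False):
    """Return True if an instruction guarded by [condition] or [!condition] executes."""
    if condition is None:
        return True                      # no guard: unconditional
    if condition not in CONDITION_REGISTERS:
        raise ValueError(f"{condition} cannot be used as a condition register")
    nonzero = registers[condition] != 0
    return not nonzero if invert else nonzero


def count_passes(taps):
    """Count loop passes for: SUB B0,1,B0 followed by [B0] B loop."""
    registers = {"B0": taps}
    passes = 0
    while True:
        passes += 1                      # loop body runs once per pass
        registers["B0"] -= 1             # decrement the loop counter
        if not executes(registers, "B0"):
            break                        # [B0] is false when B0 = 0: fall through
    return passes


passes_for_four = count_passes(4)
```

Loading the counter with the number of taps and branching while it is nonzero gives exactly that many passes through the loop body.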

Testing the code

This code performs the following operations:
    a0*x0 + a0*x0 + a0*x0 + ... + a0*x0
However, we would like to perform:
    a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

Modifying the pointers
The solution is to modify the pointers A5 and A6.

Indexing pointers

    Syntax        Description       Pointer modified
    *R            Pointer           No
    *+R[disp]     + Pre-offset      No
    *-R[disp]     - Pre-offset      No
    *++R[disp]    Pre-increment     Yes
    *--R[disp]    Pre-decrement     Yes
    *R++[disp]    Post-increment    Yes
    *R--[disp]    Post-decrement    Yes

*R: the pointer is used but not modified. R can be any register.
Pre-offset: the pointer is modified BEFORE being used and RESTORED to its previous value.
Pre-increment and pre-decrement: the pointer is modified BEFORE being used and NOT RESTORED to its previous value.
Post-increment and post-decrement: the pointer is modified AFTER being used and NOT RESTORED to its previous value.
[disp] specifies a number of elements, whose size is DW (64-bit), W (32-bit), H (16-bit) or B (8-bit). disp is a register or a 5-bit constant. R can be any register.

Modifying and testing the code
This code now performs the following operations:
    a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

Store the final result
The pointer A7 is now initialised.
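The effect of post-incrementing the pointers can be seen in a short Python model of the SOP loop. This is a sketch under assumptions: the coefficients and samples are taken to be 16-bit signed values stored little-endian, the variable names only echo the registers mentioned in the text (A5 and A6 as pointers), and the `step` parameter is a hypothetical knob standing in for the choice between `*R` (step 0) and `*R++` (step 2 bytes per halfword).

```python
import struct


def sum_of_products(memory, a_ptr, x_ptr, taps, step=2):
    """Model of the loop: load a and x, multiply, accumulate, count down."""
    acc = 0                                              # accumulator, reset to zero
    count = taps                                         # loop counter
    while count != 0:
        a = struct.unpack_from("<h", memory, a_ptr)[0]   # 16-bit signed load via pointer A5
        x = struct.unpack_from("<h", memory, x_ptr)[0]   # 16-bit signed load via pointer A6
        a_ptr += step                                    # post-increment (or no change if step is 0)
        x_ptr += step
        acc += a * x                                     # multiply and accumulate
        count -= 1                                       # decrement, then conditional branch
    return acc


# a = [1, 2, 3] at byte address 0, x = [4, 5, 6] at byte address 6
memory = struct.pack("<6h", 1, 2, 3, 4, 5, 6)

stuck = sum_of_products(memory, 0, 6, 3, step=0)     # a0*x0 repeated three times
correct = sum_of_products(memory, 0, 6, 3, step=2)   # a0*x0 + a1*x1 + a2*x2
```

With `step=0` the loop keeps reading the first element of each array, which is the a0*x0 + a0*x0 + ... behaviour seen when testing the unmodified code.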

What is the initial value of A4? A4 is used as an accumulator, so it needs to be reset to zero.

Increasing the processing power

How can we add more processing power to this processor?
(1) Increase the clock frequency.
(2) Increase the number of processing units.

(Figures: side A alone, with units .S1, .M1, .L1 and .D1, register file A (A0-A15, 32 bits wide) and data memory; then two sides of processing units, A and B.)

Can the two sides exchange operands? The answer is YES, but there are limitations. To exchange operands between the two sides, some cross paths or links are required.

What is a cross path?
A cross path links one side of the CPU to the other. There are two types of cross paths:
Data cross paths.
Address cross paths.

Data cross paths
Data cross paths can also be referred to as register file cross paths. These cross paths allow operands from one side to be used by the other side.
There are only two cross paths:
one path, 1X, which conveys data from side B to side A;
one path, 2X, which conveys data from side A to side B.
Data cross paths only apply to the .L, .S and .M units.
The data cross paths are very useful; however, there are some limitations in their use.

Data cross path limitations
(1) The destination register must be on the same side as the unit.
(2) Source registers: up to one cross path per execute packet per side.
Execute packet: a group of instructions that execute simultaneously.

e.g.
        ADD   .L1x   A0,B2,A2
        MPY   .M1x   A0,B6,A9
        SUB   .S1x   A8,B2,A8
    ||  ADD   .L1x   A0,B0,A2
"||" means that the SUB and the ADD belong to the same execute packet and therefore execute simultaneously. Both use the 1X cross path, so this pair is NOT VALID!
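Limitation (2) above is easy to check mechanically. The Python sketch below counts the cross-path uses per side in one execute packet and reports any side that uses its cross path more than once. It encodes only the rule as stated in these slides; the function name and the convention of passing unit specifiers as strings such as ".L1x" are assumptions made for illustration.

```python
def cross_path_conflicts(execute_packet):
    """Return the cross paths (for example "1X") used more than once in one execute packet.

    execute_packet is a list of unit specifiers such as ".L1x", ".S1x" or ".M2".
    A trailing "x" marks an instruction that reads a source operand over the cross path.
    """
    uses = {"1": 0, "2": 0}
    for unit in execute_packet:
        if unit.lower().endswith("x"):
            side = unit[-2]              # ".S1x" -> "1", ".L2x" -> "2"
            uses[side] += 1
    return [f"{side}X" for side, count in uses.items() if count > 1]


single = cross_path_conflicts([".L1x"])                 # one instruction on its own
invalid_pair = cross_path_conflicts([".S1x", ".L1x"])   # SUB .S1x || ADD .L1x
both_sides = cross_path_conflicts([".L1x", ".S2x"])     # one use of 1X, one use of 2X
```

The SUB/ADD pair from the slide is flagged because both instructions sit on side A and both read from register file B, while a packet that uses 1X once and 2X once passes.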

Data cross paths for both sides
(Figure: register file B feeds .L1, .M1 and .S1 through 1X; register file A feeds .L2, .M2 and .S2 through 2X.)

Address cross paths
    LDW   .D1T1   *A0,A5
    STW   .D1T1   A5,*A0
(1) The pointer must be on the same side as the unit.

Load or store to either side (DA1 = T1, DA2 = T2)
    LDW   .D1T1   *A0,A5
    LDW   .D1T2   *A0,B5

Standard parallel loads (DA1 = T1, DA2 = T2)
        LDW   .D1T1   *A0,A5
    ||  LDW   .D2T2   *B0,B5

Parallel load/store using address cross paths (DA1 = T1, DA2 = T2)
        LDW   .D1T2   *A0,B5
    ||  STW   .D2T1   A5,*B0

Fill in the blanks. Does this work? (DA1 = T1, DA2 = T2)
        LDW   .D1__   *A0,B5
    ||  STW   .D2__   B6,*B0
No
