关于FPGA的外文文献翻译一种新的包装布局和布线工具的FPGA研究.doc

资源描述

《关于FPGA的外文文献翻译一种新的包装布局和布线工具的FPGA研究.doc》由会员分享，可在线阅读，更多相关《关于FPGA的外文文献翻译一种新的包装布局和布线工具的FPGA研究.doc（25页珍藏版）》请在三一办公上搜索。

1、译文VPR：一种新的包装，布局和布线工具的FPGA研究沃恩贝茨和乔纳森罗斯系电气与计算机工程系，多伦多大学多伦多，ON，加拿大M5S3G4沃恩，jayar eecg.toronto.edu摘要我们描述了一个基于FPGA新的功能和CAD工具使用的算法，各种途径和方（VPR）。在减少路由面积计算方面，VPR优于所有的FPGA布局布线工具，我们可以比较。虽然常用的算法是基于已知的方法，是我们目前而言改善运行时间和质量的几个有效方法。我们目前的版图和路由上的大型电路的一套新的结果，让未来的基准电路尺寸上的设计方法更多，用于今天的典型的FPGA布局布线工具工业品外观设计。VPR是针对一个范围广泛的

2、FPGA架构的能力，并且源代码是公开的。它和相关的网表翻译/群集工具VPACK已经被用在世界各地的一些研究项目，并且是有用的FPGA体系结构的研究。1 简介在FPGA的研究中，人们通常必须评估新结构特色的实用工具而做评估实验。也就是说评估基准电路技术映射，放置和FPGA的布线结构上的关系和措施的架构质量，如运算速度或区域，然后可以很容易地提取出来。因此，有相当大的对于灵活CAD工具的需求，这样才可以针对各种架构的FPGA做高效的设计，从而便于比较均匀的设计架构。本文介绍了通用的地点和路线（VPR）工具，设计很灵活，足够让许多FPGA架构的比较VPR可以执行的位置，要么全球路由或合并后的全球详细

3、路由。这是公开的http:/www.eecg.toronto.edu/jayar/软件。为了使FPGA体系结构的比较有意义，它是至关重要的CAD工具用于将每个电路架构，以地图的高品质展现。路由相优于所有的VPR在查看FPGA的路由器方面，任何标准基准测试的结果都可用，并且指出VPR的砂矿和路由器的组合胜过所有出版的FPGA布局和布线工具。本文结构如下：在第2节我们描述了一些VPR功能的FPGA架构和范围与它可能被使用的地方。在第3和第4节，我们描述了布局布线法。在第5节讲述了比较有必要的VPR曲目数量和该电路成功的布线所要求的其他已发表的工具。在第6节得出了我们的结论，并提出一些VPR将来的升

4、级。2 概述VPR图1概括了VPR 的CAD流程。VPR投入到由一个technologymapped 网表和一个文本文件描述了的FPGA架构中。VPR可以放置电路，或一个预先存在的位置，可以读入VPR可以执行或者是全局的路线或合并后的全球/详细的安置途径。VPR的输出由布局、布线和统计组成，评估一项有用的工具FPGA架构，如路由线长，跟踪计数最大净长度。给出一些可指定的建筑结构参数描述文件：逻辑块输入和输出的数量，对每个逻辑块的输入和输出端访问（S）之和逻辑等价性不同的输入和输出引脚（例如，所有对照表输入功能当量），对I /成一行或一列的FPGA适合O引脚数，逻辑块阵列的尺寸（如23

5、30的逻辑块）。此外，如果全球路由要执行，你也可以指定：横向和纵向通道的相对宽度之和在不同区域的FPGA的渠道相对宽度。最后，如果合并后的全球和详细的路由被执行，一个也会进行求值：开关块1架构（即为何路由曲目是相互关联的），曲目号码，每个逻辑块的输入引脚连接（ 1），为逻辑块输出FC值，对I / O口FC值。当前的体系结构描述格式不允许跨越多个领域和多个逻辑块和被列入路由体系结构，但我们目前加入此功能。添加新的路由架构的功能VPR相对容易，因为VPR使用体系结构描述来创建路由资源图。每个路由跟踪和建设中的每一个脚成为在这个图中的节点，图边表示为允许的连接。路由器，图形可视化和统计计算

6、程序都与此路由资源图的工作相关，所以添加新的路由架构功能仅涉及更改的子程序来建设这个图。虽然VPR最初是岛式FPGA的开发2，3，它也可以和以行为为基础的FPGA应用4。 VPR目前没有能力为目标的层次FPGA的5，显然增加一个适当的位置和成本函数设计所需的布线资源图形程序将使其能够解决这些问题。最后，VPR的内置图形允许交互式可视化的布局，路由可用资源和互连的可能途径路由资源。VPACK逻辑块包装程序/网络表翻译VPACK读取一个已经技术映射电路网表格式blif 到LUT和触发器，包装成所需的FPGA逻辑LUT和触发器块，并输出在VPR的网表。 VPACK可以针对逻辑块组成一个LUT，如图2

7、所示，因为这是一种常见的FPGA逻辑元件。 VPACK也针对逻辑块包含几个有用的LUT和几个拖动程序，有或没有共享LUT的输入6。这些“clusterbased”逻辑块类似于最近由Altera FPGA开发的工具类型。3布局算法VPR采用模拟退火算法7。我们已经尝试与几个不同的成本函数联系，发现我们称之为线性挤塞的成本函数提供了一个合理的计算时间，最好的结果8。此成本函数的函数形式就是对所有的求和电路中的网进行计算。对于每一个网，北方新宇和bby指出在其边界框的水平和垂直跨度分别为Q（n）的因数补偿。边界线长度模型中的实际低估所需的布线，就可以看成超过三个终端网，作为建议10。它的价值取决于净

8、N两端号码; Q是对总体1有3个或更少的终端，并慢慢增加了50台网逻辑与上2.79。贾夫常数x（n）、（n）为平均信道容量（在首部）在X和Y方向，分别比较全净边框和成本函数的余量，需要更多的调配路由的领域，FPGA具有窄渠道。本文中的所有结果的得到，是利用FPGA中的所有通道都有相同的原则。在这种情况下，贾夫是一个常数，函数的线性阻塞耗费降低到一个包围盒的成本函数。一个良好的退火算法的必要条件是时间表取得一个合理的高品质的解决方案与模拟退火的计算时间相关联。我们已经开发出一种新的退火附表，导致非常高品质的展示位置，并在其中给出退火参数的自动调节功能，不同的成本和电路尺寸。我们计算在初始温度相同

9、的方式为11。让Nblocks是总数逻辑块加的I / O口电路中的数量。我们首先创建一个随机安置的电路。接下来，我们执行Nblocks移动（成对掉期）的逻辑块或I / O口，并计算出不同的成本，这些Nblocks标准偏差配置。初始温度设定为20倍标准差，确保最初几乎所有的行动是在退火算法范围内被系统接受。正如在12，默认号码的行为在每个温度都有评价。这个默认的数字可以在命令行被取代，从而让不同的CPU时间和填筑质量权衡。减少温度每秒移动数的10倍，例如，加快安置到10倍，并降低了大约只有10的最终填筑质量。当温度是如此之高，几乎任何举动都可以被接受时，我们基本上从一个位置随机移动到另一个位置所

10、改善获得的成本都是小成本。相反，如果动作是很少被接受（因温度当前正处于低位，安置相当高的品质），也有不少改善成本。有了这个动机，我们提出了一个新的温度更新附表，在温度增加的时间花费在一个重要的小区域上，但不是全部动作都被接受。如表1：最后，它表明在12，13，这是可取的Raccept保证作为近似0.44的量有可能被取值。为此，就需要利用Raccept值来控制这个范围限制器。块是小于或等于交汇处的值，Dlimit单位除了在X和Y方向尝试。一个小的Dlimit增加值由Raccept确保这仅仅是块进行交换考虑。而这些“本地交换“往往导致安置成本相对较小的变化，越来越多被接受的可能性增加。最初，Dli

11、mit设置为整个芯片。每当温度降低，Dlimit整个芯片的尺寸为这个结果退火的第一部分，逐渐萎缩退火过程中的中间阶段，并正在为退火低温第1部分最后设计余量，当T退火终止“0.005*成本/ Nnets。该运动的逻辑块总是至少影响到一个网。当温度高于平均净成本的一个单位时，它是不可能接受任何成本增加的调配结果的，所以我们终止了退火。4路由算法VPR的路由器是基于试探谈判的拥塞算法14，8。基本上该算法由最初各条线路的最短路径找到网，无论任何接线段或逻辑块管脚，都可能会导致过度使用。路由器的迭代过程包含顺序抓取行动和重新路由（由最低成本路径中找到）中的每个电路网。对使用路由资源成本的函数，其对

12、资源的任何过度使用都会让当前路由发生事先迭代。通过逐渐增加的多余认购路由资源成本，该算法势力替代路线网，以避免使用超额认购资源，只剩下网最需要一个给定的资源。对于本文的实验结果，我们设置路由器的最大数量迭代为45，如果电路中路由没有成功，一定数目的目录中45迭代就被假定为不可路由通道的宽度。为了避免过于迂回路线以节省CPU时间，我们让一个去净路由最外的3个通道的净终端边界框。一个重要的执行细节值得一提。无论是原探路者算法和Vpr路由器使用的Dijkstra算法（即一个迷宫路由器15），以每个网络连接和AK用线网为依据，路由器调用通道的k - 1次执行所有需要的连接。在第一次调用迷宫路由波从净源

13、扩大，直到它到达任何的K 1值之后。路径从源到接收器作为现在这个网的路由的第一部分。波前的迷宫路由被清空，新波前扩展是从整个网络布线开始发出的。之后的K - 1路由器的迷宫调用净终端将所有k值连接。不幸的是，这种方法需要高扇出网络相当多的CPU时间。高扇出网络通常跨越大部分或所有的FPGA。因此，后者调用迷宫路由器的路由部分作为净源会非常大，它将需要相当长的时间以扩大迷宫路由器波前部分到下一个接收器。幸好，有一个更有效的方法。当达到净水槽值时，加入所有路由资源分部需要连接水槽和目前的局部路由成本为0的波前（即扩展列表）。当前不要空迷宫路由波前，只要保证继续扩大正常。由于增加新的路径路由的部分有

14、一个零成本，由于这项新路径通常相当小迷宫路由器将首先扩大它范围，也需要相对较少的时间来添加此新波，如果整个波前扩展了能实现那么下一个接收器将达到的速度远远超过现在。图3说明了差异图形。5实验结果各种FPGA在本节中使用的参数，总是选择与先前参数有明显对比的那些参数。所得结果在本节获得了逻辑的4输入LUT加上一个触发器组成的块，如图所示在图2。时钟网和时序电路没有递交，因为它通常是路由通过专用FPGA的商业网络中的路由。每个LUT的输入出现在一个逻辑块的一面，而逻辑块输出一般访问底部和右侧，如图4。每个逻辑块的输入或输出连接任何相邻通道（s）（即Fc的=宽）。每根电线段和其他布线连接到三段，而在

15、通道交叉口（即值= 3）和开关箱拓扑是“不相交” 这是因为在0磁道接线段只连接在0磁道的其他布线段。5.1实验结果与输入引脚Doglegs以往大多数FPGA布线结果认为“输入引脚doglegs”是可能。如果输入引脚之间的音轨和它连接接线盒的Fc通过独立的SRAM位控制晶体所组成，为了验证两条轨道上的这些开关通过电气连接的可能性。我们将把这个作为一个输入管脚doglegs。作为商业化的FPGA，实现从一个输入引脚接线盒到多路通道，只有一个轨道可以连接到输入引脚，使用多路复用器而不是独立通过在FPGA中的晶体管布局来保存相当的面积。另外，通常有一个缓冲轨道之间的连接块和它连接多路复用这样做的目的是

16、为了提高速度，同时这也意味着缓冲输入引脚doglegs不能被使用。因此，如果在未来FPGA的路由器测试时没有输入引脚doglegs那么我们必须让输入引脚doglegs和过去的结果公平的比较这样是最好的。在本节中我们比较了所需的最低数目，每一条成功的路径和CAD工具的路由设置。所有的基准circuits.1在表2给出结果，得到了路由Altor 16，制作了一个基于位置的工具min。列出三两步（全球和详细）路由与其它路由器进行合并后的全球和详细的路由。 VPR要求比第二，第三最佳路由器降低10的资源数目，表3列出了音轨需要执行这些标准时数新的CAD工具，同时允许地方和路线的电路的连接。列出所有电路

17、逻辑快的消息清单。 VPR使用少于13资源数目的同时，它将执行合并后的全球和详细的路由，世嘉比用于执行详细路由对AA VPR生成全版图走线。执行安置和全局路由，在试图改善绕线同时需要超过87以上VPR总资源数目。最后，让VPR配置电路而不是强迫它使用Altor内存来减少资源数目的40，这表明VPR的模拟退火算法单元远较Altor最小单元更好。5.2不输入引脚的Doglegs实验比较了VPR与SPLACE / SROUTE工具，不允许输入引脚doglegs的性能。当这两个工具都只能使用路线一，比起SROUTE轨道Altor产生的安置需求VPR减少13，。当然这些工具都支持允许布局和布线的电路，对

18、于SPLACE / SROUTE组合VPR还需要少29资源数目。无论是基于VPR和SPLACE只要是使用模拟退火算法，我们相信VPR单元在一方面优于SPLACE是因为它处理高扇出网络更有效率，让更多的动作进行评估，另一方面是因为它更有效的退火时间表给定的时间。大电路5.3实验结果在第5.1和5.2的54至358的逻辑基准块范围内使用面积计算显然太小，因为这是特殊的FPGA。因此在本节中我们目前的实验结果，20个最大的MCNC基准电路27，它的大小范围从1047到8383逻辑块。我们使用Flowmap 28以技术图每4个LUT和拖动块并为VPACK tocombine拖动块，进入我们的基本逻辑电

19、路块LUT。I / O引脚数每行或列适合设置为2，符合目前的商业化FPGA。每个电路被放置在最小的正方形FPGA可以包含它的路由并且输入引脚doglegs是不允许的。请注意三个基准bigkey，DES和dsip，是padlimited要求在FPGA架构表5比较资源数量的地方，在完全路线电路与全版图范围内所需地点与路线的电路与数字VPR，然后进行详细的路由世嘉23。表5还给出了大小每个逻辑块的数量计算电路。在世嘉列中的条目仿真无法成功，因为世嘉运行路由内存不足。由VPR增加路由产生的全版图航线曲目总数，有超过所需68路线的电路主场由VPR路由完全执行。显然，世嘉处理无法进行。因为路由大电路当输入

20、引脚doglegs是不允许的。为了鼓励其它FPGA研究人员公布的结果，以这些大型路由基准，我们发出以下“FPGA的挑战。”每次验证结果跳动的最好验证先前对这些基准结果公布，我们将每条信息支付1美元给作者（对不起，1元加币。，而不是1美元），由他们来处理如果减少需要跟踪的总数。该技术映射网表，由VPR生成和投放位置的目前最全的跟踪路由在http:/www.eecg.toronto.edu/jayar / software.html。上可以找到6结论和未来工作我们已经提出了一个优于所有这类工具的新的FPGA布局布线工具，它让我们可以进行直接的比较。此外，我们已经提出更大的电路基准测试结果。建立专门

21、用于描述精密学术的FPGA布局布线工具。我们希望下一代的FPGACAD工具将优化这些大型基点，因为他们是一系列密切的问题被映射成今天的FPGA。VPR的主要设计目标之一是保持足够的灵活性，允许工具使用在很多FPGA架构的研究上。我们目前正进行几个VPR改进，才能进一步提高其在FPGA架构的研究能力。在不久的将来VPR将支持缓冲和分段路由结构，我们计划增加定时分析仪和时序驱动的路由。外文原文VPR: A New Packing， Placement and Routing Tool forFPGA Research1Vaughn Betz and Jonathan RoseDepartment

22、of Electrical and Computer Engineering， University of TorontoToronto， ON， Canada M5S 3G4 vaughn， jayareecg.toronto.eduAbstractWe describe the capabilities of and algorithms used in a new FPGA CAD tool，Versatile Place and Route (VPR). In terms of minimizing routing area， VPR outperforms all published

23、 FPGA place and route tools to which we can compare.Although the algorithms used are based on previously known approaches， we present several enhancements that improve run-time and quality. We present placement and routing results on a new set of large circuits to allow future benchmark comparisons

24、of FPGA place and route tools on circuit sizes more typical of todays industrial designs.VPR is capable of targeting a broad range of FPGA architectures， and the source code is publicly available. It and the associated netlist translation /clustering tool VPACK have already been used in a number of

25、research projects worldwide， and should be useful in many areas of FPGA architecture research.1 IntroductionIn FPGA research， one must typically evaluate the utility of new architectural features experimentally. That is， benchmark circuits are technology mapped， placed and routed onto the FPGA archi

26、tectures of interest， and measures of the architectures quality， such as speed or area， can then readily be extracted. Accordingly， there is considerable need for flexible CAD tools that can target a wide variety of FPGA architectures efficiently， and hence allow fair comparisons of the architecture

27、s.This paper describes the Versatile Place and Route (VPR) tool， which has been designed to be flexible enough to allow comparison of many different FPGA architectures.VPR can perform placement and either global routing or combined global anddetailed routing. It is publicly available from http:/www.

28、eecg.toronto.edu/jayar/software.html.In order to make meaningful FPGA architecture comparisons， it is essential that the CAD tools used to map circuits into each architecture are of high quality. The routing phase of VPR outperforms all previously published FPGA routers for which standard benchmarks

29、 results are available， and that the combination of VPRs placer and router outperforms all published combinations of FPGA placement and routing tools.2 The organization of this paper is as follows. In Section 2 we describe some of the features of VPR and the range of FPGA architectures with which it

30、 may be used. In Sections 3 and 4 we describe the placement and routing algorithms. In Section 5， we compare the number of tracks required by VPR to successfully route circuits with that required by other published tools. In Section 6 we conclude and outline some future enhancements which will be ma

31、de to VPR.2 Overview of VPRFigure 1 outlines the VPR CAD flow. The inputs to VPR consist of a technologymapped netlist and a text file describing the FPGA architecture. VPR can place the circuit， or a pre-existing placement can be read in. VPR can then perform either a global route or a combined glo

32、bal/detailed route of the placement. VPRs output consists of the placement and routing， as well as statistics useful in assessing the utility of an FPGA architecture， such as routed wirelength， track count， and maximum net length.Some of the architectural parameters that can be specified in the arch

33、itecture description file are: the number of logic block inputs and outputs， the side(s) of the logic block from which each input and output is accessible， the logical equivalence between various input and output pins (e.g. all LUTinputs are functionally equivalent)， the number of I/O pads that fit

34、into one row or one column of the FPGA， and the dimensions of the logic block array (e.g. 23 x 30 logic blocks).In addition， if global routing is to be performed， one can also specify: the relative widths of horizontal and vertical channels， and the relative widths of the channels in different regio

35、ns of the FPGA.Finally， if combined global and detailed routing is to be performed， one also specifies: the switch block 1 architecture (i.e. how the routing tracks are interconnected)， the number of tracks to which each logic block input pin connects (Fc 1)， the Fc value for logic block outputs， an

36、d the Fc value for I/O pads.The current architecture description format does not allow segments that span more than one logic block to be included in the routing architecture， but we are presently adding this feature. Adding new routing architecture features to VPR is relatively easy， since VPR uses

37、 the architecture description to create a routing resource graph.Every routing track and every pin in the architecture becomes a node in this graph， and the graph edges represent the allowable connections. The router， graphics visualiza-tion and statistics computation routines all work only with thi

38、s routing resource graph，so adding new routing architecture features only involves changing the subroutines that build this graph.Although VPR was initially developed for island-style FPGAs 2， 3， it can also be used with row-based FPGAs 4. VPR is not currently capable of targeting hierarchical FPGAs

39、 5， although adding an appropriate placement cost function and the required routing resource graph building routines would allow it to target them.Finally， VPRs built-in graphics allow interactive visualization of the placement，the routing， the available routing resources and the possible ways of in

40、terconnecting the routing resources. 2.1 The VPACK Logic Block Packer / Netlist TranslatorVPACK reads in a blif format netlist of a circuit that has been technology-mapped to LUTs and flip-flops， packs the LUTs and flip flops into the desired FPGA logic block， and outputs a netlist in VPRs netlist f

41、ormat. VPACK can target a logic block consisting of one LUT and one FF， as shown in Figure 2， as this is a common FPGA logic element. VPACK is also capable of targeting logic blocks that contain several LUTs and several flip flops， with or without shared LUT inputs 6. These “clusterbased”logic block

42、s are similar to those employed in recent FPGAs by Altera， Xilinx and Lucent Technologies.2 Placement AlgorithmVPR uses the simulated annealing algorithm 7 for placement. We have experimented with several different cost functions， and found that what we call a linear congestion cost function provide

43、s the best results in a reasonable computation time 8.The functional form of this cost function is where the summation is over all the nets in the circuit. For each net， bbx and bby denote the horizontal and vertical spans of its bounding box， respectively. The q(n)factor compensates for the fact th

44、at the bounding box wire length model underestimatesthe wiring necessary to connect nets with more than three terminals， as suggested in 10. Its value depends on the number of terminals of net n; q is 1 for nets with 3 or fewer terminals， and slowly increases to 2.79 for nets with 50 terminals.Cav，x

45、(n) and Cav，y(n) are the average channel capacities (in tracks) in the x and y directions， respectively， over the bounding box of net n.This cost function penalizes placements which require more routing in areas of the FPGA that have narrower channels. All the results in this paper， however， are obt

46、ained with FPGAs in which all channels have the same capacity. In this case Cav is a constant and the linear congestion cost function reduces to a bounding box cost function.A good annealing schedule is essential to obtain high-quality solutions in a reasonable computation time with simulated anneal

47、ing. We have developed a new annealing schedule which leads to very high-quality placements， and in which the annealing parameters automatically adjust to different cost functions and circuit sizes. We compute the initial temperature in a manner similar to 11. Let Nblocks be the total number of logi

48、c blocks plus the number of I/O pads in a circuit. We first create a random placement of the circuit. Next we perform Nblocks moves (pairwise swaps) of logic blocks or I/O pads， and compute the standard deviation of the cost of these Nblocks different configurations. The initial temperature is set to 20 times this standard deviation，ensuring that initially virtually any move is accepted at the start of the anneal.As in 12， the default number of moves evaluated at each temperature is. This default number can be overridd

展开阅读全文