High Level Design For High Speed FPGA Devices

Man. Ng mcn99
Department of Computing
Imperial College
June 13, 2002

Acknowledgement

Before starting the report, I would like to thank the following people for helping me throughout the project. Without their help, it would have been impossible for me to finish the project:

I would like to thank my supervisor, Dr. Wayne Luk, for giving me a lot of useful advice and encouragement throughout the project. He also guided me towards the problems I should focus on during the implementation.

I would like to thank Professor Yang for letting me implement his gel-image
processing algorithm in hardware. He also gave me references and example sources to help me understand the underlying theory. I would also like to thank him for his teaching in his excellent multimedia course, which conveyed many concepts that helped me understand gel image processing.

I would like to thank Altaf and Shay, two Ph.D. research students who helped me a lot throughout the implementation of the application.

Abstract

In this project, I have developed a systematic approach to high-level hardware design. With this approach, I successfully implemented sophisticated gel image
processing on high-speed hardware. In this report, I will also introduce a new technique that automates high-level hardware performance optimization by rearranging the code sequence so that it runs in the minimum number of clock cycles. The report is split into five chapters:

Chapter 1 is Introduction. It includes the background, the related work and my contribution to the project.

Chapter 2 is Optimization. In this chapter, I will focus on the techniques for optimization. I will also demonstrate some techniques which can automate the optimization process.

Chapter
3 is Hardware Development. In this chapter, I will generalize the steps of converting a software program into hardware. These include several techniques which can improve performance or save hardware resources.

Chapter 4 is Case Study: Gel Image Processing. In this chapter, I will use gel
image processing as an example to show the effect on resources and performance of the techniques discussed in Chapter 2. I will also compare the performance of the application on two devices, Pilchard and RC1000, with that of the software version.

Chapter 5 is Conclusion. It includes the assessed achievements and expected future work.

An online version of this report is also available at:
http://www.doc.ic.ac.uk/mcn99/project/report.pdf

Chapter 1
Introduction

Since the emergence of Handel-C [5], a C-like hardware language, a complete high-level FPGA design approach has become possible. However, most developers stick to lower-level languages such as VHDL when they aim to design high-performance hardware, because the low-level approach gives them greater control over the actual circuit implementation. But low-level design will probably reach its limit as FPGA chips grow bigger and bigger: developers will not be able to develop new applications quickly enough with a low-level design consisting of billions of gates. A high-level approach will then be the answer. The purpose of this project is to introduce a systematic way of developing high-performance hardware with a high-level approach.

1.1 Background and Related Works

In this section, I present the material necessary to understand the content of this report.

1.1.1 Field Programmable Gate Arrays (FPGAs) [1]

Like a Programmable Logic Device (PLD), an FPGA is a piece of programmable hardware. However, while the size of PLDs is limited by power consumption and time delay, an FPGA can easily implement designs with millions of gates on a single IC. The re-programmable nature of FPGAs allows developers to implement designs with shorter development times and at lower cost than equivalent custom VLSI chips. It is worth mentioning that FPGA development is outpacing Moore's Law, with capacity doubling every year. With millions of gates available on the newest chips, the FPGA is an ideal platform for developing reconfigurable systems capable of executing complicated applications at high performance. Therefore, the FPGA is the chip I am developing applications for.

1.1.2 Pilchard [2]

Pilchard is a reconfigurable computing platform employing a field programmable gate array (FPGA) which plugs into a standard personal computer's 133 MHz synchronous dynamic RAM Dual In-line Memory Module (DIMM) slot. Compared to traditional FPGA devices which use the PCI interface, Pilchard allows data to be transferred to and from the host computer in a much shorter time, due to the higher bandwidth and lower latency of the DIMM interface. However, as the DIMM interface was not originally designed for Input/Output (I/O), extra control signals are needed for Pilchard to indicate the start and the end of data processing. As a result, a high-level behavioral design approach is preferable to a low-level structural design approach for developing applications for Pilchard. This shows why it is vital to
have a systematic way of high-level development for high-performance FPGAs.

1.1.3 RC1000 [3]

RC1000 is a 32-bit PCI card designed for reconfigurable computing applications. It has a full board support package in Handel-C, with libraries which ease circuit design for this device. It also features 4 SRAM
banks (2 Mbytes each) on the board which can be accessed by the FPGA or the host CPU. The board can be configured to run at between 4000 kHz and 100 MHz. This device is very different from Pilchard in many respects. In the report, I will show that the development steps introduced in this project are general and applicable to application development on different devices.

1.1.4 VHDL [4]

VHDL is one of the first high-level languages to emerge in the market for designing applications with programmable logic devices. VHDL provides high-level language constructs that enable designers to describe large circuits and bring products to market rapidly. It supports the creation of design libraries in which to store components for reuse in subsequent designs. Because it is a standard language (IEEE standard 1076), VHDL provides portability of code between synthesis and simulation tools, as well as device-independent design. It also facilitates converting a design from programmable logic to an ASIC implementation. The disadvantage of this language is that it is not completely high level: it still expects the user to know the hardware behavior of the components. Therefore, I decided to use another, even
higher-level hardware language, namely Handel-C.

1.1.5 Handel-C [5]

Handel-C is a high-level C-like programming language designed for compiling programs into hardware images for FPGAs or ASICs. Handel-C provides some extra features which do not appear in C to support a few hardware optimizations. One of these is that the language supports specifying the width of each signal, so that the Handel-C compiler can target exactly the resources needed. Handel-C compilers target hardware directly by mapping the program into hardware at the netlist level in XNF or EDIF format. The
advantage of Handel-C over VHDL is that, unlike VHDL, it does not expect users to know much about the low-level hardware. It is a completely high-level language! Figure 1.1 shows the design flow I adopt in converting a Handel-C program to hardware. Although several tools are involved in
different steps, users need not worry about the hardware details: all they need to do is click a few buttons to launch the program that converts the file for the next step. It is as simple as that.

1.1.6 Extending the Handel-C language [7]

Dong U Lee, a Ph.D. student, has invented a language which supports both hardware and software. His approach is to combine the C and Handel-C languages. In this language, the user can specify which part is done by software and which part is done by hardware. In his project, he also developed a friendlier interface for communication between the host and the FPGA device. However, the number of devices currently supported by this language is limited, which is why I finally gave up on using it.

1.2 Contribution

I have developed an easy but efficient optimization method which can rearrange code so that it runs in the minimum number of
cycles.
I have developed a systematic design flow for high-level hardware design targeting high-speed devices.
I have implemented the complicated 2D gel image processing in hardware.

Chapter 2
Optimization

In this chapter, I discuss various methods for optimizing the high-level code. Optimization is where we try to exploit parallelism to achieve a speedup that PC software normally cannot achieve because of the limited resources of the CPU. The main focus of this chapter is on how to automate these optimization processes. We will also discuss some evaluation equations for measuring the speedup achieved after optimization.

2.1 Performance Optimization

Performance optimization means exploiting the potential parallelism of the program and running different non-conflicting operations in the same clock cycle to gain speed. In normal applications, tens or even hundreds of operations which run sequentially on a PC's CPU may be able to run in parallel. However, a PC cannot run them in parallel because of its limited hardware resources. By designing specific hardware to run as many operations as possible in parallel, a significant speedup can be obtained. This is the main reason why an application on an FPGA can sometimes run faster than the corresponding software version even though the FPGA hardware runs at a much slower clock speed (of course we also need to take the CPI into account, but even then a PC CPU can still perform the same individual operation much faster). There are several techniques we could apply:

2.1.1 Balance The Delay Of Each Path

Balancing the delay of each path is important because the hardware clock period can be no shorter than the delay of the slowest path. Therefore, if the delay of one particular path is much longer than the others, we waste resources, as the other paths are capable of running at a much higher speed. Balancing the delays ensures that the parallel optimization at a later stage will be effective. The delay of a path can be defined as:

Tdelay = Tlogic + Trouting (2.1)

where
Tdelay is the total delay of the path
Tlogic is the delay due to logic
Trouting is the delay due to routing

Therefore, the delay is reduced by reducing Tlogic, Trouting, or both. There are 2 main steps to achieve this:
Break up complex operations into simple operations;
Use components with pre-defined placement and routing constraints.

Breaking Up Complex Operation

First of all, the simplest step is to break a complicated operation into several simpler operations. This step effectively reduces the logic in each operation and thus reduces Tlogic in equation 2.1. In a software program, a complicated operation will normally run as quickly as, or even more quickly than, the equivalent simpler operations, as the compiler optimizes the instruction execution for us. In hardware, however, a complex operation means a longer clock period is needed, while the other, simpler operations do not need that long to finish. Figure 2.1 shows an example of breaking up a complex operation into simple operations. In this example, we can see that extra registers are sometimes needed to store intermediate results of the calculation to make the operations simple enough.

Predefined Placement and Routing Component

If the result of the first method is not satisfactory or the timing constraints are still not met, we can use a timing analyzer provided by the FPGA chip developer to find the longest paths. For example, we can use the Timing Analyzer that comes with Xilinx ISE Foundation 4 for timing analysis of Xilinx FPGA chips. After finding the longest path, we know which operation runs too slowly. At this point, we could try 2 methods to increase the speed of this operation. The first is to write a constraint file ourselves which explicitly specifies the placement and routing of this component. This can improve the timing, as the tools which automatically do the placement and routing are normally not very smart. We then include this constraint file before the placement and routing process. However, this approach requires the developer to have a fair amount of knowledge of the FPGA chip they
are using. This includes knowledge of the primitive components supported by the chip and the relative placement and routing of these primitives which achieves the minimum delay. The second approach is easier: use the macros for components with predefined placement and routing supplied by the chip developer. Put simply, the chip developer has done the job for us, and we should just use it! Xilinx has a program called Core Generator which does exactly this. How to include these components in a Handel-C program is specified in the Handel-C manual. However, users must
know that the timing of the inputs and outputs of these components requires extra care: because of a language limitation of Handel-C, the input signal will always arrive at the component one cycle late. This step reduces Trouting in equation 2.1 because, with the logic blocks well placed, the routing of the signals is much shorter and thus the delay due to routing is significantly reduced.

Possibility of Automating This Process

Above, we have discussed 2 methods to achieve this step. This step is difficult to automate, because it is difficult to define what
is a complex operation and what is not. This depends on the device and chip we use as well as the functionality of the program. For a device with a very tight timing constraint, a 16-bit multiplication may be considered complex, while for a device which cannot run at high speed, it may be a waste to break the operations up into very simple ones which can run at very high speed, as the device will not run at that speed anyway.

However, we can borrow the idea from Lee again to make automation of this process possible. While compiling the source, we can include information about the device we are using as well as the timing constraint. The library of the device will include the delay of each logic unit. It will also include some information on how the FPGA development tools will route the signals. The compiler will then be able to approximate Tlogic and Trouting, and thus Tdelay, for each path. The result will then be compared with the specified timing constraints. If a violation is detected, the compiler will use the 2 methods mentioned above to ba
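
The timing check described in this section can be illustrated with a small model. The following is only a sketch in Python, not the compiler envisaged above: the names Path, find_violations, max_clock_mhz and split_into_stages are hypothetical, the delay figures are invented for illustration, and the assumption that splitting an operation divides its logic delay evenly while leaving the routing delay of each stage unchanged is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One combinational path, with delays in nanoseconds (illustrative values)."""
    name: str
    t_logic_ns: float
    t_routing_ns: float

    @property
    def t_delay_ns(self) -> float:
        # Equation 2.1: Tdelay = Tlogic + Trouting
        return self.t_logic_ns + self.t_routing_ns


def find_violations(paths, clock_period_ns):
    """Return the paths whose total delay exceeds the required clock period."""
    return [p for p in paths if p.t_delay_ns > clock_period_ns]


def max_clock_mhz(paths):
    """The clock can run no faster than the slowest path allows."""
    worst_ns = max(p.t_delay_ns for p in paths)
    return 1000.0 / worst_ns


def split_into_stages(path, stages):
    """Model breaking a complex operation into simpler register-separated steps.

    Assumption: the logic delay divides evenly and each stage keeps the
    original routing delay. The cost is (stages - 1) extra clock cycles.
    """
    return [
        Path(f"{path.name}_s{i}", path.t_logic_ns / stages, path.t_routing_ns)
        for i in range(stages)
    ]


# Hypothetical delay estimates taken from a device library.
paths = [
    Path("add16", 3.0, 1.5),
    Path("mul16", 9.0, 3.0),
    Path("compare", 2.0, 1.0),
]

constraint_ns = 10.0  # required clock period (100 MHz)
violations = find_violations(paths, constraint_ns)

# Replace each violating path by two simpler stages and check again.
balanced = [p for p in paths if p not in violations]
for p in violations:
    balanced.extend(split_into_stages(p, 2))

remaining = find_violations(balanced, constraint_ns)
```

With these figures, only the 16-bit multiplication violates the 10 ns constraint. After it is split into two stages, each stage has a delay of 7.5 ns and no path violates the constraint, at the price of one additional clock cycle and a register for the intermediate result.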
