PRFloor: An Automatic Floorplanner for Partially Reconfigurable FPGA Systems

Tuan D. A. Nguyen (1) & Akash Kumar (2)
(1) ECE Department, National University of Singapore, Singapore
(2) Chair of Processor Design, Center for Advancing Electronics Dresden, TU Dresden, Germany

Partial Reconfiguration (PR)

Xilinx ISE PR Design Flow
- Design using Xilinx XPS or ISE
- Generate, then import, netlists into PlanAhead
- Determine sizes and locations (placements) for PR Regions
- Run Place and Route
- Generate bitstreams


Floorplanning

Problem?
[Figure panels: 8 PRRs, 15 PRRs]

So?

PRFloor
- Design using Xilinx XPS or ISE
- Generate, then import, netlists into PlanAhead
- Determine sizes and locations (placements) for PR Regions
- Execute PRFloor
- Run Place and Route
- Generate bitstreams

Common Issue of Previous Works
Previous works only consider PR regions (PRRs) [Rabozzi14, Duhem13, Vipin12, Bolchini11, Montone11, Montone08]

Common Issue
1. PRR 1 and 2 are too far away [Rabozzi14]
[Rabozzi14] M. Rabozzi, J. Lillis, and M. D. Santambrogio. Floorplanning for Partially-Reconfigurable FPGA Systems via Mixed-Integer Linear Programming. In Field-Programmable Custom Computing Machines, Annual International Symposium on, pages 186-193. IEEE, 2014.

Common Issue
2. There is not enough DSP left for the static module [Rabozzi14]

GOAHEAD [Beckhoff13]
[Beckhoff13] C. Beckhoff, D. Koch, and J. Torresen. Automatic floorplanning and interface synthesis of island style reconfigurable systems with GOAHEAD. In Architecture of Computing Systems - ARCS 2013, pages 303-316. Springer, 2013.

Another Issue
An MPSoC contains many (static + PR) modules, up to hundreds in total.

Recursive Cut-size Driven Netlist Bi-partitioning
[Yan10] J. Z. Yan and C. Chu. DeFer: deferred decision making enabled fixed-outline floorplanning algorithm. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 29(3):367-381, 2010.
[Lim08] S. K. Lim. Practical problems in VLSI physical design automation. Springer, 2008.
[Cong06] J. Cong, M. Romesis, and J. R. Shinnerl. Fast floorplanning by look-ahead enabled recursive bipartitioning. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 25(9):1719-1732, 2006.

Bipartitioning in FPGA

PRFloor - Overview
- Find all possible placements for modules on the FPGA
- Use an NLP-based bipartitioner to scatter the modules across the FPGA
- Each module is assigned a preferred location called its anchor point
- The modules and their placements are heuristically filtered and sorted
- Find a feasible combination of the placements
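The first step above can be illustrated with a minimal sketch. It assumes a toy device model in which every column holds a single resource type and a placement is an axis-aligned rectangle; `COLUMNS`, `ROWS`, `enumerate_placements`, and `wastage` are hypothetical names chosen for illustration, and the tile-count definition of wastage is an assumption rather than PRFloor's own metric.

```python
from collections import Counter

# Hypothetical toy device: one resource type per column, identical in every row.
COLUMNS = ["CLB", "CLB", "BRAM", "CLB", "DSP", "CLB"]
ROWS = 3


def enumerate_placements(need, columns=COLUMNS, rows=ROWS):
    """Return every rectangle (x, y, w, h) whose tiles cover the `need` dict."""
    found = []
    for x in range(len(columns)):
        for w in range(1, len(columns) - x + 1):
            per_row = Counter(columns[x:x + w])  # tiles of each type in one row
            for h in range(1, rows + 1):
                if all(per_row[res] * h >= n for res, n in need.items()):
                    # The same shape is valid at every vertical offset.
                    for y in range(rows - h + 1):
                        found.append((x, y, w, h))
    return found


def wastage(rect, need):
    """Assumed metric: tiles enclosed by the rectangle minus tiles required."""
    _, _, w, h = rect
    return w * h - sum(need.values())


need = {"CLB": 2, "DSP": 1}
placements = enumerate_placements(need)
print(len(placements), min(placements, key=lambda r: wastage(r, need)))
```

Every rectangle returned for this example necessarily spans the single DSP column, which shows why heterogeneous resources restrict the legal placements so strongly.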


Recursive Pseudo-bipartitioning Heuristic
[Figure label: anchor point]

Non-linear Integer Program (NLP)
- The number of nets between 2 modules that cross 2 partitions
- The total number of crossing nets between all modules
- The total number of CLBs occupied by the modules in each partition should not exceed the available CLBs in that partition
- Balance the number of CLBs occupied in the two partitions
[NLP] D. Li and X. Sun. Nonlinear integer programming, volume 84. Springer Science & Business Media, 2006.
[Gurobi] Gurobi Optimization version 6.0.2. http://www.gurobi.com, April, 2015.
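The four ingredients listed above (crossing nets per module pair, total crossing nets, per-partition CLB capacity, and CLB balance) can be sketched on a tiny instance. The slides cite Gurobi as the solver; the sketch below swaps in plain exhaustive search, which is only practical for a handful of modules. The function name `bipartition`, the `max_imbalance` bound, and the example numbers are assumptions made for illustration.

```python
from itertools import product


def bipartition(sizes, nets, capacity, max_imbalance):
    """Exhaustively choose a side (0 or 1) for every module.

    sizes[i]      : CLBs needed by module i
    nets[(i, j)]  : number of nets between modules i and j
    capacity      : (CLBs available in partition 0, in partition 1)
    max_imbalance : assumed bound on |used0 - used1|
    Returns (cut_size, sides) or None if no assignment is feasible.
    """
    best = None
    for sides in product((0, 1), repeat=len(sizes)):
        used = [0, 0]
        for size, side in zip(sizes, sides):
            used[side] += size
        if used[0] > capacity[0] or used[1] > capacity[1]:
            continue  # capacity constraint violated
        if abs(used[0] - used[1]) > max_imbalance:
            continue  # balance constraint violated
        # A net crosses the cut when its two modules sit on different sides.
        cut = sum(n for (i, j), n in nets.items() if sides[i] != sides[j])
        if best is None or cut < best[0]:
            best = (cut, sides)
    return best


sizes = [4, 4, 3, 3]
nets = {(0, 1): 5, (2, 3): 4, (1, 2): 1, (0, 3): 1}
print(bipartition(sizes, nets, capacity=(8, 8), max_imbalance=2))
```

Written with binary variables, the crossing test for a pair becomes a product of two decision variables, which is what makes the formulation non-linear.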

PRFloor - Overview
- Find all possible placements for modules on the FPGA
- Scatter the modules across the FPGA surface as uniformly as possible
- Each module is assigned a preferred location called its anchor point
- The modules and their placements are heuristically filtered and sorted
- Find a feasible combination of the placements


Experiments: Synthetic Systems
System    No. Mod   %CLB                        %BRAM                     %DSP
3 PRRs    99        65% → 85% (41% → 60%)       42% → 60% (9% → 26%)      6% → 13% (4% → 11%)
8 PRRs    116       65% → 85% (36% → 56%)       28% → 31% (16% → 19%)     14.5% → 15.1% (11.1% → 11.7%)
15 PRRs   130       65% → 87.8% (34% → 57%)     45% → 53% (27% → 34%)     25% → 28% (22% → 25%)
24 PRRs   126       65% → 85% (33% → 52%)       45% → 60% (21% → 36%)     23% → 32% (22% → 31%)

Execution Time
Execution time increases almost linearly with the number of modules.

Experiments: Real Systems
- Instantiate PR-HMPSoC [Nguyen14] with a varying number of PRRs (3 to 8)
- Compare the maximum achievable clock frequency with that of the comparable static system
[Nguyen14] Nguyen, T.D.A.; Kumar, A., "PR-HMPSoC: A versatile partially reconfigurable heterogeneous Multiprocessor System-on-Chip for dynamic FPGA-based embedded systems," in Field Programmable Logic and Applications (FPL), 2014 24th International Conference on, pp. 1-6, 2-4 Sept. 2014.

PR Systems vs. Static Systems
The maximum clock frequencies obtained from the PR systems are no worse than those of the static ones.

Compare with [Rabozzi14]
- [Rabozzi14]: PRR 1 and 2 are too far away
- PRFloor: wastage is 19% lower
- PRFloor: total Manhattan distance is 35% smaller

Compare with [Rabozzi14]: Static Module
- [Rabozzi14]: there is not enough DSP left for the static module
- PRFloor: there are sufficient DSP resources for the static module

Conclusion
- PRFloor, an automatic floorplanner built on an NLP-based bipartitioner, is presented
- PRFloor can provide high-quality results in a couple of minutes

Future Work
- Improve the quality and performance
- Give the designer better control over the trade-off between wire-length and wastage
- Accelerate the first step of finding placements for modules
- Support bitstream relocation [Oomen15]
[Oomen15] Oomen, R.; Tuan Nguyen; Kumar, A.; Corporaal, H., "An automated technique to generate relocatable partial bitstreams for Xilinx FPGAs," in Field Programmable Logic and Applications (FPL), 2015 25th International Conference on, pp. 1-4, 2-4 Sept. 2015.

Demo

Thank you!

Appendix

Large PR MPSoC [Nguyen14] [Gohringer11]
[Gohringer11] D. Gohringer, M. Hübner, E. N. Zeutebouo, and J. Becker, "Operating system for runtime reconfigurable multiprocessor systems," International Journal of Reconfigurable Computing, vol. 2011, p. 3, 2011.

FPGA Model

Why half-column granularity?

Pareto Ranking

Sort the placements
$OBJ_{placement} = \alpha \cdot wastage + \beta \cdot dist\_to\_anchor$
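A minimal sketch of this sorting step follows. The slides state that wastages and distances are normalized but do not say how; dividing by the per-module maximum is an assumption, and `rank_placements` and the candidate tuples are hypothetical.

```python
def rank_placements(candidates, alpha, beta):
    """Sort one module's candidates by alpha * wastage + beta * dist_to_anchor.

    candidates: list of (name, wastage, dist_to_anchor) tuples.
    Both terms are scaled by their maximum over the list (assumed normalization).
    """
    max_w = max(c[1] for c in candidates) or 1  # avoid division by zero
    max_d = max(c[2] for c in candidates) or 1

    def objective(c):
        return alpha * c[1] / max_w + beta * c[2] / max_d

    return sorted(candidates, key=objective)


candidates = [("A", 10, 2), ("B", 2, 8), ("C", 5, 4)]
print([c[0] for c in rank_placements(candidates, alpha=1.0, beta=0.0)])
print([c[0] for c in rank_placements(candidates, alpha=0.0, beta=1.0)])
print([c[0] for c in rank_placements(candidates, alpha=0.5, beta=0.5)])
```

With `alpha = 1, beta = 0` the order depends only on wastage, with `alpha = 0, beta = 1` only on the distance to the anchor point, and intermediate weights trade the two off.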

PRFloor - Overview (flowchart)
1. Build FPGA model
2. Create ROOT partition
3. Find all possible placements for all modules
4. Do recursive pseudo-vertical-cut for ROOT
5. Do recursive pseudo-horizontal-cut for ROOT
6. Calculate the normalized wastages and distances
7. Select placement candidates
8. Sort the placements of each module
9. Sort the modules in decreasing order of resource
10. Find possible combination
11. Success? If no, move the first vertical cut-line to the right and try again. If yes, done.
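The "find possible combination" step can be sketched as a recursive search over the sorted candidate lists. This is a generic backtracking sketch under the assumption that placements are axis-aligned rectangles `(x, y, w, h)` that must not overlap; `overlaps` and `find_combination` are hypothetical names, not PRFloor's implementation.

```python
def overlaps(a, b):
    """True if rectangles a and b, given as (x, y, w, h), share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def find_combination(options, chosen=()):
    """Pick one rectangle per module so that no two chosen rectangles overlap.

    options: one list of candidate rectangles per module, best candidate first.
    Returns the list of chosen rectangles, or None if no combination exists.
    """
    if len(chosen) == len(options):
        return list(chosen)
    for rect in options[len(chosen)]:
        if not any(overlaps(rect, other) for other in chosen):
            result = find_combination(options, chosen + (rect,))
            if result is not None:
                return result
    return None  # dead end: the caller tries its next candidate


options = [
    [(0, 0, 2, 2)],
    [(1, 1, 2, 2), (2, 0, 2, 2)],
    [(0, 2, 2, 1), (0, 1, 1, 1)],
]
print(find_combination(options))
```

Because each list is tried in sorted order, the first feasible combination found favors the best-ranked candidates of the modules placed first, which is one reason to place the largest modules first.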

Recursive Pseudo-bipartitioning Heuristic

Estimate occupied resources
- $\bar{x}$: arithmetic mean
- $\tilde{x}$: median
- $\sigma_x$: standard deviation

Bipartitioner
- The available resources in the two partitions can be different
- The resources occupied by the possible placements of one module in the two partitions can be different
- Each type of resource occupied by modules in the two partitions can be balanced individually

Quality of the NLP Bipartitioner
[Hmetis98] G. Karypis and V. Kumar. hMETIS: A hypergraph partitioning package, version 1.5.3, 1998.
[Metis13] G. Karypis and V. Kumar. METIS - a software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, version 5.1.0, March 2013.

Execution Time - Breakdown
The recursive process used to find the floorplan is very fast. It takes at most 1.2% of the total runtime, and in most cases almost none.

Effect of α and β on Wire-length and Wastage
$OBJ_{placement} = \alpha \cdot wastage + \beta \cdot dist\_to\_anchor$

Resource requirements of PRRs compared with [Rabozzi14]