# Pipelined Floating Point FFT IP Core (FFT\_PIPE)

September 24, 2008

**Product Specification**

## Dillon Engineering, Inc.

4974 Lincoln Drive
Edina, MN USA, 55436
Phone: 952.836.2413
Fax: 952.927.6514
E-mail: info@dilloneng.com
www.dilloneng.com

## **Features**

- Radix-2 Fast Fourier Transform (FFT) with pipelined butterfly rank structure
- IEEE-754 Floating Point data
  - Uses Xilinx Coregen math operators
  - Customizable precision, speed, and size
  - Any width fixed-point builds also available
- Run-time selectable length N=32 to 2<sup>m</sup> (m=5-16)
  - See Table 1 for example 2<sup>m</sup> max length builds
  - Length up to N=64K (m=16) possible in Virtex-5 SX240
- Run-time selectable Forward/Inverse transform mode
- Continuous processing at speeds up to Fmax (see Table 1)
  - Data rate of 250 MSamples/sec in Virtex-5
- Natural-order inputs and outputs
- Includes C/C++ bit-accurate model and data generator
  - Model also usable from MATLAB
- Includes Verilog testbench and run scripts

# Table 1: Example Implementation Statistics for Xilinx® FPGAs

| Family | Example Device | FFT<br>Length | Fmax<br>(MHz) | Slice FF<sup>1</sup> | Slice LUT<sup>1</sup> | IOB<sup>2</sup> | GCLK | BRAM | MULT/<br>DSP48/E | DCM &<br>MGT | Design<br>Tools |
|-------------|----------------|---------------|---------------|-----------|------------------------|------------------|------|------|------------------|--------------|--------------------------|
| Spartan®-3A | XC3SD3400A-5 | 256 | 150 | 23,867 | 30,143 | 137 | 1 | 16 | 96 | N/A | ISE® 10.1.02 |
| Virtex®-4 | XC4VSX55-12 | 1024 | 200 | 21,585 | 26,079 | 137 | 1 | 26 | 352 | N/A | ISE® 10.1.02 |
| Virtex®-5 | XC5VSX50T-3 | 1024 | 250 | 20,562 | 20,333 | 137 | 1 | 19 | 176 | N/A | ISE® 10.1.02 |
| Virtex®-5 | XC5VSX95T-2 | 16,384 | 200 | 27,799 | 29,163 | 137 | 1 | 109 | 256 | N/A | ISE® 10.1.02 |

#### Notes:

1) Actual slice count dependent on percentage of unrelated logic; see Mapping Report File for details.
2) Assuming all core I/Os and clocks are routed off-chip.

| Core Facts | |
|----------------------------------------------|-----------------------------------------|
| Provided with Core | |
| Documentation | User Guide |
| Design File Formats | ISE Project with EDIF/NGC netlist, Verilog source available for extra cost |
| Constraints Files | .ucf constraints |
| Verification | Verilog Testbench, Test Vectors |
| Instantiation Templates | Verilog |
| Reference Designs & Application Notes | None |
| Additional Items | C/C++ Model |
| Simulation Tool Used | |
| Aldec Riviera 2008.06 | |
| Support | |
| Support Provided by Dillon Engineering, Inc. | |

Figure 1: Pipelined FFT Block Diagram

## **General Description**

Dillon Engineering's Pipelined Floating Point FFT IP Core uses a modular radix-2 Fast Fourier Transform (FFT) architecture to provide discrete transforms on data frames or continuous data streams, at sample rates up to the maximum clock frequency. This efficient structure employs a single butterfly and a single delay feedback path per rank for low localized memory usage. True IEEE-754 floating point data is maintained throughout, supporting a large dynamic range of data without requiring complicated fixed-point analysis. The standard Pipelined IP Core is easily scalable to any Xilinx device and customizable to suit many FFT applications.

# **Functional Description**

#### **Frame**

The Frame blocks use control signaling to delimit discrete data frames per the selected transform length.

#### **Bit-Reverse**

The Bit-Reverse block converts natural-order inputs to bit-reversed order as required by the FFT engine.

#### **Pipelined Ranks**

The Pipelined Rank blocks daisy-chain the FFT processing from input to output.
Each rank is optimized to contain the proper radix-2 butterfly math elements, twiddle factor ROMs, and local datapath memories for efficient continuous processing.

#### **Variable Length Select**

The Variable Length Select block multiplexes the rank outputs for variable transform length support.

# **Applications**

The Pipelined Floating Point FFT IP Core is useful in High Performance Embedded Computing (HPEC) applications that require continuous Digital Signal Processing (DSP) at high sample rates. Floating point FFT hardware acceleration or co-processing is often a goal of scientific algorithms used in High Performance Computing (HPC). End applications and markets include radar, sonar, spectral analysis, telecommunications and image processing.

## **Core Modifications**

The standard IP Core is available as a netlist or as parameterized source code and supports the following:

- Netlist builds for any Xilinx device. FFT length and speed depend on chip resources and speed grade.
- Per-transform length selectable in powers-of-2 from 32 to 2<sup>m</sup> points, where m=5-16.
- Per-transform mode selectable between Forward and Inverse FFT.
- Static length and mode configuration. Pipeline must be clear before changing these configuration settings.
- IEEE-754 single precision floating point math operators using Xilinx Coregen full DSP usage/maximum latency floating point v4.0 cores.
- Decimation-in-time (DIT) algorithm with internal bit-reversal, providing natural-order data inputs and outputs.

Potential customized deliveries from Dillon Engineering include:

- Fixed single length of 2<sup>m</sup> for a slight logic savings over run-time selectable length.
- Fixed Forward or Inverse mode for a slight logic savings over run-time selectable mode.
- Pipelined configuration settings. Allows dynamic mode and/or length switching on back-to-back transforms.
- Bit-reversal stage removed for a slight logic savings and elimination of a BlockRAM FIFO and associated latency. (Note that data must then be input in bit-reversed order to provide natural-order outputs.)
- Decimation-in-frequency (DIF) build option, which inputs data in natural order and outputs data in bit-reversed order.
- Any Xilinx Floating Point operator adjustments to precision and latencies, with logic parameter settings to match. Xilinx Floating Point operators are built separately with Coregen, providing RTL source and .ngc netlists. Thus all trade-offs between speed, number of pipeline stages, DSP48/Mult macro usage, double- or custom-precision float, etc., can be supported.
- Any width fixed-point math operators in lieu of floating point. Options for various scaling, rounding and saturation modes, all matched bit-accurate with the C/C++ model.

Contact Dillon Engineering for more details.

# Core I/O Signals

The core I/O signals have not been fixed to specific device pins, which provides flexibility for interfacing with user logic. Descriptions of all I/O signals are provided in Table 2.

Table 2: Core I/O Signals

| Signal | Signal<br>Direction | Description |
|--------|---------------------|--------------------------------------------------------------------|
| CLK | Input | Clock Input. Single source used for all I/O and internal clocking. |
| RST_N | Input | Active-low asynchronous reset. Resets all control logic. |
| DIR | Input | Transform mode select. 0 = Forward FFT, 1 = Inverse FFT. |
| SEL[3:0] | Input | Transform length select. Valid range is from 4'd5 (indicating transform length of 32) to the maximum length supported by the build (e.g. 4'd10 for a transform length of 1024). Number of SEL bits is dependent on the maximum length. |
| SYNC_IN | Input | Input sync strobe. Signals the core to begin processing the input data (A) on the following clock cycle. |
| A[63:0] | Input | Input data. Complex data of the form R + iQ, where R is contained in bits 63:32 and Q is contained in bits 31:0, each a single-precision floating point number. |
| SYNC_OUT | Output | Output sync strobe. Indicates that the core is sending processed output data (X) beginning on the following clock cycle. |
| X[63:0] | Output | Output data. Complex data of the form R + iQ, where R is contained in bits 63:32 and Q is contained in bits 31:0, each a single-precision floating point number. |

## **Critical Signal Descriptions**

All interface and internal operations of the core are synchronous to CLK. Simple SYNC strobes are used on the input and output interfaces to signal that data is valid on the following clock cycle. An active SYNC coinciding with the last data point thus indicates back-to-back transforms. A SYNC\_IN strobe that is active while the core is already inputting data is ignored. Tying SYNC\_IN active will signal the core to perform continuous transforms, and SYNC\_OUT will strobe as normal to frame the output data.

Figure 2: Interface Input Timing, 1K-Length Back-to-Back Transforms

Figure 3: Interface Output Timing, 1K-Length Back-to-Back Transforms

The DIR and SEL configuration inputs are by default selectable per-transform, but must be stable starting with SYNC\_IN active and must not be changed until the transformed data has been completely output from the core (i.e. 2<sup>m</sup> clocks after the corresponding SYNC\_OUT).

## **Core Assumptions**

Following SYNC\_IN, the initial transform has a start-up latency that depends on the bit-reversal stage, the floating point core latencies and the length of the transform. The core provides continuous processing at steady state, though the SYNC\_IN to SYNC\_OUT latency may vary slightly due to internal pipeline alignment.
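As a rough illustration of how transform length and operator latency combine, the sketch below evaluates the approximate start-up latency expression this datasheet gives for a 2<sup>m</sup>-point transform, latency ≈ (2 × 2<sup>m</sup>) + m × ((2 × fp\_add\_delay) + fp\_mult\_delay) clock cycles. The function names are hypothetical, and the adder and multiplier latencies of 9 and 7 cycles are assumptions chosen only so that the 1024-point case lands near the roughly 2300 cycles quoted for the standard core; they are not taken from this datasheet or from Xilinx documentation.

```python
def startup_latency_cycles(m: int, fp_add_delay: int, fp_mult_delay: int) -> int:
    """Approximate start-up latency, in clock cycles, of a 2**m-point transform.

    Implements the datasheet's approximate expression:
        (2 * 2**m) + m * ((2 * fp_add_delay) + fp_mult_delay)
    """
    if not 5 <= m <= 16:
        raise ValueError("m must be in the range 5..16 (N = 32 to 64K)")
    return 2 * 2**m + m * (2 * fp_add_delay + fp_mult_delay)


def cycles_to_usec(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds at a clock rate given in MHz."""
    return cycles / clock_mhz


# ASSUMED operator latencies, for illustration only (not from the datasheet).
FP_ADD_DELAY = 9
FP_MULT_DELAY = 7

for m in (5, 10, 14):
    cycles = startup_latency_cycles(m, FP_ADD_DELAY, FP_MULT_DELAY)
    usec = cycles_to_usec(cycles, 250.0)
    print(f"N = {2**m:5d}: about {cycles} cycles, {usec:.2f} us at 250 MHz")
```

With these assumed delays the 1024-point case evaluates to 2298 cycles, about 9.19 µs at 250 MHz. For an actual build, substitute the latencies configured for the floating point operators, and treat the result as an estimate, since the expression is approximate.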
The standard core with a transform length of 1024 has a start-up latency of around 2300 clock cycles, or 9.2 µs at a 250 MHz clock rate. Latencies of other lengths 2<sup>m</sup> follow *approximately* the formula:

$$\text{Latency (in clock cycles)} = (2 \times 2^m) + \left(m \times \left((2 \times \mathit{fp\_add\_delay}) + \mathit{fp\_mult\_delay}\right)\right)$$

where fp\_add\_delay and fp\_mult\_delay are the latencies, in clock cycles, of the floating point adder and multiplier operators.

## **Verification Methods**

The core is verified to be bit-accurate with the C/C++ data model under all supported lengths, modes, throughputs and data formats, using a rigorous simulation suite of directed and random data. Our model development is evaluated in terms of SQNR against a double-precision floating point software FFT implementation.

Dillon Engineering's FFT IP Cores have been proven over the years in many Xilinx designs.

# **Ordering Information**

This product is available directly from Dillon Engineering, Inc. Please contact Dillon Engineering for pricing and additional information about this product using the contact information on the front page of this datasheet.

Visit www.dilloneng.com/fft\_ip to see all of Dillon Engineering's FFT IP offerings, including:

- UltraLong FFTs (up to 64M points, fixed or floating point)
- Parallel Butterfly FFTs (continuous FFTs at multiple points per clock cycle)
- Full Parallel FFTs (extremely fast rates, up to 25 GSamples/sec)
- 2D FFTs (two-dimensional transform for image processing)
- Mixed Radix FFTs (for non-power-of-2 FFT lengths)

#### **Related Information**

### Xilinx Programmable Logic

For information on Xilinx programmable logic or development system software, contact your local Xilinx sales office, or:

Xilinx, Inc.
2100 Logic Drive
San Jose, CA 95124
Phone: +1 408-559-7778
Fax: +1 408-559-7114
URL: www.xilinx.com