UltraLong FFT
Optimal solutions for FFT lengths that exceed the internal memory budget of an FPGA or ASIC.
As FFT lengths increase, memory usage in an FPGA or ASIC grows to dominate logic usage. When the required memory exceeds the on-chip capacity, one solution is to split the FFT processing into stages and store intermediate results in external memory. An "UltraLong FFT" is defined here as an FFT whose length exceeds the internal memory budget of the target device, necessitating data movement to external memory.
UltraLong FFT Algorithm
The UltraLong FFT algorithm splits an N-length transform into shorter FFTs of lengths N1 and N2 (N = N1 × N2), combined with three transpose operations in external memory and an additional rotation/twiddle multiply stage.
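The decomposition can be checked in software with a minimal sketch such as the one below. It assumes the common ordering (transpose, N1-point transforms, rotation, transpose, N2-point transforms, transpose); the exact ordering used in a given core may differ. The names `dft`, `transpose`, and `ultralong_fft` are illustrative only, and a naive DFT stands in for the hardware FFT cores.

```python
import cmath

def dft(v):
    """Naive O(n^2) DFT, used here as a stand-in for an FFT core."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def transpose(m):
    """Swap rows and columns; models one pass through external memory."""
    return [list(col) for col in zip(*m)]

def ultralong_fft(x, n1, n2):
    """Length n1*n2 transform built from length-n1 and length-n2 transforms."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    a = [list(x[r * n2:(r + 1) * n2]) for r in range(n1)]  # n1 rows of n2 samples
    b = transpose(a)                                       # transpose 1: n2 x n1
    c = [dft(row) for row in b]                            # n2 transforms of length n1
    # Rotation/twiddle stage: element (r, k) is multiplied by exp(-2*pi*i*r*k/n).
    c = [[c[r][k] * cmath.exp(-2j * cmath.pi * r * k / n) for k in range(n1)]
         for r in range(n2)]
    d = transpose(c)                                       # transpose 2: n1 x n2
    e = [dft(row) for row in d]                            # n1 transforms of length n2
    f = transpose(e)                                       # transpose 3: n2 x n1
    return [value for row in f for value in row]           # natural-order output
```

Comparing `ultralong_fft(x, n1, n2)` against a direct `dft(x)` on a short test vector is a quick way to confirm that the three transposes and the rotation stage are indexed consistently.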
For highest continuous data throughput, three separate banks of memory and separate N1 and N2 FFT cores are utilized (as shown in the block diagram below). For lower performance applications and/or to save logic resources, one or two banks of memory may be used instead of three, and a single FFT core can be shared for both N1 and N2 FFTs.
Block Diagram
Block diagram with external memory used for storage during the three required data transposes.
Versatile IP
At Dillon Engineering, we have provided our customers with efficient UltraLong FFT solutions, taking into consideration a number of design decisions, including:
- FFT engines with variable lengths and robust scaling options
- Fixed versus floating point
- Continuous processing
- Resource sharing of memory banks and FFT cores
- Overlapping data sets
- External SRAM (QDR, DDR, ZBT) or DRAM (SDR, DDR, DDR2) with on-chip caching where required
- Rotation stage via custom CORDIC
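As an illustration of the last item, a rotation stage based on CORDIC can be modeled in floating point as below. This is a generic textbook rotation-mode CORDIC, not a description of any specific core; the function name `cordic_rotate` and the default iteration count are assumptions. A hardware version would use fixed-point shifts and adds with a precomputed arctangent table and gain constant.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate the vector (x, y) by `angle` radians using rotation-mode CORDIC."""
    # Fold the angle into [-pi, pi], then apply an exact quarter-turn so the
    # remaining angle lies inside the CORDIC convergence range.
    z = math.remainder(angle, 2.0 * math.pi)
    if z > math.pi / 2:
        x, y, z = -y, x, z - math.pi / 2
    elif z < -math.pi / 2:
        x, y, z = y, -x, z + math.pi / 2

    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i                    # a right shift by i bits in hardware
        d = 1.0 if z >= 0.0 else -1.0       # rotate toward zero residual angle
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)            # table lookup in hardware
        gain *= math.sqrt(1.0 + step * step)
    # Each micro-rotation stretches the vector; divide the accumulated gain out.
    return x / gain, y / gain
```

In the rotation/twiddle stage, the sample at row r and column k would be rotated by the angle -2πrk/N, for example `cordic_rotate(re, im, -2 * math.pi * r * k / n)`.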
Performance
The performance of an UltraLong FFT is typically limited by the bandwidth of the external memory. Each UltraLong FFT IP Core delivered by Dillon Engineering is configured to obtain maximum performance based upon the external memory architecture available. Among the memory technologies available today, QDR SRAM provides the highest continuous throughput, and DDR SDRAM supports the longest FFT lengths.
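A rough upper bound on the continuous sample rate follows from the memory bandwidth. The sketch below uses a simplified model: every sample is written once and read once per transpose, the three transposes are spread over one, two, or three banks, and each bank sustains its full bandwidth with no refresh, turnaround, or addressing overhead. The function name `max_sample_rate` and the example figures are hypothetical, not measured results.

```python
import math

def max_sample_rate(bank_bytes_per_s, bytes_per_sample, banks=3):
    """Upper bound on continuous samples/s under the simplified model above."""
    if banks not in (1, 2, 3):
        raise ValueError("banks must be 1, 2, or 3")
    # The busiest bank services this many of the three transposes.
    transposes_per_bank = math.ceil(3 / banks)
    # Each transpose costs one write and one read per sample.
    bytes_moved_per_sample = 2 * bytes_per_sample * transposes_per_bank
    return bank_bytes_per_s / bytes_moved_per_sample

# Hypothetical example: 2 GB/s per bank, 8-byte complex samples (32-bit I and Q).
for banks in (3, 2, 1):
    rate = max_sample_rate(2e9, 8, banks)
    print(f"{banks} bank(s): {rate / 1e6:.1f} Msamples/s")
```

Under these assumptions, moving from three banks to one cuts the bound by a factor of three, which is the trade-off between throughput and resources described above.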
Architecture Specific Implementations
More Information
See a recent UltraLong FFT IP Success and other news stories.
For details on UltraLong FFT algorithm processing, see our HPEC 2004 Presentation on the subject.
Device Fit Estimate or Additional Information
Fill out the FFT IP Fit/Information Form to obtain a device usage estimate in your target technology or to obtain additional information about a specific FFT architecture.