Mastering FPGA Optimization: High-Speed Data Movement & AI Acceleration on AMD Versal SoCs
March 20, 2025
Sponsored Blog

For adaptive systems with stringent throughput and latency requirements, engineers can turn to AMD Versal Adaptive SoCs. Versal combines programmable logic with DSP slices, various hard processing elements, AI Engines, and a network-on-chip (NoC) to move data quickly and efficiently.
At Fidus Systems, we specialize in FPGA and ASIC design, verification, and validation for highly complex embedded systems. As an example, Fidus successfully implemented a complex radio-signal-classification algorithm with a stringent throughput requirement, heavily leveraging Versal's AI Engines. This post explores essential techniques for optimizing data transfer, AI Engine acceleration, and Dynamic Function Exchange (DFX) to unlock the full potential of AMD Versal SoCs.
Optimizing Data Movement with Versal’s NoC
Data movement bottlenecks are among the biggest challenges in high-speed computing. AMD Versal Adaptive SoCs integrate a Network-on-Chip (NoC), a high-speed interconnect fabric designed to efficiently route data between hard or fabric-based compute units, memory, and I/O. The NoC provides:
- 128-bit data paths clocked at up to 1 GHz.
- Multiple levels of Quality of Service (QoS) support for bandwidth prioritization.
- Optimized connectivity between PL, AI Engines, and DDR/HBM memory.
At Fidus, we used NoC bandwidth prioritization techniques to optimize a real-time 8K video processing system. By configuring isochronous QoS settings, we ensured that the 48 Gbps video stream operated with minimal latency.
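As a quick sanity check on those numbers, the back-of-the-envelope sketch below assumes a single 128-bit NoC channel running at its 1 GHz peak; sustained throughput in a real design depends on traffic patterns, QoS configuration, and memory-controller efficiency.

```cpp
#include <cstdio>

// Back-of-the-envelope NoC bandwidth check using the figures quoted above.
// Assumes one 128-bit NoC channel at 1 GHz; real sustained throughput
// depends on traffic patterns, QoS settings, and DDR efficiency.
int main() {
    const double channel_width_bits = 128.0;  // NoC data-path width
    const double channel_clock_hz   = 1.0e9;  // 1 GHz
    const double peak_gbps = channel_width_bits * channel_clock_hz / 1e9;  // 128 Gbps

    const double video_stream_gbps = 48.0;    // 8K video stream from the example above
    std::printf("Peak per-channel NoC bandwidth: %.0f Gbps\n", peak_gbps);
    std::printf("Headroom above the video stream: %.0f Gbps (%.0f%% utilization)\n",
                peak_gbps - video_stream_gbps,
                100.0 * video_stream_gbps / peak_gbps);
    return 0;
}
```

Even a single channel leaves substantial headroom above the 48 Gbps stream; it is the isochronous QoS class that keeps the stream's latency bounded when other traffic shares the NoC.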
Accelerating AI Workloads: The MUSIC Algorithm
MUSIC (Multiple Signal Classification) is a high-resolution algorithm for estimating the direction of arrival of signals from a single snapshot of array measurements (e.g., obtained from the elements of an antenna array). MUSIC has a wide range of applications, including radar/sonar systems and smart antennas. Other information-gathering methods, such as Doppler and ranging, can be used alongside MUSIC to add to the collective intelligence.
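To make the algorithm concrete, the sketch below shows the textbook covariance-based MUSIC pseudospectrum for a uniform linear array, using the Eigen library for the eigendecomposition. It is illustrative only: the array geometry, element spacing, and source count are placeholder assumptions, and this is the classic multi-snapshot formulation rather than the single-snapshot variant Fidus deployed on the AI Engines.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <complex>
#include <vector>

// Minimal covariance-based MUSIC pseudospectrum for a uniform linear array (ULA).
// X is sensors x snapshots; d_over_lambda is element spacing over wavelength.
Eigen::VectorXd music_spectrum(const Eigen::MatrixXcd& X,
                               int num_sources,
                               double d_over_lambda,
                               const std::vector<double>& angles_deg) {
    const double kPi = std::acos(-1.0);
    const int M = static_cast<int>(X.rows());

    // Sample covariance matrix R = X X^H / N.
    Eigen::MatrixXcd R = (X * X.adjoint()) / static_cast<double>(X.cols());

    // Eigendecomposition; eigenvalues come back in ascending order, so the
    // first M - num_sources eigenvectors span the noise subspace.
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXcd> eig(R);
    Eigen::MatrixXcd En = eig.eigenvectors().leftCols(M - num_sources);

    Eigen::VectorXd P(static_cast<Eigen::Index>(angles_deg.size()));
    for (int i = 0; i < static_cast<int>(angles_deg.size()); ++i) {
        const double theta = angles_deg[i] * kPi / 180.0;

        // Steering vector a(theta) for the ULA.
        Eigen::VectorXcd a(M);
        for (int m = 0; m < M; ++m)
            a(m) = std::exp(std::complex<double>(
                0.0, 2.0 * kPi * d_over_lambda * m * std::sin(theta)));

        // Pseudospectrum: 1 / ||En^H a(theta)||^2, which peaks where the
        // steering vector is (nearly) orthogonal to the noise subspace.
        Eigen::VectorXcd proj = En.adjoint() * a;
        P(i) = 1.0 / proj.squaredNorm();
    }
    return P;
}
```

Peaks of the pseudospectrum occur at angles whose steering vectors are nearly orthogonal to the noise subspace, which is what gives MUSIC its high angular resolution.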
At Fidus, we leveraged the compute power of Versal's AI Engine (AIE) array to implement a MUSIC algorithm operating on a single snapshot of data. The samples were arranged in a 128 x 8 matrix of complex data at a 128 MHz sample measurement rate, which enabled us to meet our strict 1 ms processing requirement.
Fidus's AIE implementation of MUSIC achieved the target throughput using the following architectural choices:
- Loop unrolling and deep pipelining of the MUSIC operations across 157 AI Engine compute tiles that operate in parallel on different snapshots of data.
- Data movement between pipeline stages through local working-memory data sharing (illustrated in the graph sketch after this list).
- Optimization of AI Engine kernels to minimize redundant computations.
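To illustrate the pipelining and local-memory-sharing points above, here is a skeletal ADF dataflow graph. It is a minimal sketch with hypothetical kernel names, window sizes, and file names, not the actual 157-tile Fidus design: the point is that window (ping-pong) buffers between neighboring kernels live in tile-local memory, so pipeline stages hand data to each other without leaving the AIE array.

```cpp
#include <adf.h>
using namespace adf;

// Kernel entry points; their bodies would live in stage1.cc / stage2.cc.
void stage1(input_window_cint16* in, output_window_cint16* out);
void stage2(input_window_cint16* in, output_window_cint16* out);

class music_pipeline : public graph {
public:
    kernel k1, k2;
    input_plio  data_in;
    output_plio data_out;

    music_pipeline() {
        data_in  = input_plio::create("DataIn",  plio_128_bits, "data/input.txt");
        data_out = output_plio::create("DataOut", plio_128_bits, "data/output.txt");

        k1 = kernel::create(stage1);
        k2 = kernel::create(stage2);
        source(k1) = "stage1.cc";
        source(k2) = "stage2.cc";

        // Window connections: ping-pong buffers that the compiler places in
        // memory local to (and shared between) neighboring AI Engine tiles,
        // so data moves stage-to-stage without touching the NoC or PL.
        connect<window<2048>>(data_in.out[0], k1.in[0]);
        connect<window<2048>>(k1.out[0],      k2.in[0]);
        connect<window<2048>>(k2.out[0],      data_out.in[0]);

        runtime<ratio>(k1) = 0.9;  // allow each kernel most of a tile's cycles
        runtime<ratio>(k2) = 0.9;
    }
};
```

The AIE compiler maps each kernel to a tile and, where possible, places the connecting buffers in memory banks shared by adjacent tiles; a deeper pipeline simply chains more stages in the same fashion.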
Dynamic Function Exchange (DFX): Real-Time Reconfiguration
A key advantage of Versal SoCs is their ability to support Dynamic Function Exchange (DFX), which allows certain regions of the FPGA to be reconfigured without disrupting the entire system.
At Fidus, we have successfully applied DFX to load different AI models dynamically, optimizing hardware resource usage in adaptive AI applications. This is particularly valuable in radar and video analytics, where workloads can shift in real time.
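On the host side, swapping models can be as simple as loading a different device image through the XRT C++ API. The sketch below is a simplified illustration with placeholder file and kernel names; a production DFX flow loads partial images targeting a specific reconfigurable partition while the rest of the design keeps running.

```cpp
#include <xrt/xrt_device.h>
#include <xrt/xrt_kernel.h>
#include <string>

// Hypothetical host-side sketch: swap AI models by loading a different xclbin
// via XRT. File and kernel names are placeholders; an actual DFX flow would
// load partial images built for a reconfigurable partition.
xrt::kernel load_model(xrt::device& device,
                       const std::string& xclbin_path,
                       const std::string& kernel_name) {
    auto uuid = device.load_xclbin(xclbin_path);    // program the device/partition
    return xrt::kernel(device, uuid, kernel_name);  // handle to the loaded accelerator
}

int main() {
    xrt::device device(0);  // first Versal device enumerated by XRT

    // Radar workload active: load the classifier built for radar returns.
    auto radar_kernel = load_model(device, "radar_classifier.xclbin", "classify");

    // Workload shifts to video analytics: reconfigure with a different model.
    auto video_kernel = load_model(device, "video_analytics.xclbin", "detect");
    return 0;
}
```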
FPGA Optimization for Next-Gen Performance
Achieving optimal performance on AMD Versal Adaptive SoCs requires a multi-layered optimization approach, including:
- High-speed NoC tuning for efficient data movement.
- AI Engine acceleration to maximize compute efficiency.
- Dynamic Function Exchange (DFX) to enable real-time adaptability.
At Fidus, we are at the forefront of FPGA and AI Engine innovation, setting new benchmarks for performance. If you are working on high-performance FPGA designs, let’s talk. Book a design consultation with us to explore how we can help optimize your next project.
Read the whitepaper: Mastering FPGA Design with AMD Zynq® UltraScale+™: The Engineer's Guide
Special thanks to Bachir Berkane, System and Algorithm Architect, and Peifang Zhou, Senior Embedded Software Designer, at Fidus Systems for their valuable contributions to this article.

Scott Turnbull is the Chief Technology Officer (CTO) at Fidus Systems, driving technical innovation and guiding the company's strategic direction in embedded system design.

Fidus Systems is a leading provider of high-performance FPGA, AI, and embedded system design solutions. As AMD's Partner of the Year, Fidus specializes in complex hardware acceleration, AI Engine optimization, and high-speed data processing. With over 25 years of expertise, we help companies innovate faster, optimize performance, and bring cutting-edge technology to market. Learn more at www.fidus.com or book a design consultation with our experts today.