This article was originally published on October 23, 2018.
As previously noted in our FPL 2018 recap blog, Xilinx had a very strong presence at FPL, with many members of the Xilinx Silicon Architecture team presenting and publishing papers. We are excited to share these great papers here. Check them out!
Chirag Ravishankar, Xilinx Senior Software Engineer II
The first paper was published by Chirag Ravishankar, Xilinx Senior Software Engineer II, Dinesh Gaitonde, Xilinx Distinguished Engineer, and Trevor Bauer, VP of Xilinx Silicon Architecture.
“FPGAs take advantage of 2.5D stacking technology to manufacture large capacity and high performance heterogeneous devices at reasonable costs. EDA tools need to be aware of and exploit physical characteristics of such devices, for example, the reduced connection count between SLRs, the infrequency of SLL channel occurrence in the fabric, and the aspect ratios of individual SLRs. We implement a partition-driven placer to explore various EDA options to take advantage of architectural features in 2.5D FPGAs. We improve the routability of designs by optimizing the placer for discrete SLL channels and reduced connection counts. We propose a cut schedule for the partitioner to orient the placement with awareness of the aspect ratio of SLRs to improve track demands within each SLR. Read more…”
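For readers curious what "awareness of reduced SLL connection counts" might look like in practice, here is a toy Python sketch (purely illustrative, not the Xilinx placer) of the kind of metric a partition-driven placer has to track: every net whose cells land in different SLRs demands one of the scarce super long line (SLL) crossings. The data structures are assumptions made for illustration.

```python
# Toy sketch of an SLR-crossing metric a partition-driven placer might track:
# each net that spans more than one SLR consumes one of the scarce SLL crossings.
# Data structures here are illustrative assumptions, not Xilinx tool internals.

def sll_demand(nets, slr_of):
    """Count nets that span more than one SLR (each needs an SLL crossing)."""
    return sum(1 for net in nets if len({slr_of[cell] for cell in net}) > 1)

# A tiny example: four nets over six cells split across two SLRs.
slr_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
nets = [("a", "b"), ("b", "c", "d"), ("d", "e"), ("c", "f")]
print(sll_demand(nets, slr_of))   # 2 nets cross the SLR boundary
```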
Henri Fraisse, Xilinx Principal Engineer
The second paper was written by Henri Fraisse, Xilinx Principal Engineer, and Dinesh Gaitonde, Xilinx Distinguished Engineer.
“Many FPGA designs contain soft IP tightly connected to hard blocks such as on-chip Processor, PCIe or I/Os. Generally, these soft IPs pose significant timing closure challenges. In this paper, we propose a timing-driven Place and Route flow based on Boolean Satisfiability (SAT). Its main advantages over previous SAT-based approaches are its improved scalability and its timing awareness. We validate our flow using an IP that targets the emulation market. We demonstrate that our flow can significantly improve the usable bandwidth of FPGA I/Os. Since the proposed flow is SAT based, the performance does not depend on specific ways in which more traditional place and route are usually tuned. Read more…”
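To give a flavor of how placement legality can be expressed for a SAT solver (a toy illustration only, not the timing-driven flow described in the paper), the sketch below builds CNF clauses stating that each cell occupies exactly one site and that no two cells share a site, then finds a satisfying placement by brute force. The cell and site counts and the variable encoding are assumptions chosen for illustration.

```python
# Toy illustration of expressing placement legality in CNF (illustrative only;
# this is not the timing-driven SAT flow described in the paper).
# Boolean variable x(c, s) is true when cell c is placed on site s.
from itertools import combinations, product

CELLS, SITES = 3, 4

def x(c, s):
    """Map (cell, site) to a positive CNF variable id."""
    return c * SITES + s + 1

clauses = []
for c in range(CELLS):
    clauses.append([x(c, s) for s in range(SITES)])   # each cell sits somewhere...
    for s1, s2 in combinations(range(SITES), 2):
        clauses.append([-x(c, s1), -x(c, s2)])        # ...but on one site only
for s in range(SITES):
    for c1, c2 in combinations(range(CELLS), 2):
        clauses.append([-x(c1, s), -x(c2, s)])        # no two cells share a site

# Brute-force "solver" for this tiny instance, just to show a legal placement exists.
for bits in product([False, True], repeat=CELLS * SITES):
    truth = lambda lit: bits[abs(lit) - 1] == (lit > 0)
    if all(any(truth(lit) for lit in clause) for clause in clauses):
        print({c: s for c in range(CELLS) for s in range(SITES) if bits[x(c, s) - 1]})
        break
```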
Throughput Scalability Depending on Precision on PYNQ-Z1
The FINN team showed the latest developments in customized neural network acceleration with their Long Short-Term Memory (LSTM) demonstration for optical character recognition, in collaboration with the University of Kaiserslautern, Germany. More info can be found at: http://www.pynq.io/ml. You can find the paper here.
Comparing BISMO against Several Recently-Proposed Implementations for Low-Precision Matrix Multiplication, Using Peak Binary Performance and Performance Per Watt as Metrics
The Xilinx research lab published this paper in collaboration with the Norwegian University of Science and Technology (NTNU).
“Matrix-matrix multiplication is a key computational kernel for numerous applications in science and engineering, with ample parallelism and data locality that lends itself well to high-performance implementations. Many matrix multiplication-dependent applications can use reduced-precision integer or fixed point representations to increase their performance and energy efficiency while still offering adequate quality of results. However, precision requirements may vary between different application phases or depend on input data, rendering constant-precision solutions ineffective. We present BISMO, a vectorized bit-serial matrix multiplication overlay for reconfigurable computing. BISMO utilizes the excellent binary-operation performance of FPGAs to offer a matrix multiplication performance that scales with required precision and parallelism. We characterize the resource usage and performance of BISMO across a range of parameters to build a hardware cost model and demonstrate a peak performance of 6.5 TOPS on the Xilinx PYNQ-Z1 board. Read more…”
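The core bit-serial idea is easy to illustrate in software: an integer matrix product with low-precision operands decomposes into a weighted sum of binary matrix products, which is exactly the kind of work FPGAs excel at. The NumPy sketch below demonstrates that arithmetic identity only; it is not BISMO's hardware implementation.

```python
# Minimal NumPy sketch of the bit-serial matrix multiplication idea behind
# BISMO (arithmetic illustration only, not the paper's hardware overlay).
import numpy as np

def bitserial_matmul(A, B, a_bits, b_bits):
    """Compute A @ B as a weighted sum of binary matrix products of bit planes."""
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    for i in range(a_bits):
        A_plane = (A >> i) & 1          # i-th bit plane of A (a binary matrix)
        for j in range(b_bits):
            B_plane = (B >> j) & 1      # j-th bit plane of B
            # Each binary-times-binary product is scaled by 2^(i+j).
            acc += (A_plane @ B_plane).astype(np.int64) << (i + j)
    return acc

rng = np.random.default_rng(0)
A = rng.integers(0, 2**3, size=(4, 8))   # 3-bit unsigned operands
B = rng.integers(0, 2**2, size=(8, 5))   # 2-bit unsigned operands
assert np.array_equal(bitserial_matmul(A, B, 3, 2), A @ B)
```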
Advantage of Reading from On-chip Block RAMs
The Xilinx research lab published this paper in collaboration with The University of Sydney.
“In this paper, we argue that instead of solely focusing on developing efficient architectures to accelerate well-known low-precision CNNs, we should also seek to modify the network to suit the FPGA. We develop a fully automated toolflow that focuses on modifying the network through filter pruning, such that it efficiently utilizes the FPGA hardware whilst satisfying a predefined accuracy threshold. Although fewer weights are removed in comparison to traditional pruning techniques designed for software implementations, the overall model complexity and feature map storage is greatly reduced. We implement the AlexNet and TinyYolo networks on the large-scale ImageNet and PascalVOC datasets, to demonstrate up to roughly 2X speedup in frames per second and 2X reduction in resource requirements over the original network, with equal or improved accuracy. Read more…”
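As a rough illustration of hardware-aware filter pruning (not the paper's automated toolflow), the sketch below ranks convolution filters by L1 norm and keeps the largest multiple of an assumed processing-element lane count, so the pruned layer maps evenly onto the accelerator. The parameter names and the lane-count constraint are assumptions for illustration.

```python
# Toy sketch of hardware-aware filter pruning (illustrative only; the paper's
# automated toolflow and its exact pruning criteria are not reproduced here).
import numpy as np

def prune_filters(weights, pe_lanes):
    """Rank conv filters by L1 norm and keep the largest multiple of `pe_lanes`
    filters, so the pruned layer maps evenly onto the hardware lanes.
    `weights` has shape (num_filters, channels, kH, kW)."""
    num_filters = weights.shape[0]
    keep = (num_filters // pe_lanes) * pe_lanes          # assumed hardware constraint
    l1 = np.abs(weights).reshape(num_filters, -1).sum(axis=1)
    keep_idx = np.sort(np.argsort(l1)[::-1][:keep])      # strongest filters, original order
    return weights[keep_idx], keep_idx

rng = np.random.default_rng(1)
w = rng.standard_normal((19, 16, 3, 3))
pruned, kept = prune_filters(w, pe_lanes=8)
print(pruned.shape)   # (16, 16, 3, 3): filter count now a multiple of 8
```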
Henri Fraisse, Chirag Ravishankar, Gaurav Singh, Trevor Bauer
Thanks to the Xilinx silicon architecture team for sharing awesome photos and great papers!
Editor’s Note: This content is contributed by Cathal McCabe, Xilinx University Program AE EMEA