Accelerating Hadoop MapReduce Framework on F1 and Achieving 10X Speed-Up with Xilinx SDAccel Tool

Cindy_Lee · 11-04-2022

This article was originally published on October 26, 2018.

Abhishek Ranjan and Mohit Kumar from BigZetta Systems gave a very interesting presentation about the Apache Hadoop MapReduce framework and how they were able to accelerate it using Amazon Web Services (AWS) F1 instances.

MapReduce Frameworks

MapReduce is at the core of almost all tools in the Hadoop ecosystem and is the preferred paradigm for solving big-data problems such as machine learning, distributed databases, analytics, image/video processing, and speech recognition. All of these applications can be built on MapReduce. While several MapReduce frameworks exist, Hadoop is the number one choice because of its resilience, scalability, and stability. Companies like Netflix, LinkedIn, Uber, Pinterest, and Facebook rely on Hadoop for their big-data applications. Per Gartner, AWS has sold more Hadoop capacity and hosted more Hadoop instances than all other commercial players combined. Given all this, accelerating Hadoop on AWS is a great opportunity for a company like BigZetta.

The two user-defined processing steps in a Hadoop job are map() and reduce(). Between these two steps, the Hadoop framework needs to sort, compress, and merge large quantities of data. BigZetta's analysis showed that these intermediate steps can take longer than the map() and reduce() steps themselves. In other words, the framework consumes more CPU cycles than the actual computation. So BigZetta set out to accelerate the core functions of the Hadoop framework.

SDAccel to Optimize Data Movement: 10X on Sort and 6X on Merge

BigZetta used SDAccel to write C models of the sort() and merge() functions. They were able to try hundreds of different architectures and configurations in a couple of weeks. This would have been impossible with a traditional RTL development flow.

BigZetta also used the analysis tools built into SDAccel to quickly optimize data movement in their application. They achieved a 10X performance increase on sort and 6X on merge. Overall, their application runs 2.4X faster end-to-end on the WordCount benchmark.
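The gap between the per-function speedups and the 2.4X end-to-end figure is what Amdahl's law predicts: only the accelerated part of the run gets faster. As a rough model, let $f_{\text{sort}}$ and $f_{\text{merge}}$ be the fractions of the original run time spent in sort and merge (the article does not report these values), assume only those two functions were accelerated, and ignore any added host-to-FPGA transfer time:

$$
S_{\text{total}} = \frac{1}{\left(1 - f_{\text{sort}} - f_{\text{merge}}\right) + \dfrac{f_{\text{sort}}}{10} + \dfrac{f_{\text{merge}}}{6}}
$$

Under those assumptions, a 2.4X overall speedup means the accelerated run takes $1/2.4 \approx 0.42$ of the original time. Even infinitely fast sort and merge would still leave $1 - f_{\text{sort}} - f_{\text{merge}}$, so the two functions must have accounted for more than about 58% of the original WordCount run time.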

Their experience with SDAccel was very positive and was key to achieving great results in a short amount of time. Here is a summary of their feedback:

Excellent tool documentation, plus a vast example suite that covers a wide range of use cases
Tools work precisely as documented. Very stable, with hardly any QA issues
SDAccel abstracts hardware synthesis very nicely and hides the gory details from the end user
It is very easy to modify an existing example to make it work for a new application. Very good READMEs and Makefiles
Getting to a first hardware version is very fast and requires minimal understanding of FPGAs, hardware, or HLS
Once an application has been optimized in hardware emulation (hw_emu), it behaves predictably on the FPGA

My Take

What impressed me the most, however, is that BigZetta was incorporated in May of 2018 and they have a working product today. This means that in less than 6 months, they were able to turn their initial idea into a working, sellable FPGA product on the AWS Marketplace. That's a radical change from traditional FPGA-based product development cycles, and a great example of how FPGAs in the cloud are a game changer for many businesses. Innovators no longer need to worry about sourcing FPGAs, designing boards, getting them manufactured, managing inventory, etc. With the cloud, all of this is handled for them. FPGAs are available on demand and scale at will.

With AWS, FPGAs are available on demand. With SDAccel, FPGAs can easily be programmed from C/C++. That's a winning combination that will allow many more innovators like BigZetta to take advantage of FPGAs in the cloud in the very near future.

To learn more about SDAccel tools, visit: https://github.com/Xilinx/SDAccel-Tutorials

Editor’s Note: This content is contributed by Thomas Bollaert, Sr. Director Product Application Engineering.