Task-Level Parallel Programming Model for HLS

amd_adaptivecomputing · ‎11-28-2022

This article was originally published on November 21, 2022.

Editor’s Note: This content is contributed by Sriranjani Ramasubramanian, Product Marketing Manager for Vitis HLS.

The Vitis™ HLS 2022.2 release offers a new way to write “task-level parallel (TLP)” code.

Introducing Task-Level Parallelism

A program written in C/C++ executes sequentially on a CPU. To produce high-performance hardware, the HLS tool must infer parallelism from that sequential code and exploit it. Incorporating TLP improves throughput and enables more efficient FPGA utilization.

Ways to Achieve Task-Level Parallelism

There are two ways to use task-level parallelism (TLP) to structure and design your application: data-driven TLP and control-driven TLP. The two methods can be used together or independently, depending on the application.

Data-Driven Task-Level Parallelism

Use this style for applications that do not require any interaction with external memory and whose functions can execute in parallel with no data dependencies. In the Vitis HLS tool, data-driven TLP is implemented with simple classes that model tasks (hls::task) and channels (hls::stream/hls::stream_of_blocks).

Figure 1: Vitis HLS Graphical Representation of the Data-Driven TLP

Data-driven TLP supports feedback between processes and enables simulation concurrency for both C simulation and RTL co-simulation. Thus, deadlock detection support is available from both C and RTL simulations.
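The data-driven idea can be sketched in portable C++ as a small software model. This is not the Vitis HLS API: Stream, Task, and run_pipeline are hypothetical names chosen for illustration, with Stream standing in for an hls::stream channel and Task standing in for an hls::task. The sketch only shows the firing rule, where a task runs whenever data is present on its input channel and no central controller sequences the tasks.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for an hls::stream<int> channel.
using Stream = std::queue<int>;

// Hypothetical stand-in for an hls::task: a body that fires
// whenever a token is available on its input channel.
struct Task {
    Stream* in;
    Stream* out;
    std::function<int(int)> body;

    // Consume one token and produce one token, if input data exists.
    bool try_fire() {
        if (in->empty()) return false;
        int v = in->front();
        in->pop();
        out->push(body(v));
        return true;
    }
};

// Two tasks connected by channels: src -> scale -> mid -> offset -> dst.
std::vector<int> run_pipeline(const std::vector<int>& input) {
    Stream src, mid, dst;
    for (int v : input) src.push(v);

    std::vector<Task> tasks = {
        {&src, &mid, [](int x) { return x * 2; }},  // scale
        {&mid, &dst, [](int x) { return x + 1; }},  // offset
    };

    // Data-driven scheduling: keep firing any task that has input
    // until no task can make progress.
    bool progress = true;
    while (progress) {
        progress = false;
        for (Task& t : tasks) {
            if (t.try_fire()) progress = true;
        }
    }

    std::vector<int> result;
    while (!dst.empty()) {
        result.push_back(dst.front());
        dst.pop();
    }
    assert(result.size() == input.size());  // one output per input token
    return result;
}
```

Calling run_pipeline({1, 2, 3}) in this model returns 3, 5, 7. In hardware the tasks would run concurrently rather than in a software loop; the single-threaded scheduler here is only an assumption made to keep the sketch deterministic.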

Control-Driven Task-Level Parallelism

Use this style for applications that require some interaction with external memory and have data dependencies between the tasks that execute in parallel. In this model, Vitis HLS infers parallelism wherever possible while preserving the behavior of the original sequential C/C++ code, and allows a subsequent function to start before the previous one finishes. As a result, two or more functions that are written sequentially can run concurrently. Control-driven TLP is implemented by specifying the DATAFLOW pragma in the code.

Figure 2: Vitis HLS Graphical Representation of the Control-Driven TLP

With this method, task-level parallelism can be observed only during RTL co-simulation, so deadlock detection is available only from RTL co-simulation.
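A minimal sketch of the control-driven style follows, assuming a three-stage load/compute/store structure. The function names, the buffer size N, and the arithmetic are hypothetical, and the interface pragmas a real design would need for external memory ports are omitted. A standard C++ compiler ignores the DATAFLOW pragma and runs the three calls one after another, which is the sequential behavior the tool is expected to preserve.

```cpp
#include <cassert>

constexpr int N = 8;  // hypothetical transfer size

// Stage 1: copy data from external memory into a local buffer.
static void load(const int* in, int buf[N]) {
    assert(in != nullptr);
    for (int i = 0; i < N; ++i) buf[i] = in[i];
}

// Stage 2: process the local buffer (placeholder arithmetic).
static void compute(const int in_buf[N], int out_buf[N]) {
    for (int i = 0; i < N; ++i) out_buf[i] = in_buf[i] * 2 + 1;
}

// Stage 3: write results back to external memory.
static void store(const int buf[N], int* out) {
    assert(out != nullptr);
    for (int i = 0; i < N; ++i) out[i] = buf[i];
}

// Top-level function: written as three sequential calls. The DATAFLOW
// pragma asks the tool to overlap them where dependencies allow.
void top(const int* in, int* out) {
#pragma HLS DATAFLOW
    int a[N];
    int b[N];
    load(in, a);
    compute(a, b);
    store(b, out);
}
```

Because the source remains ordinary sequential C++, C simulation executes the stages in order; any overlap between them appears only in the generated RTL, which matches the note above about RTL co-simulation.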

In summary, data-driven TLP is recommended if your design is a pure streaming design, a design requiring feedback, or a data-dependent design. Control-driven TLP is recommended for designs with sequential semantics, designs that need control to start and stop, and designs that require non-local memory access.

Next Steps

Refer to the simple data-driven example design on GitHub to learn more about data-driven task-level parallelism.

Refer to the control-driven examples on GitHub to learn more about control-driven task-level parallelism.

To learn more about task-level parallelism, read the user guide UG1399.