Create an Adjacency List Graph Representation in OpenCL

I want to know how an adjacency list graph representation is created in OpenCL. The problem is that dynamic memory allocation cannot be done in kernel code (as far as I know); it can only be done in host code. But if the nodes have to be created dynamically and should still benefit from parallel execution (creating them in host code would make that part sequential), how is this done in an OpenCL program? Is there a way to create them in kernel code? If not, how can they be created in host code without losing the parallel benefits of OpenCL?

0 Likes
3 Replies
himanshu_gautam
Grandmaster

Dynamic memory allocation is not allowed on the kernel side.

As a workaround, you can either allocate a big enough memory chunk up front,

or

break the algorithm into a set of kernels. First execute a kernel that determines the total memory that needs to be allocated. Pass this information back to the host (probably an int holding the memory requirement in bytes), and then create a buffer object (cl_mem) of that size. Since you only want this memory to be created, not initialized, by host code, try the CL_MEM_USE_PERSISTENT_MEM_AMD flag, which should create the buffer directly on the GPU. See section 4.6 of the OpenCL Programming Guide for details.
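For an adjacency list specifically, the "size first, then allocate, then fill" idea can be sketched as a two-pass build of a flat (CSR-style) adjacency array. This is only an illustration in plain C, not code from the programming guide: the names Edge, CsrGraph, csr_build and csr_free are hypothetical, and the loops stand in for kernels. In a real OpenCL version the two passes would run one work-item per edge, the increments would use atomic_inc, and the malloc between the passes would be the clCreateBuffer call sized from the first pass.

```c
#include <stdlib.h>

/* Hypothetical types for illustration; not taken from the thread. */
typedef struct { int src; int dst; } Edge;

/* Neighbours of vertex v are adj[offsets[v]] .. adj[offsets[v + 1] - 1]. */
typedef struct {
    int  num_vertices;
    int *offsets;   /* num_vertices + 1 entries */
    int *adj;       /* one entry per edge */
} CsrGraph;

/* Pass 1: count the out-degree of every vertex.
 * As a kernel: one work-item per edge, atomic_inc on degree[src]. */
static void count_degrees(const Edge *edges, int num_edges, int *degree) {
    for (int e = 0; e < num_edges; ++e)
        degree[edges[e].src] += 1;
}

/* Exclusive prefix sum: degrees become start offsets; returns the total. */
static int exclusive_scan(const int *degree, int n, int *offsets) {
    int running = 0;
    for (int v = 0; v < n; ++v) {
        offsets[v] = running;
        running += degree[v];
    }
    offsets[n] = running;
    return running;
}

/* Pass 2: write each edge into its vertex's slice.
 * As a kernel: one work-item per edge, atomic_inc on cursor[src]. */
static void fill_adjacency(const Edge *edges, int num_edges,
                           const int *offsets, int *cursor, int *adj) {
    for (int e = 0; e < num_edges; ++e) {
        int v = edges[e].src;
        int slot = offsets[v] + cursor[v]++;
        adj[slot] = edges[e].dst;
    }
}

/* Assumes num_vertices > 0. Returns 0 on success, -1 on allocation failure. */
int csr_build(const Edge *edges, int num_edges, int num_vertices, CsrGraph *g) {
    int *degree  = calloc((size_t)num_vertices, sizeof *degree);
    int *cursor  = calloc((size_t)num_vertices, sizeof *cursor);
    int *offsets = malloc(((size_t)num_vertices + 1) * sizeof *offsets);
    if (!degree || !cursor || !offsets) {
        free(degree); free(cursor); free(offsets);
        return -1;
    }

    count_degrees(edges, num_edges, degree);
    int total = exclusive_scan(degree, num_vertices, offsets);

    /* The allocation happens here, between the passes, once the size is known. */
    int *adj = malloc((total > 0 ? (size_t)total : 1) * sizeof *adj);
    if (!adj) {
        free(degree); free(cursor); free(offsets);
        return -1;
    }

    fill_adjacency(edges, num_edges, offsets, cursor, adj);
    free(degree);
    free(cursor);

    g->num_vertices = num_vertices;
    g->offsets = offsets;
    g->adj = adj;
    return 0;
}

void csr_free(CsrGraph *g) {
    free(g->offsets);
    free(g->adj);
    g->offsets = NULL;
    g->adj = NULL;
}
```

With the edges (0,1), (0,2), (1,2), (2,0) on three vertices, the offsets come out as 0, 2, 3, 4 and the adjacency array as 1, 2, 2, 0. Because the layout is two flat arrays with no pointers, it can be copied to or built on the device as ordinary buffers. Note that with atomics the order of neighbours inside a vertex's slice is not deterministic.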

0 Likes

I have an NVIDIA GPU. Will PERSISTENT_MEM_AMD work on my GPU?

Also, in my algorithm I can't precalculate the size of the array that is going to be dynamically allocated. So how do I do the allocation and the processing on it?

0 Likes

PERSISTENT_MEM_AMD is an AMD platform-specific flag. You can take advantage of it on AMD platforms.

It won't work on NVIDIA GPUs. I think your code will not even compile.

If you cannot precalculate, you will have to allocate a chunk and divide it among the workgroups. When a workgroup hits its limit, it has to write out the calculated values to its chunk, along with meta-information on where it left off the computation. A subsequent kernel should read that back and resume the computation. This has to happen like a pipeline, with memory chunks being allocated for each iteration (and probably freed or reused subsequently).

If you describe the algorithm in more detail, I can help you out.
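The chunk-and-resume scheme can be sketched roughly as follows. This is plain C that simulates the launches on the host, under several assumptions: GroupState, run_group, process_all and the toy workload outputs_for_item are hypothetical names made up for the example, each loop over groups stands in for one kernel launch, and no single item may produce more outputs than CHUNK_SIZE (otherwise a group could never make progress).

```c
#include <stddef.h>

#define NUM_GROUPS 2
#define CHUNK_SIZE 4   /* output slots each group owns per pass */

/* Per-group meta-information that survives between passes. */
typedef struct {
    int next_item;  /* where this group left off */
    int end_item;   /* one past the last item this group owns */
    int used;       /* slots written during the current pass */
} GroupState;

/* Hypothetical workload: item i yields (i % 3) + 1 outputs, all equal to i. */
static int outputs_for_item(int item) { return item % 3 + 1; }

/* One pass for one group: fill its chunk until it is full or the work is done. */
static void run_group(GroupState *st, int *chunk) {
    st->used = 0;
    while (st->next_item < st->end_item) {
        int n = outputs_for_item(st->next_item);
        if (st->used + n > CHUNK_SIZE)
            break;                      /* chunk full: stop and keep next_item */
        for (int k = 0; k < n; ++k)
            chunk[st->used++] = st->next_item;
        st->next_item++;
    }
}

/* Host-side pipeline: launch, drain the chunks, relaunch until all groups finish.
 * Returns the number of passes; *total_out receives the number of outputs produced.
 * Outputs beyond out_capacity are counted but not stored. */
int process_all(int num_items, int *out, int out_capacity, int *total_out) {
    int pool[NUM_GROUPS * CHUNK_SIZE];  /* the one pre-allocated chunk, split per group */
    GroupState st[NUM_GROUPS];
    int per_group = (num_items + NUM_GROUPS - 1) / NUM_GROUPS;

    for (int g = 0; g < NUM_GROUPS; ++g) {
        int begin = g * per_group;
        int end = begin + per_group;
        st[g].next_item = begin < num_items ? begin : num_items;
        st[g].end_item = end < num_items ? end : num_items;
        st[g].used = 0;
    }

    int passes = 0, written = 0, pending = 1;
    while (pending) {
        pending = 0;
        for (int g = 0; g < NUM_GROUPS; ++g)          /* "kernel launch" */
            run_group(&st[g], pool + g * CHUNK_SIZE);
        passes++;
        for (int g = 0; g < NUM_GROUPS; ++g) {        /* drain, then reuse the chunk */
            for (int k = 0; k < st[g].used; ++k) {
                if (written < out_capacity)
                    out[written] = pool[g * CHUNK_SIZE + k];
                written++;
            }
            if (st[g].next_item < st[g].end_item)
                pending = 1;
        }
    }
    *total_out = written;
    return passes;
}
```

The design choice here is that the pool is reused across passes instead of being reallocated, which matches the "freed or reused" option above; the drained results arrive grouped by pass and by workgroup, not in item order, so a real implementation may need to carry an item index alongside each value.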

0 Likes