Archives Discussions

billyc59 · ‎04-27-2015

Over the course of developing my application, I have run out of single-dimension work items, and I now need to utilize higher dimensions.

My program uses pipes extensively to keep the calculated data on the GPU for as long as possible. However, when I verify the commumications in my pipes (using work group functions), the reserve IDs are not valid

So my questions are:
1) Am I wrong to assume that work groups do not span across dimensions?
2) Can pipes act across dimensions? (through the linearization of indexes, or by simply not using work-group pipe functions)

3) Failing that, is my only option to use the host as a buffer?

dipak · ‎04-30-2015

Please find my comments below.

What about multiple work groups operating on the same pipe?

Yes, they can. However, some kind of explicit synchronization/ordering needs to performed to get the data in orderly manner. That's why its easier to work with a single work-group. Any inter work-group synchronization is itself challenging in opencl.

My OpenCL 2.0-compatible card tells me I have a max work group size of 256 work items...I would like to have as much data remain on the card as I can, to reduce the host-gpu data transfer bottlenecks.

As normally resource usage depends on work-group size, that's why, sometimes smaller group size may give better performance specially in terms of occupancy. however, there is no fixed rule to decide the optimal work-group size. Profiling is helpful in this regard.

Don't know your exact usage, however, it's possible that multiple kernels can operate on same global device memory without being transferred to host at all. One can launch a series of kernels which will operate on the same global memory in orderly manner to produce the final result.

Does the index parameter on "read/write_pipe" only serve to index the reads/writes in a particular reserved-space? Is it possible to globally index which sections of the Pipe I can write to?

Yes, the index is only valid for that reserve id. As per the spec, following behaviour is undefined:

"a kernel calls read_pipe or write_pipe with an valid reservation ID but with an index that is not a value from 0 … num_packets – 1 specified to the corresponding call to reserve_pipe."

As of now, there is no global level reservation is supported. Only work-item and work-group levels are supported.

Your last question is not clear to me. Could you please explain little bit more?

Regards,

View solution in original post

dipak · ‎04-28-2015

Hi billyc59,

Sorry, I'm unable to follow your questions. Could you please provide a little bit more context, especially the following statement?

However, when I verify the commumications in my pipes (using work group functions), the reserve IDs are not valid

As you know, pipes can be thought as a one-dimensional FIFO buffer which can be used to communicate between two kernels. Once a region of a pipe has been reserved (at a work-item or work-group level) for reading/writing , the region can be accessed via an index similar as any 1-D linear array. So, one can easily map any 2-D and higher dimensional work-item id to 1-D index to access the reserve area.

Regards,

billyc59 · ‎04-29-2015

Thanks for the reply. I realize my question was poorly worded, partly due to my confusion on the technology. I was using work-group functions to read and write to the pipe, but after realizing that my data sizes exceed the work-group limit, I need to come up with another way to read/write to the pipe. I have tried using every work item to reserve a single space on the pipe. Below is a sample pseudocode of what I tried to attempt:

Producer kernel
reserve_id_t write_id = reserve_write_pipe(pipe, 1);
if(is_valid_reserve_write_id(write_id)){

     write_pipe(pipe, write_id, global_id, &write_array[global_id]);
     commit_write_pipe(pipe, write_id);

}

Consumer kernel
reserve_id_t read_id = reserve_read_pipe(pipe, 1);
if(is_valid_reserve_id(read_id)){
     read_pipe(pipe, read_id, global_id, &read_array[global_id]);
     commit_read_pipe(pipe, read_id);
}

Though the reads are valid (read pipe returning 0), I do not get the same data coming out of the read end.
2 questions:
1) Is there anything wrong with my thinking?

2) Is it possible to get ordered pipe data in this manner? Eg. data written by work item 0 in the producer goes to work item 0 in the consumer kernel. My workaround is currently to have a single work-item in the kernel do all of the pipe-based communications. This is inelegant, but it works, short of abandoning pipes altogether, or staying in the work-group range.

dipak · ‎04-29-2015

Pipe acts as a FIFO buffer; anything written first is popped out first unless any ordering is done using reservation mechanism. So, if you access a pipe at work-item level without maintaining any indexing, you would not get the expected order. Because the order of individual work-item level reading/writing is not guaranteed. In this case, you need some other kind of synchronization for that.

I guess, work group level reservation mechanism is a good choice for what you're trying to achieve. The flow may look like this (considering a single work-group):

1) reserve enough space for all the work-items in a group

2) read/write at work-item level using the index based on the local id

3) commit at work group level

E.g.

rid = work_group_reserve_<read/write>_pipe(myPipe, size);

<read/write>_pipe(myPipe, rid, get_local_id(0), &data);

work_group_commit_<read/write>_pipe(myPipe, rid);

In this way, i-th work-item of the consumer kernel will read the data written by i-th work-item the producer kernel. You may refer "PipeProducerConsumerKernels" sample in APP SDK 3.0 beta.

As you mentioned,

I was using work-group functions to read and write to the pipe, but after realizing that my data sizes exceed the work-group limit,

Its difficult to comment without knowing what type of resource problem you're facing. However, to reduce the resource consumption, you may consider lower work group size or may perform multiple iteration of smaller number of work-items to complete the full work-group.

Regards,

billyc59 · ‎04-29-2015

What about multiple work groups operating on the same pipe?
My OpenCL 2.0-compatible card tells me I have a max work group size of 256 work items. Streaming the data to-and-from the host and GPU is not ideal. For my operations, there are many interim calculation results, I would like to have as much data remain on the card as I can, to reduce the host-gpu data transfer bottlenecks.

Does the index parameter on "read/write_pipe" only serve to index the reads/writes in a particular reserved-space? Is it possible to globally index which sections of the Pipe I can write to?

My MO is that I'm trying different methods to bypass the single work-group limit, by using multiple work-groups, that will operate on successive sections of my large data.
If this isn't possible, please let me know.

dipak · ‎04-30-2015

Please find my comments below.

What about multiple work groups operating on the same pipe?

Yes, they can. However, some kind of explicit synchronization/ordering needs to performed to get the data in orderly manner. That's why its easier to work with a single work-group. Any inter work-group synchronization is itself challenging in opencl.

My OpenCL 2.0-compatible card tells me I have a max work group size of 256 work items...I would like to have as much data remain on the card as I can, to reduce the host-gpu data transfer bottlenecks.

As normally resource usage depends on work-group size, that's why, sometimes smaller group size may give better performance specially in terms of occupancy. however, there is no fixed rule to decide the optimal work-group size. Profiling is helpful in this regard.

Don't know your exact usage, however, it's possible that multiple kernels can operate on same global device memory without being transferred to host at all. One can launch a series of kernels which will operate on the same global memory in orderly manner to produce the final result.

Does the index parameter on "read/write_pipe" only serve to index the reads/writes in a particular reserved-space? Is it possible to globally index which sections of the Pipe I can write to?

Yes, the index is only valid for that reserve id. As per the spec, following behaviour is undefined:

"a kernel calls read_pipe or write_pipe with an valid reservation ID but with an index that is not a value from 0 … num_packets – 1 specified to the corresponding call to reserve_pipe."

As of now, there is no global level reservation is supported. Only work-item and work-group levels are supported.

Your last question is not clear to me. Could you please explain little bit more?

Regards,

billyc59 · ‎05-01-2015

The last question wasn't really a question, so much as it was a way for me to communicate my current mindset and goal, to better help you answer my questions.
I current problems have been resolved. Thank you for your help!

Archives Discussions

OpenCL Pipes: multi-dimensional usage