dukeleto
Adept I

is this a bug with clenqueueReadBufferRect?

Hello,

the attached code tests clEnqueueReadBufferRect by extracting

a 2D and a 3D sub-block from a cl_mem array.

It does not behave as I expect it to on AMD hardware, outputting:

2D rectangle selected is as follows

25 26 27

28 29 30

3D rectangle selected is as follows

5 6 7

8 9 10

11 12 13

On NVIDIA hardware, the output is the one I expect.

Could someone please confirm this?

Regards,

Olivier

dukeleto
Adept I

Re: is this a bug with clenqueueReadBufferRect?

I get similar symptoms with both clEnqueueWriteBufferRect() and clEnqueueCopyBufferRect();

if it is a bug rather than me misunderstanding the way the functions are supposed to work, it

probably affects all three xxxRect() functions.

Olivier

nou
Exemplar

Re: is this a bug with clenqueueReadBufferRect?

I compiled it, and the output on the CPU device is as follows, which IMHO is the correct one.

2D rectangle selected is as follows

25 26 27

29 30 31

3D rectangle selected is as follows

5 6 7

13 14 15

21 22 23
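
For readers comparing the two outputs: the sketch below is a host-only model of the addressing rule the OpenCL specification gives for the Rect functions (offset = origin[2] * slice_pitch + origin[1] * row_pitch + origin[0], all in bytes). The function name `rect_read_reference` is made up for illustration and is not part of OpenCL. The attached test code is not reproduced in this thread, so the layouts mentioned afterwards are guesses that happen to reproduce the numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side model of a rectangular buffer read (illustrative helper, not an
 * OpenCL function). origin[0] and region[0] are in bytes; origin[1]/region[1]
 * count rows; origin[2]/region[2] count slices. Pitches are in bytes.
 * The selected block is written to dst tightly packed. */
static void rect_read_reference(const unsigned char *src,
                                const size_t origin[3],
                                const size_t region[3],
                                size_t row_pitch, size_t slice_pitch,
                                unsigned char *dst)
{
    /* Same constraints the specification places on non-zero pitches. */
    assert(row_pitch >= region[0]);
    assert(slice_pitch >= row_pitch * region[1]);

    for (size_t z = 0; z < region[2]; ++z) {
        for (size_t y = 0; y < region[1]; ++y) {
            const unsigned char *row = src
                + (origin[2] + z) * slice_pitch
                + (origin[1] + y) * row_pitch
                + origin[0];
            memcpy(dst, row, region[0]);  /* copy one row of the region */
            dst += region[0];             /* output has no padding */
        }
    }
}
```

Assuming an int array whose elements hold their own index, a row pitch of 4 ints, an origin of (1, 6, 0) and a region of 3 x 2 x 1 yield 25 26 27 / 29 30 31. With a slice pitch of 8 ints, an origin of (1, 1, 0) and a region of 3 x 1 x 3 yield 5 6 7 / 13 14 15 / 21 22 23. The contiguous values in the GPU output would be consistent with the pitch being treated as equal to the region width, though that is only an inference from the numbers.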

dukeleto
Adept I

Re: is this a bug with clenqueueReadBufferRect?

Thanks for testing, Nou. I tested this on Ubuntu 13.10 with g++ 4.8.1 and the latest AMD driver;

might it be a compiler problem? I'll try on Monday with an older gcc; if it doesn't come from

the compiler/driver I am at a loss to understand why with the same code you get the correct

answer and I get a wrong one!

Thanks again

Olivier

nou
Exemplar

Re: is this a bug with clenqueueReadBufferRect?

Try switching to the CPU device first (CL_DEVICE_TYPE_CPU). IIRC someone has already reported problems with the Rect functions.

dukeleto
Adept I

Re: is this a bug with clenqueueReadBufferRect?

Following your suggestion, I tried with CL_DEVICE_TYPE_CPU, and indeed it worked correctly in this case. Can someone

from AMD perhaps confirm that this is a bug for the GPU case?

Thanks again, Nou!

Olivier

dipak
Staff

Re: is this a bug with clenqueueReadBufferRect?

Hi Olivier,

Thanks for reporting the issue.

As per this old thread, "Using clEnqueueReadBufferRect to read a sub-matrix", there was a problem in an old driver, but I suppose it was fixed in later versions. You can follow German's suggestion to check whether it works for you. Meanwhile, I'll try to reproduce the issue at our end and let you know our findings. Please let me know your driver version, SDK version, and hardware spec.

Regards,

dukeleto
Adept I

Re: is this a bug with clenqueueReadBufferRect?

Thanks Dipak,

I had indeed not come across that post while searching.

The suggestion by German does indeed work, but appears slower, and as

this is in a performance-critical loop for me, I guess I'll have to wait for the

driver fix.

My machine is running Ubuntu 13.10 with fglrx 14.10.2 installed, using a 6GB Sapphire 7970.

Thanks,

Olivier

jason
Adept III

Re: is this a bug with clenqueueReadBufferRect?

To recap existing threads having this problem:

Re: clEnqueueReadBufferRect/clEnqueueWriteBufferRect are broken in 14.12 driver

clEnqueueWriteBufferRect does not work when region width is not equal to src pitch: broken again in ...

Using clEnqueueReadBufferRect to read a sub-matrix

is this a bug with clenqueueReadBufferRect?

How is this still a bug? CL_MEM_ALLOC_HOST_PTR also works as a workaround on GPU devices, without the user explicitly allocating memory, but as expected it is not usable in practice: it is at least 10x slower than non-host memory. In other words, it can serve as a drop-in, flag-based test to see whether it resolves the problem.

I get only about 1/8th of an image read back with normal buffers. I think the writes work. I don't know about copies.

I am baffled at how this standard (since OCL 1.1) and simple function(ality) is continuously fixed and broken.

Not having this part of the runtime working, old as it is, is incredibly hard for my simple brain to reconcile. Makes me see green, as I lost 6 hours to this yesterday thinking it was a problem with my code.

Dipak Bhattacharyya Jim Trudeau

Add the following unit test to your stack prior to issuing releases:

Perform transfer with a normal buffer (whose stride is >= what the data requires, i.e. sizeof(T) * cols)

Perform transfer again with CL_MEM_ALLOC_HOST_PTR

Compare buffers.  If different, this is broken again.  This needs to be tracked since it's a frequent point of breakage.
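
The comparison step of the test above can be sketched on the host alone. This is a minimal sketch, assuming both readbacks land in host memory with the same row pitch; the helper name `first_mismatched_row` is hypothetical. The two transfers themselves need an OpenCL runtime and a device, so they are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two host buffers row by row (illustrative helper).
 * Only the first width_bytes of each row are compared, so any padding
 * between width_bytes and row_pitch is ignored.
 * Returns the index of the first differing row, or -1 if all rows match. */
static long first_mismatched_row(const void *a, const void *b,
                                 size_t width_bytes, size_t rows,
                                 size_t row_pitch)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    assert(row_pitch >= width_bytes);

    for (size_t y = 0; y < rows; ++y) {
        if (memcmp(pa + y * row_pitch, pb + y * row_pitch, width_bytes) != 0)
            return (long)y;
    }
    return -1;
}
```

Reporting the first bad row, rather than a plain pass/fail, makes partial readbacks (such as only a fraction of an image arriving) easier to spot.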
