viscocoa
Adept I

Difference in accessing global memory with (int2*) and (int*)?

I just spent a whole day fixing a small problem. My kernel is conceptually like this:

__kernel void myKernel(__global int2* data)
{
    // do something, and get two int values a and b
    data[a].x = something;
    data[a].y = something_else;
}

The above code works well on CPUs. On an HD5870, in very rare and random cases, one of the two writes to global memory does not take effect: the data to be written is lost, and the value in data[] is unchanged.

I then changed the code to:

__kernel void myKernel(__global int* data)
{
    // do something, and get two int values a and b
    data[a<<1] = something;
    data[(a<<1)+1] = something_else;
}

The above code works well on the HD5870!
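For anyone following the index arithmetic, here is a minimal host-side C sketch of how the flat int indices in the second kernel line up with the components of an int2 element. It assumes the int2 buffer is laid out as consecutive x, y pairs; the helper names (flat_index_x, flat_index_y, store_pair_flat) are made up for illustration and are not part of OpenCL.

```c
#include <stddef.h>

/* Flat int index of the .x component of vector element a
   (assumes x,y pairs are stored back to back). */
static size_t flat_index_x(size_t a)
{
    return a << 1;          /* 2 * a */
}

/* Flat int index of the .y component of vector element a. */
static size_t flat_index_y(size_t a)
{
    return (a << 1) + 1;    /* 2 * a + 1 */
}

/* Store a pair through the flat int view, mirroring the two
   scalar writes in the second kernel. Hypothetical helper. */
static void store_pair_flat(int *data, size_t a,
                            int something, int something_else)
{
    data[flat_index_x(a)] = something;
    data[flat_index_y(a)] = something_else;
}
```

Calling store_pair_flat(buf, 2, 7, 9) on a zeroed int buffer writes 7 to buf[4] and 9 to buf[5], the two words that element 2 of an int2 view would occupy.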

-------------------------------------------------------------------------

It looks like writing to the two components of an int2 simultaneously (maybe from two compute units) causes a conflict, and one write overwrites the other. Writes to separate int elements, however, are independent.

I wonder if this is a bug in the OpenCL implementation.

The above logic is used in many situations, such as radix sort. I hope this is helpful for those who have the same problem.

This is a bug in your code and not in OpenCL as you have a race condition in your writes.

The reason it works on the CPU is that each work-item is run sequentially; on the GPU they are run in parallel.

Hi Micah,

Thank you for your answer. Does OpenCL not allow writing to two components of the same vector from two threads?

How about local memory? Can I do this in a kernel?

__local int2 localData[1024];

localData.x = num0;

localData.y = num1;

That is, supposing different threads may write to the same vector?

Vis Cocoa

Yes, it does permit it, but what he is saying is that the whole problem with your first code is that it does not permit safe writes across multiple threads. I am assuming that, because it is a vector type, the compiler loads the entire vector from data (both the x and y components), modifies the .x component (but leaves the .y value unchanged) then write...

I assume the problem is inconsistent across platforms not because of the serial nature of the CPU (it is at least parallel across multiple cores, right? So you could have the same problem), but rather because of how the compiler handles loading and storing of vector types on different devices. On the GPU it does not attempt to address an individual component; it handles writes at vector granularity, and the x, y and z components simply reflect register organisation. On the CPU, data.x is translated to a single 32-bit int load/store.

I say this because how else could you explain a write-only race (with no dependent reads) resulting in the data not being modified by any of the competing threads?
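To make the hypothesis above concrete, here is a small host-side C sketch that replays one possible interleaving by hand. It is only a simulation of the guess in this reply (whole-vector load, modify one component, whole-vector store), not confirmed behaviour of the HD5870 compiler; vec2, work_item and both simulate_* functions are hypothetical names invented for the example.

```c
/* Host-side stand-in for OpenCL's int2. */
typedef struct { int x; int y; } vec2;

/* One simulated work-item holding a private copy of the whole vector. */
typedef struct { vec2 reg; } work_item;

static void load_whole(work_item *w, const vec2 *mem)  { w->reg = *mem; }
static void store_whole(const work_item *w, vec2 *mem) { *mem = w->reg; }

/* Assumed GPU behaviour: each work-item loads the full vector,
   changes one component, then stores the full vector back.
   The interleaving is fixed: both load, both modify, A stores, B stores. */
static vec2 simulate_vector_rmw(vec2 initial, int new_x, int new_y)
{
    vec2 mem = initial;
    work_item a, b;

    load_whole(&a, &mem);
    load_whole(&b, &mem);
    a.reg.x = new_x;        /* A only wants to change .x */
    b.reg.y = new_y;        /* B only wants to change .y */
    store_whole(&a, &mem);
    store_whole(&b, &mem);  /* B's stale .x overwrites A's update */
    return mem;
}

/* Scalar stores: each work-item touches only its own 32-bit word,
   so neither update can clobber the other. */
static vec2 simulate_scalar_stores(vec2 initial, int new_x, int new_y)
{
    vec2 mem = initial;
    mem.x = new_x;
    mem.y = new_y;
    return mem;
}
```

Starting from {0, 0} with new_x = 7 and new_y = 9, simulate_vector_rmw returns {0, 9}: the .x write is lost even though no two work-items ever wrote the same component. simulate_scalar_stores returns {7, 9}, which matches what the int* version of the kernel produced.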

Thank you antzrhere, I agree with you!
