I have a problem: on the CPU my code gives the right results, but when I run it on the GPU it returns random results.
I pinpointed the problem to this function: the event_vector array does not contain the proper values.
You can find the whole code here: http://orwell.fiit.stuba.sk/git?p=cellula.git;a=blob;f=kernels/asyn.cl;h=95f69b0872a356b035bc6200851cd7da1bdc3713;hb=da36c88bdc666c12c8fe5131a6c59ca8e48eae30
#define NEIGHBOR 8
#define QUEUE_LEN 8
#define COORD(c) ((c).y*get_global_size(0) + (c).x)

typedef float _time;
typedef float _state;

typedef struct
{
    _time x;
    _state y;
} _event;

typedef struct
{
    int start, end;
    int start2, end2;
    _event state;
    _event queue[QUEUE_LEN];
} _cell;

/* gather one queued event per neighbor into the private event_vector */
void create_event_vector(__global _cell *cells, _event *event_vector, int2 *coord, int *index)
{
    for(int i=0;i<NEIGHBOR;i++)
        event_vector[i] = cells[COORD(coord[i])].queue[index[i]];
}
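One way to check what the kernel puts into event_vector is to compute the same gather on the host and compare it with the values read back from the device. The following is only a sketch in plain C, assuming the loop is meant to index event_vector, coord and index by i. The names host_event, host_int2, host_cell, create_event_vector_ref and first_mismatch are hypothetical and are not part of the original code, and the grid width is passed in explicitly instead of coming from get_global_size(0).

```c
#include <stddef.h>

#define NEIGHBOR 8
#define QUEUE_LEN 8

/* Hypothetical host-side mirrors of the kernel's _event and _cell. */
typedef struct { float x; float y; } host_event;   /* x = time, y = state */
typedef struct { int x; int y; } host_int2;        /* stands in for OpenCL int2 */
typedef struct {
    int start, end;
    int start2, end2;
    host_event state;
    host_event queue[QUEUE_LEN];
} host_cell;

/* Same gather as create_event_vector, assuming per-neighbor indexing by i.
   width replaces get_global_size(0), which does not exist on the host. */
static void create_event_vector_ref(const host_cell *cells,
                                    host_event *event_vector,
                                    const host_int2 *coord,
                                    const int *index,
                                    size_t width)
{
    for (int i = 0; i < NEIGHBOR; i++) {
        size_t cell = (size_t)coord[i].y * width + (size_t)coord[i].x;
        event_vector[i] = cells[cell].queue[index[i]];
    }
}

/* Returns the index of the first event that differs, or -1 if all match. */
static int first_mismatch(const host_event *expected, const host_event *actual)
{
    for (int i = 0; i < NEIGHBOR; i++) {
        if (expected[i].x != actual[i].x || expected[i].y != actual[i].y)
            return i;
    }
    return -1;
}
```

If the kernel copies its private event_vector into a debug buffer for one work-item, first_mismatch can compare that buffer with the host reference and report the first neighbor that differs.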
Hi nou!
I just saw your topic but didn't have the time to go through the code. I saw the same behaviour, and it all came down to the following.
My buffer had the flag CL_MEM_USE_HOST_PTR. On the CPU it turned out that the CL kernel just used the memory my host buffer was defined on. When using the GPU, the host buffer and the kernel memory are not the same, and I had to call clEnqueueReadBuffer to get the data. That command needs to be called in blocking mode, or you need to wait for an event before using the results.
I hope that helps, but you have possibly already thought of this. As I stated above, I haven't had the time to look into your code yet.
No, I do not use any CL_MEM_*_HOST_PTR flag. I tried moving that array into __local. Interestingly, it now returns different random results. I think this is a bug in AMD OpenCL.
What is your group and NDRange size? It seems multiple threads are writing to the same place in the event_vector array.
But event_vector is in the private address space, so any overwrite should not be possible.
The global and local work sizes are both 16x16.
How do you allocate event_vector?
In the kernel I have
_event event_vector[NEIGHBOR];
Then I call create_event_vector(). I even tried using local space for this array, but without success.
Thanks for the reply, Micah. As you suggested, I tried replacing my _event with float2 (in the future I need to choose the types in these structures dynamically, like int,float or int,float,int etc.), but after replacing it the whole system freezes. I removed that while(1) loop just to be sure, but it still does not work. On the CPU it is still working normally.
So I am really looking forward to the next release.
reqd_work_group_size() solved the freezing, but the wrong results remain. I will try it on nVidia, and then I will be angry: either at AMD because it has a buggy OpenCL, or at myself because I cannot find the bug in my software.
Today I tried my code on an nVidia card, and it works after some modifications.
nVidia does not like this: __constant int2 neighbors[] = {-1,1, 1,1, -1,-1, 1,-1, 0,1, -1,0, 1,0, 0,-1};
It refuses to compile with some strange build error. So I switched from int2 to int, and then it works.
That didn't work either, so maybe it is an nVidia bug.
Micah, can't you send me the new SDK by email? I am writing my thesis and I have only two weeks left.
OK, I tried my code with SDK 2.1 and it works.
Hi
The results section of 3DMark 11 appears to show my GPU core clock speed incorrectly. Does this mean that 3DMark 11 is running my GPUs at that reported clock speed, or is it just being displayed wrongly? My cards are three XFX 6970s in CrossFire; the stock clock speed is supposed to be 880 MHz, and I have overclocked them to 997 MHz.
Hi faruk,
Can you please provide the system configuration you are running: CPU, GPU, SDK and driver.
Also provide the steps needed to reproduce this issue, including what needs to be installed and in what order.
Thanks