
Alexium
Adept I

Passing structure array into kernel - problem

I almost lost my mind trying to figure this problem out. I realize the forum is not for fixing every newbie's bug, but I think my problem is really weird, and I've read all the available manuals, including the OpenCL 1.1 spec. If anyone has any idea what the problem is, please advise.

The code is attached, with the problem described in the comments. The GPU is an RV770, but I suppose that's not the problem here.

 

typedef struct
{
    float4 _pos;
    float4 _accel;
    float4 _velocity;
    float4 _force;
    float _invMass;
} PARTICLE;

kernel void integrate(int n, const global PARTICLE* world, global PARTICLE* Out)
{
    uint gid = get_global_id(0);

    // Works fine
    PARTICLE p = world[gid];

    for (uint k = 0; k < n; ++k)
    {
        if (gid != k)
        {
            // Returns zeros in all structure fields
            PARTICLE kP = world[k];
        }
    }

    Out[gid] = p;
}

0 Likes
12 Replies
DTop
Staff

If you comment out the for loop, does it work as expected?

0 Likes

...

0 Likes

kP is set but never used.  Is there something missing here?

Jeff

0 Likes

Originally posted by: jeff_golds kP is set but never used.  Is there something missing here?

Jeff



Yes, it is actually used, I've only shown the statement that doesn't seem to work.

0 Likes

Originally posted by: DTop If you comment out the for loop, does it work as expected?

 



Good question. I've reduced the kernel to the code below. I suppose it should copy the 5th element of the input stream into every output element, but it's not the case. The first two output elements contain some data that seems to be random (like some fields are randomly extracted from another input element, other than 5th, and some other fields are 0). Further elements of the output array are all 0.

Update: it seems like the way elements are put into a buffer by host is not coherent with the way data is fetched by the device. It is as though I have declared my struct differently in the host code and in the kernel, but I didn’t. The declaration is exactly the same. I’m almost sure that if I use 5 distinct input arguments for kernel, each corresponding to one field of the structure, the code will work fine. But I really don’t want to do that, it’s inconvenient…



typedef struct
{
    float4 _pos;
    float4 _accel;
    float4 _velocity;
    float4 _force;
    float _invMass;
} PARTICLE;

kernel void integrate(int n, const global PARTICLE* world, global PARTICLE* Out)
{
    uint gid = get_global_id(0);
    uint k = 5;

    PARTICLE kP = world[k];
    Out[gid] = kP;
}

0 Likes

Try that: split the struct. It will show whether the coherency problem is in your host code rather than in the kernel itself.

0 Likes

Originally posted by: DTop Try that: split the struct. It will show whether the coherency problem is in your host code rather than in the kernel itself.

 

Will do. But I don't think it's a coherency problem at all, because Out[gid] = world[gid] works well...

0 Likes
En-you
Journeyman III

I don't understand, I am a newbie.

0 Likes

Originally posted by: En-you I don't understand, I am a newbie.

 

It's a pity, but where are there no newbies? It is very strange that nobody on the official AMD forum has given a real answer yet. I'm using Brook+ for now because I have a 3870, but I plan to buy something from the 5xxx series soon (probably a 5770), and I really hope the number of bad surprises in OpenCL programming will be as small as possible.

0 Likes

IMHO this is a problem with the struct being aligned differently in OpenCL and in the host program.

The code below prints 80 and 68 bytes. Read more in the OpenCL specification, sections 6.1.5 and C.3.

#include <CL/cl.h>
#include <iostream>

typedef struct
{
    cl_float4 _pos;
    cl_float4 _accel;
    cl_float4 _velocity;
    cl_float4 _force;
    cl_float _invMass;
} PARTICLE_CL;

typedef struct
{
    float _pos[4];
    float _accel[4];
    float _velocity[4];
    float _force[4];
    float _invMass;
} PARTICLE;

int main()
{
    std::cout << sizeof(PARTICLE_CL) << " " << sizeof(PARTICLE) << std::endl;
    return 0;
}

0 Likes

Alignment problem would be my first thought, too.

Are you using cl_float4 or your own float4 type in the host code? Your own float4 type may not have alignment restrictions on it.

You say you've declared it the same on the host, but *exactly* how did you define it on the host? An easy way to check whether this is the case is to add three more floats to the end of the host struct. Make it end with: float invMass, padd, padd1, padd2;

That should ensure it's 16 byte aligned and match the CL alignment, if that is indeed the problem.

0 Likes

nou, LeeHowes, you are right! I changed the float field to float4 in the kernel and to cl_float4 on the host side, and now everything works fine. Thank you very much!

0 Likes