Fuxianjun
Journeyman III

Several basic questions.

I want to program an algebraic calculation that must be divided into two parts, where the latter's input is the former's output.
For example:
 The first part takes a vector (vectorA, with n elements) as input and adds another vector (vectorB, with n elements) to it; the output is vectorC.
The second part uses vectorC as the input parameter of some algebraic function F(X) (X is a vector type). Because F(X) varies, the whole calculation must be split up.

My problems are:
 For the first part, if the kernel is the one attached at the end of this post:

1. Does each work item execute only once?
2. For vectorA (with n elements) plus vectorB (with n elements), must there be n work items to complete the addition?
3. Are the n work items in the same work group?
4. Under what conditions are the work items in different work groups?

5. I want to continue with the second part of this calculation: feed vectorC, which is in global memory, to a kernel that implements F(X). What can I do? Use an event, a command queue, or something else? How?

That's all, please help me!

__kernel void adder(__global const float* a,
                    __global const float* b,
                    __global float* result)
{
    int idx = get_global_id(0);
    result[idx] = a[idx] + b[idx];
}

3 Replies
dravisher
Journeyman III

Hello. I'll answer as best I can; I'm sure someone will correct me if I am wrong.

 

1. Each work item executes the corresponding kernel once, yes.

2. You could have each work item do more than one element, thus not needing n work items, but that doesn't make sense in this case as far as I can see. So the logical thing to do would be to have n work items, each doing one addition [Edit: as your code is doing].

3. Not necessarily. In this case the work group size should be tuned for performance (1 for CPU, probably 64 or more for the Cypress GPU). Since you aren't using any local memory, the work group size is irrelevant except for performance tuning.

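4. Work items end up in different work groups whenever the global size is larger than the work group size. As I said in 3., since the work items aren't sharing any data and don't need to synchronize, whether they are in the same or different work groups does not matter here.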

 

5. Not sure I understand the question. In the host code you'd just pass the buffer that corresponds to the result array to the F(X) kernel.
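To make point 5 a bit more concrete, here is a minimal sketch in plain C that only emulates the two kernel launches on the CPU, so it runs without an OpenCL runtime. The names `adder_item`, `f_item` and `run_two_stages` are made up for illustration, and squaring is just a hypothetical stand-in for F(X), since the real function isn't given in the thread. In actual host code, the same idea would be to set the result buffer as an argument of the second kernel with `clSetKernelArg` and enqueue both kernels with `clEnqueueNDRangeKernel`; on an in-order command queue the second kernel starts only after the first has finished, while an out-of-order queue would need an event to express that dependency.

```c
#include <stddef.h>

/* Emulates one work item of the adder kernel from the question;
   gid plays the role of get_global_id(0). */
static void adder_item(size_t gid, const float *a, const float *b, float *result)
{
    result[gid] = a[gid] + b[gid];
}

/* Hypothetical stand-in for F(X): squares one element in place.
   The real F(X) is not specified in the thread. */
static void f_item(size_t gid, float *x)
{
    x[gid] = x[gid] * x[gid];
}

/* Emulates two kernel launches issued back to back on an in-order queue.
   The buffer c written by the first launch is the input of the second,
   so no copy back to the host is needed in between. */
void run_two_stages(const float *a, const float *b, float *c, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)  /* launch 1: n work items */
        adder_item(gid, a, b, c);

    for (size_t gid = 0; gid < n; ++gid)  /* launch 2: reads and updates c */
        f_item(gid, c);
}
```

Because F(X) varies, only the second stage would need to be swapped out; the first launch and the buffer stay the same.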


Thank you for the reply. I wonder: if there are very many elements in a vector, does a work item still process only one addition?


Originally posted by: Fuxianjun Thank you for the reply. I wonder: if there are very many elements in a vector, does a work item still process only one addition?

 

Sorry, not sure I understand the question. Could you rephrase? In general each work item would do one addition, like the code you attached, so if the vectors had 1 million elements you would create 1 million work items.
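As a rough illustration of how those work items get split into work groups, here is a small plain C sketch that emulates a launch on the CPU. The helper names (`num_groups`, `padded_global_size`, `emulated_add`) are invented for this example. It assumes the element count n is passed in as an extra parameter, which the attached kernel does not do; the `gid < n` guard is there because in OpenCL 1.x an explicitly specified work group size has to divide the global size evenly, so the global size is usually rounded up and the extra work items must do nothing.

```c
#include <stddef.h>

/* Number of work groups needed to cover n elements with groups of local_size. */
size_t num_groups(size_t n, size_t local_size)
{
    return (n + local_size - 1) / local_size;
}

/* Global size rounded up to the next multiple of local_size. */
size_t padded_global_size(size_t n, size_t local_size)
{
    return num_groups(n, local_size) * local_size;
}

/* Emulated launch: the outer loop walks the work groups, the inner loop
   walks the work items inside one group. */
void emulated_add(const float *a, const float *b, float *result,
                  size_t n, size_t local_size)
{
    size_t groups = num_groups(n, local_size);
    for (size_t group = 0; group < groups; ++group) {
        for (size_t local = 0; local < local_size; ++local) {
            /* Same value get_global_id(0) would return in the kernel. */
            size_t gid = group * local_size + local;
            if (gid < n)  /* padding work items past the end do nothing */
                result[gid] = a[gid] + b[gid];
        }
    }
}
```

With 1 million elements and a work group size of 64, `num_groups` gives 15625 groups. The result array is the same whatever work group size is chosen, which is why it only matters for performance here.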
