Hello. I'll answer as best I can; I'm sure someone will correct me if I'm wrong.
1. Yes, each work item executes the kernel once.
2. You could have each work item process more than one element, so you would not need n work items, but as far as I can see that doesn't make sense in this case. The logical approach is to have n work items, each doing one addition [Edit: as your code already does].
3. No. In this case the work group size only affects performance and should be tuned for it (1 for a CPU, probably 64 or more for the Cypress GPU). Since you aren't using any local memory, the work group size is irrelevant apart from that tuning.
4. As I said in 3, since the work items don't share any data or need to synchronize, it doesn't matter whether they are in the same work group or in different ones.
5. I'm not sure I understand the question. In the host code you would simply pass the buffer that holds the result array as an argument to the F(X) kernel (e.g. with clSetKernelArg).
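To make points 1 to 5 a bit more concrete, here is a minimal CPU-side sketch in Python that emulates how a 1-D vector-add launch behaves. It is not OpenCL code and not your kernel: `vec_add_kernel`, `f_kernel` and `enqueue_nd_range` are hypothetical names, the work items run one after another instead of in parallel, and the `gid` argument stands in for what a real kernel gets from `get_global_id(0)`. `f_kernel` is only a placeholder for your F(X), which I don't know.

```python
from typing import Callable, List

def vec_add_kernel(gid: int, a: List[float], b: List[float], c: List[float]) -> None:
    # One work item = one addition (gid plays the role of get_global_id(0)).
    c[gid] = a[gid] + b[gid]

def f_kernel(gid: int, x: List[float], y: List[float]) -> None:
    # Hypothetical stand-in for F(X): it just reads the vector-add result.
    y[gid] = x[gid] * x[gid]

def enqueue_nd_range(kernel: Callable[..., None], global_size: int,
                     local_size: int, *args) -> None:
    # Sequential emulation of a 1-D NDRange launch: global_size work items,
    # split into work groups of local_size items each.
    if global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the group size.
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            kernel(group_id * local_size + local_id, *args)

n = 256
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]

# Same kernel, two different work group sizes: no local memory or barriers
# are used, so the grouping cannot change the result (points 3 and 4).
c_group1 = [0.0] * n
c_group64 = [0.0] * n
enqueue_nd_range(vec_add_kernel, n, 1, a, b, c_group1)
enqueue_nd_range(vec_add_kernel, n, 64, a, b, c_group64)
assert c_group1 == c_group64

# Point 5: the first kernel's result buffer is simply passed as an
# argument to the second kernel.
y = [0.0] * n
enqueue_nd_range(f_kernel, n, 64, c_group64, y)
print(y[2])  # 36.0
```

In a real program the two launches would be two clEnqueueNDRangeKernel calls sharing one cl_mem buffer, and the work group size would only change how fast it runs.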
Thank you for the reply. I'm wondering: if a vector has a very large number of elements, does each work item still process only one addition?
Originally posted by Fuxianjun: "Thank you for the reply. I'm wondering: if a vector has a very large number of elements, does each work item still process only one addition?"
Sorry, I'm not sure I understand the question. Could you rephrase it? In general each work item does one addition, as in the code you attached, so if the vectors had 1 million elements you would launch 1 million work items.
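As a rough illustration of the two options from point 2 (again a sequential Python emulation with hypothetical names, not real OpenCL): the one-element-per-work-item version launches as many work items as there are elements, while a strided variant, where each work item loops over elements `gid`, `gid + global_size`, ..., covers the same vector with fewer work items. The global size of 4096 is an arbitrary assumption, and n is 100,000 rather than 1 million only to keep the emulation quick.

```python
from typing import List

def vec_add_kernel(gid: int, a: List[float], b: List[float], c: List[float]) -> None:
    # One addition per work item: n elements need n work items.
    c[gid] = a[gid] + b[gid]

def vec_add_strided_kernel(gid: int, global_size: int,
                           a: List[float], b: List[float], c: List[float]) -> None:
    # Hypothetical alternative: each work item adds elements gid,
    # gid + global_size, gid + 2 * global_size, ... until the end of the vector.
    for i in range(gid, len(c), global_size):
        c[i] = a[i] + b[i]

def launch(global_size: int, kernel, *args) -> int:
    # Sequential stand-in for a 1-D launch; returns how many work items ran.
    for gid in range(global_size):
        kernel(gid, *args)
    return global_size

n = 100_000
a = [float(i) for i in range(n)]
b = [1.0] * n

c_one = [0.0] * n
items_one = launch(n, vec_add_kernel, a, b, c_one)

G = 4096  # assumed global size for the strided version
c_strided = [0.0] * n
items_strided = launch(G, vec_add_strided_kernel, G, a, b, c_strided)

assert c_one == c_strided
print(items_one, items_strided)  # 100000 4096
```

Both produce the same result; for a plain vector addition on a GPU the one-element-per-work-item version is usually the simpler choice.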