As I understand it, using vectors in host code is implementation defined, but is there some way which is broadly supported (meaning it will work using both AMD and Nvidia SDKs)?
At present I do not even need to access or change the values of the vector type arrays in host code; I just need to be able to copy them and rearrange them. The attached code illustrates the kind of operations I want to be able to do (I haven't even tried doing this yet, as I want to avoid non-standard stuff as much as possible).
So basically, are the kinds of operations done in the attached code snippet allowed and guaranteed to work on all implementations, or is there still something that relies on non-standard behaviour?
My impression is that the only problem with vectors in host code is that there is no standard way to access the elements, so depending on how the vector components are stored (in what order) it's not possible to write truly portable code. Have I got the gist of it right?
Thanks for any help
//Create float3 buffer
cl::Buffer Buffer3(context, CL_MEM_READ_WRITE, sizeof(cl_float3) * length, NULL, &err);

//Create host array to hold the buffer contents
cl_float3 *float3arr = new cl_float3[length];

//Read buffer into host array
queue.enqueueReadBuffer(Buffer3, CL_TRUE, 0, sizeof(cl_float3) * length, float3arr);

//Shuffle some values around (here switching first and second vectors)
cl_float3 temp = float3arr[0];
float3arr[0] = float3arr[1];
float3arr[1] = temp;

//Write array back into buffer. Is everything still guaranteed to be consistent?
queue.enqueueWriteBuffer(Buffer3, CL_TRUE, 0, sizeof(cl_float3) * length, float3arr);
As I see it, the only reasonable implementation hostside is to use native endianness for vector operations.
My reason for this article of faith is that the portability appendix in the standard talks about OpenCL handling endianness issues between host and GPU automatically as long as you work with full vector types. That would imply element swapping on the OpenCL side if the endianness differs.
It would also imply that it should not be swapped if you were using a CPU device because your CPUs presumably share endianness.
So I guess it boils down to this: host order isn't implementation defined, it is simply defined by the endianness of the host. If you use intrinsics and vector operations consistently on the host side there should not be any issues, but if parts of your code make assumptions about endianness, you probably need separate code paths for big and little endian, assuming you care about that kind of portability. You're not likely to find implementation differences within the PC world.
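If you do end up needing separate code paths, a minimal sketch of a runtime check for the host's byte order could look like the following. The function name host_is_little_endian is made up for illustration and is not part of OpenCL. On the device side, the byte order can be queried with clGetDeviceInfo and CL_DEVICE_ENDIAN_LITTLE, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an OpenCL function): reports whether the host
// stores the least significant byte of an integer first.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    // Copy the lowest-addressed byte of the probe value.
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

Code that only copies or moves whole vector elements should not need this check at all; it only matters once you start picking individual bytes or components apart.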
Yeah, my understanding is also that it should be safe as long as no assumptions are made on my part. Also, I just read in the standard about the .s[<index>] way of accessing vector elements. Based on some other posts I've read, I actually thought the standard didn't specify a way of accessing vector elements in host code, but it seems I was wrong.
So it seems the standard mandates what I need and more, so everything is peachy.
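For anyone finding this later, here is a rough sketch of the "copy and rearrange whole elements" idea from the original question. To keep it compilable without an OpenCL SDK it uses a made-up stand-in type, Float3Like, which assumes a 3-component float vector occupies four floats with 16-byte alignment. In real code you would use cl_float3 from the OpenCL headers. The helper names swap_elements and same_bytes are also just illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for cl_float3 so the sketch builds without an SDK.
// Assumption: three components plus one padding lane, 16-byte aligned.
struct alignas(16) Float3Like {
    float s[4];
};

static_assert(sizeof(Float3Like) == 16, "expected four 4-byte floats per element");

// Swap two whole elements. No individual component is read or written, so
// the result does not depend on how components are laid out inside one.
inline void swap_elements(std::vector<Float3Like>& arr, std::size_t i, std::size_t j) {
    std::swap(arr[i], arr[j]);
}

// Byte-wise comparison, useful for checking that a rearrangement left
// every element bit-identical.
inline bool same_bytes(const Float3Like& a, const Float3Like& b) {
    return std::memcmp(&a, &b, sizeof(Float3Like)) == 0;
}
```

The point is that the host only moves elements around as opaque blocks, so whatever gets written back to the buffer contains exactly the bytes that were read from it, just in a different order.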