Hi all, I am trying to port parts of a single-precision algorithm to double precision. I have kernels that work well with float4 streams as input and output, but I get a compile error when I switch them to double4 streams. Is this expected? Thanks
Hi, I'm just a user like you, but I think I can help: short vector types can be at most 128 bits wide. With float4 you get 4 x 32 = 128 bits, and with double2 you get 2 x 64 = 128 bits. A double4 would need 4 x 64 = 256 bits, so double2 is the largest double vector type you can use.
Thanks for the information, Ceq. Indeed, double2 works properly but double4 does not; I feared it would be something like that. I have no idea whether it would be unreasonably difficult for AMD to implement, but if they could abstract away the 128-bit limit, it would be great for scientific programmers who have written and tested code in single precision using float4 and then want to move to double precision as painlessly as possible! Thanks