OK, so I built code using float4s that doesn't use the GPU (no kernel calls), and it works completely. I then tried to port the same code over to the GPU, and I get absolutely nothing in CAL mode (just junk numbers, so the program ends early) and results that are off in CPU mode. I'm hoping someone can lend me a hand. I'm not sure whether the kernel is bad or something in my wrapper function is wrong, so I'm including them both (the entire .br file):