Hi,

I'm using APP SDK 2.5 and Catalyst 11.11.

In my code I had a line like this:

uint4 t,tmp; uint u; ... tmp = t * 4620 + u; I figured my kernel would run a bit faster if I combined the calculation using mad(). However, I did not manage to get it done as t is a vector and the other two operands are scalars. tmp = mad(t, 4620u, u); .\barrett.cl(1631): error: no instance of overloaded function "mad" matches the argument list argument types are: (uint4, uint, uint) tmp = mad(t, 4620u, u); So I tried converting the uints to uint4 (not sure if, when it works, it would still be faster than the original line): tmp = mad(t, convert_uint4(4620u), convert_uint4(u)); .\barrett.cl(1631): error: more than one instance of overloaded function "convert_uint4" matches the argument list: function "convert_uint4(char4) C++" function "convert_uint4(uchar4) C++" function "convert_uint4(short4) C++" function "convert_uint4(ushort4) C++" function "convert_uint4(int4) C++" function "convert_uint4(uint4) C++" function "convert_uint4(long4) C++" function "convert_uint4(ulong4) C++" function "convert_uint4(float4) C++" function "convert_uint4(double4) C++" argument types are: (uint) tmp = mad(t, convert_uint4(4620u), convert_uint4(u)); ^ .\barrett.cl(1631): error: more than one instance of overloaded function "convert_uint4" matches the argument list: function "convert_uint4(char4) C++" function "convert_uint4(uchar4) C++" function "convert_uint4(short4) C++" function "convert_uint4(ushort4) C++" function "convert_uint4(int4) C++" function "convert_uint4(uint4) C++" function "convert_uint4(long4) C++" function "convert_uint4(ulong4) C++" function "convert_uint4(float4) C++" function "convert_uint4(double4) C++" argument types are: (uint) tmp = mad(t, convert_uint4(4620u), convert_uint4(u)); ^ .\barrett.cl(1631): error: no instance of overloaded function "mad" matches the argument list argument types are: (uint4, <error-type>, <error-type>) tmp = mad(t, convert_uint4(4620u), convert_uint4(u)); ^ Can someone please advice how I can get to the desired mad() without adding anything that would consume additional cycles? Why is a scalar auto-expanded in a multiplication with a vector (and also within mul_hi(), for instance), but not when used in mad()? Thanks, Bdot

this should work

mad(t, (uint4)(4620u), (uint4)(u));

but i am not sure if GPU can perform MAD operation on int.