Archives Discussions

Bdot · ‎12-16-2011

Hi,

I'm using APP SDK 2.5 and Catalyst 11.11.

In my code I had a line like this:

    uint4 t, tmp;
    uint u;
    ...
    tmp = t * 4620 + u;

I figured my kernel would run a bit faster if I combined the calculation using mad(). However, I did not manage to get it done, as t is a vector and the other two operands are scalars:

    tmp = mad(t, 4620u, u);

.\barrett.cl(1631): error: no instance of overloaded function "mad" matches
          the argument list
            argument types are: (uint4, uint, uint)
    tmp = mad(t, 4620u, u);

So I tried converting the uints to uint4 (not sure whether, once it works, it would still be faster than the original line):

    tmp = mad(t, convert_uint4(4620u), convert_uint4(u));

.\barrett.cl(1631): error: more than one instance of overloaded function
          "convert_uint4" matches the argument list:
            function "convert_uint4(char4) C++"
            function "convert_uint4(uchar4) C++"
            function "convert_uint4(short4) C++"
            function "convert_uint4(ushort4) C++"
            function "convert_uint4(int4) C++"
            function "convert_uint4(uint4) C++"
            function "convert_uint4(long4) C++"
            function "convert_uint4(ulong4) C++"
            function "convert_uint4(float4) C++"
            function "convert_uint4(double4) C++"
            argument types are: (uint)
    tmp = mad(t, convert_uint4(4620u), convert_uint4(u));
                 ^

.\barrett.cl(1631): error: more than one instance of overloaded function
          "convert_uint4" matches the argument list:
            function "convert_uint4(char4) C++"
            function "convert_uint4(uchar4) C++"
            function "convert_uint4(short4) C++"
            function "convert_uint4(ushort4) C++"
            function "convert_uint4(int4) C++"
            function "convert_uint4(uint4) C++"
            function "convert_uint4(long4) C++"
            function "convert_uint4(ulong4) C++"
            function "convert_uint4(float4) C++"
            function "convert_uint4(double4) C++"
            argument types are: (uint)
    tmp = mad(t, convert_uint4(4620u), convert_uint4(u));
                                       ^

.\barrett.cl(1631): error: no instance of overloaded function "mad" matches
          the argument list
            argument types are: (uint4, <error-type>, <error-type>)
    tmp = mad(t, convert_uint4(4620u), convert_uint4(u));
          ^

Can someone please advise how I can get to the desired mad() without adding anything that would consume additional cycles? Why is a scalar auto-expanded in a multiplication with a vector (and also within mul_hi(), for instance), but not when used in mad()?

Thanks,
Bdot

nou · ‎12-16-2011

this should work

mad(t, (uint4)(4620u), (uint4)(u));

but I am not sure whether the GPU can perform a MAD operation on integers.

Bdot · ‎12-16-2011

You are right on both counts:

.\barrett.cl(1631): error: no instance of overloaded function "mad" matches
          the argument list
            argument types are: (uint4, uint4, uint4)
    tmp = mad(t, (uint4)4620u, (uint4)u);

The simple cast does create the correct type (where one is needed), but mad() is only defined for floating-point types. This also explains why the scalar is auto-expanded in mul_hi(): mul_hi() is defined for integer types.

I had seen the integer function mad_hi() and simply assumed there was also an integer mad(). I wish OpenCL would start closing the gaps in its instruction set (mul24_hi and an integer mad are two examples I'm already missing).

Using mad() for vector and scalar types?