rick_weber
Adept II

Emitting dmad in OpenCL

Is there a way to get the compiler to emit dmad instructions without calling mad() or fma()? I looked at the fma macro and it does the following:

    mdef(358)_out(1)_in(3)
    mov r0, in0
    mov r1, in1
    mov r2, in2
    dmad r0.xy__, r0.xy, r1.xy, r2.xy
    mov out0, r0
    mend

It seems the 4 mov instructions are extraneous and the in0, in1, in2, and out0 registers could be fed directly into the dmad instruction.
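For anyone wanting to repeat this check on a larger dump, here is a minimal sketch that tallies the mnemonics in the macro quoted above. It is not an AMD tool: the helper name `count_opcodes` is made up for illustration, and it naively assumes one instruction per line with the mnemonic being the leading run of letters.

```python
import re
from collections import Counter

# IL macro text as quoted in the post above, one instruction per line.
FMA_MACRO_IL = """\
mdef(358)_out(1)_in(3)
mov r0, in0
mov r1, in1
mov r2, in2
dmad r0.xy__, r0.xy, r1.xy, r2.xy
mov out0, r0
mend
"""


def count_opcodes(il_text):
    """Count the leading mnemonic of each non-empty line.

    Hypothetical helper: this is a naive text scan, not a real IL parser.
    """
    counts = Counter()
    for line in il_text.splitlines():
        match = re.match(r"\s*([A-Za-z]+)", line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    counts = count_opcodes(FMA_MACRO_IL)
    print("dmad:", counts["dmad"], "mov:", counts["mov"])  # dmad: 1 mov: 4
```

The same function could be pointed at any text dump to see whether dmad shows up at all and how many movs surround it.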

MicahVillmow
Staff

rick.weber, this is something that will show up in the upcoming release. It did not make it into 2.2.

Also, the mov instructions in IL are not important if they do not show up in the ISA.
rick_weber
Adept II

The movs get optimized away in backend compilation then?

MicahVillmow
Staff

Yes, they should be. If that is not the case, then it is a problem that we need to look into fixing.
rick_weber
Adept II

The code I observed was generated by calling clGetProgramInfo and storing the binary to a file. Is there a way to view the device-specific assembly code instead? I think most of the movs are getting optimized away, since I'm getting over 50% of double-precision peak in a DGEMM using fma().

MicahVillmow
Staff

Setting the environment variable GPU_DUMP_DEVICE_KERNEL=2 will dump the ISA for each kernel. Another option is to use Stream Kernel Analyzer to get the ISA.
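As a rough sketch of the environment-variable route, the variable can be set only for the process that runs the OpenCL program, leaving the rest of the shell session untouched. The helper names and the `./dgemm` executable below are hypothetical; only the variable name and the value 2 come from the reply above, and where the dump ends up is not covered here.

```python
import os
import subprocess


def isa_dump_env(base_env=None):
    """Return a copy of the environment with ISA dumping switched on.

    Hypothetical helper; the original mapping is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_DUMP_DEVICE_KERNEL"] = "2"  # value given in the reply above
    return env


def run_with_isa_dump(command):
    """Run `command` (a list of arguments) with the variable set."""
    return subprocess.run(command, env=isa_dump_env(), check=True)


# Example with a hypothetical executable name:
# run_with_isa_dump(["./dgemm"])
```

Passing a modified copy of the environment to the child process means the setting does not leak into other programs started later.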
rick_weber
Adept II

Thanks! So it appears those mov instructions indeed were optimized away in backend compilation.
