cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

realhet
Miniboss

Using AMD_IL code under OpenCL runtime environment.

Hi,

In order to make my stuff up to date, I'm leaving CAL behind and trying to use OpenCL for buffer/kernel management.

My goal is to execute an amd_il source code under the OpenCL api, so I can still code in low level and also access all the older cards with a WORKING api.

In order to reach the goal, I did the following:

1. compile AMD_IL source with calcl, make a cal_elf image. (Thats the only part I somewhat rely on good old cal)

2. compile a skeleton opencl source to an ocl elf image. (skeleton code 'defines' the buffers (uav,cb))

3. replace the cal_elf section inside the ocl_elf with the generated cal_elf image (step 1).

At the moment, the skeleton code is as simple as this:

__kernel void main(__global int* uav,__constant int* cb) { uav[get_global_id(0)]+=cb[0]; }

//one uav, one cb, and a global id, that's all.

//this code is modifyes(!) the contents of the uav

It consists of 3 basic operations: cb.read, uav.read, uav.write

I took out the .il source of this small opencl program, and recompiled with calcl. Inserted to the skeleton ocl_elf, and gave it to OpenCL API, the results was like:

All operations worked well except the uav.read which is always returned zero.

I investigated further and found a difference between the IL->ISA compilers of OpenCL and CAL. (I can only use explicitly CAL's compiler)

This difference was the VFETCH instruction's BUFFER_ID value:

calCl: VFETCH R0.x___, R1.x, fc0

OpenCL's internal compiler: VFETCH R0.x___, R1.x, fc153

(note that it was the result of the same .il program)

I don't want to do an automatic patcher to this, because it would be not a smooth solution, but I wonder if anyone know a better way to this problem: to compile AMD_IL code and execute it with the latest OpenCL.

I think both CAL and OCL uses the same compiler, but maybe OCL is passing it a special parameter and thus it generates slightly different code.

I hope there is a new enum for that replaces CAL_LANGUAGE_IL. ( calclCompile(obj, CAL_LANGUAGE_IL, PAnsiChar(AIL), ATarget) )

(Attaching dumped elf images, sources for the two methods (the thing in !ocl_ocl\ works and the one in !ocl_il fails). You can see, both .il files are the same, but the .isa files has one byte difference)

Anyone has some experience with this?

Or is there a way to access the AMD_IL compiler from within the OpenCL runtime itself (that would be a jackpot ).

If AMD guys read it: It would be awesome if you could officially enable amd_il and binary_isa as an option to create OpenCL kernels out of.

Thanks for replies in advance!

0 Likes
1 Solution

If I generate a skeleton elf, it wont put the .amdil section into it on th HD6970.

On HD4850 it generates .amdil, but not on 6xxx, so the only way I found was to recompile the .text section with calCl.

I thought there was a reason. I just checked and it works fine on Cayman HD6970 for me.

I also tried an older and a most recent compiler (12/11beta11) and both produce amdil and run ok.

Or is it another bug I have?

If I don't use the -fno-amdil flag on HD6970 it doesn't include .amdil.

And If I use the  -fno-amdil flag on HD4850 it will include .amdil.

This is sick, rofl

Hum, I wonder. You didn't say you did use the -f-amdil flag. Is it possible the default behaviour

for including amdil has changed with newer compiler versions? Did you try -f-amdil?

........... (classic maintenance voodoo magic problem) As now I dont have access

Yes, classic

View solution in original post

0 Likes
7 Replies
drallan
Challenger

I have written IL code  in a similar way for targeting Barts, Caymen, and Tahiti.

One difference is that I'm not using the cal compiler. First I make a skeleton openCL kernel to define my IO needs and then substitute my IL code but with a slight difference, I only substitute the 'user portion' and leave the setup code from the opencl compiler in place.  By user code, I mean the block of code that is indented with tab characters, in fact, I use the tab for making the the substitution.

This has always worked for me. I believe the only significant difference is how the cal compiler and the opencl compiler handle uav buffers. I'm not sure this is the kind of thing you are looking for, perhaps you need to use the cal compiler for other things.

0 Likes

If I generate a skeleton elf, it wont put the .amdil section into it on th HD6970.

On HD4850 it generates .amdil, but not on 6xxx, so the only way I found was to recompile the .text section with calCl.

Or is it another bug I have?

If I don't use the -fno-amdil flag on HD6970 it doesn't include .amdil.

And If I use the  -fno-amdil flag on HD4850 it will include .amdil.

This is sick, rofl

>I'm not sure this is the kind of thing you are looking for, perhaps you need to use the cal compiler for other things.

My problem is this: My client who is not in my country said my program (cal+amd_il) is not working ok with multigpu. (classic maintenance voodoo magic problem) As now I dont have access to multigpu, I decided to advance further and rewrite it to use the latest OpenCL environment, so I don't have to say him to use x.y version of old catalysts and stuff, I simply want it to work.

(Just got a report that it crashes on Tahiti. :S Well it works on Cayman and now finally on tv770... At least I learn all the different AMD_IL dialects in the process )

0 Likes

If I generate a skeleton elf, it wont put the .amdil section into it on th HD6970.

On HD4850 it generates .amdil, but not on 6xxx, so the only way I found was to recompile the .text section with calCl.

I thought there was a reason. I just checked and it works fine on Cayman HD6970 for me.

I also tried an older and a most recent compiler (12/11beta11) and both produce amdil and run ok.

Or is it another bug I have?

If I don't use the -fno-amdil flag on HD6970 it doesn't include .amdil.

And If I use the  -fno-amdil flag on HD4850 it will include .amdil.

This is sick, rofl

Hum, I wonder. You didn't say you did use the -f-amdil flag. Is it possible the default behaviour

for including amdil has changed with newer compiler versions? Did you try -f-amdil?

........... (classic maintenance voodoo magic problem) As now I dont have access

Yes, classic

0 Likes

Thank you! That different default behaviour tricked me.

With explicitly specifying all 4 -f[no-]bin-params (source,llvmir,amdil,exe) it worked like a charm on HD4850 and HD6970.

I'll do HD7970 tests soon, I hope that amd_il variant is not so different.

0 Likes

On the HD7xxx the IL is slightly different, mainly for global memory accesses.

0 Likes

I've tested IO with a small il kernel. I've tried to make the most 'intermediate' version out of it.

il_cs_2_0

dcl_num_thread_per_group 256,1,1

dcl_raw_uav_id(0) //uav translation if id=0 : 5xxx, 6xxx: -> 11, 7xxx -> 10

dcl_cb cb0[15] ; Constant buffer that holds ABI data, must declare on 4xxx

dcl_cb cb1[2]  //cb1[0].x : uav offset

dcl_cb cb2[20] //actual cb

  ishl r65.x, vAbsTid.x, 2

  iadd r65.x, cb1[0].x, r65.x  //add uav offset

  uav_raw_load_id(0)_cached r0.x, r65.x //load, 4xxx needs _cached hint

  iadd r66.x,cb2[0].x,r0.x

  uav_raw_store_id(0) mem.x___, r65.x, r66.x  //mask,addr,data

end

And the Corresponding OpenCL code is this:

__kernel __attribute__((reqd_work_group_size(256,1,1)))

void main(__global int* uav,__constant int* cb)

{  uav[get_global_id(0)]+=cb[0]; }

And the only difference amongs the  tested platforms is the UAV ID:

4850: 0

5770, 6790: 11

7970: 10

And yes, 7970 declares a fully specified dcl_typeless_uav with _stride, _access and stuff. But for a simple UAV dcl_raw_uav works well.

Another unique thing was that uav_reads have to be _cached on the 4850 (maybe it doesn't use caching by default). And if I forgot to declare the kernel parameter block (Constant buffer that holds ABI data) on 4850, then the kernel fails.

0 Likes

I'd suggest always having the CB for the ABI data declared. I think you're only getting away with it in your kernel as you never reference cb0 (unless I've missed it ). Also note that on SI (7xxx) the UAV can change (10, 11, 12, etc.) and different global memory paramters can live in different UAVs. You'll see this if you compile some more complex kerenels. Look in the .rodata section to see the UAV numbers for the global buffers.

0 Likes