OpenCL

foomanchoo
Adept I

gcn assembler

When is the GCN assembler coming?

OpenCL is driving me nuts again.

It thinks that precalculating 20 offsets into LDS in the outer loop is more important than staying below 128 registers.

And I didn't even enable optimization.

Please, can I at least disable all "optimizations"?

17 Replies
cguenther
Adept II

Re: gcn assembler

As for the precalculation being treated as more important: I think you should run your kernel through the AMD CodeXL profiler and look at the diagrams that show which resource limits the number of active threads.

I second your point that the CL compiler performs various optimizations that cannot be influenced, which leads to unpredictable compiler output. So I would really like to see some mechanism for things like register reuse, for example when variables are encapsulated inside { }.

I don't know why, but encapsulating code in these braces gives very unpredictable usage of the SGPRs and VGPRs. So I can't find a way to control their usage with standard OpenCL without going down to assembler.
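
To make the "which resource limits the number of active threads" point concrete, here is a rough sketch of how VGPR usage alone bounds the number of resident wavefronts. It assumes the commonly quoted GCN figures of 256 VGPRs per lane per SIMD, allocation in steps of 4 registers, and at most 10 wavefronts per SIMD; those numbers and the helper name `waves_per_simd` are assumptions for illustration and do not come from this thread. SGPR and LDS limits are ignored.

```python
def waves_per_simd(vgprs_per_thread, total_vgprs=256, max_waves=10, granularity=4):
    """Estimate resident wavefronts per SIMD from VGPR usage only.

    All default values are assumed GCN figures, not measured data.
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    # Round the request up to the assumed allocation granularity.
    allocated = -(-vgprs_per_thread // granularity) * granularity
    if allocated > total_vgprs:
        return 0  # the kernel would not fit at all
    return min(max_waves, total_vgprs // allocated)


if __name__ == "__main__":
    for n in (24, 64, 128, 129):
        print(n, "VGPRs ->", waves_per_simd(n), "wave(s) per SIMD")
```

Under these assumptions, going from 128 to 129 registers halves the number of resident wavefronts, which is why the 128-register boundary matters so much.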

heman
Adept II

Re: gcn assembler

Hi foomanchoo,

It would be nice if you could attach a small example in which this unsuitable optimization happens. Please also provide details about the driver, APP SDK, CPU, and GPU you are using.

Also, have you tried passing "-cl-opt-disable" in the build options for the kernel? This might help to disable the optimizations.

jacksonfurrier
Adept I

Re: gcn assembler

Yeah, I agree. It would be really nice to have a PTX-like programming feature in OpenCL on AMD cards.
Is there any chance that AMD will do it?

himanshu_gautam
Grandmaster

Re: gcn assembler

Hi,

Can you please give some key points to support your request? I can forward them to the relevant people, but I myself have no idea about PTX.

realhet
Miniboss

Re: gcn assembler

Hi,

This is another thread that is about inline assembly in OpenCL.

I'd also like to have this feature. Then I would only have to write a good inner loop in asm, and everything else could stay in well-maintainable OpenCL.

And the inlined asm wouldn't be AMD_IL, but rather VLIW or GCN asm, because only there do you have full control over register usage and over pretty much everything else (on GCN: s_registers, or calling subroutines in order to stay inside the instruction cache).

But thinking about it, how hard would it be to implement inline asm that passes through OpenCL -> LLVM IR -> AMD_IL and finally reaches the desired low level? All of these levels perform optimizations, and each would have to base its optimization decisions on the register usage of the inline asm sections...

jacksonfurrier
Adept I

Re: gcn assembler

Yes, I'd vote for GCN ASM.
I know it would be really hard to implement, but I think the guys at AMD could solve it.
At my university they did the same with the old IBM Cell CPU: they wrote the "average" parts in C and the "algorithm" parts in Cell ASM, and now they have lots of prime records in their pocket.

realhet
Miniboss

Re: gcn assembler

Of course "the guys at AMD could solve it", but is that effort worth it for AMD? I doubt it...

BTW: I have a weird plan: Not inline asm, but inline high level sections inside the gcn asm.

With my script language I can already 'unroll' and optimize arithmetic operations (constant calculation elimination that handles commutativity). But I still have to make it produce gcn V code, and that is not as straightforward as it is with SSE.
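
One possible reading of "constant calculation elimination that handles commutativity" is constant folding that first regroups the operands of commutative operators, so that constants scattered through a chain end up together. The sketch below illustrates that idea only; the `Op` type and the `fold` function are hypothetical names, not part of the script language mentioned above, and 32-bit wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Op:
    name: str            # "+" or "*", both treated as commutative and associative
    left: "Expr"
    right: "Expr"


Expr = Union[int, str, Op]   # int = constant, str = variable name

IDENTITY = {"+": 0, "*": 1}
APPLY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def _flatten(e, name):
    """Collect the operands of a chain of the same commutative operator."""
    if isinstance(e, Op) and e.name == name:
        return _flatten(e.left, name) + _flatten(e.right, name)
    return [fold(e)]


def fold(e):
    """Fold all constants in each commutative chain into a single constant."""
    if not isinstance(e, Op):
        return e
    const = IDENTITY[e.name]
    rest = []
    for term in _flatten(e, e.name):
        if isinstance(term, int):
            const = APPLY[e.name](const, term)
        else:
            rest.append(term)
    if not rest:
        return const
    if e.name == "*" and const == 0:
        return 0
    result = rest[0]
    for term in rest[1:]:
        result = Op(e.name, result, term)
    if const != IDENTITY[e.name]:
        result = Op(e.name, result, const)
    return result


if __name__ == "__main__":
    # (2 + a) + 3 is regrouped so the two constants meet: a + 5
    print(fold(Op("+", Op("+", 2, "a"), 3)))
```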

I'm afraid of things like MAD with clamp/negate double/quadruple/halve modifiers.

In my last project I used NASM-like macros to improve gcn asm. As soon as my asm code reached 200-300 lines, I realized that I couldn't manage registers manually (I used aliases mapped to physical regs, but there is a high chance of making a mistake and reusing a reg that is already in use). So I made enter/leave blocks with temporary register tracking and allocation inside the block.
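
The enter/leave idea can be sketched roughly as follows: temporaries allocated inside a block are tracked per block and handed back to the pool when the block is left, so a register cannot be reused while it is still live. This is only an illustration of the technique; the class `RegisterPool` and its methods are hypothetical and are not the macros described above.

```python
from contextlib import contextmanager


class RegisterPool:
    """Hands out physical registers and frees them when a block is left."""

    def __init__(self, first, count, prefix="v"):
        self.prefix = prefix
        self.free = list(range(first, first + count))
        self.in_use = set()
        self._scopes = []          # one list of allocated regs per open block

    def alloc(self):
        if not self._scopes:
            raise RuntimeError("alloc() called outside of a block")
        if not self.free:
            raise RuntimeError("out of registers")
        reg = self.free.pop(0)     # lowest free register first
        self.in_use.add(reg)
        self._scopes[-1].append(reg)
        return f"{self.prefix}{reg}"

    @contextmanager
    def block(self):
        self._scopes.append([])    # "enter"
        try:
            yield self
        finally:                   # "leave": release this block's temporaries
            for reg in self._scopes.pop():
                self.in_use.discard(reg)
                self.free.append(reg)
            self.free.sort()


if __name__ == "__main__":
    pool = RegisterPool(first=0, count=4)
    with pool.block():
        a = pool.alloc()
        with pool.block():
            t = pool.alloc()       # temporary, freed when the inner block ends
        b = pool.alloc()           # gets the register t occupied
        print(a, t, b)
```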

From there this high level function thing can be the next step but it's kinda complicated.

function add(a,b:int):int;begin result:=a+b; end;

This translates easily to: v_add_i32 result, vcc, a, b

But what if:

- b is scalar or constant -> the compiler has to exchange the operands (and know whether it is allowed to exchange them)

- both a and b are scalar or constant -> the compiler has to insert a v_mov_b32 to satisfy VOP2's operand requirements.

And there are so many things like this that I don't even dare to think about them.
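
The two cases above can be sketched as a small operand-legalization step. The sketch assumes the VOP2 rule that the second source must be a VGPR while the first may be a VGPR, SGPR, or constant. The function name `legalize_vop2`, the naming convention (registers starting with "v" are VGPRs), and the scratch register `v255` are hypothetical choices for illustration.

```python
def is_vgpr(operand):
    """Assumed naming convention: VGPR names start with 'v' (e.g. 'v3')."""
    return isinstance(operand, str) and operand.startswith("v")


def legalize_vop2(mnemonic, dst, src0, src1, commutative=True, tmp="v255"):
    """Return asm lines for a VOP2 op whose second source must be a VGPR.

    tmp is a hypothetical scratch VGPR reserved for this purpose.
    """
    lines = []
    if not is_vgpr(src1):
        if commutative and is_vgpr(src0):
            # Case 1: only src1 is scalar/constant, so swap the operands.
            src0, src1 = src1, src0
        else:
            # Case 2: both are scalar/constant (or swapping is not allowed),
            # so copy src1 into a VGPR first.
            lines.append(f"v_mov_b32 {tmp}, {src1}")
            src1 = tmp
    lines.append(f"{mnemonic} {dst}, vcc, {src0}, {src1}")
    return lines


if __name__ == "__main__":
    for args in (("v2", "v0", "v1"), ("v2", "v0", "s4"), ("v2", "s3", 16)):
        print("\n".join(legalize_vop2("v_add_i32", *args)))
        print()
```

A real assembler would have more options than this (for example, reversed-operand instruction variants for non-commutative operations), so treat the fallback to v_mov_b32 as the simplest choice rather than the best one.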

foomanchoo
Adept I

Re: gcn assembler

Wait, hetpas produces ELF files, so they should be executable on Linux.

Well, that is all I need.

Is there a chance to get the source code for the assembler so that I can port it to Linux (excluding the Pascal part and the IDE)?

realhet
Miniboss

Re: gcn assembler

A year ago I used it to develop cal.elf kernels on Windows and then executed them on Linux. It will probably work for ocl.elf too.

I don't want to post the whole thing, and the assembler is badly entangled with the script engine. So I'm attaching only the relevant parts of the assembler; I guess you can still dig something useful out of them.

Note that a lot of stuff is missing: for example int64, float64, and images.
