Is there a function that can create or extract assembly code from the OpenCL kernel code? If not, is there any tool or program that I can use instead?
a) Call clBuildProgram() with the "-save-temps=temp_directory" option; it will dump a lot of intermediate files.
b) Use CodeXL shader/kernel analyzer.
Thank you for the answer! I tried the first method and got the disassembly in an .isa file.
I have one more question regarding the .isa file. Is it possible to create an OpenCL program from this intermediate .isa file instead of from source or a binary? My goal is to add an instruction to the assembly code and build the program from the updated code.
Check out this project -> CLRadeonExtender
One of its aims is to be maximally compatible with AMD-generated ISA source code. You still have to give it a header, though I'm not sure; the disassembler in it may give you that type of header as well. Just take a look and find out!
I have an assembler too, but I'd rather write the interface between the driver and the asm by hand, by reverse engineering simple OCL kernels on the given driver.
I think it's too optimistic to say that you only want to tweak OCL-generated ISA code. I usually end up rewriting it from scratch, because unrolled/optimized GCN code is so complicated.
Thank you for the tool. I managed to disassemble the binary and get assembly code out of it. Right, I see that the code is very complicated... I was trying to insert just a single instruction (s_dcache_wb_vol: write back dirty data in the scalar data cache volatile lines) after the barrier function (I found the s_barrier instruction in the code). But it seems that inserting even a single instruction somehow breaks the code, so the kernel won't run properly anymore. Is there any tip or advice on inserting an instruction into already generated assembly code? It would be a big help for me!
Actually, I have never patched/modified driver-generated kernel code, because it takes too much time to understand how it works. I'd rather write kernels from scratch.
Can you say in a few words what the problem is? Maybe I can help.
If you only need to REPLACE a specific instruction, then overwrite it in the binary with the new machine code. But don't INSERT bytes, or you will break the integrity of that file.
I'm so outdated I don't even know what s_dcache_wb_vol is for, lol. I gotta play more on my Fury
Hi. I think I got the instruction insertion working. I was basically trying to add an instruction after the barrier function to flush (write back) the cache. The reason for doing this was to somehow resolve a cache incoherency problem on the APU (the APU only provides memory coherency between CPU and GPU, not cache coherency). After playing with the code, I found that the barrier (work_group_barrier) function generates a single 4B instruction (s_barrier), whereas s_dcache_wb_vol is an 8B instruction. So the solution was simply to write two barrier calls in the code where I wanted the s_dcache_wb_vol instruction to go, extract the assembly code, then replace the two s_barrier instructions with s_dcache_wb_vol.
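For anyone who wants to try the same trick at the binary level, here is a minimal sketch of the "replace, don't insert" idea. It is only an illustration under some assumptions: `replace_barrier_pair` is a hypothetical helper, `code` is assumed to hold just the raw kernel code bytes (little-endian, 4-byte aligned), and 0xBF8A0000 is assumed to be the s_barrier encoding, so check it against your own disassembly. The 8 bytes of the replacement instruction are not hard-coded; take them from your assembler's output.

```python
import struct

# Assumed encoding of s_barrier (4-byte SOPP instruction); verify it against
# the disassembly produced for your own device before relying on it.
S_BARRIER = struct.pack("<I", 0xBF8A0000)


def replace_barrier_pair(code: bytes, new_insn: bytes) -> bytes:
    """Overwrite the first two back-to-back s_barrier words with new_insn.

    The replacement must be exactly 8 bytes, so the code size and every
    offset after the patch stay unchanged.
    """
    pattern = S_BARRIER * 2
    if len(new_insn) != len(pattern):
        raise ValueError("replacement must be exactly 8 bytes")
    # Step in 4-byte units so a match can only start on an instruction word.
    for off in range(0, len(code) - len(pattern) + 1, 4):
        if code[off:off + len(pattern)] == pattern:
            return code[:off] + new_insn + code[off + len(pattern):]
    raise ValueError("no adjacent s_barrier pair found")
```

Usage would be something like `patched = replace_barrier_pair(code, insn_bytes)`, where `insn_bytes` is the 8-byte machine code of the instruction you want in place of the two barriers. Because nothing is inserted, the rest of the binary does not need to be touched.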
Now I have another problem: the s_dcache_wb_vol instruction does not actually do what I intended, which is to flush all cached data in the GPU to physical memory. After reading up on GCN, it seems there are three caches (L1, scalar data, L2). I found that there are instructions for the L1 and scalar data caches, but not for L2. I will have to ask this in some other forum related to GCN. If anyone knows how to flush the L2 cache in GCN (v1.2), it would be great! Thanks.