greetings.
is it in the realm of possibility to push and encrypted instruction stream onto a GCN gpu and have a deobsfucator kernel
decode the stream so as to make it a non trivial task to disassemble a kernel?
what would be needed would be to write to gpu code segments/have self modifying code and be in control of caches that
may have to get flushed i think. any insights?
Hi,
I think it's possible. At least there are some building bricks:
- You can jump to any 64bit address in memory.
- Also you can extract the address of uavs from their 128bit resource constants.
- And there is cache control too. (you can invalidate them or just flank them with slc/glc flags)
I believe it's impossible at the current level of OpenCL API and runtime.
1. The app doesn't have access to the kernel's binary code from the kernels, neither can execute the generated code somewhere in memory without excessive runtime hacking.
2. In theory the kernel's code can be placed in memory with read-only page attribute. Thus self-modifying is impossible even if the app could get/hack the binary's address in memory and somehow "invalidate" the instruction cache.
3. I don't think current HW is capable to flush all caches, including instruction cache, from the kernels. So the app would have to give a control to the host, before the generated kernel execution. Thus that leaves a potential whole for hacking/disassembling if necessary.
Probably in the future self-modifying/generated code will be possible, but I don't think it can prevent disassembling if necessary. The only question is how important the code that someone attempts to disassemble to the amount of efforts:-)
Hi,
I just tested it on GCN and it works perfectly! (Not on OpenCL, because that'd be impossible for sure.)
It can access/modify/run code inside its whole 48bit address space. There are no protection features as in an x86. It only provide range checking with useful 2d indexing, but you can modify the memory range you're writing into if you want. (Buffer Resources: SI ISA manual, Table 8.5)
I tested it with a small kernel that does the same as this:
__kernel test(__global int* buf) {
****
buf[get_global_id(0)] = get_global_id(0) + 1234567;
}
But in GCN asm of course, and I placed the self-modifying code into the position marked with ****.
It successfully replaced the 1234567 constant with the value 1000000.
Here's how: http://x.pgy.hu/~worm/het/gcn_self_modify.jpg
GCN is much more than just a mass-paralell ALU
Slicker than vacuum and simple. It never thought to clone a resource buffer for the PC address.
I know the entire address space is visible, which I use for a simple memory viewer that goes from 0 to the
end of the address space.
Great work.
"It never thought to clone a resource buffer for the PC address." I checked the size field in it, and it was $FFFFFF00, so why not hijack that
But now it seems like that's enough to specify Data_format=32bit and it works:
s_getpc_b64 s[24:25] //Address=PC, stride=0
s_mov_b32 s26,-1 //size=4GB
s_mov_b32 s27,$20000 //DataFmt=32bit it also says: swizzle 0000, but it doesn't matter
I guess that's how you do it too.
I saw that the big 'UAV pool' starts at 0x2000. I wonder what interesting thing can be at 0..0x1FFF ?