I am trying to use Attribute[GroupSize(...)] to get access to shared memory; it seems to turn on "compute shader" mode (in the IL assembly, the header becomes il_cs_2_0 instead of il_ps_2_0).
If address translation is turned off (the -r brcc option), the kernel fails to launch with the message "No appropriate map technique found".
If address translation is turned on, the kernel executes successfully, but afterwards other kernels of the standard type (ps_2_0) start producing incorrect results!
Please help: how should it be used? The only sample in the SDK, "lds", also fails to launch if compiled without address translation.
In compute mode, the runtime always selects the address-translated code.
So a kernel that uses shared memory will always be slow when accessing DRAM arrays?