Failed optimization ?

Discussion created by hazeman on Apr 11, 2011
Latest reply on May 1, 2011 by omion
restrict keyword ( or -fno-alias ) has no impact on ISA

Restrict keyword is supposed to enable cached reads from global memory.

But lets look at the attached kernel. Whether we use rescrict keyword or not there is no difference in produced ISA code. At the moment the only difference is that IL code has "_cached" added to uavs but it has no real impact on kernel execution.

Why kernel with restrict keyword does have to wait for write completion ( WAIT_ACK: Outstanding acks<=0 ). Is it failed optimization ? Can we expect this fixed someday or is it required by hardware ( = failed hardware design ) ?

PS. Compiled with driver  11.3, 11.4-preview. The problem is visible in every kernel I've tested.

PS2. -fno-alias gives the same results.

__kernel void test( __global float4 const* restrict in , __global float4* out ) { unsigned int gid = get_global_id(0); out[2*gid] = in[gid]; out[gid] = in[2*gid]; } ;-------- Disassembly -------------------- 00 ALU: ADDR(32) CNT(17) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15) 0 x: MULLO_INT ____, R1.x, KC0[1].x y: MULLO_INT ____, R1.x, KC0[1].x z: MULLO_INT ____, R1.x, KC0[1].x w: MULLO_INT R0.w, R1.x, KC0[1].x 1 z: ADD_INT R0.z, R0.x, PV0.w 2 y: ADD_INT R0.y, PV1.z, KC0[6].x 3 x: LSHL R0.x, PV2.y, (0x00000005, 7.006492322e-45f).x z: LSHL R0.z, PV2.y, (0x00000004, 5.605193857e-45f).y 4 x: ADD_INT R0.x, KC1[0].x, PV3.x y: ADD_INT R0.y, KC1[0].x, PV3.z z: ADD_INT R0.z, KC1[1].x, PV3.z w: ADD_INT R0.w, KC1[1].x, PV3.x 5 x: LSHR R0.x, PV4.w, (0x00000002, 2.802596929e-45f).x y: LSHR R0.y, PV4.x, (0x00000004, 5.605193857e-45f).y w: LSHR R0.w, PV4.y, (0x00000004, 5.605193857e-45f).y 01 TEX: ADDR(64) CNT(1) 6 VFETCH R1, R0.w, fc175 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET) 02 MEM_RAT_CACHELESS_STORE_DWORD__NI_ACK: RAT(11)[R0], R1, ARRAY_SIZE(4) MARK VPM 03 WAIT_ACK: Outstanding_acks <= 0 04 TEX: ADDR(66) CNT(1) 7 VFETCH R1, R0.y, fc175 FORMAT(32_32_32_32_FLOAT) FETCH_TYPE(NO_INDEX_OFFSET) 05 ALU: ADDR(49) CNT(2) 8 x: LSHR R0.x, R0.z, (0x00000002, 2.802596929e-45f).x 06 MEM_RAT_CACHELESS_STORE_DWORD__NI: RAT(11)[R0], R1, ARRAY_SIZE(4) MARK VPM 07 END END_OF_PROGRAM