cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

aisesal
Journeyman III

OpenCL textures.

I'm a bit confused about read_imagef behaviour. I've attached few kernels with disassembly. The disassembly for KernelLinear and KernelNearest is different. It seems that in case of KernelNearest coordinates are multiplied by texture dimmensions, rounded, and then multiplied by inverse of texture dimmensions. It seems like emulation of nearest filtering. But that seems unecessary, because nearest filtering can be done in hardware. Maybe it has to do something with some sort of precission requirements for OpenCL? It gets worse then using integer coordinates. First they get converted to floating point (because SAMPLE instruction takes floating point coordinates), then rounded (why? they already are "round").

My biggest problem is that I want to use 2D texture as some sort of random access array (because of texture cache), and I'd like to access it by integer coordinates, without filtering, but now I get lots of coordinate calculations due to read_imagef behaviour. It seems that OpenCL always use "SAMPLE" instruction, though sometimes it could use "LD" instruction (in case of integer non-normalized coordinates & nearest filtering) and skip all the coordinate conversion.

I could write my kernels in CAL IL, but CAL is deprecated. Besides CAL only supports 1D & 2D textures, but I also need 3D textures.

 

__kernel void KernelLinear( __global float4 *output, __global const float2 *coord, __read_only image2d_t input) { const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_NONE | CLK_FILTER_LINEAR; uint globalId = get_global_id(0); output[globalId] = read_imagef(input, sampler, coord[globalId]); } ; -------- Disassembly -------------------- 00 ALU: ADDR(32) CNT(11) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15) 0 t: MULLO_INT ____, R1.x, KC0[1].x 1 y: ADD_INT ____, R0.x, PS0 2 w: ADD_INT ____, PV1.y, KC0[6].x 3 x: LSHL ____, PV2.w, 4 z: LSHL ____, PV2.w, 3 4 x: ADD_INT ____, KC1[1].x, PV3.z y: ADD_INT ____, KC1[0].x, PV3.x 5 x: LSHR R1.x, PV4.y, 2 w: LSHR R0.w, PV4.x, 3 01 TEX: ADDR(48) CNT(1) 6 VFETCH R0.xy__, R0.w, fc174 FORMAT(32_32_FLOAT) MEGA(8) FETCH_TYPE(NO_INDEX_OFFSET) 02 TEX: ADDR(50) CNT(1) 7 SAMPLE R0, R0.xy0x, t0, s0 03 MEM_RAT_CACHELESS_STORE_RAW: RAT(11)[R1], R0, ARRAY_SIZE(4) MARK VPM END_OF_PROGRAM __kernel void KernelNearest( __global float4 *output, __global const float2 *coord, __read_only image2d_t input) { const sampler_t sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; uint globalId = get_global_id(0); output[globalId] = read_imagef(input, sampler, coord[globalId]); } ; -------- Disassembly -------------------- 00 ALU: ADDR(32) CNT(13) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15) 0 t: MULLO_INT ____, R1.x, KC0[1].x 1 w: ADD_INT ____, R0.x, PS0 t: I_TO_F R0.x, KC1[2].x 2 z: ADD_INT ____, PV1.w, KC0[6].x t: I_TO_F R0.y, KC1[2].y 3 y: LSHL ____, PV2.z, 3 z: LSHL ____, PV2.z, 4 4 y: ADD_INT ____, KC1[0].x, PV3.z w: ADD_INT ____, KC1[1].x, PV3.y 5 x: LSHR R2.x, PV4.y, 2 z: LSHR R0.z, PV4.w, 3 01 TEX: ADDR(64) CNT(1) 6 VFETCH R1.xy__, R0.z, fc174 FORMAT(32_32_FLOAT) MEGA(8) FETCH_TYPE(NO_INDEX_OFFSET) 02 ALU: ADDR(45) CNT(6) KCACHE0(CB1:0-15) 7 z: MUL_e ____, R1.y, R0.y w: MUL_e ____, R1.x, R0.x 8 x: FLOOR ____, PV7.w w: FLOOR ____, PV7.z 9 x: MUL_e R0.x, KC0[3].x, PV8.x y: MUL_e R0.y, KC0[3].y, PV8.w 03 TEX: ADDR(66) CNT(1) 10 SAMPLE R0, R0.xy0x, t0, s0 04 MEM_RAT_CACHELESS_STORE_RAW: RAT(11)[R2], R0, ARRAY_SIZE(4) MARK VPM END_OF_PROGRAM __kernel void KernelIntegerCoords( __global float4 *output, __global const int2 *coord, __read_only image2d_t input) { const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST; uint globalId = get_global_id(0); output[globalId] = read_imagef(input, sampler, coord[globalId]); } ; -------- Disassembly -------------------- 00 ALU: ADDR(32) CNT(11) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15) 0 t: MULLO_INT ____, R1.x, KC0[1].x 1 w: ADD_INT ____, R0.x, PS0 2 z: ADD_INT ____, PV1.w, KC0[6].x 3 y: LSHL ____, PV2.z, 3 z: LSHL ____, PV2.z, 4 4 y: ADD_INT ____, KC1[0].x, PV3.z w: ADD_INT ____, KC1[1].x, PV3.y 5 x: LSHR R1.x, PV4.y, 2 z: LSHR R0.z, PV4.w, 3 01 TEX: ADDR(48) CNT(1) 6 VFETCH R0.xy__, R0.z, fc174 FORMAT(32_32_FLOAT) MEGA(8) FETCH_TYPE(NO_INDEX_OFFSET) 02 ALU: ADDR(43) CNT(4) 7 t: I_TO_F ____, R0.x 8 x: FLOOR R0.x, PS7 t: I_TO_F ____, R0.y 9 y: FLOOR R0.y, PS8 03 TEX: ADDR(50) CNT(1) 10 SAMPLE R0, R0.xy0x, t0, s0 UNNORM(XYZW) 04 MEM_RAT_CACHELESS_STORE_RAW: RAT(11)[R1], R0, ARRAY_SIZE(4) MARK VPM END_OF_PROGRAM

0 Likes
1 Reply

This is because the hardware does not fully comply with the OpenCL requirements, so we have to do extra calculations to guarantee correctness.
Also, OpenCL is built on top of CAL, so if OpenCL supports it, CAL does to(though it might not be documented).
0 Likes