Archives Discussions

Mikey
Journeyman III

Where to keep data?

I have a few quite large arrays of precomputed values (8*256*8B + 10*8B = over 16kB) that I keep in the __constant address space. All values are of type ulong and are used by all work-items (because I actually have no idea how to split the task across more than one work-item :/). Is __constant the best place to put those values?
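The size arithmetic in the question can be checked with a short host-side sketch in plain C. The table dimensions come from the post. The 64 KB figure is the minimum that the OpenCL 1.0 full profile requires for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, so a real device may report a different value. The names `precomputed_bytes` and `fits_in_constant` are made up for this illustration and belong to no API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes taken from the post: 8 tables of 256 entries plus 10 extra values,
   all 64-bit (ulong in OpenCL C, uint64_t on the host). */
enum { TABLE_COUNT = 8, TABLE_ENTRIES = 256, EXTRA_VALUES = 10 };

/* Assumption: minimum CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE for the OpenCL 1.0
   full profile. Query the actual device limit in real code. */
#define MIN_CONSTANT_BUFFER_BYTES ((size_t)64 * 1024)

/* Total bytes occupied by the precomputed values (hypothetical helper). */
size_t precomputed_bytes(void)
{
    return (size_t)(TABLE_COUNT * TABLE_ENTRIES + EXTRA_VALUES) * sizeof(uint64_t);
}

/* Returns 1 if the data fits within the given limit, 0 otherwise
   (hypothetical helper). */
int fits_in_constant(size_t limit_bytes)
{
    assert(limit_bytes > 0);
    return precomputed_bytes() <= limit_bytes;
}
```

With these numbers the data is 16464 bytes, which is within the 64 KB minimum but just over a 16 KB budget. Fitting within the limit only means the allocation is legal; it says nothing about whether the hardware constant cache is actually used.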

 

Next question: AFAIR, on the HD4850 the __local address space is emulated in __global memory, so it won't be any faster?

0 Likes
8 Replies
n0thing
Journeyman III

Right now I don't see any performance improvement from using the __constant qualifier instead of __global, so it doesn't currently seem to use any cache.

Yes, local memory is emulated in global memory on the 4800 series, so using local memory will actually degrade your performance.

0 Likes

Originally posted by: n0thing Right now I don't see any performance improvement from using the __constant qualifier instead of __global, so it doesn't currently seem to use any cache.


Are you sure n0thing?

Let's suppose I have this kernel argument:

"const int value"

This value won't be cached in constant memory at the moment??

0 Likes

Unless the OpenCL compiler is really bad I don't see why it wouldn't use the constant buffer/cache!?

Have you tried both? Did you see any performance improvement (only if it's bound by that)?

0 Likes

I am not sure, but from what I observed I didn't see any performance improvement from using the __constant qualifier.

 

0 Likes

What I see is two reads from global memory and only one use of the constant cache. But I think that as ATI improves the OpenCL compiler, it will begin to use the constant cache, and using __constant will then improve performance. So if you are looking forward, you should use __constant.

__kernel void cons(__global float *a, __global float *b, __constant float *c, const float d)
{
    size_t gid = get_global_id(0);
    b[gid] = a[gid]+c[gid]+d;
}

; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(11) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15)
      0  t: MULLO_INT   ____,  R1.x,  KC0[1].x
      1  z: ADD_INT     ____,  R0.x,  PS0
      2  y: ADD_INT     ____,  PV1.z,  KC0[6].x
      3  x: LSHL        T0.x,  PV2.y,  (0x00000002, 2.802596929e-45f).x
      4  y: ADD_INT     ____,  KC1[0].x,  PV3.x
         w: ADD_INT     ____,  KC1[2].x,  PV3.x
      5  y: ADD_INT     R0.y,  KC1[1].x,  T0.x
         z: LSHR        R0.z,  PV4.w,  (0x00000002, 2.802596929e-45f).x
         w: LSHR        R0.w,  PV4.y,  (0x00000002, 2.802596929e-45f).x
01 TEX: ADDR(48) CNT(2)
      6  VFETCH R0.x___, R0.z, fc156  MEGA(4) FETCH_TYPE(NO_INDEX_OFFSET)
      7  VFETCH R1.x___, R0.w, fc156  MEGA(4) FETCH_TYPE(NO_INDEX_OFFSET)
02 ALU: ADDR(43) CNT(4) KCACHE0(CB1:0-15)
      8  x: LSHR        R1.x,  R0.y,  (0x00000002, 2.802596929e-45f).x
         z: ADD         ____,  R0.x,  R1.x
      9  x: ADD         R0.x,  KC0[3].x,  PV8.z
03 MEM_RAT_CACHELESS_STORE_RAW: RAT(1)[R1].x___, R0, MARK  VPM
END_OF_PROGRAM
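When switching an argument between __constant and __global to compare timings, it helps to confirm that the results stay identical. A minimal host-side reference in plain C, computing the same thing as the `cons` kernel above, might look like the sketch below. The name `cons_reference` and the explicit element count `n` are assumptions for illustration; they do not appear in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side reference for the cons kernel: b[i] = a[i] + c[i] + d.
   Each loop iteration plays the role of one work-item, with i standing in
   for get_global_id(0). Hypothetical helper, not part of any OpenCL API. */
void cons_reference(const float *a, float *b, const float *c, float d, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i) {
        b[i] = a[i] + c[i] + d;
    }
}
```

Comparing the device output against this reference for both variants of the kernel separates correctness from the performance question being discussed.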

0 Likes

Do you get an extra VFETCH instruction when you remove the 'const' from the 4th argument?

0 Likes

No. When I change __constant to __global, nothing changes.

But the non-pointer argument does appear to be in the constant cache.

0 Likes

__constant address space pointers do not use the constant cache, since the size of the data behind the pointer is not known at compile time and the hardware constant cache sizes do not conform to the OpenCL spec. We are working on a way to allow programmers to put the data in the constant caches, but currently the data resides in global memory.
0 Likes