
Raistmer
Adept II

High "write unit stalled" counter values. Any tricks to lower?

A few of my kernels show a quite high "write unit stalled" percentage (~80%).

Is there a way to rearrange the writes to bring this counter down? (The percentage is measured in the CodeXL profiler.)

Currently the kernel writes 4 floats at once per work-item, with adjacent work-items writing to adjacent locations.

Kernel's ISA looks like:

143 MEM_RAT_CACHELESS_STORE_RAW: RAT(11)[R1], R0, ARRAY_SIZE(4)  MARK  VPM
144 ALU: ADDR(1728) CNT(2)
    466  x: MULADD_e    R2.x,  R5.x,  R3.w, -R1.y
         y: MULADD_e    R2.y,  R5.y,  R3.w,  R1.w
145 MEM_RAT_CACHELESS_STORE_RAW: RAT(11)[R3], R2, ARRAY_SIZE(4)  MARK  VPM

And CL looks like:

__global float4* gpu_dechirped
...
uint tid = get_global_id(0);
uint dchunk = get_global_id(1);
....
gpu_dechirped[32*(FFT_SIZE/2)*dchunk + (2*i+1)*(FFT_SIZE/2) + tid] = cur_dechirp;
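
To sanity-check the access pattern of that store, the index expression can be evaluated on the host. The following is only a rough C sketch: FFT_SIZE is set to 1024 purely for illustration (the post does not give its value), and the helper names store_index and stores_are_contiguous are hypothetical, not part of the kernel.

```c
#include <stddef.h>

/* Assumed value, for illustration only; the real FFT_SIZE is not stated. */
#define FFT_SIZE 1024u

/* Index, in float4 elements, written by one work-item.
 * Mirrors the kernel's store expression term by term. */
static size_t store_index(unsigned dchunk, unsigned i, unsigned tid)
{
    return (size_t)32 * (FFT_SIZE / 2) * dchunk
         + (size_t)(2 * i + 1) * (FFT_SIZE / 2)
         + tid;
}

/* Returns 1 if, for fixed dchunk and i, work-items tid and tid+1
 * write neighbouring float4 slots for every tid in [0, n-1). */
static int stores_are_contiguous(unsigned dchunk, unsigned i, unsigned n)
{
    for (unsigned tid = 0; tid + 1 < n; ++tid) {
        size_t here = store_index(dchunk, i, tid);
        size_t next = store_index(dchunk, i, tid + 1);
        if (next - here != 1)
            return 0;
    }
    return 1;
}
```

With these assumptions, consecutive tid values map to consecutive float4 slots (16 bytes apart), so the index arithmetic by itself does not look like an obvious source of scattered writes.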
ravikeshri
Journeyman III

Are you using a discrete GPU (dGPU) or an integrated GPU? How did you create the global buffers?

On a dGPU, using CL_MEM_USE_HOST_PTR for the output buffer will give you worse performance. I would recommend creating a device output buffer and doing a single clEnqueueReadBuffer call after the kernel execution.
