NURBS
Journeyman III

Kernel with very low ALUBusy

Can you recommend anything for a kernel that writes to global memory a lot?

Hi, I have a kernel with a very low ALUBusy of 2%. It reads one value from global memory and writes 8 values (with trivial changes) back to global memory. Any recommendations to make it run faster?

NURBS 

2 Replies
thesmileman
Journeyman III

Your ALUBusy is low because the kernel spends nearly all of its time accessing memory. Try reading multiple inputs per work-item to reduce the number of memory accesses. You say the changes are trivial, so this may not be worth doing on the GPU at all; trivial changes to data usually gain nothing there. If you do keep it on the GPU, do more work once the data reaches the ALU (if there is anything else you want to add) and read multiple inputs at a time. Also, what hardware are you running it on? What is your input data shape: 1D, 2D, or 3D? What data types? Why don't you just post exactly what you are doing?
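To put a rough number on why the ALU sits idle, here is a small back-of-the-envelope sketch. It assumes 32-bit values and one trivial ALU operation per output; neither is stated in the original question, and `ops_per_byte` is just an illustrative helper, not part of any profiler or SDK.

```python
ELEM_BYTES = 4  # assumption: 32-bit values (the question does not say)


def ops_per_byte(reads, writes, alu_ops, elem_bytes=ELEM_BYTES):
    """ALU operations performed per byte of global-memory traffic."""
    return alu_ops / ((reads + writes) * elem_bytes)


# One work-item as described: 1 read, 8 writes, assumed 8 trivial ops.
baseline = ops_per_byte(reads=1, writes=8, alu_ops=8)

# Handling 4 inputs per work-item moves 4x the data and does 4x the ops:
# the ratio is unchanged, only the number of separate accesses can drop.
batched = ops_per_byte(reads=4, writes=32, alu_ops=32)

# Doing 10x more arithmetic on the same data is what raises the ratio.
heavier = ops_per_byte(reads=1, writes=8, alu_ops=80)

print(f"{baseline:.3f} {batched:.3f} {heavier:.3f}")
```

Under these assumptions the kernel does well under one operation per byte moved, which fits a kernel that waits on memory. Batching inputs helps only by cutting the number of separate accesses, while adding real work per element is what changes the balance.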

 

Jim

himanshu_gautam
Grandmaster

Hi NURBS,

A few general suggestions I can give:

1. Do not do it on the GPU unless the data is already on the GPU because of some previous processing, or is required there afterwards for more processing.

2. If you are still interested, try to use the caches. You have L1 and L2 caches. If the access pattern is tile-based, try using images to benefit from the 2-D L2 cache. Otherwise, stick to sequential access patterns.

3. Make sure that consecutive work-items, especially those inside the same wavefront, access consecutive memory locations. Also try to write the code so that a wavefront uses only a single channel for global access, so that many wavefronts can run together.
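As a rough illustration of point 3, the sketch below models which addresses one wavefront touches for each of the 8 stores under two output layouts. Everything here is an assumption for illustration: the wavefront size of 64, the 32-bit element size, the 64-byte segment granularity, and the two layout names are not taken from the original post, and real hardware may combine writes in ways this model ignores.

```python
WAVEFRONT = 64      # assumed work-items per wavefront
ELEM_BYTES = 4      # assumed 32-bit values
SEGMENT_BYTES = 64  # hypothetical memory-transaction granularity
OUTPUTS = 8         # values written per work-item, as in the question


def segments_touched(byte_addresses):
    """Count distinct SEGMENT_BYTES-sized segments hit by one store."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


def interleaved_store(k):
    """Work-item gid writes out[gid * OUTPUTS + k]: stride of 8 elements."""
    return [(gid * OUTPUTS + k) * ELEM_BYTES for gid in range(WAVEFRONT)]


def planar_store(k, n):
    """Work-item gid writes out[k * n + gid]: consecutive elements."""
    return [(k * n + gid) * ELEM_BYTES for gid in range(WAVEFRONT)]


n = 1024  # total number of work-items (arbitrary for this sketch)
interleaved_total = sum(
    segments_touched(interleaved_store(k)) for k in range(OUTPUTS))
planar_total = sum(
    segments_touched(planar_store(k, n)) for k in range(OUTPUTS))

print(interleaved_total, planar_total)  # prints "256 32"
```

In this model the planar layout, where neighbouring work-items write neighbouring addresses, touches 4 segments per store, against 32 for the strided layout. Whether the gap is that large on a given GPU depends on its caches and write-combining, so treat the numbers as a way to reason about the access pattern, not as a performance prediction.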

I hope these tips help. But the final decision is in your hands based on your problem.
