Archives Discussions

Raistmer
Adept II

Zeroing a few-byte flag: which way is better?

With a zeroing kernel or with a buffer write?

I need to initialize a float or float4 flag before a kernel call.
Which is better: a memset kernel with a single work-item, or writing a float or float4 zero value from the host to GPU memory?
Which option has the lower call overhead?

0 Likes
10 Replies
genaganna
Journeyman III

Originally posted by: Raistmer I need to initialize a float or float4 flag before a kernel call. Which is better: a memset kernel with a single work-item, or writing a float or float4 zero value from the host to GPU memory? Which option has the lower call overhead?


Just add one more parameter to the actual kernel, set that kernel argument to the required value, and call the kernel.

0 Likes
Raistmer
Adept II

But then how can I ensure that work-items launched later don't overwrite a flag that has already been set?
For example:
__kernel void abc(__global int* flag, int zero){
    flag[0] = zero;    // every work-item resets the flag first
    if ( ....... ){
        flag[0] = 1;   // set only when the condition holds
    }
}

If the condition is not true in the last wavefront, I will always receive a zero flag, won't I?
0 Likes

Originally posted by: Raistmer But then how can I ensure that work-items launched later don't overwrite a flag that has already been set? For example: __kernel void abc(__global int* flag, int zero){ flag[0]=zero; if ( ....... ){ flag[0]=1; } } If the condition is not true in the last wavefront, I will always receive a zero flag, won't I?


Make it constant so that no one is able to overwrite it.

0 Likes
Raistmer
Adept II

LoL, if I make it constant, then how can the flag be set when the required condition is true?
It looks like I am missing your idea completely. What I need:
If some event happens during the kernel run (in my case, a signal is found), the flag is set to 1, no matter which work-item found it and no matter whether many work-items found it or just one. But if none found it, the flag should remain zero. Then, after the kernel launch, the CPU code can download only the flag from GPU memory and decide whether it needs to download the whole data array too.
But that flag has to be initialized. If I initialize it inside the kernel, no synchronization between work-groups is possible, AFAIK. That is, some work-item in an earlier work-group can find a signal, and then a later work-group re-initializes the flag, which kills the whole idea....
0 Likes

Originally posted by: Raistmer LoL, if I make it constant, then how can the flag be set when the required condition is true? It looks like I am missing your idea completely. What I need: If some event happens during the kernel run (in my case, a signal is found), the flag is set to 1, no matter which work-item found it and no matter whether many work-items found it or just one. But if none found it, the flag should remain zero. Then, after the kernel launch, the CPU code can download only the flag from GPU memory and decide whether it needs to download the whole data array too. But that flag has to be initialized. If I initialize it inside the kernel, no synchronization between work-groups is possible, AFAIK. That is, some work-item in an earlier work-group can find a signal, and then a later work-group re-initializes the flag, which kills the whole idea....


You can solve this by using atomics. If you find a signal, increment the value of the flag by 1.
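
A minimal sketch of the atomic approach suggested here, in OpenCL C. The kernel name find_signal, the data and threshold arguments, and the comparison used as the detection test are all hypothetical stand-ins, since the thread does not show the real kernel. It assumes OpenCL 1.1 or later (where atomic_inc on a __global int is a built-in), an int flag rather than the float mentioned in the question, a global work size equal to the number of elements in data, and a flag that the host has already set to zero before the launch:

```c
// OpenCL C sketch; find_signal, data and threshold are hypothetical names.
// Assumes OpenCL 1.1+ so that atomic_inc on a __global int is available.
__kernel void find_signal(__global const float *data,
                          const float threshold,
                          volatile __global int *flag)
{
    // Assumes the global work size equals the number of elements in data.
    const size_t gid = get_global_id(0);

    // Stand-in for the real detection test, which the thread does not show.
    if (data[gid] > threshold) {
        // The kernel never resets the flag; it only counts hits.
        // The flag must already be zero when the kernel is launched.
        atomic_inc(flag);
    }
}
```

After the kernel finishes, the host can read back just this one int: any nonzero value means at least one work-item found a signal. Because no work-item ever writes zero, the order in which work-groups run does not matter.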

0 Likes
Raistmer
Adept II

Yes, but how will I set the initial value? Incrementing from a random initial number will not help...
0 Likes

Originally posted by: Raistmer Yes, but how will I set the initial value? Incrementing from a random initial number will not help...


 

Create the buffer with a host pointer whose memory is initialized to the required value, then set the kernel argument and run the kernel.

0 Likes
Raistmer
Adept II

And what if I need this in a loop?
As I understand it, you propose:
1) create the buffer
2) call the kernel
3) destroy the buffer


Will that be faster than
1) copying memory to a buffer (created once at the beginning of the program)
2) calling the kernel

?
0 Likes
Raistmer
Adept II

And a third way:
1) call a memset kernel
2) call the kernel

Which way will be faster?
0 Likes

Originally posted by: Raistmer And a third way: 1) call a memset kernel 2) call the kernel Which way will be faster?


The second is better than the third, because the third amounts to the second plus an extra kernel invocation and kernel execution.

The second is the best of the three.
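
For comparison, the extra kernel that the third way needs could look roughly like the OpenCL C sketch below. The name clear_flag is hypothetical, and the sketch assumes an int flag and an in-order command queue, so that this kernel completes before the main kernel starts:

```c
// OpenCL C sketch of the "memset kernel" option; clear_flag is a hypothetical name.
// Intended to be enqueued with a global work size of 1, on the same in-order
// command queue as the main kernel and ahead of it.
__kernel void clear_flag(__global int *flag)
{
    flag[0] = 0;  // a single work-item zeroes the flag
}
```

The second way replaces this launch with a single small host-to-device write (for example, clEnqueueWriteBuffer of one zero value) into the buffer that was created once at program start.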

0 Likes