10 Replies Latest reply on Aug 10, 2010 5:45 AM by genaganna

    Zeroing few bytes flag - what way better?

    Raistmer
      via zeroing kernel or via buffer write?

      I need to initialize a float or float4 flag before a kernel call.
      Which is better: a memset kernel with a single work-item, or writing a float or float4 zero value from the host to GPU memory?
      Which one has the lower call overhead?

        • Zeroing few bytes flag - what way better?
          genaganna

           

          Just add one more parameter to the actual kernel, set that kernel argument to the required value, and call the kernel.

          • Zeroing few bytes flag - what way better?
            Raistmer
            But then how can I ensure that the last-launched work-items don't overwrite a flag that has already been set?
            For example:
            __kernel void abc(__global int* flag, int zero){
                flag[0]=zero;
                if ( ....... ){
                    flag[0]=1;
                }
            }

            If the condition is not true in the last wavefront, I will always receive a zero flag, won't I?
              • Zeroing few bytes flag - what way better?
                genaganna

                 

                Make it constant so that no one is able to overwrite it.

              • Zeroing few bytes flag - what way better?
                Raistmer
                LoL, if I make it constant, then how can the flag be set when the condition is true?
                Looks like I'm missing your idea completely. What I need:
                If some event happens during the kernel run (in my case, a signal is found), the flag is set to 1, no matter which work-item found it and no matter whether many work-items found it or just one. But if none found it, the flag should remain zero. Then, after the kernel launch, the CPU code can download only the flag from GPU memory and decide whether the whole data array needs to be downloaded too.
                But that flag has to be initialized. If I initialize it inside the kernel, no synchronization between work-groups is possible AFAIK. That is, some work-item in a previous work-group can find a signal, and then the next work-group re-initializes the flag, which kills the whole idea...
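
A minimal sketch of the kernel side of the pattern described in this post, assuming the flag has already been zeroed before the launch. The kernel name `find_signal`, the `data` buffer and the `threshold` test are hypothetical stand-ins for the real signal search, which the thread does not show.

```c
// OpenCL C kernel sketch (hypothetical names throughout).
// The flag is never cleared inside the kernel, so a later work-group
// cannot undo a detection made by an earlier one. It is assumed to
// hold zero when the kernel is launched.
__kernel void find_signal(__global const float *data,
                          const float threshold,   // hypothetical detection criterion
                          __global int *flag)
{
    const size_t gid = get_global_id(0);

    // Every work-item that finds a signal stores the same value (1),
    // so the order in which the writes land does not change the result.
    if (data[gid] > threshold) {
        flag[0] = 1;
    }
}
```

After the launch the host only has to read back the single int to decide whether the full data array is worth downloading.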
                  • Zeroing few bytes flag - what way better?
                    genaganna

                     

                    You can solve this by using atomics. If you find a signal, increment the value of the flag by 1.
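
A rough sketch of what the atomic variant could look like, assuming an OpenCL 1.1 device where `atomic_inc` on a `__global int` is a core built-in (on OpenCL 1.0 the `cl_khr_global_int32_base_atomics` extension and `atom_inc` would be needed instead). The kernel name, the `data` buffer and the `threshold` test are hypothetical. The counter still has to hold zero before the launch; the atomic only makes concurrent increments safe.

```c
// OpenCL C kernel sketch (hypothetical names): count detections atomically.
__kernel void find_signal_atomic(__global const float *data,
                                 const float threshold,   // hypothetical detection criterion
                                 volatile __global int *counter)
{
    const size_t gid = get_global_id(0);

    // atomic_inc performs the read-modify-write as one indivisible
    // operation, so increments from different work-groups are not lost.
    if (data[gid] > threshold) {
        atomic_inc(counter);
    }
}
```

A non-zero counter after the launch means at least one work-item found a signal, and its value is the number of detections.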

                  • Zeroing few bytes flag - what way better?
                    Raistmer
                    Yes, but how will I set the initial value? Incrementing from a random initial number will not help...
                    • Zeroing few bytes flag - what way better?
                      Raistmer
                      And what if I need this in a loop?
                      As I understand it, you propose:
                      1) create buffer
                      2) call kernel
                      3) destroy buffer


                      Will that be faster than
                      1) use mem copy to buffer (created once at the beginning of the program)
                      2) call kernel

                      ?
                      • Zeroing few bytes flag - what way better?
                        Raistmer
                        And a third way:
                        1) call memset kernel
                        2) call kernel

                        Which way will be faster?
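
For comparison, a host-side sketch of the two reset options for a flag buffer that is created once and reused in a loop: a 4-byte buffer write versus a one-work-item zeroing kernel. It assumes an in-order command queue, an int flag, and a hypothetical kernel `zero_flag` taking the flag buffer as its only argument; all function and variable names here are illustrative. Which option has the lower overhead depends on the driver and device, so it would need to be timed on the target system.

```c
#include <CL/cl.h>

/* Option A: zero the flag with a small non-blocking buffer write.
 * The source value is static so it stays valid until the write completes. */
static cl_int zero_flag_by_write(cl_command_queue queue, cl_mem flag_buf)
{
    static const cl_int zero = 0;
    return clEnqueueWriteBuffer(queue, flag_buf, CL_FALSE, 0,
                                sizeof(zero), &zero, 0, NULL, NULL);
}

/* Option B: zero the flag with a one-work-item kernel. Assumed kernel source:
 *   __kernel void zero_flag(__global int *flag) { flag[0] = 0; }        */
static cl_int zero_flag_by_kernel(cl_command_queue queue,
                                  cl_kernel zero_kernel, cl_mem flag_buf)
{
    const size_t one = 1;
    cl_int err = clSetKernelArg(zero_kernel, 0, sizeof(cl_mem), &flag_buf);
    if (err != CL_SUCCESS)
        return err;
    return clEnqueueNDRangeKernel(queue, zero_kernel, 1, NULL,
                                  &one, NULL, 0, NULL, NULL);
}

/* After the main kernel: blocking read of just the flag (4 bytes). */
static cl_int read_flag(cl_command_queue queue, cl_mem flag_buf, cl_int *out)
{
    return clEnqueueReadBuffer(queue, flag_buf, CL_TRUE, 0,
                               sizeof(*out), out, 0, NULL, NULL);
}
```

On an in-order queue either reset is guaranteed to finish before the main kernel enqueued after it starts, so each loop iteration would be: reset, enqueue the main kernel, then `read_flag`.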