1 Reply Latest reply on Apr 15, 2015 2:57 PM by grodgers

    Parameter macro

    greenflops

      Are there more examples of how to use the macro SNK_INIT_LPARM? (I don't really understand it!)

      How do I use it, for example, if the kernel parameters don't have the same size?

      With lparm->gdims and lparm->ldims?

        • Re: Parameter macro
          grodgers

          Thanks for giving the new version of snack a try.  And this is a great question.   This string macro is defined in the generated header file as follows.

           

          /* This string macro is used to declare launch parameters set default values  */

          #define SNK_INIT_LPARM(X,Y) snk_lparm_t * X ; snk_lparm_t  _ ## X ={.ndim=1,.gdims={Y},.ldims={64},.stream=-1,.barrier=SNK_ORDERED,.acquire_fence_scope=2,.release_fence_scope=2,.num_edges_in=0,.edges_in=NULL,.rank=0} ; X = &_ ## X ;

           

          It is a bit of trickery just to declare a POINTER to a data structure instead of a raw data structure.   

           

          Let's assume you use it like this.

           

          SNK_INIT_LPARM(lparm,N);

           

          Now your code has a pointer (lparm) to a data structure of type snk_lparm_t.   It has all the default settings that you may or may not like.  For example, the default assumes one dimension (ndim=1) with ldims (the local block dimensions) of {64}.   The macro requires you to specify the number of global threads for the first dimension.   In the use above, you asked for N global threads with an ldims of 64.
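To make the expansion concrete, here is a minimal sketch that compiles without SNACK. The macro text is copied from the post above; the structure layout, the field types, and the numeric value of SNK_ORDERED are stand-ins assumed for illustration and are not taken from the generated header. The helper name default_lparm is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles on its own. Field names come from the
   macro; field types and enum values are assumptions. */
enum { SNK_UNORDERED = 0, SNK_ORDERED = 1 };

typedef struct {
    int    ndim;
    size_t gdims[3];
    size_t ldims[3];
    int    stream;
    int    barrier;
    int    acquire_fence_scope;
    int    release_fence_scope;
    int    num_edges_in;
    void  *edges_in;
    int    rank;
} snk_lparm_t;

/* Macro text as quoted in the post. */
#define SNK_INIT_LPARM(X,Y) snk_lparm_t * X ; snk_lparm_t  _ ## X ={.ndim=1,.gdims={Y},.ldims={64},.stream=-1,.barrier=SNK_ORDERED,.acquire_fence_scope=2,.release_fence_scope=2,.num_edges_in=0,.edges_in=NULL,.rank=0} ; X = &_ ## X ;

/* Hypothetical helper: returns a copy of the defaults for n global threads. */
snk_lparm_t default_lparm(size_t n)
{
    /* Expands to: a pointer `lparm`, a structure `_lparm` holding the
       defaults, and the assignment `lparm = &_lparm`. */
    SNK_INIT_LPARM(lparm, n);
    assert(lparm == &_lparm);
    return *lparm;
}
```

With these stand-ins, default_lparm(1024) yields ndim 1, gdims[0] 1024, ldims[0] 64 and stream -1. The array elements the initializer does not mention (gdims[1], ldims[1] and so on) are zero, as C requires for designated initializers.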

           

          OK, so you may not like all these defaults.  What should you do?  But more importantly, you may not understand all the defaults.   What are all these new fields?   And will there be more in the future? 

          Overriding the defaults is easy.   Suppose you want a 2D grid.  The matmul example shows how the default values are changed for a 2D grid.

           

          lparm->ndim=2;

          lparm->gdims[0]=C.hpad;

          lparm->gdims[1]=C.stride;

          lparm->ldims[0]=BLOCK_SIZE;

          lparm->ldims[1]=BLOCK_SIZE;

           

          You make these changes before you call your kernel and you are all set.
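For the case raised in the question, where the data dimensions do not match the block size, one option is to round each global dimension up to a multiple of the matching local dimension before the launch. The sketch below assumes that convention; the thread does not say how C.hpad is computed. The structure layout, BLOCK_SIZE value, and the helpers round_up and make_2d_lparm are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 16   /* hypothetical tile width */

/* Stand-ins so the sketch compiles without SNACK; types are assumptions. */
enum { SNK_UNORDERED = 0, SNK_ORDERED = 1 };

typedef struct {
    int    ndim;
    size_t gdims[3];
    size_t ldims[3];
    int    stream;
    int    barrier;
    int    acquire_fence_scope;
    int    release_fence_scope;
    int    num_edges_in;
    void  *edges_in;
    int    rank;
} snk_lparm_t;

#define SNK_INIT_LPARM(X,Y) snk_lparm_t * X ; snk_lparm_t  _ ## X ={.ndim=1,.gdims={Y},.ldims={64},.stream=-1,.barrier=SNK_ORDERED,.acquire_fence_scope=2,.release_fence_scope=2,.num_edges_in=0,.edges_in=NULL,.rank=0} ; X = &_ ## X ;

/* Round n up to the next multiple of block. */
static size_t round_up(size_t n, size_t block)
{
    assert(block > 0);
    return ((n + block - 1) / block) * block;
}

/* Fill *out with 2D launch parameters that cover a rows x cols matrix. */
void make_2d_lparm(size_t rows, size_t cols, snk_lparm_t *out)
{
    SNK_INIT_LPARM(lparm, rows);          /* start from the defaults */
    lparm->ndim     = 2;
    lparm->gdims[0] = round_up(rows, BLOCK_SIZE);
    lparm->gdims[1] = round_up(cols, BLOCK_SIZE);
    lparm->ldims[0] = BLOCK_SIZE;
    lparm->ldims[1] = BLOCK_SIZE;
    *out = *lparm;                        /* copy out before _lparm goes away */
}
```

For a 100 x 70 matrix this gives gdims of {112, 80} with ldims of {16, 16}. With padded global sizes the kernel itself has to skip work-items whose indices fall outside the real matrix.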

           

          OK.  What are all those other fields (stream, barrier, etc.)?

           

          Think of any of the fields as attributes of the task you are about to launch.   We don't have good documentation of all these values and I promise there will be more.  So it is good to use a MACRO to declare your lparm so in the future you get new values automatically.     But let me introduce you to two very cool new fields.

           

          lparm->stream has a default value of -1.   The negative value tells the HSA system to launch the task synchronously, which is how snack and cloc have always worked.   But you can change this to 0 or some number up to the current maximum (8), and your kernel will be launched asynchronously.   That is, your host program will call your SNACK function and the SNACK function will return to your host function very quickly.   Now you can do something else on the CPU, including launching other kernels asynchronously.    When you want your host to wait for all the kernels you launched to complete, you call the function stream_sync() with the same number you set lparm->stream to.  For example, if you set lparm->stream=0; you later call stream_sync(0); to wait.

           

          There is a new snack example called async_vecsum that demonstrates the use of asynchronous execution to implement a lock-less reduction (sum all the values of an array).   Please try this example and see how multiple streams are implemented and how this improves performance.  The example sets lparm->barrier = SNK_UNORDERED for better performance because the kernels that are launched do not need to run in order.  
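The call pattern described above can be sketched with mocks. Nothing here talks to HSA: partial_sum_stub stands in for a SNACK-generated kernel wrapper, and the stream_sync defined below is a mock whose signature is assumed from the description in this post. NSTREAMS, the in_flight counter, and the structure layout are hypothetical too. This is not the async_vecsum example itself.

```c
#include <assert.h>
#include <stddef.h>

#define NSTREAMS 4      /* hypothetical number of streams used here */

/* Stand-ins so the sketch compiles without SNACK; types are assumptions. */
enum { SNK_UNORDERED = 0, SNK_ORDERED = 1 };

typedef struct {
    int    ndim;
    size_t gdims[3];
    size_t ldims[3];
    int    stream;
    int    barrier;
    int    acquire_fence_scope;
    int    release_fence_scope;
    int    num_edges_in;
    void  *edges_in;
    int    rank;
} snk_lparm_t;

#define SNK_INIT_LPARM(X,Y) snk_lparm_t * X ; snk_lparm_t  _ ## X ={.ndim=1,.gdims={Y},.ldims={64},.stream=-1,.barrier=SNK_ORDERED,.acquire_fence_scope=2,.release_fence_scope=2,.num_edges_in=0,.edges_in=NULL,.rank=0} ; X = &_ ## X ;

static int in_flight[NSTREAMS];   /* mock: launches not yet waited on */

/* Mock kernel wrapper: only records that a launch was queued on a stream. */
void partial_sum_stub(const float *in, float *out, size_t n,
                      const snk_lparm_t *lparm)
{
    (void)in; (void)out; (void)n;
    if (lparm->stream >= 0) {
        assert(lparm->stream < NSTREAMS);
        in_flight[lparm->stream]++;
    }
}

/* Mock of stream_sync(): marks everything on one stream as complete. */
void stream_sync(int stream)
{
    assert(stream >= 0 && stream < NSTREAMS);
    in_flight[stream] = 0;
}

/* Number of mock launches that have not been waited on yet. */
int pending_total(void)
{
    int total = 0;
    for (int s = 0; s < NSTREAMS; s++) total += in_flight[s];
    return total;
}

/* Queue one chunk per stream, then wait on every stream.
   Returns how many launches were pending just before the waits. */
int launch_on_streams(const float *in, float *out, size_t n)
{
    size_t chunk = n / NSTREAMS;
    SNK_INIT_LPARM(lparm, chunk);
    lparm->barrier = SNK_UNORDERED;       /* chunks are independent */
    for (int s = 0; s < NSTREAMS; s++) {
        lparm->stream = s;                /* >= 0 requests an async launch */
        partial_sum_stub(in + (size_t)s * chunk, out + s, chunk, lparm);
    }
    int pending = pending_total();        /* host is free to work here */
    for (int s = 0; s < NSTREAMS; s++) stream_sync(s);
    return pending;
}
```

The sketch reuses one lparm for every launch, which assumes the wrapper reads the structure during the call and does not keep the pointer. If that does not hold for the real generated code, declare a separate lparm per launch.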

           

          As for the other fields, I refer you to the generated source and how this information is used in the HSA packet.  Then refer to the HSA manuals to understand what each field does.     Some fields are noted as not currently implemented.  This is an indication that we are going to add more to the generated HSA logic in the future to model new tasking capabilities.

           

          You certainly could declare lparm without the string macro and initialize these fields yourself.  But you could get it wrong, and we could (will) add new fields in the future.   So please use the macro; it is much easier.

           

          I hope this helps. 

           

          Greg

           

          PS: Why a pointer to a structure and not a structure?   Some compilers intend to use SNACK, and a pointer has a fixed size to reserve on the stack, whereas sizeof(snk_lparm_t) could change in the future.
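The point in the PS can be illustrated with two made-up versions of a launch-parameter structure. Both layouts below are hypothetical and only model the idea that the structure may gain fields over time, while a pointer to it keeps the same size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "older" layout. */
typedef struct {
    int    ndim;
    size_t gdims[3];
    size_t ldims[3];
} lparm_v1_t;

/* Hypothetical "newer" layout with two added fields. */
typedef struct {
    int    ndim;
    size_t gdims[3];
    size_t ldims[3];
    int    stream;
    int    barrier;
} lparm_v2_t;

/* Size a caller would reserve if it stored only a pointer. */
size_t handle_size_v1(void) { return sizeof(lparm_v1_t *); }
size_t handle_size_v2(void) { return sizeof(lparm_v2_t *); }

/* Extra bytes a caller would need if it stored the structure itself. */
size_t struct_growth(void)
{
    assert(sizeof(lparm_v2_t) > sizeof(lparm_v1_t));
    return sizeof(lparm_v2_t) - sizeof(lparm_v1_t);
}
```

A compiler that reserves only a pointer-sized slot is unaffected when fields are added, whereas one that reserves sizeof of the structure would have to be updated with every new version.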