2 Replies Latest reply on Aug 17, 2010 4:37 AM by genaganna

    clEnqueueNativeKernel user_func calling convention?

    danbartlett@ntlworld.com

      Hi, what is the calling convention for user_funcs used in clEnqueueNativeKernel?  By trial and error it seems to be cdecl (on MS Windows), but the spec should really state which convention to use, because otherwise different implementors may use different calling conventions (as happened with early versions of OpenCL).  I'm using Delphi, and I assumed the calling convention would be stdcall on MS Windows like the rest of the OpenCL functions, but with stdcall it crashed after returning from the native kernel.

      Also related to this, the new OpenGL extension "GL_ARB_debug_output" (http://www.opengl.org/registry/specs/ARB/debug_output.txt) states that the callback function used there uses the same calling convention as the GL functions, so maybe OpenCL should require this too? (If this were to happen, AMD would have to change the calling convention it currently uses. They've only just enabled native kernels, so the change should probably be made ASAP.)

      I've attached the code I used to test, in case I've made any mistake, or if anyone wants to see some example code using clEnqueueNativeKernel.

      type
        TMyArgs = record
          First: Integer;
          Last: Integer;
          ValToAdd: Integer;
          InputBuffer: PSingleArray;
          OutputBuffer: PSingleArray;
        end;

      procedure MyProc(var args: TMyArgs); cdecl;
      var
        I: Integer;
      begin
        for I := args.First to args.Last do
          args.OutputBuffer[I] := args.InputBuffer[I] + args.ValToAdd;
      end;

      procedure TForm30.Button5Click(Sender: TObject);
      var
        status: Tcl_int;
        I: Integer;
        args: TMyArgs;
        clmem_list: array[0..1] of Tcl_mem;
        args_mem_loc: array[0..1] of Pointer;
      begin
        // Fill our mem list
        clmem_list[0] := inputBuffer.Handle;
        clmem_list[1] := outputBuffer.Handle;

        // Find the addresses of the fields that hold mem objects
        args_mem_loc[0] := @args.InputBuffer;
        args_mem_loc[1] := @args.OutputBuffer;

        // We want to add 42 to items 0..5
        args.First := 0;
        args.Last := 5;
        args.ValToAdd := 42;

        // note: clEnqueueNativeKernel will fill the
        // args.InputBuffer + args.OutputBuffer fields for us,
        // since we have told it the location of these fields by providing their address
        // in the args_mem_loc array.

        // Run native kernel
        status := clEnqueueNativeKernel(queue.Handle, @MyProc, @args, SizeOf(TMyArgs),
          2, @clmem_list, @args_mem_loc, 0, nil, nil);
        if status <> CL_SUCCESS then Exit();

        // Add 100 to items 8..10
        args.First := 8;
        args.Last := 10;
        args.ValToAdd := 100;

        // Run native kernel
        status := clEnqueueNativeKernel(queue.Handle, @MyProc, @args, SizeOf(TMyArgs),
          2, @clmem_list, @args_mem_loc, 0, nil, nil);
        if status <> CL_SUCCESS then Exit();

        // Force the code to run
        status := clFinish(queue.Handle);
        if status <> CL_SUCCESS then Exit();

        // Now display the results (CL_MEM_USE_HOST_PTR was used)
        for I := 0 to 9 do
        begin
          memo2.Lines.Add(FloatToStr(outputs[I]));
        end;
      end;
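
      For anyone puzzled by the args_mem_loc part of the test code, here is a rough C sketch of the idea. It does not use OpenCL at all: mock_enqueue_native is a hypothetical stand-in for clEnqueueNativeKernel, it runs the function immediately instead of queueing it, and plain float pointers stand in for the host pointers a real runtime would resolve from the mem objects. It only illustrates the "copy the args, then patch the pointer fields in the copy" step, under those assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors TMyArgs from the post: the two pointer fields get patched. */
typedef struct {
    int first;
    int last;
    int val_to_add;
    float *input;
    float *output;
} my_args;

/* The user function; it only ever sees the private, patched copy. */
static void my_proc(void *p)
{
    my_args *a = (my_args *)p;
    for (int i = a->first; i <= a->last; ++i)
        a->output[i] = a->input[i] + (float)a->val_to_add;
}

/* Hypothetical stand-in for clEnqueueNativeKernel (NOT the real API):
   copies cb_args bytes of args, overwrites each pointer slot named by
   args_mem_loc in the copy, then calls user_func on the copy. */
static int mock_enqueue_native(void (*user_func)(void *), void *args,
                               size_t cb_args, unsigned num_mem,
                               float *const *mem_ptrs,
                               const void **args_mem_loc)
{
    unsigned char *copy = malloc(cb_args);
    if (copy == NULL)
        return -1;
    memcpy(copy, args, cb_args);
    for (unsigned k = 0; k < num_mem; ++k) {
        /* Offset of the pointer field inside the caller's args block. */
        size_t off = (size_t)((const unsigned char *)args_mem_loc[k] -
                              (const unsigned char *)args);
        assert(off + sizeof(float *) <= cb_args);
        memcpy(copy + off, &mem_ptrs[k], sizeof(float *));
    }
    user_func(copy);
    free(copy);
    return 0;
}

/* Replays the two enqueues from the post on an 11-element array. */
static void demo(float out[11])
{
    float in[11];
    for (int i = 0; i < 11; ++i) {
        in[i] = (float)i;
        out[i] = 0.0f;
    }

    my_args args;
    memset(&args, 0, sizeof args);          /* pointer fields left unset */
    float *mems[2] = { in, out };
    const void *locs[2] = { &args.input, &args.output };

    args.first = 0; args.last = 5; args.val_to_add = 42;
    mock_enqueue_native(my_proc, &args, sizeof args, 2, mems, locs);

    args.first = 8; args.last = 10; args.val_to_add = 100;
    mock_enqueue_native(my_proc, &args, sizeof args, 2, mems, locs);
}
```

      Because the args block is copied at enqueue time, reusing and modifying the same args variable for the second call is safe in this sketch, which matches how the test code above reuses it.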

        • clEnqueueNativeKernel user_func calling convention?
          Illusio

          Interesting problem. I ended up having to use cdecl as well, from the C# side, so I guess there's a fair chance that this is a case where the standard should get a minor patch if it wants to be portable. Or at least it would be a good idea if the hardware vendors agreed on a calling convention for the major platforms.

          Anyone know if nVidia supports this function yet, and if so - are they incompatible with the current ATI driver?

           

            • clEnqueueNativeKernel user_func calling convention?
              genaganna

               

              We are incorrectly using cdecl on Windows. We will probably end up using stdcall on Windows.
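
              As a rough illustration of how a header can pin the convention so that callers and the runtime agree, here is a minimal C sketch. DEMO_CALLBACK, demo_user_func, add_one and demo_run are hypothetical names made up for this example, not part of any OpenCL header; the sketch assumes a compiler that accepts __stdcall on 32-bit Windows.

```c
#include <assert.h>

/* Hypothetical macro: selects stdcall on 32-bit Windows and expands to
   nothing elsewhere, where there is a single platform convention. */
#if defined(_WIN32) && !defined(_WIN64)
#  define DEMO_CALLBACK __stdcall
#else
#  define DEMO_CALLBACK
#endif

/* The convention is part of the function pointer type, so a callback
   declared with a different convention no longer matches the type. */
typedef void (DEMO_CALLBACK *demo_user_func)(void *);

/* A callback declared with the agreed convention. */
static void DEMO_CALLBACK add_one(void *p)
{
    *(int *)p += 1;
}

/* Stand-in for a runtime that calls back into user code. */
static int demo_run(demo_user_func f, int start)
{
    int v = start;
    assert(f != 0);
    f(&v);
    return v;
}
```

              With the convention baked into the pointer type, a C compiler can flag a mismatched callback at compile time; bindings for other languages (Delphi, C#) still have to declare the same convention by hand.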