Hi, what is the calling convention for the user_func callback passed to clEnqueueNativeKernel? By trial and error it seems to be cdecl (on MS Windows), but the spec should really state which convention to use, because otherwise different implementors may use different calling conventions (as happened with early versions of OpenCL). I'm using Delphi, and I assumed the calling convention would be stdcall on MS Windows like the rest of the OpenCL functions, but with stdcall it crashes after returning from the native kernel.
Also related to this, the new OpenGL extension "GL_ARB_debug_output" (http://www.opengl.org/registry/specs/ARB/debug_output.txt) states that its callback function uses the same calling convention as the GL functions, so maybe OpenCL should require this too? (If this were to happen, AMD would have to change the calling convention it currently uses. They've only just enabled native kernels, so the change should probably be made ASAP if it happens.)
I've attached the code I used to test, in case I've made any mistake, or if anyone wants to see some example code using clEnqueueNativeKernel.
type
  TMyArgs = record
    First: Integer;
    Last: Integer;
    ValToAdd: Integer;
    InputBuffer: PSingleArray;
    OutputBuffer: PSingleArray;
  end;

procedure MyProc(var args: TMyArgs); cdecl;
var
  I: Integer;
begin
  for I := args.First to args.Last do
    args.OutputBuffer[I] := args.InputBuffer[I] + args.ValToAdd;
end;

procedure TForm30.Button5Click(Sender: TObject);
var
  status: Tcl_int;
  I: Integer;
  args: TMyArgs;
  clmem_list: array[0..1] of Tcl_mem;
  args_mem_loc: array[0..1] of Pointer;
begin
  // Fill our mem list
  clmem_list[0] := inputBuffer.Handle;
  clmem_list[1] := outputBuffer.Handle;

  // Find the addresses of the fields that hold mem objects
  args_mem_loc[0] := @args.InputBuffer;
  args_mem_loc[1] := @args.OutputBuffer;

  // We want to add 42 to items 0..5
  args.First := 0;
  args.Last := 5;
  args.ValToAdd := 42;

  // note: clEnqueueNativeKernel will fill the
  // args.InputBuffer + args.OutputBuffer fields for us,
  // since we have told it the location of these fields by providing their address
  // in the args_mem_loc array.

  // Run native kernel
  status := clEnqueueNativeKernel(queue.Handle, @MyProc, @args, SizeOf(TMyArgs),
    2, @clmem_list, @args_mem_loc, 0, nil, nil);
  if status <> CL_SUCCESS then
    Exit();

  // Add 100 to items 8..10
  args.First := 8;
  args.Last := 10;
  args.ValToAdd := 100;

  // Run native kernel
  status := clEnqueueNativeKernel(queue.Handle, @MyProc, @args, SizeOf(TMyArgs),
    2, @clmem_list, @args_mem_loc, 0, nil, nil);
  if status <> CL_SUCCESS then
    Exit();

  // Force the code to run
  status := clFinish(queue.Handle);
  if status <> CL_SUCCESS then
    Exit();

  // Now display the results (CL_MEM_USE_HOST_PTR was used)
  for I := 0 to 9 do
  begin
    memo2.Lines.Add(FloatToStr(outputs[I]));
  end;
end;
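For anyone who finds the args_mem_loc part confusing, here is a rough, OpenCL-free sketch of what I understand the runtime to do with the args block: copy cb_args bytes, then overwrite the slots named by args_mem_loc with real pointers before calling the user function. FakeEnqueueNativeKernel, TNativeProc, MemPtrs and the fixed-size TSingleArray are hypothetical names made up for this illustration; they are not part of OpenCL or of any binding, and a real implementation may well differ. The cdecl on the callback only mirrors what worked in my test above.

```delphi
program NativeArgsSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TSingleArray = array[0..15] of Single;   // assumption: small fixed buffer for the demo
  PSingleArray = ^TSingleArray;

  TMyArgs = record
    First, Last, ValToAdd: Integer;
    InputBuffer, OutputBuffer: PSingleArray;
  end;
  PMyArgs = ^TMyArgs;

  // cdecl mirrors what worked in the test above; the convention is an assumption.
  TNativeProc = procedure(args: Pointer); cdecl;

procedure MyProc(args: Pointer); cdecl;
var
  I: Integer;
  A: PMyArgs;
begin
  A := PMyArgs(args);
  for I := A^.First to A^.Last do
    A^.OutputBuffer^[I] := A^.InputBuffer^[I] + A^.ValToAdd;
end;

// Hypothetical stand-in for the runtime; this is NOT an OpenCL call.
// MemPtrs plays the role of the host pointers behind the cl_mem objects.
procedure FakeEnqueueNativeKernel(UserFunc: TNativeProc; Args: Pointer;
  CbArgs: Integer; const MemPtrs, ArgsMemLoc: array of Pointer);
var
  Block: Pointer;
  Offset: NativeUInt;
  I: Integer;
begin
  GetMem(Block, CbArgs);
  try
    // 1. Take a private copy of the caller's args block.
    Move(Args^, Block^, CbArgs);
    // 2. Patch each slot in the copy, located by its offset in the original.
    for I := 0 to High(MemPtrs) do
    begin
      Offset := NativeUInt(ArgsMemLoc[I]) - NativeUInt(Args);
      PPointer(NativeUInt(Block) + Offset)^ := MemPtrs[I];
    end;
    // 3. Call the user function with the patched copy.
    UserFunc(Block);
  finally
    FreeMem(Block);
  end;
end;

var
  InData, OutData: TSingleArray;
  Args: TMyArgs;
  MemPtrs, ArgsMemLoc: array[0..1] of Pointer;
  I: Integer;
begin
  for I := Low(InData) to High(InData) do
  begin
    InData[I] := I;
    OutData[I] := 0;
  end;

  MemPtrs[0] := @InData;
  MemPtrs[1] := @OutData;
  ArgsMemLoc[0] := @Args.InputBuffer;
  ArgsMemLoc[1] := @Args.OutputBuffer;

  Args.First := 0;
  Args.Last := 5;
  Args.ValToAdd := 42;
  Args.InputBuffer := nil;   // left unset on purpose; only the copy gets patched
  Args.OutputBuffer := nil;

  FakeEnqueueNativeKernel(@MyProc, @Args, SizeOf(TMyArgs), MemPtrs, ArgsMemLoc);

  for I := 0 to 9 do
    WriteLn(I, ': ', OutData[I]:0:1);
end.
```

In this sketch, items 0..5 of OutData end up as the input value plus 42 and the rest stay at zero, while Args.InputBuffer and Args.OutputBuffer in the caller remain nil because only the private copy is patched. That is also why the same args record can be reused for a second enqueue with different First/Last values.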