11 Replies Latest reply on Jan 22, 2011 5:14 PM by krishnan

    Using stdcl to fork to gpu

    krishnan

      Hi Folks

      I've recently started using OpenCL and Brown Deer's stdcl. I wrote some code based on the NBody tutorial on the Brown Deer website and on the AMD webinars, and things haven't been going according to plan. It seems that the processing isn't even shifting to the GPU, and my feeling is that there is some issue in the invocation of clfork() (which is part of stdcl).

      [Edit: Code formatted at the bottom]

      Now, this code should at the very least run and display "This kernel is executing" as many times as there are threads. However, I don't see it displayed even once, which leads me to believe that the code isn't even going into the kernel.

      Is my diagnosis right? If so, any thoughts as to how I can fix this?

      Thanks in advance for any help.

      ---
      //Some initial declarations
      int i,j,ops;
      int step;
      int N = 256;
      int nstep = 10;
      int nthread = 64;
      float dx = 1.0/N;
      float dt = 0.25*dx*dx;

      //allocate memory for 2 N*N arrays.
      cl_float* A = (cl_float*)clmalloc(stdgpu,N*N*sizeof(cl_float),0);
      cl_float* B = (cl_float*)clmalloc(stdgpu,N*N*sizeof(cl_float),0);

      /* Initialize A to some values */
      /* ... */
      /* End initialization */

      // Create a handle to the kernel, `fd_par.cl'
      void* h = clopen(stdgpu,"fd_par.cl",CLLD_NOW);
      cl_kernel krn = clsym(stdgpu,h,"fd_par.cl",CLLD_NOW);

      // Set the range to the size of A
      clndrange_t ndr = clndrange_init1d(0,N*N,nthread);

      clmsync(stdgpu,0,A,CL_MEM_DEVICE|CL_EVENT_NOWAIT);

      for(step=0; step<nstep; step++){
        clarg_set_global(krn,0,A);
        clarg_set_global(krn,1,B);
        clfork(stdgpu,0,krn,&ndr,CL_EVENT_NOWAIT);
        clmsync(stdgpu,0,B,CL_MEM_HOST|CL_EVENT_NOWAIT);
      }
      ----

      My kernel is:

      ----
      __kernel void heat_kern(
        __global float* A,
        __global float* B,
      ){
        //test if kernel is executing
        printf("This kernel is executing\n");

        /* Some computation */
        /* ... */
        /* End computation */
      }
      ----

        • Using stdcl to fork to gpu
          krishnan

          Minor Correction folks.

          The line that reads:

            cl_kernel krn = clsym(stdgpu,h,"fd_par.cl",CLLD_NOW);

          Should instead read:

              cl_kernel krn = clsym(stdgpu,h,"fd_par",CLLD_NOW);

          • Using stdcl to fork to gpu
            Meteorhead

            There are two things that may be causing your problem.

            My first guess would be that you use printf() inside your kernel, but do not explicitly use the pragma enabling the use of printf functions inside kernels.

            Second guess is more of an opinion than a real guess. Using wrappers of C APIs to facilitate programming is worthwhile so long as one does not encounter problems (such as this might be). Wrappers that do many things on the programmer's behalf do make life easier, if one EXACTLY knows what the wrapper is doing and FULLY understands what one would have written without the wrapper. If either is missing, there is no telling where your program might go wrong.

            First advice is to insert the required pragma at the beginning of the kernel. The compiler should give an error if the pragma is not set, so I guess that is taken care of somewhere by stdcl. Second advice is, if you cannot solve the issue in a reasonable amount of time, forget about wrappers that someone else wrote, because debugging your programs will be a living hell. Take the time to use the standard API, and when you get really bored of the extensive coding, make a wrapper of your own that you are fully aware of. (That avoids issues like this one: as far as I can see, the kernel you wrote should not be able to compile at all.)

              • Using stdcl to fork to gpu
                krishnan

                 

                My first guess would be that you use printf() inside your kernel, but do not explicitly use the pragma enabling the use of printf functions inside kernels.


                Thanks. I'm sorry if this is a naive question, but what pragma should I use to explicitly enable stdio functions inside the kernel? To be honest, I don't need any stdio functions in the kernel at all. I just wanted to use them to test whether I am getting inside the kernel, and there are cleaner ways to do that.

                 

                 

                Second guess is more of an opinion than a real guess. Using wrappers of C APIs to facilitate programming is worthwhile so long as one does not encounter problems (such as this might be). Wrappers that do many things on the programmer's behalf do make life easier, if one EXACTLY knows what the wrapper is doing and FULLY understands what one would have written without the wrapper. If either is missing, there is no telling where your program might go wrong.

                 

                First advice is to insert the required pragma at the beginning of the kernel. The compiler should give an error if the pragma is not set, so I guess that is taken care of somewhere by stdcl. Second advice is, if you cannot solve the issue in a reasonable amount of time, forget about wrappers that someone else wrote, because debugging your programs will be a living hell. Take the time to use the standard API, and when you get really bored of the extensive coding, make a wrapper of your own that you are fully aware of. (That avoids issues like this one: as far as I can see, the kernel you wrote should not be able to compile at all.)



                 

                That's really good advice Meteorhead, thanks. I started using the stdcl wrappers because I saw them in the NBody tutorial and assumed that they are near standard. I'm happier using the standard API myself. Do you know where I could find nice tutorials that would explain the use of the standard API? Ben Gaster's Hello World tutorial is written using C++ and I'd prefer to write in C.

                Thanks again
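
                One such cleaner check, sketched here under assumptions: have every work-item write a known marker into its slot of the output buffer, then count the markers on the host once the buffer has been read back. The kernel line shown in the comment and the names `MARKER` and `count_marked` are hypothetical; they are not part of the code above or of stdcl.

```c
#include <stddef.h>

/* Hypothetical marker value. In the kernel, each work-item would do
 *     B[get_global_id(0)] = 12345.0f;
 * before any real computation. */
#define MARKER 12345.0f

/* Count how many elements of a buffer read back from the device hold
 * the marker. A count equal to the global work size suggests every
 * work-item ran; a count of 0 suggests the kernel never executed. */
static size_t count_marked(const float *buf, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; ++i)
        if (buf[i] == MARKER)
            ++hits;
    return hits;
}
```

                On the host this would be called on B after the device-to-host sync has completed, for example `count_marked(B, N*N)`.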

                  • Using stdcl to fork to gpu
                    nou

                    See the OpenCL specification for how to enable an extension in a kernel. The one to enable here is cl_amd_printf.
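
                    For reference, the directive the specification defines has the form `#pragma OPENCL EXTENSION <name> : enable` and goes in the kernel source before the first use of the extension. A minimal sketch follows, keeping the kernel source in a C string as host programs often do. The variable name `heat_kern_src` is made up for illustration, the kernel body is reduced to the printf test from the original post, and the trailing comma after the last parameter is dropped because it is a syntax error.

```c
/* Kernel source with the AMD printf extension enabled on its first line. */
static const char heat_kern_src[] =
    "#pragma OPENCL EXTENSION cl_amd_printf : enable\n"
    "__kernel void heat_kern(__global float* A, __global float* B)\n"
    "{\n"
    "    printf(\"This kernel is executing\\n\");\n"
    "}\n";
```

                    The same pragma line can simply be placed at the top of a standalone .cl file instead.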

                      • Using stdcl to fork to gpu
                        Meteorhead

                        I do not know how good the tutorials available on the web are. I started with OpenCL a bit more than a year ago by hacking the Hello World example from StreamSDK further and further away from its original form. It is simple enough to understand the basics (platform, contexts, buffers, etc.), and then I started modifying the kernel, changing return types, adding new buffers...

                        Since there were no good tutorials when OpenCL started, I read the spec pdf a LOT, plus forums and code fragments from different places... but mainly the spec pdf, which contains what the API is capable of. You keep that in mind, and when you come across a problem of your own, you know what means are there to achieve your goal.

                        I'm sorry that I don't know any good tutorials, but I found hacking step-by-step and reading forums a good way of learning. Starting off with a wrapper is definitely not a good way to understand something new. stdcl does sound like a standard (and wants to be one), but it is not. The API is still changing and I do not wish to rely on the work of someone else, who might not continue to update the wrapper according to OpenCL 1.2 (for example).

                        http://developer.amd.com/gpu/ATIStreamSDK/assets/opencl-1.1-rev33.pdf

                        http://developer.amd.com/gpu/ATIStreamSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf

                        These two documents are the most useful and the ones to consult first whenever you encounter a problem. If neither gives a good solution, then one starts reading forums, and when nothing is related, then comes posting to forums.

                        Anyhow, good luck, feel free to ask should you come across something nasty, and if someone knows good tutorials, feel free to post. (There is a tutorial section under the StreamSDK documents)

                        http://developer.amd.com/GPU/ATISTREAMSDK/DOCUMENTATION/Pages/default.aspx

                         

                          • Using stdcl to fork to gpu
                            krishnan

                            Thanks nou and Meteorhead! I decided to follow Meteorhead's suggestions and rework the whole code in OpenCL without any C++ bindings or stuff like stdcl. I used the template given in a simple array multiplication sample given in the docs associated with:

                            http://developer.amd.com/zones/OpenCLZone/courses/pages/Introduction-OpenCL-Programming-May-2010.aspx (See the Programming Exercise Material)

                            The minimal error handling in the sample code helps me understand where I'm going wrong. At first, my kernel wasn't building. Now it is, but the output isn't what I want it to be. I'm going to post the doubts I have on that in a separate thread.

                            Thanks again!

                              • Using stdcl to fork to gpu
                                dar

                                Krishnan, the problem was that you did not use the pragma, as was pointed out.  For that reason it's good to test your kernels with the llc tool that AMD has distributed, before run-time, to make sure they will compile.

                                I generally agree with the comments in this thread regarding the need to understand what an API is doing and for that reason I highly recommend the OpenCL standard specification as the starting point for anyone trying to use OpenCL. 

                                However, in my opinion, OpenCL is far too verbose to be used directly in application development, and some of the semantics are better cast in a more traditional UNIX/C programming style.  Memory buffers are a good example - these should be memory allocations in my opinion - that is how most programmers have managed memory for decades.  This is a matter of opinion as with any interface, but this was the rationale for the design of STDCL.  Some programmers will like it, some will not.  I personally find the C++ bindings to be unappealing.  Others think they are wonderful.

                                I can tell you that having worked with customers converting CUDA code to OpenCL, the verbose nature of the latter is a significant turn-off and the STDCL interface has proven useful in putting OpenCL on par with CUDA in terms of programmer expectations.  This is in the context of HPC applications.

                                As far as STDCL goes, there is no attempt to hide what is going on from the programmer.  It is open-source and the code for all of the API calls can be found at http://www.github.com/browndeer/coprthr .  You can browse the source code online.

                                The latest release candidate for v1.1 is now available.  It is provided in the hope that it will be useful to some programmers.

                                -DAR

                                  • Using stdcl to fork to gpu
                                    Meteorhead

                                    dar,

                                    by "hiding" things from the programmer I did not mean it literally, but in the sense that wrappers such as stdcl aim at making the code easier to read and putting it in a style more familiar to programmers. This definitely involves less code, and it inevitably means either losing functionality by having API calls with fewer parameters, or making automatic assumptions that are applied every time anyone uses a given API call.

                                    Personally, I do not like to read other people's code to see what the library does exactly. I do not say it is not useful, but I find more hazards in it than explicit benefits. Even I might start to use it if I do not find a neat way to solve error handling, which causes code to be really long. I have opened a topic recently regarding this problem, so I'm open to any suggestions for a customizable and versatile error handling function that requires the fewest lines in the code.
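
                                    A minimal sketch of one common approach to this, not taken from the thread: wrap each call in a macro that checks the returned status and reports the failing call, file and line. The names `CL_CHECK` and `cl_err_name` are hypothetical, only a handful of error codes are listed, and the numeric values are copied from the standard cl.h header so that the sketch compiles without OpenCL installed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Map a few OpenCL status codes to their names. The numeric values mirror
 * those in CL/cl.h; with the real header you would use the CL_* constants. */
static const char *cl_err_name(int err)
{
    switch (err) {
    case   0: return "CL_SUCCESS";
    case  -1: return "CL_DEVICE_NOT_FOUND";
    case  -5: return "CL_OUT_OF_RESOURCES";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL error";
    }
}

/* Hypothetical one-line check: evaluates the call once and, on failure,
 * prints the call text, file, line and error name, then exits. */
#define CL_CHECK(call)                                              \
    do {                                                            \
        int cl_check_err_ = (call);                                 \
        if (cl_check_err_ != 0) {                                   \
            fprintf(stderr, "%s:%d: %s failed: %s (%d)\n",          \
                    __FILE__, __LINE__, #call,                      \
                    cl_err_name(cl_check_err_), cl_check_err_);     \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)
```

                                    With the real API, a call that returns a status would then be written as `CL_CHECK(clFinish(queue));`, which costs one line per call. Calls that return the status through an output parameter need a second line to check it.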

                                      • Using stdcl to fork to gpu
                                        dar

                                        Meteorhead,

                                        Yes, I dislike reading other people's code also, I must admit, although there are some exceptions.  By "hide" I just meant that there is no attempt to hide straightforward code behind smoke-and-mirrors to make it seem like magic, and to point out that the code is open-source.  It's just an interface, and a rather dull one at that, since its inspiration comes from basic UNIX calls that have been around for a very long time.  I do not like complex interfaces that attempt to automate too much.   As for loss of functionality or efficiency, avoiding it was also a design goal.  Many people have suggested that they cannot do X, only to find out that in fact they can.  If there are instances where the interface fails, comments are surely welcome.  It's an active project.  

                                        Rather than read the source code, there is a reference manual.  http://www.browndeertechnology.com/docs/stdcl-manual.html

                                        Error handling.  This may not be what you are looking for, but I avoid inserting error checking entirely during development and use a tool that we created over a year ago:  cltrace.  It's designed to behave exactly like strace, but for OpenCL.  It intercepts every host-code call and reports arguments and errors.  It will time each call as an option.  No code instrumentation is required.  You just run your code with it.  It's more of an expert tool in that you must know something is not working, and then you can use it to dump all the error codes.  It's relatively easy to scan the output and see where errors start to appear.  Not very sophisticated, but it could be used as a starting point for a more user-friendly tool.  Something we are considering.  If interested, it's free.  Go to the github link.

                                        Thanks for your comments.  I appreciate your point of view regarding the use of APIs. 

                                        -DAR

                                          • Using stdcl to fork to gpu
                                            Meteorhead

                                            That sort of error handling is a good idea, although I was looking for something more "incorporated". This is good for development, but I come across the problem (unfortunately quite often) that I develop something for Radeons, and I get some error with the NV implementation. Or some problem only arises under Windows...

                                            This is the curse of aiming for cross-vendor and cross-platform code. But thanks for the idea.

                                        • Using stdcl to fork to gpu
                                          krishnan

                                          Hey dar

                                          Thanks for your reply. I do like the stdcl approach since it seems a little more familiar to me. That being said, I have the (dis)advantage of not migrating to OpenCL from CUDA. I'm very new to HPC and spent some time talking to folks who know more about the field than I do about whether to start with CUDA or OpenCL. The broad consensus was that OpenCL would probably be the future.

                                          So while the verbosity of OpenCL does take some getting used to, it doesn't hurt me as much as it might hurt someone who shifted from using CUDA. And while stdcl does seem to make things much simpler, I think it's a good idea for me as a beginner to play with and get to know the base API while I'm dealing with small problems. That'll probably help me shift to something like stdcl (if I still feel the need) when I begin to tackle much bigger problems.