6 Replies Latest reply on Jun 1, 2009 9:11 AM by sarnath

    How do I write a Kernel?

    sarnath
      Newbie question

      I am a newbie and would like to know how to write a kernel purely on top of CAL.

      I understand that Brook+ is the high level interface for writing kernels.

      I want to write directly on top of CAL. Is it done in assembly?

      I can't figure out much from the Streams user guide; I find its contents very difficult to decipher.

        • How do I write a Kernel?
          ryta1203

          Use the CAL API with IL (AMD's Intermediate Language). Look at the samples and the documentation; you're not going to get ANYTHING about IL from the User's Guide (sadly). AMD is not very good at documenting or supporting the things they do.

            • How do I write a Kernel?
              sarnath

              Ryta1203,

              Thanks for the quick answer.

              I am basically a CUDA guy and am trying to understand AMD Streams.

              I understand that a kernel instance basically corresponds to one index in the output stream. So, what if there are multiple output streams?

              Also, is it possible to access neighbouring elements in a stream? For example, by getting indexof(stream), incrementing one of its components, and then accessing the stream at that index?

              I am just finding it too limiting to concentrate on only one element per kernel instance. (I understand that this is why we have streamGroup and other primitives to group elements of a stream.)

              Appreciate your time!

              Thanks & Best Regards,

              Sarnath

                • How do I write a Kernel?
                  gaurav.garg

                  If you are using color buffers (regular streams in Brook+) as output, the position of output is always determined by kernel instance.

                  If you want to write to arbitrary locations and index them yourself, you should use global buffers (scatter streams in Brook+).
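
                  To make the distinction concrete, here is a minimal CPU-side sketch in C++. It is not Brook+ or CAL code: `run_stream_kernel` and `run_scatter_kernel` are hypothetical helpers invented for this illustration, and they only emulate the two output models described above, assuming one loop iteration stands in for one kernel instance.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// CPU emulation only: hypothetical helpers, not Brook+ or CAL calls.

// Stream-style output: instance i can only produce the value for out[i].
std::vector<float> run_stream_kernel(
    const std::vector<float>& in,
    const std::function<float(float)>& kernel) {
    assert(static_cast<bool>(kernel));  // the kernel must be callable
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = kernel(in[i]);  // write position is fixed by the instance index
    }
    return out;
}

// Scatter-style output: each instance returns (address, value) and the
// value is stored wherever the kernel asked.
std::vector<float> run_scatter_kernel(
    const std::vector<float>& in,
    std::size_t out_size,
    const std::function<std::pair<std::size_t, float>(std::size_t, float)>& kernel) {
    assert(static_cast<bool>(kernel));  // the kernel must be callable
    std::vector<float> out(out_size, 0.0f);
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::pair<std::size_t, float> result = kernel(i, in[i]);
        if (result.first < out_size) {
            out[result.first] = result.second;  // out-of-range writes are dropped
        }
    }
    return out;
}
```

                  With the first helper a kernel can only decide what value to write; with the second it also decides where, which is the extra freedom a scatter output gives you.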

                    • How do I write a Kernel?
                      sarnath

                      Thanks for this input. I have not read things fully (as you can see).

                      A few more questions, please. I appreciate your time.

                      1. So, when I use global buffers like b[][], they need to be copied over to GPU memory using CAL APIs, is that right?

                      2. The guide always talks about PCI-e memory and local memory. What exactly are they, and what are their typical sizes? In the CUDA world there is only global memory and shared memory: global is the big fat device RAM, and shared memory is like an L1 cache for the GPU. What are the equivalents on AMD? (Don't mind the CUDA comparisons; if I can map concepts, it will be easy to learn.)

                      3. Is it possible to write a Brook+ kernel that does NOT take a single stream<> argument? Ideally, I would like to configure the offset and size using the domain control and use plain buffers. That way, I can access whatever I want.

                      4. Can a Brook+ kernel access pinned memory? That is, I don't want to copy data onto the GPU; I want the GPU to access my buffers directly. If so, is this supported on all hardware?

                      5. Can I call a Brook+ kernel from a CPP file (using a kernel wrapper)?

                      Thanks for your time,

                      Best Regards,

                      Sarnath

                        • How do I write a Kernel?
                          gaurav.garg

                          1. Global memory resides in device RAM, and if you want to read or write its data from the host, you need to use CAL APIs for the data transfer.

                          2. PCI-e memory is similar to CUDA mapped memory (introduced in CUDA 2.2), and it resides on the host. Local memory means GPU local memory (similar to CUDA global memory). LDS (local data share) is equivalent to CUDA shared memory.

                          3. You can write such Brook+ kernels.

                          4. The ATI Stream Computing User Guide uses "pinned" for a different feature. However, the PCI-e memory that CAL mentions can be used directly in a kernel.

                          5. Yes, you can do that. All the samples under $(BROOKROOT)\samples\CPP do that.
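
                          As a rough illustration of point 3, the following C++ sketch emulates on the CPU a kernel that takes only plain buffers and runs over a domain given by an offset and a size. `run_over_domain` is a hypothetical helper made up for this example, not a Brook+ or CAL function, and the 3-point average is just a stand-in kernel body that reads neighbouring elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation only: run_over_domain is a hypothetical helper,
// not a Brook+ or CAL call.
// The "kernel" body is a 3-point average. It takes no stream argument,
// only plain buffers, and uses its instance index to read neighbours.
void run_over_domain(const std::vector<float>& in, std::vector<float>& out,
                     std::size_t offset, std::size_t size) {
    assert(in.size() == out.size());
    assert(offset + size <= in.size());
    // One loop iteration stands in for one kernel instance.
    for (std::size_t i = offset; i < offset + size; ++i) {
        const float left  = (i > 0) ? in[i - 1] : in[i];              // clamp at left edge
        const float right = (i + 1 < in.size()) ? in[i + 1] : in[i];  // clamp at right edge
        out[i] = (left + in[i] + right) / 3.0f;
    }
}
```

                          Elements outside the chosen domain are left untouched, so the same buffers can be processed in several passes with different offsets and sizes.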

                            • How do I write a Kernel?
                              sarnath

                              Dear Gaurav,

                              Thanks for your time! I get a clear picture now!

                              The PCI-e memory was very confusing. Now it's all clear. You seem to know quite a lot about CUDA. Thanks a lot!

                              I think Brook+ can be executed with a CPU backend as well (just like device emulation in CUDA). I have just downloaded a copy.

                              I think it's gonna be fun!

                              Thanks,

                              Best Regards,

                              Sarnath