14 Replies Latest reply on Feb 10, 2010 9:22 AM by Fr4nz

    Top 4 Things Wanted for Next Quarterly Release

    jcpalmer

      I am speculating on the quarterly part, but for the sake of the actual things, let's say I am in the ballpark.

      1.  Image support (firmly No. 1; the rest are much further back and more of a toss-up).  There is simply no workaround for this one.

      2. Headless, 1U form factor viability.  Said another way, no longer require the card to be attached to a display.

      3. SDK optional.  Everything needed to run a pre-compiled program should ship with the display driver.

      4.  Multi-platform actually working.  Earlier this would have been higher, but my priorities have changed.

        • Top 4 Things Wanted for Next Quarterly Release
          nou

          What I want in the next release:

          1. More than 256MB of memory, and the ability to allocate more buffers than fit in device memory. On the CPU I can allocate 4*1GB memory buffers even though it reports 3GB as the maximum device memory.

          2. Put the OpenCL runtime library into Catalyst, and create a runtime installer for people without an ATI GPU who want to use the CPU only.

          3. Double-precision support.

           

          jcpalmer: I can run OpenCL applications on Linux remotely without an attached monitor.

            • Top 4 Things Wanted for Next Quarterly Release
              Mikey

              I'd like to see:

              1) support for cl_khr_byte_addressable_store on GPU

              2) support for images

              but of course the more new stuff, the better

                • Top 4 Things Wanted for Next Quarterly Release
                  davibu

                  1) a GPU profiler for Linux;

                  2) no more crashes/freezes/kernel faults if you do something wrong in your code. It is quite a pain to reboot and reopen everything;

                  3) image support;

                  4) headless support (in case it doesn't work yet) and full access to all memory for additional cards dedicated only to OpenCL;

                   

                • Top 4 Things Wanted for Next Quarterly Release
                  Fr4nz

                   

                  Originally posted by: nou what i want in next release.

                   

                  1. more than 256MB of memory. and allocate more buffers than is device memory. on CPU i can allocate 4*1GB memory buffer even as it report 3GB a

                   

                  Hey nou, point 1 violates OpenCL specs...

                  About "wanted things": a profiler for Linux would be REALLY appreciated...

                    • Top 4 Things Wanted for Next Quarterly Release
                      nou

                      Fr4nz: could you point me to where it violates the spec?

                      What I want is this: say I have a system with 8GB of RAM and a GPU with 256MB of global memory, and a huge working set of around 6GB. I split this data into chunks of 256MB each, then enqueue the kernel with one chunk at a time.

                      /* a cl_mem argument is passed by address, with size sizeof(cl_mem) */
                      clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem[0]);
                      clEnqueueNDRangeKernel(queue, kernel, ...);
                      clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem[1]);
                      clEnqueueNDRangeKernel(queue, kernel, ...);
                      clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem[2]);
                      clEnqueueNDRangeKernel(queue, kernel, ...);

                      The OpenCL runtime would ensure the appropriate data is loaded into device memory from host memory.
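The chunking arithmetic described in this post can be sketched in plain C. This is only an illustration and not part of any OpenCL API: `chunk_t`, `chunk_count` and `chunk_at` are hypothetical helper names, and the sketch assumes the host splits the data itself and enqueues the kernel once per chunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one chunk of a larger host-side data set. */
typedef struct {
    uint64_t offset; /* byte offset of the chunk within the host data */
    uint64_t size;   /* chunk size in bytes; the last chunk may be smaller */
} chunk_t;

/* Number of chunks needed to cover total_bytes with chunks of at most chunk_bytes. */
static uint64_t chunk_count(uint64_t total_bytes, uint64_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Offset and size of chunk number `index` (0-based). */
static chunk_t chunk_at(uint64_t total_bytes, uint64_t chunk_bytes, uint64_t index)
{
    chunk_t c;
    uint64_t remaining;

    assert(index < chunk_count(total_bytes, chunk_bytes));
    c.offset = index * chunk_bytes;
    remaining = total_bytes - c.offset;
    c.size = remaining < chunk_bytes ? remaining : chunk_bytes;
    return c;
}
```

With the figures from the post, `chunk_count(6ULL << 30, 256ULL << 20)` yields 24 chunks, so the set-argument/enqueue pair would run 24 times on a single 256MB device.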

                      What about multi-GPU systems? When I have two or more devices with 256MB of memory each, I want to run 2*256MB data chunks at the same time. This is another example of why the OpenCL runtime should dynamically load memory objects into device memory as necessary.

                      And when I enqueue a kernel with memory objects whose total size exceeds the global memory size, it should return CL_MEM_OBJECT_ALLOCATION_FAILURE or CL_OUT_OF_RESOURCES.

                      I tried creating 500 1024x1024 textures in OpenGL, 2GB in total, and followed the memory usage on the card with GL_ATI_meminfo. Free memory only began to decrease once I used those textures in a draw. Am I wrong to expect similar behaviour from OpenCL?

                      And yes, a Linux profiler would be appreciated.

                  • Top 4 Things Wanted for Next Quarterly Release
                    MicahVillmow
                    koveras,
                    Can you please clarify what you mean by GCC support? Our samples build with GCC on Linux, and compiling OpenCL applications with GCC should be no problem.
                      • Top 4 Things Wanted for Next Quarterly Release
                        Fr4nz

                         

                        Originally posted by: MicahVillmow koveras, Can you please clarify what you mean by gcc support? Our samples build with GCC on linux and compiling OpenCL applications should have no problem with GCC.


                         

                        He's surely asking to support GCC also under Windows. And it wouldn't be a bad idea IMHO...

                          • Top 4 Things Wanted for Next Quarterly Release
                            bubu

                            For me, this is the order:

                            1. Fix the OpenCL installer, because I currently can't install the ATI OpenCL SDK!!!

                             

                            2. Image support. The SDK is almost useless without that!

                             

                            3. Documentation! I would like to see more visual schematics like the CUDA memory coalescing patterns, shared-memory bank conflicts, cache policies, etc.! Put special emphasis on optimization techniques!

                            For example,

                            http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_Performance_Notes.pdf

                             

                            is a good start but lacks info like: how much cache do the textures have? How is it organized? Is a texture2d_t(float3) efficient? Would a texture2d_t(float) be better? Is it better to fetch a float4 image once or a float one 4 times? How should I access the constant buffers for optimal access? Sequentially? Randomly? Does the local memory have a "broadcast" mechanism like CUDA's? etc. etc.

                             

                            Another thing I want to see there is a clear table showing the different R600, 700 and 800 capabilities, work-group/wavefront sizes, etc. Exactly like CUDA's, where you can find a table with the number of multiprocessors, compute capabilities, shared memory size, etc.

                             

                            It's critical to get very good documentation when you start a new API. If not, we will be lost, completely.

                             

                            Btw... idea: a "GPU Gems" book but in ATI's way, with a lot of pages dedicated to DX11, DirectCompute and OpenCL. Write that book, NOW!

                             

                            4. I need to allocate 1GB of VRAM, not the 128MB max currently allowed! If my card has 2GB... why can't I allocate that quantity (more or less, excluding the framebuffer) in a 1D linear buffer? OpenGL 3.2 specifies "jumbo" textures too... why can't I get that in OpenCL???

                             

                            5. A debugger (yes, that includes code running on the GPU, not only on the CPU!). I would do a stand-alone debugger .exe. You could integrate it into Eclipse, VS, Xcode, etc., but that would be more work for you. Just create a portable standalone Qt/wxWidgets debugger and voilà. Something like DX's PIX, where I could even watch the textures!

                            Btw, idea: add a DEBUG_FLAG like DX10 and OpenGL 3.2 do when you create a "context". Output validation messages via OutputDebugString(), etc. while you're debugging the program in VS.

                             

                            6. Improve the profiler (show more data, more warnings, etc.). Also add a static code analyzer like the VS one: detect possible branch flushes/divergence, incorrect memory coalescing, accuracy loss due to casts, launching a kernel with a size not divisible by the wavefront size, reads/writes from/to unaligned memory, etc.

                             

                            7. Multicore CPU support. Get those Phenom 2's cores hot omg!

                             

                            8. A virtual memory/paging system. We have AGP and PCI express... use them! I personally would add a modifier to each buffer... something like "shared" to indicate the memory could be shared by the GPU/host.

                            Sorry to say, but the average 512MB installed on the GPU is faaaaaaaaar from enough for some computations! That 400M-poly Shrek 4 model won't fit in VRAM for my GPGPU ray tracing renderer, even with a 4GB card, nope! But... it will fit my Phenom 2 with x64 and 32GB of DDR3 mapped through PCI Express... it will be slower... but it will work!

                             

                            9. Give us a driver/SDK update EACH month (until it is more or less bug-free and well optimized). I personally don't like being stalled waiting for that critical bug to be fixed...
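Item 6 in the list above mentions launching a kernel with a size that is not divisible by the wavefront size. A common host-side workaround is to round the global work size up to a multiple of the work-group size and have the kernel skip the extra work-items. The helper below is a minimal sketch in plain C; `round_up_global_size` is a hypothetical name, and the work-group size of 64 used in the note after it is an assumption, not a value taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Round global_size up to the next multiple of group_size.
 * Hypothetical helper: the result would be passed as the global work size,
 * with group_size as the local work size. */
static size_t round_up_global_size(size_t global_size, size_t group_size)
{
    size_t remainder;

    assert(group_size > 0);
    remainder = global_size % group_size;
    return remainder == 0 ? global_size : global_size + (group_size - remainder);
}
```

Assuming a work-group size of 64, `round_up_global_size(1000, 64)` gives 1024. The kernel then needs a guard such as `if (get_global_id(0) >= n) return;` so the 24 padding work-items do not touch memory past the end of the data.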

                              • Top 4 Things Wanted for Next Quarterly Release
                                genaganna

                                 

                                Originally posted by: bubu For me, this is the order:

                                1. Fix the OpenCL installer, because I currently can't install the ATI OpenCL SDK!!!

                                I hope you are able to install the MSIs manually, one at a time.

                                3. Documentation! I would like to see more visual schematics like the CUDA memory coalescing patterns, shared-memory bank conflicts, cache policies, etc.! Put special emphasis on optimization techniques!

                                For example, http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_Performance_Notes.pdf

                                is a good start but lacks info like: how much cache do the textures have? How is it organized? Is a texture2d_t(float3) efficient? Would a texture2d_t(float) be better? How should I access the constant buffers for optimal access? Sequentially? Randomly? Does the local memory have a "broadcast" mechanism like CUDA's? etc. etc.

                                Btw... idea: "GPU Gems" book but in ATI's way: OpenCL gems.

                                It's critical to get very good documentation when you start a new API. If not, we will be lost, completely.

                                Images are not supported at present, which is why nothing is explained about textures.  The performance document will be improved with every release.

                                7. Multicore CPU support.

                                OpenCL supports multicore CPUs.

                              • Top 4 Things Wanted for Next Quarterly Release
                                genaganna

                                 

                                Originally posted by: Fr4nz
                                Originally posted by: MicahVillmow koveras, Can you please clarify what you mean by gcc support? Our samples build with GCC on linux and compiling OpenCL applications should have no problem with GCC.


                                 

                                He's surely asking to support GCC also under Windows. And it wouldn't be a bad idea IMHO...

                                 

                                Fr4nz,

                                        A few users have been able to run OpenCL using GCC under Windows. Please see the following post:

                                        http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=122916&forumid=9