Archives Discussions

jcpalmer · ‎02-06-2010

I am speculating on the quarterly part, but for the sake of the actual things, let's say I am in the ball park.

1. Image support (firmly No. 1; the rest are much further back and more of a toss-up). There is simply no workaround for this one.

2. Headless, 1U form factor viability. Said another way, no longer require the card to be attached to a display.

3. SDK optional. Everything needed to run a pre-compiled program is in the display driver.

4. Multi-platform actually working. Earlier this would have been higher, but my priorities have changed.

nou · ‎02-07-2010

What I want in the next release:

1. More than 256MB of memory, and the ability to allocate more buffers than fit in device memory. On the CPU I can allocate 4*1GB memory buffers even though it reports 3GB as the maximum device memory.

2. The OpenCL runtime library put into Catalyst, plus a runtime installer for people without an ATI GPU who want to use the CPU only.

3. Double support.

jcpalmer: I can run OpenCL applications on Linux remotely without an attached monitor.

Mikey · ‎02-07-2010

I'd like to see:

1) support for cl_khr_byte_addressable_store on GPU

2) support for images

but of course the more new stuff, the better

davibu · ‎02-07-2010

1) a GPU profiler for Linux;

2) no more crashes/freezes/kernel faults if you do something wrong in your code. It is quite a pain to reboot and reopen everything;

3) image support;

4) headless support (in case it doesn't work yet) and full access to all memory for additional cards dedicated only to OpenCL;

Fr4nz · ‎02-07-2010

Originally posted by: nou what i want in next release.

1. more than 256MB of memory. and allocate more buffers than is device memory. on CPU i can allocate 4*1GB memory buffer even as it report 3GB a

Hey nou, point 1 violates OpenCL specs...

About "wanted things": a profiler for Linux would be REALLY appreciated...

nou · ‎02-07-2010

Fr4nz: could you point me to where it violates the spec?

What I want is this: say I have a system with 8GB of RAM and a GPU with 256MB of global memory, and a huge working set of around 6GB. I split this data into chunks of 256MB each, then enqueue the kernel with one chunk at a time.

/* each mem[i] is a cl_mem buffer holding one 256MB chunk */
clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem[0]);
clEnqueueNDRangeKernel(queue, kernel, ...);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem[1]);
clEnqueueNDRangeKernel(queue, kernel, ...);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem[2]);
clEnqueueNDRangeKernel(queue, kernel, ...);

The OpenCL runtime would ensure that the appropriate data is loaded into device memory from host memory.
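
The chunking arithmetic described above can be sketched in plain C. This is only an illustration under assumptions: the names chunk_t, chunk_count and chunk_at are hypothetical helpers, not part of OpenCL or the ATI SDK, and no device calls are made here.

```c
#include <assert.h>
#include <stdint.h>

/* One slice of the host-side data set: byte offset and length.
   (Hypothetical type, for illustration only.) */
typedef struct {
    uint64_t offset;
    uint64_t size;
} chunk_t;

/* Number of chunks needed to cover total_bytes; the last one may be short. */
static uint64_t chunk_count(uint64_t total_bytes, uint64_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    return total_bytes / chunk_bytes + (total_bytes % chunk_bytes != 0);
}

/* Describe chunk number `index`; index must be below chunk_count(). */
static chunk_t chunk_at(uint64_t total_bytes, uint64_t chunk_bytes,
                        uint64_t index)
{
    chunk_t c;
    assert(index < chunk_count(total_bytes, chunk_bytes));
    c.offset = index * chunk_bytes;
    c.size = total_bytes - c.offset;   /* bytes left from this offset on */
    if (c.size > chunk_bytes)
        c.size = chunk_bytes;          /* clamp to a full chunk */
    return c;
}
```

With the figures from the post, a 6GB working set in 256MB chunks gives 24 chunks. On the host, each chunk would presumably get its own buffer (clCreateBuffer) and be bound with clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer) before each enqueue.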

What about multi-GPU systems? When I have two or more devices, each with 256MB of memory, I want to run 2*256MB data chunks at the same time. This is another example of why the OpenCL runtime should dynamically load memory objects into device memory as necessary.

And when I enqueue a kernel with memory objects whose combined size exceeds the global memory size, it should return CL_MEM_OBJECT_ALLOCATION_FAILURE or CL_OUT_OF_RESOURCES.

I tried creating 500 1024x1024 textures with OpenGL, which is 2GB in total, and followed the memory usage on the card with GL_ATI_meminfo. Free memory only began to decrease once I used those textures in a draw. Am I wrong to expect similar behaviour from OpenCL?

And yes, a Linux profiler would be appreciated.

spectral · ‎02-08-2010

Hi,

It would be wonderful to have a debugger.

NVidia is working on Nexus... it would be good to have something similar, maybe integrated into Visual Studio 2008 too?

Because right now writing OpenCL code takes 10x more time when the algorithms are a little bit complex!

koveras · ‎02-08-2010

For me there's only one thing that I need:

support for the gcc compiler.

I don't know why ATI doesn't support that compiler, given how widely used it is...

afo · ‎02-09-2010

Would it be possible to release a hotfix that lets us use more than 1/4 of the video board's memory?

It is really annoying to be limited to at most 1/4 of the video memory no matter the number or size of the buffers created with clCreateBuffer...
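
For context, the 1/4 figure lines up with the minimum that the OpenCL 1.0 specification requires for CL_DEVICE_MAX_MEM_ALLOC_SIZE, namely max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128MB). A small sketch of that arithmetic follows; spec_min_max_alloc and fits_single_alloc are hypothetical helper names, and the limit a given driver actually reports has to be read with clGetDeviceInfo.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Smallest per-buffer limit an OpenCL 1.0 device may report:
   max(global memory / 4, 128MB). A driver is free to report more. */
static uint64_t spec_min_max_alloc(uint64_t global_mem_bytes)
{
    uint64_t quarter = global_mem_bytes / 4;
    uint64_t floor_bytes = 128 * MIB;
    return quarter > floor_bytes ? quarter : floor_bytes;
}

/* Whether one buffer request fits under a given per-buffer limit. */
static int fits_single_alloc(uint64_t request_bytes, uint64_t max_alloc_bytes)
{
    assert(max_alloc_bytes > 0);
    return request_bytes <= max_alloc_bytes;
}
```

Under this rule a 512MB card only has to allow 128MB per buffer, and a 1GB card 256MB, so a driver that sticks to the minimum stays within the spec while still being restrictive.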

Alfonso

MicahVillmow · ‎02-08-2010

koveras,
Can you please clarify what you mean by gcc support? Our samples build with GCC on linux and compiling OpenCL applications should have no problem with GCC.

Fr4nz · ‎02-09-2010

Originally posted by: MicahVillmow koveras, Can you please clarify what you mean by gcc support? Our samples build with GCC on linux and compiling OpenCL applications should have no problem with GCC.

He's surely asking to support GCC also under Windows. And it wouldn't be a bad idea IMHO...

bubu · ‎02-10-2010

For me, this is the order:

1. Fix the OpenCL installer, because I currently can't install the ATI OpenCL SDK!!!

2. Image support. The SDK is almost useless without that!

3. Documentation! I would like to see more visual schematics like CUDA's memory coalescing patterns, shared-memory bank conflicts, cache policies, etc.! Put special emphasis on optimization techniques!

For example,

http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_Performance_Notes.pdf

is a good start but lacks info like: how much cache do the textures have? How is it organized? Is a texture2d_t(float3) efficient? Would a texture2d_t(float) be better? Is it better to fetch a float4 image or to fetch a float one 4 times? How should I access the constant buffers for optimal access? Sequentially? Randomly? Does the local memory have a "broadcast" mechanism like CUDA's? etc. etc.

Another thing I want to see there is a clear table showing the different R600, 700 and 800 capabilities, work-group/wavefront sizes, etc... Exactly like CUDA's, where you can find a table with the number of multiprocessors, compute capabilities, shared memory size, etc...

It's critical to get very good documentation when you start a new API. If not, we will be lost, completely.

Btw... idea: a "GPU Gems" book but in ATI's way, with a lot of pages dedicated to DX11, DirectCompute and OpenCL. Write that book, NOW!

4. I need to allocate 1GB of VRAM, not the 128MB max currently allowed! If my card has 2GB... why can't I allocate that amount (more or less, excluding the framebuffer) in a 1D linear buffer? OpenGL 3.2 specifies "jumbo" textures too... why can't I get that in OpenCL???

5. A debugger (yes, that includes code running on the GPU, not only on the CPU!). I would do a stand-alone debugger .exe. You could integrate it into Eclipse, VS, Xcode, etc., but that would be more work for you. Just create a portable standalone Qt/wxWidgets debugger and voilà. Something like DX's PIX, where I could even watch the textures!

Btw, idea: add a DEBUG_FLAG like DX10 and OpenGL 3.2 do when you create a "context". Output validation messages via OutputDebugString(), etc. while you're debugging the program in VS.

6. Improve the profiler (show more data, more warnings, etc...). Also add a static code analyzer like the VS one: detect possible branch flushes/divergence, incorrect memory coalescing, accuracy loss due to casts, launching a kernel with a size not divisible by the wavefront size, reads/writes from/to unaligned memory, etc...

7. Multicore CPU support. Get those Phenom 2's cores hot omg!

8. A virtual memory/paging system. We have AGP and PCI Express... use them! I personally would add a modifier to each buffer... something like "shared" to indicate the memory could be shared by the GPU and host.

Sorry to say, but the average 512MB installed on a GPU is faaaaaaaaar from enough for some computations! That 400M-poly Shrek 4 model won't fit in VRAM for my GPGPU ray tracing renderer, even with a 4GB card, nope! But... it will fit my Phenom 2 with x64 and 32GB of DDR3 mapped through PCI Express... it will be slower... but it will work!

9. Give us a driver/SDK update EACH month (until it is more or less bug-free and well optimized). I personally don't like being stalled waiting for that critical bug to be solved...
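
On the "size not divisible by the wavefront size" check mentioned in point 6, a common host-side guard is to pad the global work size up to the next multiple of the group size. The sketch below assumes a hypothetical helper name (round_up_global_size); the value 64 in the usage note is an assumed wavefront size, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Round work_items up to the next multiple of group_size.
   Returns work_items unchanged when it is already a multiple. */
static size_t round_up_global_size(size_t work_items, size_t group_size)
{
    size_t remainder;
    assert(group_size > 0);
    remainder = work_items % group_size;
    return remainder == 0 ? work_items
                          : work_items + (group_size - remainder);
}
```

For example, 1000 work items with a group size of 64 would be padded to 1024. The kernel then has to compare get_global_id(0) against the real item count and return early for the padding items.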

genaganna · ‎02-10-2010

Originally posted by: bubu For me, this is the order:

1. Fix the OpenCL installer because I currently can't install the ATI OpenCL Sdk !!!

I hope you are able to install the MSIs manually, one at a time.

3. Documentation! I would like to see more visual schematics like the CUDA memory coalescing patterns, shared-memory bank conflicts, cache policies, etc! Put special interest in optimization techniques!
For example, http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_Performance_Notes.pdf
is a good start but lack info like: how much cache have the textures? How is it organized? Is a texture2d_t efficient? Would be better a texture2d_t? How should I access the constant buffers for optimal access? Sequentially? Randomly? Does the local memory have a "broadcast" mechanism like CUDA's one? etc etc
Btw... idea: "GPU Gems" book but in ATI's way: OpenCL gems.
It's critical to get very good documentation when you start a new API. If not, we will be lost, completely.

Presently images are not supported, which is why nothing is explained about textures. The performance document will be improved with every release.

7. Multicore CPU support.

OpenCL supports multicore CPUs.

genaganna · ‎02-10-2010

Originally posted by: Fr4nz
Originally posted by: MicahVillmow koveras, Can you please clarify what you mean by gcc support? Our samples build with GCC on linux and compiling OpenCL applications should have no problem with GCC.

He's surely asking to support GCC also under Windows. And it wouldn't be a bad idea IMHO...

Fr4nz,

A few users have been able to run OpenCL using GCC under Windows. Please see the following post:

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=122916&forumid=9

Fr4nz · ‎02-10-2010

Originally posted by: genaganna

A few users have been able to run OpenCL using GCC under Windows. Please see the following post:

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=122916&forumid=9

I know, but it would be nice to have "official" support, if possible.

Oh, I've a question: do you plan to release a profiler for Linux in the future? Maybe a plugin for Eclipse...? It would be wonderful...

Top 4 Things Wanted for Next Quarterly Release