island
Journeyman III

Suggest Feature you want in AMD APP

Newbie here.

Assembler support for 5xxx/6xxx.

I've developed signal processing applications for another manufacturer's GPU chips. About half of the code, including all of the important kernels, was written in assembler.

I've just dropped by to investigate AMD GPUs and see whether I can do something similar, but I'm disappointed to learn that I can't get closer than IL (though I'm still faintly hoping I've misunderstood). Without proper assembler support, I wouldn't even bother trying to use these processors.

Atmapuri
Journeyman III

Hi!

To reduce the overhead of buffers on Fusion and CPU implementations of OpenCL (which use common shared host memory), it would make sense to let the programmer bypass clCreateBuffer completely and pass a HOST_PTR directly to clSetKernelArg. Currently, buffer handling carries an overhead of several milliseconds (even for minute sizes), whereas clSetKernelArg has an overhead of 1 µs. Are there any special reasons why this should not be possible?

It is my understanding that buffers were introduced to handle split-memory hardware configurations. But when memory is shared, the buffer part of the OpenCL API (and its associated overhead) serves no purpose.

Thanks!
Atmapuri

Meteorhead
Challenger

clCreateBuffer need only be executed once every application run; after that it is only a question of copying and mapping. A few milliseconds is acceptable in my opinion. clSetKernelArg at 1 µs is as good as it can get, as it has to reinterpret a pointer. Even if it is on the same physical memory, kernels use different addresses (which might even be virtual), and thus such calls have to make it through the API (I believe for at least a dozen other reasons too). 1 µs is about the time of 1 clock tick of a CPU, and that's about as fast as it can get.
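For scale, the timing figures in this exchange can be converted into CPU clock cycles. The sketch below is only a back-of-the-envelope helper: the function name `cycles_in_interval` and the 3 GHz (3000 MHz) clock rate used in the note after it are assumptions for illustration, not figures from the thread.

```c
#include <stdint.h>

/* Number of clock cycles that elapse in `nanoseconds` on a clock running
   at `clock_mhz` MHz. One MHz is one cycle per microsecond, so the cycle
   count is nanoseconds * clock_mhz / 1000. Integer arithmetic keeps the
   result exact for the round numbers used here. */
uint64_t cycles_in_interval(uint64_t nanoseconds, uint64_t clock_mhz)
{
    return nanoseconds * clock_mhz / 1000u;
}
```

Under the assumed 3000 MHz clock, `cycles_in_interval(1000, 3000)` gives 3000 cycles for 1 µs, and `cycles_in_interval(1000000, 3000)` gives 3,000,000 cycles for 1 ms. So 1 µs is on the order of thousands of clock ticks rather than one, and a few milliseconds of buffer handling is on the order of millions.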

Atmapuri
Journeyman III

Originally posted by: Meteorhead clCreateBuffer need only be executed once every application run, after that it is only a question of copying and mapping. A few milliseconds is acceptable in my opinion.

It is the copying and mapping that takes a few ms. The size of the setup overhead determines which algorithms can be accelerated with OpenCL. If you only think in terms of HD-image-sized data, then that is indeed fine.

Originally posted by: Meteorhead clSetKernelArg of 1 µs is as good as it can get, as that has to imply to reinterpret a pointer. Even if it is on same physical memory, kernels use different addresses (which might even be virtual), and thus such calls have to make it through the API (I believe at least for a dozen other reasons too). 1 µs is about the time of 1 clock tick of a CPU, and that's about as fast as it can get.


If the memory is the same physical memory, it is the same memory within the same address space. There is no other way to put it 🙂

If some address is virtual, that affects only GPU devices which don't have common memory.


Thanks!
Atmapuri

morganritchie
Journeyman III

Thanks for this informative piece! cheers..;)

LucasCampos
Journeyman III

I'd like to see some built-in random number generators, with a few distributions such as Gaussian and uniform.

xyke
Journeyman III

Originally posted by: Starglider
Originally posted by: Meteorhead I will not be the heretic to copy-paste the feature-list of the new CUDA SDK 4.0, but let me post a link for those who are really curious.

The direct GPU->GPU memcopy, without having to go through host memory, is awesome. However, this feature would be useless in OpenCL without having reliable, performant multi-GPU support first! This is yet more motivation to switch back to CUDA, as the app I am working on would benefit significantly from GPU->GPU DMA.

It should actually be possible to do PCI->PCI DMA, so that one could integrate AMD GPUs with InfiniBand hardware such as http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=116&menu_section=34#tab-one for doing zero-copy RDMA: http://www.google.com/search?q=infiniband+zero+copy+rdma

himanshu_gautam
Grandmaster

Thanks Lucas and xyke for the suggestions.

rougered
Journeyman III

Hi,

The thing I miss the most is BLAS and LAPACK written in OpenCL. Of course, performance should be optimal on ATI cards, but they should be portable to other platforms.

Also, it would be nice if the clpp project were supported more, since I believe it has potential and right now it is very slow on ATI cards.

Thank you,

Riccardo
