
stroia
Staff

AMD APP SDK 2.5 now available!

AMD APP SDK 2.5 Provides Enhanced Performance and Major New Capabilities

We just released an updated version of the AMD APP SDK! Check out these blog posts for details on the new features:

1. AMD APP SDK 2.5 Provides Enhanced Performance and Major New Capabilities

2. CPU-to-GPU data transfers exceed 15GB/s using APU zero copy path

In other OpenCL news, AMD will be at SIGGRAPH next week offering OpenCL courses and training on using gDEBugger, an OpenGL and OpenCL debugger and memory analyzer. You might also be interested in reading this blog post about the OpenCL Programming Guide, a book that goes into the details of the OpenCL 1.1 spec, case studies, and techniques. And, last but not least, ALL of the session videos from AFDS are online now!

0 Likes
32 Replies
uelkfr
Journeyman III

Thank you!

Is gDEBugger compatible only with ATI cards? I mean, does it have x86 support?

0 Likes


Wow, I just installed it while upgrading the drivers to Catalyst 11.7... Result: my own test application more than doubled its performance on the GPU!

0 Likes


Originally posted by: rbarreira Wow, I just installed while upgrading the drivers to Catalyst 11.7... Result: my own test application more than doubled in performance while using the GPU!


This release is seriously _off the hook_! I observe a twofold performance boost on both the 5870 and my E-350 laptop! Not in some test application, but in my actual production simulation code!

What did you guys do to slow it down before? 😉

0 Likes

Headless operation on Debian 64-bit does not seem to work. Without a running X server, clinfo shows only the CPU; as soon as I start the X server (and set the proper DISPLAY env var), clinfo shows the GPU that is running X. Please note that with older drivers/SDK I could access BOTH my GPUs as soon as the X server was running on one of them. This time I had to change my xorg.conf and add a second screen to get both GPUs working.

This sucks. Our crunching boxes are black boxes - I have to spend resources on a display and an X server which I never use. And I bet that even eats some GPU resources, which I really need.

So actually this is a step back from headless. Maybe AMD's definition of headless differs, but to me it implies not having to run an X server. Does headless mean to AMD that I can run a window-less app on a graphical UI? If so... y-a-y...

Please correct me if I am wrong. Running Debian here.

PS: before you flame me about using GPUs on Linux without X: with NVIDIA's drivers it works without X... you just need to load the kernel module.

clinfo output when X-server with 2 screens is running:

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 3

Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Device Topology: PCI[ B#3, D#0, F#0 ]
Max compute units: 18
Max work items dimensions: 3
  Max work items[0]: 256
  Max work items[1]: 256
  Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 2
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 700Mhz
Address bits: 32
Max memory allocation: 268435456
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: No
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: No
Queue properties:
  Out-of-Order: No
  Profiling : Yes
Platform ID: 0x7f252f63d060
Name: Cypress
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.1
Driver version: CAL 1.4.1457
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)

Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Device Topology: PCI[ B#5, D#0, F#0 ]
Max compute units: 18
Max work items dimensions: 3
  Max work items[0]: 256
  Max work items[1]: 256
  Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 2
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 2
Max clock frequency: 700Mhz
Address bits: 32
Max memory allocation: 268435456
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: No
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 1073741824
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: No
Queue properties:
  Out-of-Order: No
  Profiling : Yes
Platform ID: 0x7f252f63d060
Name: Cypress
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.1
Driver version: CAL 1.4.1457
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)

Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 8
Max work items dimensions: 3
  Max work items[0]: 1024
  Max work items[1]: 1024
  Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 3423Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: Yes
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 8377356288
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: Yes
Queue properties:
  Out-of-Order: No
  Profiling : Yes
Platform ID: 0x7f252f63d060
Name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.1
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)

0 Likes

Originally posted by: sonicx headless operation on debian 64bit does not seem to work.  clinfo without running x-server shows only CPU, as soon as i start the x-server (and set the proper DISPLAY env-var) clinfo will show the GPU running the X. please note that with older drivers/SDK i could access BOTH my GPUs as soon as the x-server would run with one of them. i had to change my xorg.conf and add a second screen to get both GPUs working.

this sucks. our crunchin-boxes are blackboxes - i have to spend resources on a display and a x-server which i never use. and i bet that even eats some GPU-resources, which i really need.

so actually this is a step back from headless. now maybe AMD's definition of headless differs - but to me that does somehow imply not having to run a x-server. is it that headless means to AMD that i can run a window-less app on a graphical ui? if so.... y - a - y ....

please correct me if i am wrong. running debian here.

ps: before you flame me bout using GPUs on linux without X: with NVIDIA's drivers it works without X... just need to load the kernelmodule.

Try on Ubuntu or RHEL. Debian is not supported.

0 Likes

It works for me (debian testing, x86_64).

Well, for some reason it does not work when I try from a local tty _while_ the X server is running, but I have no problems ssh'ing into the system and running apps with no DISPLAY set.

My biggest concern, though, is multi-GPU. Please fix GPU_USE_SYNC_OBJECTS. That 100% CPU usage is killing performance and increasing power consumption, which is bad bad bad 😞

0 Likes

Originally posted by: gat3way It works for me (debian testing, x86_64).

Well for some reason it does not work when I try from a local tty _while_ X server is running, but I have no problems ssh'ing the system and running apps with no DISPLAY set.

My biggest concern though is multi-gpu. Please fix GPU_USE_SYNC_OBJECTS. That 100% CPU usage is killing performance and increasing power consumption which is bad bad bad 😞

The GPU_USE_SYNC_OBJECTS issue will be fixed in upcoming drivers. Please check the driver release notes to see whether it has been fixed.

0 Likes

Very glad to hear that. I thought I would have to wait months until the new SDK came out...

0 Likes

Originally posted by: gat3way It works for me (debian testing, x86_64).

Well for some reason it does not work when I try from a local tty _while_ X server is running, but I have no problems ssh'ing the system and running apps with no DISPLAY set.

My biggest concern though is multi-gpu. Please fix GPU_USE_SYNC_OBJECTS. That 100% CPU usage is killing performance and increasing power consumption which is bad bad bad 😞

That sounds promising, same setup here. SSHing too. Installing another OS isn't really an option for me. Long term maybe, but not right now. I guess I'll try reinstalling drivers and so on some more. Maybe I have some leftovers in there causing trouble. Looking forward to the multi-GPU fix too.

But apart from these "minor" issues I really have to give out some respectz to the AMD OpenCL crew - right now I have no regrets about switching over from NVIDIA hardware. The support here is much better, and I have a feeling that they are actually developing with the future of OCL in mind, instead of trying to force-feed some proprietary API down our throats. Thanks and keep it comin', AMD!

0 Likes
barno
Journeyman III

Support for headless GPU operation.


Do I need to do anything to make that work? My GPU only showed up after I extended the desktop onto the monitor connected to it.

0 Likes

Originally posted by: barno
Support for headless GPU operation.

Do I need to do anything to make that work? My GPU only came up after extending the desktop onto the Monitor connected to it.

There is no need to do anything extra. Remove the display and run clinfo. Please paste the clinfo log here.

0 Likes

Don't use AMD APP with Ubuntu 11.04 (Natty) or Debian testing. A few things are broken, like offline compilation, OGL/OCL interoperability and maybe multi-GPU.

0 Likes

Originally posted by: genaganna
Originally posted by: barno
Support for headless GPU operation.

Do I need to do anything to make that work? My GPU only came up after extending the desktop onto the Monitor connected to it.

No need to do any extra thing. Remove display and run clinfo. Please paste clinfo log here.

I needed to upgrade to Catalyst 11.7, and then it worked... makes sense 🙂

0 Likes
barno
Journeyman III

After installing SDK 2.5 and Catalyst 11.7, my HD 6970 (Cayman) now shows up with:

GLOBAL_MEM_SIZE: 800MB
MAX_MEM_ALLOC_SIZE: 204800KB

My HD 6870 (Barts) shows exactly the same values.

Before, my HD 6970, for example, came up with:

GLOBAL_MEM_SIZE: 1024MB
MAX_MEM_ALLOC_SIZE: 262144KB

For my HD 6870 it's actually an improvement... but I very much enjoyed the MAX_MEM_ALLOC_SIZE of 262144KB on my HD 6970. Is that intended, and if so, why?
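The numbers in this post follow a simple pattern: in both the old and the new readings, MAX_MEM_ALLOC_SIZE is exactly one quarter of GLOBAL_MEM_SIZE (1024 MB / 4 = 256 MB = 262144 KB; 800 MB / 4 = 200 MB = 204800 KB). One quarter, with a 128 MB floor, is also the minimum the OpenCL 1.1 full profile allows for CL_DEVICE_MAX_MEM_ALLOC_SIZE, so the smaller per-allocation limit appears to follow from the smaller reported global memory. This does not explain why the driver now reports 800 MB. A minimal sketch of that rule (the helper name min_max_alloc_bytes is hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: the smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE that the
   OpenCL 1.1 full profile permits for a given CL_DEVICE_GLOBAL_MEM_SIZE,
   i.e. max(global / 4, 128 MB). All values are in bytes. */
uint64_t min_max_alloc_bytes(uint64_t global_mem_bytes)
{
    const uint64_t floor_bytes = 128ull * 1024 * 1024; /* 128 MB */
    const uint64_t quarter = global_mem_bytes / 4;
    return quarter > floor_bytes ? quarter : floor_bytes;
}
```

For 800 MB of global memory this gives 200 MB (204800 KB), and for 1024 MB it gives 256 MB (262144 KB), matching both readings above and the "Max memory allocation: 268435456" line in the clinfo dump earlier in the thread.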

0 Likes
Raistmer
Adept II

After installation there were some warnings, but pressing the "show log" button shows nothing. IT tries to open a new tab in the browser but immediately closes it.
That is, it's impossible to read the installation log after the Express SDK 2.5 installation.
0 Likes

Originally posted by: Raistmer After installation ther were some warnings, but pressing "show log" button shows nothing. IT tries to create new tab in browser but immediately close it. That is, it's impossible to read installation log after Exepress SDK 2.5 installation.

Could you please provide your OS information?

What does IT mean?

0 Likes

Originally posted by: genaganna

Originally posted by: Raistmer After installation ther were some warnings, but pressing "show log" button shows nothing. IT tries to create new tab in browser but immediately close it. That is, it's impossible to read installation log after Exepress SDK 2.5 installation.

Could you please give OS information?

IT means?

I uninstalled SDK 2.5 - same problem (no log after pressing the corresponding button). Then I redid the installation - all the same. So it can be reproduced on my PC.

OS is Vista x86.
0 Likes
Raistmer
Adept II

And another issue - I didn't find the CLInfo sample code, only the clinfo.exe binary in the SDK. Why did you stop distributing the sources for device identification?
0 Likes

Originally posted by: Raistmer And another issue - I didn't find CLinfo sample code. Only clinfo.exe binary in SDK. Why you stopped distribute sources for device identification ?

Yes. The code is not shipped with the SDK.

0 Likes

Originally posted by: genaganna
Originally posted by: Raistmer And another issue - I didn't find CLinfo sample code. Only clinfo.exe binary in SDK. Why you stopped distribute sources for device identification ?

Yes. The code is not shipped with the SDK, because we want to ship the binary with the developer packages.

0 Likes

I also cannot get headless GPGPU operation working with Debian testing, Catalyst 11.8 and APP 2.5.

If I ssh into the machine with X started, I can access all the resources by setting the display. If I kill WDM and try, I cannot. If I try to control the cards with aticonfig, I get: Needs to be run with X running ?at

By headless operation, do you mean that you don't need a screen? Or that an X server is not needed?

0 Likes

Originally posted by: genaganna

Originally posted by: genaganna
Originally posted by: Raistmer And another issue - I didn't find CLinfo sample code. Only clinfo.exe binary in SDK. Why you stopped distribute sources for device identification ?

Yes. Code is not shipped with SDK. Because we want ship binary with developer packages.

Having source code for device identification simplifies app development; it's a common routine IMHO. Having a separate pre-compiled binary is good, but not having the device identification code... is bad IMHO, especially since it was already written...
0 Likes

I liked using clinfo for quickly checking local memory sizes, etc., since I have two different platforms with ATI cards to run my code on. Now I need to look up my card specifications every time 😞

0 Likes

Originally posted by: fpaboim

I liked using clinfo for quickly checking local memory sizes, etc. since I have two different platforms with ati cards to run my code. Now I need to look for my card specifications every time 😞

The binary is still available in the SDK, so you can check GPU capabilities as before... but a new SDK user who wants to query the GPU inside his own program would be better off using the old SDK samples with the CLInfo code...
0 Likes

Nice! Thanks for the info 🙂

0 Likes

raistmer,

I think OpenCL has a fairly general way of querying device/platform properties.

And almost all samples query some device property or other, so I don't think removing the clInfo code is so bad that users would want to fall back to previous SDKs.

And of course the output from clInfo.exe will now be more reliable, as users can't modify the code. So it is better for troubleshooting problems.

0 Likes

Originally posted by: himanshu.gautam

raistmer,

I think opencl has a fairly general way of querying device/platform properties.

And almost all samples query some or the other device properties, so I don't think clInfo code removal is such a bad idea that user might think to use previous SDKs.

And ofcourse output from clInfo.exe would now be more reliable as user can't modify the code. So it is better for troubleshooting problems.

Programmers usually use libraries so they don't have to type common code from scratch.
There is a quite standard set of queries, and precisely because it is so standard, I propose putting that example back. It's easier to modify already-typed text to your own needs than to type those few pages of queries from scratch in each and every project that needs GPU info... look at it as a template; do you use templates sometimes? 😉

Surely it's not worth staying with SDK 2.4 just for this, but it is worth making a few clicks to download/extract the older CLInfo sample rather than exercising my own fingers typing a few pages of common queries. I prefer to type unique code instead 😛

And about security: a weak argument IMHO. Just rename ClInfo.exe to ClInfo_genuine.exe and distribute it in this form, and you will get exactly the same level of confidence in the listed info.
0 Likes
muyiwamc2
Journeyman III

Hi,

Do you support the HD4290 onboard GPU? I just installed APP SDK 2.5 and all the samples keep giving "gpu not found" and defaulting to the CPU.

Does the SDK support the HD4290?

0 Likes

Originally posted by: muyiwamc2 Hi,

do you support the HD4290 onboard gpu. Just installed APP SDK 2.5 and all the samples keep giving "gpu not found " and defaulting to the CPU.

does the SDK support the HD4290?

OpenCL is not supported on the HD4290.

0 Likes
kunzjacq
Journeyman III

I'd like to know the status of multi-GPU support on Linux. Is it possible to use both GPUs of a 6990 card simultaneously with SDK 2.5?

thank you for your help.

0 Likes