We just released an updated version of the AMD APP SDK! Check out these blogs for details on the new features:
1. AMD APP SDK 2.5 Provides Enhanced Performance and Major New Capabilities
2. CPU-to-GPU data transfers exceed 15GB/s using APU zero copy path
In other OpenCL news, AMD will be at Siggraph next week offering OpenCL courses and training on using gDEBugger, an OpenGL and OpenCL debugger and memory analyzer. You might also be interested in reading this blog post about the OpenCL Programming Guide, a book that goes into the details of the OpenCL 1.1 spec, case studies, and techniques. And, last but not least, ALL of the session videos from AFDS are online now!
Thank you!
Is gDEBugger compatible only with ATI cards? I mean, does it have x86 support?
Wow, I just installed it while upgrading the drivers to Catalyst 11.7... Result: my own test application more than doubled in performance while using the GPU!
Originally posted by: rbarreira Wow, I just installed it while upgrading the drivers to Catalyst 11.7... Result: my own test application more than doubled in performance while using the GPU!
This release is seriously _off the hook_! I observe a twofold performance boost on both the 5870 and my E-350 laptop! Not in some test application, but on my actual production simulation code!
What had you guys done to slow it down before? 😉
headless operation on debian 64bit does not seem to work. clinfo without running x-server shows only CPU, as soon as i start the x-server (and set the proper DISPLAY env-var) clinfo will show the GPU running the X. please note that with older drivers/SDK i could access BOTH my GPUs as soon as the x-server would run with one of them. i had to change my xorg.conf and add a second screen to get both GPUs working.
this sucks. our crunchin-boxes are blackboxes - i have to spend resources on a display and a x-server which i never use. and i bet that even eats some GPU-resources, which i really need.
so actually this is a step back from headless. now maybe AMD's definition of headless differs - but to me that does somehow imply not having to run a x-server. is it that headless means to AMD that i can run a window-less app on a graphical ui? if so.... y - a - y ....
please correct me if i am wrong. running debian here.
ps: before you flame me bout using GPUs on linux without X: with NVIDIA's drivers it works without X... just need to load the kernelmodule.
clinfo output when X-server with 2 screens is running:

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 3

  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Device Topology: PCI[ B#3, D#0, F#0 ]
  Max compute units: 18
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 2
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 2
  Max clock frequency: 700Mhz
  Address bits: 32
  Max memory allocation: 268435456
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 32768
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 1073741824
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x7f252f63d060
  Name: Cypress
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.1
  Driver version: CAL 1.4.1457
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)

  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Device Topology: PCI[ B#5, D#0, F#0 ]
  Max compute units: 18
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 2
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 2
  Max clock frequency: 700Mhz
  Address bits: 32
  Max memory allocation: 268435456
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 32768
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 1073741824
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x7f252f63d060
  Name: Cypress
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.1
  Driver version: CAL 1.4.1457
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)

  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 4098
  Max compute units: 8
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 0
  Max clock frequency: 3423Mhz
  Address bits: 64
  Max memory allocation: 2147483648
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: No
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 32768
  Global memory size: 8377356288
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Kernel Preferred work group size multiple: 1
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0x7f252f63d060
  Name: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
  Vendor: GenuineIntel
  Device OpenCL C version: OpenCL C 1.1
  Driver version: 2.0
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
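To compare what clinfo reports with and without an X display, as described in the post above, one could count the "Device Type" entries in its output. This is only a rough Python sketch: it assumes clinfo is on the PATH and prints one "Key: value" pair per line, and the helper names and the ":0" display value are made up for illustration, not part of the SDK.

```python
import os
import subprocess
from collections import Counter


def device_types(clinfo_text):
    """Count the values of all 'Device Type:' lines in clinfo-style text."""
    counts = Counter()
    for line in clinfo_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "Device Type":
            counts[value.strip()] += 1
    return counts


def env_with_display(display):
    """Copy the environment, setting DISPLAY or removing it if display is None."""
    env = dict(os.environ)
    env.pop("DISPLAY", None)
    if display is not None:
        env["DISPLAY"] = display
    return env


def probe(display, binary="clinfo"):
    """Run clinfo (assumed to be installed) and count the device types it lists."""
    result = subprocess.run([binary], env=env_with_display(display),
                            capture_output=True, text=True)
    return device_types(result.stdout)
```

Calling `probe(None)` and then `probe(":0")` on the affected machine would show whether the GPU entries only appear once DISPLAY points at a running X server.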
Originally posted by: sonicx headless operation on debian 64bit does not seem to work. clinfo without running x-server shows only CPU, as soon as i start the x-server (and set the proper DISPLAY env-var) clinfo will show the GPU running the X. please note that with older drivers/SDK i could access BOTH my GPUs as soon as the x-server would run with one of them. i had to change my xorg.conf and add a second screen to get both GPUs working.
this sucks. our crunchin-boxes are blackboxes - i have to spend resources on a display and a x-server which i never use. and i bet that even eats some GPU-resources, which i really need.
so actually this is a step back from headless. now maybe AMD's definition of headless differs - but to me that does somehow imply not having to run a x-server. is it that headless means to AMD that i can run a window-less app on a graphical ui? if so.... y - a - y ....
please correct me if i am wrong. running debian here.
ps: before you flame me bout using GPUs on linux without X: with NVIDIA's drivers it works without X... just need to load the kernelmodule.
Try on Ubuntu or RHEL. Debian is not supported.
It works for me (debian testing, x86_64).
Well for some reason it does not work when I try from a local tty _while_ X server is running, but I have no problems ssh'ing the system and running apps with no DISPLAY set.
My biggest concern though is multi-gpu. Please fix GPU_USE_SYNC_OBJECTS. That 100% CPU usage is killing performance and increasing power consumption which is bad bad bad 😞
Originally posted by: gat3way It works for me (debian testing, x86_64).
Well for some reason it does not work when I try from a local tty _while_ X server is running, but I have no problems ssh'ing the system and running apps with no DISPLAY set.
My biggest concern though is multi-gpu. Please fix GPU_USE_SYNC_OBJECTS. That 100% CPU usage is killing performance and increasing power consumption which is bad bad bad 😞
The GPU_USE_SYNC_OBJECTS issue will be fixed in upcoming drivers. Please check the driver release notes to see whether it has been fixed.
Very glad to hear that. I thought I would have to wait for months until the new SDK comes out...
Originally posted by: gat3way It works for me (debian testing, x86_64).
Well for some reason it does not work when I try from a local tty _while_ X server is running, but I have no problems ssh'ing the system and running apps with no DISPLAY set.
My biggest concern though is multi-gpu. Please fix GPU_USE_SYNC_OBJECTS. That 100% CPU usage is killing performance and increasing power consumption which is bad bad bad 😞
That sounds promising, same setup here. SSHing too. Installing another OS isn't really an option for me. Long term maybe, but not right now. Guess I'll try reinstalling drivers and so on some more. Maybe I've got some leftovers in there making trouble. Looking forward to the multi-GPU fix also.
But apart from these "minor" issues i really have to give out some respectz to the AMD OpenCL crew - right now i have no regrets about switching over from NVIDIA hardware. The support here is much better, and i have a feeling that they are actually developing with the future of OCL in mind - instead of trying to force-feed some proprietary API down our throats. Thanks and keep it comin' AMD!
Support for headless GPU operation.
Do I need to do anything to make that work? My GPU only came up after extending the desktop onto the Monitor connected to it.
Originally posted by: barno Support for headless GPU operation.
Do I need to do anything to make that work? My GPU only came up after extending the desktop onto the Monitor connected to it.
No need to do anything extra. Remove the display and run clinfo. Please paste the clinfo log here.
Don't use AMD APP with Ubuntu 11.04 (Natty) or Debian testing. A few things are broken, such as offline compilation, OGL/OCL interoperability, and maybe multi-GPU.
Originally posted by: genaganna Originally posted by: barno Support for headless GPU operation.
Do I need to do anything to make that work? My GPU only came up after extending the desktop onto the Monitor connected to it.
No need to do anything extra. Remove the display and run clinfo. Please paste the clinfo log here.
I needed to upgrade to Catalyst 11.7 and then it worked.... makes sense 🙂
After installing sdk2.5 and Catalyst 11.7 my HD 6970 (Cayman) now shows up with:
GLOBAL_MEM_SIZE: 800MB
MAX_MEM_ALLOC_SIZE: 204800KB
My HD 6870 (Barts) shows exactly the same values.
Before, my HD 6970, for example, came up with:
GLOBAL_MEM_SIZE: 1024MB
MAX_MEM_ALLOC_SIZE: 262144KB
For my HD 6870 it's actually an improvement.... but I very much enjoyed the MAX_MEM_ALLOC_SIZE: 262144KB of my HD 6970. Is that intended, and if so, why?
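A quick sanity check on the figures above: in both the old and the new readings, MAX_MEM_ALLOC_SIZE is exactly one quarter of GLOBAL_MEM_SIZE. The OpenCL 1.1 spec only requires the maximum allocation to be at least max(1/4 of global memory, 128 MB), so the smaller limit seems to follow from the smaller reported global size. That reading is an assumption, not something confirmed in this thread, and it does not explain why the reported global size dropped. A small Python sketch of the arithmetic (the function name is made up for illustration):

```python
def quarter_of_global_kb(global_mem_mb):
    """Return one quarter of a global memory size given in MB, expressed in KB."""
    return global_mem_mb * 1024 // 4


# (GLOBAL_MEM_SIZE in MB, MAX_MEM_ALLOC_SIZE in KB) as reported in the post above.
readings = {
    "SDK 2.5 + Catalyst 11.7": (800, 204800),
    "before the upgrade": (1024, 262144),
}

for label, (global_mb, max_alloc_kb) in readings.items():
    expected_kb = quarter_of_global_kb(global_mb)
    print(f"{label}: 1/4 of {global_mb} MB = {expected_kb} KB, "
          f"matches reported value: {expected_kb == max_alloc_kb}")
```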
Originally posted by: Raistmer After installation there were some warnings, but pressing the "show log" button shows nothing. IT tries to create a new tab in the browser but immediately closes it. That is, it's impossible to read the installation log after the Express SDK 2.5 installation.
Could you please give your OS information?
What does "IT" mean?
Originally posted by: Raistmer And another issue - I didn't find the CLinfo sample code, only the clinfo.exe binary in the SDK. Why did you stop distributing the sources for device identification?
Yes. The code is not shipped with the SDK.
Originally posted by: genaganna Originally posted by: Raistmer And another issue - I didn't find the CLinfo sample code, only the clinfo.exe binary in the SDK. Why did you stop distributing the sources for device identification?
Yes. The code is not shipped with the SDK, because we want to ship the binary with the developer packages.
I also cannot get headless GPGPU operation working with Debian testing, Catalyst 11.8 and APP 2.5.
If I ssh into the machine with X started, I can access all the resources by setting the display. If I kill the WDM and try, I cannot. If I try to control the cards with aticonfig, I get an error. Does it need to be run with X running?
By headless operation, do you mean that no screen is needed, or that no X server is needed?
Originally posted by: genaganna Originally posted by: genaganna Originally posted by: Raistmer And another issue - I didn't find the CLinfo sample code, only the clinfo.exe binary in the SDK. Why did you stop distributing the sources for device identification?
Yes. The code is not shipped with the SDK, because we want to ship the binary with the developer packages.
I liked using clinfo for quickly checking local memory sizes, etc. since I have two different platforms with ati cards to run my code. Now I need to look for my card specifications every time 😞
Originally posted by: fpaboim
I liked using clinfo for quickly checking local memory sizes, etc. since I have two different platforms with ati cards to run my code. Now I need to look for my card specifications every time 😞
Nice! Thanks for the info 🙂
raistmer,
I think OpenCL has a fairly general way of querying device/platform properties.
And almost all samples query one device property or another, so I don't think removing the clInfo code is so bad that users would consider going back to previous SDKs.
And of course the output from clInfo.exe is now more reliable, since users can't modify the code, so it is better for troubleshooting problems.
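For quickly looking up values such as the local memory size from the shipped binary, its text output can be parsed. The Python sketch below does not use the OpenCL query API mentioned above; it only reads clinfo's text, assuming one "Key: value" pair per line and a "Device Type:" line at the start of each device. The function names are hypothetical.

```python
import subprocess


def parse_clinfo(text):
    """Split clinfo-style 'Key: value' text into one dict per device."""
    devices = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        key, value = key.strip(), value.strip()
        if key == "Device Type":
            # Each device section is assumed to start with this key.
            current = {}
            devices.append(current)
        if current is not None:
            current[key] = value
    return devices


def clinfo_devices(binary="clinfo"):
    """Run the clinfo binary (assumed to be on the PATH) and parse its output."""
    result = subprocess.run([binary], capture_output=True, text=True, check=True)
    return parse_clinfo(result.stdout)
```

With that, something like `[(d["Name"], d["Local memory size"]) for d in clinfo_devices()]` would list the local memory size per device, provided the key names match what the installed clinfo prints.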
Hi,
Do you support the HD4290 onboard GPU? I just installed APP SDK 2.5 and all the samples keep reporting "gpu not found" and defaulting to the CPU.
Does the SDK support the HD4290?
Originally posted by: muyiwamc2 Hi,
Do you support the HD4290 onboard GPU? I just installed APP SDK 2.5 and all the samples keep reporting "gpu not found" and defaulting to the CPU.
Does the SDK support the HD4290?
OpenCL is not supported on the HD4290.
I'd like to know the status of multi-GPU support on Linux. Is it possible to use the two GPUs of a 6990 card simultaneously with SDK 2.5?
Thank you for your help.