26 Replies. Latest reply on Jan 5, 2010 1:41 AM by genaganna
      • AMD's new release of ATI Stream SDK v2.0 w/ OpenCL(tm) 1.0 Support
        Bullit

        Should I uninstall Beta 4, or can I install over it?

          • genaganna

            Bullit,

            Uninstall the previous one, then install the new one.

              • awkehwe82397rfaowUI

                Hello, I have just tried out the final ATI Stream SDK 2.0, and there appears to be an issue with the SDK and OpenCL. With Beta 4, I could run the samples and use the SDK with the OpenTK and Cloo wrappers for .NET. With the final version of the SDK, however, nothing that tries to create an OpenCL context will run, not even the samples included in the SDK 2.0 final!


                For the included samples in the SDK 2.0 final, I get an error message that says "Error: clCreateContextFromType failed. Error code : CL_INVALID_PLATFORM" when I try to run them.

                There is a thread over at OpenTK where we are discussing this issue: http://opentk.com/node/1463

                Some info on my computer:

                Windows 7 64-bit, Visual Studio 2008, all updates from Windows Update installed, AMD CPU with SSE 3.x, no OpenCL-enabled GPU

                I have also tried uninstalling and reinstalling the SDK 2.0 final, without success; the error message is the same. I suspect a bug in the SDK related to the changed requirements for setting up a context.

                Please advise on how to fix this.

                  • genaganna

                    You need to change your code according to the ICD changes. Please read the ICD article at http://developer.amd.com/support/KnowledgeBase/Lists/KnowledgeBase/DispForm.aspx?ID=71

                      • awkehwe82397rfaowUI

                        The code using the Cloo wrappers for OpenCL has been updated for the final SDK 2.0. Besides, the samples included in the SDK 2.0 final won't even run. It is a problem with the SDK itself; I have tried running updated code without success.

                          • genaganna

                            In previous beta releases, functions such as clGetDeviceIDs() and clCreateContext() accepted a NULL value for the platform parameter. This release no longer allows that: the platform must be a valid one obtained through the platform API (clGetPlatformIDs()).

                            Could you please verify whether the platform you pass to clCreateContextFromType (through the CL_CONTEXT_PLATFORM context property) is a valid platform or NULL?
                              • awkehwe82397rfaowUI
                                Yes, I have double-checked that I'm not simply passing NULL to the context function; I am actually passing the context properties. I have debugged the values in Visual Studio, and there does not appear to be anything wrong with my code. In addition, this bug also affects the code samples shipped with the ATI Stream SDK 2.0 final. Something in the SDK itself causes it to reject the values even when the code comes from the SDK. I think the requirement to pass context properties should have gone through the betas instead of being introduced in a final release without testing.
                                  • genaganna

                                    It seems you have older OpenCL libraries on your system. Make sure your application picks up the correct OpenCL DLL.

                                      • awkehwe82397rfaowUI
                                        I don't see any other copies of the OpenCL libraries on my system. I uninstalled the SDK 2.0 and made sure to delete the ATI folders in Program Files and Program Files (x86). I can confirm that no other OpenCL libraries are left, because when I then try to run the SDK 2.0 samples, they complain that the OpenCL libraries cannot be found. Do you have the SDK 2.0 final running successfully? Is anything about your configuration different from mine?
                                          • genaganna

                                            Are you using the 64-bit installer?

                                            Please read the prerequisites and the section on installing the SDK on Windows systems in the installation notes, then repeat the installation steps given there. Let me know whether you are able to run the samples.

                                            You can find the installation notes at http://developer.amd.com/gpu/ATIStreamSDK/pages/Documentation.aspx

                                            I am able to run the samples on XP64 and Vista64 without any problems.

                                            Edit: After following all the steps, please run CLInfo.exe and Reduction.exe --device cpu, and let me know what errors you get (copy the output here).

                                              • awkehwe82397rfaowUI

                                                Yes, I have verified that I'm using the 64-bit installer from http://developer.amd.com/Downloads/ati-stream-sdk-v2.0-vista-win7-64.exe, and I have Windows 7 64-bit with all the latest patches and Visual Studio 2008 Professional installed. I have just tried reinstalling it again, with no change.

                                                Here is the output from the requested programs:

                                                Number of platforms:                             1
                                                  Plaform Profile:                               FULL_PROFILE
                                                  Plaform Version:                               OpenCL 1.0 ATI-Stream-v2.0.0
                                                  Plaform Name:                                  ATI Stream
                                                  Plaform Vendor:                                Advanced Micro Devices, Inc.


                                                  Plaform Name:                                  ATI Stream
                                                Number of devices:                               1
                                                  Device Type:                                   CL_DEVICE_TYPE_CPU
                                                  Device ID:                                     4098
                                                  Max compute units:                             2
                                                  Max work items dimensions:                     3
                                                    Max work items[0]:                           1024
                                                    Max work items[1]:                           1024
                                                    Max work items[2]:                           1024
                                                  Max work group size:                           1024
                                                  Preferred vector width char:                   16
                                                  Preferred vector width short:                  8
                                                  Preferred vector width int:                    4
                                                  Preferred vector width long:                   2
                                                  Preferred vector width float:                  4
                                                  Preferred vector width double:                 0
                                                  Max clock frequency:                           1900Mhz
                                                  Address bits:                                  64
                                                  Max memeory allocation:                        1073741824
                                                  Image support:                                 No
                                                  Max size of kernel argument:                   4096
                                                  Alignment (bits) of base address:              1024
                                                  Minimum alignment (bytes) for any datatype:    128
                                                  Single precision floating point capability
                                                    Denorms:                                     Yes
                                                    Quiet NaNs:                                  Yes
                                                    Round to nearest even:                       Yes
                                                    Round to zero:                               No
                                                    Round to +ve and infinity:                   No
                                                    IEEE754-2008 fused multiply-add:             No
                                                  Cache type:                                    Read/Write
                                                  Cache line size:                               64
                                                  Cache size:                                    65536
                                                  Global memory size:                            3221225472
                                                  Constant buffer size:                          65536
                                                  Max number of constant args:                   8
                                                  Local memory type:                             Global
                                                  Local memory size:                             32768
                                                  Profiling timer resolution:                    1
                                                  Device endianess:                              Little
                                                  Available:                                     Yes
                                                  Compiler available:                            Yes
                                                  Execution capabilities:
                                                    Execute OpenCL kernels:                      Yes
                                                    Execute native function:                     No
                                                  Queue properties:
                                                    Out-of-Order:                                No
                                                    Profiling :                                  Yes
                                                  Platform ID:                                   00000000010F4598
                                                  Name:                                          AMD Athlon(tm) 64 X2 Dual-Core Processor TK-57
                                                  Vendor:                                        AuthenticAMD
                                                  Driver version:                                1.0
                                                  Profile:                                       FULL_PROFILE
                                                  Version:                                       OpenCL 1.0 ATI-Stream-v2.0.0
                                                  Extensions:                                    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store

                                                C:\Users\Main>"C:\Users\Main\Documents\ATI Stream\samples\opencl\bin\x86_64\Reduction.exe" --device cpu

                                                Input
                                                1 0 1 1 4 1 2 3 5 1 0 1 3 4 5 5 0 3 1 2 2 1 5 3 1 1 4 2 0 0 2 2 2 4 5 5 0 1 5 2
                                                1 3 0 3 2 4 0 4 4 2 1 5 2 5 5 2 0 1 0 5 0 5 3 1 1 0 5 2 3 0 2 2 0 0 5 5 0 0 3 3
                                                4 5 2 0 5 5 3 5 4 3 0 2 4 3 0 0 5 0 0 5 4 0 2 2 3 0 3 1 4 2 2 0 0 2 0 4 4 1 1 5
                                                2 1 5 5 3 2 4 3 1 3 1 2 3 4 1 4 5 4 2 3 4 4 0 2 0 0 0 1 1 5 1 2 5 1 0 0 3 0 0 0
                                                5 5 5 4 0 2 0 2 0 3 5 1 4 4 2 4 1 0 3 5 2 4 4 1 0 3 1 3 1 3 0 3 2 4 0 4 3 4 2 1
                                                4 3 2 1 1 3 5 4 5 5 5 3 1 4 1 4 5 1 5 4 5 2 4 1 0 2 0 2 5 1 1 5 0 1 1 0 3 5 1 0
                                                0 4 2 1 5 0 3 1 2 1 5 4 3 1 0 4 5 4 2 2 0 3 2 5 1 4 2 2 0 5 2 0 2 1 2 1 3 5 3 5
                                                5 0 0 1 3 2 4 4 5 1 2 1 2 0 0 2 4 4 1 3 2 2 0 5 0 2 3 1 5 1 3 2 5 3 0 0 3 1 4 1
                                                0 4 1 5 4 4 0 2 5 2 2 2 4 2 1 2 0 5 1 5 1 1 5 3 0 0 5 3 0 3 2 3 4 4 0 5 0 4 0 3
                                                4 2 0 5 3 1 5 3 5 1 4 3 4 1 2 4 0 3 4 1 5 4 3 1 2 0 1 2 2 3 3 3 3 0 4 3 3 4 1 4
                                                0 4 3 3 5 1 5 0 5 4 3 4 0 2 3 2 2 2 1 3 0 0 2 5 5 0 3 0 3 5 0 0 1 5 4 5 0 3 3 3
                                                2 2 2 5 3 1 3 0 4 3 3 1 3 3 4 1 2 4 4 1 1 1 1 4 0 3 1 0 2 2 0 0 0 1 4 3 0 5 1 3
                                                1 2 2 0 1 3 1 5 5 5 5 1 0 4 2 4 4 0 2 5 2 1 3 5 0 1 3 5 2 0 2 5 4 5 4 3 0 2 0 1
                                                2 5 2 4 0 2 2 2 1 1 4 3 2 2 2 4 1 0 2 0 4 3 2 0 4 4 1 5 4 1 2 0 5 3 5 2 4 5 2 1
                                                5 2 1 0 2 1 3 1 0 0 4 5 3 5 5 0 2 5 1 5 2 2 1 1 5 3 0 2 3 1 1 3 0 3 3 3 0 0 1 4
                                                2 4 0 1 4 0 0 4 1 1 1 3 1 3 3 1 3 3 3 5 3 3 5 1 3 0 2 2 2 1 2 2 1 1 5 2 1 5 4 5
                                                0 0 2 1 3 0 3 2 3 1 0 4 2 3 5 5 5 0 0 1 4 3 4 0 4 2 0 5 4 0 1 3 4 5 1 2 4 2 4 3
                                                4 3 5 3 0 2 1 2 1 0 2 1 1 1 0 4 2 4 1 3 0 3 3 1 0 1 4 1 4 5 4 5 5 0 2 4 5 3 0 4
                                                2 1 1 2 3 4 0 4 3 3 5 2 0 1 0 4 2 4 4 2 0 2 3 0 1 5 5 3 1 2 2 1 4 4 0 2 1 5 1 2
                                                5 4 3 4 1 1 1 4 4 5 2 3 0 2 2 1 4 2 3 0 2 1 5 3 3 1 0 0 0 1 1 4 1 5 4 1 3 1 2 5
                                                4 3 3 3 2 2 4 4 5 0 1 5 2 5 3 3 2 5 1 2 0 1 2 4 2 2 2 5 0 3 3 3 0 1 0 1 4 0 0 4
                                                1 2 2 4 4 4 5 2 0 1 3 4 1 4 5 4 3 1 5 4 5 5 1 2 2 3 1 4 2 0 3 4 2 3 2 5 4 1 4 3
                                                1 4 5 0 0 1 5 3 5 4 2 2 0 3 1 5 4 1 0 5 0 0 1 3 2 2 2 3 1 5 1 5 0 5 5 5 5 2 4 0
                                                0 1 4 2 3 3 2 0 5 2 4 1 1 5 1 1 5 0 5 2 3 4 1 2 3 1 2 5 1 1 1 2 2 0 5 1 3 0 5 4
                                                3 1 4 0 0 2 3 1 2 2 3 2 1 5 3 2 0 1 5 0 4 2 1 1 4 1 5 5 0 1 2 0 4 1 1 4 2 1 0 5
                                                0 4 5 4 0 2 5 1 1 3 5 3 3 4 4 1 5 3 5 0 5 1 2 5


                                                                        BUILD LOG
                                                 ************************************************

                                                Link failed
                                                 ************************************************
                                                Error: clBuildProgram failed. Error code : CL_BUILD_PROGRAM_FAILURE

                                                I have checked that my PATH variable contains C:\Users\Main\Documents\ATI Stream\bin\x86_64;C:\Users\Main\Documents\ATI Stream\bin\x86;C:\Program Files (x86)\ATI Stream\bin\x86_64;C:\Program Files (x86)\ATI Stream\bin\x86, and that the ATISTREAMSDKROOT variable exists and points to C:\Program Files (x86)\ATI Stream\

                                                Can you see anything wrong with my system from this information?

                                                  • genaganna

                                                    awkehwe82397rfaowUI,

                                                    Everything looks OK. Could you please check that you have write permission for the TEMP folder (the folder the TEMP environment variable points to)?

                                                    Were the Beta 4 samples working on your system?

                                                      • awkehwe82397rfaowUI

                                                        Yep, I've checked: I can read and write to the TEMP folder for my user profile, and I can also execute programs from within it. The Beta 4 samples worked fine for me; it's the SDK 2.0 final that broke everything. The other people in this thread, http://opentk.com/node/1463, have the same problem, and they have also noticed that the SDK 2.0 final samples run fine on NVIDIA's OpenCL implementation but not on the ATI Stream SDK 2.0 final. Like me, they had been running the ATI Stream SDK 2.0 Beta 4 successfully.

                                                        Is there some way to get an AMD developer to look into this bug and try the ATI Stream SDK 2.0 final on different machines to reproduce the problem?

                                                          • awkehwe82397rfaowUI

                                                            OK, here is another piece of evidence that something is wrong with the ATI Stream SDK 2.0 final. I have just set up a completely new laptop with Windows 7 64-bit and installed the ATI Stream SDK 2.0 final on it to see whether I could get it to work. The samples from the SDK once again failed to run, giving the same error messages I'm seeing on my old Windows 7 laptop.

                                                              • genaganna

                                                                Are you getting the CL_BUILD_PROGRAM_FAILURE error or the CL_INVALID_PLATFORM error?

                                                                Could you please give us the system details of both laptops?

                                                                System details such as OS (e.g. XP64, SP1), GPU (RV770), CPU (AMD Phenom Quad-Core), OpenCL SDK (ATI Stream SDK v2.0) and driver (catalyst_9.12_hotfix).

                                                                  • awkehwe82397rfaowUI

                                                                    On both systems, I am getting the same message: "Error: clCreateContextFromType failed. Error code : CL_INVALID_PLATFORM" when I try running the Binary Sort sample.

                                                                    I'll list as many of the specs for the new laptop as I can, since I was setting it up for someone else and don't have it with me right now:

                                                                    Specs for this new laptop:

                                                                    Model Name: NP G70-250: http://h10025.www1.hp.com/ewfrf/wc/product?product=3860216&lc=en&cc=us&dlc=&lang=&cc=us

                                                                    OS: Windows 7 Professional 64-bit

                                                                    GPU: Intel GMA 4500M

                                                                    CPU: Intel Pentium Dual-Core T4200 2GHz

                                                                    OpenCL SDK: ATI Stream SDK v2.0 Final 64-bit Vista/7

                                                                    GPU Drivers: integrated Intel, no GPU OpenCL support

                                                                    Specs for the old laptop (also see the info print out in the previous posts):

                                                                    Model Name: Acer Aspire 7520

                                                                    OS: Windows 7 Ultimate 64-bit

                                                                    GPU: integrated, Nvidia GeForce 7000M

                                                                    CPU: AMD Athlon 64 X2 Dual Core TK-57

                                                                    OpenCL SDK: ATI Stream SDK v2.0 Final 64-bit Vista/7

                                                                    GPU Drivers: integrated Nvidia, no GPU OpenCL support

                                                                    What these two laptops have in common is that both use integrated graphics without OpenCL support and both run 64-bit Windows 7.

                                                                      • genaganna

                                                                        awkehwe82397rfaowUI,

                                                                        Previously you said you were getting the CL_BUILD_PROGRAM_FAILURE error.

                                                                        Please replace BinarySearch.cpp with the attached code, compile it, run it with the following option and post the messages printed on the command line:

                                                                        BinarySearch.exe --device cpu

/* ============================================================
Copyright (c) 2009 Advanced Micro Devices, Inc. All rights reserved. Redistribution and use of this material is permitted under the following conditions: Redistributions must retain the above copyright notice and all terms of this license. In no event shall anyone redistributing or accessing or using this material commence or participate in any arbitration or legal action relating to this material against Advanced Micro Devices, Inc. or any copyright holders or contributors. The foregoing shall survive any expiration or termination of this license or any agreement or access or use related to this material. ANY BREACH OF ANY TERM OF THIS LICENSE SHALL RESULT IN THE IMMEDIATE REVOCATION OF ALL RIGHTS TO REDISTRIBUTE, ACCESS OR USE THIS MATERIAL. THIS MATERIAL IS PROVIDED BY ADVANCED MICRO DEVICES, INC. AND ANY COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" IN ITS CURRENT CONDITION AND WITHOUT ANY REPRESENTATIONS, GUARANTEE, OR WARRANTY OF ANY KIND OR IN ANY WAY RELATED TO SUPPORT, INDEMNITY, ERROR FREE OR UNINTERRUPTED OPERATION, OR THAT IT IS FREE FROM DEFECTS OR VIRUSES. ALL OBLIGATIONS ARE HEREBY DISCLAIMED - WHETHER EXPRESS, IMPLIED, OR STATUTORY - INCLUDING, BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, ACCURACY, COMPLETENESS, OPERABILITY, QUALITY OF SERVICE, OR NON-INFRINGEMENT. IN NO EVENT SHALL ADVANCED MICRO DEVICES, INC. OR ANY COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, PUNITIVE, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, REVENUE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED OR BASED ON ANY THEORY OF LIABILITY ARISING IN ANY WAY RELATED TO THIS MATERIAL, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
THE ENTIRE AND AGGREGATE LIABILITY OF ADVANCED MICRO DEVICES, INC. AND ANY COPYRIGHT HOLDERS AND CONTRIBUTORS SHALL NOT EXCEED TEN DOLLARS (US $10.00). ANYONE REDISTRIBUTING OR ACCESSING OR USING THIS MATERIAL ACCEPTS THIS ALLOCATION OF RISK AND AGREES TO RELEASE ADVANCED MICRO DEVICES, INC. AND ANY COPYRIGHT HOLDERS AND CONTRIBUTORS FROM ANY AND ALL LIABILITIES, OBLIGATIONS, CLAIMS, OR DEMANDS IN EXCESS OF TEN DOLLARS (US $10.00). THE FOREGOING ARE ESSENTIAL TERMS OF THIS LICENSE AND, IF ANY OF THESE TERMS ARE CONSTRUED AS UNENFORCEABLE, FAIL IN ESSENTIAL PURPOSE, OR BECOME VOID OR DETRIMENTAL TO ADVANCED MICRO DEVICES, INC. OR ANY COPYRIGHT HOLDERS OR CONTRIBUTORS FOR ANY REASON, THEN ALL RIGHTS TO REDISTRIBUTE, ACCESS OR USE THIS MATERIAL SHALL TERMINATE IMMEDIATELY. MOREOVER, THE FOREGOING SHALL SURVIVE ANY EXPIRATION OR TERMINATION OF THIS LICENSE OR ANY AGREEMENT OR ACCESS OR USE RELATED TO THIS MATERIAL. NOTICE IS HEREBY PROVIDED, AND BY REDISTRIBUTING OR ACCESSING OR USING THIS MATERIAL SUCH NOTICE IS ACKNOWLEDGED, THAT THIS MATERIAL MAY BE SUBJECT TO RESTRICTIONS UNDER THE LAWS AND REGULATIONS OF THE UNITED STATES OR OTHER COUNTRIES, WHICH INCLUDE BUT ARE NOT LIMITED TO, U.S. EXPORT CONTROL LAWS SUCH AS THE EXPORT ADMINISTRATION REGULATIONS AND NATIONAL SECURITY CONTROLS AS DEFINED THEREUNDER, AS WELL AS STATE DEPARTMENT CONTROLS UNDER THE U.S. MUNITIONS LIST. THIS MATERIAL MAY NOT BE USED, RELEASED, TRANSFERRED, IMPORTED, EXPORTED AND/OR RE-EXPORTED IN ANY MANNER PROHIBITED UNDER ANY APPLICABLE LAWS, INCLUDING U.S. EXPORT CONTROL LAWS REGARDING SPECIFICALLY DESIGNATED PERSONS, COUNTRIES AND NATIONALS OF COUNTRIES SUBJECT TO NATIONAL SECURITY CONTROLS. MOREOVER, THE FOREGOING SHALL SURVIVE ANY EXPIRATION OR TERMINATION OF ANY LICENSE OR AGREEMENT OR ACCESS OR USE RELATED TO THIS MATERIAL. NOTICE REGARDING THE U.S. 
GOVERNMENT AND DOD AGENCIES: This material is provided with "RESTRICTED RIGHTS" and/or "LIMITED RIGHTS" as applicable to computer software and technical data, respectively. Use, duplication, distribution or disclosure by the U.S. Government and/or DOD agencies is subject to the full extent of restrictions in all applicable regulations, including those found at FAR52.227 and DFARS252.227 et seq. and any successor regulations thereof. Use of this material by the U.S. Government and/or DOD agencies is acknowledgment of the proprietary rights of any copyright holders and contributors, including those of Advanced Micro Devices, Inc., as well as the provisions of FAR52.227-14 through 23 regarding privately developed and/or commercial computer software. This license forms the entire agreement regarding the subject matter hereof and supersedes all proposals and prior discussions and writings between the parties with respect thereto. This license does not affect any ownership, rights, title, or interest in, or relating to, this material. No terms of this license can be modified or waived, and no breach of this license can be excused, unless done so in a writing signed by all affected parties. Each term of this license is separately enforceable. If any term of this license is determined to be or becomes unenforceable or illegal, such term shall be reformed to the minimum extent necessary in order for this license to remain in effect in accordance with its terms as modified by such reformation. This license shall be governed by and construed in accordance with the laws of the State of Texas without regard to rules on conflicts of law of any state or jurisdiction or the United Nations Convention on the International Sale of Goods. All disputes arising out of this license shall be subject to the jurisdiction of the federal and state courts in Austin, Texas, and all defenses are hereby waived concerning personal jurisdiction and venue of these courts. 
============================================================ */

#include "BinarySearch.hpp"
#include <malloc.h>

/*
 * \brief set up program input data
 */
int BinarySearch::setupBinarySearch()
{
    /* allocate and init memory used by host */
    cl_uint inputSizeBytes = length * sizeof(cl_uint);
    input = (cl_uint *) malloc(inputSizeBytes);
    if(input==NULL)
    {
        sampleCommon->error("Failed to allocate host memory. (input)");
        return SDK_FAILURE;
    }

    cl_uint max = length * 20;
    /* random initialisation of input */
    input[0] = 0;
    for(cl_int i = 1; i < length; i++)
        input[i] = input[i-1] + (cl_uint) (max * rand()/(float)RAND_MAX);

#if defined (_WIN32)
    output = (cl_uint *)_aligned_malloc(sizeof(cl_uint4), 16);
#else
    output = (cl_uint *)memalign(16, sizeof(cl_uint4));
#endif
    if(output==NULL)
    {
        sampleCommon->error("Failed to allocate host memory. (output)");
        return SDK_FAILURE;
    }

    /*
     * Unless quiet mode has been enabled, print the INPUT array.
     */
    if(!quiet)
    {
        sampleCommon->printArray<cl_uint>("Sorted Input", input, length, 1);
    }

    return SDK_SUCCESS;
}

/*
 * \brief OpenCL related initialisations are done here.
 *        Context, Device list, Command Queue are set up.
 *        Calls are made to set up OpenCL memory buffers that this program uses
 *        and to load the programs into memory and get kernel handles.
 *        Load and build OpenCL program and get kernel handles.
 *        Set up OpenCL memory buffers used by this program.
 */
int BinarySearch::setupCL(void)
{
    cl_int status = 0;
    size_t deviceListSize;

    cl_device_type dType;

    if(deviceType.compare("cpu") == 0)
    {
        dType = CL_DEVICE_TYPE_CPU;
    }
    else //deviceType = "gpu"
    {
        dType = CL_DEVICE_TYPE_GPU;
    }

    /*
     * Have a look at the available platforms and pick either
     * the AMD one if available or a reasonable default.
     */
    cl_uint numPlatforms;
    cl_platform_id platform = NULL;
    status = clGetPlatformIDs(0, NULL, &numPlatforms);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clGetPlatformIDs failed."))
    {
        return SDK_FAILURE;
    }
    if (0 < numPlatforms)
    {
        cl_platform_id* platforms = new cl_platform_id[numPlatforms];
        status = clGetPlatformIDs(numPlatforms, platforms, NULL);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clGetPlatformIDs failed."))
        {
            delete[] platforms;
            return SDK_FAILURE;
        }
        for (unsigned i = 0; i < numPlatforms; ++i)
        {
            char pbuf[100];
            status = clGetPlatformInfo(platforms[i],
                                       CL_PLATFORM_VENDOR,
                                       sizeof(pbuf),
                                       pbuf,
                                       NULL);
            if(!sampleCommon->checkVal(status, CL_SUCCESS, "clGetPlatformInfo failed."))
            {
                delete[] platforms;
                return SDK_FAILURE;
            }

            platform = platforms[i];
            if (!strcmp(pbuf, "Advanced Micro Devices, Inc."))
            {
                printf("FOUND AMD platform \n");
                break;
            }
        }
        delete[] platforms;
    }

    /*
     * If we could find our platform, use it. Otherwise pass a NULL and get whatever the
     * implementation thinks we should be using.
     */
    cl_context_properties cps[3] =
    {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platform,
        0
    };
    /* Use NULL for backward compatibility */
    cl_context_properties* cprops = (NULL == platform) ? NULL : cps;

    context = clCreateContextFromType(cprops, dType, NULL, NULL, &status);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateContextFromType failed."))
        return SDK_FAILURE;

    /* First, get the size of device list data */
    status = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &deviceListSize);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clGetContextInfo failed."))
        return SDK_FAILURE;

    /* Now allocate memory for device list based on the size we got earlier */
    devices = (cl_device_id *)malloc(deviceListSize);
    if(devices==NULL)
    {
        sampleCommon->error("Failed to allocate memory (devices).");
        return SDK_FAILURE;
    }

    /* Now, get the device list data */
    status = clGetContextInfo(context, CL_CONTEXT_DEVICES, deviceListSize, devices, NULL);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clGetContextInfo failed."))
        return SDK_FAILURE;

    {
        /* The block is to move the declaration of prop closer to its use */
        cl_command_queue_properties prop = 0;
        if(timing)
            prop |= CL_QUEUE_PROFILING_ENABLE;

        commandQueue = clCreateCommandQueue(context, devices[0], prop, &status);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateCommandQueue failed."))
            return SDK_FAILURE;
    }

    inputBuffer = clCreateBuffer(context,
                                 CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                 sizeof(cl_uint) * length,
                                 input,
                                 &status);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateBuffer failed. (inputBuffer)"))
        return SDK_FAILURE;

    outputBuffer = clCreateBuffer(context,
                                  CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR,
                                  sizeof(cl_uint4),
                                  output,
                                  &status);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateBuffer failed. (outputBuffer)"))
        return SDK_FAILURE;

    /* create a CL program using the kernel source */
    /* The file name is relative, so it is looked up in the current working directory */
    streamsdk::SDKFile kernelFile;
    kernelFile.open("BinarySearch_Kernels.cl");
    const char * source = kernelFile.source().c_str();
    size_t sourceSize[] = { strlen(source) };
    program = clCreateProgramWithSource(context, 1, &source, sourceSize, &status);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateProgramWithSource failed."))
        return SDK_FAILURE;

    /* create a cl program executable for all the devices specified */
    status = clBuildProgram(program, 1, devices, NULL, NULL, NULL);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clBuildProgram failed."))
        return SDK_FAILURE;

    /* get a kernel object handle for a kernel with the given name */
    kernel = clCreateKernel(program, "binarySearch", &status);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateKernel failed."))
        return SDK_FAILURE;

    return SDK_SUCCESS;
}

int BinarySearch::runCLKernels(void)
{
    cl_int   status;
    cl_event events[2];

    size_t globalThreads[1];
    size_t localThreads[1];

    globalThreads[0] = numSubdivisions;
    localThreads[0]  = 1;

    /* Check group size against kernelWorkGroupSize */
    status = clGetKernelWorkGroupInfo(kernel,
                                      devices[0],
                                      CL_KERNEL_WORK_GROUP_SIZE,
                                      sizeof(size_t),
                                      &kernelWorkGroupSize,
                                      0);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clGetKernelWorkGroupInfo failed."))
    {
        return SDK_FAILURE;
    }

    if((cl_uint)(localThreads[0]) > kernelWorkGroupSize)
    {
        std::cout << "Out of Resources!" << std::endl;
        std::cout << "Group Size specified : " << localThreads[0] << std::endl;
        std::cout << "Max Group Size supported on the kernel : "
                  << kernelWorkGroupSize << std::endl;
        return SDK_FAILURE;
    }

    /**
     * Since a plain binary search on the GPU would not achieve much benefit over the CPU,
     * we are doing an N-ary search. We split the array into N segments every pass and therefore
     * get log (base N) passes instead of log (base 2) passes.
     *
     * In every pass, only the thread that can potentially have the element we are looking for
     * writes to the output array. For example, if we are looking to find 4567 in the array, every
     * thread is searching over a segment of 1000 values and the input array is 1, 2, 3, 4, ...,
     * then the first thread is searching in 1 to 1000, the second one from 1001 to 2000, etc.
     * The first one does not write to the output. The second one doesn't either. The fifth one,
     * however, covers 4001 to 5000, so it can potentially hold the element 4567, which lies between them.
     *
     * This particular thread writes to the output the lower bound, upper bound and whether the
     * element equals the lower bound element. So, it would be 4001, 5000, 0.
     *
     * The next pass would subdivide 4001 to 5000 into smaller segments and continue the same process from there.
     *
     * When a pass returns 1 in the third element, it means the element has been found and we can stop executing the kernel.
     * If the element is not found, then the execution stops after looking at a segment of size 1.
     */
    cl_uint globalLowerBound = 0;
    cl_uint globalUpperBound = length - 1;
    cl_uint subdivSize = (globalUpperBound - globalLowerBound + 1)/numSubdivisions;
    cl_uint isElementFound = 0;

    if((input[0] > findMe) || (input[length-1] < findMe))
    {
        output[0] = 0;
        output[1] = length - 1;
        output[2] = 0;

        return SDK_SUCCESS;
    }

    output[3] = 1;

    /*** Set appropriate arguments to the kernel ***/

    /*
     * First argument of the kernel is the output buffer
     */
    status = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&outputBuffer);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clSetKernelArg 0(OutputBuffer) failed."))
        return SDK_FAILURE;

    /*
     * Second argument is input buffer
     */
    status = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&inputBuffer);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clSetKernelArg 1(inputBuffer) failed."))
        return SDK_FAILURE;

    /*
     * Third is the element we are looking for
     */
    status = clSetKernelArg(kernel, 2, sizeof(cl_uint), (void *)&findMe);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clSetKernelArg 2(findMe) failed."))
        return SDK_FAILURE;

    while(subdivSize > 1 && output[3] != 0)
    {
        output[3] = 0;

        /* Enqueue writeBuffer */
        status = clEnqueueWriteBuffer(commandQueue,
                                      outputBuffer,
                                      CL_TRUE,
                                      0,
                                      sizeof(cl_uint4),
                                      output,
                                      0,
                                      NULL,
                                      &events[1]);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clEnqueueWriteBuffer failed."))
            return SDK_FAILURE;

        /* Wait for the write buffer to finish execution */
        status = clWaitForEvents(1, &events[1]);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clWaitForEvents failed."))
            return SDK_FAILURE;

        clReleaseEvent(events[1]);

        /*
         * Fourth argument is the lower bound for the full segment for this pass.
         * Each thread derives its own lower and upper bound from this.
         */
        status = clSetKernelArg(kernel, 3, sizeof(cl_uint), (void *)&globalLowerBound);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clSetKernelArg 3(globalLowerBound) failed."))
            return SDK_FAILURE;

        /*
         * Similar to the above, but it is the upper bound
         */
        status = clSetKernelArg(kernel, 4, sizeof(cl_uint), (void *)&globalUpperBound);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clSetKernelArg 4(globalUpperBound) failed."))
            return SDK_FAILURE;

        /*
         * The size of the subdivision for each thread
         */
        status = clSetKernelArg(kernel, 5, sizeof(cl_uint), (void *)&subdivSize);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clSetKernelArg 5(subdivSize) failed."))
            return SDK_FAILURE;

        /*
         * Enqueue a kernel run call
         */
        status = clEnqueueNDRangeKernel(commandQueue,
                                        kernel,
                                        1,
                                        NULL,
                                        globalThreads,
                                        localThreads,
                                        0,
                                        NULL,
                                        &events[0]);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clEnqueueNDRangeKernel failed."))
            return SDK_FAILURE;

        /* wait for the kernel call to finish execution */
        status = clWaitForEvents(1, &events[0]);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clWaitForEvents failed."))
            return SDK_FAILURE;

        /* release the kernel event so one is not leaked per pass */
        clReleaseEvent(events[0]);

        /* Enqueue readBuffer */
        status = clEnqueueReadBuffer(commandQueue,
                                     outputBuffer,
                                     CL_TRUE,
                                     0,
                                     sizeof(cl_uint4),
                                     output,
                                     0,
                                     NULL,
                                     &events[1]);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clEnqueueReadBuffer failed."))
            return SDK_FAILURE;

        /* Wait for the read buffer to finish execution */
        status = clWaitForEvents(1, &events[1]);
        if(!sampleCommon->checkVal(status, CL_SUCCESS, "clWaitForEvents failed."))
            return SDK_FAILURE;

        clReleaseEvent(events[1]);

        globalLowerBound = output[0];
        globalUpperBound = output[1];

        subdivSize = (globalUpperBound - globalLowerBound + 1)/numSubdivisions;
    }

    for(cl_uint i=globalLowerBound; i<= globalUpperBound; i++)
    {
        if(input[i] == findMe)
        {
            output[0] = i;
            output[1] = i+1;
            output[2] = 1;
            return SDK_SUCCESS;
        }
    }

    /* The findMe element is not found from globalLowerBound to globalUpperBound */
    output[2] = 0;
    return SDK_SUCCESS;
}

/**
 * CPU verification for the BinarySearch algorithm
 */
int BinarySearch::binarySearchCPUReference()
{
    cl_uint globalLowerBound = output[0];
    cl_uint globalUpperBound = output[1];
    cl_uint isElementFound = output[2];

    if(isElementFound)
    {
        if(input[globalLowerBound] == findMe)
            return 1;
        else
            return 0;
    }
    else
    {
        for(cl_int i=0; i< length; i++)
        {
            if(input[i] == findMe)
                return 0;
        }
        return 1;
    }
}

int BinarySearch::initialize()
{
    /* Call base class Initialize to get default configuration */
    if(!this->SDKSample::initialize())
        return SDK_FAILURE;

    /* Now add customized options */
    streamsdk::Option* array_length = new streamsdk::Option;
    if(!array_length)
    {
        sampleCommon->error("Memory allocation error.\n");
        return SDK_FAILURE;
    }

    array_length->_sVersion = "x";
    array_length->_lVersion = "length";
    array_length->_description = "Length of the input array";
    array_length->_type = streamsdk::CA_ARG_INT;
    array_length->_value = &length;
    sampleArgs->AddOption(array_length);

    streamsdk::Option* find_me = new streamsdk::Option;
    if(!find_me)
    {
        sampleCommon->error("Memory allocation error.\n");
        return SDK_FAILURE;
    }

    find_me->_sVersion = "f";
    find_me->_lVersion = "find";
    find_me->_description = "element to be found";
    find_me->_type = streamsdk::CA_ARG_INT;
    find_me->_value = &findMe;
    sampleArgs->AddOption(find_me);

    streamsdk::Option* sub_div = new streamsdk::Option;
    if(!sub_div)
    {
        sampleCommon->error("Memory allocation error.\n");
        return SDK_FAILURE;
    }

    sub_div->_sVersion = "d";
    sub_div->_lVersion = "divisions";
    sub_div->_description = "number of subdivisions";
    sub_div->_type = streamsdk::CA_ARG_INT;
    sub_div->_value = &numSubdivisions;
    sampleArgs->AddOption(sub_div);

    streamsdk::Option* num_iterations = new streamsdk::Option;
    if(!num_iterations)
    {
        sampleCommon->error("Memory allocation error.\n");
        return SDK_FAILURE;
    }

    num_iterations->_sVersion = "i";
    num_iterations->_lVersion = "iterations";
    num_iterations->_description = "Number of iterations for kernel execution";
    num_iterations->_type = streamsdk::CA_ARG_INT;
    num_iterations->_value = &iterations;
    sampleArgs->AddOption(num_iterations);

    return SDK_SUCCESS;
}

int BinarySearch::setup()
{
    if(!sampleCommon->isPowerOf2(length))
        length = sampleCommon->roundToPowerOf2(length);

    if(setupBinarySearch()!=SDK_SUCCESS)
        return SDK_FAILURE;

    int timer = sampleCommon->createTimer();
    sampleCommon->resetTimer(timer);
    sampleCommon->startTimer(timer);

    if(setupCL()!=SDK_SUCCESS)
        return SDK_FAILURE;

    setupTime = (cl_double)(sampleCommon->readTimer(timer));

    return SDK_SUCCESS;
}

int BinarySearch::run()
{
    int timer = sampleCommon->createTimer();
    sampleCommon->resetTimer(timer);
    sampleCommon->startTimer(timer);

    std::cout << "Executing kernel for " << iterations << " iterations" << std::endl;
    std::cout << "-------------------------------------------" << std::endl;

    for(int i = 0; i < iterations; i++)
    {
        /* Arguments are set and execution call is enqueued on command buffer */
        if(runCLKernels()!=SDK_SUCCESS)
            return SDK_FAILURE;
    }

    sampleCommon->stopTimer(timer);
    totalKernelTime = (double)(sampleCommon->readTimer(timer)) / iterations;

    if(!quiet)
    {
        cl_uint globalLowerBound = output[0];
        cl_uint globalUpperBound = output[1];
        cl_uint isElementFound = output[2];
        printf("l = %d, u = %d, isfound = %d, fm = %d\n",
               globalLowerBound, globalUpperBound, isElementFound, findMe);
    }

    return SDK_SUCCESS;
}

int BinarySearch::verifyResults()
{
    if(verify)
    {
        verificationInput = (cl_uint *) malloc(length*sizeof(cl_int));
        if(verificationInput==NULL)
        {
            sampleCommon->error("Failed to allocate host memory. (verificationInput)");
            return SDK_FAILURE;
        }
        memcpy(verificationInput, input, length*sizeof(cl_int));

        /* reference implementation: checks output[] against the host input array */
        int refTimer = sampleCommon->createTimer();
        sampleCommon->resetTimer(refTimer);
        sampleCommon->startTimer(refTimer);
        cl_int verified = binarySearchCPUReference();
        sampleCommon->stopTimer(refTimer);
        referenceKernelTime = sampleCommon->readTimer(refTimer);

        /* compare the results and see if they match */
        if(verified)
        {
            std::cout << "Passed!\n";
            return SDK_SUCCESS;
        }
        else
        {
            std::cout << "Failed\n";
            return SDK_FAILURE;
        }
    }

    return SDK_SUCCESS;
}

void BinarySearch::printStats()
{
    std::string strArray[3] = {"Length", "Time(sec)", "kernelTime(sec)"};
    std::string stats[3];

    totalTime = setupTime + totalKernelTime;

    stats[0] = sampleCommon->toString(length, std::dec);
    stats[1] = sampleCommon->toString(totalTime, std::dec);
    stats[2] = sampleCommon->toString(totalKernelTime, std::dec);

    this->SDKSample::printStats(strArray, stats, 3);
}

int BinarySearch::cleanup()
{
    /* Releases OpenCL resources (Context, Memory etc.) */
    cl_int status;

    status = clReleaseKernel(kernel);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clReleaseKernel failed."))
        return SDK_FAILURE;

    status = clReleaseProgram(program);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clReleaseProgram failed."))
        return SDK_FAILURE;

    status = clReleaseMemObject(inputBuffer);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clReleaseMemObject failed."))
        return SDK_FAILURE;

    status = clReleaseMemObject(outputBuffer);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clReleaseMemObject failed."))
        return SDK_FAILURE;

    status = clReleaseCommandQueue(commandQueue);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clReleaseCommandQueue failed."))
        return SDK_FAILURE;

    status = clReleaseContext(context);
    if(!sampleCommon->checkVal(status, CL_SUCCESS, "clReleaseContext failed."))
        return SDK_FAILURE;

    /* release program resources (input memory etc.) */
    if(input)
        free(input);

    if(output)
    {
#if defined (_WIN32)
        _aligned_free(output);
#else
        free(output);
#endif
    }

    if(devices)
        free(devices);

    if(verificationInput)
        free(verificationInput);

    return SDK_SUCCESS;
}

int main(int argc, char * argv[])
{
    BinarySearch clBinarySearch("OpenCL Binary Search");

    if(clBinarySearch.initialize()!=SDK_SUCCESS)
        return SDK_FAILURE;
    if(!clBinarySearch.parseCommandLine(argc, argv))
        return SDK_FAILURE;
    if(clBinarySearch.setup()!=SDK_SUCCESS)
        return SDK_FAILURE;
    if(clBinarySearch.run()!=SDK_SUCCESS)
        return SDK_FAILURE;
    if(clBinarySearch.verifyResults()!=SDK_SUCCESS)
        return SDK_FAILURE;
    if(clBinarySearch.cleanup()!=SDK_SUCCESS)
        return SDK_FAILURE;
    clBinarySearch.printStats();

    return SDK_SUCCESS;
}
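The N-ary search described in the comment inside runCLKernels() can be modelled on the CPU to see how the passes narrow the range. The sketch below is a hypothetical reconstruction: the function name naryFind is ours, and since BinarySearch_Kernels.cl is not shown here, the segment layout is an assumption (each of numSubdivisions segments holds subdivSize elements, and the last one also takes the remainder).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical CPU model of the N-ary search in runCLKernels().
// Each iteration of the while loop stands in for one kernel pass: [lo, hi]
// is cut into numSubdivisions segments of subdivSize elements (the last
// segment also takes the remainder), and only the first segment whose first
// and last values bracket 'key' survives. The segment layout is an
// assumption; the real kernel source is not shown.
std::optional<uint32_t> naryFind(const std::vector<uint32_t>& a,
                                 uint32_t key, uint32_t numSubdivisions)
{
    assert(std::is_sorted(a.begin(), a.end()) && "input must be sorted");
    // Same early-out as the sample: key outside [a.front(), a.back()].
    if (a.empty() || key < a.front() || key > a.back())
        return std::nullopt;

    uint32_t lo = 0;
    uint32_t hi = static_cast<uint32_t>(a.size()) - 1;
    uint32_t subdivSize = numSubdivisions ? (hi - lo + 1) / numSubdivisions : 0;

    // numSubdivisions > 1 is needed for a pass to shrink the range at all.
    while (numSubdivisions > 1 && subdivSize > 1) {
        bool narrowed = false;
        for (uint32_t t = 0; t < numSubdivisions; ++t) {
            uint32_t segLo = lo + t * subdivSize;
            uint32_t segHi = (t + 1 == numSubdivisions) ? hi
                                                        : segLo + subdivSize - 1;
            if (a[segLo] <= key && key <= a[segHi]) {
                lo = segLo;  // plays the role of the single writing work-item
                hi = segHi;
                narrowed = true;
                break;
            }
        }
        if (!narrowed)
            return std::nullopt;  // key falls between two segments, so it is absent
        subdivSize = (hi - lo + 1) / numSubdivisions;
    }

    // Final linear scan over the remaining range, as the host code does.
    for (uint32_t i = lo; i <= hi; ++i)
        if (a[i] == key)
            return i;
    return std::nullopt;
}
```

With 64 elements and numSubdivisions = 4, for example, two passes shrink the range from 64 to 16 to 4 elements before the final scan, compared with six halving steps for a plain binary search.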

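The platform loop in setupCL() assigns platform on every iteration and stops early only on the AMD vendor string, so when no AMD platform is installed the "reasonable default" is simply the last platform enumerated; with no platforms at all, NULL properties are passed to clCreateContextFromType. The hypothetical pickPlatform() below replays that selection on a plain list of vendor strings, standing in for the clGetPlatformIDs/clGetPlatformInfo calls, so the behaviour can be checked without an OpenCL runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical replay of the platform loop in setupCL().
// 'vendors' stands in for the CL_PLATFORM_VENDOR string of each platform,
// in the order clGetPlatformIDs returned them. Every visited platform is
// remembered and the loop stops early on the AMD vendor string, so without
// an AMD platform the last one enumerated is chosen. An empty result
// corresponds to the sample passing NULL context properties.
std::optional<std::size_t> pickPlatform(const std::vector<std::string>& vendors)
{
    std::optional<std::size_t> chosen;
    for (std::size_t i = 0; i < vendors.size(); ++i) {
        chosen = i;
        if (vendors[i] == "Advanced Micro Devices, Inc.")
            break;
    }
    return chosen;
}
```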
                                                                          • AMD's new release of ATI Stream SDK v2.0 w/ OpenCL(tm) 1.0 Support
                                                                            genaganna

                                                                            awkehwe82397rfaowUI,

Could you please give more details about your OS, such as the service pack, whether it is a beta or a released version, and the build number?

                                                                              • AMD's new release of ATI Stream SDK v2.0 w/ OpenCL(tm) 1.0 Support
                                                                                awkehwe82397rfaowUI

Okay, I've determined what is causing this bug. In all the OpenCL samples shipped with the ATI Stream SDK 2.0 Final, there is a logic bug in every int SAMPLE::setupCL(void), where the code is:

                                                                                 

                                                                                    if(deviceType.compare("cpu") == 0)
                                                                                    {
                                                                                        dType = CL_DEVICE_TYPE_CPU;
                                                                                    }
                                                                                    else //deviceType = "gpu" 
                                                                                    {
                                                                                        dType = CL_DEVICE_TYPE_GPU;
                                                                                    }

                                                                                 

when in fact it should be:

                                                                                 

                                                                                    if(deviceType.compare("cpu") == 1)
                                                                                    {
                                                                                        dType = CL_DEVICE_TYPE_CPU;
                                                                                    }
                                                                                    else //deviceType = "gpu" 
                                                                                    {
                                                                                        dType = CL_DEVICE_TYPE_GPU;
                                                                                    }

                                                                                 

                                                                                The zero should be a one and everything works! I think this calls for a minor SDK release to fix this bug with the samples.
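On the compare() question: std::string::compare returns zero when the two strings are equal and a negative or positive value otherwise, and the C++ standard only specifies the sign of that value, not its magnitude. The original "== 0" test is therefore the equality check. Because "gpu" sorts after "cpu", "gpu".compare("cpu") is positive, so on a library that happens to return exactly 1, the "== 1" variant sends the string "gpu" to the CPU branch and "cpu" to the GPU branch; that may be why the change appeared to help on a machine without an OpenCL-capable GPU. A minimal check, with selectsCpu() as a hypothetical helper:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the device test in setupCL().
// std::string::compare returns 0 when the strings are equal, so "== 0" is
// the equality check; only the sign of a non-zero result is specified.
bool selectsCpu(const std::string& deviceType)
{
    return deviceType.compare("cpu") == 0;  // equivalent to deviceType == "cpu"
}
```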

                                                                                  • AMD's new release of ATI Stream SDK v2.0 w/ OpenCL(tm) 1.0 Support
                                                                                    awkehwe82397rfaowUI

OK, IGNORE my previous post. The SDK sample code is correct. Changing the zero to a one did allow me to run the samples, but then I opened a command prompt AND changed the current directory to the SDK samples directory. Doing so allowed me to run the samples successfully with the --device cpu flag. If the current directory is not the directory containing the sample executables, the --device cpu flag doesn't work!

In the example below, I first run the sample with C:\ as the current directory. This fails, and the --device cpu flag only works after I change to C:\Users\Main\Documents\ATI Stream\samples\opencl\bin\debug\x86, where the sample executables are:

                                                                                     

C:\>"C:\Users\Main\Documents\ATI Stream\samples\opencl\bin\debug\x86\BinarySearch.exe" --device cpu

                                                                                    Sorted Input
0 1 722 969 2004 2752 3366 3814 4960 6013 6968 7190 8289 9198 9855 10244 10263
10379 10845 11033 11245 12510 13080 13232 13237 13248 13731 14411 15142 15912
16689 16901 17749 18326 18776 18849 19626 20628 21655 22320 22706 23827 24757 25980
27164 27854 28036 28627 28928 30031 30299 31296 32375 33650 34929 35711 36213
36553 36933 38008 38038 38519 38637 39503

                                                                                    Error: clBuildProgram failed. Error code : CL_BUILD_PROGRAM_FAILURE


                                                                                    C:\>cd "C:\Users\Main\Documents\ATI Stream\samples\opencl\bin\debug\x86"

C:\Users\Main\Documents\ATI Stream\samples\opencl\bin\debug\x86>BinarySearch.exe --device cpu

                                                                                    Sorted Input
0 1 722 969 2004 2752 3366 3814 4960 6013 6968 7190 8289 9198 9855 10244 10263
10379 10845 11033 11245 12510 13080 13232 13237 13248 13731 14411 15142 15912
16689 16901 17749 18326 18776 18849 19626 20628 21655 22320 22706 23827 24757 25980
27164 27854 28036 28627 28928 30031 30299 31296 32375 33650 34929 35711 36213
36553 36933 38008 38038 38519 38637 39503

                                                                                    Executing kernel for 1 iterations
                                                                                    -------------------------------------------
                                                                                    l = 0, u = 7, isfound = 0, fm = 5
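The behaviour above is consistent with setupCL() opening "BinarySearch_Kernels.cl" by a relative file name, which is resolved against the current directory, without checking whether the open succeeded; an empty source string would then reach clCreateProgramWithSource, and clBuildProgram fails. One possible workaround, sketched below in standard C++17, is to look for the kernel next to the executable and stop with a clear error when the file cannot be read. The helpers kernelPathNextToExe() and loadKernelSource() are hypothetical, and deriving the directory from argv[0] is a heuristic: argv[0] is not guaranteed to contain a full path.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve the kernel file in the executable's directory
// (taken from argv[0]) instead of the current working directory. If argv[0]
// has no directory part, fall back to the bare file name.
fs::path kernelPathNextToExe(const std::string& argv0, const std::string& kernelName)
{
    fs::path exeDir = fs::path(argv0).parent_path();
    return exeDir.empty() ? fs::path(kernelName) : exeDir / kernelName;
}

// Read the whole kernel source. An empty optional means the file could not be
// opened, which should be reported before calling clCreateProgramWithSource.
std::optional<std::string> loadKernelSource(const fs::path& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return std::nullopt;
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}
```

In the sample, the loaded string could then take the place of kernelFile.source() as the program source, so a missing file fails early with a clear message instead of surfacing as CL_BUILD_PROGRAM_FAILURE.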

                                                                                     

There is still a bug in the Cloo and OpenTK libraries I'm using that I have yet to resolve, but I will leave that to those projects.