3 Replies Latest reply on Apr 13, 2012 2:03 PM by solver

    clAmdBlasTune errors

    rahulgarg

      I tried running clAmdBlasTune, but got strange errors.

      Platform: Linux (ubuntu 11.10) 64-bit

      clAmdBlas version: 1.6.236

       

      The errors are along these lines:

      /tmp/OCLAlZ3aw.cl(89): warning: variable "ax" was declared but never referenced

                    const uint ax = k / 0;

                               ^

       

      3 errors detected in the compilation of "/tmp/OCLAlZ3aw.cl".

       

      Internal error: clc compiler invocation failed.

       

      ========================================================

       

      An internal kernel build error occurred!

       

       

       

       

      Output of clinfo:

      Number of platforms:                         1
        Platform Profile:                          FULL_PROFILE
        Platform Version:                          OpenCL 1.1 AMD-APP (851.4)
        Platform Name:                             AMD Accelerated Parallel Processing
        Platform Vendor:                           Advanced Micro Devices, Inc.
        Platform Extensions:                       cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

       

       

        Platform Name:                             AMD Accelerated Parallel Processing
      Number of devices:                           2
        Device Type:                               CL_DEVICE_TYPE_GPU
        Device ID:                                 4098
        Board name:                                ATI Radeon HD 5800 Series
        Device Topology:                           PCI[ B#2, D#0, F#0 ]
        Max compute units:                         18
        Max work items dimensions:                 3
      Max work items[0]:                       256
      Max work items[1]:                       256
      Max work items[2]:                       256
        Max work group size:                       256
        Preferred vector width char:               16
        Preferred vector width short:              8
        Preferred vector width int:                4
        Preferred vector width long:               2
        Preferred vector width float:              4
        Preferred vector width double:             2
        Native vector width char:                  16
        Native vector width short:                 8
        Native vector width int:                   4
        Native vector width long:                  2
        Native vector width float:                 4
        Native vector width double:                2
        Max clock frequency:                       599Mhz
        Address bits:                              32
        Max memory allocation:                     134217728
        Image support:                             Yes
        Max number of images read arguments:       128
        Max number of images write arguments:      8
        Max image 2D width:                        8192
        Max image 2D height:                       8192
        Max image 3D width:                        2048
        Max image 3D height:                       2048
        Max image 3D depth:                        2048
        Max samplers within kernel:                16
        Max size of kernel argument:               1024
        Alignment (bits) of base address:          2048
        Minimum alignment (bytes) for any datatype:128

        Single precision floating point capability

      Denorms:                                 No
      Quiet NaNs:                              Yes
      Round to nearest even:                   Yes
      Round to zero:                           Yes
      Round to +ve and infinity:               Yes
      IEEE754-2008 fused multiply-add:         Yes
        Cache type:                                None
        Cache line size:                           0
        Cache size:                                0
        Global memory size:                        536870912
        Constant buffer size:                      65536
        Max number of constant args:               8
        Local memory type:                         Scratchpad
        Local memory size:                         32768
        Kernel Preferred work group size multiple: 64
        Error correction support:                  0
        Unified memory for Host and Device:        0
        Profiling timer resolution:                1
        Device endianess:                          Little
        Available:                                 Yes
        Compiler available:                        Yes
        Execution capabilities:                          
      Execute OpenCL kernels:                  Yes
      Execute native function:                 No
        Queue properties:                        
      Out-of-Order:                            No
      Profiling :                              Yes
        Platform ID:                               0x7faa4267a100
        Name:                                      Cypress
        Vendor:                                    Advanced Micro Devices, Inc.
        Device OpenCL C version:                   OpenCL C 1.1
        Driver version:                            CAL 1.4.1664
        Profile:                                   FULL_PROFILE
        Version:                                   OpenCL 1.1 AMD-APP (851.4)
        Extensions:                                cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt

       

       

        Device Type:                               CL_DEVICE_TYPE_CPU
        Device ID:                                 4098
        Board name:                              
        Max compute units:                         4
        Max work items dimensions:                 3
      Max work items[0]:                       1024
      Max work items[1]:                       1024
      Max work items[2]:                       1024
        Max work group size:                       1024
        Preferred vector width char:               16
        Preferred vector width short:              8
        Preferred vector width int:                4
        Preferred vector width long:               2
        Preferred vector width float:              4
        Preferred vector width double:             0
        Native vector width char:                  16
        Native vector width short:                 8
        Native vector width int:                   4
        Native vector width long:                  2
        Native vector width float:                 4
        Native vector width double:                0
        Max clock frequency:                       2800Mhz
        Address bits:                              64
        Max memory allocation:                     2147483648
        Image support:                             Yes
        Max number of images read arguments:       128
        Max number of images write arguments:      8
        Max image 2D width:                        8192
        Max image 2D height:                       8192
        Max image 3D width:                        2048
        Max image 3D height:                       2048
        Max image 3D depth:                        2048
        Max samplers within kernel:                16
        Max size of kernel argument:               4096
        Alignment (bits) of base address:          1024
        Minimum alignment (bytes) for any datatype:128

        Single precision floating point capability

      Denorms:                                 Yes
      Quiet NaNs:                              Yes
      Round to nearest even:                   Yes
      Round to zero:                           Yes
      Round to +ve and infinity:               Yes
      IEEE754-2008 fused multiply-add:         Yes
        Cache type:                                Read/Write
        Cache line size:                           64
        Cache size:                                65536
        Global memory size:                        7860842496
        Constant buffer size:                      65536
        Max number of constant args:               8
        Local memory type:                         Global
        Local memory size:                         32768
        Kernel Preferred work group size multiple: 1
        Error correction support:                  0
        Unified memory for Host and Device:        1
        Profiling timer resolution:                1
        Device endianess:                          Little
        Available:                                 Yes
        Compiler available:                        Yes
        Execution capabilities:                          
      Execute OpenCL kernels:                  Yes
      Execute native function:                 Yes
        Queue properties:                        
      Out-of-Order:                            No
      Profiling :                              Yes
        Platform ID:                               0x7faa4267a100
        Name:                                      AMD Phenom(tm) II X4 925 Processor
        Vendor:                                    AuthenticAMD
        Device OpenCL C version:                   OpenCL C 1.1
        Driver version:                            2.0
        Profile:                                   FULL_PROFILE
        Version:                                   OpenCL 1.1 AMD-APP (851.4)
        Extensions:                                cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
        • Re: clAmdBlasTune errors
          kknox

          Hi Rahul~

           

          Thank you for the clinfo dump, that helps to see your system configuration.  We have not had problems before with the Tune program on Linux, so I have no solution to offer you yet. 

           

          Could you provide the command line you used to invoke clAmdBlasTune, and could you verify which driver and SDK you installed to get CAL 1.4.1664 / OpenCL 1.1 AMD-APP (851.4)?

           

          Kent

            • Re: clAmdBlasTune errors
              rahulgarg

              Unfortunately, I do not remember which driver version I had installed earlier, so I just grabbed the latest Catalyst (12.3) and tried again.

              I got the same errors as before. Note that the application does not terminate after reporting the error; I force-terminated it once I saw the error.

               

              New clinfo is attached below.

              AMD APP version 2.6

              I called clAmdBlasTune from the bin64 folder as "./clAmdBlasTune", with no further parameters.

              The AMD_CLBLAS_STORAGE_PATH variable is set appropriately (to a folder in my home folder to which I have permission).
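
For anyone who wants to rule out the storage path as a factor, here is a minimal Python sketch that checks whether AMD_CLBLAS_STORAGE_PATH is set and points to a directory the current user can write to. The helper name check_storage_path is made up for illustration and is not part of clAmdBlas; that write access is what the tuner needs is an assumption.

```python
import os


def check_storage_path(env_var="AMD_CLBLAS_STORAGE_PATH"):
    """Return (ok, message) saying whether env_var names a writable directory.

    Hypothetical helper for illustration only; not part of clAmdBlas.
    """
    path = os.environ.get(env_var)
    if not path:
        return False, f"{env_var} is not set"
    if not os.path.isdir(path):
        return False, f"{path} is not a directory"
    # Assumption: the tuner needs to create files here, so require
    # write and search (execute) permission on the directory.
    if not os.access(path, os.W_OK | os.X_OK):
        return False, f"{path} is not writable by the current user"
    return True, f"{path} looks usable"


if __name__ == "__main__":
    ok, message = check_storage_path()
    print(message)
```

Running it in the same shell that launches ./clAmdBlasTune shows what that process would see in its environment.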

               

              One other detail, in case it makes any difference: my monitor is connected to an integrated Radeon 4200; the Radeon 5850 is used only for OpenCL.

               

              I incorrectly reported my Ubuntu version as 11.10, I am still on 11.04 (64-bit).

               

               

               

              Number of platforms:                         1
                Platform Profile:                          FULL_PROFILE
                Platform Version:                          OpenCL 1.1 AMD-APP (898.1)
                Platform Name:                             AMD Accelerated Parallel Processing
                Platform Vendor:                           Advanced Micro Devices, Inc.
                Platform Extensions:                       cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

               

               

                Platform Name:                             AMD Accelerated Parallel Processing
              Number of devices:                           2
                Device Type:                               CL_DEVICE_TYPE_GPU
                Device ID:                                 4098
                Board name:                                ATI Radeon HD 5800 Series
                Device Topology:                           PCI[ B#2, D#0, F#0 ]
                Max compute units:                         18
                Max work items dimensions:                 3
              Max work items[0]:                       256
              Max work items[1]:                       256
              Max work items[2]:                       256
                Max work group size:                       256
                Preferred vector width char:               16
                Preferred vector width short:              8
                Preferred vector width int:                4
                Preferred vector width long:               2
                Preferred vector width float:              4
                Preferred vector width double:             2
                Native vector width char:                  16
                Native vector width short:                 8
                Native vector width int:                   4
                Native vector width long:                  2
                Native vector width float:                 4
                Native vector width double:                2
                Max clock frequency:                       599Mhz
                Address bits:                              32
                Max memory allocation:                     134217728
                Image support:                             Yes
                Max number of images read arguments:       128
                Max number of images write arguments:      8
                Max image 2D width:                        8192
                Max image 2D height:                       8192
                Max image 3D width:                        2048
                Max image 3D height:                       2048
                Max image 3D depth:                        2048
                Max samplers within kernel:                16
                Max size of kernel argument:               1024
                Alignment (bits) of base address:          2048
                Minimum alignment (bytes) for any datatype:128

                Single precision floating point capability

              Denorms:                                 No
              Quiet NaNs:                              Yes
              Round to nearest even:                   Yes
              Round to zero:                           Yes
              Round to +ve and infinity:               Yes
              IEEE754-2008 fused multiply-add:         Yes
                Cache type:                                None
                Cache line size:                           0
                Cache size:                                0
                Global memory size:                        536870912
                Constant buffer size:                      65536
                Max number of constant args:               8
                Local memory type:                         Scratchpad
                Local memory size:                         32768
                Kernel Preferred work group size multiple: 64
                Error correction support:                  0
                Unified memory for Host and Device:        0
                Profiling timer resolution:                1
                Device endianess:                          Little
                Available:                                 Yes
                Compiler available:                        Yes
                Execution capabilities:                          
              Execute OpenCL kernels:                  Yes
              Execute native function:                 No
                Queue properties:                        
              Out-of-Order:                            No
              Profiling :                              Yes
                Platform ID:                               0x7fcb81c46480
                Name:                                      Cypress
                Vendor:                                    Advanced Micro Devices, Inc.
                Device OpenCL C version:                   OpenCL C 1.1
                Driver version:                            CAL 1.4.1703
                Profile:                                   FULL_PROFILE
                Version:                                   OpenCL 1.1 AMD-APP (898.1)
                Extensions:                                cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_amd_meminfo

               

               

                Device Type:                               CL_DEVICE_TYPE_CPU
                Device ID:                                 4098
                Board name:                              
                Max compute units:                         4
                Max work items dimensions:                 3
              Max work items[0]:                       1024
              Max work items[1]:                       1024
              Max work items[2]:                       1024
                Max work group size:                       1024
                Preferred vector width char:               16
                Preferred vector width short:              8
                Preferred vector width int:                4
                Preferred vector width long:               2
                Preferred vector width float:              4
                Preferred vector width double:             0
                Native vector width char:                  16
                Native vector width short:                 8
                Native vector width int:                   4
                Native vector width long:                  2
                Native vector width float:                 4
                Native vector width double:                0
                Max clock frequency:                       2800Mhz
                Address bits:                              64
                Max memory allocation:                     2147483648
                Image support:                             Yes
                Max number of images read arguments:       128
                Max number of images write arguments:      8
                Max image 2D width:                        8192
                Max image 2D height:                       8192
                Max image 3D width:                        2048
                Max image 3D height:                       2048
                Max image 3D depth:                        2048
                Max samplers within kernel:                16
                Max size of kernel argument:               4096
                Alignment (bits) of base address:          1024
                Minimum alignment (bytes) for any datatype:128

                Single precision floating point capability

              Denorms:                                 Yes
              Quiet NaNs:                              Yes
              Round to nearest even:                   Yes
              Round to zero:                           Yes
              Round to +ve and infinity:               Yes
              IEEE754-2008 fused multiply-add:         Yes
                Cache type:                                Read/Write
                Cache line size:                           64
                Cache size:                                65536
                Global memory size:                        7860842496
                Constant buffer size:                      65536
                Max number of constant args:               8
                Local memory type:                         Global
                Local memory size:                         32768
                Kernel Preferred work group size multiple: 1
                Error correction support:                  0
                Unified memory for Host and Device:        1
                Profiling timer resolution:                1
                Device endianess:                          Little
                Available:                                 Yes
                Compiler available:                        Yes
                Execution capabilities:                          
              Execute OpenCL kernels:                  Yes
              Execute native function:                 Yes
                Queue properties:                        
              Out-of-Order:                            No
              Profiling :                              Yes
                Platform ID:                               0x7fcb81c46480
                Name:                                      AMD Phenom(tm) II X4 925 Processor
                Vendor:                                    AuthenticAMD
                Device OpenCL C version:                   OpenCL C 1.1
                Driver version:                            2.0
                Profile:                                   FULL_PROFILE
                Version:                                   OpenCL 1.1 AMD-APP (898.1)
                Extensions:                                cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt
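
To compare the two clinfo dumps in this thread without reading every line, a small Python sketch like the one below can pull out just the version-related fields. The function name version_fields and the choice of fields are assumptions made for illustration; the sample text is copied from the first dump above.

```python
import re

# Field labels as they appear in the clinfo output quoted in this thread.
WANTED = ("Platform Version", "Driver version")


def version_fields(clinfo_text):
    """Return (field, value) pairs for the version lines in clinfo output.

    Hypothetical helper for illustration only; not part of clinfo.
    """
    pairs = []
    for line in clinfo_text.splitlines():
        # Split "  Label:      value" at the first colon.
        match = re.match(r"\s*([^:]+?):\s*(.*\S)\s*$", line)
        if match and match.group(1) in WANTED:
            pairs.append((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    sample = (
        "  Platform Version:      OpenCL 1.1 AMD-APP (851.4)\n"
        "  Max compute units:     18\n"
        "  Driver version:        CAL 1.4.1664\n"
        "  Driver version:        2.0\n"
    )
    for field, value in version_fields(sample):
        print(f"{field}: {value}")
```

Feeding it the output of each clinfo run and comparing the results makes the driver change (CAL 1.4.1664 to CAL 1.4.1703, AMD-APP 851.4 to 898.1) easy to spot.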