6 replies; latest reply on Jul 19, 2012 11:32 AM by heman

    printf("") giving segmentation fault

    gautam.himanshu

      Hi AMD,

      My system: Intel Core i3 (Sandy Bridge) CPU, Radeon HD 7970 GPU, Ubuntu 12.04.

        Device OpenCL C version:                       OpenCL C 1.2
        Driver version:                                CAL 1.4.1720 (VM)
        Profile:                                       FULL_PROFILE
        Version:                                       OpenCL 1.2 AMD-APP (923.1)

      Calling printf("") (an empty format string) in a kernel makes clBuildProgram crash with a segmentation fault. It may not be a serious issue, but it is certainly annoying.

      Here is the backtrace from running it on the CPU device:

      Program received signal SIGSEGV, Segmentation fault.
      0x00007ffff613809b in llvm::ConstantArray::isString() const () from /usr/lib/libamdocl64.so
      (gdb) bt
      #0  0x00007ffff613809b in llvm::ConstantArray::isString() const () from /usr/lib/libamdocl64.so
      #1  0x00007ffff59ad1e6 in (anonymous namespace)::AMDILPrintfConvert::expandPrintf(llvm::ilist_iterator<llvm::Instruction>*) () from /usr/lib/libamdocl64.so
      #2  0x00007ffff59acd24 in (anonymous namespace)::AMDILPrintfConvert::runOnFunction(llvm::Function&) () from /usr/lib/libamdocl64.so
      #3  0x00007ffff61dc630 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /usr/lib/libamdocl64.so
      #4  0x00007ffff61dc742 in llvm::FunctionPassManagerImpl::run(llvm::Function&) () from /usr/lib/libamdocl64.so
      #5  0x00007ffff61dc8d9 in llvm::FunctionPassManager::run(llvm::Function&) () from /usr/lib/libamdocl64.so
      #6  0x00007ffff5482812 in amd::CompilerImpl::llvmCodeGen(llvm::Module*, std::string&, amd::CompilerTargetInfo&, llvm::JunkJITBinary**) () from /usr/lib/libamdocl64.so
      #7  0x00007ffff5486e18 in amd::CompilerImpl::llvmLinkOptCG(std::string&, std::string&, amd::CompilerTargetInfo&, llvm::JunkJITBinary**) () from /usr/lib/libamdocl64.so
      #8  0x00007ffff54c3f37 in gpu::NullProgram::compileBinaryToIL(amd::option::Options*) () from /usr/lib/libamdocl64.so
      #9  0x00007ffff54e769c in gpu::NullProgram::linkImpl(amd::option::Options*) () from /usr/lib/libamdocl64.so
      #10 0x00007ffff548ed3e in device::Program::build(std::string const&, char const*, amd::option::Options*) () from /usr/lib/libamdocl64.so
      #11 0x00007ffff549c338 in amd::Program::build(std::vector<amd::Device*, std::allocator<amd::Device*> > const&, char const*, void (*)(_cl_program*, void*), void*, bool) () from /usr/lib/libamdocl64.so
      #12 0x00007ffff547d5e7 in clBuildProgram () from /usr/lib/libamdocl64.so

        • Re: printf("") giving segmentation fault
          gautam.himanshu

          Another small bug is in the clinfo output: the local memory size on Tahiti is 64 KB (clinfo reports 32 KB), and the GPU max clock is 925 MHz (clinfo reports 0 MHz).

           

          clinfo:

          Number of platforms:                             2
            Platform Profile:                              FULL_PROFILE
            Platform Version:                              OpenCL 1.2 AMD-APP (923.1)
            Platform Name:                                 AMD Accelerated Parallel Processing
            Platform Vendor:                               Advanced Micro Devices, Inc.
            Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
            Platform Profile:                              FULL_PROFILE
            Platform Version:                              OpenCL 1.1 LINUX
            Platform Name:                                 Intel(R) OpenCL
            Platform Vendor:                               Intel(R) Corporation
            Platform Extensions:                           cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread


            Platform Name:                                 AMD Accelerated Parallel Processing
          Number of devices:                               2
            Device Type:                                   CL_DEVICE_TYPE_GPU
            Device ID:                                     4098
            Board name:                                    AMD Radeon HD 7900 Series
            Device Topology:                               PCI[ B#1, D#0, F#0 ]
            Max compute units:                             32
            Max work items dimensions:                     3
              Max work items[0]:                           256
              Max work items[1]:                           256
              Max work items[2]:                           256
            Max work group size:                           256
            Preferred vector width char:                   16
            Preferred vector width short:                  8
            Preferred vector width int:                    4
            Preferred vector width long:                   2
            Preferred vector width float:                  4
            Preferred vector width double:                 2
            Native vector width char:                      16
            Native vector width short:                     8
            Native vector width int:                       4
            Native vector width long:                      2
            Native vector width float:                     4
            Native vector width double:                    2
            Max clock frequency:                           0Mhz
            Address bits:                                  32
            Max memory allocation:                         536870912
            Image support:                                 Yes
            Max number of images read arguments:           128
            Max number of images write arguments:          8
            Max image 2D width:                            8192
            Max image 2D height:                           8192
            Max image 3D width:                            2048
            Max image 3D height:                           2048
            Max image 3D depth:                            2048
            Max samplers within kernel:                    16
            Max size of kernel argument:                   1024
            Alignment (bits) of base address:              2048
            Minimum alignment (bytes) for any datatype:    128
            Single precision floating point capability
              Denorms:                                     No
              Quiet NaNs:                                  Yes
              Round to nearest even:                       Yes
              Round to zero:                               Yes
              Round to +ve and infinity:                   Yes
              IEEE754-2008 fused multiply-add:             Yes
            Cache type:                                    Read/Write
            Cache line size:                               64
            Cache size:                                    16384
            Global memory size:                            2147483648
            Constant buffer size:                          65536
            Max number of constant args:                   8
            Local memory type:                             Scratchpad
            Local memory size:                             32768
            Kernel Preferred work group size multiple:     64
            Error correction support:                      0
            Unified memory for Host and Device:            0
            Profiling timer resolution:                    1
            Device endianess:                              Little
            Available:                                     Yes
            Compiler available:                            Yes
            Execution capabilities:
              Execute OpenCL kernels:                      Yes
              Execute native function:                     No
            Queue properties:
              Out-of-Order:                                No
              Profiling :                                  Yes
            Platform ID:                                   0x7f658d6880a0
            Name:                                          Tahiti
            Vendor:                                        Advanced Micro Devices, Inc.
            Device OpenCL C version:                       OpenCL C 1.2
            Driver version:                                CAL 1.4.1720 (VM)
            Profile:                                       FULL_PROFILE
            Version:                                       OpenCL 1.2 AMD-APP (923.1)
            Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt


            Device Type:                                   CL_DEVICE_TYPE_CPU
            Device ID:                                     4098
            Board name:
            Max compute units:                             4
            Max work items dimensions:                     3
              Max work items[0]:                           1024
              Max work items[1]:                           1024
              Max work items[2]:                           1024
            Max work group size:                           1024
            Preferred vector width char:                   16
            Preferred vector width short:                  8
            Preferred vector width int:                    4
            Preferred vector width long:                   2
            Preferred vector width float:                  4
            Preferred vector width double:                 0
            Native vector width char:                      16
            Native vector width short:                     8
            Native vector width int:                       4
            Native vector width long:                      2
            Native vector width float:                     4
            Native vector width double:                    0
            Max clock frequency:                           3300Mhz
            Address bits:                                  64
            Max memory allocation:                         2147483648
            Image support:                                 Yes
            Max number of images read arguments:           128
            Max number of images write arguments:          8
            Max image 2D width:                            8192
            Max image 2D height:                           8192
            Max image 3D width:                            2048
            Max image 3D height:                           2048
            Max image 3D depth:                            2048
            Max samplers within kernel:                    16
            Max size of kernel argument:                   4096
            Alignment (bits) of base address:              1024
            Minimum alignment (bytes) for any datatype:    128
            Single precision floating point capability
              Denorms:                                     Yes
              Quiet NaNs:                                  Yes
              Round to nearest even:                       Yes
              Round to zero:                               Yes
              Round to +ve and infinity:                   Yes
              IEEE754-2008 fused multiply-add:             Yes
            Cache type:                                    Read/Write
            Cache line size:                               64
            Cache size:                                    32768
            Global memory size:                            4138541056
            Constant buffer size:                          65536
            Max number of constant args:                   8
            Local memory type:                             Global
            Local memory size:                             32768
            Kernel Preferred work group size multiple:     1
            Error correction support:                      0
            Unified memory for Host and Device:            1
            Profiling timer resolution:                    1
            Device endianess:                              Little
            Available:                                     Yes
            Compiler available:                            Yes
            Execution capabilities:
              Execute OpenCL kernels:                      Yes
              Execute native function:                     Yes
            Queue properties:
              Out-of-Order:                                No
              Profiling :                                  Yes
            Platform ID:                                   0x7f658d6880a0
            Name:                                          Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
            Vendor:                                        GenuineIntel
            Device OpenCL C version:                       OpenCL C 1.2
            Driver version:                                2.0 (sse2,avx)
            Profile:                                       FULL_PROFILE
            Version:                                       OpenCL 1.2 AMD-APP (923.1)
            Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt


            Platform Name:                                 Intel(R) OpenCL
          Number of devices:                               1
            Device Type:                                   CL_DEVICE_TYPE_CPU
            Device ID:                                     32902
            Max compute units:                             4
            Max work items dimensions:                     3
              Max work items[0]:                           1024
              Max work items[1]:                           1024
              Max work items[2]:                           1024
            Max work group size:                           1024
            Preferred vector width char:                   16
            Preferred vector width short:                  8
            Preferred vector width int:                    4
            Preferred vector width long:                   2
            Preferred vector width float:                  4
            Preferred vector width double:                 2
            Native vector width char:                      16
            Native vector width short:                     8
            Native vector width int:                       4
            Native vector width long:                      2
            Native vector width float:                     4
            Native vector width double:                    2
            Max clock frequency:                           3300Mhz
            Address bits:                                  64
            Max memory allocation:                         1034635264
            Image support:                                 Yes
            Max number of images read arguments:           480
            Max number of images write arguments:          480
            Max image 2D width:                            8192
            Max image 2D height:                           8192
            Max image 3D width:                            2048
            Max image 3D height:                           2048
            Max image 3D depth:                            2048
            Max samplers within kernel:                    480
            Max size of kernel argument:                   3840
            Alignment (bits) of base address:              1024
            Minimum alignment (bytes) for any datatype:    128
            Single precision floating point capability
              Denorms:                                     Yes
              Quiet NaNs:                                  Yes
              Round to nearest even:                       Yes
              Round to zero:                               No
              Round to +ve and infinity:                   No
              IEEE754-2008 fused multiply-add:             No
            Cache type:                                    Read/Write
            Cache line size:                               64
            Cache size:                                    262144
            Global memory size:                            4138541056
            Constant buffer size:                          131072
            Max number of constant args:                   480
            Local memory type:                             Global
            Local memory size:                             32768
            Kernel Preferred work group size multiple:     128
            Error correction support:                      0
            Unified memory for Host and Device:            1
            Profiling timer resolution:                    1
            Device endianess:                              Little
            Available:                                     Yes
            Compiler available:                            Yes
            Execution capabilities:
              Execute OpenCL kernels:                      Yes
              Execute native function:                     Yes
            Queue properties:
              Out-of-Order:                                Yes
              Profiling :                                  Yes
            Platform ID:                                   0x1974468
            Name:                                          Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz
            Vendor:                                        Intel(R) Corporation
            Device OpenCL C version:                       OpenCL C 1.1
            Driver version:                                1.1
            Profile:                                       FULL_PROFILE
            Version:                                       OpenCL 1.1 (Build 31360.31426)
            Extensions:                                    cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread

          • Re: printf("") giving segmentation fault
            MicahVillmow

            Please supply a test case that shows the failure so we can debug this.

              • Re: printf("") giving segmentation fault
                heman

                Hi Micah,

                I think the test case is easy to construct. I just modified the bitonic sort sample kernel as follows, and the sample crashed on my Windows machine as well.

                #pragma OPENCL EXTENSION cl_amd_printf : enable

                __kernel
                void bitonicSort(__global uint * theArray,
                                 const uint stage,
                                 const uint passOfStage,
                                 const uint width,
                                 const uint direction)
                {
                    uint sortIncreasing = direction;
                    uint threadId = get_global_id(0);

                    uint pairDistance = 1 << (stage - passOfStage);
                    uint blockWidth   = 2 * pairDistance;

                    /* Added line: with this empty-format printf the sample crashes. */
                    printf("");

                    uint leftId = (threadId % pairDistance)
                                   + (threadId / pairDistance) * blockWidth;

                ........

                And thanks for clarifying the local memory size.
