AnsweredAssumed Answered

hung in aclCompile()

Question asked by shri on Jun 19, 2018

hi,

 

I have two AMD GPU when I do clinfo i get below information. I am using OpenCL 1.2 and WX9100

clinfo

 

Number of platforms: 1

  Platform Profile: FULL_PROFILE

  Platform Version: OpenCL 2.1 AMD-APP (2580.4)

  Platform Name: AMD Accelerated Parallel Processing

  Platform Vendor: Advanced Micro Devices, Inc.

  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

 

 

  Platform Name: AMD Accelerated Parallel Processing

Number of devices: 2

  Device Type: CL_DEVICE_TYPE_GPU

  Vendor ID: 1002h

  Board name: Radeon (TM) Pro WX 9100

  Device Topology: PCI[ B#96, D#0, F#0 ]

  Max compute units: 64

  Max work items dimensions: 3

    Max work items[0]: 1024

    Max work items[1]: 1024

    Max work items[2]: 1024

  Max work group size: 256

  Preferred vector width char: 4

  Preferred vector width short: 2

  Preferred vector width int: 1

  Preferred vector width long: 1

  Preferred vector width float: 1

  Preferred vector width double: 1

  Native vector width char: 4

  Native vector width short: 2

  Native vector width int: 1

  Native vector width long: 1

  Native vector width float: 1

  Native vector width double: 1

  Max clock frequency: 1500Mhz

  Address bits: 64

  Max memory allocation: 4244635648

  Image support: Yes

  Max number of images read arguments: 128

  Max number of images write arguments: 8

  Max image 2D width: 16384

  Max image 2D height: 16384

  Max image 3D width: 2048

  Max image 3D height: 2048

  Max image 3D depth: 2048

  Max samplers within kernel: 16

  Max size of kernel argument: 1024

  Alignment (bits) of base address: 2048

  Minimum alignment (bytes) for any datatype: 128

  Single precision floating point capability

    Denorms: No

    Quiet NaNs: Yes

    Round to nearest even: Yes

    Round to zero: Yes

    Round to +ve and infinity: Yes

    IEEE754-2008 fused multiply-add: Yes

  Cache type: Read/Write

  Cache line size: 64

  Cache size: 16384

  Global memory size: 16978542592

  Constant buffer size: 4244635648

  Max number of constant args: 8

  Local memory type: Scratchpad

  Local memory size: 32768

  Max pipe arguments: 0

  Max pipe active reservations: 0

  Max pipe packet size: 0

  Max global variable size: 0

  Max global variable preferred total size: 0

  Max read/write image args: 0

  Max on device events: 0

  Queue on device max size: 0

  Max on device queues: 0

  Queue on device preferred size: 0

  SVM capabilities:

    Coarse grain buffer: No

    Fine grain buffer: No

    Fine grain system: No

    Atomics: No

  Preferred platform atomic alignment: 0

  Preferred global atomic alignment: 0

  Preferred local atomic alignment: 0

  Kernel Preferred work group size multiple: 64

  Error correction support: 0

  Unified memory for Host and Device: 0

  Profiling timer resolution: 1

  Device endianess: Little

  Available: Yes

  Compiler available: Yes

  Execution capabilities:

    Execute OpenCL kernels: Yes

    Execute native function: No

  Queue on Host properties:

    Out-of-Order: No

    Profiling : Yes

  Queue on Device properties:

    Out-of-Order: No

    Profiling : No

  Platform ID: 0x7f8e18162350

  Name: gfx900

  Vendor: Advanced Micro Devices, Inc.

  Device OpenCL C version: OpenCL C 1.2

  Driver version: 2580.4 (PAL,HSAIL)

  Profile: FULL_PROFILE

  Version: OpenCL 1.2 AMD-APP (2580.4)

  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

 

 

  Device Type: CL_DEVICE_TYPE_GPU

  Vendor ID: 1002h

  Board name: Radeon (TM) Pro WX 9100

  Device Topology: PCI[ B#61, D#0, F#0 ]

  Max compute units: 64

  Max work items dimensions: 3

    Max work items[0]: 1024

    Max work items[1]: 1024

    Max work items[2]: 1024

  Max work group size: 256

  Preferred vector width char: 4

  Preferred vector width short: 2

  Preferred vector width int: 1

  Preferred vector width long: 1

  Preferred vector width float: 1

  Preferred vector width double: 1

  Native vector width char: 4

  Native vector width short: 2

  Native vector width int: 1

  Native vector width long: 1

  Native vector width float: 1

  Native vector width double: 1

  Max clock frequency: 1500Mhz

  Address bits: 64

  Max memory allocation: 4244635648

  Image support: Yes

  Max number of images read arguments: 128

  Max number of images write arguments: 8

  Max image 2D width: 16384

  Max image 2D height: 16384

  Max image 3D width: 2048

  Max image 3D height: 2048

  Max image 3D depth: 2048

  Max samplers within kernel: 16

  Max size of kernel argument: 1024

  Alignment (bits) of base address: 2048

  Minimum alignment (bytes) for any datatype: 128

  Single precision floating point capability

    Denorms: No

    Quiet NaNs: Yes

    Round to nearest even: Yes

    Round to zero: Yes

    Round to +ve and infinity: Yes

    IEEE754-2008 fused multiply-add: Yes

  Cache type: Read/Write

  Cache line size: 64

  Cache size: 16384

  Global memory size: 16978542592

  Constant buffer size: 4244635648

  Max number of constant args: 8

  Local memory type: Scratchpad

  Local memory size: 32768

  Max pipe arguments: 0

  Max pipe active reservations: 0

  Max pipe packet size: 0

  Max global variable size: 0

  Max global variable preferred total size: 0

  Max read/write image args: 0

  Max on device events: 0

  Queue on device max size: 0

  Max on device queues: 0

  Queue on device preferred size: 0

  SVM capabilities:

    Coarse grain buffer: No

    Fine grain buffer: No

    Fine grain system: No

    Atomics: No

  Preferred platform atomic alignment: 0

  Preferred global atomic alignment: 0

  Preferred local atomic alignment: 0

  Kernel Preferred work group size multiple: 64

  Error correction support: 0

  Unified memory for Host and Device: 0

  Profiling timer resolution: 1

  Device endianess: Little

  Available: Yes

  Compiler available: Yes

  Execution capabilities:

    Execute OpenCL kernels: Yes

    Execute native function: No

  Queue on Host properties:

    Out-of-Order: No

    Profiling : Yes

  Queue on Device properties:

    Out-of-Order: No

    Profiling : No

  Platform ID: 0x7f8e18162350

  Name: gfx900

  Vendor: Advanced Micro Devices, Inc.

  Device OpenCL C version: OpenCL C 1.2

  Driver version: 2580.4 (PAL,HSAIL)

  Profile: FULL_PROFILE

  Version: OpenCL 1.2 AMD-APP (2580.4)

  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

 

below is the trace. I have dirvers installed as amdgpu-pro-18.10-572953.tar.xz. 

 

 

Same piece of code was woring on with older drivers.

 

1 - Any idea why my clinfo showing zeros

Why zero's

  Max pipe arguments: 0

  Max pipe active reservations: 0

  Max pipe packet size: 0

  Max global variable size: 0

  Max global variable preferred total size: 0

  Max read/write image args: 0

  Max on device events: 0

  Queue on device max size: 0

  Max on device queues: 0

  Queue on device preferred size: 0

 

2 - Any reason why infinite hung in "aclCompile()".

Outcomes