

jape
Adept I

no double support for Radeon HD 7500/7600 Series?

Hi All,

I've got a Toshiba Satellite S875D, which has an A10 APU and a second, discrete Radeon HD 7610M GPU.

Under Linux, the 7610M seems to be identified as a "7500/7600 Series" device:

jape@jape-Satellite-S875D:~$ aticonfig --list-adapters

* 0. 00:01.0 AMD Radeon HD 7660G

  1. 01:00.0 AMD Radeon HD 7500/7600 Series

What puzzles me is that the 7610M reports no fp64 extension at all:

  Extensions:                                cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt

In Windows, the 7660G reports cl_amd_fp64 as an extension.

Is this expected? Does the 7610M have no support for doubles in OpenCL?
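A minimal sketch of testing for a specific extension, assuming the input is the single space-separated string returned for CL_DEVICE_EXTENSIONS (as in the lists quoted in this thread). The helper name `has_extension` is hypothetical. It matches whole names only, so a name that happens to occur inside a longer one is not counted.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `name` appears as a complete,
 * space-delimited token in the CL_DEVICE_EXTENSIONS string `extensions`.
 * A bare strstr() hit is not enough, because the name could also occur
 * as part of a longer extension name. */
bool has_extension(const char *extensions, const char *name)
{
    size_t len = strlen(name);
    const char *p = extensions;

    if (len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or right after a space... */
        bool starts_ok = (p == extensions) || (p[-1] == ' ');
        /* ...and end at the terminator or right before a space. */
        bool ends_ok = (p[len] == '\0') || (p[len] == ' ');
        if (starts_ok && ends_ok)
            return true;
        p += 1; /* keep scanning after this partial match */
    }
    return false;
}
```

With the 7610M list above, this returns false for both "cl_khr_fp64" and "cl_amd_fp64".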

regards,

jp

yurtesen
Miniboss

http://developer.amd.com/tools/hc/AMDAPPSDK/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Progra...

See page A-15 near the end.

None of the current APUs support double precision. Perhaps you accidentally looked at the extensions reported by the CPU part?


The OpenCL 1.2 specification says:

double: A 64-bit floating-point. The double data type must conform to the IEEE 754 double precision storage format.

The double scalar type is an optional type that is supported if CL_DEVICE_DOUBLE_FP_CONFIG for the device is not zero.
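The condition in the quoted text can be checked directly. A sketch, assuming the value has already been read with clGetDeviceInfo(device, CL_DEVICE_DOUBLE_FP_CONFIG, sizeof(cl_device_fp_config), &cfg, NULL). The bit values below are local copies of the CL_FP_* constants from the OpenCL 1.2 cl.h, redefined only so the example builds without an OpenCL SDK; the CFG_* names and both helpers are hypothetical. The second helper checks the minimum flag set that, as I understand the 1.2 specification, a device supporting double must report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for cl_device_fp_config (a 64-bit bitfield in cl.h). */
typedef uint64_t fp_config;

/* Local copies of the CL_FP_* bit values (assumed to match cl.h). */
enum {
    CFG_DENORM           = 1 << 0, /* CL_FP_DENORM */
    CFG_INF_NAN          = 1 << 1, /* CL_FP_INF_NAN */
    CFG_ROUND_TO_NEAREST = 1 << 2, /* CL_FP_ROUND_TO_NEAREST */
    CFG_ROUND_TO_ZERO    = 1 << 3, /* CL_FP_ROUND_TO_ZERO */
    CFG_ROUND_TO_INF     = 1 << 4, /* CL_FP_ROUND_TO_INF */
    CFG_FMA              = 1 << 5  /* CL_FP_FMA */
};

/* The spec rule quoted above: double is supported iff the value is non-zero. */
bool supports_double(fp_config cfg)
{
    return cfg != 0;
}

/* Assumed OpenCL 1.2 minimum for devices that do support double. */
bool meets_cl12_double_minimum(fp_config cfg)
{
    const fp_config required = CFG_FMA | CFG_ROUND_TO_NEAREST |
                               CFG_ROUND_TO_ZERO | CFG_ROUND_TO_INF |
                               CFG_INF_NAN | CFG_DENORM;
    return (cfg & required) == required;
}
```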


Under Windows, the 7660G reports cl_amd_fp64:

  Extensions:     cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing

and the CPU reports both cl_khr_fp64 and cl_amd_fp64:

Extensions:     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing

My understanding is that cl_amd_fp64 covers only a limited subset of double-precision operations.

cl_amd_fp64 is noticeably absent from the second GPU though.

jp

There are several examples in the APP SDK that use cl_amd_fp64. Did you try running some of them to see what they do?

BlackScholesDP_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable

FluidSimulation2D_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable

LUDecomposition_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable

Mandelbrot_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable

MatrixMulDouble_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable

MonteCarloAsianDP_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable
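A sketch of one host-side way to pick the directive at run time instead of hard-coding cl_amd_fp64, assuming the kernel source is assembled from strings before it is passed to clCreateProgramWithSource. It prefers the standard cl_khr_fp64 and falls back to cl_amd_fp64. The helpers `has_token` and `fp64_pragma` are hypothetical and not part of the SDK samples.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: whole-token search in a space-separated extension list. */
static int has_token(const char *list, const char *name)
{
    size_t len = strlen(name);
    for (const char *p = list; (p = strstr(p, name)) != NULL; p++) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Hypothetical: returns the fp64 pragma line to prepend to the kernel
 * source, preferring cl_khr_fp64 over cl_amd_fp64, or NULL if the
 * device lists neither extension. */
const char *fp64_pragma(const char *extensions)
{
    if (has_token(extensions, "cl_khr_fp64"))
        return "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n";
    if (has_token(extensions, "cl_amd_fp64"))
        return "#pragma OPENCL EXTENSION cl_amd_fp64 : enable\n";
    return NULL;
}
```

The returned line could be passed as the first element of the strings array given to clCreateProgramWithSource, ahead of the kernel body. A NULL result means the device reports neither extension, as with the 7610M list in the first post, and the double-precision kernels should not be built for it.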

Interestingly, I found this clinfo output, which also shows cl_khr_fp64 for your APU:

http://images.anandtech.com/reviews/mobile/Trinity-CLinfo.zip


I haven't tried using cl_amd_fp64 yet, since the 7660G does not appear to be supported in Linux yet and I don't have a build environment set up in Windows.

Also, the programming guide is pretty confusing. For example:

"Before using double data types, double-precision floating point operators, and/or double-precision floating point routines in OpenCL™ C kernels, include the #pragma OPENCL EXTENSION cl_amd_fp64 : enable directive. See Table A.1 for a list of supported routines."

But where Table A.1 (?) should be, there is no table of functions, only this:

"AMD OpenCL is now cl_khr_fp64-compliant on devices compliant with OpenCL 1.1 and greater (every GPU later than 7xx, and all CPUs). Thus, cl_amd_fp64 is now a synonym for cl_khr_fp64 on all supported devices"
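Since the guide ties the synonym to OpenCL 1.1 and later, one rough check is the version string a device reports. A sketch, assuming the CL_DEVICE_VERSION format "OpenCL <major>.<minor> <vendor-specific information>" from the OpenCL specification; the helper names are hypothetical. This only shows which version the device claims, not that doubles work, so the extension list still has to be checked.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: parses "OpenCL <major>.<minor> ..." from CL_DEVICE_VERSION. */
bool parse_cl_version(const char *version, int *major, int *minor)
{
    return sscanf(version, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical: true if the device claims OpenCL 1.1 or later, the range
 * in which the guide says cl_amd_fp64 is a synonym for cl_khr_fp64. */
bool is_cl11_or_later(const char *version)
{
    int major = 0, minor = 0;
    if (!parse_cl_version(version, &major, &minor))
        return false;
    return major > 1 || (major == 1 && minor >= 1);
}
```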

My reading of the above is that the 7610M should support doubles, but it doesn't report it.

If they mean the same thing, why would the CPU report BOTH cl_amd_fp64 and cl_khr_fp64 while the 7660G reports only cl_amd_fp64?

Seems to me that the programming guide could use some updating.

regards

jp


In Windows the samples come precompiled. You can simply run them after installing the AMD APP SDK Samples (which are bundled with the APP SDK installer). The DP versions, at least, do not run on my E-450 APU. Please let us know how it goes for you. The samples also come in binary form on Linux, so you don't really have to compile them yourself; you can test them on Linux as well.

Yes, there seems to be a bit of a problem with what exactly the documentation's text about cl_amd_fp64 means. I started a separate thread about it; AMD usually picks these up and fixes the documentation (or at least says they will).

http://devgurus.amd.com/message/1283795


Many thanks for all the pointers.

I tried the DP examples you pointed out above on Windows, since the 7660G doesn't show up at all in Linux.

The results printed by BlackScholesDP are identical whether I use --device gpu or --device cpu, so it appears that the 7660G has reasonable double support.

BlackScholesDP.exe --device cpu -t
Option Samples           Time(sec)                [Transfer+kernel]Time(sec) Options/sec
262144                   0.365735                 0.0225102                716759

BlackScholesDP.exe --device gpu -t
Option Samples           Time(sec)                [Transfer+kernel]Time(sec) Options/sec
262144                   0.5922                   0.0353749                442662
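The Options/sec column appears to be the number of option samples divided by the total Time(sec) column: 262144 / 0.365735 ≈ 716759 for the CPU run, and 262144 / 0.5922 ≈ 442661 for the GPU run (the printed 442662 differs only by the rounding of the displayed time). This is inferred from the printed numbers, not from the sample's source. A sketch of that calculation; the function name is hypothetical.

```c
/* Hypothetical: option-pricing throughput, assuming the sample divides the
 * option count by the total measured time in seconds. */
double options_per_sec(long samples, double seconds)
{
    return (double)samples / seconds;
}
```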

FluidSimulation2D.exe gets about 70 FPS on the CPU and 55 FPS on the GPU. Mandelbrot seems a bit faster on the CPU device, but I didn't see any option to generate timing data.

MatrixMulDouble.exe -x 1024 -y 1024 -z 1024 --device cpu -t
GFlops achieved : 0.847905
MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
1024x1024                1024x1024                2.88939                  2.53269

MatrixMulDouble.exe -x 1024 -y 1024 -z 1024 --device gpu -t
GFlops achieved : 10.8667
MatrixA                  MatrixB                  Time(sec)                KernelTime(sec)
1024x1024                1024x1024                0.529516                 0.197621
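Similarly, the "GFlops achieved" figure appears to count 2·M·N·K floating-point operations (one multiply and one add per inner-product term) and divide by KernelTime rather than total Time: 2 × 1024³ / 0.197621 s ≈ 10.87 GFLOPS on the GPU and 2 × 1024³ / 2.53269 s ≈ 0.848 GFLOPS on the CPU. This is inferred from the numbers above, not from the sample source, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Floating-point operations in a dense (m x k) * (k x n) matrix multiply,
 * counting one multiply and one add per inner-product term. */
double matmul_flops(uint64_t m, uint64_t n, uint64_t k)
{
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Hypothetical: GFLOPS for a kernel time given in seconds. */
double matmul_gflops(uint64_t m, uint64_t n, uint64_t k, double kernel_seconds)
{
    return matmul_flops(m, n, k) / kernel_seconds / 1e9;
}
```

By kernel time, the GPU is roughly 12.8 times faster than the CPU on this test.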

MonteCarloAsianDP.exe --device cpu -t
Steps                    Time(sec)                [Transfer+kernel](sec)   Samples used /sec
10                       4.90963                  4.54223                  24798.4

MonteCarloAsianDP.exe --device gpu -t
Steps                    Time(sec)                [Transfer+kernel](sec)   Samples used /sec
10                       4.03406                  2.79827                  40253.4


Perhaps those samples were not optimized for APUs, but at least they work (also, DP performance may be much lower than single-precision performance). I should get my hands on one of those APUs.

Hopefully AMD can fix the Linux drivers soon also!

amdkid
Adept I

Hello jape.

Try this:

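#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    char ext[2][2048] = {{0}};  /* extension string of each GPU */
    char dev[2][2048] = {{0}};  /* name of each GPU */
    cl_device_id device[2];
    cl_platform_id platform;
    cl_uint numDevices = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, device, &numDevices);
    if (numDevices > 2)
        numDevices = 2;  /* only room for two devices */

    for (cl_uint i = 0; i < numDevices; i++)
    {
        clGetDeviceInfo(device[i], CL_DEVICE_EXTENSIONS, sizeof(ext[i]), ext[i], NULL);
        clGetDeviceInfo(device[i], CL_DEVICE_NAME, sizeof(dev[i]), dev[i], NULL);
        printf("%s:\n  %s\n", dev[i], ext[i]);
    }
    return 0;
}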

After that, you only need to show dev[i] and ext[i] in the loop.
