See page A-15 near the end.
None of the current APUs support double precision. Perhaps you saw the extensions reported for the CPU device by accident?
1. The OpenCL 1.2 specification says:
double: A 64-bit floating-point. The double data type must conform to the
IEEE 754 double precision storage format.
The double scalar type is an optional type that is supported if CL_DEVICE_DOUBLE_FP_CONFIG for the device is not zero.
Under Windows, the 7660g reports cl_amd_fp64:
Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
and the CPU reports both cl_khr_fp64 and cl_amd_fp64:
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
My understanding is that cl_amd_fp64 covers only a limited subset of double operations.
cl_amd_fp64 is noticeably absent from the second GPU though.
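Since the comparison above comes down to which names appear in each device's extension string, here is a minimal sketch of checking for them in host code. It assumes the string has already been fetched with CL_DEVICE_EXTENSIONS. The helper names has_extension and reports_fp64 are made up for illustration and are not part of OpenCL or the APP SDK.

```c
#include <string.h>

/* Return 1 if `name` appears as a whole space-separated token in `ext_list`
   (the string returned for CL_DEVICE_EXTENSIONS), 0 otherwise. Matching
   whole tokens avoids false hits on names that merely share a prefix. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += len;  /* keep searching after this partial match */
    }
    return 0;
}

/* Hypothetical policy: treat either fp64 extension name as "doubles reported". */
int reports_fp64(const char *ext_list)
{
    return has_extension(ext_list, "cl_khr_fp64") ||
           has_extension(ext_list, "cl_amd_fp64");
}
```

With the two strings quoted above, has_extension(gpu, "cl_khr_fp64") would be 0 for the 7660g and 1 for the CPU, while reports_fp64 would be 1 for both.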
There are several examples in the APP SDK that use cl_amd_fp64. Did you try running some of them to see what they do?
BlackScholesDP_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable
FluidSimulation2D_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable
LUDecomposition_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable
Mandelbrot_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable
MatrixMulDouble_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable
MonteCarloAsianDP_Kernels.cl:#pragma OPENCL EXTENSION cl_amd_fp64 : enable
Interestingly, I found this where clinfo shows cl_khr_fp64 for your APU also...
I haven't tried using cl_amd_fp64 yet, since the 7660g does not appear to be supported in Linux yet and I don't have a build environment set up in Windows.
Also, the programming guide is pretty confusing. For example:
"Before using double data types, double-precision floating point operators, and/or
double-precision floating point routines in OpenCL™ C kernels, include the
#pragma OPENCL EXTENSION cl_amd_fp64 : enable directive. See Table A.1
for a list of supported routines."
But if you go to where Table A.1(?) should be, there is no table of functions, only this:
"AMD OpenCL is now cl_khr_fp64-compliant on devices compliant with OpenCL
1.1 and greater (every GPU later than 7xx, and all CPUs). Thus, cl_amd_fp64
is now a synonym for cl_khr_fp64 on all supported devices"
My reading of the above is that the 7610m should support doubles, but doesn't report it.
Why would the CPU report BOTH cl_amd_fp64 and cl_khr_fp64 while the 7660g only reports cl_amd_fp64 if they mean the same thing?
Seems to me that the programming guide could use some updating.
In Windows the samples come pre-compiled. You can simply run them after installing the AMD APP SDK Samples (bundled with the APP SDK installer). The DP versions, at least, do not run on my E-450's APU. Please let us know how it goes for you. The samples also come in binary form on Linux, so you don't really have to compile them yourself, and you can test them there too.
Yes, the documentation is not clear about what exactly cl_amd_fp64 means. I made a separate thread for it; AMD usually picks these up and fixes the documentation (or at least says they will).
Many thanks for all the pointers.
I did try out the DP examples you pointed out above on Windows, since the 7660g doesn't show up at all in Linux.
The results printed for BlackScholesDP are identical whether I use --device gpu or --device cpu, so it appears that the 7660g has reasonable double support.
BlackScholesDP.exe --device cpu -t
Option Samples   Time(sec)   [Transfer+kernel]Time(sec)   Options/sec
262144           0.365735    0.0225102                    716759

BlackScholesDP.exe --device gpu -t
Option Samples   Time(sec)   [Transfer+kernel]Time(sec)   Options/sec
262144           0.5922      0.0353749                    442662
FluidSimulation2D.exe gets about 70 FPS on the CPU and 55 FPS on the GPU. Mandelbrot seems a bit faster on the CPU device, but I didn't see any option to generate timing data.
MatrixMulDouble.exe -x 1024 -y 1024 -z 1024 --device cpu -t
GFlops achieved : 0.847905
MatrixA     MatrixB     Time(sec)   KernelTime(sec)
1024x1024   1024x1024   2.88939     2.53269

MatrixMulDouble.exe -x 1024 -y 1024 -z 1024 --device gpu -t
GFlops achieved : 10.8667
MatrixA     MatrixB     Time(sec)   KernelTime(sec)
1024x1024   1024x1024   0.529516    0.197621

MonteCarloAsianDP.exe --device cpu -t
Steps   Time(sec)   [Transfer+kernel](sec)   Samples used /sec
10      4.90963     4.54223                  24798.4

MonteCarloAsianDP.exe --device gpu -t
Steps   Time(sec)   [Transfer+kernel](sec)   Samples used /sec
10      4.03406     2.79827                  40253.4
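As a sanity check on the MatrixMulDouble numbers, the reported GFlops figures look consistent with counting 2*M*N*K floating-point operations and dividing by the kernel time. That cost model is inferred from the printed values and is not taken from the sample's source, so treat the sketch below as an assumption; the function name matmul_gflops is made up for illustration.

```c
/* Assumed cost model: multiplying an MxK matrix by a KxN matrix takes
   2*M*N*K floating-point operations (one multiply and one add per term).
   Returns billions of operations per second for the given kernel time. */
double matmul_gflops(double m, double n, double k, double kernel_seconds)
{
    return (2.0 * m * n * k) / kernel_seconds / 1e9;
}
```

Plugging in 1024 for all three dimensions with the kernel times above (2.53269 s on the CPU, 0.197621 s on the GPU) gives roughly 0.848 and 10.87, which matches the printed figures and puts the GPU at about 12.8 times the CPU for this sample.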
Perhaps those samples were not optimized for APUs, but at least they work (DP performance might also be much lower than single-precision performance). I should get my hands on one of those APUs.
Hopefully AMD can fix the Linux drivers soon also!
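Try something like this:
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device[2];
    cl_uint i, n = 0;
    char ext[2][2048], dev[2][2048];
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, device, &n);  /* n = GPUs available */
    for (i = 0; i < n && i < 2; i++) {
        clGetDeviceInfo(device[i], CL_DEVICE_EXTENSIONS, sizeof(ext[i]), ext[i], NULL);
        clGetDeviceInfo(device[i], CL_DEVICE_NAME, sizeof(dev[i]), dev[i], NULL);
        printf("%s: %s\n", dev[i], ext[i]);
    }
    return 0;
}
Each pass of the loop prints dev[i] and ext[i] for one GPU device.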