Can anyone explain whether OpenCL 1.2 supports double precision? I understand that it is optional in 1.2. If it is supported, how do I enable it?
Yes, OpenCL 1.2 supports double precision. Whether or not your device does is another question.
You can query your device to test this.
Page 39 onwards of the OpenCL 1.2 Specification: www.khronos.org/registry/cl/specs/opencl-1.2.pdf
If double precision is not supported, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE must return 0.
If double precision is not supported, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE must return 0.
Thank you, ankhster. If the device does support double precision, what should the value be?
As it says, the device must return zero if the device does not support double precision. Depending on the device you will get a zero or non-zero answer.
The values I get are:
Preferred Double Vector Width 1 // This is HD 7970 GPU
Preferred Double Vector Width 2 // This is Intel Q9450 CPU
indicating both devices support double precision.
I queried an Intel(R) Core(TM)2 Duo CPU E7500 and got zero for both values.
Does that mean the CPU doesn't support double precision? If so, how can I use double-precision algorithms?
You should check for the cl_khr_fp64 device extension.
Can I check using clinfo?
Indeed you can, you should get something like this:
CL_DEVICE_EXTENSIONS: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_dx9_media_sharing
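The same check can be done from host code. Below is a minimal sketch, assuming you have already copied the device's extension string into a C string (for example via clGetDeviceInfo with CL_DEVICE_EXTENSIONS). The helper name has_extension is hypothetical and not part of OpenCL; it only does the string matching, looking for a whole space-separated token so that a longer name containing "cl_khr_fp64" is not mistaken for it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns 1 if `name` appears as a
 * whole space-separated token in `ext_list`, 0 otherwise.
 * `ext_list` is assumed to be the string reported for CL_DEVICE_EXTENSIONS. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0) {
        return 0;
    }

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* A match counts only if it is delimited by spaces or string ends. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return 1;
        }
        p += len; /* keep searching after this partial match */
    }
    return 0;
}
```

With the clinfo output above, has_extension(extensions, "cl_khr_fp64") would return 1; on a device that does not list the extension it returns 0.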
Is there any API for reading and writing data from and to the device other than clEnqueueReadBuffer and clEnqueueWriteBuffer?
I want to reduce the data transfer time between host and device, but clEnqueueReadBuffer, clEnqueueWriteBuffer, and clEnqueueMapBuffer all take about the same time. How can I reduce the transfer time?
Please take a look at http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_fp64.html
On the kernel side, you need to explicitly enable double precision support using: #pragma OPENCL EXTENSION cl_khr_fp64 : enable
On the host side, you need to check if the device has double precision support. I think earlier posts answer that.
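To tie the host-side check and the kernel-side pragma together, here is a rough sketch of one possible approach: the host prepends a short preamble to the kernel source before building it. The helper make_kernel_source and the real_t typedef are hypothetical names chosen for illustration, and the fallback to float when fp64 is missing is an assumption about what your algorithm can tolerate, not something the spec prescribes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a preamble followed by the kernel body into
 * `out`. When has_fp64 is non-zero the preamble enables cl_khr_fp64 and
 * defines real_t as double; otherwise real_t falls back to float.
 * Returns what snprintf returns: the length needed, excluding the '\0'. */
int make_kernel_source(char *out, size_t cap, const char *body, int has_fp64)
{
    assert(out != NULL && body != NULL);

    const char *preamble = has_fp64
        ? "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
          "typedef double real_t;\n"
        : "typedef float real_t;\n";

    return snprintf(out, cap, "%s%s", preamble, body);
}
```

The resulting string would then be passed to clCreateProgramWithSource as usual, with the kernel body written in terms of real_t.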
IIRC, the APP SDK has a sample demonstrating the various memory bandwidths (especially PCIe memory bandwidth). Kindly check the source code of that sample. It should give you a fair idea of how to use it.
On your question about memory transfer time: the OpenCL runtime can schedule the actual transfer whenever it thinks appropriate, so just timing the cl* calls is not sufficient. One way around this is to make the call blocking (pass CL_TRUE as the blocking_read / blocking_write argument) so that it returns only after the transfer has completed. The same applies to Map: sometimes the copy is done only when you unmap, and so on. You need to read the spec a bit more on Map/Unmap semantics, and some of the behavior may be dictated by the OpenCL runtime implementation.
Best Regards, WorkItem 6
Signature: get_global_id(0) == 5
You need to give more details, possibly a code snippet on how you are measuring the performance.
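The recommended way is to use cl_event objects and query the time taken with the clGetEventProfilingInfo API (the command queue must be created with CL_QUEUE_PROFILING_ENABLE).
A safer and more robust way is to use a system timing function, calling clFlush/clFinish appropriately so that the timed commands have actually completed before you stop the timer.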
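For the event-based measurement, clGetEventProfilingInfo reports CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END as 64-bit counters in nanoseconds. A small sketch of turning those two values into a transfer time and an effective bandwidth follows; the helper names transfer_ms and transfer_gb_per_s are hypothetical, and the timestamps are assumed to have been read from the event of a completed read or write command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: elapsed time in milliseconds between two profiling
 * timestamps given in nanoseconds (the unit used by OpenCL profiling). */
double transfer_ms(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return (double)(end_ns - start_ns) / 1.0e6;
}

/* Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes).
 * One byte per nanosecond equals 1 GB/s, so no extra scaling is needed.
 * Returns 0.0 when the interval is empty to avoid dividing by zero. */
double transfer_gb_per_s(size_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    if (end_ns == start_ns) {
        return 0.0;
    }
    return (double)bytes / (double)(end_ns - start_ns);
}
```

Comparing the bandwidth from a few buffer sizes for Read/Write versus Map/Unmap gives a fairer picture than timing a single call.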
One way to reduce transfer latency is to overlap transfer and execution if you have multiple kernel launches.
This can be done as long as there is no dependency between transferred buffers and executing kernel.
AMD APP SDK has a sample called "TransferOverlap" which demonstrates this.