If I execute NVIDIA's oclBandwidthTest with my 5970, I get very poor results compared to an NVIDIA GTX 260:
5970:
oclBandwidthTest.exe Starting...
WARNING: NVIDIA OpenCL platform not found - defaulting to first platform!
Running on...
Device Cypress
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1575.1
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2454.8
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 49974.0
NVIDIA GTX 260
./oclBandwidthTest Starting...
Running on...
Device GeForce GTX 260
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5251.1
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5256.2
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 91350.3
TEST PASSED
Originally posted by: stefan_w If I execute NVIDIA's oclBandwidthTest with my 5970 I get very poor results compared to an NVIDIA GTX 260: [...]
These are known issues; they will be addressed in upcoming releases.
Are there any plans to support page locked/pinned memory (like NVIDIA does)?
I think clEnqueueMapBuffer() uses pinned memory.
I use

    host_mem = clCreateBuffer(context,
                              CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE,
                              size, NULL, &ocl_err);
    *ptr = clEnqueueMapBuffer(cmd_queue, host_mem,
                              CL_TRUE, CL_MAP_READ | CL_MAP_WRITE,
                              0, size, 0, NULL, &evt, &ocl_err);
to create page-locked memory. With the NVIDIA driver this works fine; on my AMD card, however, it performs no better than plain malloc'ed memory.
Originally posted by: stefan_w I use [...] to create page-locked memory. With the NVIDIA driver this works fine; on my AMD card, however, it performs no better than plain malloc'ed memory.
Use of pinned memory is not implemented yet. Pinned memory support is expected to be introduced in upcoming releases.
I still get a low PCIe transfer rate. With the same OpenCL code I get 5-6 GB/s on a GTX 280 but less than 2 GB/s on a 5870. How can I improve the numbers on the Cypress?
$ ./oclBandwidthTestGeneric
Using device 1: GeForce GTX 280
D2H Bandwidth =5.52 GB/s
H2D Bandwidth =5.34 GB/s
$ ./oclBandwidthTestGeneric
Using device 1: Cypress
D2H Bandwidth =1.35 GB/s
H2D Bandwidth =0.49 GB/s
I am using Stream SDK 2.1
If you set oclBandwidthTest to use pinned and mapped memory, it achieves near-theoretical results on an ATI 4870 and a 5670. On an NVIDIA 260, I get near-theoretical results using the pinned and direct memory settings.