stefan_w
Journeyman III

poor transfer

When I run NVIDIA's oclBandwidthTest on my 5970, I get much lower transfer rates than on an NVIDIA GTX 260:

 

5970:

oclBandwidthTest.exe Starting...

WARNING: NVIDIA OpenCL platform not found - defaulting to first platform!

Running on...

Device Cypress
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1575.1

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2454.8

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 49974.0

 

NVIDIA GTX 260
./oclBandwidthTest Starting...

Running on...

Device GeForce GTX 260
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5251.1

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5256.2

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 91350.3


TEST PASSED
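
For anyone comparing the numbers above, here is a minimal sketch of how a MB/s figure relates to transfer size and elapsed time. It assumes 1 MB = 2^20 bytes, which may or may not match the benchmark's own convention, and the function names `bandwidth_mb_per_s` and `transfer_time_ms` are made up for illustration; they are not part of oclBandwidthTest.

```c
#include <stddef.h>

/* Assumption: 1 MB = 2^20 bytes. The benchmark's exact convention is not
 * stated in this thread. */
#define BYTES_PER_MB 1048576.0

/* Hypothetical helper: bandwidth in MB/s when `bytes` are copied
 * `iterations` times in a total of `seconds`. Returns 0 for a
 * non-positive time. */
double bandwidth_mb_per_s(size_t bytes, unsigned iterations, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return ((double)bytes * (double)iterations) / (seconds * BYTES_PER_MB);
}

/* Hypothetical helper: time in milliseconds for a single copy of `bytes`
 * at `mb_per_s`. Returns 0 for a non-positive rate. */
double transfer_time_ms(size_t bytes, double mb_per_s)
{
    if (mb_per_s <= 0.0) {
        return 0.0;
    }
    return ((double)bytes / BYTES_PER_MB) / mb_per_s * 1000.0;
}
```

Under that assumption, the 33554432-byte (32 MB) host-to-device copy works out to roughly 20.3 ms per transfer at 1575.1 MB/s on the 5970, versus roughly 6.1 ms at 5251.1 MB/s on the GTX 260.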

 

 

0 Likes
7 Replies
genaganna
Journeyman III


These are known issues and will be addressed in upcoming releases.

0 Likes

Are there any plans to support page-locked (pinned) memory, as NVIDIA does?

0 Likes

I think clEnqueueMapBuffer() uses pinned memory.

0 Likes

I use

host_mem = clCreateBuffer(context,
                            CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE,
                            size,NULL,&ocl_err);
*ptr = (void*)clEnqueueMapBuffer(cmd_queue,host_mem,
                                   CL_TRUE,CL_MAP_READ|CL_MAP_WRITE,
                                   0,size,0,NULL,&evt,&ocl_err);

to create page-locked memory with the NVIDIA driver, where it works fine. However, on my AMD card this makes no difference compared to malloc'ed memory.

0 Likes


Pinned memory is not implemented yet. It is expected to be introduced in upcoming releases.

0 Likes

I still get a low PCIe transfer rate. With the same OpenCL code, I get 5-6 GB/s on a GTX 280 but less than 2 GB/s on a 5870. How can I improve the numbers on the Cypress?

$ ./oclBandwidthTestGeneric
Using device 1: GeForce GTX 280
D2H Bandwidth =5.52 GB/s
H2D Bandwidth =5.34 GB/s

$ ./oclBandwidthTestGeneric
Using device 1: Cypress
D2H Bandwidth =1.35 GB/s
H2D Bandwidth =0.49 GB/s

I am using Stream SDK 2.1

 

0 Likes

If you set oclBandwidthTest to use pinned and mapped memory, it achieves near-theoretical results on an ATI 4870 and 5670. On an NVIDIA 260, I get near-theoretical results using the pinned and direct memory settings.

0 Likes