yurtesen

Map/Unmap performance of SDK 2.7 with Catalyst 12.6

In short, I am getting very poor map/unmap performance. The device is a Radeon 7970 in a Bulldozer-based machine (so the link is PCIe 2.1 x16, NOT PCIe 3.0).

I used both CL_MEM_ALLOC_HOST_PTR and CL_MEM_USE_HOST_PTR, and ran map/unmap at least twice to make sure that the issue is not the pinning cost. The result: for 2.7GB of data, the transfer takes about 1.8 seconds, which works out to roughly 1.5GB/sec (2.7GB / 1.8s). That is ridiculously slow.
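The bandwidth figures here are just bytes moved divided by wall-clock seconds. A minimal C sketch of that arithmetic follows; the helper names are made up for illustration, and it assumes the "* 3" in the logged size line means three equally sized buffers.

```c
#include <stdint.h>

/* Bytes moved divided by wall-clock seconds, in decimal GB/s
 * (1 GB = 1e9 bytes), the unit used in the post. */
static double bandwidth_gb_per_s(uint64_t bytes, double seconds)
{
    return seconds > 0.0 ? (double)bytes / seconds / 1e9 : 0.0;
}

/* Total bytes for `buffers` buffers of `elements` items of `elem_size`
 * bytes each (assumption: "* 3" in the log = three buffers).
 * uint64_t is used because 2,700,000,000 does not fit in a signed
 * 32-bit int. */
static uint64_t total_bytes(uint64_t elements, uint64_t elem_size,
                            uint64_t buffers)
{
    return elements * elem_size * buffers;
}
```

With the post's numbers, total_bytes(225000000, 4, 3) gives 2700000000, bandwidth_gb_per_s(2700000000, 1.8) gives 1.5 for the Radeon, and bandwidth_gb_per_s(2520000000, 0.5) gives about 5.04 for the Tesla.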

With a Tesla M2050, for 2.5GB of data, the map/unmap takes about 0.5 seconds, which gives the expected speed of about 5GB/sec. Also, a write map copies the data from the card to host memory and back, while a read map correctly copies data only from the card to the host.
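The copy pattern described for the Tesla (write map: card to host on map, host to card on unmap; read map: card to host only) can be turned into a rough time estimate. A minimal sketch, assuming each bus crossing moves the whole buffer at a fixed link speed and nothing else costs time; the enum and function names are hypothetical and are NOT the OpenCL CL_MAP_* constants.

```c
#include <stdint.h>

/* Hypothetical map kinds for this sketch only (not CL_MAP_READ/WRITE). */
enum map_kind { MAP_FOR_READ, MAP_FOR_WRITE };

/* Bus crossings per map/unmap pair, following the behaviour observed
 * on the Tesla:
 *   read map : device->host at map time only
 *   write map: device->host at map time, host->device at unmap time */
static int bus_crossings(enum map_kind kind)
{
    return kind == MAP_FOR_WRITE ? 2 : 1;
}

/* Expected map+unmap wall time if every crossing moves `bytes` at
 * `gb_per_s` (decimal GB/s) and there is no other overhead. */
static double expected_round_trip_s(enum map_kind kind, uint64_t bytes,
                                    double gb_per_s)
{
    return bus_crossings(kind) * ((double)bytes / (gb_per_s * 1e9));
}
```

At 5 GB/s and 2520000000 bytes this predicts about 0.5 s per crossing, so about 1.0 s for a write map/unmap pair, which is in line with the Tesla's logged 0.53 s map plus 0.53 s unmap.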

I am not sure whether AMD's implementation does some trick and skips the device-to-host copy because no kernel ran on the device in my test case (which would be smart); maybe that is why it does not transfer the data in? Either way, AMD loses when map/unmap is used. So what is the reason? A bug in the SDK? The sample program is attached. Can anybody have a look?
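The guess above amounts to a runtime that tracks whether the device copy of a buffer is newer than the host copy and skips the device-to-host copy on map when it is not. A toy model of that idea, purely hypothetical and not taken from the AMD or NVIDIA runtimes; all names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy buffer with a host copy and a "device is newer" flag.
 * Hypothetical model only, not any vendor's implementation. */
struct toy_buffer {
    uint64_t size;
    bool device_is_newer;   /* set when a kernel writes the buffer */
};

/* A kernel ran and wrote the buffer on the device. */
static void toy_kernel_wrote(struct toy_buffer *b)
{
    b->device_is_newer = true;
}

/* Bytes copied device->host to make the mapped region valid. */
static uint64_t toy_map(struct toy_buffer *b)
{
    if (!b->device_is_newer)
        return 0;               /* host copy already current: skip copy */
    b->device_is_newer = false; /* host copy is now up to date */
    return b->size;
}

/* Bytes copied host->device on unmap; only a write map pushes data back. */
static uint64_t toy_unmap(const struct toy_buffer *b, bool mapped_for_write)
{
    return mapped_for_write ? b->size : 0;
}
```

Under this model, with no kernel run, a map copies nothing and only a write unmap moves data, which would match the Radeon log (map about 0.00 s, write unmap about 1.8 s) without saying anything about why that one transfer is slow.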

So, can you tell

On Radeon 7970

Size: 225000000 x 4 = 900000000 bytes * 3 = 2700000000
Tahiti
WALL time for CL_MEM_ALLOC_HOST_PTR =   0.00 seconds

CL_MAP_WRITE 0
WALL time for Map #0 =   0.01 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   2.39 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_WRITE 1
WALL time for Map #1 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   1.76 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 0
WALL time for Map #0 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.00 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 1
WALL time for Map #1 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.00 seconds
Unmap #events: 3 time: 0.0000 seconds.

Mapping with CL_MAP_WRITE and writing to mapped area
WALL time for Map #1 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for write mapped area =   1.31 seconds
WALL time for Unmap =   1.79 seconds
Unmap #events: 3 time: 0.0000 seconds.

Allocating memory using memalign 4096
WALL time for memalign =   0.00 seconds
WALL time for CL_MEM_USE_HOST_PTR =   0.18 seconds

CL_MAP_WRITE 0
WALL time for Map #0 =   0.01 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   2.18 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_WRITE 1
WALL time for Map #1 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   1.74 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 0
WALL time for Map #0 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.00 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 1
WALL time for Map #1 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.00 seconds
Unmap #events: 3 time: 0.0000 seconds.

Mapping with CL_MAP_WRITE and writing to mapped area
WALL time for Map #1 =   0.00 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for write mapped area =   1.33 seconds
WALL time for Unmap =   1.78 seconds
Unmap #events: 3 time: 0.0000 seconds.
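The CL_MEM_USE_HOST_PTR runs allocate the host buffer with memalign at a 4096-byte boundary. A minimal sketch of the same allocation using POSIX posix_memalign(), the standardized counterpart of the non-standard memalign(); `alloc_page_aligned` is a hypothetical helper name, and the returned pointer is what would be passed as the host_ptr argument of clCreateBuffer together with CL_MEM_USE_HOST_PTR.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Allocate `bytes` of host memory aligned to `alignment` (4096 in the
 * log). Returns NULL on failure, including when `alignment` is not a
 * power of two multiple of sizeof(void *). Release with free(). */
static void *alloc_page_aligned(size_t bytes, size_t alignment)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, bytes) != 0)
        return NULL;
    return p;
}
```

Unlike memalign(), posix_memalign() reports errors through its return value and leaves the output pointer untouched on failure, which is why the helper initializes `p` and checks the result.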

With Tesla M2050

Size: 210000000 x 4 = 840000000 bytes * 3 = 2520000000
Tesla M2050
WALL time for CL_MEM_ALLOC_HOST_PTR =   0.00 seconds

CL_MAP_WRITE 0
WALL time for Map #0 =   0.53 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.53 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_WRITE 1
WALL time for Map #1 =   0.87 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.53 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 0
WALL time for Map #0 =   0.83 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.10 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 1
WALL time for Map #1 =   0.82 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.10 seconds
Unmap #events: 3 time: 0.0000 seconds.

Mapping with CL_MAP_WRITE and writing to mapped area
WALL time for Map #1 =   0.82 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for write mapped area =   0.54 seconds
WALL time for Unmap =   0.57 seconds
Unmap #events: 3 time: 0.0000 seconds.

Allocating memory using memalign 4096
WALL time for memalign =   0.00 seconds
WALL time for CL_MEM_USE_HOST_PTR =   0.59 seconds

CL_MAP_WRITE 0
WALL time for Map #0 =   0.51 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.48 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_WRITE 1
WALL time for Map #1 =   0.56 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.48 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 0
WALL time for Map #0 =   0.56 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.01 seconds
Unmap #events: 3 time: 0.0000 seconds.

CL_MAP_READ 1
WALL time for Map #1 =   0.56 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for Unmap =   0.00 seconds
Unmap #events: 3 time: 0.0000 seconds.

Mapping with CL_MAP_WRITE and writing to mapped area
WALL time for Map #1 =   0.56 seconds
Map #events: 3 time: 0.0000 seconds.
WALL time for write mapped area =   0.32 seconds
WALL time for Unmap =   0.48 seconds
Unmap #events: 3 time: 0.0000 seconds.
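The "WALL time" lines measure elapsed time rather than OpenCL event time. The attached program is not reproduced here, so how it takes these timings is unknown; a minimal sketch of one common approach is to read a monotonic clock before and after the operation. `wall_seconds` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Monotonic elapsed time in seconds. Unlike clock(), which reports the
 * processor time used by the program, this also counts time spent
 * blocked while waiting on the device. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

In an OpenCL host program one would read wall_seconds() before clEnqueueUnmapMemObject and again after clFinish on the same queue, since the enqueue call itself may return before the transfer has completed.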
