fulcrum_xyz
Journeyman III

OpenCL throughput benchmarking code?

Is there any OpenCL code for throughput benchmarking on the HD 5870?

Something similar to the throughput example in the CAL SDK.

I just wanted to benchmark the device-to-device memory bandwidth on the 5870 and see whether it can achieve the specified 158 GB/s.
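Whether a measurement "reaches" 158 GB/s depends on how bytes are counted. A common convention for device-to-device copies (an assumption here; the thread does not say how the advertised figure is counted) is that copying N bytes reads N and writes N, so 2N bytes cross the memory bus. A minimal C sketch of that arithmetic, with a hypothetical function name and the timing assumed to come from elsewhere:

```c
#include <assert.h>
#include <stddef.h>

/* Effective bandwidth of a device-to-device copy in GB/s (1 GB = 1e9 bytes).
 * Assumption: copying `bytes` reads `bytes` and writes `bytes`, so
 * 2 * bytes cross the memory bus. */
double copy_bandwidth_gbps(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)bytes / seconds / 1e9;
}
```

Counting only one direction would halve the reported figure, so it is worth stating which convention a benchmark uses before comparing it with the 158 GB/s specification.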
13 Replies
bealto
Journeyman III

I've put some OpenCL benchmark code on http://www.bealto.com/gpu-benchmarks.html

I could not reach the advertised 158 GB/s.

Apparently the current versions of the drivers are "sub-optimal" (there is a note about this on a page about SiSoftware benchmarks: http://www.sisoftware.net/index.html?dir=qa&location=gpu_opencl&langx=en&a=).

-- Eric

This kernel gives me around 50 GB/s on a 5770:

__kernel void write_OpenCLPerfOutputSpeedGPU_13(float4 c0,
__global float4 *output0,
__global float4 *output1,
__global float4 *output2,
__global float4 *output3,
__global float4 *output4,
__global float4 *output5,
__global float4 *output6,
__global float4 *output7,
__global float4 *output8,
__global float4 *output9,
__global float4 *output10,
__global float4 *output11,
__global float4 *output12,
__global float4 *output13,
__global float4 *output14,
__global float4 *output15)
{
uint gid = get_global_id(0);
output0[gid] = c0;
output1[gid] = c0;
output2[gid] = c0;
output3[gid] = c0;
output4[gid] = c0;
output5[gid] = c0;
output6[gid] = c0;
output7[gid] = c0;
output8[gid] = c0;
output9[gid] = c0;
output10[gid] = c0;
output11[gid] = c0;
output12[gid] = c0;
output13[gid] = c0;
output14[gid] = c0;
output15[gid] = c0;
}
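In the kernel above, each work-item stores one float4 (16 bytes) to each of 16 buffers, i.e. 256 bytes per work-item. A minimal C sketch of the byte accounting needed to turn a measured kernel time into GB/s; the function names are hypothetical and the kernel time is assumed to be measured separately:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes stored by one launch of a write-only kernel where every work-item
 * writes one float4 to each of `num_outputs` buffers.
 * An OpenCL float4 is 16 bytes. */
size_t write_kernel_bytes(size_t global_size, unsigned num_outputs)
{
    const size_t float4_bytes = 16;
    return global_size * (size_t)num_outputs * float4_bytes;
}

/* Write bandwidth in GB/s (1 GB = 1e9 bytes) for a measured kernel time. */
double write_bandwidth_gbps(size_t global_size, unsigned num_outputs,
                            double seconds)
{
    assert(seconds > 0.0);
    return (double)write_kernel_bytes(global_size, num_outputs) / seconds / 1e9;
}
```

For example, with a global size of 2^20 work-items one launch writes 2^28 bytes (256 MiB); if that took 5 ms, the sketch reports about 53.7 GB/s.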


I am getting linking errors with this code.

I am trying it on a 32-bit Windows machine with a dual-core Intel Xeon and an AMD 5870, using the latest Catalyst driver and Stream SDK 2.0 final:

1>MSVCRTD.lib(crtexew.obj) : error LNK2019: unresolved external symbol _WinMain@16 referenced in function ___tmainCRTStartup

1>C:\ATI_SDK\samples\opencl\cl\app\MPBenchmarks\vs2008\Debug-32\MPBenchmarks.exe : fatal error LNK1120: 1 unresolved externals

That is not a kernel issue. Try changing the Runtime Library flag in Project Properties > C/C++ > Code Generation to /MTd.


I figured this is not a code problem, but something to do with the configuration and Visual Studio...

Tried /MTd; it still doesn't work!

My code uses Qt for the display, Qt provides WinMain in qtmain.lib.

To compile the project without Qt, set CONFIG_USE_QT to 0 in Config.h, and in the MPBenchmarks project properties, select Linker>System>SubSystem = Console instead of Windows.

-- Eric

It worked, thanks bealto!

But my system keeps rebooting whenever I run the program!

Could it be a driver issue?

The GPU is at about 58 degrees; I don't think that is too high.

But most of the other examples from the SDK run fine, which is the puzzling part.

Are there any diagnostic tools you are aware of?

It could be a problem with your program. Could you please send us code that reproduces the rebooting issue? Please also mention your OS, SDK version and driver version.

OS - Windows XP 32-bit

CPU - Intel Xeon 5150 @ 2.66 GHz, 3.25 GB RAM

GPU - AMD Radeon 5870

SDK - Stream SDK v2.0 final

Catalyst driver v9.12

nothing,
That kind of kernel will not reach peak because of how pointers are handled on the hardware. In order to reach close to peak, you will need to do strided writes of float4 to the same pointer and not writes to multiple pointers.
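One reading of "strided writes of float4 to the same pointer" (an interpretation, not code from the reply) is that each work-item writes several float4 values into a single buffer at index gid + i * global_size, so that for each i neighbouring work-items store to neighbouring slots. The sketch below emulates that index pattern on the CPU in plain C; the kernel in the comment, `Float4`, and the function names are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for OpenCL's float4 so this builds as plain C. */
typedef struct { float x, y, z, w; } Float4;

/* Index written by work-item `gid` on its i-th store. */
size_t strided_index(size_t gid, size_t i, size_t global_size)
{
    return gid + i * global_size;
}

/* CPU emulation of a hypothetical single-pointer write kernel:
 *
 *   __kernel void write_strided(float4 c0, __global float4 *out, uint n)
 *   {
 *       uint gid = get_global_id(0);
 *       uint stride = get_global_size(0);
 *       for (uint i = 0; i < n; ++i)
 *           out[gid + i * stride] = c0;
 *   }
 *
 * The outer loop plays the role of the work-items. `out` must hold
 * global_size * n elements; every element is written exactly once. */
void write_strided_emulated(Float4 c0, Float4 *out,
                            size_t global_size, size_t n)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        for (size_t i = 0; i < n; ++i)
            out[strided_index(gid, i, global_size)] = c0;
}
```

Compared with the 16-pointer kernel, the total bytes written are the same, but all stores go through one pointer and each pass over i covers one contiguous block of the buffer.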

Well, I tried measuring write performance, and these are my results. The first set is timed with the normal system timer, the second with the profiler.

System timer:
Size (MiB)  Bandwidth (MiB/s)  Time (s)
0.0625      328.947            0.00019
0.125       668.449            0.000187
0.25        1262.63            0.000198
0.5         2463.05            0.000203
1           2421.31            0.000413
2           4842.62            0.000413
4           9324.01            0.000429
8           18518.5            0.000432
16          38554.2            0.000415
32          52032.5            0.000615
64          62378.2            0.001026
128         76969.3            0.001663

Profiler time:
Size (MiB)  Bandwidth (MiB/s)
0.0625      7521.06
0.125       12312.8
0.25        18936.5
0.5         32247.7
1           41353.1
2           45644.4
4           49912.7
8           53545.7
16          55116
32          55291.1
64          55338.2
128         56070.1

Without profiling enabled, WG: 256:
Size (MiB)  Bandwidth (MiB/s)  Time (s)
0.0625      568.182            0.00011
0.125       1086.96            0.000115
0.25        2212.39            0.000113
0.5         4347.83            0.000115
1           3134.8             0.000319
2           14492.8            0.000138
4           12084.6            0.000331
8           24464.8            0.000327
16          49079.8            0.000326
32          60377.4            0.00053
64          67156.3            0.000953
128         82051.3            0.00156
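To compare the numbers above with the 158 GB/s figure, bandwidth is size divided by time, and MiB/s (2^20 bytes) has to be converted to GB/s, assuming the advertised figure uses decimal gigabytes (10^9 bytes). A small C sketch with hypothetical function names:

```c
#include <assert.h>

/* Bandwidth in MiB/s from a transfer size in MiB and a time in seconds. */
double mib_per_s(double size_mib, double seconds)
{
    assert(seconds > 0.0);
    return size_mib / seconds;
}

/* Convert MiB/s (2^20 bytes) to GB/s (10^9 bytes). */
double mib_s_to_gb_s(double mib_s)
{
    return mib_s * 1048576.0 / 1e9;
}
```

For example, 128 MiB in 0.001663 s gives about 76969 MiB/s, matching the system-timer result for 128 MiB, and the best figure of 82051.3 MiB/s converts to roughly 86 GB/s.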


Nou, which card did you get those numbers on?


Radeon 5850. But I wonder what the profiling precision is.
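On the profiling-precision question: OpenCL event timestamps are read with clGetEventProfilingInfo (the command queue must be created with CL_QUEUE_PROFILING_ENABLE) and are reported in nanoseconds, while the tick size of the device timer can be queried with clGetDeviceInfo and CL_DEVICE_PROFILING_TIMER_RESOLUTION. A minimal C sketch, with hypothetical helper names and no actual OpenCL calls, of turning those values into a duration and a rough error bound:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel duration in seconds from two OpenCL profiling timestamps
 * (CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END), which are
 * cl_ulong values in nanoseconds. uint64_t stands in for cl_ulong here so
 * the sketch builds without the OpenCL headers. */
double profiled_seconds(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return (double)(end_ns - start_ns) * 1e-9;
}

/* Upper bound on the relative timing error of one measurement, given the
 * device timer resolution in nanoseconds. Assumption: both timestamps are
 * truncated to whole ticks, so the measured difference is off by less than
 * one tick. */
double timing_relative_error(uint64_t start_ns, uint64_t end_ns,
                             size_t resolution_ns)
{
    assert(end_ns > start_ns);
    return (double)resolution_ns / (double)(end_ns - start_ns);
}
```

The shorter the kernel, the larger this bound, so the small transfer sizes are where timer precision matters most.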
