Is there any OpenCL code for throughput benchmarking on the HD 5870?
Something similar to the throughput example in the CAL SDK.
I just wanted to benchmark the device-to-device memory bandwidth on the 5870 and see if it can achieve the specified 158 GB/s.
I've put some OpenCL benchmark code on http://www.bealto.com/gpu-benchmarks.html
I could not reach the advertised 158 GB/s.
Apparently the current versions of the drivers are "sub-optimal" (there is a note about this on a page about SiSoftware benchmarks: http://www.sisoftware.net/index.html?dir=qa&location=gpu_opencl&langx=en&a=).
-- Eric
This kernel gives me around 50 GB/s on a 5770:
__kernel void write_OpenCLPerfOutputSpeedGPU_13(float4 c0,
                                                __global float4 *output0,
                                                __global float4 *output1,
                                                __global float4 *output2,
                                                __global float4 *output3,
                                                __global float4 *output4,
                                                __global float4 *output5,
                                                __global float4 *output6,
                                                __global float4 *output7,
                                                __global float4 *output8,
                                                __global float4 *output9,
                                                __global float4 *output10,
                                                __global float4 *output11,
                                                __global float4 *output12,
                                                __global float4 *output13,
                                                __global float4 *output14,
                                                __global float4 *output15)
{
    /* Each work-item writes one float4 to each of the 16 output buffers. */
    uint gid = get_global_id(0);
    output0[gid] = c0;
    output1[gid] = c0;
    output2[gid] = c0;
    output3[gid] = c0;
    output4[gid] = c0;
    output5[gid] = c0;
    output6[gid] = c0;
    output7[gid] = c0;
    output8[gid] = c0;
    output9[gid] = c0;
    output10[gid] = c0;
    output11[gid] = c0;
    output12[gid] = c0;
    output13[gid] = c0;
    output14[gid] = c0;
    output15[gid] = c0;
}
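To turn a timing of the kernel above into a bandwidth figure, one possible sketch is shown below. It assumes each work-item writes one 16-byte float4 to each of the 16 buffers, as the kernel does; the function names and the idea of passing in the elapsed kernel time in seconds are illustrative assumptions, not part of the posted code.

```c
#include <stddef.h>

/* Per work-item: 16 output buffers, one float4 (4 x 4-byte floats) each. */
enum { NUM_OUTPUTS = 16, BYTES_PER_FLOAT4 = 16 };

/* Total bytes written by one launch with `global_size` work-items.
   (Hypothetical helper, not from the original post.) */
double bytes_written(size_t global_size)
{
    return (double)global_size * NUM_OUTPUTS * BYTES_PER_FLOAT4;
}

/* Write bandwidth in GB/s (1 GB = 1e9 bytes) for a launch that took
   `seconds` of kernel time. Returns 0 for a non-positive time. */
double write_bandwidth_gbs(size_t global_size, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return bytes_written(global_size) / seconds / 1e9;
}
```

For example, write_bandwidth_gbs(1000000, 0.00512) corresponds to 256 MB written in 5.12 ms; the timing value here is made up purely to illustrate the arithmetic.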
I am getting linker errors with this code.
I am trying it on a 32-bit Windows machine with a dual-core Intel Xeon and an AMD 5870, with the latest Catalyst driver and Stream SDK 2.0 final.
1>MSVCRTD.lib(crtexew.obj) : error LNK2019: unresolved external symbol _WinMain@16 referenced in function ___tmainCRTStartup
1>C:\ATI_SDK\samples\opencl\cl\app\MPBenchmarks\vs2008\Debug-32\MPBenchmarks.exe : fatal error LNK1120: 1 unresolved externals
That is not a kernel issue. Try changing the Runtime Library flag in Project Properties/C++/Code Generation to /MTd.
I figured this is not a code problem, but something to do with the config + VS.
Tried /MTd - it still doesn't work!
My code uses Qt for the display, and Qt provides WinMain in qtmain.lib.
To compile the project without Qt, set CONFIG_USE_QT to 0 in Config.h, and in the MPBenchmarks project properties, select Linker>System>SubSystem = Console instead of Windows.
-- Eric
It worked, thanks bealto!
But my system keeps rebooting whenever I run the program.
Could it be a driver issue?
The GPU is at ~58 degrees, which shouldn't be a problem I guess - I don't think that's too high?
But most of the other examples from the SDK run fine - that's the puzzling part.
Is there any diagnostic tool you guys are aware of?
It could be a problem with your program. Could you please send us the code that produces the rebooting issue? Please also mention your OS, SDK version and driver version.
OS - Win XP 32 bit
CPU - Intel Xeon 5150 @ 2.66 GHz, 3.25 GB RAM
GPU - AMD Radeon 5870
SDK - Stream SDK v2.0 final
Catalyst driver v9.12
Well, I tried write performance, and this is my result. The first set is timed with the normal system timer, the second is from the profiler time.
System timer:
Size (MiB)   Bandwidth (MiB/s)   Time (s)
0.0625       328.947             0.00019
0.125        668.449             0.000187
0.25         1262.63             0.000198
0.5          2463.05             0.000203
1            2421.31             0.000413
2            4842.62             0.000413
4            9324.01             0.000429
8            18518.5             0.000432
16           38554.2             0.000415
32           52032.5             0.000615
64           62378.2             0.001026
128          76969.3             0.001663
----------
Profiler:
Size (MiB)   Bandwidth (MiB/s)
0.0625       7521.06
0.125        12312.8
0.25         18936.5
0.5          32247.7
1            41353.1
2            45644.4
4            49912.7
8            53545.7
16           55116
32           55291.1
64           55338.2
128          56070.1
// without profiling enabled  WG: 256
Size (MiB)   Bandwidth (MiB/s)   Time (s)
0.0625       568.182             0.00011
0.125        1086.96             0.000115
0.25         2212.39             0.000113
0.5          4347.83             0.000115
1            3134.8              0.000319
2            14492.8             0.000138
4            12084.6             0.000331
8            24464.8             0.000327
16           49079.8             0.000326
32           60377.4             0.00053
64           67156.3             0.000953
128          82051.3             0.00156
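As a sanity check on the figures above, the bandwidth column appears to be the buffer size divided by the time column taken in seconds (0.0625 MiB / 0.00019 s gives about 328.947 MiB/s). A minimal sketch of that arithmetic, plus a conversion to the decimal GB/s used in the advertised figure, could look like this; the function names are hypothetical.

```c
/* Bandwidth in MiB/s from a buffer size in MiB and a time in seconds.
   Returns 0 for a non-positive time. (Hypothetical helper.) */
double mib_per_second(double size_mib, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return size_mib / seconds;
}

/* Convert MiB/s to GB/s: 1 MiB = 1048576 bytes, 1 GB = 1e9 bytes. */
double mib_s_to_gb_s(double mib_s)
{
    return mib_s * 1048576.0 / 1e9;
}
```

With these, the fastest entry (82051.3 MiB/s at 128 MiB) works out to roughly 86 GB/s, and the profiler's plateau of about 56000 MiB/s to roughly 59 GB/s.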
Nou - what card are those numbers from?
Radeon 5850. But I wonder what the profiling precision is.
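On the precision question: OpenCL event profiling reports timestamps as 64-bit nanosecond counters (via clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END), and the device's timer resolution in nanoseconds can be queried with CL_DEVICE_PROFILING_TIMER_RESOLUTION. The sketch below does not call OpenCL itself; it only shows, under the assumption that you already have those values, how one might estimate how much a single timer tick matters for a short kernel. The function names are made up for illustration.

```c
#include <stdint.h>

/* Elapsed time in seconds between two nanosecond timestamps.
   Returns 0 if the end is not after the start. (Hypothetical helper.) */
double elapsed_seconds(uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns <= start_ns) {
        return 0.0;
    }
    return (double)(end_ns - start_ns) * 1e-9;
}

/* Rough relative uncertainty of a measurement: one timer tick
   (resolution_ns) as a fraction of the measured duration.
   Returns 1.0 when the duration is zero, i.e. nothing is resolved. */
double relative_timing_error(uint64_t duration_ns, uint64_t resolution_ns)
{
    if (duration_ns == 0) {
        return 1.0;
    }
    return (double)resolution_ns / (double)duration_ns;
}
```

For instance, a 100 microsecond kernel measured with a 1 microsecond tick would carry about 1% uncertainty from the tick alone, so the small-buffer rows are the ones most affected.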