13 Replies Latest reply on Jan 27, 2010 6:50 AM by fulcrum_xyz

    OpenCL Throughput benchmarking code ??

    fulcrum_xyz

      Is there any OpenCL code for throughput benchmarking on the HD 5870?

       

      Something similar to the throughput example in the CAL SDK.

      I just wanted to benchmark the device-to-device memory bandwidth on the 5870 and see whether it can achieve the specified 158 GB/s.

       

        • OpenCL Throughput benchmarking code ??
          bealto

          I've put some OpenCL benchmark code on http://www.bealto.com/gpu-benchmarks.html

          I could not reach the advertised 158 GB/s.

          Apparently the current versions of the drivers are "sub-optimal". There is a note about this on a page about SiSoftware benchmarks: http://www.sisoftware.net/index.html?dir=qa&location=gpu_opencl&langx=en&a=

          -- Eric

           

            • OpenCL Throughput benchmarking code ??
              n0thing

              This kernel gives me around 50 GB/s on a 5770:

              /* Each work-item writes the constant c0 once to each of 16 separate buffers. */
              __kernel void write_OpenCLPerfOutputSpeedGPU_13(float4 c0,
                  __global float4 *output0,
                  __global float4 *output1,
                  __global float4 *output2,
                  __global float4 *output3,
                  __global float4 *output4,
                  __global float4 *output5,
                  __global float4 *output6,
                  __global float4 *output7,
                  __global float4 *output8,
                  __global float4 *output9,
                  __global float4 *output10,
                  __global float4 *output11,
                  __global float4 *output12,
                  __global float4 *output13,
                  __global float4 *output14,
                  __global float4 *output15)
              {
                  uint gid = get_global_id(0);
                  output0[gid] = c0;
                  output1[gid] = c0;
                  output2[gid] = c0;
                  output3[gid] = c0;
                  output4[gid] = c0;
                  output5[gid] = c0;
                  output6[gid] = c0;
                  output7[gid] = c0;
                  output8[gid] = c0;
                  output9[gid] = c0;
                  output10[gid] = c0;
                  output11[gid] = c0;
                  output12[gid] = c0;
                  output13[gid] = c0;
                  output14[gid] = c0;
                  output15[gid] = c0;
              }
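For reference, here is a rough sketch of how a GB/s figure could be derived for a kernel like the one above. The post does not say how the 50 GB/s was computed, so this is an assumption: every work-item writes one float4 (16 bytes) to each output buffer, and 1 GB is taken as 10^9 bytes. The function name `write_bandwidth_gbs` and its parameters are hypothetical.

```c
#include <stddef.h>

/* Size in bytes of one OpenCL float4 (4 x 32-bit float). */
#define FLOAT4_BYTES 16

/* Effective write bandwidth in GB/s, with 1 GB = 1e9 bytes (an assumption).
 *   work_items:  global work size, i.e. elements written per buffer
 *   num_buffers: output pointers written by each work-item (16 in the kernel above)
 *   seconds:     measured kernel execution time, must be > 0 */
double write_bandwidth_gbs(size_t work_items, unsigned num_buffers, double seconds)
{
    double bytes = (double)work_items * (double)num_buffers * FLOAT4_BYTES;
    return bytes / seconds / 1e9;
}
```

As an illustration with made-up numbers: 1,000,000 work-items writing to 16 buffers in 5.12 ms move 256,000,000 bytes, which works out to 50 GB/s.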

              • OpenCL Throughput benchmarking code ??
                dheevatsa

                 

                I am getting linking errors with this code.

                 

                I am trying it on a 32-bit Windows machine with a dual-core Intel Xeon and an AMD 5870, using the latest Catalyst driver and Stream SDK 2.0 final.

                 

                1>MSVCRTD.lib(crtexew.obj) : error LNK2019: unresolved external symbol _WinMain@16 referenced in function ___tmainCRTStartup

                1>C:\ATI_SDK\samples\opencl\cl\app\MPBenchmarks\vs2008\Debug-32\MPBenchmarks.exe : fatal error LNK1120: 1 unresolved externals

                 

              • OpenCL Throughput benchmarking code ??
                MicahVillmow
                n0thing,
                That kind of kernel will not reach peak because of how pointers are handled on the hardware. To get close to peak, you will need to do strided writes of float4 to the same pointer rather than writes to multiple pointers.
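One possible reading of "strided writes of float4 to the same pointer" is sketched below. This is an interpretation, not code from the reply, and it has not been run on hardware: each work-item writes several float4 values into a single buffer, stepping by the global work size so that neighbouring work-items always touch neighbouring addresses. The kernel name `write_strided`, the `writes_per_item` parameter, and the C helper are hypothetical. The kernel is shown as a comment, and the C function only emulates its indexing on the CPU to check that every element is written exactly once.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Assumed OpenCL kernel (an interpretation, untested on hardware):
 *
 *   __kernel void write_strided(float4 c0, __global float4 *output,
 *                               uint writes_per_item)
 *   {
 *       uint gid    = get_global_id(0);
 *       uint stride = get_global_size(0);
 *       for (uint i = 0; i < writes_per_item; ++i)
 *           output[gid + i * stride] = c0;
 *   }
 */

/* CPU emulation of the kernel's indexing. Returns 1 if every element of a
 * buffer holding global_size * writes_per_item entries is written exactly
 * once, and 0 otherwise (including allocation failure). */
int strided_writes_cover_buffer(size_t global_size, size_t writes_per_item)
{
    size_t total = global_size * writes_per_item;
    unsigned char *hits = calloc(total, 1);
    if (hits == NULL)
        return 0;

    /* Outer loop plays the role of the work-items, inner loop the kernel body. */
    for (size_t gid = 0; gid < global_size; ++gid)
        for (size_t i = 0; i < writes_per_item; ++i)
            hits[gid + i * global_size]++;

    int ok = 1;
    for (size_t k = 0; k < total; ++k) {
        if (hits[k] != 1) {
            ok = 0;
            break;
        }
    }

    free(hits);
    return ok;
}
```

With this layout the buffer needs room for global_size * writes_per_item float4 elements. Whether it actually approaches peak bandwidth would have to be measured on the device.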
                  • OpenCL Throughput benchmarking code ??
                    nou

                    Well, I tried measuring write performance, and these are my results. The first set is timed with the normal system timer; the second is from the profiler time.

                    0.0625 MiB    328.947 MiB/s   0.00019 s
                     0.125 MiB    668.449 MiB/s   0.000187 s
                      0.25 MiB    1262.63 MiB/s   0.000198 s
                       0.5 MiB    2463.05 MiB/s   0.000203 s
                         1 MiB    2421.31 MiB/s   0.000413 s
                         2 MiB    4842.62 MiB/s   0.000413 s
                         4 MiB    9324.01 MiB/s   0.000429 s
                         8 MiB    18518.5 MiB/s   0.000432 s
                        16 MiB    38554.2 MiB/s   0.000415 s
                        32 MiB    52032.5 MiB/s   0.000615 s
                        64 MiB    62378.2 MiB/s   0.001026 s
                       128 MiB    76969.3 MiB/s   0.001663 s
                    ----------
                    0.0625 MiB    7521.06 MiB/s
                     0.125 MiB    12312.8 MiB/s
                      0.25 MiB    18936.5 MiB/s
                       0.5 MiB    32247.7 MiB/s
                         1 MiB    41353.1 MiB/s
                         2 MiB    45644.4 MiB/s
                         4 MiB    49912.7 MiB/s
                         8 MiB    53545.7 MiB/s
                        16 MiB      55116 MiB/s
                        32 MiB    55291.1 MiB/s
                        64 MiB    55338.2 MiB/s
                       128 MiB    56070.1 MiB/s

                    // without profiling enabled
                    WG: 256
                    0.0625 MiB    568.182 MiB/s   0.00011 s
                     0.125 MiB    1086.96 MiB/s   0.000115 s
                      0.25 MiB    2212.39 MiB/s   0.000113 s
                       0.5 MiB    4347.83 MiB/s   0.000115 s
                         1 MiB     3134.8 MiB/s   0.000319 s
                         2 MiB    14492.8 MiB/s   0.000138 s
                         4 MiB    12084.6 MiB/s   0.000331 s
                         8 MiB    24464.8 MiB/s   0.000327 s
                        16 MiB    49079.8 MiB/s   0.000326 s
                        32 MiB    60377.4 MiB/s   0.00053 s
                        64 MiB    67156.3 MiB/s   0.000953 s
                       128 MiB    82051.3 MiB/s   0.00156 s
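These results are in MiB/s, while the figure quoted at the top of the thread is in GB/s, so a small conversion helps when comparing them. The sketch below assumes 1 MiB = 2^20 bytes and 1 GB = 10^9 bytes. The function names are hypothetical, and the post does not say which card the numbers were measured on.

```c
/* Convert a throughput in MiB/s (2^20 bytes per second)
 * to GB/s (10^9 bytes per second). */
double mib_per_s_to_gb_per_s(double mib_per_s)
{
    return mib_per_s * 1048576.0 / 1e9;
}

/* Fraction of a quoted peak bandwidth (in GB/s) reached by a
 * measured throughput (in MiB/s). peak_gb_per_s must be > 0. */
double fraction_of_peak(double mib_per_s, double peak_gb_per_s)
{
    return mib_per_s_to_gb_per_s(mib_per_s) / peak_gb_per_s;
}
```

Applied to the largest figures above, 82051.3 MiB/s is about 86.0 GB/s and the profiler-based 56070.1 MiB/s is about 58.8 GB/s. If these were measured on a 5870, that would be roughly 54% and 37% of the 158 GB/s mentioned earlier in the thread.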