[XF] Float16 vs 16 float

Discussion created by xfaure on Sep 2, 2011
Latest reply on Sep 6, 2011 by MicahVillmow
Interest of float16

Hello everybody,

I'm new to OpenCL. I am trying to illustrate the power of float16, but so far I have failed.
I built a program which adds two arrays of 1024*1024*16 floats. On the GPU, using float16, the computation takes 0.03 seconds. On the GPU, using 16 * float, it takes 0.006 seconds. On the CPU, it takes 2 seconds. Why is it slower with float16 than with 16 * float?

Thanks for your help.

Part of my code:
[code]
File Main.cpp:

// Define an index space (global work size) of work-items for execution.
// A workgroup size (local work size) is not required, but can be used.
size_t globalWorkSize[1];
size_t localWorkSize[1];
// Each work-item handles one float16 (16 floats),
// so there are nbKernel/16 work-items.
globalWorkSize[0] = nbKernel/16;
// The global size must be a multiple of the local size, and the local size
// must not exceed the device's CL_DEVICE_MAX_WORK_GROUP_SIZE.
localWorkSize[0] = 512;

// Execute the kernel.
// 'globalWorkSize' is the 1D dimension of the work-items
status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL, globalWorkSize,
                                localWorkSize, 0, NULL, NULL);

// Wait for the kernel to finish.
clFinish(cmdQueue);

File .cl:

__kernel void vecadd(__global float16 const * const A, __global float16 const * const B, __global float16 * const C)
{
    // Index of this work-item; each one adds a single float16 element.
    unsigned int const i = get_global_id(0);

    C[i] = A[i] + B[i];
}
[/code]
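
For reference, here is a minimal sketch in plain C (no OpenCL calls) of the work-size arithmetic that the host code above relies on. It assumes nbKernel is the total number of floats, which the post does not state explicitly. The helper names work_items_for and round_up_global are hypothetical and are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-items needed when each one handles `width` floats
 * (width = 16 for float16, 1 for plain float). Rounds up so that a
 * trailing partial chunk still gets a work-item. */
size_t work_items_for(size_t n_floats, size_t width)
{
    assert(width > 0);
    return (n_floats + width - 1) / width;
}

/* Smallest multiple of `local` that is >= `items`. The global work size
 * passed to clEnqueueNDRangeKernel must be a multiple of the local work
 * size when a local size is given (OpenCL 1.x). */
size_t round_up_global(size_t items, size_t local)
{
    assert(local > 0);
    return ((items + local - 1) / local) * local;
}
```

With 1024*1024*16 floats and a width of 16, work_items_for gives 1048576 work-items, which is already a multiple of 512, so no rounding is needed in this case. For sizes that are not a multiple, the kernel would need a bounds check on the index before writing to C.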

Thanks

Xavier Faure
