Archives Discussions

sofiene · ‎05-29-2013

Hello,

I am testing OpenCL's atomic_inc function in a loop. The kernel code is:

__kernel void atomicinc(__global int *x)
{
    atomic_inc(x);
}

The host code is:

for (i = 1; i < 10000; i++)
{
    ret = clEnqueueTask(command_queue2, task[0], 0, NULL, event + 1); // atomicinc
    checkErrors(ret, "clEnqueueTask", __LINE__);
}

Running it shows that the memory used by the program grows considerably until it stalls: cl_out_of_host_ptr@

Thank you for your help.

roger512 · ‎05-29-2013

I think you forgot to put a clFinish() in the loop. Launching 10000 kernels asynchronously can't be easy for the driver to handle.
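A minimal sketch of that suggestion, written without OpenCL headers so it can be compiled anywhere. The names `run_batched`, `enqueue_fn`, `sync_fn` and the counting callbacks are hypothetical; in a real host program `enqueue` would wrap something like `clEnqueueTask(queue, kernel, 0, NULL, NULL)` and `sync` would wrap `clFinish(queue)`. Note also that the loop in the question passes `event+1` to every call: each call returns a new event object, and overwriting the handle without calling `clReleaseEvent` keeps those objects alive, so passing NULL or releasing each event may matter as much as the synchronization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types standing in for the OpenCL calls:
   enqueue ~ clEnqueueTask(queue, kernel, 0, NULL, NULL), sync ~ clFinish(queue).
   Both return 0 on success, like CL_SUCCESS. */
typedef int (*enqueue_fn)(void *ctx);
typedef int (*sync_fn)(void *ctx);

/* Enqueue `total` commands, waiting for completion after every `batch`
   of them so the number of commands in flight stays bounded.
   Returns 0 on success or the first nonzero error code. */
int run_batched(size_t total, size_t batch,
                enqueue_fn enqueue, sync_fn sync, void *ctx)
{
    assert(enqueue != NULL && sync != NULL);
    if (batch == 0)
        batch = 1;                      /* degenerate case: sync every time */

    size_t pending = 0;
    for (size_t i = 0; i < total; i++) {
        int err = enqueue(ctx);
        if (err != 0)
            return err;
        if (++pending == batch) {
            err = sync(ctx);
            if (err != 0)
                return err;
            pending = 0;
        }
    }
    /* Wait for whatever is left from the last, partial batch. */
    return pending ? sync(ctx) : 0;
}

/* Counting stand-ins used only to exercise the sketch. */
struct counters { size_t enqueued, syncs, pending, max_pending; };

int count_enqueue(void *ctx)
{
    struct counters *c = ctx;
    c->enqueued++;
    if (++c->pending > c->max_pending)
        c->max_pending = c->pending;
    return 0;
}

int count_sync(void *ctx)
{
    struct counters *c = ctx;
    c->syncs++;
    c->pending = 0;
    return 0;
}
```

The batch size is a tuning choice: synchronizing after every single command is the simplest option but serializes host and device, while a larger batch keeps the device busy and still caps how much the driver has to queue.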

twintip31 · ‎05-29-2013

Why not launch several EnqueueNDRange calls instead, with the global size adjusted and the local size set to the maximum your graphics card supports?

(Look at what clinfo reports for the work group size: it tells you how many work items can run your kernel in parallel within one work-group, and those work-groups are then spread over the compute units available on your graphics card. That would let you do a first parallel pass with a single EnqueueNDRange command.)
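As a rough illustration of the sizing involved, here is a small helper sketch. The function names and the work-group size of 256 used in the note below are assumptions for illustration; the real limit comes from clinfo or `clGetDeviceInfo`. In OpenCL 1.x the global work size passed to `clEnqueueNDRangeKernel` has to be a multiple of the local work size, so the global size is usually rounded up and the kernel guards against the extra work items.

```c
#include <assert.h>
#include <stddef.h>

/* Round `n` up to the next multiple of `local`, giving a global work
   size that is evenly divisible by the local work size. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Number of work-groups a single NDRange launch of `n` items produces
   once the global size has been rounded up. */
size_t work_group_count(size_t n, size_t local)
{
    return round_up_global(n, local) / local;
}
```

With 10000 items and an assumed local size of 256, the rounded global size is 10240, which is 40 work-groups in one launch instead of thousands of separate task enqueues. The kernel would then need a check such as `if (get_global_id(0) < n)` so the 240 padding work items do nothing.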

himanshu_gautam · ‎05-30-2013

Check AtomicCounter Sample from AMD APP SDK.

sofiene · ‎05-30-2013

Hello.

I am actually simulating Kohonen's algorithm (self-organizing map), and I have two nested loops with the kernel call inside them. When I run the program, it starts at about 120 MB of memory and ends at 1 GB, at which point it stalls. I am running on an NVIDIA GeForce GT 525M.

Is a new block of memory allocated every time a kernel is enqueued? That would explain the growth in the memory reserved for the execution.

Regards

himanshu_gautam · ‎05-31-2013

Are you working on NVIDIA hardware or AMD's? If it is NVIDIA, why not ask on the NVIDIA forums?

As far as I know, if two kernels are running at the same time, the buffers they work on need to be in the GPU's memory before the kernels start. As of now I am not sure whether AMD supports concurrent kernel execution, but IMHO NVIDIA certainly allows it.

But the Task Manager snapshot above shows RAM usage, not the GPU's memory. I am not sure how things work on the NVIDIA side, but maybe you are allocating memory inside your nested loops (perhaps intentionally, for every buffer sent to the GPU), which would cause the excessive RAM usage shown.
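If allocation inside the loops is the cause, the usual fix is to allocate once before the loops and reuse the same storage. The sketch below shows the pattern in plain C with `malloc`; the function name and the placeholder work are hypothetical and not taken from the poster's program. The OpenCL analogue would be calling `clCreateBuffer` once before the loops, refreshing the contents with `clEnqueueWriteBuffer` inside them, and calling `clReleaseMemObject` once afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a training loop with two nested loops.
   The scratch buffer is allocated once before the loops and freed once
   after them, so memory use does not grow with epochs * samples.
   Returns the number of allocations made, or -1 if allocation fails. */
long train_with_hoisted_buffer(size_t epochs, size_t samples, size_t dim)
{
    assert(dim > 0);
    long allocations = 0;

    float *scratch = malloc(dim * sizeof *scratch);
    if (scratch == NULL)
        return -1;
    allocations++;

    for (size_t e = 0; e < epochs; e++) {
        for (size_t s = 0; s < samples; s++) {
            /* Placeholder work: refill the same buffer on every iteration. */
            for (size_t k = 0; k < dim; k++)
                scratch[k] = (float)(e + s + k);
        }
    }

    free(scratch);
    return allocations;
}
```

If a new buffer really is needed on each iteration, releasing the previous one before creating the next keeps the footprint flat in the same way.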

Calling a thread several times