likewind
Journeyman III

clBuildProgram performance?

I have a problem similar to the one in the topic "clBuildProgram performance and limits?".

I'm trying to compile an OpenCL program on my Radeon HD 7850. Strangely, clBuildProgram takes hundreds of milliseconds. When I use graphics cards of other brands, it only takes 1-2 ms.

Is this normal, or am I using it the wrong way?

cl_program program = clCreateProgramWithSource(cl_gpu_context_,1,(const char **)&ptr_program_source,&program_length,&cl_error_num);
if(cl_error_num != CL_SUCCESS)
{
   return cl_error_num;
}

cl_error_num = clBuildProgram(program,0,NULL,ptr_build_option,NULL,NULL);

My environment:

OS: Windows 7 64-bit

SDK: AMD APP SDK v2.8

Graphics card: AMD Radeon HD 7850

Thanks~

1 Solution
kozmo
Adept I

That's normal behavior. If by other brands you mean NVIDIA, their driver caches built programs in your TEMP directory and reuses them behind the scenes, so only the first build pays the full compile cost.

You can turn this caching off by setting the environment variable CUDA_CACHE_DISABLE to 1.

5 Replies

kozmo,

Thanks for the information.

Thank you, kozmo.

Hello,

I am facing the same problem as above: clBuildProgram takes a long time to build my kernel. I have a big kernel that takes about 18 seconds to compile, which I consider normal, while the kernel itself takes only 1.2 seconds to execute (my GPU is an AMD HD 6850). But when processing 200 images, it is not acceptable to rebuild the kernel each time and spend 18 s redoing the same work.

I am used to NVIDIA GPUs, where the kernel is cached by default. There, only the first program run takes about 30 seconds to build the kernel; for subsequent runs, the PTX code (intermediate code) is cached, so the binary is generated almost instantly.

Is there any option for AMD/ATI GPUs to force caching of the binary or intermediate code, to speed up kernel builds?

Thank you!

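You can cache the compiled program, but you'll have to do it yourself explicitly: after the first build, fetch the binary with clGetProgramInfo (CL_PROGRAM_BINARIES), save it to a file, and on later runs load it with clCreateProgramWithBinary.

Here is a simple, well-explained example that uses this function: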

HelloBinaryWorld.cpp - opencl-book-samples - Source code to the example programs from the OpenCL Pro...

My kernel compilation time went from 540 to 20 sec...
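
For anyone landing here later, below is a rough sketch of the do-it-yourself cache described above, not the code from the linked sample. The helper names (cache_key, save_blob, load_blob, build_cached), the USE_OPENCL macro and the clcache_*.bin file name are all made up for illustration. The OpenCL part assumes the context was created with exactly one device, and it is only compiled when USE_OPENCL is defined, so the file helpers can be tried without an OpenCL SDK installed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a hash over kernel source and build options (hypothetical helper).
   Used to name the cache file, so changing either one misses the cache. */
static unsigned long long cache_key(const char *source, const char *options)
{
    unsigned long long h = 1469598103934665603ULL;
    const char *parts[2] = { source, options ? options : "" };
    for (int p = 0; p < 2; ++p) {
        for (const unsigned char *c = (const unsigned char *)parts[p]; *c; ++c) {
            h ^= *c;
            h *= 1099511628211ULL;
        }
        h ^= 0xff;                 /* separator between the two parts */
        h *= 1099511628211ULL;
    }
    return h;
}

/* Write a binary blob to disk. Returns 0 on success, -1 on failure. */
static int save_blob(const char *path, const unsigned char *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, size, f);
    int close_err = fclose(f);
    return (written == size && close_err == 0) ? 0 : -1;
}

/* Read a whole file into a malloc'ed buffer (caller frees).
   Returns NULL if the file is missing, empty or unreadable. */
static unsigned char *load_blob(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len <= 0) { fclose(f); return NULL; }
    rewind(f);
    unsigned char *buf = malloc((size_t)len);
    if (buf && fread(buf, 1, (size_t)len, f) != (size_t)len) { free(buf); buf = NULL; }
    fclose(f);
    if (buf) *size_out = (size_t)len;
    return buf;
}

#ifdef USE_OPENCL
#include <CL/cl.h>

/* Try the cached binary first; fall back to a source build and save the result.
   ASSUMPTION: ctx contains exactly one device (dev), so there is one binary. */
static cl_program build_cached(cl_context ctx, cl_device_id dev,
                               const char *source, const char *options)
{
    char path[64];
    snprintf(path, sizeof path, "clcache_%016llx.bin", cache_key(source, options));

    cl_int err;
    size_t size = 0;
    unsigned char *bin = load_blob(path, &size);
    if (bin) {
        const unsigned char *bins[1] = { bin };
        cl_program p = clCreateProgramWithBinary(ctx, 1, &dev, &size, bins, NULL, &err);
        free(bin);
        if (err == CL_SUCCESS &&
            clBuildProgram(p, 1, &dev, options, NULL, NULL) == CL_SUCCESS)
            return p;                               /* cache hit */
        if (err == CL_SUCCESS) clReleaseProgram(p); /* stale binary: rebuild */
    }

    cl_program p = clCreateProgramWithSource(ctx, 1, &source, NULL, &err);
    if (err != CL_SUCCESS) return NULL;
    if (clBuildProgram(p, 1, &dev, options, NULL, NULL) != CL_SUCCESS) {
        clReleaseProgram(p);
        return NULL;
    }
    if (clGetProgramInfo(p, CL_PROGRAM_BINARY_SIZES, sizeof size, &size, NULL) == CL_SUCCESS
        && size > 0 && (bin = malloc(size)) != NULL) {
        unsigned char *bins[1] = { bin };
        if (clGetProgramInfo(p, CL_PROGRAM_BINARIES, sizeof bins, bins, NULL) == CL_SUCCESS)
            save_blob(path, bin, size);             /* best effort; ignore failure */
        free(bin);
    }
    return p;
}
#endif
```

With this, you would call build_cached(context, device, source, options) wherever you currently call clCreateProgramWithSource followed by clBuildProgram. Note that a program binary is specific to the device and driver it was built with, so in real use the key should probably also include the device name and driver version (clGetDeviceInfo with CL_DEVICE_NAME and CL_DRIVER_VERSION), which the sketch leaves out.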
