I have a problem similar to the one in this topic: "clBuildProgram performance and limits?"
I'm trying to compile an OpenCL program on my Radeon HD 7850. Strangely, clBuildProgram takes hundreds of milliseconds, whereas on graphics cards from other brands it takes only 1-2 ms.
Is this normal, or am I using it the wrong way?
cl_program program = clCreateProgramWithSource(cl_gpu_context_, 1, (const char **)&ptr_program_source, &program_length, &cl_error_num);
if (cl_error_num != CL_SUCCESS)
{
    return cl_error_num;
}
cl_error_num = clBuildProgram(program, 0, NULL, ptr_build_option, NULL, NULL);
My environment:
OS: Windows 7 64-bit
SDK: AMD-APP-SDK-v2.8
Graphics card: AMD Radeon HD 7850
Thanks~
That's normal behavior. If by other brands you mean NVIDIA, their driver caches built programs in your TEMP directory and reuses them behind the scenes, so later builds appear almost instant.
You can turn this off by setting the environment variable CUDA_CACHE_DISABLE=1.
kozmo,
Thanks for the information.
Thank you kozmo.
Hello,
I am facing the same problem as above: clBuildProgram takes a long time to build my kernel. I have a big kernel that takes about 18 seconds to compile, which I consider normal, and only 1.2 seconds to execute (my GPU is an AMD HD 6850). But when processing 200 images, it is not reasonable to rebuild the kernel each time and spend 18 s redoing the same work.
I am used to NVIDIA GPUs, where the kernel is cached by default. There, only the first program run spends about 30 seconds building the kernel; for subsequent runs, the PTX code (intermediate code) is cached.
So, for the second run, the binary is generated instantly.
Is there any option for AMD/ATI GPUs to force caching of the binary/intermediate code so that kernels build faster?
Thank you!
You can cache the binary, but you'll have to do it yourself explicitly: save the built program's binary to disk, then load it on later runs with clCreateProgramWithBinary.
Here is a well-explained, simple example that uses the above function:
My kernel compilation time went from 540 to 20 sec...
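For anyone landing here later, the explicit caching suggested above could look roughly like the sketch below. It is only a sketch under some assumptions, not the linked example: the names save_blob, load_blob, build_cached and the USE_OPENCL macro are made up for illustration, and the context is assumed to contain exactly one device, so the program has exactly one binary. The file helpers are plain C; the OpenCL part is compiled only when USE_OPENCL is defined.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write `size` bytes to `path`; returns 0 on success, -1 on failure. */
int save_blob(const char *path, const unsigned char *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, size, f);
    int close_failed = fclose(f);
    return (written == size && !close_failed) ? 0 : -1;
}

/* Read a whole file into a malloc'd buffer (caller frees).
   Returns NULL if the file is missing, empty, or unreadable. */
unsigned char *load_blob(const char *path, size_t *out_size)
{
    assert(path && out_size);
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    unsigned char *buf = NULL;
    long len = -1;
    if (fseek(f, 0, SEEK_END) == 0) len = ftell(f);
    if (len > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)len);
        if (buf && fread(buf, 1, (size_t)len, f) != (size_t)len) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf) *out_size = (size_t)len;
    return buf;
}

#ifdef USE_OPENCL /* hypothetical macro: define it when building against an OpenCL SDK */
#include <CL/cl.h>

/* Build a program for one device, reusing the binary at cache_path if present.
   ASSUMPTION: ctx was created with `dev` as its only device. */
cl_program build_cached(cl_context ctx, cl_device_id dev, const char *src,
                        const char *options, const char *cache_path)
{
    cl_int err;
    size_t size = 0;
    unsigned char *bin = load_blob(cache_path, &size);
    if (bin) {
        const unsigned char *bins[1] = { bin };
        cl_program cached = clCreateProgramWithBinary(ctx, 1, &dev, &size, bins, NULL, &err);
        free(bin);
        /* A program created from a binary still needs clBuildProgram, but it is fast. */
        if (err == CL_SUCCESS &&
            clBuildProgram(cached, 1, &dev, options, NULL, NULL) == CL_SUCCESS)
            return cached;                              /* cache hit */
        if (err == CL_SUCCESS) clReleaseProgram(cached); /* stale cache: rebuild below */
    }
    cl_program p = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    if (err != CL_SUCCESS) return NULL;
    if (clBuildProgram(p, 1, &dev, options, NULL, NULL) != CL_SUCCESS) {
        clReleaseProgram(p);
        return NULL;
    }
    /* One device in the context, so one size and one binary pointer. */
    if (clGetProgramInfo(p, CL_PROGRAM_BINARY_SIZES, sizeof size, &size, NULL) == CL_SUCCESS
        && size > 0) {
        unsigned char *out = malloc(size);
        unsigned char *ptrs[1] = { out };
        if (out && clGetProgramInfo(p, CL_PROGRAM_BINARIES, sizeof ptrs, ptrs, NULL) == CL_SUCCESS)
            save_blob(cache_path, out, size);           /* best effort: ignore write errors */
        free(out);
    }
    return p;
}
#endif
```

A cached binary is only valid for the device and driver it was built with, so it is probably worth putting the device name, driver version, and build options (or a hash of them plus the source) into the cache file name, and deleting the file whenever the kernel source changes.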