Hello!
I'm new to OpenCL and I'm trying to port a video filter to it (primarily as a learning exercise 🙂).
Everything works, but it isn't very fast (only a little faster than the previous CPU code), so now I'm working on optimizing it. I need to heavily reuse 2 constant float arrays, each with fewer than 128 elements. My GPU is a 4850, so I can't use images. Local memory works, but limits the maximum work-group size to 64. Now I'm trying to use __constant memory. The fastest option I've found is to initialize the arrays inside the kernel as constants (generated by the host code before compilation), but I haven't found any way to do that in OpenCL C. I can't even find an example that initializes a single non-array variable. This code
__kernel void test()
{
    __constant int ow = 1;
}
doesn't compile with error
error: non-kernel function: variable with automatic storage duration cannot be stored in a named address space
__constant int ow = 1;
^
in AMD APP KernelAnalyzer.
If I replace __constant with __private, everything works, but the GPR usage is unacceptable.
Just create another buffer and pass it as a __constant kernel argument.
__constant, __global and __local variables must be pointers.
Also, if you create a normal const variable, the compiler will just place it where it is needed, so no register space is taken.
Yes, I know I can pass it as a kernel argument, but as I understand it, that is not the fastest way to access the data according to the AMD Accelerated Parallel Processing OpenCL™ Programming Guide, section 4.6, Constant Memory Optimization. And my access pattern is mostly not the "Same Index" case.
According to http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=138434 and some other threads, there is a way to initialize __constant data inside a kernel, but nothing works for me 😞
Is there any other possibility?
"const" uses registers: the code below shows a usage of 10 GPRs in KernelAnalyzer.
__kernel void test(__global int *out)
{
    const int ow[] = {1,2,3,4,5,6,7,8,9,10};
    out[get_local_size(0)] = ow[get_local_size(0)];
}
Wow, thanks Micah!!!
It's even better to have it global! A short sample compiles fine; tomorrow I'll see how it works in the real app.
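The reply being thanked here is not reproduced in this thread. Assuming it suggested declaring the array at program scope (outside any function) in the __constant address space, which OpenCL C allows as long as the variable is initialized where it is declared, the host-side generation mentioned in the first post might look roughly like the C sketch below. The helper name emit_constant_table, the table name coeff and the kernel body are made up for illustration; none of them come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical kernel text. It reads the program-scope table `coeff`,
 * whose declaration must appear earlier in the program source. */
const char KERNEL_SRC[] =
    "__kernel void test(__global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = coeff[i % (sizeof(coeff) / sizeof(coeff[0]))];\n"
    "}\n";

/* Hypothetical helper: writes an OpenCL C declaration such as
 *   __constant float coeff[2] = {2.50000000e-01f, 5.00000000e-01f};
 * into dst. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the text does not fit into cap bytes. */
int emit_constant_table(char *dst, size_t cap, const char *name,
                        const float *values, size_t count)
{
    assert(dst != NULL && name != NULL && values != NULL && count > 0);

    int n = snprintf(dst, cap, "__constant float %s[%zu] = {", name, count);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; ++i) {
        /* %.8e prints 9 significant digits, enough to round-trip a float.
         * The trailing 'f' keeps the literal single precision. */
        n = snprintf(dst + used, cap - used, "%s%.8ef",
                     i ? ", " : "", (double)values[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(dst + used, cap - used, "};\n");
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;
    return (int)used;
}
```

The generated declaration and KERNEL_SRC can then be handed to clCreateProgramWithSource as two consecutive source strings, table first, so the kernel sees coeff as an already-initialized __constant array. Whether this is actually faster than a __constant kernel argument on a 4850 would still have to be measured.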