

Deathray - OpenCL GPU accelerated NLM De-noising of Video

Performs spatial/temporal non-local means de-noising on video using the Avisynth scripting environment

Earlier this week I started a thread for an Avisynth plug-in filter I've called Deathray that performs de-noising on video:

OK, my first attempt at posting this thread failed miserably, so if you want to see the source code or use the software, please see the thread linked above, which is hosted on forum software that works.

Deathray is BSD licensed.

AMD staff might like to take a look at the kernels. Though they're not particularly complex, one of them, NLMMultiFrameFourPixel, uses 55 GPRs on Cypress and 56 on Cayman. I'm quite sure that's far in excess of what's needed.

I plan to make some changes, including making the inner loop use TEX instead of local memory. This reduces the GPR allocation.

The GPR allocation doesn't really impact performance, because the workgroup size is 256 and the inner loop is ~300 cycles long (iterated 49 times) with no off-die memory accesses or group barriers. There is, however, an impact of a couple of percent while the kernel manipulates local memory with numerous group barriers, because only 1 workgroup fits per SIMD.
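
As a rough check on the "1 workgroup per SIMD" figure, the occupancy arithmetic can be sketched in C. This is only a sketch under stated assumptions: a 64-wide wavefront, and a register file that gives a single wavefront 256 GPRs per work-item. That second number is not stated in this thread and the usable figure may be slightly lower. The function names are illustrative, and any separate hardware cap on wavefronts per SIMD is ignored.

```c
/* Occupancy sketch. Assumptions (not from the thread): 64 work-items per
   wavefront, and 256 GPRs per work-item available when one wavefront
   has the SIMD to itself. */
enum { WAVEFRONT_SIZE = 64, GPR_BUDGET = 256 };

/* Wavefronts that fit on one SIMD when each work-item needs `gprs` registers. */
static int wavefronts_per_simd(int gprs)
{
    return gprs > 0 ? GPR_BUDGET / gprs : 0;
}

/* Whole workgroups of `workgroup_size` work-items resident on one SIMD. */
static int workgroups_per_simd(int gprs, int workgroup_size)
{
    /* Round up: a partial wavefront still occupies a full wavefront slot. */
    int wavefronts_per_group =
        (workgroup_size + WAVEFRONT_SIZE - 1) / WAVEFRONT_SIZE;
    return wavefronts_per_simd(gprs) / wavefronts_per_group;
}
```

Under these assumptions, workgroups_per_simd(55, 256) and workgroups_per_simd(56, 256) both give 1, and the allocation would have to drop to 32 GPRs or fewer before a second 256-item workgroup fits.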

7 Replies

Great! I'll look into it after next week.


Given these prior lines:

int kernel_radius = 3;
int sample_radius = kernel_radius * sample_expand;

And the fact that target is an int2, this line works:

int2 sample_start = {max(target.x - sample_radius, 3), max(target.y - sample_radius, 3)};

But, I can also write this line as follows:

int2 sample_start = max(target - sample_radius, 3);

This also works, but not with NVidia's compiler.

The language specification appears to say this second version of the line is invalid, because the two arguments to max() do not have the same gentype. So I presume this is a bug in AMD's compiler.

Is that so?


Is even the first version valid? {} is for array initialisation, not for vector types; the components should go inside (int2)().

This should be valid:

int2 sample_start = max(target - sample_radius, (int2)(3));


I guess if the first version isn't valid (I certainly can't find an example of it in the spec, which only shows the form you describe), then both AMD's and NVidia's compilers are wrong to accept it.

Is there a warning option that I'm missing?


If you can provide a small test case that shows this, we can get it fixed for our next release if it is found to be a valid issue.

Aha, I've just been reading the specs for 1.0 and 1.1 more closely and noticed that sgentype was introduced by 1.1 as a valid type for the second argument of min and max, and for the second and third arguments of clamp. sgentype, here, means a scalar of the same element type as the other argument and the return value. So my original line of code is valid under 1.1. (I'm afraid I write code in SKA and only refer to the spec when it breaks badly.)

The NVidia device this code is failing on (a GTX260, for example) is an OpenCL 1.0 implementation, which explains the failure: I didn't realise I'd written 1.1 code there.
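
To make the equivalence concrete, here is a plain C model (not OpenCL C) of what the two forms compute. The struct and function names are illustrative stand-ins, not part of any OpenCL header: max_vs models the 1.1 vector/scalar overload, while max_vv applied to a replicated scalar models the max(..., (int2)(3)) form suggested earlier in the thread, which a 1.0 compiler should also accept.

```c
/* Plain C model of OpenCL's component-wise integer max on an int2.
   Names are illustrative only, not part of any OpenCL header. */
typedef struct { int x, y; } int2_model;

static int max_i(int a, int b) { return a > b ? a : b; }

/* Models max(gentype, gentype): both arguments are vectors. */
static int2_model max_vv(int2_model a, int2_model b)
{
    int2_model r = { max_i(a.x, b.x), max_i(a.y, b.y) };
    return r;
}

/* Models (int2)(s): replicate a scalar into every component. */
static int2_model splat(int s)
{
    int2_model r = { s, s };
    return r;
}

/* Models max(gentype, sgentype) from 1.1: the scalar applies to each component. */
static int2_model max_vs(int2_model a, int s)
{
    return max_vv(a, splat(s));
}

/* Models target - sample_radius: the scalar is subtracted from each component. */
static int2_model sub_vs(int2_model a, int s)
{
    int2_model r = { a.x - s, a.y - s };
    return r;
}
```

For a target of (4, 20) and a sample_radius of 6, max_vs(sub_vs(target, 6), 3) gives (3, 14), the same result as clamping each component separately.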

Micah, what do you think about the use of curly braces, { and }, to initialise an int2, say? I use them quite extensively, because they work!


They shouldn't. Braces are for array initialization; vector initialization uses the (type)(data) syntax.