Yes, I've done it for multi-GPU cube collision detection in C#. It worked well until I tried to add cluster PCs to it: that somehow breaks the local area network connections. Using OpenCL itself, though, is really easy.
Here is a sum-of-squared-differences optimization benchmark. (I mis-wrote the equation; it should have been ci = Sum((ai-bj)*(ai-bj)) over all j values, which makes it O(N²) complexity.)
Depending on the C#/.NET version, the speedup may drop to 50x or 20x, because some versions optimize the CPU code really well.
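
To make the corrected equation concrete, here is a minimal CPU reference sketch in plain C of ci = Sum((ai-bj)*(ai-bj)) over all j. This is not the benchmark code itself: the function name `sum_squared_differences` and the choice of `float` arrays of equal length are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical reference helper (not the original benchmark code).
 * Computes c[i] = sum over all j of (a[i] - b[j])^2 for arrays of length n.
 * Two nested loops over n elements give the O(N^2) cost mentioned above. */
void sum_squared_differences(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;               /* running sum for output element i */
        for (size_t j = 0; j < n; ++j) {
            float d = a[i] - b[j];      /* difference against every b[j] */
            acc += d * d;
        }
        c[i] = acc;
    }
}
```

Each output element is independent of the others, so on a GPU the outer loop is the part that would typically be spread across work-items, one per i, with each work-item running the inner loop over j.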