In my return to this project, I decided to start from scratch on the Brook+ end of things and implement the kernels inside the existing open-source noise library, libnoise. I'm laying things out so that it can seamlessly replace the existing library. However, to get any real advantage, one has to explicitly specify METHOD_STREAM when initializing noise modules.
Unfortunately, there are two major differences between METHOD_STREAM and METHOD_CPU. The first is the calling convention: in the past, noise equations were called one value at a time.
The call changes from:
double myModule::GetValue(double x, double y, double z)
to:
void myModule::GetValue(float3* input, uint size, float* output)
Both overloads are available for both METHODs. However, the latter causes a cast to float on the CPU, and the former returns a float cast to double from the GPU. On top of that, calling the GPU one value at a time is much slower, because of the overhead of setting up the environment just to do one calculation.
The second change is that the core noise algorithms are fundamentally altered; therefore, no matter how much work I put into making the two noise algorithms look similar, they can never look identical. This has been a struggle, as it's hard to match up the frequencies and scales between CPU and GPU noise.
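To make the casting concrete, here is a minimal sketch of how the two overloads could sit side by side. It is not the library's code: SketchModule, the Method enum, the local float3 struct, and the placeholder Evaluate function are all made up for illustration, the stream path is simulated on the CPU, and I use const pointers and unsigned int where the real signature uses float3* and uint.

```cpp
#include <cassert>

// Stand-in for the float3 type in the buffer overload (assumption: the real
// one comes from Brook+).
struct float3 { float x, y, z; };

// Hypothetical enum; only the two enumerator names come from the post.
enum Method { METHOD_CPU, METHOD_STREAM };

class SketchModule {
public:
    explicit SketchModule(Method method) : method_(method) {}

    // Scalar overload: one value per call.
    double GetValue(double x, double y, double z) const {
        if (method_ == METHOD_CPU) {
            return Evaluate(x, y, z);  // stays in double precision
        }
        // Stream path: round trip through a one-element buffer, so the
        // result is a float cast back to double.
        float3 in;
        in.x = static_cast<float>(x);
        in.y = static_cast<float>(y);
        in.z = static_cast<float>(z);
        float out = 0.0f;
        GetValue(&in, 1u, &out);
        return static_cast<double>(out);
    }

    // Buffer overload: many values per call, always float.
    void GetValue(const float3* input, unsigned int size, float* output) const {
        assert(input != 0 && output != 0);
        // A real METHOD_STREAM build would launch a kernel here; this sketch
        // just loops on the CPU and casts each result to float.
        for (unsigned int i = 0; i < size; ++i) {
            output[i] = static_cast<float>(
                Evaluate(input[i].x, input[i].y, input[i].z));
        }
    }

private:
    // Placeholder "noise": a linear function, NOT a real noise algorithm.
    static double Evaluate(double x, double y, double z) {
        return 0.5 * x + 0.25 * y + 0.125 * z;
    }

    Method method_;
};
```

The point of the sketch is only the shape of the API: the scalar call on the stream path pays for a full buffer round trip to get a single float back, which is why the buffer overload is the one to use with METHOD_STREAM.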
If you briefly skim through the following tutorial links, you will see roughly the same process I use to set up my demo. The difference is that I use my own 3D renderer, passing my input buffer to GetValue and writing the output to my global mesh to render. The tutorial uses libnoise's built-in image renderer to make a colored height map. This is possible as well, but for it to be faster than the CPU noise, one would have to modify the height map builder to request a whole buffer instead of going value by value. Even then, it would only be faster on larger buffers.
Without further ado, here are my comparable renders in a Photobucket album, in no specific order:
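As a rough idea of what "call a whole buffer" means for a height map builder, here is a small sketch. It does not touch libnoise's actual builder classes: BuildHeightMap, BufferNoiseFn, PlaceholderNoise, and the call counter are hypothetical names, and the placeholder function only stands in for a module's buffer GetValue.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the float3 type in the buffer overload (assumption).
struct float3 { float x, y, z; };

// Hypothetical function type shaped like the buffer overload of GetValue.
typedef void (*BufferNoiseFn)(const float3* input, unsigned int size,
                              float* output);

// Collect every sample position first, then ask for all heights in ONE call,
// instead of calling the noise function once per pixel.
std::vector<float> BuildHeightMap(BufferNoiseFn noise, unsigned int width,
                                  unsigned int height, float step) {
    std::vector<float3> input(static_cast<std::size_t>(width) * height);
    for (unsigned int row = 0; row < height; ++row) {
        for (unsigned int col = 0; col < width; ++col) {
            float3& p = input[static_cast<std::size_t>(row) * width + col];
            p.x = col * step;  // sample on the XZ plane
            p.y = 0.0f;
            p.z = row * step;
        }
    }
    std::vector<float> output(input.size());
    if (!input.empty()) {
        noise(&input[0], static_cast<unsigned int>(input.size()), &output[0]);
    }
    return output;
}

// Counts calls so the single batched call is visible.
static unsigned int g_placeholderCalls = 0;

// Placeholder for a module's buffer overload: NOT real noise.
void PlaceholderNoise(const float3* input, unsigned int size, float* output) {
    ++g_placeholderCalls;
    for (unsigned int i = 0; i < size; ++i) {
        output[i] = input[i].x + input[i].z;
    }
}
```

With this layout the per-call setup cost is paid once per height map rather than once per pixel, which is also why small maps would see little or no benefit.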
Note that the speedup becomes much less drastic as module complexity increases. This may or may not improve with optimization at a later point. However, a module like turbulence would be very challenging to improve to any great extent, especially when it is connected to a complicated module like Select.
I plan on releasing the module, likely on SourceForge, sometime in the future. I will also include my terrain demo. Both the library and the demo should compile with a little TLC on any OS with Brook+ and SDL support. However, this is not going to happen for a little while: there are still a lot of modules to convert, and things also need a little cleanup.
The module is being designed with the idea that someone could also come in and write METHOD_OPENCL or METHOD_CUDA. In fact, GetValue already considers those two enumerator values, but for now it returns 0.0, or just breaks out in the buffer overload.
Enjoy the screenshots. Any and all commentary is always appreciated!
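For anyone curious how such a hook might look, here is a minimal sketch of dispatching on the method. Only the four METHOD_ names come from this post; DispatchSketch, the enum's type name and ordering, the local float3 struct, and the placeholder math are assumptions, and the CPU and stream paths share one placeholder function here.

```cpp
// Stand-in for the float3 type in the buffer overload (assumption).
struct float3 { float x, y, z; };

// Hypothetical enum; the ordering is a guess.
enum Method { METHOD_CPU, METHOD_STREAM, METHOD_OPENCL, METHOD_CUDA };

class DispatchSketch {
public:
    explicit DispatchSketch(Method method) : method_(method) {}

    double GetValue(double x, double y, double z) const {
        switch (method_) {
        case METHOD_CPU:
        case METHOD_STREAM:
            return Placeholder(x, y, z);
        case METHOD_OPENCL:  // not written yet
        case METHOD_CUDA:    // not written yet
            return 0.0;
        }
        return 0.0;
    }

    void GetValue(const float3* input, unsigned int size, float* output) const {
        switch (method_) {
        case METHOD_CPU:
        case METHOD_STREAM:
            for (unsigned int i = 0; i < size; ++i) {
                output[i] = static_cast<float>(
                    Placeholder(input[i].x, input[i].y, input[i].z));
            }
            break;
        case METHOD_OPENCL:  // not written yet: leave output untouched
        case METHOD_CUDA:
            break;
        }
    }

private:
    // Placeholder math, NOT a real noise function.
    static double Placeholder(double x, double y, double z) {
        return x + y + z;
    }

    Method method_;
};
```

A new backend would then only need to fill in its own case in each switch, without changing how callers use GetValue.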