I admit that Brook+ is a more general and easier-to-use GPU language.
But after I completed an application, I could not find a way to optimize it.
CUDA is a more complex GPU language, but with it I can apply many optimizations to keep improving performance.
Is there any guide to optimizing Brook+ programs?