I want to create an N-queens solver using ATI cards and Brook+. I am thinking of supplying the device with a stream that contains the starting conditions for each thread. But here is the problem: what if I use large board sizes like 50,000x50,000? Then each thread needs several KB of memory. Will that slow down the application (due to memory bottlenecks)? Or will each thread continue normally, since no data is shared among the threads?
I guess I will have to try this and see for myself...
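To make the idea in this post concrete, here is a minimal CPU-side sketch in Python, not Brook+ code. It assumes the "starting condition" for each thread is simply the column of the queen in the first row; the names `count_from_start` and `count_solutions` are made up for illustration. Each call keeps its own private arrays, which mirrors threads that share no data. A real GPU kernel would likely need an iterative rewrite, since this sketch uses recursion.

```python
def count_from_start(n, first_col):
    """Count N-queens solutions whose row-0 queen sits in first_col.

    Stands in for one GPU thread: all state is private to this call.
    """
    cols = [False] * n               # is this column occupied?
    diag1 = [False] * (2 * n - 1)    # diagonals indexed by row + col
    diag2 = [False] * (2 * n - 1)    # diagonals indexed by row - col + n - 1

    def place(row, col, value):
        cols[col] = value
        diag1[row + col] = value
        diag2[row - col + n - 1] = value

    def solve(row):
        if row == n:
            return 1
        total = 0
        for col in range(n):
            if not (cols[col] or diag1[row + col] or diag2[row - col + n - 1]):
                place(row, col, True)
                total += solve(row + 1)
                place(row, col, False)
        return total

    place(0, first_col, True)
    return solve(1)


def count_solutions(n):
    # The "stream" of starting conditions: one first-row column per thread.
    starts = range(n)
    return sum(count_from_start(n, c) for c in starts)


if __name__ == "__main__":
    print(count_solutions(8))  # 92 solutions on the standard 8x8 board
```

Because the per-start counts are independent, they can be computed in any order and summed at the end, which is what makes the problem a natural fit for one thread per starting condition.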
geekmaster,
50,000 x 50,000 x 1 KB = 25 x 10^11 bytes = 2.5 TB, while GPUs have only ~1 GB of memory.
No way! Better to do it in chunks.
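The arithmetic above, and what "in chunks" would mean in practice, can be checked with a few lines of Python. This is only a sketch: `chunk_plan` is a hypothetical helper, 1 KB is taken as 1,000 bytes to match the figures above, and it optimistically assumes the whole 1 GB of device memory is available for starting conditions.

```python
def chunk_plan(board_side, bytes_per_start, device_bytes):
    """Return (total bytes, starts per chunk, number of chunks)."""
    total_starts = board_side * board_side
    total_bytes = total_starts * bytes_per_start
    starts_per_chunk = device_bytes // bytes_per_start
    chunks = -(-total_starts // starts_per_chunk)  # ceiling division
    return total_bytes, starts_per_chunk, chunks


total, per_chunk, chunks = chunk_plan(50_000, 1_000, 1_000_000_000)
print(total)      # 2500000000000 bytes, i.e. 2.5 TB
print(per_chunk)  # 1000000 starting conditions fit in one chunk
print(chunks)     # 2500 chunks to cover the whole set
```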
The performance of the program simply depends on the ALU:Fetch ratio.
It all depends on the algorithm you are using.
For example, if you are ALU bound, fetching won't affect performance.
And if you are memory bound, you aren't going to get any performance benefit by adding useless ALU operations (provided all else is equal).
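The ALU:Fetch point can be illustrated with a deliberately crude model in Python. It assumes ALU work and memory fetches overlap perfectly, so the kernel time is whichever of the two takes longer. The function `kernel_time` and all the rates below are invented numbers for illustration, not measurements of any real card.

```python
def kernel_time(alu_ops, fetches, alu_rate, fetch_rate):
    """Estimate kernel time assuming ALU work and fetches fully overlap."""
    alu_time = alu_ops / alu_rate
    fetch_time = fetches / fetch_rate
    bound = "ALU" if alu_time >= fetch_time else "fetch"
    return max(alu_time, fetch_time), bound


# Hypothetical device: 100 ALU ops and 10 fetches per time unit.
print(kernel_time(1000, 50, 100, 10))   # (10.0, 'ALU')
print(kernel_time(1000, 100, 100, 10))  # (10.0, 'ALU'): extra fetches are hidden
print(kernel_time(100, 100, 100, 10))   # (10.0, 'fetch')
print(kernel_time(200, 100, 100, 10))   # (10.0, 'fetch'): extra ALU ops are hidden
```

In this model, the time only changes once the cheaper side grows enough to become the new bottleneck.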
Well, the number of threads would be around 128-256, not 50,000, so there is enough memory. But I just remembered that Brook+ does not support local arrays, so there is no way to write my program in Brook+. Maybe in OpenCL. I really need an upgrade; my 3870 is showing its age.
Does OpenCL have local array support?
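A rough estimate of the per-thread footprint supports the "enough memory" claim. The sketch below assumes one array entry per board row holding that row's queen column, stored in 2 bytes (sufficient because 50,000 is below 65,536); it ignores any extra occupancy flags. The layout and the name `private_state_bytes` are assumptions, not something stated in the thread.

```python
def private_state_bytes(n, threads, bytes_per_entry=2):
    """Return (bytes per thread, bytes for all threads) for an n x n board."""
    per_thread = n * bytes_per_entry
    return per_thread, per_thread * threads


print(private_state_bytes(50_000, 128))  # (100000, 12800000)
print(private_state_bytes(50_000, 256))  # (100000, 25600000)
```

Under these assumptions, even 256 threads need about 25.6 MB in total, well below the ~1 GB mentioned earlier.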
Hi geekmaster,
AFAIK, Brook+ 1.4 does support local arrays, but there are some restrictions (regarding writes to them).
That said, I support your decision to try OpenCL. Yes, local arrays are present in OpenCL, and they are quite flexible to use.
Refer to the OpenCL 1.1 specification for details.
For more info about the __local qualifier in OpenCL: http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/local.html
I hope it helps.
All the best for your Queens algorithm.
Himanshu
Thank you for the great support!