
niravshah00
Journeyman III

Brook+ project tutor - ready to pay

Hi,

Don't get me wrong here.

I am not asking anyone to do my project. I just need someone who can be my guide in the project so that I can learn and complete my project.
Considering that there is virtually no help for Brook+ apart from this forum, I see no other way to finish my project before the deadline.

I need someone to guide me with my project.

This might sound a little weird, but trust me, I have put a lot of effort into going through the examples (basically, I am a slow learner).

I don't know how to put this in a better way.

Thanks

 

gaurav_garg
Adept I


If you are having trouble with the Stream SDK documentation, I think these links can help:

http://www.nada.kth.se/~tomaso/Stream2008/M1.pdf

Change M1.pdf to M2.pdf, M3.pdf, etc. to get all the modules.

Ceq
Journeyman III


Hi niravshah00, I think it would be good to explain a bit more about your project: the kind of work you are doing, the mathematical algorithms, the current CPU execution times and your expected results. Any information can help.

Note that you may be having difficulties not because of your skills, but because the Brook+ programming model may not be the best approach. Stream programming has some restrictions, so depending on the algorithm it may not be easy, or the final performance may not be very good. That's why I ask you about the algorithms you're implementing.

A good start is to implement an OpenMP version of the program under the assumption that you are going to have thousands of threads. If you can't dispatch work to every one of them, you'll have to rethink the parallelization strategy. On the other hand, if you use a lot of inter-thread communication or atomic/critical sections, you may need a more flexible language like OpenCL or CUDA.
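To make the OpenMP suggestion concrete, here is a minimal C sketch (not from the thread) of the loop shape that maps well onto thousands of threads: every iteration is independent and the only shared state is a reduction. `is_hit` and `count_hits` are hypothetical names, and the per-item test is a placeholder for the real work.

```c
/* Hypothetical per-item test standing in for "one unit of work".
 * Here: is i a multiple of 7? Replace with the real computation. */
static int is_hit(long i) {
    return i % 7 == 0;
}

/* Each iteration is independent, so OpenMP may hand iterations to as
 * many threads as it likes. The only cross-thread interaction is the
 * reduction on `hits`. Compiled without -fopenmp, the pragma is ignored
 * and the loop runs serially with the same result. */
long count_hits(long n) {
    long hits = 0;
    #pragma omp parallel for reduction(+:hits)
    for (long i = 0; i < n; i++) {
        hits += is_hit(i);
    }
    return hits;
}
```

If the real loop body needs locks, atomics or results from other iterations, that is the warning sign described above.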

niravshah00
Journeyman III


A bit about my project:

I have to solve the equation A^x + B^y = C^z.

So I am solving for z as z = log(A^x + B^y) / log(C).

The ranges for A, B, C are flexible and might be 1000-10,000 or even larger, while x and y use a smaller range such as 3-10.

So my current approach was to launch a kernel over a 3D stream to cover every possible combination of A, B, C, with each thread running the for loops over x and y.

My idea was to create as many threads as possible so as to take advantage of the processing power of the GPU. Maybe this approach is wrong.

But currently I am stuck, as the stream size is limited to 8192*8192.
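For scale: 8192*8192 = 67,108,864 elements, while a 1000-value range on each of A, B, C is 10^9 combinations, so the full cube cannot fit in one stream. Below is a hedged C sketch of two helpers: decomposing a linear index into (A, B, C), analogous to what a 3D `instance()` provides, so that a 1D or 2D domain can stand in for a 3D one, and finding the largest cube side that fits the element limit. Both names are hypothetical.

```c
/* Hypothetical helper: map a linear work-item index to (A, B, C) for a
 * cube of side `n` starting at `start`, with A varying fastest. */
typedef struct { long a, b, c; } Triple;

Triple index_to_abc(long idx, long n, long start) {
    Triple t;
    t.a = start + idx % n;
    t.b = start + (idx / n) % n;
    t.c = start + idx / (n * n);
    return t;
}

/* Largest cube side n such that n^3 items fit in `max_items`. */
long max_cube_side(long max_items) {
    long n = 1;
    while ((n + 1) * (n + 1) * (n + 1) <= max_items) n++;
    return n;
}
```

With the 8192*8192 limit, `max_cube_side` gives 406 (406^3 = 66,923,416), so one launch covers at most about 406 values of each of A, B and C.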
Secondly, each thread should return the corresponding values of A, B, C, x, y, z only if z is within some specified range. I am not sure how to do that from a kernel, since I don't want an output stream holding 6 numbers per element that I then read back in the host code and filter in full. Probably only very few threads, like 3-4, would find a solution, and I don't want to filter a stream of 10,000 or 20,000 elements.
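One standard GPU technique for exactly this situation is stream compaction: each work item writes a 0/1 flag, an exclusive prefix sum over the flags assigns each hit its own output slot, and only the hits are scattered into a short array. Below is a serial C sketch of the two passes (on a GPU both run in parallel). I have not checked whether Brook+ ships a scan primitive, so treat this as the general idea rather than a Brook+ API; `compact_hits` is a hypothetical name.

```c
#include <stddef.h>

/* Stream compaction. flags[i] is 1 if work item i found a solution.
 * `scan` and `out` must each have room for n entries.
 * Returns the number of hits; out[0..hits-1] holds their indices. */
size_t compact_hits(const int *flags, size_t n, size_t *scan, size_t *out) {
    size_t total = 0;
    /* Pass 1: exclusive prefix sum, scan[i] = number of hits before i. */
    for (size_t i = 0; i < n; i++) {
        scan[i] = total;
        total += flags[i] ? 1 : 0;
    }
    /* Pass 2: each flagged item writes its index to its own slot.
     * Iterations are independent, so this pass parallelizes trivially. */
    for (size_t i = 0; i < n; i++) {
        if (flags[i]) out[scan[i]] = i;
    }
    return total;
}
```

Only the first `total` entries of `out` need to be read back, which is what avoids filtering the whole stream on the host.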

One suggestion I got from the forum was to use domainSize, but it doesn't work for 3D streams.

I am not sure whether creating a huge number of threads would improve my performance.

Another option (also from the forum) is to work in tiles, e.g. solve for A, B, C from 1000-2000, then 2000-3000 and so on.
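The tiling option could be driven from the host with a loop like the following C sketch. It assumes one kernel launch per tile, represented by the hypothetical callback type `tile_fn`, and uses a tile side of 400 in the example, since 400^3 = 64,000,000 fits under the 8192*8192 limit. `count_items` only exists to show that the tiles cover the range exactly once.

```c
/* Hypothetical stand-in for one kernel launch over the box
 * [a0, a0+na) x [b0, b0+nb) x [c0, c0+nc). */
typedef void (*tile_fn)(long a0, long na, long b0, long nb,
                        long c0, long nc, void *ctx);

static long min_long(long x, long y) { return x < y ? x : y; }

/* Cover [lo, hi) on all three axes with tiles of side `tile`; the last
 * tile on each axis may be smaller. Returns the number of launches. */
long for_each_tile(long lo, long hi, long tile, tile_fn fn, void *ctx) {
    long launches = 0;
    for (long c0 = lo; c0 < hi; c0 += tile)
        for (long b0 = lo; b0 < hi; b0 += tile)
            for (long a0 = lo; a0 < hi; a0 += tile) {
                if (fn)
                    fn(a0, min_long(tile, hi - a0),
                       b0, min_long(tile, hi - b0),
                       c0, min_long(tile, hi - c0), ctx);
                launches++;
            }
    return launches;
}

/* Example callback: add up how many (A, B, C) items the tiles cover. */
void count_items(long a0, long na, long b0, long nb,
                 long c0, long nc, void *ctx) {
    (void)a0; (void)b0; (void)c0;
    *(long long *)ctx += (long long)na * nb * nc;
}
```

For the range 1000-2000 with tiles of 400, each axis gets tiles starting at 1000, 1400 and 1800, so there are 27 launches covering 10^9 combinations in total.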

I have the code in Java and also in Brook+ (currently working). If you need it, I will attach the code.

Thanks

Ceq
Journeyman III


Hi niravshah00, I'm not sure I understand well what you want to do.

The arithmetic intensity of that equation is quite small; however, it looks like you want to obtain all the solutions over 5 unrestricted variables, so the number of combinations is huge and the problem is no longer trivial.

If you were using fewer variables, or could lock some variables to fixed values, I think the best approach would be to use numerical analysis instead of brute force.

From your equations, I think you are trying to write a program that solves Diophantine equations; however, in another post you said that z was not an integer but a floating-point value, and that confused me a bit.

Returning whether there is a solution for a certain (A, B, C) is easy; however, returning a list of solutions does not seem that flexible or efficient in Brook+. If you only need a boolean value to know whether there is a solution, you could assign not one but a range of computations to each GPU thread in the output stream, thus removing the stream-size restriction. Each thread may even return the first solution it finds, if any.

niravshah00
Journeyman III


Hi,

The equation I am solving is Beal's conjecture.

About z being a float value: I am solving the equation for z, and only if z is within 10^-12 of an integer would I return that combination as a possible solution.

Secondly, I did not understand your suggestion for returning the result, i.e. the 6 variables.

Also, do you have any suggestion for the approach I should use?

I am attaching the code with this post:

 

//Kernel Code
#include<stdio.h>

kernel int findGcd(int u,int v)
{
    int gcd = 1;
    int r;
    int num1=u;
    int num2 =v;
    while (1)
    {
        if (num2 == 0)
        {
            gcd = num1;
            break;
        }
        else
        {
            r = num1 % num2;
            num1 = num2;
            num2 = r;
        }
    }
    return gcd;
}

kernel int isWithinRange(float z, float epsillon)
{
    float fractional = frac(z);
    //float floor = Math.floor((double)z);
    if(fractional<=epsillon )
        return 1;
    else
        return 0;
}

kernel void threadABC(int startRange,out int a<>)
{
    int X,Y,Z;
    int A,B,C;
    int gcdAB,gcdAC,gcdBC;
    A = instance().x+startRange;
    B = instance().y+startRange;
    C = instance().z+startRange;
    gcdAB = findGcd(A,B);
    gcdAC = findGcd(A,C);
    gcdBC = findGcd(B,C);
    if(gcdAB==1 && gcdAC==1 && gcdBC==1){
        //threadXY(instance().x+1000,instance().y+1000,instance()+1000.z,a);
        for( X = 3; X < 10; X++)
        {
            for( Y = 3; Y < 10; Y++)
            {
                // will have to use modulo since the values might go out of range
                float sum = pow((float )A, (float )X)+pow((float )B, (float )Y);
                float Z = (log((float )sum)/log((float)C));
                float epsillon = 10E-4f;
                if(isWithinRange(Z,epsillon)){
                    // here the possible solution should be stored and returned to host code
                }
            }
        }
    }
}

//host code
#include "brookgenfiles\beals.h"
#include "conio.h"

int main(int argc, char ** argv)
{
    // just testing the if code works for A,B,C for 100
    unsigned int dim[] = {100,100,100};
    brook::Stream<int> aStream(2,di
    threadABC(1000,aStream);
    getch();
    return 0;
}

Ceq
Journeyman III


OK, now I understand a bit more about the problem. It looks like Beal's conjecture is a well-documented problem and there are several examples on the net.

I haven't had much time to look at your code, but I have a question: how are you going to deal with arbitrary-precision arithmetic? 16777216 (2^24) is the largest value up to which every integer is exactly representable as a float, and some Brook+ intrinsic functions don't work with the double data type. Did you have any solution in mind?
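The 16777216 limit is easy to confirm on the host in C. Every integer up to 2^24 survives a round trip through a 32-bit float, but 16777217 does not, and neither does 1001^3 = 1,003,003,001, which is well within the A^x values this project needs. A minimal sketch; `float_exact` is a hypothetical name.

```c
/* Returns 1 if n survives a round trip through a 32-bit float. */
int float_exact(long n) {
    volatile float f = (float)n;   /* volatile forces a real 32-bit store */
    return (long)f == n;
}
```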

niravshah00
Journeyman III


Well, for the precision problem I was thinking of working mod N, where N would be the largest prime number that fits in a float,

like (A^X mod N + B^Y mod N) mod N = C^Z mod N (not sure if I got this right).

I hope this is the correct way to do it, since I have to use log, which only works on float values.
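The congruence can be tested with integer arithmetic, with two caveats: it only works for integer z (the logarithm of a residue mod N has no useful meaning, so z would have to be looped over integers rather than computed with log), and it is a necessary but not sufficient condition, so a match still needs an exact check. Below is a hedged host-side C sketch using square-and-multiply. I'm not sure Brook+ kernels support 64-bit integers, and the example modulus 4294967291 (2^32 - 5) used in the check is my choice, not from the thread; the names are hypothetical.

```c
#include <stdint.h>

/* (base^exp) mod m by square-and-multiply. With m < 2^32 every
 * intermediate product is below 2^64, so uint64_t cannot overflow. */
static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Necessary condition: if A^x + B^y == C^z, this holds for every m.
 * A mismatch rules the candidate out; a match only means "verify it
 * exactly". z must be an integer. */
int congruent_mod(uint64_t A, uint64_t x, uint64_t B, uint64_t y,
                  uint64_t C, uint64_t z, uint64_t m) {
    uint64_t lhs = (powmod(A, x, m) + powmod(B, y, m)) % m;
    return lhs == powmod(C, z, m);
}
```

For instance, 3^3 + 6^3 and 3^5 agree modulo 4294967291, while 3^3 + 6^3 and 3^4 do not.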

I have got suggestions to move to OpenCL. Do you think moving to OpenCL would help my project? I have to make a call soon.

 

 

Ceq
Journeyman III


Well, I think either OpenCL or CUDA will be more flexible; however, it looks like the conjecture is still unproven and there could be many difficult parts in the implementation. Maybe other, more specialized people can give you a more accurate opinion.

niravshah00
Journeyman III


Well, a CUDA implementation is already someone else's project (by the way, does CUDA support AMD FireStream cards?).

What about the solution for precision I suggested, is it OK?

Also, any suggestion for returning the result?

 
