Can you post the ported sample?
Originally posted by: MicahVillmow
Raistmer,
This is a known issue that we are working on a fix for.
This would be very much appreciated - since there's a huge community out there waiting for this: people running Seti@home.
Currently the vast majority of them are using nVidia cards.
This could be an excellent chance for ATI to get new customers.
Regards,
Tristan
Originally posted by: Tristan23
Originally posted by: MicahVillmow
Raistmer,
This is a known issue that we are working on a fix for.
This would be very much appreciated - since there's a huge community out there waiting for this: people running Seti@home.
Currently the vast majority of them are using nVidia cards.
This could be an excellent chance for ATI to get new customers.
This issue is fixed internally. The upcoming release includes this fix.
I downloaded Apple's OpenCL FFT and ported it to Linux a month ago, so I had a chance to try the code on both an nVidia C1060 and an AMD HD5870, and I'm seeing a number of issues with it. In my tests I was only interested in 2D FFTs of relatively large images (around 1024x1024).
The first observation was made on the nVidia C1060. It turns out that the OpenCL FFT implementation is 2-3 times slower than CUFFT (depending on the problem size). I presume this is a general problem of Apple's OpenCL FFT implementation.
The second issue: when I moved my tests to SDK 2.01, Ubuntu 9.04 and an HD5870, the performance got even worse, which was a big surprise to me as I was expecting the opposite. In particular, Apple's OpenCL FFT was 8x slower on 512x512 images on the HD5870 (ATI Stream SDK 2.01) than the same algorithm run on the C1060.
The next problem became a real show-stopper for me. In my SDK 2.01 & HD5870 tests I could not test 1024x1024 or anything bigger due to an apparent hard kernel lockup happening within clFlush or clFinish! Interestingly enough, SDK 2.00 had a similar lockup with smaller 512x512 images. Is there any explanation for this?
Thanks!
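To compare timings like the ones above across cards and image sizes, it can help to normalize them to a nominal GFLOP/s figure. Below is a minimal sketch in C, assuming the common 5 * N * log2(N) convention for a complex transform of N points and power-of-two image sizes (as with 512x512 and 1024x1024). The function names are hypothetical and are not part of Apple's sample, CUFFT or the SDK.

```c
#include <stddef.h>

/* floor(log2(n)) for n >= 1; exact when n is a power of two. */
static int ilog2(size_t n)
{
    int k = 0;
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

/* Nominal flop count for a complex 2D FFT of nx * ny points, using the
 * 5 * N * log2(N) convention with N = nx * ny. This is a normalization
 * convention for comparing runs, not the exact operation count of any
 * particular implementation. Assumes nx and ny are powers of two. */
double fft2d_nominal_flops(size_t nx, size_t ny)
{
    double n = (double)nx * (double)ny;
    return 5.0 * n * (double)(ilog2(nx) + ilog2(ny));
}

/* Convert the measured time of one transform (in seconds) into nominal
 * GFLOP/s. Returns 0 for non-positive times. */
double fft2d_gflops(size_t nx, size_t ny, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return fft2d_nominal_flops(nx, ny) / seconds / 1e9;
}
```

For example, a 1024x1024 transform that takes 1 ms would come out at roughly 105 nominal GFLOP/s under this convention.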
Originally posted by: genaganna
This issue is fixed internally. The upcoming release includes this fix.
Can you please tell us when this release will be publicly available?
Would it be possible to have access to a beta version?
Originally posted by: Tristan23
Originally posted by: genaganna
This issue is fixed internally. The upcoming release includes this fix.
Can you please tell us when this release will be publicly available?
Would it be possible to have access to a beta version?
I can't give an exact date, but it should be in the next few months.
Originally posted by: genaganna
I can't give an exact date, but it should be in the next few months.
In a few months??? In a few months nVidia's Fermi cards will be available. If I were AMD I would get my sh*t sorted ASAP!
Originally posted by: Tristan23
Originally posted by: genaganna
I can't give an exact date, but it should be in the next few months.
In a few months??? In a few months nVidia's Fermi cards will be available. If I were AMD I would get my sh*t sorted ASAP!
I agree, Nvidia is going to massacre ATI in the OpenCL field if ATI doesn't hurry up with the development of its OpenCL implementation and release bug fixes and new features more often...
Hi,
I think that AMD is not scared of Fermi, because if you look at the performance specifications for the Tesla series, nVidia says that it will have 600GFlops peak in double precision and the board will be available in Q2 2010
(http://www.nvidia.com/object/product_tesla_C2050_C2070_us.html)
AMD instead has its HD5970 today with 928GFlops peak in double precision
(http://www.amd.com/la/products/desktop/graphics/ati-radeon-hd-5000/hd-5970/Pages/ati-radeon-hd-5970-specifications.aspx)
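For context on where a figure like 928GFlops comes from: a theoretical peak is just units x clock x flops per unit per cycle. Here is a minimal sketch in C, assuming the commonly published HD5970 numbers (two GPUs with 1600 stream processors each at 725 MHz, one multiply-add per cycle, and double precision at one fifth of the single-precision rate). These inputs are assumptions taken from the public spec sheet rather than from this thread, and the function names are hypothetical.

```c
/* Theoretical peak in GFLOP/s: units * clock (GHz) * flops per unit per cycle.
 * This is a spec-sheet upper bound, not a measured rate. */
double peak_gflops(unsigned units, double clock_ghz, double flops_per_cycle)
{
    return (double)units * clock_ghz * flops_per_cycle;
}

/* Assumed HD5970 single-precision peak: 2 GPUs x 1600 stream processors,
 * 0.725 GHz, 2 flops per cycle (one multiply-add). */
double hd5970_peak_sp_gflops(void)
{
    return peak_gflops(2u * 1600u, 0.725, 2.0);
}

/* Assumed double-precision peak: one fifth of the single-precision rate. */
double hd5970_peak_dp_gflops(void)
{
    return hd5970_peak_sp_gflops() / 5.0;
}
```

Under these assumptions the double-precision figure works out to about 928 GFLOP/s for the whole board, i.e. summed over both GPUs, which is worth keeping in mind when comparing it with a single-GPU Tesla card.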
Of course we agree that ATI's OpenCL implementation is not the most beautiful girl in town right now, but I think that their strategy is to have OpenCL in production state before Fermi's launch.
best regards,
Alfonso
> I think that AMD is not scared of Fermi ...
I fear so too - but I'd say they had better be.
> ... AMD instead has its HD5970 today with 928GFlops peak in double precision
GFlops are only theoretical as long as the software/driver sucks.
> ... but I think that their strategy is to have OpenCL in production state before Fermi's launch.
Doesn't look like that's going to happen.
Originally posted by: Tristan23
> I think that AMD is not scared of Fermi ...
I fear so too - but I'd say they had better be.
> ... AMD instead has its HD5970 today with 928GFlops peak in double precision
GFlops are only theoretical as long as the software/driver sucks.
> ... but I think that their strategy is to have OpenCL in production state before Fermi's launch.
Doesn't look like that's going to happen.
Apparently, here is one reason why AMD isn't too concerned:
http://techreport.com/discussions.x/18492
Raistmer,
The new SDK is going to be released soon.
hi
it would be really great if you could post your ported OpenCL FFT code...
thanks
Originally posted by: fulcrum_xyz
hi
it would be really great if you could post your ported OpenCL FFT code...
thanks
Thanks Raistmer, I have the Apple version... and am currently porting it to run on my OpenSUSE 11.2.
So, I was wondering if you had already ported it to Linux (a non-Mac OS version) and if you could share that?
thanks again...
P.S: I have taken a look at the OpenCL SDK FFT sample; it seems to be very preliminary and supports only minimal parameters (only 1D, no batching, no complex)...
Originally posted by: fulcrum_xyz
Thanks Raistmer, I have the Apple version... and am currently porting it to run on my OpenSUSE 11.2.
So, I was wondering if you had already ported it to Linux (a non-Mac OS version) and if you could share that?
thanks again...
P.S: I have taken a look at the OpenCL SDK FFT sample; it seems to be very preliminary and supports only minimal parameters (only 1D, no batching, no complex)...
hey thanks for the info...
I wanted to benchmark some (mostly 2^x) 2D FFTs with OpenCL on the GPU.
On the NVIDIA cards, I think we can safely assume that OpenCL performance will be <= CUFFT performance (~20-40%). I am not sure if NVIDIA is even thinking of an OpenCL version of their library anytime soon...
But with the ATI cards it's not at all clear... so I was looking to get an estimate for the same (it would also be great if someone from AMD could fill us in if they have any information in this regard).
So I've concluded that porting the Apple OpenCL FFT and benchmarking it on both kinds of hardware is the best way to go (given the lack of any further info...).
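For the benchmarking itself, a rough sketch of a timing harness in C might look like the following. It assumes the FFT call is wrapped in a callback that only returns once the work has really finished (with OpenCL that means blocking, e.g. on clFinish, because enqueue calls return before the kernels complete). The names `bench_fn`, `bench_average_seconds` and `dummy_workload` are hypothetical placeholders, and `dummy_workload` just counts calls so the harness can be tried without a GPU.

```c
#include <time.h>

/* Callback type for the code under test; ctx is passed through untouched. */
typedef void (*bench_fn)(void *ctx);

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run fn `warmup` times untimed (so one-off costs such as kernel
 * compilation are excluded), then `iters` times timed, and return the
 * average seconds per timed call. Returns 0 if iters <= 0. */
double bench_average_seconds(bench_fn fn, void *ctx, int warmup, int iters)
{
    int i;
    double t0, t1;

    if (iters <= 0)
        return 0.0;
    for (i = 0; i < warmup; ++i)
        fn(ctx);
    t0 = now_seconds();
    for (i = 0; i < iters; ++i)
        fn(ctx);
    t1 = now_seconds();
    return (t1 - t0) / (double)iters;
}

/* Placeholder workload: increments the int counter that ctx points to.
 * A real benchmark would enqueue the FFT here and wait for completion. */
void dummy_workload(void *ctx)
{
    ++*(int *)ctx;
}
```

Averaging over several timed iterations after a warm-up is only one possible choice; taking the minimum over the runs is another common option when the numbers are noisy.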