With your budget you are limited to gaming GPUs,
which in general are intentionally crippled in their
double-precision (fp64) compute rates. Both
AMD and NVIDIA do this, presumably so as not to undercut
sales of their "professional" products (FirePro
in the case of AMD), which sell for much more.
For older cards, the table at the link below compares
their fp32 and fp64 rates:
http://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/
On the AMD Tahiti core (HD 79xx, later renamed R9 280/280X) the fp64
rate is quite generous. The newer Fury cards have lower nominal
fp64 compute rates. The maximum power consumption of the Tahiti
cores is relatively high, but if you are unlikely to be
maxing out the GPUs 24/7, it may still be acceptable.
I am writing OpenCL code for the R9 280X card (and the HD 7950 = R9 280)
on Ubuntu 14.04, and CodeXL works fine on this system.
A typical test is matrix-matrix multiplication. I have written
my own tiled OpenCL routine, and on the 280X I get 530 fp64 GFLOPS
multiplying two 800x800 matrices, which is quite decent since my routine
is not limited to power-of-2 sizes. This is 3x faster than Intel MKL's
dgemm running on one or more i7-3930K cores. For larger sizes I will
probably get higher GFLOPS rates on the GPU, while the rate drops on the
CPU.
Other such tests can be found at the links below:
http://www.sable.mcgill.ca/publications/techreports/2013-1/sable-tr-2013-1.pdf
http://kmtmt.net/www/docs/rmmcl_sc13_poster.pdf
The old HD 7990 card has two full Tahiti cores, and you can find
it on amazon.fr for just 550 euros.
If you have an older LGA 2011 system (X79 chipset), it has 40 PCIe 3.0
lanes, so (CPU permitting) it can easily handle two 280X cards
and you can interleave computations between them. This is the route I am taking.