It is a doddle. I have large(ish) f90 codes that call many GPU routines, written
mostly by me, but I also use clFFT (see below).
You need to write a "wrapper", a C routine that is called from Fortran. Here is how it is done:
f90 calling routine:
Call getp_ocl( P(1,1,I), PP(1,1,I), N1, 0, JPLEN, KPLEN, MY*NY, MZ*NZ, 0 )
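Now the corresponding C routine is (Fortran passes every argument by reference, so each parameter, including P and TMP, must be a pointer):
void getp_ocl_( double *P, double *TMP, int *LENX, int *IOFF,
int *LENY, int *LENZ, int *YDIM, int *ZDIM, int *N_GPU )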
Note the trailing underscore in the C routine name: most Fortran compilers append
one to external names, but not all do. Test on your system to see what works.
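A minimal sketch of the complete C side, assuming gfortran-style name mangling (gfortran appends one underscore by default; the old g77 appended a second one to names that already contain an underscore, such as getp_ocl). The FORTRAN_NAME macro and the FORTRAN_NO_UNDERSCORE switch are hypothetical helpers, not part of any library. The body is a placeholder that only copies LENX values of P, starting at offset IOFF, into TMP, so the file can be compiled with gcc -c and exercised without a GPU; the real routine would create OpenCL buffers and enqueue kernels instead.

```c
/* Hypothetical name-mangling switch: define FORTRAN_NO_UNDERSCORE when the
   Fortran compiler does not append '_' to external names. */
#ifdef FORTRAN_NO_UNDERSCORE
#define FORTRAN_NAME(name) name
#else
#define FORTRAN_NAME(name) name##_
#endif

/* Callable from Fortran as:
     Call getp_ocl( P(1,1,I), PP(1,1,I), N1, 0, JPLEN, KPLEN, MY*NY, MZ*NZ, 0 )
   Every argument arrives as a pointer, even the literal 0 and the
   expression MY*NY (the Fortran compiler passes a temporary). */
void FORTRAN_NAME(getp_ocl)(double *P, double *TMP, int *LENX, int *IOFF,
                            int *LENY, int *LENZ, int *YDIM, int *ZDIM,
                            int *N_GPU)
{
    /* Dereference the scalar arguments once on the C side. */
    int lenx = *LENX;
    int ioff = *IOFF;

    /* Unused by this placeholder; the real routine would need them. */
    (void)LENY; (void)LENZ; (void)YDIM; (void)ZDIM; (void)N_GPU;

    /* PLACEHOLDER body (hypothetical): copy lenx values of P, starting at
       offset ioff, into TMP. Replace with the OpenCL work for device *N_GPU. */
    for (int i = 0; i < lenx; i++)
        TMP[i] = P[ioff + i];
}
```

Because Fortran arrays are passed as the address of their first element, P(1,1,I) arrives in C as a plain double pointer to the start of slab I.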
Compile the Fortran part with the -c option.
Compile the C part (with gcc), also with -c. Do not forget the include
paths for the headers and the search paths for the libraries.
Link the *.o files with your linker, adding the OpenCL and clFFT libraries (e.g. -lOpenCL -lclFFT).
Regarding math libs on AMD GPUs, you may find that you need to write
your own stuff. I am currently using clFFT extensively, but it does not
work (by design) on multiple GPU cards, so I am now writing my own.
Thanks, but my question is about an analogue (if there is one) of the CUBLAS thunking and non-thunking wrappers: something provided by the library, not a hand-made piece of software (error-prone and performance-critical) :)
Last time I looked (about a year ago), thunking CUFFTs produced only modest gains, especially when
the I/O over PCIe is taken into account. And surely, writing an interface routine to access
the manufacturer's software is a relatively safe process. And if performance is that critical,
it is very likely that you will have to try your hand at programming it yourself.
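In CUBLAS's Fortran bindings, the thunking wrappers allocate device memory, copy the arguments in, call the routine, copy the results back and free the memory on every call, while the non-thunking wrappers leave allocation and transfers to the Fortran caller, so data can stay resident on the GPU across calls. The sketch below illustrates why that difference dominates PCIe traffic. Everything in it is a mock: dev_alloc, to_dev, from_dev, dev_scale and the pcie_bytes counter are hypothetical stand-ins built on malloc/memcpy, not OpenCL or CUBLAS calls.

```c
#include <stdlib.h>
#include <string.h>

/* MOCK device layer (hypothetical): host malloc/memcpy stand in for GPU
   allocation and PCIe transfers; pcie_bytes counts the bytes "moved". */
static size_t pcie_bytes = 0;

static double *dev_alloc(size_t n) { return malloc(n * sizeof(double)); }
static void dev_free(double *d) { free(d); }

static void to_dev(double *d, const double *h, size_t n)
{
    memcpy(d, h, n * sizeof(double));
    pcie_bytes += n * sizeof(double);
}

static void from_dev(double *h, const double *d, size_t n)
{
    memcpy(h, d, n * sizeof(double));
    pcie_bytes += n * sizeof(double);
}

/* Stand-in for a device kernel: scale the array in place. */
static void dev_scale(double *d, size_t n, double a)
{
    for (size_t i = 0; i < n; i++)
        d[i] *= a;
}

/* Thunking style: every call allocates, copies in, computes,
   copies out and frees, so each call pays two transfers. */
void scale_thunking(double *h, size_t n, double a)
{
    double *d = dev_alloc(n);
    if (d == NULL)
        return;
    to_dev(d, h, n);
    dev_scale(d, n, a);
    from_dev(h, d, n);
    dev_free(d);
}

/* Non-thunking style: the caller owns the device buffer and decides
   when to transfer, so repeated calls cost no PCIe traffic. */
void scale_nonthunking(double *d, size_t n, double a)
{
    dev_scale(d, n, a);
}
```

Ten calls on a four-element array move the data 20 times in the thunking style and only twice in the non-thunking style (once in, once out).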
All the best.