Hi, I am new here. I have a question about using a GPU for computation.

I have two single-precision float arrays, A[1..2000] and B[1..300]. Neither array contains repeated elements, and every element lies between 1.0000 and 6000.0000.

What is the fastest way to find how many elements in B are similar to elements in A? Similar here means abs(B[x]-A[y])<1E-4

Here is the method I used on the CPU:

I build a Boolean array C[1..6E7]. If a number, say 1234.5678, is in A, then C[12345678]=1.

Then I read each number B[x] from B and do: if C[B[x]*10000]==1 then D++;
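
A minimal Python sketch of the lookup-table method described above, for illustration only. The name count_similar_table and the scale and max_value parameters are made up here, and rounding each value to the nearest 1E-4 is an assumption. Note that an exact bucket match is not quite the same test as abs(B[x]-A[y])<1E-4: two values closer than the tolerance can still round into adjacent buckets and be missed.

```python
def count_similar_table(a, b, scale=10_000, max_value=6000.0):
    """Count elements of b whose rounded key was marked by some element of a."""
    # One byte per possible key: about 6E7 entries (roughly 60 MB) for this range.
    table = bytearray(int(max_value * scale) + 1)

    # Mark every value of A, e.g. 1234.5678 -> table[12345678] = 1.
    for x in a:
        table[round(x * scale)] = 1

    # Look up every value of B and count the hits (the D++ step).
    return sum(table[round(y * scale)] for y in b)
```

For example, count_similar_table([1234.5678, 10.0], [1234.5678, 20.0, 10.0]) counts the two values of B that also appear in A.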

If I do this on a GPU, will it be fast? Or is array C simply too big for the LDS?

Is it possible to treat these arrays as an image?

Hi, welcome to the forums. Please make sure to mark an answer if you find it helpful.

First, make sure you know what "similar" means for your application - I'll carry on with the definition you've decided is OK to use:

-The dimension to exploit for parallelism here is the array index, so this is a 1D problem

-Read one input from each array in global memory into local registers

-Compute the absolute difference, threshold it against your desired tolerance, and store the result in a local register as a 32-bit [unsigned] integer

-Do a parallel reduction on this thresholded integer - in particular, the algorithm you would be using is called parallel prefix sum, or [parallel] scan, with addition as the reduction operation

The final sum is a count of how many elements are "same" by your metric.
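
The steps above can be sketched on the CPU in plain Python, just to show the data flow. This is not GPU code: each loop iteration stands in for one work-item. The function names are hypothetical, and having each work-item take one element of B and scan all of A is my reading of the question, not something spelled out in the steps.

```python
def match_flags(a, b, tol=1e-4):
    """One 'work-item' per element of b: flag is 1 if any element of a is within tol."""
    return [int(any(abs(y - x) < tol for x in a)) for y in b]


def tree_reduce_sum(flags):
    """Sum the flags with a pairwise tree, the access pattern a work-group would use."""
    vals = list(flags)
    if not vals:
        return 0

    # Pad with zeros to a power of two so every step halves cleanly.
    n = 1
    while n < len(vals):
        n *= 2
    vals += [0] * (n - len(vals))

    stride = n // 2
    while stride > 0:
        # On a GPU these iterations run in parallel, followed by a barrier.
        for i in range(stride):
            vals[i] += vals[i + stride]
        stride //= 2
    return vals[0]
```

Calling tree_reduce_sum(match_flags(A, B)) then gives the count. The tree takes log2(n) steps instead of n sequential additions, which is where the parallel speedup comes from.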

Yes, it will be fast. How fast depends on the hardware, but for a 7990 it would probably be a hair under 500 microseconds for 4 million elements, ignoring the upload time, provided you find the right parallel prefix sum implementation and tune everything.

A note on the parallel prefix sum - it is an algorithm that occurs everywhere in GPU programming, so it is good to understand it. OpenCL 2.0 has the hard part built in as part of the work-group reduction functions, but alas it's hard to get it fast in OpenCL 1 since there's no official implementation or library. You might find inspiration in CUDA CUB or AMD's Bolt library, but figuring these things out is not for the faint of heart.
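
To get a feel for what a scan computes, here is a small CPU sketch in Python of an inclusive prefix sum, using the step-doubling pattern often attributed to Hillis and Steele. It is an illustration under my own assumptions (the name inclusive_scan is made up), not the implementation found in any of the libraries mentioned above. The last element of the scan equals the plain sum, which is all a count needs.

```python
def inclusive_scan(values):
    """Inclusive prefix sum: out[i] = values[0] + ... + values[i]."""
    out = list(values)
    offset = 1
    while offset < len(out):
        # Double buffering: every read in this step sees the previous step's values,
        # as if all work-items had read before any of them wrote.
        prev = list(out)
        for i in range(offset, len(out)):  # each i would be one work-item
            out[i] = prev[i] + prev[i - offset]
        offset *= 2
    return out
```

For the flags [1, 0, 1, 0, 1] this produces [1, 1, 2, 2, 3], and the final 3 is the number of matches.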