cancel
Showing results for
Did you mean:

# Archives Discussions

Journeyman III

## Hamming Distances for 100'000*100'1000 16 bit vectors

Good morning,

I am a total newbiew to this forum and admittedly to OpenCL. Nevertheless, I have a little challenge to solve: I have roughly 100'000 bit-vectors (each 16 bytes - actually it is 128 byte 0/1 ASCII at the moment but converting that is not an issue) and I would like to calculate the Hamming Distance between all of them. Actually, it would already be (more than) sufficient to write a program that takes a single 128 bit vector and caculates the hamming distance between this vector and and individual 100'000 128 bit vectors, return 100'000 integers containing the distances.

Note the Hamming Distance between two vectors is the number of differing bits, e.g.. if you have

v1:                  00000000001000001111

v2:                  00000000000000001111

the Hamming Distance would be 1 as one bit differs. The differing bits are actually easy to calculate: v1 XOR v2, i.e. in the above example

v1 XOR v2:     00000000001000000000

Roughly speaking, that would make it roughly 5'000'0000'0000 comparisons. I think that cannot be done with reasonable effort in plain C. When I look at the speed of the Mandelbrot programs when using OpenCL/GL, I was wondering if it were difficult to implement in Open CL and could anyone give me a rough indication who long would take someone being "fluent" in C/C++ to get into OpenCL?

Thanks a lot for any support and have some great Easter days (off),

Mercator

6 Replies
Staff

Thanks for the question. I'm exlicitly leaving this for our community to answer, as it is a general education question, not a support-style question.

Can anyone help out Mercator?

if you can devote a day or two to reading through some code examples, you'll be done with a working implementation on day 2 or 3. Modify one of those simple examples that come with the developer tools; the vector-matrix multiply example (i'm not sure if it's in the sdk) is a good one as it does something similar to what you want.

http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-c...

this describes a problem similar to your's: evaluating one/two points against all available.

when you're along the way, you're main kernel will probably consist mainly with about 5-10 lines of code

one for a bitwise operator (^) and then counting the number of nonzeros (popcount)

https://www.khronos.org/registry/cl/sdk/1.2/docs/OpenCL-1.2-refcard.pdf

Miniboss

Hi,

In OpenCL there is a function for this: popcount: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/popcount.html

This is implemented in the hardware by SSE4 CPUes and GCN GPUes too. (Not sure if it is supported by earlier GPUes, though...)

So the "5'000'0000'0000 comparisons" is not a problem at all.

For the ASCII -> binary conversion there is also a very fast instruction on the CPU: War on Theism: x86 Instruction Set Reference

It puts every upper bit of 16 bytes into the lower 16 bits of a given register. So you have to load 16 bytes of "00100101101..." into a 128bit xmm register, and then shift each byte in it left by 7 bits, and finally you can extract the resulting 16bits by the PMOVMSKB instruction. Note that, this is not the C/C++ way, it either needs an external assembler or inline asm or cpu intrinsics.

I think the fastest solution for the problem would be CPU asm on an SSE4 compatible CPU, as the task needs relatively small amount of math compared to memory IO.

Even if you don't have that modern SSE4 CPU, there are lots of older SIMD instrutions with which you can build a faster program than:

a) processing the whole data in a loop byte after byte with a classic C program.

b) transfering large (compared to the required math) amounts of data to and from the GPU.

Happy Easter to You too!

That would be really easy as there is a popcount builtin function in OpenCL which returns the number of non-zero bits in a number.

In just one day you can have a decent implementation in OpenCL!

You can take a look on the introductory tutorials about OpenCL on AMD OpenCL zone for example, and then adapt one of those to do your computations.