

walkershaw
Journeyman III

Can the CPU run a kernel?

I'm studying HSA and trying to compare the computing ability of the CPU and the GPU in my APU.

I found that the CPU is just an agent, not a component. Does that mean I can't run the same kernel on both to compare them?

Can you give me some advice? Thanks.

1 Solution

On Kaveri, coherent shared memory access by an HSA device (e.g. the GPU) is slower than the non-coherent accesses normally used for graphics. On Carrizo, the performance of coherent and non-coherent memory should be roughly the same.

5 Replies
qtvali
Adept I

You can do it in several ways.

As far as I understand, there are drivers that are supposed to expose the CPU as an OpenCL device:

windows - Install AMD OpenCL CPU driver with an Nvidia graphic card - Stack Overflow

OpenCL™ Drivers and Runtimes for Intel® Architecture

There are parallel languages (generators) with several targets:

http://parse.ele.tue.nl/research/bones/

http://streamcomputing.eu/blog/2013-12-20/simple-sum-cpu-openmp-opencl-gpu/

So either you write OpenCL code and find a way to run it on the CPU, which is possible through a driver or a compiler, or you write in a language that code generators or compilers can transform into parallel code.

That should be fine for benchmarking and comparing different devices. The problem, though, is that different generators might optimize differently, and you would need to optimize your code differently for each target. Looking at the benchmarks on the web, I don't think you can trust them much, because most of them test specific technologies with specific optimizations. It is probably very hard to write even a simple program that is perfectly optimized for all devices and uses a wide range of their capabilities, so we will probably never know for certain which device is best when fully utilized.
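
To illustrate the kind of care a fair comparison needs, here is a minimal timing sketch in Python. The names `best_time`, `sum_loop`, and `sum_builtin` are hypothetical; the two functions are only stand-ins for "the same kernel" built for two targets, not real device code.

```python
import time

def best_time(fn, *args, warmup=2, repeats=5):
    """Return the fastest of `repeats` timed runs, after `warmup` untimed runs."""
    for _ in range(warmup):
        fn(*args)  # untimed: lets caches and lazy initialisation settle
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-ins for the same computation on two targets.
def sum_loop(data):
    total = 0
    for x in data:
        total += x
    return total

def sum_builtin(data):
    return sum(data)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Check that both variants agree before comparing their speed.
    assert sum_loop(data) == sum_builtin(data)
    for name, fn in [("loop", sum_loop), ("builtin", sum_builtin)]:
        print(f"{name}: {best_time(fn, data) * 1e3:.2f} ms")
```

Taking the best of several runs after a warm-up reduces noise, but it does not remove the optimization bias described above.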

I also think that to compare the CPU and GPU, you need to move all of the program's data into GPU memory before you start the benchmark, because on the CPU you don't have to wait for those copies: the data is already in its native memory.
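
To make the copy cost concrete, here is a small sketch of the arithmetic. All figures and the helper name `end_to_end_seconds` are made up for illustration; they are not measurements from any device.

```python
def end_to_end_seconds(bytes_moved, link_bytes_per_s, kernel_seconds):
    """Total time when the data must first be copied to the device."""
    copy_seconds = bytes_moved / link_bytes_per_s
    return copy_seconds + kernel_seconds

# Illustrative, made-up numbers: 1 GiB of input over an 8 GiB/s copy path.
GIB = 1024 ** 3
cpu_kernel = 0.50   # seconds, data already in host memory
gpu_kernel = 0.10   # seconds, once the data is on the device

gpu_total = end_to_end_seconds(1 * GIB, 8 * GIB, gpu_kernel)
print(f"CPU: {cpu_kernel:.3f} s")
print(f"GPU kernel only: {gpu_kernel:.3f} s")
print(f"GPU including copy: {gpu_total:.3f} s")
```

With these assumed numbers the GPU looks 5x faster if you time only the kernel, but only about 2.2x faster once the copy is included, so it matters which of the two you report.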

Thanks a lot. But what I am trying to do is have the CPU and GPU run the same work, so that I can partition it between them. And I think using HSA means the CPU and GPU share memory, so I don't have to copy.
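
One way to sketch that partitioning, assuming the throughput of each device on the same kernel has already been measured. The function name `split_work` and the throughput figures are hypothetical, and the actual HSA dispatch is not shown.

```python
def split_work(n_items, throughputs):
    """Split n_items across devices in proportion to measured throughput.

    throughputs maps a device name to items/second.
    Returns a dict of device name -> (start, end) half-open index range.
    """
    total = sum(throughputs.values())
    names = list(throughputs)
    ranges, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = n_items  # last device takes the remainder, so nothing is dropped
        else:
            end = start + int(n_items * throughputs[name] / total)
        ranges[name] = (start, end)
        start = end
    return ranges

# Made-up throughputs: the GPU is three times faster than the CPU here.
print(split_work(1000, {"cpu": 1.0, "gpu": 3.0}))
```

Each device then processes only its own index range of the shared buffer.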


HSA memory access is still slower than cached memory access. I think if you want results about processing speed, you need to make sure that the memory is instantly accessible; of course, you can also compare both ways. On both the CPU and the GPU there are several layers of cache: the smaller, the faster.

Tambet


qtvali
Adept I

Actually I think you need to know this as well, which more or less determines the way to go:

A simple sum example, from CPU to OpenMP to OpenCL to GPU - Blog - StreamComputing
