suboba
Journeyman III

Driver/runtime, OpenCL version & supported devices

Okay so this is a continuation of : Installed AMD APP SDK 2.9.1 but got OpenCL 2.0?

I don't know why, but I'm not able to reply to my own thread, so I'll make this thread.

I have AMD GX-210HA w/ Radeon HD Graphics which I THINK supports OpenCL 1.2  : http://www.amd.com/Documents/AMDGSeriesSOCProductBrief.pdf

It is part of this Gizmo 2 SBC.

I installed the latest driver for this hardware and got the results in my previous thread:  OpenCL 2.0 for the gpu and OpenCL 1.2 for the cpu. (see previous thread).

Now according to bsp2020's response to my previous question, the driver you install contains the OpenCL runtime that will run on the gpu.

Having learned that the driver contains the ocl runtime, I tried to install an older driver that I hoped would contain an ocl 1.2 runtime rather than ocl 2.0.  Unfortunately, it seems that I'm not able to install any older driver for this hardware (I keep getting an error about the X server not being supported), so there is no way I can get hold of the ocl 1.2 runtime which my device supports.  The latest driver works, but gives me ocl 2.0 on hardware which is supposed to support ocl 1.2.  Consequently, when I try some sample programs, choosing the gpu actually gives slower results.  My guess is that this is because the runtime/driver does not match the hardware, which is supposed to support 1.2.

So my questions are: 1)  can you use OpenCL 2.0 on a device that supports ocl 1.2?  ( I would think not).

2) Running clinfo ( see previous thread) reports that the gpu is getting ocl 2.0 runtime.  How can this be if the device does not support that version of ocl?  Thus, I tried to install an older version of catalyst, but wasn't successful.

This might be a long shot, but could AMD just tell me/ provide me the exact driver version and APP SDK for this product so I could finally do some ocl development?  I specifically purchased this Gizmo 2 board just for the OpenCL capability, and now it seems I can't use it!

Please, any help would be appreciated.

0 Likes
22 Replies
jtrudeau
Staff

Not sure why, but you weren't white listed. I just fixed that. You should have no problem replying to topics anywhere in the developer forums. I'll move this over to OpenCL.

0 Likes

ty jtrudeau

0 Likes
bsp2020
Challenger

Did you actually try running any examples in the SDK? You should not have any problem running OCL 1.2 code using the latest driver. In fact, I don't know of any applications that are written to take advantage of OCL 2.0 features. All the software I tested is written for OCL 1.2, and it had no problem running with the latest AMD drivers.

0 Likes

Of course I did.  That's how I'm discovering that something is not correct about my setup/installation.  For example, I tried the Helloworld example and it took well over a second using the gpu device.  Is that really how it is?  (no seriously, i'm new to opencl, so I don't know what to expect).  However, when I select the Cpu device which clinfo reports as using ocl1.2 unlike the Gpu, I get much faster results (less than a second). 

Is it possible the cpu device is actually faster than the gpu device?  I thought the whole purpose of OpenCL is to take advantage of the gpu device that has parallel computation capabilities.

Let me ask you this.  Suppose your box has 2 devices.  One cpu (any # cores) and one discrete gpu (graphics card).  And both specify only supporting Ocl 1.2.  Now let's say you install the latest Catalyst driver which supports ocl 2.0.  If you were to run clinfo, what will it say your platform version is? 1.2 or 2.0?  For each device?

You saw in my clinfo output that it is reporting I have two devices each supporting a different version of ocl: gpu = 2.0 and the cpu = 1.2.  But according to the hardware spec, only 1.2 is supported for both devices (keep in mind that both the cpu and gpu are integrated on the same die, but it still reports it as two separate devices).  The product specification doesn't say anything about supporting ocl 2.0.   Is the driver capable of detecting the device and what ocl version it supports during installation?

I just don't know why the driver gives one device the 1.2 runtime (the cpu) while it gives the other the 2.0 runtime (the gpu) - which according to spec, doesn't even support 2.0.  Now when I try to run any application, it chooses the Gpu by default and actually runs slower than when I explicitly choose the Cpu device. 

I like to use the BinarySearch example because I can specifiy which device I want it to run on gpu/cpu and whenever I choose the Cpu device, it actually runs faster than the default Gpu device.

And like I said, I can't seem to install an older Catalyst driver so that I know for sure I'm only getting a 1.2 runtime.  Please do me a favor and review my clinfo output.  I just don't know why it's saying the gpu is ocl 2.0 capable when I know for a fact it isn't.  If I could get it to somehow report it as 1.2, as it should be, I think I might be in good shape.

thanks.

btw, I did try the 3.0 sdk helloworld example which is supposed to contain ocl 2.0 features on the gpu (which clinfo reports as supporting 2.0) still no change.  Still slower than choosing the cpu.

0 Likes

Comparing the performance of the HelloWorld example is not a good way to measure the performance of the OCL runtime. Also, using the OCL 1.2 runtime or the OCL 2.0 runtime won't make much difference to performance if the application is not written to take advantage of OCL 2.0 features.

As for the example running slower when using the GPU, it may just mean that it takes more work to compile the OCL kernel for the GPU than for the CPU. The SDK has a bunch of samples that measure performance. Try those if you are interested in measuring performance. Also, getting the maximum performance out of a GPU is hard. Prepare to spend a lot of time optimizing.
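A minimal sketch of why a first run can look slow: if the kernel is built at run time, the first call pays a one-time cost that later calls do not. The harness below times warm-up calls separately from steady-state calls. `FakeKernel` is a hypothetical stand-in that simulates a build delay with `time.sleep`; it is not an OpenCL call, and the 0.05 s figure is an arbitrary assumption.

```python
import time


def time_calls(fn, warmup=1, repeats=5):
    """Time warm-up calls and steady-state calls separately.

    Returns two lists of wall-clock durations in seconds.
    """
    warm = []
    for _ in range(warmup):
        t0 = time.perf_counter()
        fn()
        warm.append(time.perf_counter() - t0)
    steady = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        steady.append(time.perf_counter() - t0)
    return warm, steady


class FakeKernel:
    """Hypothetical stand-in for a program whose first call includes a build step."""

    def __init__(self, build_seconds=0.05):
        self.build_seconds = build_seconds  # assumed one-time cost
        self.built = False
        self.calls = 0

    def __call__(self):
        if not self.built:
            time.sleep(self.build_seconds)  # simulated kernel compilation
            self.built = True
        self.calls += 1
        return sum(range(1000))  # simulated work


if __name__ == "__main__":
    kernel = FakeKernel()
    warm, steady = time_calls(kernel, warmup=1, repeats=5)
    median = sorted(steady)[len(steady) // 2]
    print(f"first call (includes build): {warm[0]:.4f} s")
    print(f"median steady-state call:    {median:.6f} s")
```

With a real OpenCL program the same idea applies: measure the build and the first enqueue separately from repeated kernel launches before comparing devices.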

I don't think you have any setup problems. Nor do I think installing OCL 1.2 driver will make any difference for you.

BTW, GX-210 only has 2 GPU compute units. It may not be significantly faster even after you optimized your OCL kernel.

Good luck.

0 Likes

Hi,

Not trying to interfere with your discussion. Just want to share a few points.

If you were to run clinfo, what will it say your platform version is? 1.2 or 2.0?  For each device?

Please don't confuse "Device OpenCL C version" with "Platform Version" . AMD has started to support OpenCL 2.0 from 14.12 Omega driver. If you install Omega or higher drivers, you'll get "Platform Version"  as OpenCL 2.0. But, the devices associated with the platform may have different OpenCL support. Each device has its own OpenCL support and you can get that information from "Device OpenCL C version" parameter. The programs or apps that can be run on the device depends on that particular device's OpenCL version. To get the OpenCL 2.0 supported devices, you need to install drivers with OpenCL 2.0 platform support.
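To make the distinction concrete, here is a small sketch that parses the two kinds of version string and checks whether a device can run code written for a given OpenCL C version. It relies on the string layout defined by the OpenCL specification ("OpenCL <major>.<minor> ..." for platform and device versions, "OpenCL C <major>.<minor> ..." for the device OpenCL C version). The sample strings are illustrative assumptions, not the poster's actual clinfo output, and the function names are made up for this example.

```python
import re

# Matches "OpenCL 2.0 ..." as well as "OpenCL C 1.2 ..."
_VERSION_RE = re.compile(r"^OpenCL (?:C )?(\d+)\.(\d+)")


def parse_cl_version(text):
    """Return (major, minor) from an OpenCL version string."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an OpenCL version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def device_can_run(device_c_version, required):
    """True if the device's OpenCL C version is at least `required`."""
    return parse_cl_version(device_c_version) >= required


if __name__ == "__main__":
    platform_version = "OpenCL 2.0 AMD-APP (1800.8)"  # hypothetical example
    device_c_version = "OpenCL C 1.2 "                # hypothetical example
    print("platform:", parse_cl_version(platform_version))
    print("device C:", parse_cl_version(device_c_version))
    print("runs 1.2 code:", device_can_run(device_c_version, (1, 2)))
    print("runs 2.0 code:", device_can_run(device_c_version, (2, 0)))
```

The point of the comparison is that a 2.0 platform can host a 1.2 device: the platform string says what the runtime supports, and the per-device string says what a given device can compile.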

  (keep in mind that both the cpu and gpu are integrated on the same die, but it still reports it as two separate devices).

In the case of an APU or any integrated GPU, the CPU and GPU components are listed as two separate devices even though they may be on the same die.

You saw in my clinfo output that it is reporting I have two devices each supporting a different version of ocl: gpu = 2.0 and the cpu = 1.2. ...

As per your clinfo output, the gpu device is listed as OpenCL 2.0. However, information about the new OpenCL 2.0 capabilities such as SVM, device-side queues, pipes etc. is missing from the clinfo output. clinfo should list all of this information even if the device doesn't support it.

I would suggest you to clean everything and do a fresh installation. After installing the driver, please check and share the clinfo output without APP SDK (i.e. before APP SDK installation).

And like I said, I can't seem to install an older Catalyst driver so that I know for sure I'm only getting a 1.2 runtime.

Please provide more details about the error, OS, driver version (may share the link). If needed, I'll check with the concerned team.

To know whether the gpu really supports OpenCL 2.0 features or not, you may do:

1) run OpenCL 2.0 samples such as SVMBinaryTreeSearch, DeviceEnqueueBFS, PipeProducerConsumerKernels etc. from APP SDK 3.0 beta [please check the error message, if any].

2) try to build/compile a kernel for OpenCL 2.0 using "-cl-std=CL2.0".
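As a rough sketch of the second check, the helper below picks the highest `-cl-std` build option that does not exceed the OpenCL C version a device reports, so that a 2.0 build is only requested where it has a chance of succeeding. The `-cl-std=CL1.1`, `-cl-std=CL1.2` and `-cl-std=CL2.0` values are standard OpenCL build options; the function names and sample strings are assumptions for illustration.

```python
import re

# -cl-std values defined up to OpenCL 2.0
KNOWN_STD_VERSIONS = [(1, 1), (1, 2), (2, 0)]


def reported_c_version(opencl_c_version):
    """Parse 'OpenCL C <major>.<minor> ...' into (major, minor)."""
    match = re.match(r"OpenCL C (\d+)\.(\d+)", opencl_c_version.strip())
    if match is None:
        raise ValueError(f"unexpected version string: {opencl_c_version!r}")
    return int(match.group(1)), int(match.group(2))


def cl_std_option(opencl_c_version, wanted=(2, 0)):
    """Return the highest usable -cl-std option, or '' if none applies."""
    have = reported_c_version(opencl_c_version)
    usable = [v for v in KNOWN_STD_VERSIONS if v <= have and v <= wanted]
    if not usable:
        return ""
    major, minor = max(usable)
    return f"-cl-std=CL{major}.{minor}"


if __name__ == "__main__":
    for reported in ("OpenCL C 2.0 ", "OpenCL C 1.2 "):  # hypothetical strings
        print(repr(reported), "->", cl_std_option(reported))
```

Passing the returned string as the build options and checking the build log is still the real test; this only avoids asking for a standard the device does not claim.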

Regards,

Thanks a bunch dipak!

I did not want to mention it earlier, but I am almost suspecting if AMD made a mistake on their hardware and the gpu really is ocl 2.0 compatible, but they somehow forgot to report it on their specification.  It should be 1.2 not 2.0.  But if it really is 2.0 compatible, I wouldn't mind.  Why would I?  This is exactly why I asked that question.  I wanted to know what clinfo reports/detects.  If this is the case, then the gpu must really be ocl 2.0 compatible and AMD didn't specify that!  In either case, if it's 1.2 or 2.0, it looks like I'm still not seeing any speedup from using it. 

Now that you've confirmed they really are two different devices, why am I seeing more speed from the cpu than from the gpu?  Again, in all the examples I run, choosing the cpu actually runs faster than choosing the gpu! (which most of the programs choose by default, the gpu).  

Not to get off topic, but I'm wanting to run an application that uses OpenCL.  So far, as with the sdk samples, this particular application is choosing the gpu by default and is running really slow.  Since I know from experimenting with the samples, I'd like to test this application using the cpu.  But I'd have to hack the source in order to do that.  I don't think it would be too hard, but I really shouldn't have to. 

It seems all OpenCL applications are choosing the gpu by default because that's what they determine is the fastest device from the environment, but I know from experimentation that something is wrong, because choosing the cpu actually runs faster than the gpu.  I don't want to manually hack applications to run on the cpu because I know it runs faster.  I shouldn't have to.

And like bsp2020 said, ocl applications written in 1.2 can run on 2.0 hardware.  It's just that 2.0 features won't get used.  That's fine for my situation.  I believe the software I'm wanting to run is all written in ocl 1.2,  thus it can run on a device that supports ocl 2.0,  but it's still choosing the slower device.

Argghh.  This is making me frustrated.  I'll try to do a fresh/clean install of the latest driver w/o the install of the sdk and report back what clinfo gives me.  Thanks again.  Really helpful stuff.  

thank you.

0 Likes

Hi suboba

You are right. As GPUs are generally faster than CPUs and a better fit for parallel algorithms, most OpenCL applications choose the GPU device by default. Normally, it's trivial to set or change the ordering statically. However, if you really want to do it dynamically based on the capabilities of the associated devices, you need to gather their device information and compare the relevant parameters, such as the number of CUs, PEs, clock frequency, memory size, native vector width etc. The priority of the compared parameters depends on the particular application.
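A minimal sketch of that kind of dynamic selection, assuming the device parameters have already been queried (for example `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY`). The scoring formula is a crude heuristic invented for this example, `lanes_per_cu` is an assumed figure rather than an OpenCL query result, and the CPU numbers are placeholders; only the GPU's 2 CUs at 300 MHz come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    name: str
    kind: str           # "CPU" or "GPU"
    compute_units: int  # as reported by CL_DEVICE_MAX_COMPUTE_UNITS
    clock_mhz: int      # as reported by CL_DEVICE_MAX_CLOCK_FREQUENCY
    lanes_per_cu: int   # assumed SIMD lanes per compute unit


def rough_score(dev):
    """Crude throughput proxy: compute units x clock x lanes."""
    return dev.compute_units * dev.clock_mhz * dev.lanes_per_cu


def pick_device(devices, prefer=None):
    """Pick the highest-scoring device, optionally restricted to one kind."""
    if prefer is not None:
        forced = [d for d in devices if d.kind == prefer]
        if forced:
            return max(forced, key=rough_score)
    return max(devices, key=rough_score)


EXAMPLE_DEVICES = [
    DeviceInfo("gpu", "GPU", compute_units=2, clock_mhz=300, lanes_per_cu=64),
    DeviceInfo("cpu", "CPU", compute_units=2, clock_mhz=1000, lanes_per_cu=4),  # placeholder values
]

if __name__ == "__main__":
    print("default choice:", pick_device(EXAMPLE_DEVICES).name)
    print("forced to CPU: ", pick_device(EXAMPLE_DEVICES, prefer="CPU").name)
```

A score like this says nothing about launch overhead or memory traffic, so a measured benchmark on the actual workload is a better basis for the final choice, with an override such as `prefer` left available to the user.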

Now, coming to the lower GPU performance issue. There may be many reasons for this behaviour, for example application type, hardware limitations, driver/compiler problems etc. As you aren't able to install other drivers (mainly pre-OpenCL 2.0 ones), it's not possible to compare the impact of the drivers. Also, note that your GPU is not so fast [2 CUs with a clock frequency of only 300 MHz]. So, I would suggest you do some benchmarking to check how your GPU is performing compared to expected numbers. For benchmarking, you may run SDK samples such as GlobalMemoryBandwidth, BufferBandwidth etc. or publicly available apps such as krrishnarraj/clpeak · GitHub [originally posted here: Drop in fglrx OpenCL performance: 14.12 vs 15.5 ]. You may check Radeon HD 8210E and similar sites for your reference.

Regards,

0 Likes

Hey dipak,

I did a clean install of the latest driver without installing the amd 3.0 sdk.  Hmmm...strange.  clinfo actually reports different stuff.  Is this expected?  How did you know?  Anyways, I think the ocl 2.0 stuff you said was missing is there now.  I have no idea why the clinfo is different this time without the sdk installed.  Please have a look and let me know what you think.  I will try to do some benchmarking with the gpu as you suggested and get back to you.

btw, is it even possible to run ocl applications with just the runtime/driver and no installed sdk?  Maybe the sdk is interfering with this driver which makes it perform the way it does.  I mean, clinfo is different without it after all.

Please see attachment for clinfo output.  Thank you.

0 Likes

Now your clinfo output looks fine. Sometimes, due to improper installation/un-installation of Catalyst or the APP SDK, the clinfo link gets broken and points to some other version, not the expected one. I faced this problem too.

As long as the runtime/driver is installed, you can run any OpenCL application. Don't need any SDK. APP SDK is mainly needed for development purpose.

Please check whether you see any performance difference or not. [Note: if you've the SDK sample binaries, you can run them directly. Don't need to install the SDK]

Once done, install the APP SDK [3.0 beta is preferable, though you don't need any 2.0 features] and then check clinfo. You should get the same output as now. [If you still face the problem, just try the reverse order, i.e. SDK then driver.]

Regards,

0 Likes

Hi.  Just wanted to update.

so I tried to run the samples with just the runtime/driver installed and no sdk.  No performance gain.  Then I tried to install the 3.0 sdk again and this time clinfo was correct, but still no performance gain.  I tried the samples with ocl 2.0 features such as SVMBinaryTreeSearch and it seemed to work.  However, I'm not sure how to interpret the results.  From doing this, I learned something interesting.  I have an ocl 2.0 device and didn't even know it!  AMD didn't specify that the gpu is ocl 2.0 compatible on their product spec!  Really?  Is that possible?  How could a chip maker not even know what their hardware is capable of?

Anyways, even if the gpu is ocl2.0 compatible, practically speaking, I haven't seen any real performance from it.  I have to say, I'm a little disappointed in AMD.  False advertising.

0 Likes

I don't know if the Gizmo 2 is expected to support OpenCL 2.0 or not. Having said that....

Here's a simple scenario where a chip would support OpenCL 2.0, but the product spec doesn't say that. Perhaps at the time the spec was prepared, OpenCL 2.0 was not yet supported by the software stack. So, while the hardware was capable, some key software element (driver, runtime, compiler, etc) did not. So clearly better to understate than overstate. It's also possible someone made a mistake!

However, more importantly, what do you think is false advertising? Is it simply that you aren't seeing a performance gain between OpenCL 1.2 and 2.0, but we (and others) have certainly been talking about how you can write software that is faster using OpenCL 2.0 features?

I will get your feedback into the Gizmo team.  Including, "perhaps you should take a second look at the product spec, and update it if necessary." The last thing we want is to have misleading information.

0 Likes

Hi jtrudeau!  Thanks for chiming in.  I really appreciate this.

I don't know if the Gizmo 2 is expected to support OpenCL 2.0 or not. Having said that....

It's not expected to support ocl2.0 and I don't expect it to.

However, more importantly, what do you think is false advertising? Is it simply that you aren't seeing a performance gain between OpenCL 1.2 and 2.0, but we (and others) have certainly been talking about how you can write software that is faster using OpenCL 2.0 features?

Sorry for the confusion.  Forget whatever I said about Gizmo 2 and OpenCL 2.0.    I don't care about ocl 2.0.  The problem is: I AM NOT SEEING ANY OpenCL AT ALL!  I bought the Gizmo specifically because it said it supported OpenCL 1.2.  But I can't seem to get it to work.

I tried a simple vectoraddition helloworld example (ie. add the contents of two arrays A & B and store the results in a third array C), but the OpenCL version actually runs SLOWER than the serialized cpu version!  I know dipak and bsp2020 mentioned that the gpu is not that powerful (ie. only 2 compute units @ 300 MHz), but c'mon, vectorAddition?  really? Something is not right here?  Practically speaking, I added two integer arrays of size 1024  elements and it took well over 1 second using ocl!  Over 1 second?  I knew something couldn't be right, so I wrote a quick serialized version in C and just ran it straight -> 0.12s on the cpu!  Something's not right here. 

Here's AMD's claim: http://www.amd.com/en-us/press-releases/Pages/amd-embedded-g-2014nov11.aspx

In it, it says something about achieving 85 GFLOPS (I have no idea if that is fast or not) of performance.  How can I reproduce those results on this product?  What do I need to do in order to confirm that benchmark?  How did AMD's engineers come up with that number?  Is that 85 GFLOPS with or without OpenCL?
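For context on figures like this, a theoretical peak is commonly estimated as compute units x SIMD lanes per unit x floating-point operations per lane per cycle x clock. The sketch below applies that formula to the "2 compute units @ 300 MHz" mentioned in this thread. The 64 lanes per compute unit and 2 operations per lane per cycle (one fused multiply-add) are assumptions, and nothing here claims to be how the 85 GFLOPS press-release number was derived.

```python
def peak_gflops(compute_units, lanes_per_cu, flops_per_lane_per_cycle, clock_mhz):
    """Theoretical peak in GFLOPS; real kernels normally land well below it."""
    return compute_units * lanes_per_cu * flops_per_lane_per_cycle * clock_mhz / 1000.0


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that a benchmark actually reached."""
    return measured_gflops / peak


if __name__ == "__main__":
    # 2 CUs and 300 MHz are from the thread; 64 lanes and 2 flops/cycle are assumed.
    gpu_peak = peak_gflops(compute_units=2, lanes_per_cu=64,
                           flops_per_lane_per_cycle=2, clock_mhz=300)
    print(f"assumed GPU peak: {gpu_peak} GFLOPS")  # 76.8 under these assumptions
```

A compute benchmark that reports achieved GFLOPS can then be compared against this kind of estimate with `efficiency`, which is a more meaningful check than the run time of a tiny sample.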

jtrudeau, I hope this is not too much to ask, but could you possibly talk to the Gizmo team and ask them for the specific driver and app sdk used in making the Gizmo2?  (if it was even used at all).  I'm really desperate here.  I think I should share my situation here.  I bought the Gizmo2 because according to the product specification it was capable of running OpenCV with OpenCL 1.2.  That's really my sole reason for buying it.  "...support for OpenCL1.2 enabling parallel processing...".  Now all I'm trying to do is test and verify that the OpenCL is really working, but God help me, I can't seem to see it in action!  I'm sorry if I said it was false advertising, but I'm sure you can understand how this is a bit of a letdown for me.  If a simple vectorAddition (openCl implementation) takes well over a second, how much worse will it be when I try to use the OpenCL with OpenCV?

In short:  Guy sees advertisement for computer that supports OpenCL1.2.  Guy buys computer.  Guy tries to test OpenCL feature.  Guy is disappointed when OpenCL doesn't appear to work.  Guy now asking AMD (who developed the product) for help.  Guy wonders if he just wasted his money.  Guy feels cheated.

Not to be a crying baby, but please jtrudeau see if you can get any help for me from the gizmo team.  I would be forever grateful.

P.S.  As I'm typing this, I just realized the Catalyst driver for the Gizmo2 was just updated.  Released 7/5/15 and according to the notes, contains some fixes for G-series SOC.  But it doesn't look like quite the same chip for Gizmo2.  I haven't tried it yet, but I'll see if it works for me.

0 Likes

bump

0 Likes

Should I venture to ask what the clock speed of the CPU is? If the clock speed of the CPU is greater than that of the GPU and it has more cores, then it stands to reason that the GPU does not seem to be up to par, especially if the CPU has more than 2 compute units. Granted, the GPU may have wider SIMD registers, but that is no substitute for brute force (the higher clock speed and higher CU count of the CPU). You keep mentioning that OpenCL is not working, but by your own previous statement your application ran, which means OpenCL is working. If the app crashed at startup due to missing OpenCL support, then I would agree that OpenCL does NOT work. Getting an unexpected result (speed, it seems, in this case) is not equivalent to OpenCL not working, as nowhere does the OpenCL specification guarantee that you will get a speed increase of, say, n. Give me the biggest, baddest FirePro card and I can bring it to its knees with a few unoptimized kernels that a multicore CPU may be able to execute faster, for example due to memory access patterns...

0 Likes

The clock speed of the cpu is indeed greater than the gpu.  Does this automatically imply that there are no operations/tasks that can be offloaded to the gpu which can perform them faster than the cpu? 

Suppose the OpenCL is working.  How do I make USE of it?  What is it for?  Why would Gizmosphere & AMD add support for OpenCL only to not see the feature work?  Why would they include this feature (and even advertise it as one of its core features) only for it to be USELESS?  From the perspective of a user/customer//developer, what does the OpenCL feature mean for me?  What can I DO with it?  "Support for DirectX® 11.1, OpenGL 4.2x and OpenCL™ 1.2, enabling parallel processing"  What exactly did AMD mean by "enabling parallel processing"?   What exactly is their definition of parallel processing? 

A number of posters have been telling me that the gpu is not that powerful.  Something like only 2 compute units at 300MHz which apparently, is not that powerful.  So why the hell would AMD add OpenCL support for it, knowing (or unknowingly) that nothing (practical at least) will be gained from it?  (a hundred points to anyone that can answer that question).

AMD claims it can achieve 85 GFLOPS performance via the gpu.  I have yet to see that.  I'm hoping I can get some help from the Gizmo team to tell me how to reproduce that benchmark.   Any word yet jtrudeau?

If I can't get ANY speed increase from using it.  What's the point?  The Gizmo is really just another raspberry pi.  Prove me wrong.  I still have hope for it, I just think the driver for it is screwed up.  The second it reported that the Gpu was OpenCL 2.0 capable when it should only be 1.2 capable, I knew something was up.

AMD Gizmo team prove me wrong.  (Rather, prove your product right).

0 Likes

Why do you insist on thinking that the GPU is more powerful than the CPU? A GPU is a processor just like the CPU. The architecture may be different, but they operate on the same basic premise. This is basic computing. The reason there was a clock speed war in the heyday (not so prevalent today) was plain and simple: a higher clock speed results in more operations executed in LESS time. So again it stands to reason that IF the CPU has more cores and a higher clock speed than the GPU, it WILL outperform the GPU. Before we can proceed, you need to get rid of this idea that a GPU is automatically faster than a CPU.

You are sorely misunderstanding the purpose of OpenCL, and for that I would refer you to the Khronos website for a little more information, as it is all there. AMD, Nvidia and other IHVs have documents related to OpenCL. Why would you try to use OpenCL without knowing what it is for? I would first try to get some information on OpenCL and how it applies to your problem set, and whether it's even a proper fit. That's called software design; one doesn't just get up one day and say "I'm going to use OpenCL" without knowing or having some idea what OpenCL is. As for parallel processing, it's hard to explain these computing terms to someone who (no disrespect) has a limited understanding of the basics and the issues at hand. One of the key features of OpenCL is that it allows heterogeneous multiprocessing, aka parallel processing, i.e. you can have multiple devices operating in tandem to solve a particular problem set, not JUST the GPU.

Speed, speed up for WHAT? I'm highly confident that when used in unison (GPU + CPU), you would achieve some form of measurable speed up; however, one cannot measure something that does NOT exist.

The Gizmo is NOT a raspberry pi, as the CPU architecture is different. Yes, they are both SBCs, but other than that there is no comparison between the two.


All this ranting about being useless is counterproductive, as you have not yet established anything substantial to support that claim. I don't have to prove anything. If the driver is messed up, which again I highly doubt, then that needs to get addressed; other than that, this is a baseless post. Say OpenCL 2.0 is NOT available and 1.2 is: what features of OpenCL 1.2 are you going to take advantage of? (Again, without knowing that, all this is useless ranting.) As has been mentioned above, OpenCL 1.2 can do most of the stuff that 2.0 does, and will run fine (I know this because I use OpenCL), but AGAIN you have not explained what problem you are trying to use OpenCL to solve.

I suggest you trade in your Gizmo for Jetson TK1 and then either rant to Nvidia that CUDA does not work or that you don't have OpenCL driver.

0 Likes

Yes, absolutely agree. But at the same time, for people new to the technology there SHOULD be a clear demo of the stated advantages, and it SHOULD be as bullet proof as possible. So my mission is to track one down. In this case it MAY be (not sure) that the APP SDK examples are built for higher-end dGPUs, and may not demonstrate obvious performance gains. Don't know.

If I can't come up with one from the Gizmo team, then I'll encourage them to create one. I'm hunting around and not finding one obviously available.

In the meantime, the fundamental concept here, as you state is: if the software is properly written, and the calculation can benefit from parallelization, you have additional compute units that can work relatively independently and simultaneously with the CPU. There is overhead from moving data back and forth (OpenCL 1.2), which will bite into performance gain, but done right and with the right algorithms, you get more bang. The absence of a good demo does not make this untrue.
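The overhead argument can be put into a back-of-the-envelope cost model: the offloaded run pays a fixed cost (setup, kernel build, launch) plus a per-element transfer cost, and only wins once the per-element compute saving outweighs both. Every number below is a made-up assumption chosen to show the shape of the trade-off, not a measurement of the Gizmo 2, and the function names are illustrative.

```python
def gpu_seconds(n, fixed_overhead_s, bytes_per_elem, bus_bytes_per_s, gpu_elems_per_s):
    """Modelled offload time: fixed overhead + data transfer + compute."""
    transfer = n * bytes_per_elem / bus_bytes_per_s
    compute = n / gpu_elems_per_s
    return fixed_overhead_s + transfer + compute


def cpu_seconds(n, cpu_elems_per_s):
    """Modelled time for a plain serial loop on the CPU."""
    return n / cpu_elems_per_s


def break_even_n(fixed_overhead_s, bytes_per_elem, bus_bytes_per_s,
                 gpu_elems_per_s, cpu_elems_per_s):
    """Problem size above which offloading wins, or None if it never does."""
    per_elem_gpu = bytes_per_elem / bus_bytes_per_s + 1.0 / gpu_elems_per_s
    per_elem_cpu = 1.0 / cpu_elems_per_s
    if per_elem_gpu >= per_elem_cpu:
        return None  # transfer + compute per element already costs more
    return fixed_overhead_s / (per_elem_cpu - per_elem_gpu)


# Assumed parameters: 0.5 s fixed overhead, 12 bytes moved per element
# (two 4-byte inputs and one 4-byte output), 1 GB/s transfer rate.
COMMON = dict(fixed_overhead_s=0.5, bytes_per_elem=12, bus_bytes_per_s=1e9)

if __name__ == "__main__":
    # Cheap per-element work (vector addition style): offload never pays off here.
    print(break_even_n(gpu_elems_per_s=1e9, cpu_elems_per_s=1e8, **COMMON))
    # Expensive per-element work: offload pays off above some problem size.
    print(break_even_n(gpu_elems_per_s=2e7, cpu_elems_per_s=1e6, **COMMON))
```

Under these assumptions a 1024-element vector addition is dominated by the fixed cost, which is consistent with a tiny sample running slower on the GPU without anything being broken.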

suboba I assume you've visited Resources | GizmoSphere . Just in case.. although I don't see formal example demos, I'm skimming quickly. I did notice there is a gizmo-focused forum there... Forums | GizmoSphere | GizmoSphere. Not telling you to go away, just making sure you are aware of potential resources. I will hunt, shall not fall off my plate again.

0 Likes

I don't think you read the whole thread, so let me reiterate my problem statement.  From an earlier post (that you obviously didn't read):

jtrudeau, I hope this is not too much to ask, but could you possibly talk to the Gizmo team and ask them for the specific driver and app sdk used in making the Gizmo2?  (if it was even used at all).  I'm really desperate here.  I think I should share my situation here.  I bought the Gizmo2 because according to the product specification it was capable of running OpenCV with OpenCL 1.2.  That's really my sole reason for buying it.  "...support for OpenCL1.2 enabling parallel processing...".  Now all I'm trying to do is test and verify that the OpenCL is really working, but God help me, I can't seem to see it in action!  I'm sorry if I said it was false advertising, but I'm sure you can understand how this is a bit of a letdown for me.  If a simple vectorAddition (openCl implementation) takes well over a second, how much worse will it be when I try to use the OpenCL with OpenCV?

To answer your question, what problem am I trying to solve with OpenCL (that I obviously did not state earlier)?:  I need OpenCL to accelerate OpenCV on the Gizmo2.

I do admit, I don't know much about OpenCL.  But according to the product specification for this product developed by Gizmosphere and AMD, I should be able to use the Gizmo 2 in conjunction with OpenCV and OpenCL.  I should see an increase in performance ( in terms of speed) of OpenCV due to the presence of OpenCL.

At this point,  I am just trying to unit test the OpenCL capability of the board.  If I can't see a simple speedup of a helloworld example, how much worse will it be when I try to hook up OpenCV and OpenCL together?  I have to admit that I haven't actually tried to run OpenCV with the OpenCL installed, but I figure I'd unit test the OpenCL first.  Sure, it could magically work ie. OpenCV performs faster w/ the OpenCL compared to w/o the OpenCL, but I want the reasoning to be sound.  Eg. get the OpenCL to work first as a standalone to confirm that it's working, THEN hook it up with OpenCV to see if there will be a speedup in OpenCV.

I shouldn't need a deep understanding of OpenCL or parallel programming in order to get my OpenCV application to run faster using this Gizmo2.  That's what they (Gizmosphere & AMD) CLAIM.  Why do I have to waste my time proving that I can get a speedup using OpenCL  on this board, when they are already claiming it?  I shouldn't have to.  I should be able to run it out-of-the-box.

I don't care HOW the OpenCL works for this board.  If Gizmosphere/AMD claim that there is a performance speedup from using it (OpenCL), then that's all I need to know.  The problem now is that I (and AMD so far) cannot confirm/prove their claim on their own product.  It could be the hardware, the driver, or maybe I'm just using the wrong OS, etc.

The bottom line: Gizmosphere & AMD have made a claim with this product, and I want proof of that claim.  Eg.  OpenCV runs faster w/ the OpenCL installed or at least a reproducible benchmark that confirms a claimed 85 GFLOPS performance.

0 Likes

My apologies, this fell off my plate completely. When bumping, it's best to @mention me, like jtrudeau​, and that is more likely to get my attention. Nothing is perfect, but that's better. Anyway...

I never heard back from the Gizmo folks. I'm going to go raise the temperature on that. I am not directly familiar with this board, but if the description here is correct, it is at the low end of the performance spectrum. However, in principle, the presence of two compute units gives you a resource that in theory a developer can use to get performance gains, with properly designed software. I am 99% sure that the examples in the APP SDK were not designed specifically for this hardware, and probably not tested on it. Nonetheless, let me go ring someone's bell over here, and see if we can come up with a demonstration of performance improvement using OpenCL on this hardware. That is a perfectly reasonable request.

0 Likes

Thank you so much jtrudeau.  I don't mean to use you as the middle man, but you're my only connection to the Gizmo developers. 

Yes, I have been to the resources section of the Gizmosphere website, but I don't think there is anything useful over there.  What really sucks is that they have an article on the wonderful capabilities you can experience when you combine OpenCV and OpenCL on the Gizmo.  The problem is, it looks like that article is referring to the old Gizmo (gizmo 1) not the latest Gizmo 2.  I'm sure it's an outdated article.

The community forum over there seems a little dead.  I'll try to join and see if I can get support, but I doubt it.

Between you and me, I think there was a huge marketing mess up and the Gizmo 2 has no performance gain whatsoever due to its precious OpenCL capability.  However, please keep up the good work and keep pressing the Gizmo team over there.  It could be as simple as using a specific OS or something.  Yes, a simple demo for this product proving its capability, would really put me at ease.  Eg.  Here's a program w/o OpenCL enabled and the same program w/ OpenCL enabled both running on the Gizmo 2.

If nothing else, ask them how to reproduce the 85 GFLOPs performance benchmark (Show them this video): AMD Embedded: Gizmosphere's Gizmo 2 SBC at #CES2015 - YouTube  (I hope that guy is not just some marketing shmuck that doesn't know wtf he's talking about).

I wonder why they never got back to you?  I hope they're not hiding because they can't prove their own product.

0 Likes

Hi Praever,

Sorry for the delayed response.

The Gizmo team ran a bunch of tests on the Gizmo 2 -- specifically, they ran programs on the OpenCL CPU device (i.e. running only on the CPU, using the --device cpu option) and the same programs on the OpenCL GPU device (i.e. running only on the GPU).

The OpenCL CPU device used was:  AMD GX-210HA SOC with Radeon(tm) HD Graphics

The OpenCL GPU device used was: Kalindi Device

The programs tested are all part of the AMD APP SDK (the version tested was APP SDK 3.0.0 Beta). The driver used was: Driver 15.101.1007-150611a-185789C-AES.

The following table shows the ratio of the time taken on the CPU only versus on the OpenCL-enabled GPU only for these programs:

Sample              Avg. kernel execution time on OCL CPU / avg. kernel execution time on OCL GPU
BlackScholes        19.326
KmeansClustering    10.46279
MandelBrot          7.75105
Nbody               40.27012
UnsharpMask         36.04005

As you can see from the above table, the above programs take significantly longer to execute on the OpenCL CPU device than they do on the OpenCL GPU device.
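One way to summarize a set of speedup ratios like these is the geometric mean, which is the usual choice for averaging ratios. The sketch below uses the five values from the table; the helper name is made up for this example. Note that the table's own heading says these are kernel execution times, so they do not include any build or data transfer cost.

```python
import math

# CPU-time / GPU-time ratios as reported in the table above.
RATIOS = {
    "BlackScholes": 19.326,
    "KmeansClustering": 10.46279,
    "MandelBrot": 7.75105,
    "Nbody": 40.27012,
    "UnsharpMask": 36.04005,
}


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    print(f"geometric mean speedup: {geometric_mean(RATIOS.values()):.2f}x")
```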

We encourage you to run the above samples and report back if you see similar results.

I don't have the best (or even close to the best) understanding of OpenCL, but I can say this: performance gains on OpenCL GPUs depend a lot on how the program has been written to take advantage of the OpenCL features. Like any other programming tool, OpenCL is but another tool: a capable developer can tune OpenCL code to provide strong performance gains on OpenCL GPUs. There's no reason why you too, after some hands-on with OpenCL code, shouldn't be able to do so.

--Prasad

0 Likes