
dragonxi4amd
Journeyman III

OpenCL project without any OOP language

Using C-language with C99 instead of C++ and C99 for OpenCL

Hi All,

(1) Is it possible to develop C99 OpenCL apps using just the C language and C99, instead of C++ and C99?

(2) If yes, which C language and for which OS?

(3) Has anyone tried the C & C99 combination?

For those who wonder why not use C++:

A) C++ is not needed in this project

B) The C language offers all the features needed with C99

C) Although the OpenCL API documentation refers to objects, please note that the API is about FUNCTIONS - there are no classes to create objects as in OOP!

D) C++ introduces extra delay compared to the C language

* this is our client's requirement based on their own experience, not to be argued, and it's true that OOP has its price to be paid, i.e. lost speed

E) response times are already a challenge with OpenCL

F) OpenCL spec v1.2 has some new features which (if rightly implemented) will improve speed

- however, one may have to wait, as a v1.2 implementation may not be available before summer 2012!?

Thanks in advance!

Ronnie

ps. this topic is relevant and not to be "merged" (nor deleted, please!)

0 Likes
20 Replies
himanshu_gautam
Grandmaster

Ronnie,

Well, at a basic level OpenCL programming needs an OpenCL kernel (which has to be written in the OpenCL C language, which is nothing more than C99 with some extensions and restrictions) and a host program (which is entirely language independent; wrappers for almost all popular languages are now available, and of course C is definitely the fastest of the choices available).
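The kernel/host split described above can be sketched in plain C. This is only an illustration, not code from any SDK: the kernel name `vec_add` and the helper `vec_add_reference` are made up for the example. The OpenCL C source is kept as an ordinary string in the host file, and a CPU reference loop computes the same result so device output could be checked against it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Device side: OpenCL C source, stored as an ordinary C string.
 * A real host program would pass this to clCreateProgramWithSource(). */
static const char *vec_add_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = a[i] + b[i];\n"
    "}\n";

/* Host side: plain C reference of the same computation,
 * useful for validating what the device returns. */
static void vec_add_reference(const float *a, const float *b,
                              float *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Nothing here needs a C++ compiler; the only second language involved is the OpenCL C inside the string.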

It would help if you specified which particular features you are looking for from the 1.2 spec. They might be moved to a higher priority 🙂

0 Likes

Hi Himanshu,

That would be a VERY nice surprise!

The following are the features our clients are looking for from the 1.2 spec:

1) clCreateProgramWithBuiltInKernels

2) clCompileProgram

3) clLinkProgram

4) clCreateSubDevices

Thanks in advance !
Ronnie

0 Likes
nou
Exemplar

- The OpenCL API is in C, and there is only a thin C++ wrapper on top of it.

- You can write in an OOP style in pure C or even in ASM. C++ is just syntactic sugar for OOP.
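To make the second point concrete, here is a minimal sketch of OOP-style dispatch in pure C. The `Shape` type and its functions are hypothetical and have nothing to do with OpenCL; they only show that a struct carrying a function pointer gives the same kind of polymorphism a C++ virtual method would.

```c
#include <assert.h>

/* A "base class": data plus a pointer to the behaviour that varies. */
typedef struct Shape {
    double (*area)(const struct Shape *self);
    double w, h;
} Shape;

static double rect_area(const Shape *s)     { return s->w * s->h; }
static double triangle_area(const Shape *s) { return 0.5 * s->w * s->h; }

/* "Constructors" pick the implementation. */
static Shape make_rect(double w, double h)
{
    Shape s = { rect_area, w, h };
    return s;
}

static Shape make_triangle(double w, double h)
{
    Shape s = { triangle_area, w, h };
    return s;
}

/* Callers dispatch through the pointer without knowing the concrete type. */
static double shape_area(const Shape *s)
{
    assert(s && s->area);
    return s->area(s);
}
```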

0 Likes

IMHO the C++ overhead we're talking about is minimal, at the edge of being measurable at all. Efficient OpenCL applications do not call API functions at every turn, and even if they do, the speed of these functions will depend 99.99% on the API call itself (which is a precompiled library that the programmer has no control over), and not on the language it was called from, or even on a wrapper around it. Even if a wrapper does error checking on every call and does some profiling, it should not hurt performance. (Executing 2-3 if()s and the like is not painful for a CPU, and API calls shouldn't happen more than 100-200 times per second at the very most.)
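As a rough illustration of how little such a wrapper does, here is a sketch of an error-checking layer in C. The names `CHECKED`, `checked_call` and `fake_api_call` are invented for this example; `fake_api_call` merely stands in for a real API entry point that returns 0 on success, as `CL_SUCCESS` does.

```c
#include <assert.h>
#include <stdio.h>

static int checked_calls = 0;   /* calls that went through the wrapper */
static int failed_calls  = 0;   /* calls that returned a non-zero status */

/* Evaluates `call` exactly once, then hands the status to checked_call(). */
#define CHECKED(call) checked_call((call), #call, __FILE__, __LINE__)

static int checked_call(int status, const char *expr,
                        const char *file, int line)
{
    assert(expr && file);
    ++checked_calls;
    if (status != 0) {          /* the one extra branch per API call */
        ++failed_calls;
        fprintf(stderr, "%s:%d: %s failed with status %d\n",
                file, line, expr, status);
    }
    return status;
}

/* Stand-in for an API function; returns 0 on success, -5 on failure. */
static int fake_api_call(int should_fail)
{
    return should_fail ? -5 : 0;
}
```

The per-call cost is one increment and one comparison on the success path, which is negligible next to the work done inside a real driver call.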

0 Likes
thesmileman
Journeyman III


Your entire post seems to indicate you haven't read the spec or you don't have a clear understanding of basic code. You mentioned the C++ API doesn't have classes and seem to indicate it is just a static class wrapper which simply calls functions, which isn't the case. It even has container support, so I really don't understand what you are talking about.

Questions 1 and 3 are demonstrated in the spec and in almost any tutorial on the web. For you to actually know statement "E" you would have to have used OpenCL, which, if you can't answer 1 or 3, you clearly haven't done (or if you have, you are completely missing the obvious). For statement C you obviously haven't looked at the wrapper (or ANY tutorial dealing with the wrapper).

The only reason for your post seems to be to have us do research to answer your questions for you, when literally 1 hr of research could have provided you with the answers. Now I am wasting my time, so I am going to stop.

0 Likes
LeeHowes
Staff

I don't really understand, and I've been working on the C++ bindings recently (there is a new version downloadable from Khronos if anyone's interested; a minimal hello world program is now down to about 10 lines). There is little to no performance degradation from C++, and the way the bindings are designed they should almost entirely optimise away. There are many cases, largely thanks to templates and functors instead of function pointers, where C++ is just faster than C.

Having said that, obviously there are still developers who do not want to use C++ for various reasons. The OpenCL API is not a C++ API - that is a conscious design decision. You can hence use OpenCL perfectly well without a C++ compiler.

Given that, I just don't see what the problem is... are you just asking if anyone has tested this fact? I'm sure the embedded people have. I haven't used a compiler in strict C mode but there's no obvious reason why it shouldn't work. I wouldn't do it. The C++ bindings are better in every way unless there is a good reason to stick to pure C.

Lee

0 Likes

I use pure C under GNU/Linux. Everything seems to work fine, but the amount of code is enormous. For instance, I have 800 lines of CL initialization code in my pattern recognition program. I used to program in CUDA, and its C implementation is much more compact.

0 Likes

Thanks to all of those who have given constructive feedback !

Our developers agree that the OpenCL C programming language (C99) is/will be suitable for certain applications and platforms -
we use it at the moment for certain projects!

However, rapidly deployable autonomous and unmanned systems and devices for network-centric operations
in remote and demanding theaters, for example, require special software and hardware not available at the moment!

Also, development for real-time embedded mobile systems requires powerful compilers!
Sadly, current OpenCL C compiler implementations do not produce efficient enough code.

Some of our clients are applying the principle "less is more" and develop with their partners SDKs
where programmers need to master ONE language (with C99 you need another language such as C++/C)
- and SDKs which are designed right from the beginning with performance and reliability having high priority
(they do not make the mistake thinking that one can optimize badly designed and implemented SDKs once it first works).

As was replied to me, embedded developers might be the ones who choose C instead of C++ as the other language.
That's true!

With respect to all developers !
Ronnie


 

0 Likes

Originally posted by: dragonxi4amd

Also, development for real-time embedded mobile systems requires powerful compilers! Sadly, current OpenCL C compiler implementations do not produce efficient enough code.

 

What do you have as a reference point for this statement? We are getting 20-100x gains over FPGA and CPU based solutions at the same cost. And we sure spent FAR less time on the GPU implementation than on the FPGA implementation.

If you aren't getting efficient code, then clearly your developers aren't writing efficient code and don't understand the hardware they are working on. To do embedded development you have to know the hardware and alter your algorithms to meet your needs. The compiler doesn't do magic. Developing for the GPU is similar to developing for FPGAs and embedded systems in that the code has to be tuned for the hardware it is running on. You can't expect a compiler to do all of that for you. If your developers expect that, then you need new developers, and if you have code that runs fast enough without optimization, I guarantee you are not using the hardware's full potential.

Originally posted by: dragonxi4amd

Some of our clients are applying the principle "less is more" and develop with their partners SDKs where programmers need to master ONE language (with C99 you need another language such as C++/C) - and SDKs which are designed right from the beginning with performance and reliability having high priority (they do not make the mistake thinking that one can optimize badly designed and implemented SDKs once it first works).

Wow! How about you insult everyone while you are at it. The language developers at Khronos, AMD, Intel, Apple, Nvidia and many others are extremely intelligent and had the hardware in mind when they developed the OpenCL SDK. They also continue to improve its design with each update.

What is badly designed? What is causing you such a performance bottleneck? What is this horrible problem you have found? How could your "developers" evaluate and reach a conclusion on a new API in two days? They sure haven't been here at the forums asking for help like they would on any other FPGA compiler site.

"and SDKs which are designed right from the beginning with performance and reliability having high priority "

WHAT? I keep coming back to that, as this is exactly what OpenCL was designed for from the beginning! While they wanted portability, they didn't make it an absolute necessity. If they had, OpenCL would be very, very slow, which it isn't. It isn't programming around the hardware; it is actually thinking about what you are doing. No one writes an algorithm meant to take full advantage of hardware, presses compile and then walks away. NO ONE. If you think that, you must feel software development is just language translation from English to C with no possible variations. If that were the case, we could get a machine to do it and a monkey to press compile. It isn't, and high performance computing will always need considerable code and algorithm changes to optimize for the hardware. The government understands this, the scientific community understands this, and any developer who works with embedded systems understands this. That isn't to say that the compilers can't be better. Of course they can, but they are currently being used to produce some of the largest simulations in the world, and with higher efficiency than most designs of that size.

I don't normally get upset by comments, but you are clearly not knowledgeable about OpenCL, typical embedded development, or high performance computing, and you appear to be here for the purpose of insulting the people working hard on OpenCL and doing a very good job. OpenCL and CUDA aren't perfect, nor are the implementations, but they are leaps and bounds above anything which has been tried before.

0 Likes

Originally posted by: thesmileman

 

 

What do you have as a reference point for this statement? We are getting 20-100x gains over FPGA and CPU based solutions at the same cost. And we sure spent FAR less time on the GPU implementation than on the FPGA implementation.

Oh rly? 🙂

1. Recently I designed Verilog code (for a Spartan 6 with 45k logic cells) that is only 2 times slower than a reasonably powerful GPU (NVIDIA GTX460). Remark: currently it is only a Verilog model, which in the optimistic case will run at 250 MHz.

2. Their power consumption is not comparable. And the cost of an FPGA in mass production will be lower.

Of course I agree that optimisation of CL code is vital. It is normal to achieve a considerable gain (3 - 4 times) just by rewriting the code properly.

Originally posted by: thesmileman

 

I don't normally get upset by comments, but you are clearly not knowledgeable about OpenCL, typical embedded development, or high performance computing, and you appear to be here for the purpose of insulting the people working hard on OpenCL and doing a very good job. OpenCL and CUDA aren't perfect, nor are the implementations, but they are leaps and bounds above anything which has been tried before.

Don't take it so much to heart.

0 Likes

Originally posted by: player999

Oh rly? 🙂

1. Recently I designed Verilog code (for a Spartan 6 with 45k logic cells) that is only 2 times slower than a reasonably powerful GPU (NVIDIA GTX460). Remark: currently it is only a Verilog model, which in the optimistic case will run at 250 MHz.

2. Their power consumption is not comparable. And the cost of an FPGA in mass production will be lower.

Certainly for specific cases (a lot of specific cases) and for fixed designs this would be true as well. Our designs are seeing savings because we upgrade them (relatively) often in the field. With GPUs we can just upgrade the 20-400 GPU modules (per system) and do a small amount of coding. Every time we change the FPGAs it takes forever, and integration takes forever. We still have FPGAs to direct data; they can't write to our GPUs fast enough and are now the bottleneck. Since we deploy in relatively small numbers and have very specific needs for FPGAs, our costs are ridiculous compared to many designs.

We use the MXM mobile GPUs, and not at full speed. This drastically reduces the power consumption. In addition, after optimizing every last drop we could reduce power consumption by HUGE amounts because we didn't need the hardware to run at even close to full speed. Also, the GTX460 is a very power-hungry card.

 

Originally posted by: player999

Don't take it so much to heart.

 

 

Good advice!

0 Likes

Originally posted by: player999 I use pure C under GNU/Linux. Everything seems to work fine, but the amount of code is enormous. For instance, I have 800 lines of CL initialization code in my pattern recognition program. I used to program in CUDA, and its C implementation is much more compact.

 

Of course. CUDA's host code isn't C. It's translated into C by the CUDA compiler, which isn't so different from using the C++ bindings for OpenCL (which are just as compact in the new version, but lack single source, which is a benefit if you like avoiding having another compiler in your tool chain).

 

dragonxi4amd:

I think I understand what you're saying in your last post, but I still don't understand what you're asking for. Obviously the tool chain has limits in its optimisation ability, though it's getting pretty good these days it's clearly not going to be comparable with ICC on x86 - but there isn't another compiler generating AMD IL so there's nothing to compare against. Your C99 comments are still going over my head... are you asking for single source? We could do single source in theory, but then you wouldn't be using C as I just pointed out to player999. The C++ bindings come close to that without going outside a language standard, but that would mean people using C++.

0 Likes

Hi Guys,

My comments:
[C1/4] I have had no intention to insult anyone in this forum!

[C2/4] Being developer myself I respect other developers!

[C3/4] Our hardware and software guys develop together new products!

[C4/4] The portfolio of our software and hardware is continuously evolving!
Modelling: UML since 1990, Languages: C-language since 1981, ASM since 1978 (still being used)
C++ since 1988, Java since 2000, Python since 2009, C99 since 2011 ...
OS: embedded real-time since 1979, not real-time: CP/M, MS-DOS ... Windows, Linux since introduced ...  
Hardware: Intel, AMD, ARM, Sun, Honeywell, TI ... both client/server NET: distributed ones since 1982

[C5] When the C language was developed, no c_ and C_ prefixes were introduced with the language,
nor cplus_ & C_PLUS with C++, nor java_ & JAVA_ prefixes with Java -
whereas the C99 language came with cl_ and CL_ prefixes!

My questions:
[Q1/3] Question to AMD: 
Does AMD already use, or have plans to use, Clang and LLVM technologies?

Ref#1: PLAYER999 in this forum
"But the amount of code is enormous.
For instance, I have 800 lines of CL initialization code in my pattern recognition program.
I used to program in CUDA, and its C implementation is much more compact"

Ref#2: NVIDIA
"NVIDIA OpenCL runtime compiler (Clang + LLVM)"
>> http://llvm.org/Users.html

Ref#3:
"Clang is an "LLVM native" C/C++/Objective-C compiler, which aims to deliver amazingly fast compiles, 
extremely useful error and warning messages and to provide a platform for building great source level tools"
>> http://clang.llvm.org/

Ref#4:
"The LLVM Core libraries provide a modern source- and target-independent optimizer,
along with code generation support for many popular CPUs (as well as some less common ones!)"
>> http://llvm.org/

[Q2/3] Question about the OpenCL API - could OpenCL functions return more detailed info about errors?
A) which resources did the function need?
B) which resources did the function have available for the request?
C) how did the function arrive at that conclusion?

Example 1: clEnqueueNDRangeKernel
A) CL_OUT_OF_RESOURCES if there is a failure to queue the execution instance of kernel on the command-queue
B) CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device 
* which one was it - failure to queue the execution instance of kernel or failure to allocate resources !?
* which kind of failure !?
* how much did implementation require !?
* how did implementation calculate it !?
 
Example 2: clCreateProgramWithBinary
A) CL_OUT_OF_RESOURCES    if there is a failure to allocate resources required by the OpenCL implementation on the device
* which kind of failure !?
* how much did implementation require !?
* how did implementation calculate it !?

B) CL_OUT_OF_HOST_MEMORY  if there is a failure to allocate resources required by the OpenCL implementation on the host
* which kind of failure !?
* how much did implementation require !?
* how did implementation calculate it !?
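For context on what the host program actually receives today: each call hands back a single integer status. Below is a minimal sketch of the usual first step, mapping that integer to a readable name for logs. The `SKETCH_` constants repeat the values defined in the Khronos `cl.h` header only so that the example builds without an OpenCL SDK; the function name `cl_status_name` is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Values as defined in the Khronos cl.h header. Real code should include
 * <CL/cl.h> and use CL_SUCCESS, CL_OUT_OF_RESOURCES, etc. directly. */
enum {
    SKETCH_CL_SUCCESS            =  0,
    SKETCH_CL_OUT_OF_RESOURCES   = -5,
    SKETCH_CL_OUT_OF_HOST_MEMORY = -6
};

/* Translate a status code into its symbolic name for log output. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case SKETCH_CL_SUCCESS:            return "CL_SUCCESS";
    case SKETCH_CL_OUT_OF_RESOURCES:   return "CL_OUT_OF_RESOURCES";
    case SKETCH_CL_OUT_OF_HOST_MEMORY: return "CL_OUT_OF_HOST_MEMORY";
    default:                           return "unrecognised status";
    }
}
```

The name alone cannot tell the two `CL_OUT_OF_RESOURCES` causes apart, which is exactly the gap the questions above point at.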

[Q3/3] Question to AMD about the OpenCL implementation - do functions retry on failure?

 

 

0 Likes

[Q1/3] Question to AMD:  

Does AMD already use, or have plans to use, Clang and LLVM technologies?



 

The compiler is LLVM based. It is not Clang based. I couldn't comment on future plans. Whether we use Clang and LLVM or not is entirely irrelevant to the question of whether we start using a set of non-standard language extensions to simplify the API. I much prefer the C++ bindings as the solution to that because they are standard. NVIDIA's approach has value as well, but it is simply wrong to say their API is C. Their API is CUDARUNTIMEAPI, which looks like C with extensions and is translated to C by their compiler (therein lies the advantage of it to someone who doesn't want to use C++).

NVIDIA has only relatively recently moved to LLVM as a CUDA compiler. Before that they used Open64 and still had the same runtime API approach. There are many OpenCL users who do not want that runtime API approach (which is why NVIDIA also supports the driver API). While it is useful to someone who is just learning CUDA, to an experienced developer it adds little, because by that time most people have already wrapped the API calls in their own helper library.

 

[Q2/3] Question about OpenCL API - could OpenCL functions return more detailed info about errors ?  


Probably. Error reporting is always a difficult problem and to do this properly the standard would need adequate info reporting API calls to assist. Off the top of my head I don't know if it does, you could check the 1.2 spec.

0 Likes

Thanks Lee for your constructive feedback!

 

Please consider passing on to the AMD guys in the OpenCL forum a request from my developers for more information about error situations; even in the latest spec, v1.2, the programmer gets only an int as the return value from functions.

We also suggest a pipe mechanism for control and command:

+ keeps the existing int return design as it is

+ does not affect queue communication

+ pipes / named pipes are familiar to programmers - benefits well known!

The program which detects (or believes it has hit) a problem/exception should be the one to report in more detail what the problem was and how it reached its conclusion / calculation / decision.

Information received from the program could be used not only to retry with the right parameters but also to tune/optimize both CPU and GPU programs.

Please note that by giving feedback I do NOT / will NOT mean that AMD developers have not done a good job - they have - instead I am just trying to offer some ideas. Handling errors / exceptions is demanding - a filtered/meaningful trace helps to detect bugs and to tune performance.

~ Ronnie
0 Likes

I have one question. When you ask

A) CL_OUT_OF_RESOURCES    if there is a failure to allocate resources required by the OpenCL implementation on the device
* which kind of failure !?
* how much did implementation require !?
* how did implementation calculate it !?

What will you do with this information in your program? It can be helpful for debugging and tweaking a program, but it is hardly usable in a production environment. Also, you must consider that this type of info will be highly implementation dependent, so you can't easily design error reporting for all.

 

0 Likes

Hi nou,

In a production environment, programs always try to handle errors and exceptions and, if possible, adjust themselves to limited resources and try again.

We don't trust "debug it - tune it - move to production" phasing.

Instead we embed tracing/recovery/retries into production versions to be able to offer remote diagnostics, quick fixes and a guarantee.
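The "adjust to limited resources and try again" idea can be sketched in C as follows. Everything here is hypothetical: `submit_with_backoff`, the `submit_fn` callback type and the stand-in `fake_submit` are invented for the example, and halving the work size is only one possible policy. The status value -5 matches `CL_OUT_OF_RESOURCES` in `cl.h`.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_OK                 0
#define STATUS_OUT_OF_RESOURCES (-5)   /* same value as CL_OUT_OF_RESOURCES */

/* Hypothetical callback: tries to run a job with the given work size. */
typedef int (*submit_fn)(size_t work_size, void *user);

/* Halve the work size after every resource failure until the job is
 * accepted or the size would drop below min_size.
 * Returns the accepted size, or 0 if nothing was accepted. */
static size_t submit_with_backoff(submit_fn submit, void *user,
                                  size_t start_size, size_t min_size)
{
    assert(submit && min_size > 0);
    for (size_t size = start_size; size >= min_size; size /= 2) {
        int status = submit(size, user);
        if (status == STATUS_OK)
            return size;
        if (status != STATUS_OUT_OF_RESOURCES)
            return 0;                  /* other errors are not retried */
    }
    return 0;
}

/* Stand-in device that accepts at most *(size_t *)user work items. */
static int fake_submit(size_t work_size, void *user)
{
    size_t limit = *(size_t *)user;
    return work_size <= limit ? STATUS_OK : STATUS_OUT_OF_RESOURCES;
}
```

With only an integer status to go on, the retry loop has to probe for a size that fits; richer error detail would let it pick the right size in one step.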

Also, due to the complexity of multi-core/multi-process/multi-threaded systems, we use OpenCL tools to produce visual 3D tracing and diagnostics about systems.

best regards

Ronnie

  

0 Likes

Hi nou,

In addition to the standard (iterative) specify --> design --> implement --> module test --> system test --> deploy, we use the phases

system test --> install pilots at selected client sites --> move to production (once pilots have been approved) --> move to mass production (with remote diagnostics embedded)

Environments at client sites may differ, and systems have to adapt dynamically. Therefore just debugging in our own lab is not enough in our case, whereas for others it might be OK.

Best regards

Ronnie

 

0 Likes

Well, but there is still the main problem of implementation-dependent error reporting. IMHO you are asking for something like the OpenGL debug context, where you get more verbose error reporting. But again, this is useful in a debug environment, as errors are reported in text form and are implementation dependent. My main point is that designing verbose error reporting is close to impossible, or it will quickly become cumbersome.
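For what it is worth, OpenCL already has one text-based channel of this kind: `clCreateContext` accepts a `pfn_notify` callback that the implementation may call with an error string. The sketch below shows a handler with that parameter list; the `ErrorLog` type is hypothetical, the `CL_CALLBACK` calling-convention macro is left out so the example builds without OpenCL headers, and the content of `errinfo` is implementation dependent, as the post above says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical log that keeps the most recent message for diagnostics. */
typedef struct {
    char   last[256];
    size_t count;
} ErrorLog;

/* Same parameter list as the pfn_notify callback of clCreateContext. */
static void on_context_error(const char *errinfo, const void *private_info,
                             size_t cb, void *user_data)
{
    ErrorLog *log = user_data;
    assert(log);
    (void)private_info;   /* implementation-defined binary data, unused here */
    (void)cb;             /* size of private_info in bytes */
    snprintf(log->last, sizeof log->last, "%s",
             errinfo ? errinfo : "(no text)");
    ++log->count;
}
```

In a real program the function and a pointer to the log would be passed as the `pfn_notify` and `user_data` arguments when the context is created.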

0 Likes
notzed
Challenger

Originally posted by: dragonxi4amd Hi All,

 

(1) Is it possible to develop C99 OpenCL apps using just the C language and C99, instead of C++ and C99?

The OpenCL specification defines a C API.  So everything already does this, and any other language interface merely goes through the C API.

I think that fully answers your whole post, but since I'm on holiday I'm going to bite further ...

 

(2) If yes, which C-language and for which OS ?

All of the ones that support OpenCL must support the C API.  Any C compiler which executes on those platforms and supports the platform standard link library conventions would work by definition.

 

(3) Has anyone tried C & C99 combination ?

I did some small test code, but it's not the OpenCL API that's a pain to use, it's all the other crap one has to deal with to display and use one's results. I personally use Java for that.

 

For those who wonder why not use C++:

 

A)  C++ is not needed in this project

Well, I hate C++ too; you don't need to justify any such decision or start a religious language war over it (particularly one that doesn't seem to have any justification anyway, since the API and language are already C). The language choice is up to you.

The fact that OpenCL's API is C, and its kernel language is C, is a big reason I was interested in it in the first place ...

 

B)  C-language offers all the features needed with C99

 

C) Although the OpenCL API documentation refers to objects, please note that the API is about FUNCTIONS - there are no classes to create objects as in OOP!

I don't think you have a good understanding of what OO means. It isn't tied to the language at all (as long as the language supports structs or an equivalent); it's just a design methodology and an implementation detail. My first OO code was in assembly language.

Also, there are many 'objects' which are not object-oriented language objects: a phone, a mouse, a keyboard, a C struct.

As soon as you put anything in a struct and pass it around to different functions, you've created an object-oriented API, no matter what you want to call it.
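A small sketch of that last point, loosely modelled on how the OpenCL C API hands out handles such as `cl_mem` together with retain/release functions. The `buffer_t` type and its functions are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* In a real library the struct body would live only in the .c file, so
 * callers would see just the handle type. */
typedef struct buffer_s *buffer_t;

struct buffer_s {
    size_t size;
    int    refcount;
};

/* "Constructor": returns a handle with one reference, or NULL on failure. */
static buffer_t buffer_create(size_t size)
{
    buffer_t b = malloc(sizeof *b);
    if (b) {
        b->size = size;
        b->refcount = 1;
    }
    return b;
}

static void buffer_retain(buffer_t b)
{
    assert(b);
    ++b->refcount;
}

/* Returns the remaining reference count; frees the object at zero. */
static int buffer_release(buffer_t b)
{
    assert(b && b->refcount > 0);
    int left = --b->refcount;
    if (left == 0)
        free(b);
    return left;
}

/* "Method": an accessor that takes the object as its first argument. */
static size_t buffer_size(buffer_t b)
{
    assert(b);
    return b->size;
}
```

There is no class keyword anywhere, yet the caller works with an object that has a constructor, methods and a managed lifetime.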

 

D)  C++ introduces extra delay compared to C-language

 

* this is our client's requirement based on their own experiences, not to be argued and it's true that OOP has its price to be paid i.e. lost speed

Well if it's a client requirement, why try to justify it with some entirely made-up and ill-informed reasons?

FWIW, the overhead of the API calls won't be noticeable; it isn't even noticeable in Java. Not that C++ doesn't seem to bring along a huge pile of other baggage you might not want, and if used incorrectly it seems to generate some poor code.

If you're worrying about delay you're doing it wrong and misunderstand the API.  It's not a real-time API, it's a client/server producer/queue/consumer model (more or less).  Delays can be almost entirely hidden, but they can't be removed.
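A back-of-the-envelope model of "hidden but not removed", written as C so the arithmetic can be checked. It assumes a simple two-stage pipeline in which each batch has a fixed submission/transfer overhead followed by a fixed compute time; the function names and the model itself are illustrative assumptions, not measurements of any OpenCL implementation.

```c
#include <assert.h>

/* No overlap: every batch pays its overhead and then computes. */
static double serial_time(int batches, double overhead, double compute)
{
    assert(batches >= 0);
    return batches * (overhead + compute);
}

/* Two-stage pipeline: while batch i computes, batch i+1 is already being
 * queued and transferred. Only the first overhead and the last compute
 * remain exposed; in between, the slower stage sets the pace. */
static double pipelined_time(int batches, double overhead, double compute)
{
    assert(batches >= 0);
    if (batches == 0)
        return 0.0;
    double slower = overhead > compute ? overhead : compute;
    return overhead + (batches - 1) * slower + compute;
}
```

For 10 batches with 1 ms of overhead and 4 ms of compute each, the serial model gives 50 ms and the pipelined one 41 ms: the first batch's latency is still there, but most of the rest is hidden behind computation.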

If the processing going on is insignificant compared to the invocation overheads, then there's no reason to be using OpenCL in the first place.

 

E) response times are already a challenge with OpenCL

For someone who doesn't seem to know that opencl is an entirely C API ... this seems a fairly uninformed comment.

Again, latency ('response times', 'delay') can usually be hidden.  It's also unavoidable and based on physical laws of nature which cannot be broken: you just have to deal with it.

 

F) OpenCL specs V1.2 has some new features which (if rightly implemented) will improve speed

 

- however, one may have to wait, as a v1.2 implementation may not be available before summer 2012!?

Summer 2012 is 15 days away?  Some big wait?

Anyway - so what? OpenCL is still new and evolving, and it takes time to design and implement changes. There isn't any practical alternative for what it does, so like anything new you just have to wait. 1.2 is only an incremental update anyway; it's not going to radically alter any application or its performance.

 

 

0 Likes