
egonotto
Journeyman III

Concurrent kernels

How can I run multiple kernels concurrently?

Hi,

I want to run 20 "Monte Carlo simulations" in parallel on an HD 5870 card.

Because the simulation is complex, with many branches, loops, and random behavior, I think one execution per stream processor is OK.

As the HD 5870 card has 20 stream processors, I get 20 parallel runs.

I hear that the new 5870 chip can run different kernels at the same time.

But how can I do this?

Is there a demo that runs several kernels at the same time?

Sorry for my bad English.

Thanks in advance,

egonotto

0 Likes
15 Replies
wgbljl
Journeyman III

Originally posted by: egonotto I hear, that the new 5870 Chip has the ability to run different kernels at the same time.


Hi, egonotto. Where did you hear that?

Concurrent kernel execution is a key feature of Nvidia's 'Fermi', while AMD does not seem to mention it.

I am one of those hunting for this new feature.

0 Likes

Hi,

I have two sources.

In the German c't (http://www.heise.de/ct/artikel/Fermis-goldene-Regel-811487.html), an article about the new Fermi contains this sentence:

"According to AMD's Director of Stream Computing, Patricia Harrell, parallel execution is supposed to be possible on ATI's RV8xx as well; the documents published so far, however, say not a word about 'concurrent kernels'. Perhaps the feature is simply taken for granted at ATI."

The information in c't comes from AMD's Director of Stream Computing, Patricia Harrell.

It should be a good source.

The other is an article from the internet, also about Fermi (http://techreport.com/articles.x/17670/2).

There is a sentence:

"(Incidentally, AMD tells us its Cypress chip can also run multiple kernels concurrently on its different SIMDs. In fact, different kernels can be interleaved on one SIMD.) "

Yours sincerely,

egonotto

0 Likes

I think it exists when using OpenCL, but I forget which document says so...

0 Likes

It's actually official that AMD's 5800 cards support concurrent kernel execution:

http://www.hardocp.com/image.html?image=MTI1NTQ3MDM3NXlvcUhUU1k4TzlfMV8xN19sLmpwZw==

I find it interesting though that AMD didn't step up to respond to this question. Why would they be hush hush about a superior feature their product has???

BTW, AFAIK OpenCL makes no assumption about the number of kernels executed concurrently on a device. If the command-queue is in out-of-order execution mode, then the runtime is free to issue multiple kernel commands at the same time (provided they are not waiting on some event).
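To make the in-order vs. out-of-order point concrete, here is a toy model in plain Python. It is not the OpenCL API: `Command`, `dispatch_waves`, and the kernel names are hypothetical, made up for illustration. In real OpenCL the mode is requested with the `CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE` property when the queue is created, and the runtime is then allowed, but not required, to overlap commands. The sketch only shows which commands become eligible to issue together once their wait lists are satisfied.

```python
# Toy model only: this is NOT the OpenCL API. All names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str
    # Names of commands whose events must complete before this one may start.
    wait_for: list = field(default_factory=list)


def dispatch_waves(commands, out_of_order):
    """Group commands into waves; commands in one wave may be issued together."""
    if not out_of_order:
        # In-order queue: each command starts only after the previous one ends.
        return [[c.name] for c in commands]
    done, waves, pending = set(), [], list(commands)
    while pending:
        # A command is ready once everything on its wait list has completed.
        ready = [c for c in pending if all(w in done for w in c.wait_for)]
        if not ready:
            raise ValueError("circular or unsatisfiable wait list")
        waves.append([c.name for c in ready])
        done.update(c.name for c in ready)
        pending = [c for c in pending if c.name not in done]
    return waves


queue = [
    Command("kernel_a"),
    Command("kernel_b"),
    Command("kernel_c", wait_for=["kernel_a"]),
]
in_order_waves = dispatch_waves(queue, out_of_order=False)
out_of_order_waves = dispatch_waves(queue, out_of_order=True)
print(in_order_waves)      # three waves, one command each
print(out_of_order_waves)  # kernel_a and kernel_b share the first wave
```

Whether two commands in the same wave actually overlap on the GPU is up to the driver and hardware, which is exactly the open question in this thread.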

AMD was quiet about this (concurrent kernel) feature probably because their CAL driver doesn't support it yet. However, I believe it is necessary for things like Eyefinity to work.

 

0 Likes

I remember now, it exists in DirectX 11 as better multi-threading for graphics rendering

I still have to check it in DirectX Compute, though; the whole DirectX interface is a different universe to me...

0 Likes

edward,
I had not responded because I was attempting to get confirmation on this from some of our hardware engineers, but they have not responded back. But it looks like you already found your answer.
0 Likes

So, any updates? I'm very interested in how the R8xx processes kernels concurrently, as a "YES" in a slide isn't enough, you know.

I mean, how many kernels per SIMD (IIRC, Fermi does 2), and can the programmer control such behavior, maybe like egonotto's way, or is there a more optimized way, if any?

Sorry for being a "??????", but with all those GFLOPS blazing, one gets very curious. Speaking of curiosity, when can we expect an R8xx ISA reference?

regards

0 Likes

Hello Micah,

You mentioned you were waiting for confirmation from hardware engineers. Any word? I think quite a few of us are wondering whether not only 58xx cards but also, say, 57xx cards, or any other R8xx GPUs, support concurrent kernel execution. This would make ATI GPUs very attractive compared to Fermi, especially if even the lower-end 57xx cards support it. This is a major selling point for ATI, and information on support for concurrent kernel execution should be easily accessible to developers/system builders 🙂

regards,

- Tom

(edited because I misread Micah's name)

0 Likes

I've been going through the R7xx ISA, and I suppose it's 4 for R8xx, since R7xx has odd and even wavefronts (so 2 on the fly) and RV870 looks like 2 RV770s stuck together. So, following the "having 2 of everything in Cypress" theme, IMHO I suppose (again) the magic number is 4.

The only case where it's still 2 is if the thread scheduler was excluded from the X2 theme.

 

Kinda off-topic: it's becoming quite amusing to dig for facts in ATI's ISA documents!

0 Likes

Tomhammo,
The hardware is capable of doing so, so it is something we are looking into how to utilize properly. I don't have much to update at this time.
0 Likes

Thanks for the info Micah,

I guess, leading on from that: if I were to recommend purchasing 57xx vs. 58xx GPUs for OpenCL development to a customer interested in concurrent kernel execution (due to small kernels), would there be a difference in support for concurrent kernels between 57xx and 58xx GPUs going forward? I.e., if support is enabled in 58xx (via out-of-order command queues in OpenCL, for example), would it also be enabled in 57xx? Do 57xx/58xx support concurrent execution in the current version of the OpenCL drivers? Thanks - Tom

0 Likes

Tom,
The major difference between the 57XX and 58XX from a compute feature perspective is 58XX has double precision and 57XX only has single precision. For concurrent execution, when one chip gets the feature, they all will get it.
0 Likes

Hi Micah,

We have been working with 5850s, with very good OpenCL results. A big question still remains regarding concurrent kernel execution, though: is there a (rough) estimate of when this feature will be available?

We are looking at getting a Fermi board to see whether it is more or less cost-effective than 58xx cards. The extra cache would probably not provide much gain for our needs, but concurrent kernel execution might tip it over the edge (e.g. being able to hide the latency of a memory-bound kernel by running it in parallel with a processing-bound kernel).

We would prefer to keep our investment in ATI-optimised code if we see concurrent kernel execution down the track.

thank you and regards,

- Tom

0 Likes

bump.

Any news on Concurrent Kernel Execution? A rough due date, problems holding it back, anything?

This would definitely be a huge plus if CKE were "activated" on ATI cards (since it is an issue of driver support).

Without CKE, task-parallel computations [queue.enqueueTask(), i.e. kernels with a work-group of size 1] gain ZERO performance from running in OpenCL, since no two tasks can run in parallel and they must run one after the other, even if the OpenCL device has more than enough resources to run both kernels.
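A rough back-of-the-envelope sketch of that argument, in plain Python rather than OpenCL. Everything here is an assumption for illustration: `makespan` is a hypothetical helper, each task is assumed to occupy exactly one execution slot, the task durations are made up, and memory contention and launch overhead are ignored. The 20 slots simply mirror the 20 runs asked about at the top of the thread.

```python
import heapq


def makespan(task_times, slots):
    """Finish time when tasks start in order, each taking the next free slot."""
    free_at = [0.0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    for t in task_times:
        start = heapq.heappop(free_at)       # earliest available slot
        heapq.heappush(free_at, start + t)   # slot is busy until start + t
    return max(free_at)


tasks = [1.0] * 20                       # 20 equal tasks, 1 time unit each (made up)
serial = makespan(tasks, slots=1)        # no concurrent kernels: one after the other
concurrent = makespan(tasks, slots=20)   # idealized: every task gets its own slot
print(serial, concurrent)
```

Under these idealized assumptions the serial case takes 20 time units and the concurrent case takes 1, which is the gap being described; real hardware would land somewhere in between.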

thanks,
-Chris

0 Likes

Sorry for reviving the thread, but concurrent kernels are also very important for me, so I join in asking the question: when will it be available?

 

Is work currently being done to support concurrent kernels?

 

Even two concurrent kernels would already make programming AMD GPUs a lot easier for most problems!

 

Sometimes 2 concurrent kernels are easier than n.

 

Thanks,

Vincent

 

 


 

0 Likes