Meteorhead

async_work_group_copy

Hi!

A quick question: I was told earlier that prefetch is currently not supported. Is the same true of async_work_group_copy? If so, is it a hardware limitation that asynchronous commands cannot be used, or is it simply an implementation issue, i.e. it is just not supported yet?

Cheers,

Máté

9 Replies
nou

You can use async_work_group_copy; at worst it will not be asynchronous.
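To illustrate what "at worst it will not be asynchronous" means: as far as a correctly written kernel can observe, an implementation may finish the copy before async_work_group_copy returns, in which case wait_group_events has nothing left to wait for. The C sketch below is a hypothetical host-side emulation of that fallback; the names emu_async_copy, emu_wait_events, sum_tile and TILE_MAX are made up for illustration and are not OpenCL built-ins. The kernel logic produces the same result either way; only the overlap between copying and computing is lost.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the OpenCL built-ins async_work_group_copy /
 * wait_group_events: they model an implementation whose "asynchronous"
 * copy has already completed when it returns. */
typedef int emu_event_t;

#define TILE_MAX 64  /* hypothetical local-buffer capacity */

/* Copies n floats from src ("global") to dst ("local") right away and
 * returns a dummy event, since nothing is left pending. */
static emu_event_t emu_async_copy(float *dst, const float *src, size_t n,
                                  emu_event_t ev)
{
    memcpy(dst, src, n * sizeof *dst);
    return ev;
}

/* Nothing to wait for: every copy finished before emu_async_copy returned. */
static void emu_wait_events(int num_events, const emu_event_t *events)
{
    (void)num_events;
    (void)events;
}

/* Kernel-like routine: stage up to TILE_MAX values in a local buffer,
 * wait for the copy, then sum them. The result is the same whether or not
 * the copy actually overlapped with anything. */
static float sum_tile(const float *global_data, size_t n)
{
    float local_tile[TILE_MAX];
    float s = 0.0f;
    if (n > TILE_MAX)
        n = TILE_MAX;
    emu_event_t ev = emu_async_copy(local_tile, global_data, n, 0);
    emu_wait_events(1, &ev);
    for (size_t i = 0; i < n; ++i)
        s += local_tile[i];
    return s;
}
```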

This is not entirely relevant to this topic (although it might turn out to be)...

It has been rumored that global sync might become available with the 69xx cards. What does the "Global synchronisation registers" block on the architecture block diagram mean? It was already present on the Cypress diagrams.

What is async_work_group_copy, in your context?

async_work_group_copy is described in the OpenCL specification PDF. I know global syncing has nothing to do with async_work_group_copy, but I wouldn't be surprised if, in hardware, the events used to query async commands were stored in a location visible to all SIMD engines.

I have no idea what those global sync registers are used for; that's why I asked. As I said before, I know it has little to do with the topic title, but it might turn out that async commands and global sync use the same hardware element. That's what I was trying to get at.

Let me ask a couple of things to clarify this issue:

Is the out-of-order command queue purely a host-thread matter? (As I picture it, it requires querying device occupancy every time a kernel "wave" finishes, with the host thread deciding which command to issue next from the queue based on event dependencies and resource usage.)

Are 5xxx and 6xxx cards capable of asynchronous operations at all (async_work_group_copy and prefetch in particular)? Both would be very useful for algorithms that can intelligently cache ahead. I was told prefetch compiles to a no-op on GPUs (I don't know about CPUs), but is that due to missing software support or a hardware limitation?

async_work_group_copy should work on Evergreen and NI (Northern Islands) cards. Do you run into any issues when using it?

AFAIK, out-of-order command queue execution is still not supported.

A member of our group experimented with async_work_group_copy on NV cards and reported zero speedup. The problem I was coding could have made use of prefetch, since the data required for the next iteration step was known in advance: while one iteration computes, the data for the next could be loaded.

I would also try async_work_group_copy for caching (using shared memory as a cache for VRAM), but that would require serious changes to the code, and I would hate to do it in vain. If no one has such experience and can tell me in advance whether it really works asynchronously, all that remains is for me to find the time to test it.
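The "load the next step while the current one computes" idea is usually written as double buffering: two local buffers, with the copy for step t+1 issued before step t is processed. Below is a minimal single-threaded C sketch of that control flow. It assumes hypothetical stand-ins emu_async_copy and emu_wait for async_work_group_copy and wait_group_events (here they copy synchronously, so nothing actually overlaps), a hypothetical function name double_buffered_sum, and an arbitrary tile size of 4. In a real kernel, a barrier(CLK_LOCAL_MEM_FENCE) would typically also be needed before overwriting a buffer that other work-items may still be reading.

```c
#include <stddef.h>
#include <string.h>

#define TILE 4  /* hypothetical tile size (elements per step) */

typedef int emu_event_t;

/* Hypothetical stand-in for async_work_group_copy: copies synchronously. */
static emu_event_t emu_async_copy(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
    return 0;
}

/* Hypothetical stand-in for wait_group_events: a no-op here, because the
 * copy above has already completed. */
static void emu_wait(const emu_event_t *ev)
{
    (void)ev;
}

/* Sums num_tiles consecutive tiles of TILE floats with double buffering:
 * the copy for step t+1 is issued before step t is computed. */
static float double_buffered_sum(const float *global_data, size_t num_tiles)
{
    float buf[2][TILE];
    float total = 0.0f;
    emu_event_t ev;

    if (num_tiles == 0)
        return total;

    ev = emu_async_copy(buf[0], global_data, TILE);   /* preload step 0 */
    for (size_t t = 0; t < num_tiles; ++t) {
        emu_wait(&ev);                                /* step t is in buf[t % 2] */
        /* A real kernel would typically need barrier(CLK_LOCAL_MEM_FENCE)
         * here so no work-item still reads buf[(t + 1) % 2] from step t - 1. */
        if (t + 1 < num_tiles)                        /* start loading step t+1 */
            ev = emu_async_copy(buf[(t + 1) % 2],
                                global_data + (t + 1) * TILE, TILE);
        const float *cur = buf[t % 2];                /* compute on step t */
        for (size_t i = 0; i < TILE; ++i)
            total += cur[i];
    }
    return total;
}
```

On hardware where the copy is truly asynchronous, the load of step t+1 can overlap the computation on step t; where it is not, the same code still gives the correct result, just without the speedup.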

But if async_work_group_copy (AWGC) really works asynchronously, that would be good news indeed, although prefetch would be the best solution: it requires no explicit memory allocation, just a simple function call, and subsequent memory loads become faster.
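For comparison, the "just a simple function call" style of prefetching looks like this on the CPU side. The sketch uses GCC/Clang's __builtin_prefetch as a stand-in for OpenCL's prefetch(); both are only hints, may compile to nothing, and never change the result. The function name sum_with_prefetch and the chunked access pattern are assumptions made for illustration.

```c
#include <stddef.h>

/* Sums data in chunks of `step` elements, issuing a prefetch hint for the
 * next chunk before processing the current one. __builtin_prefetch is a
 * GCC/Clang builtin; like OpenCL's prefetch() it is only a hint, so the
 * result is identical whether or not it does anything. */
static double sum_with_prefetch(const double *data, size_t n, size_t step)
{
    double total = 0.0;
    if (step == 0)
        step = 1;                             /* avoid an infinite loop */
    for (size_t start = 0; start < n; start += step) {
        size_t next = start + step;
        if (next < n)
            __builtin_prefetch(&data[next]);  /* next chunk will be read soon */
        size_t end = next < n ? next : n;
        for (size_t i = start; i < end; ++i)
            total += data[i];
    }
    return total;
}
```

Unlike the double-buffered version, no local buffer is allocated and no wait is needed; whether it helps depends entirely on whether the hardware honours the hint.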

Meteorhead,

I guess the NV and AMD implementations can give different results. Could you ask your group member to test his code on an AMD device? It would be nice if you could post that code.

meteorhead,

have you done those tests?
