
godsic · ‎04-23-2009

developers suggestions for ASF

I did not find an ASF discussion thread on the AMD forum, so I am opening one. I think developers can help AMD refine the ASF specification so that it delivers maximum efficiency. So, guys, please post your suggestions for AMD in this topic!

remark · ‎05-02-2009

Hi!

I've posted the ASF announcement to comp.arch/comp.programming.threads:

http://groups.google.com/group/comp.programming.threads/tree/browse_frm/thread/c1c6c6327aed79b6

There is some discussion going on there which may be useful to AMD.

Below I will summarize my thoughts on ASF.

First of all a bit of formal hair-splitting. In 1.1.1 you say:

---------------------------------------------------

In more detail, ASF guarantees forward progress for speculative regions, provided the following
conditions hold:
• The speculative region does not exceed ASF's guaranteed capacity: up to four cacheable
memory regions with a size and alignment of 64 bytes. (See Section 1.6 for details.)
• No interrupt or exception is delivered while executing the speculative region.
• There are no conflicting memory accesses from other CPUs.

---------------------------------------------------

Then in 6.1.1. you say:

---------------------------------------------------

ASF automatically aborts a speculative region when one of the following conditions occurs:

....
• Other implementation-specific conditions

---------------------------------------------------

IMHO these statements are a bit inconsistent regarding "Other implementation-specific conditions". I.e. does ASF guarantee forward progress whenever the conditions specified in 1.1.1. hold, or may ASF still abort a transaction under implementation-specific conditions?
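To make the capacity condition quoted from 1.1.1 concrete, here is a minimal sketch in C that counts how many distinct 64-byte lines a set of addresses touches and compares the result with the guaranteed four. The helper names are hypothetical and not part of ASF, and the sketch assumes that no single access straddles a line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASF_LINE_SIZE        64u /* line size quoted from section 1.1.1 */
#define ASF_GUARANTEED_LINES 4u  /* guaranteed capacity quoted from section 1.1.1 */

/* Hypothetical helper: counts the distinct 64-byte lines covered by the
 * given addresses. Counting stops as soon as the guaranteed capacity is
 * exceeded, so the result is at most ASF_GUARANTEED_LINES + 1. */
static size_t asf_distinct_lines(const uintptr_t *addrs, size_t n)
{
    uintptr_t seen[ASF_GUARANTEED_LINES + 1];
    size_t count = 0;

    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uintptr_t line = addrs[i] / ASF_LINE_SIZE;
        size_t j = 0;
        while (j < count && seen[j] != line)
            j++;
        if (j == count) {               /* first access to this line */
            seen[count++] = line;
            if (count > ASF_GUARANTEED_LINES)
                break;                  /* already over capacity */
        }
    }
    return count;
}

/* Returns nonzero if the accesses fit into the guaranteed capacity. */
static int asf_fits_guaranteed_capacity(const uintptr_t *addrs, size_t n)
{
    return asf_distinct_lines(addrs, n) <= ASF_GUARANTEED_LINES;
}
```

Passing this check covers only the first of the three quoted conditions. Whether it is sufficient together with the other two is exactly the open question about "Other implementation-specific conditions".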

remark · ‎05-02-2009

Table 6.2.1. CPU B holds a cache line in the protected owned state, and CPU A makes a *non*-transactional read. You postulate that CPU B must abort its transaction.

Isn't it possible to create an ASF implementation that satisfies the non-transactional read on CPU A with the old value (the one that was current before the transaction on CPU B began), so that CPU B need not abort? The illusion of atomicity in this case is that the read on CPU A simply happens-before the transaction on CPU B. Such an implementation would have to use a special store buffer for speculative stores.

If such an implementation is possible, then IMHO it's better to say that "CPU B may or may not abort (implementation dependent)".
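The idea can be illustrated with a toy, single-threaded model in C. It is only a sketch of the proposed behaviour, not a description of how ASF or any real cache works: the `spec_line` type and all function names are made up for illustration, and the "speculative store buffer" is just an extra field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one cache line that CPU B owns speculatively. */
typedef struct {
    uint64_t committed;   /* value visible to other CPUs */
    uint64_t speculative; /* value in B's hypothetical speculative store buffer */
    bool     in_tx;       /* B has an open speculative region */
    bool     dirty;       /* B has written the line speculatively */
} spec_line;

static void b_begin(spec_line *l)
{
    l->in_tx = true;
    l->dirty = false;
}

static void b_tx_store(spec_line *l, uint64_t v)
{
    assert(l->in_tx);
    l->speculative = v;   /* buffered, not yet visible to CPU A */
    l->dirty = true;
}

/* B's own transactional load sees its own speculative store. */
static uint64_t b_tx_load(const spec_line *l)
{
    return l->dirty ? l->speculative : l->committed;
}

/* A's non-transactional load is served with the old value and does not
 * abort B: it is ordered before B's transaction. */
static uint64_t a_plain_load(const spec_line *l)
{
    return l->committed;
}

/* A's store is a real conflict: B's transaction is aborted. */
static void a_plain_store(spec_line *l, uint64_t v)
{
    l->committed = v;
    l->in_tx = false;
    l->dirty = false;
}

/* Returns true if B's transaction committed, false if it had been aborted. */
static bool b_commit(spec_line *l)
{
    if (!l->in_tx)
        return false;
    if (l->dirty)
        l->committed = l->speculative; /* publish atomically */
    l->in_tx = false;
    l->dirty = false;
    return true;
}
```

In this model a plain load by CPU A in the middle of B's transaction returns the pre-transaction value and B still commits, while a store by CPU A forces B to abort.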

remark · ‎05-02-2009

Originally posted by: remark
Table 6.2.1. CPU B holds a cache line in the protected owned state, and CPU A makes a *non*-transactional read. You postulate that CPU B must abort its transaction.

Isn't it possible to create an ASF implementation that satisfies the non-transactional read on CPU A with the old value (the one that was current before the transaction on CPU B began), so that CPU B need not abort? The illusion of atomicity in this case is that the read on CPU A simply happens-before the transaction on CPU B. Such an implementation would have to use a special store buffer for speculative stores.

If such an implementation is possible, then IMHO it's better to say that "CPU B may or may not abort (implementation dependent)".

I have to mention that Mitch Alsup said that I am losing the "illusion of atomicity" in this case. But I do not quite see where and why. The illusion of atomicity would be broken if CPU A made a transactional read (then it may be that neither transaction happens-before the other) or a write (either transactional or not).

edward_yang · ‎05-02-2009

Isn't it possible to create an ASF implementation that satisfies the non-transactional read on CPU A with the old value (the one that was current before the transaction on CPU B began), so that CPU B need not abort? The illusion of atomicity in this case is that the read on CPU A simply happens-before the transaction on CPU B.

I guess in that case the cache line wouldn't have been in protected owned state by CPU B?

I mean, if the illusion of atomicity were that CPU A read the old value right before CPU B's transaction, then the assumption that CPU B's transaction was performed on protected owned data would've been false?

remark · ‎05-03-2009

Originally posted by: edward_yang
Isn't it possible to create an ASF implementation that satisfies the non-transactional read on CPU A with the old value (the one that was current before the transaction on CPU B began), so that CPU B need not abort? The illusion of atomicity in this case is that the read on CPU A simply happens-before the transaction on CPU B.

I guess in that case the cache line wouldn't have been in protected owned state by CPU B?

I mean, if the illusion of atomicity were that CPU A read the old value right before CPU B's transaction, then the assumption that CPU B's transaction was performed on protected owned data would've been false?

From the point of view of the illusion of atomicity, yes, you are right: the illusion is that CPU B has simply not yet started its transaction.

However, in practice CPU B may hold the cache line in any state, since that is not observable by the programmer.

There are two levels here: (1) the actual implementation, (2) the illusion of atomicity for the programmer.

remark · ‎05-02-2009

You require transactional reads/writes to be made with an explicit LOCK prefix.

This makes it impossible to use ASF for transactional lock elision on existing code bases (I guess you are aware of Sun's work on HTM). I.e. to replace:

LOCK(x);

// some C/C++ code

UNLOCK(x);

with:

START_TRX_WITH_FALLBACK_TO_MUTEX(x);

// some C/C++ code

COMMIT_TRX_WITH_FALLBACK_TO_MUTEX(x);

You may see David Dice's comments on this here:

http://groups.google.com/group/comp.programming.threads/msg/4df3824adc92926a
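For readers unfamiliar with the pattern, the two macros could be backed by something like the following C sketch. It is an assumption-laden illustration: `hw_trx_begin`, `hw_trx_commit` and `sim_trx_succeeds` are hypothetical portable stand-ins for the hardware primitives (they are not ASF instructions or any real intrinsic), and only the mutex fallback path does real work here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hardware: set to true to simulate a
 * transaction that starts successfully, false to simulate an abort. */
static bool sim_trx_succeeds = false;

/* On real hardware an abort would transfer control back here and report
 * failure; this stub simply returns the simulated outcome. */
static bool hw_trx_begin(void)  { return sim_trx_succeeds; }
static void hw_trx_commit(void) { /* nothing to publish in the stub */ }

/* Returns true if the critical section runs as a transaction (lock elided),
 * false if the mutex was really acquired. A real implementation would also
 * read the lock word inside the transaction, so that a thread taking the
 * mutex aborts all eliding threads. */
static bool start_trx_with_fallback_to_mutex(pthread_mutex_t *m)
{
    assert(m != NULL);
    if (hw_trx_begin())
        return true;
    pthread_mutex_lock(m);
    return false;
}

/* Must be passed the value returned by the matching start call. */
static void commit_trx_with_fallback_to_mutex(pthread_mutex_t *m, bool elided)
{
    assert(m != NULL);
    if (elided)
        hw_trx_commit();
    else
        pthread_mutex_unlock(m);
}
```

The "some C/C++ code" between the two calls is compiled to ordinary MOVs, which is why the elided path only helps if plain loads and stores inside the region are treated as transactional.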

On the other hand, ASF's fine-grained control over the read and write sets is also very nice, and something I don't want to give up.

So what I am proposing is to rename LOCK MOV to MOV, and MOV to UNLOCK MOV. I.e. all MOVs inside a transaction will be treated as transactional, and non-transactional reads/writes may be expressed with UNLOCK MOV.

You may see my brief rationale for this here:

http://groups.google.com/group/comp.programming.threads/msg/1b2dfbbeda9a7fdd

Basically it would be a win-win solution, allowing both fine-grained control and reuse of existing code bases. The ability to use ordinary C/C++ code inside transactions is nice, to say the least.

remark · ‎05-02-2009

Once again, table 6.2.1. I do not see much value in an exact specification of the contention-resolution protocol. I guess software developers will differentiate implementations based on how fast their code executes, and will not care much whether straight aborts, delayed ACKs or NACKs are used. So maybe it's better to remove table 6.2.1. altogether, and just say that the hardware will do its best to provide maximum efficiency; if you see special abort codes or too high an abort rate, then apply backoff, fall back to mutexes, or whatever. This would open up more possibilities for future ASF implementations by AMD or other vendors.

Table 6.2.1. may be moved to an advisory (non-normative) appendix and/or to the software developer's manuals for particular processor families.
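On the software side, the "apply backoff, fall back to mutexes" policy could be as simple as the following C sketch. Everything in it is an assumption made for illustration: the `abort_reason` values are invented categories and not ASF abort codes, and the attempt limit and backoff constants are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical abort categories; real hardware would define its own codes. */
typedef enum { ABORT_CONTENTION, ABORT_CAPACITY, ABORT_OTHER } abort_reason;

typedef enum { ACTION_RETRY_AFTER_BACKOFF, ACTION_FALLBACK_TO_MUTEX } abort_action;

#define MAX_ATTEMPTS 8u    /* arbitrary limit before giving up on speculation */
#define BACKOFF_BASE 16u   /* arbitrary number of spin iterations for attempt 0 */
#define BACKOFF_CAP  4096u /* arbitrary upper bound on spin iterations */

/* Capped exponential backoff: BACKOFF_BASE * 2^attempt, at most BACKOFF_CAP. */
static uint32_t backoff_spins(uint32_t attempt)
{
    uint32_t shift = attempt < 16u ? attempt : 16u; /* keep the shift small */
    uint64_t spins = (uint64_t)BACKOFF_BASE << shift;
    return spins > BACKOFF_CAP ? BACKOFF_CAP : (uint32_t)spins;
}

/* Decides what to do after the attempt with index `attempt` (0-based) aborted. */
static abort_action on_abort(abort_reason reason, uint32_t attempt)
{
    assert(attempt < MAX_ATTEMPTS);
    if (reason == ABORT_CAPACITY)
        return ACTION_FALLBACK_TO_MUTEX;   /* retrying cannot help */
    if (attempt + 1u >= MAX_ATTEMPTS)
        return ACTION_FALLBACK_TO_MUTEX;   /* abort rate too high */
    return ACTION_RETRY_AFTER_BACKOFF;
}
```

A policy like this depends only on the abort reason and the number of failed attempts, not on whether the hardware resolved the conflict with straight aborts, delayed ACKs or NACKs.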