This is sort of a note to myself. A while ago I found a good article, QPI Quiescence, by Dave Dice. Once again it makes a multi-processor or multi-core CPU feel like a networked system to me. See this article. The article also mentions that an atomic CAS is more than three times as expensive as a regular CAS.

This slide summarizes and visualizes Quiescent Consistency very well.

“Method calls separated by a period of quiescence should appear to take effect in their real-time order.” (Herlihy/Shavit 2008)

QPI Quiescence is an interesting way to build asymmetric synchronization, which is basically an optimization for the fast-path thread. The idea is to eliminate #LOCK signals from the fast-path thread as far as possible. The slow-path thread can still use the #LOCK signal, and it is also the one that creates the QPI quiescence.

Windows provides the FlushProcessWriteBuffers API, which is in fact a mechanism for creating Quiescent Consistency on a multi-processor system. It also respects the process CPU affinity.

Flushes the write queue of each processor that is running a thread of the current process. The function generates an interprocessor interrupt (IPI) to all processors that are part of the current process affinity. It guarantees the visibility of write operations performed on one processor to the other processors.
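To make the fast-path/slow-path idea concrete for myself, here is a minimal sketch of a Dekker-style handshake between one fast thread and one slow thread. This is my own illustration, not code from Dave Dice's article. `AsymmetricGate`, `fast_path_barrier` and `slow_path_barrier` are hypothetical names. On Windows the fast path uses only a compiler barrier and the slow path calls FlushProcessWriteBuffers; I am assuming x86/x64 and a Vista-or-later SDK there. On other platforms the sketch falls back to a full fence on both sides, which is correct but no longer asymmetric.

```cpp
#include <atomic>
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
// Fast path: compiler-only barrier, so no LOCK-prefixed instruction or MFENCE.
inline void fast_path_barrier() { std::atomic_signal_fence(std::memory_order_seq_cst); }
// Slow path: sends an IPI to every processor in the process affinity,
// which forces the fast thread's pending stores to become visible.
inline void slow_path_barrier() { ::FlushProcessWriteBuffers(); }
#else
// Assumption: portable fallback. Both sides pay for a full fence here,
// so this branch is symmetric and only keeps the sketch runnable elsewhere.
inline void fast_path_barrier() { std::atomic_thread_fence(std::memory_order_seq_cst); }
inline void slow_path_barrier() { std::atomic_thread_fence(std::memory_order_seq_cst); }
#endif

// Hypothetical gate for exactly one fast thread and one slow thread.
// Each side raises its own flag, executes its barrier, then checks the other flag.
struct AsymmetricGate {
    std::atomic<int> fast_flag{0};
    std::atomic<int> slow_flag{0};

    // Called only by the fast thread. Returns false if the slow thread is inside.
    bool fast_try_enter() {
        fast_flag.store(1, std::memory_order_relaxed);
        fast_path_barrier();
        if (slow_flag.load(std::memory_order_acquire) != 0) {
            fast_flag.store(0, std::memory_order_release);  // back off
            return false;
        }
        return true;
    }

    void fast_leave() {
        assert(fast_flag.load(std::memory_order_relaxed) == 1);
        fast_flag.store(0, std::memory_order_release);
    }

    // Called only by the slow thread. Returns false if the fast thread is inside.
    bool slow_try_enter() {
        slow_flag.store(1, std::memory_order_relaxed);
        slow_path_barrier();  // the expensive side of the handshake
        if (fast_flag.load(std::memory_order_acquire) != 0) {
            slow_flag.store(0, std::memory_order_release);  // back off
            return false;
        }
        return true;
    }

    void slow_leave() {
        assert(slow_flag.load(std::memory_order_relaxed) == 1);
        slow_flag.store(0, std::memory_order_release);
    }
};
```

The fast thread would call `fast_try_enter()` / `fast_leave()` around its critical section and the slow thread would do the same with the `slow_` pair. Both sides simply back off on conflict; a real lock would need a retry or wait policy on top, and I have not measured this sketch.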

Dmitri’s timing results are promising. I even wrote a reader/writer lock that uses the FlushProcessWriteBuffers API before. I’d like to improve that R/W lock and compare its performance with other locks.