## THE SEMANTICS OF TRANSACTIONS AND WEAK MEMORY IN X86, POWER, ARM, AND C++ Nathan Chong Arm Ltd. **Tyler Sorensen**Princeton University John Wickerson Imperial College London **USENIX Annual Technical Conference, 11 July 2019** ## WEAK MEMORY ``` MOV [x] 1 MOV [y] 1 MOV r0 [y] MOV r1 [x] ``` $$r0=0$$ SC x86 ### WEAK MEMORY IS HARD! - x86 proved tricky to formalise correctly [Sarkar et al., POPL'09; Owens et al., TPHOLs'09] - Bug found in deployed "Power 5" processors [Alglave et al., CAV'10] - C++ specification did not guarantee its own key property [Batty et al., POPL'11] - Routine compiler optimisations are invalid under Java and C++ memory models [Sevcik, PLDI'11; Vafeiadis et al. POPL'15] - Behaviour of NVIDIA graphics processors contradicted NVIDIA's programming guide [Alglave et al., ASPLOS'15] ### MODELLING WEAK MEMORY ## TRANSACTIONAL MEMORY • X86: **XBEGIN** **XEND** • Power: tbegin tend • ARM: tstart tcommit • C++: atomic { } Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy Digital Equipment Corporation Cambridge Research Laboratory Cambridge MA 02139 herlihy@crl.dec.com ### **Abstract** A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock-free synchronization as efficient (and easy to use) as conventional techniques based on mutual exclu-Transcational memory allows programmers to de- J. Eliot B. Moss Dept. of Computer Science University of Massachusetts Amherst, MA 01003 moss@cs.umass.edu structures avoid common problems : ventional locking techniques in highly - Priority inversion occurs when cess is preempted while holdi higher-priority processes. - Convoying occurs when a process scheduled, perhaps by exhaust tum, by a page fault, or by som When such an interruption oc pable of running may be unal - Deadlock can occur if proce same set of objects in diff avoidance can be awkward it tiple data objects, particula in advance. ## WEAK MEMORY + TM = ? ``` XBEGIN XBEGIN MOV [x] 1 MOV [y] 1 MOV r0 [y] MOV r1 [x] XEND XEND ``` $$r0=0$$ $$r0=0$$ SC x86 ### WEAK MEMORY + TM = ? $$r0=0$$ $$r0=0$$ $$r1=0$$ transactional SC SC x86 ## BUILDING OUR MODELS #### x86: #### **Power:** ``` acyclic(po_{loc} \cup com) (COHERENCE) empty(rmw \cap (fr_e; co_e)) (RMWIsol) acyclic(hb) (ORDER) where ppo = (preserved program order, elided) tfence = po \cap ((\neg stxn; stxn) \cup (stxn; \neg stxn)) fence = sync \cup tfence \cup (lwsync \setminus (W \times R)) ihb = ppo \cup fence thb = (rf_e \cup ((fr_e \cup co_e)^*; ihb))^*; (fr_e \cup co_e)^*; rf_e^? hb = (rf_e^?; ihb; rf_e^?) \cup weaklift(thb, stxn) acyclic(co \cup prop) (Propagation) where efence = rf_e^?; fence; rf_e^? prop_1 = [W]; efence; hb^*; [W] prop_2 = com_e^*; efence^*; hb^*; (sync \cup tfence); hb^* tprop_1 = rf_e; stxn; [W] tprop_2 = stxn; rf_e prop = prop_1 \cup prop_2 \cup tprop_1 \cup tprop_2 irreflexive(fr_e; prop; hb^*) (OBSERVATION) acyclic(stronglift(com, stxn)) (STRONGISOL) (TXNORDER) acyclic(stronglift(hb, stxn)) (TXNCANCELSRMW) emptv(rmw \cap tfence^*) ``` ### **ARM:** ``` acyclic(po_{loc} \cup com) (COHERENCE) acvclic(ob) (ORDER) where dob = (order imposed by dependencies, elided) aob = (order imposed by atomic RMWs, elided) bob = (order imposed by barriers, elided) tfence = po \cap ((\neg stxn; stxn) \cup (stxn; \neg stxn)) ob = com_e \cup dob \cup aob \cup bob \cup tfence empty(rmw \cap (fr_e; co_e)) (RMWIsoL) acyclic(stronglift(com, stxn)) (STRONGISOL) acyclic(stronglift(ob, stxn)) (TXNORDER) empty(rmw \cap tfence^*) (TXNCANCELSRMW) ``` #### C++: ``` irreflexive(hb; com*) (НвСом) where sw = (synchronises-with, elided) ecom = com \cup (co; rf) tsw = weaklift(ecom, stxn) hb = (sw \cup tsw \cup po)^+ empty(rmw \cap (fr_e; co_e)) (RMWIsoL) acyclic(po \cup rf) (NoThinAir) (SeoCsT) acvclic(psc) where psc = (constraints on SC events, elided) empty(cnf \setminus Ato^2 \setminus (hb \cup hb^{-1})) (NoRace) where cnf = ((W \times W) \cup (R \times W) \cup (W \times R)) \cap sloc \setminus id ``` ## VALIDATING OUR MODELS - 1. Consult architecture manuals. - 2. Interview engineers. - 3. Check models have reasonable mathematical properties (e.g. adding/extending/coalescing transactions is safe). - 4. Check that models validate existing compiler mappings. - 5. Generate conformance tests and run them on hardware. ## VALIDATING OUR MODELS Behaviours that must be forbidden ## VALIDATING OUR MODELS Behaviours that should be allowed # USING OUR MODELS ``` Loop: ldaxr W2,[M] cbnz W2, Loop mov W3,#1 stxr W4, W3, [M] lock() cbnz W4, Loop mov W7,#1 ldr W5,[X] add W5, W5, #2 str W7,[X] str W5,[X] unlock() stlr WZR, [M] ``` ``` Loop: ldaxr W2, [M] tstart cbnz W2, Loop ldr W6,[M] mov W3,#1 cbz W6,L1 stxr W4,W3,[M] tcancel L1: cbnz W4, Loop ldr W5,[X] mov W7, #1 add W5, W5, #2 str W7,[X] tcommit str W5,[X] stlr WZR, [M] ``` ## CONCLUSION - Weak memory is pervasive, and transactional memory is entering the mainstream. - We have designed and validated formal models of how these features interact in x86, Power, ARM, and C++. - Weak memory + transactions + lock elision = tricky! ## THE SEMANTICS OF TRANSACTIONS AND WEAK MEMORY IN X86, POWER, ARM, AND C++ Nathan Chong Arm Ltd. **Tyler Sorensen**Princeton University John Wickerson Imperial College London **USENIX Annual Technical Conference, 11 July 2019**