
[09/13] x86/mm/pae: Use WRITE_ONCE()

Message ID 20221022114425.038102604@infradead.org
State New
Series Clean up pmd_get_atomic() and i386-PAE

Commit Message

Peter Zijlstra Oct. 22, 2022, 11:14 a.m. UTC
Disallow write-tearing; that would be really unfortunate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/pgtable-3level.h |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Linus Torvalds Oct. 22, 2022, 5:42 p.m. UTC | #1
On Sat, Oct 22, 2022 at 4:48 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
>  static inline void native_set_pte(pte_t *ptep, pte_t pte)
>  {
> -       ptep->pte_high = pte.pte_high;
> +       WRITE_ONCE(ptep->pte_high, pte.pte_high);
>         smp_wmb();
> -       ptep->pte_low = pte.pte_low;
> +       WRITE_ONCE(ptep->pte_low, pte.pte_low);

With this, the smp_wmb() should just go away too. It was really only
ever there as a compiler barrier.

Two WRITE_ONCE() statements are inherently ordered for the compiler
(due to volatile rules), and x86 doesn't re-order writes.

It's not a big deal, since smp_wmb() is just a barrier() on x86-64
anyway, but it might make some improvement to code generation to
remove it, and the smp_wmb() really isn't adding anything.
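A minimal user-space sketch of the two primitives under discussion, assuming GCC or Clang. my_barrier() mirrors what the kernel's barrier() does on x86: an empty asm with a "memory" clobber that emits no instruction. MY_WRITE_ONCE() is a simplified, hypothetical stand-in for WRITE_ONCE() on a scalar, reduced to a single volatile store. struct fake_pte is a hypothetical model of the PAE pte_t and its pte_low/pte_high halves, not the kernel's type.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier, like the kernel's barrier(): no instruction is
 * emitted, but the "memory" clobber stops the compiler from caching or
 * moving memory accesses across it. (Hypothetical name.) */
#define my_barrier() __asm__ __volatile__("" : : : "memory")

/* Simplified, hypothetical WRITE_ONCE() for scalars: one volatile store.
 * The compiler may not elide or merge volatile accesses, nor reorder them
 * relative to each other. */
#define MY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical two-word entry modelled on the i386-PAE pte_t. */
struct fake_pte {
	uint32_t pte_low;	/* holds the present bit */
	uint32_t pte_high;
};

/* Shape of the patched code: explicit compiler barrier between stores. */
static void set_with_barrier(struct fake_pte *p, struct fake_pte v)
{
	MY_WRITE_ONCE(p->pte_high, v.pte_high);
	my_barrier();
	MY_WRITE_ONCE(p->pte_low, v.pte_low);
}

/* Shape Linus suggests: the two volatile stores already stay in program
 * order in the compiler's output, and x86 does not reorder a store with
 * an earlier store. */
static void set_without_barrier(struct fake_pte *p, struct fake_pte v)
{
	MY_WRITE_ONCE(p->pte_high, v.pte_high);
	MY_WRITE_ONCE(p->pte_low, v.pte_low);
}
```

Both variants leave the same values behind. The argument is only that, on x86, dropping the barrier gives neither the compiler nor the CPU any extra freedom to reorder the two stores.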

If somebody likes the smp_wmb() as a comment, I think it would be
better to actually _make_ it a comment, and have these functions turn
into just

  /* Force ordered word-sized writes, set low word with present bit last */
  static inline void native_set_pte(pte_t *ptep, pte_t pte)
  {
        WRITE_ONCE(ptep->pte_high, pte.pte_high);
        WRITE_ONCE(ptep->pte_low, pte.pte_low);
  }

or similar. I think that kind of one-liner comment is much more
informative than a "smp_wmb()".

Or do we already have a comment elsewhere about why the ordering is
important (and how *clearing* clears the low word with the present bit
first, but setting a *new* entry sets the high word first so that the
64-bit entry is complete when the present bit is set?)

                 Linus

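The ordering Linus describes can be checked in a single-threaded sketch that snapshots the entry after each individual store. For setting, the high word goes first and the low word, with the present bit, goes last. For clearing, the low word goes first. The snapshots are the intermediate states a concurrent page-table walker could observe. All names are hypothetical. MY_PRESENT assumes the present bit is bit 0 of the low word, as on x86. safe_state() encodes the invariant that a present entry must be either the complete old value or the complete new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified WRITE_ONCE(): one volatile store. */
#define MY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
/* Hypothetical constant: present bit is bit 0 of the low word. */
#define MY_PRESENT 0x1u

struct fake_pte { uint32_t pte_low; uint32_t pte_high; };

static bool same(struct fake_pte a, struct fake_pte b)
{
	return a.pte_low == b.pte_low && a.pte_high == b.pte_high;
}

/* An observable state is harmless if it is not present, or if it is
 * exactly the complete old or the complete new entry. */
static bool safe_state(struct fake_pte cur, struct fake_pte before,
		       struct fake_pte after)
{
	return !(cur.pte_low & MY_PRESENT) || same(cur, before) ||
	       same(cur, after);
}

/* Set a new entry: high word first, present bit (low word) last. */
static void set_pte_traced(struct fake_pte *p, struct fake_pte v,
			   struct fake_pte snap[2])
{
	MY_WRITE_ONCE(p->pte_high, v.pte_high);
	snap[0] = *p;				/* high new, low still old */
	MY_WRITE_ONCE(p->pte_low, v.pte_low);	/* entry becomes present */
	snap[1] = *p;
}

/* Wrong order, for contrast: present bit goes out before the high word. */
static void set_pte_wrong_order(struct fake_pte *p, struct fake_pte v,
				struct fake_pte snap[2])
{
	MY_WRITE_ONCE(p->pte_low, v.pte_low);
	snap[0] = *p;				/* present, high is stale */
	MY_WRITE_ONCE(p->pte_high, v.pte_high);
	snap[1] = *p;
}

/* Clear: drop the present bit (low word) first, then the high word. */
static void pte_clear_traced(struct fake_pte *p, struct fake_pte snap[2])
{
	MY_WRITE_ONCE(p->pte_low, 0u);
	snap[0] = *p;				/* no longer present */
	MY_WRITE_ONCE(p->pte_high, 0u);
	snap[1] = *p;
}
```

Only the wrong-order variant ever exposes a present entry that mixes a new low word with a stale high word.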
Peter Zijlstra Oct. 24, 2022, 10:21 a.m. UTC | #2
On Sat, Oct 22, 2022 at 10:42:52AM -0700, Linus Torvalds wrote:
> On Sat, Oct 22, 2022 at 4:48 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> >  static inline void native_set_pte(pte_t *ptep, pte_t pte)
> >  {
> > -       ptep->pte_high = pte.pte_high;
> > +       WRITE_ONCE(ptep->pte_high, pte.pte_high);
> >         smp_wmb();
> > -       ptep->pte_low = pte.pte_low;
> > +       WRITE_ONCE(ptep->pte_low, pte.pte_low);
> 
> With this, the smp_wmb() should just go away too. It was really only
> ever there as a compiler barrier.

Right, however I find it easier to reason about this with the smp_wmb()
there, esp. since the counterpart is in generic code and (must) carry
those smp_rmb()s.

Still, I can take them out if you prefer.

> Or do we already have a comment elsewhere about why the ordering is
> important (and how *clearing* clears the low word with the present bit
> first, but setting a *new* entry sets the high word first so that the
> 64-bit entry is complete when the present bit is set?)

There's a comment in include/linux/pgtable.h near ptep_get_lockless().

Now, I've been on the fence about making those READ_ONCE(). I think
KCSAN would want that, but I think the code is correct without them:
even if the loads get torn, we rely on the equality of the first and
third loads, and the barriers then guarantee the second load is coherent.
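A sketch of the lockless read being described: load the low word, then the high word, then re-load the low word and retry if it changed. It is modelled on the description in this thread, not copied from include/linux/pgtable.h. MY_READ_ONCE(), my_barrier() (standing in for smp_rmb(), which is a compiler barrier on x86), the race_hook test hook and replace_once() are all hypothetical. The hook only simulates, single-threaded, a writer landing between the loads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified READ_ONCE(): one volatile load. */
#define MY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
/* Compiler-only barrier; stands in for smp_rmb() on x86. */
#define my_barrier() __asm__ __volatile__("" : : : "memory")

struct fake_pte { uint32_t pte_low; uint32_t pte_high; };

/* Hypothetical hook run between the low and high loads, so a simulated
 * writer can change the entry mid-read. */
typedef void (*race_hook)(struct fake_pte *p, int attempt);

static struct fake_pte get_lockless(struct fake_pte *p, race_hook hook,
				    int *attempts)
{
	struct fake_pte v;
	int n = 0;

	do {
		v.pte_low = MY_READ_ONCE(p->pte_low);	/* load 1: low */
		my_barrier();
		if (hook)
			hook(p, n);
		v.pte_high = MY_READ_ONCE(p->pte_high);	/* load 2: high */
		my_barrier();
		n++;
	} while (v.pte_low != MY_READ_ONCE(p->pte_low)); /* load 3: recheck */

	if (attempts)
		*attempts = n;
	return v;
}

/* Hypothetical writer: on the first attempt only, replace the entry
 * (clear low then high, set high then low) between loads 1 and 2. */
static void replace_once(struct fake_pte *p, int attempt)
{
	if (attempt != 0)
		return;
	p->pte_low = 0u;
	p->pte_high = 0u;
	p->pte_high = 0x9u;
	p->pte_low = 0x00def001u;
}
```

With replace_once() the first attempt pairs the old low word with the new high word. The re-check of the low word catches the mismatch, and the second attempt returns the new entry intact.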

OTOH, if the stores (this patch) go funny and get torn, bad things can
happen: imagine it writing the byte with the present bit first and
then the other bytes (because the compiler is an evil bastard and wants a
giggle).
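To make the torn-store scenario concrete, this sketch models a compiler splitting one 32-bit store into four byte stores, with the byte holding the present bit written first. The values are hypothetical: an old, non-present low word whose other bits are non-zero, and a new present value. The intermediate state is present but still carries the stale upper bits of the old value. WRITE_ONCE() on a naturally aligned word rules out exactly this kind of state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: present bit is bit 0 of the low word. */
#define MY_PRESENT 0x1u

/* Replace byte k (0 = least significant) of *w with byte k of v; models
 * one byte-sized piece of a torn 32-bit store, independent of endianness. */
static void store_byte(uint32_t *w, uint32_t v, unsigned k)
{
	uint32_t mask = 0xffu << (8 * k);

	*w = (*w & ~mask) | (v & mask);
}

/* Hypothetical torn store: byte 0 (with the present bit) first, then the
 * rest. Returns the value visible right after the first byte lands. */
static uint32_t torn_store_present_first(uint32_t *w, uint32_t v)
{
	uint32_t after_first;

	store_byte(w, v, 0);
	after_first = *w;		/* what a concurrent walker could see */
	for (unsigned k = 1; k < 4; k++)
		store_byte(w, v, k);
	return after_first;
}
```

With an old value of 0x12345000 and a new value of 0x00abc001, the intermediate value is 0x12345001: present, yet neither the old nor the new entry.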

Patch

--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -27,9 +27,9 @@ 
  */
 static inline void native_set_pte(pte_t *ptep, pte_t pte)
 {
-	ptep->pte_high = pte.pte_high;
+	WRITE_ONCE(ptep->pte_high, pte.pte_high);
 	smp_wmb();
-	ptep->pte_low = pte.pte_low;
+	WRITE_ONCE(ptep->pte_low, pte.pte_low);
 }
 
 static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
@@ -58,16 +58,16 @@  static inline void native_set_pud(pud_t
 static inline void native_pte_clear(struct mm_struct *mm, unsigned long addr,
 				    pte_t *ptep)
 {
-	ptep->pte_low = 0;
+	WRITE_ONCE(ptep->pte_low, 0);
 	smp_wmb();
-	ptep->pte_high = 0;
+	WRITE_ONCE(ptep->pte_high, 0);
 }
 
 static inline void native_pmd_clear(pmd_t *pmdp)
 {
-	pmdp->pmd_low = 0;
+	WRITE_ONCE(pmdp->pmd_low, 0);
 	smp_wmb();
-	pmdp->pmd_high = 0;
+	WRITE_ONCE(pmdp->pmd_high, 0);
 }
 
 static inline void native_pud_clear(pud_t *pudp)