parisc: Fix extraction of hash lock bits in syscall.S

Message ID df51e873-4576-d4c2-7d86-b607cbb714b4@bell.net (mailing list archive)
State Accepted, archived

Commit Message

John David Anglin Nov. 18, 2021, 5:03 p.m. UTC
The extru instruction leaves the most significant 32 bits of the target register in an undefined
state on PA 2.0 systems.  If any of these bits are nonzero, this will break the calculation of the
lock pointer.

Fix by using extrd,u instruction on 64-bit kernels.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
---
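
The effect described in the commit message can be modelled outside the kernel. The following is a minimal Python sketch, not kernel code: `extru` and `extrd_u` are hypothetical helper names, the address is made up, and the `upper_garbage` parameter is only a stand-in for the architecturally undefined upper 32 bits. It assumes PA-RISC's left-to-right bit numbering, where the position operand names the rightmost bit of the field.

```python
MASK64 = (1 << 64) - 1

def extru(r, p, length, upper_garbage=0):
    """Model of 32-bit `extru r, p, len, t`.

    Bit 0 is the most significant bit of the low word, so position p maps
    to a right shift of (31 - p). `upper_garbage` is a modelling assumption
    standing in for the undefined upper half of the 64-bit target register.
    """
    low = ((r & 0xFFFFFFFF) >> (31 - p)) & ((1 << length) - 1)
    return ((upper_garbage & 0xFFFFFFFF) << 32) | low

def extrd_u(r, p, length):
    """Model of 64-bit `extrd,u r, p, len, t`: the result is zero-extended."""
    return ((r & MASK64) >> (63 - p)) & ((1 << length) - 1)

addr = 0x0000_0000_4001_2348          # hypothetical user address in %r26
clean = extrd_u(addr, 60, 8)          # the 64-bit form used by the fix
dirty = extru(addr, 28, 8, upper_garbage=0xDEADBEEF)

print(hex(clean))   # 0x69
print(hex(dirty))   # 0xdeadbeef00000069
```

Both forms select the same eight bits; only the 64-bit form guarantees that everything above them is zero.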

Comments

Helge Deller Nov. 18, 2021, 7:24 p.m. UTC | #1
On 11/18/21 18:03, John David Anglin wrote:
> The extru instruction leaves the most significant 32 bits of the target register in an undefined
> state on PA 2.0 systems.  If any of these bits are nonzero, this will break the calculation of the
> lock pointer.
>
> Fix by using extrd,u instruction on 64-bit kernels.

Good catch!!
Did you check whether the most significant 32 bits
were actually non-zero?
If so, could this be one of the reasons we saw strange
issues or even memory corruption?

Sadly I sent a pull request to Linus a few hours ago,
otherwise I would have added this patch...

Helge

> Signed-off-by: John David Anglin <dave.anglin@bell.net>
> ---
> diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
> index 3f24a0af1e04..3f70528622eb 100644
> --- a/arch/parisc/kernel/syscall.S
> +++ b/arch/parisc/kernel/syscall.S
> @@ -572,7 +572,11 @@ lws_compare_and_swap:
>      ldo    R%lws_lock_start(%r20), %r28
>
>      /* Extract eight bits from r26 and hash lock (Bits 3-11) */
> +#ifdef CONFIG_64BIT
> +    extrd,u  %r26, 60, 8, %r20
> +#else
>      extru  %r26, 28, 8, %r20
> +#endif
>
>      /* Find lock to use, the hash is either one of 0 to
>         15, multiplied by 16 (keep it 16-byte aligned)
> @@ -762,7 +761,11 @@ cas2_lock_start:
>      ldo    R%lws_lock_start(%r20), %r28
>
>      /* Extract eight bits from r26 and hash lock (Bits 3-11) */
> +#ifdef CONFIG_64BIT
> +    extrd,u  %r26, 60, 8, %r20
> +#else
>      extru  %r26, 28, 8, %r20
> +#endif
>
>      /* Find lock to use, the hash is either one of 0 to
>         15, multiplied by 16 (keep it 16-byte aligned)
John David Anglin Nov. 18, 2021, 7:47 p.m. UTC | #2
On 2021-11-18 2:24 p.m., Helge Deller wrote:
> On 11/18/21 18:03, John David Anglin wrote:
>> The extru instruction leaves the most significant 32 bits of the target register in an undefined
>> state on PA 2.0 systems.  If any of these bits are nonzero, this will break the calculation of the
>> lock pointer.
>>
>> Fix by using extrd,u instruction on 64-bit kernels.
> Good catch!!
> Did you check whether the most significant 32 bits
> were actually non-zero?
No.  I tend to think the bits are always zero but the arch says they are undefined.
> If so, could this be one of the reasons we saw strange
> issues or even memory corruption?
Possibly but I wouldn't be too hopeful that it will make a big difference.
>
> Sadly I sent a pull request to Linus a few hours ago,
> otherwise I would have added this patch...
I just noticed the problem yesterday.  I was looking at the failure of glibc's tst-cleanupx4:

dave@mx3210:~/gnu/glibc/objdir$ env GCONV_PATH=/home/dave/gnu/glibc/objdir/iconvdata LOCPATH=/home/dave/gnu/glibc/objdir/localedata LC_ALL=C \
    /home/dave/gnu/glibc/objdir/elf/ld.so.1 --library-path \
    /home/dave/gnu/glibc/objdir:/home/dave/gnu/glibc/objdir/math:/home/dave/gnu/glibc/objdir/elf:/home/dave/gnu/glibc/objdir/dlfcn:/home/dave/gnu/glibc/objdir/nss:/home/dave/gnu/glibc/objdir/nis:/home/dave/gnu/glibc/objdir/rt:/home/dave/gnu/glibc/objdir/resolv:/home/dave/gnu/glibc/objdir/mathvec:/home/dave/gnu/glibc/objdir/support:/home/dave/gnu/glibc/objdir/crypt:/home/dave/gnu/glibc/objdir/nptl \
    /home/dave/gnu/glibc/objdir/nptl/tst-cleanupx4
test 0
clh (2)
clh (1)
clh (3)
global = 12, expected 15
[...]

As far as I can tell, clh() is called in the wrong order - should be 1, 2, 3.  This gives the expected value of 15.  2, 1, 3 yields 12.
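
The two totals can be reproduced with a small sketch. It assumes each clh(val) call updates the counter as global = global * val + val, starting from 0. That update rule is not quoted in this thread; it is an assumption that happens to match both numbers above. `run_handlers` is a hypothetical helper name.

```python
def run_handlers(order, start=0):
    """Apply the assumed clh() update rule for each handler argument in turn."""
    total = start
    for val in order:
        total = total * val + val   # assumed: global *= val; global += val
    return total

print(run_handlers([1, 2, 3]))  # 15
print(run_handlers([2, 1, 3]))  # 12
```

Under that assumption the result depends on call order, which is why a reordering of the handlers shows up as 12 instead of 15.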

This suggests our atomic operations are broken.  I think the problem may be that atomic loads need to be sequenced
with the LWS lock.  While sequencing stores is obvious, this is not obvious for loads.  Anyway, I started hacking on syscall.S
to provide lws_atomic_load and lws_atomic_store operations.  Currently, atomic stores are done using a CAS operation.  This
is less efficient than it could be.

Another little issue is "because" is misspelled in a couple of places in syscall.S.

Dave
John David Anglin Nov. 18, 2021, 7:55 p.m. UTC | #3
On 2021-11-18 2:47 p.m., John David Anglin wrote:
> I just noticed the problem yesterday.  I was looking at the failure of glibc's tst-cleanupx4:
>
> dave@mx3210:~/gnu/glibc/objdir$ env GCONV_PATH=/home/dave/gnu/glibc/objdir/iconvdata LOCPATH=/home/dave/gnu/glibc/objdir/localedata LC_ALL=C 
> /home/dave/gnu/glibc/objdir/elf/ld.so.1 --library-path 
> /home/dave/gnu/glibc/objdir:/home/dave/gnu/glibc/objdir/math:/home/dave/gnu/glibc/objdir/elf:/home/dave/gnu/glibc/objdir/dlfcn:/home/dave/gnu/glibc/objdir/nss:/home/dave/gnu/glibc/objdir/nis:/home/dave/gnu/glibc/objdir/rt:/home/dave/gnu/glibc/objdir/resolv:/home/dave/gnu/glibc/objdir/mathvec:/home/dave/gnu/glibc/objdir/support:/home/dave/gnu/glibc/objdir/crypt:/home/dave/gnu/glibc/objdir/nptl 
> /home/dave/gnu/glibc/objdir/nptl/tst-cleanupx4
> test 0
> clh (2)
> clh (1)
> clh (3)
> global = 12, expected 15
> [...]
>
> As far as I can tell, clh() is called in the wrong order - should be 1, 2, 3.  This gives the expected value of 15.  2, 1, 3 yields 12.
I see the same order on a c3750 with one CPU.

Dave
Helge Deller Nov. 19, 2021, 3:56 p.m. UTC | #4
* John David Anglin <dave.anglin@bell.net>:
> The extru instruction leaves the most significant 32 bits of the target register in an undefined
> state on PA 2.0 systems.  If any of these bits are nonzero, this will break the calculation of the
> lock pointer.
>
> Fix by using extrd,u instruction on 64-bit kernels.

I wonder if we shouldn't introduce an extru_safe() macro.
The name doesn't matter, but that way we can get rid of the ifdefs and
use it in other places as well, e.g. as seen below.
Thoughts?

Helge

diff --git a/arch/parisc/include/asm/assembly.h b/arch/parisc/include/asm/assembly.h
index 7085df079702..9c5f0fc67400 100644
--- a/arch/parisc/include/asm/assembly.h
+++ b/arch/parisc/include/asm/assembly.h
@@ -143,6 +143,16 @@
 	extrd,u \r, 63-(\sa), 64-(\sa), \t
 	.endm

+	/* The extru instruction leaves the most significant 32 bits of the
+	 * target register in an undefined state on PA 2.0 systems. */
+	.macro extru_safe r, p, len, t
+#ifdef CONFIG_64BIT
+	extrd,u	\r, 32+(\p), \len, \t
+#else
+	extru	\r, \p, \len, \t
+#endif
+	.endm
+
 	/* load 32-bit 'value' into 'reg' compensating for the ldil
 	 * sign-extension when running in wide mode.
 	 * WARNING!! neither 'value' nor 'reg' can be expressions
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 88c188a965d8..6e9cdb269862 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -366,17 +366,9 @@
 	 */
 	.macro		L2_ptep	pmd,pte,index,va,fault
 #if CONFIG_PGTABLE_LEVELS == 3
-	extru		\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
+	extru_safe	\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
 #else
-# if defined(CONFIG_64BIT)
-	extrd,u		\va,63-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  #else
-  # if PAGE_SIZE > 4096
-	extru		\va,31-ASM_PGDIR_SHIFT,32-ASM_PGDIR_SHIFT,\index
-  # else
-	extru		\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  # endif
-# endif
+	extru_safe	\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
 #endif
 	dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
 #if CONFIG_PGTABLE_LEVELS < 3
@@ -386,7 +378,7 @@
 	bb,>=,n		\pmd,_PxD_PRESENT_BIT,\fault
 	dep		%r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
 	SHLREG		\pmd,PxD_VALUE_SHIFT,\pmd
-	extru		\va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
+	extru_safe	\va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
 	dep		%r0,31,PAGE_SHIFT,\pmd  /* clear offset */
 	shladd		\index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
 	.endm
diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 4fb3b6a993bf..d2497b339d13 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -566,7 +566,7 @@ lws_compare_and_swap:
 	ldo	R%lws_lock_start(%r20), %r28

 	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
-	extru  %r26, 28, 8, %r20
+	extru_safe  %r26, 28, 8, %r20

 	/* Find lock to use, the hash is either one of 0 to
 	   15, multiplied by 16 (keep it 16-byte aligned)
@@ -751,7 +751,7 @@ cas2_lock_start:
 	ldo	R%lws_lock_start(%r20), %r28

 	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
-	extru  %r26, 28, 8, %r20
+	extru_safe  %r26, 28, 8, %r20

 	/* Find lock to use, the hash is either one of 0 to
 	   15, multiplied by 16 (keep it 16-byte aligned)
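
The position arithmetic in the proposed macro (`32+(\p)`) can be checked with a small Python model. This is a sketch under stated assumptions, not a statement about the assembler: `field32`, `field64` and the Python `extru_safe` are hypothetical helpers, the test value is made up, and the field is assumed to fit inside the low 32-bit word (len <= p + 1).

```python
def field32(value, p, length):
    """extru view: bit 0 is the MSB of the low 32-bit word, p is the field's rightmost bit."""
    return ((value & 0xFFFFFFFF) >> (31 - p)) & ((1 << length) - 1)

def field64(value, p, length):
    """extrd,u view: bit 0 is the MSB of the full 64-bit register."""
    return ((value & 0xFFFFFFFFFFFFFFFF) >> (63 - p)) & ((1 << length) - 1)

def extru_safe(value, p, length, wide):
    """Mirror of the proposed macro: shift the position by 32 on 64-bit kernels."""
    if wide:
        return field64(value, 32 + p, length)
    return field32(value, p, length)

# Upper word deliberately nonzero to show it never leaks into the result.
value = 0xFFFF_FFFF_1234_5678

mismatches = [
    (p, length)
    for p in range(32)
    for length in range(1, p + 2)   # field must fit inside the low word
    if extru_safe(value, p, length, wide=True) != extru_safe(value, p, length, wide=False)
]
print(mismatches)                                 # []
print(hex(extru_safe(value, 28, 8, wide=True)))   # 0xcf
```

In this model the two branches agree for every position and length that stays within the low word, so callers would not need their own #ifdef.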
John David Anglin Nov. 19, 2021, 8:27 p.m. UTC | #5
On 2021-11-19 10:56 a.m., Helge Deller wrote:
> * John David Anglin <dave.anglin@bell.net>:
>> The extru instruction leaves the most significant 32 bits of the target register in an undefined
>> state on PA 2.0 systems.  If any of these bits are nonzero, this will break the calculation of the
>> lock pointer.
>>
>> Fix by using extrd,u instruction on 64-bit kernels.
> I wonder if we shouldn't introduce an extru_safe() macro.
> The name doesn't matter, but that way we can get rid of the ifdefs and
> use it in other places as well, e.g. as seen below.
> Thoughts?
Seems like a good idea.

Only question is this hunk

@@ -366,17 +366,9 @@
       */
      .macro        L2_ptep    pmd,pte,index,va,fault
  #if CONFIG_PGTABLE_LEVELS == 3
-    extru        \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
+    extru_safe    \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
  #else
-# if defined(CONFIG_64BIT)
-    extrd,u        \va,63-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  #else
-  # if PAGE_SIZE > 4096
-    extru        \va,31-ASM_PGDIR_SHIFT,32-ASM_PGDIR_SHIFT,\index
-  # else
-    extru        \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  # endif
-# endif
+    extru_safe    \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
  #endif
      dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
  #if CONFIG_PGTABLE_LEVELS < 3

where we lose the PAGE_SIZE > 4096 shift.

Dave
Helge Deller Nov. 19, 2021, 8:41 p.m. UTC | #6
On 11/19/21 21:27, John David Anglin wrote:
> On 2021-11-19 10:56 a.m., Helge Deller wrote:
>> * John David Anglin <dave.anglin@bell.net>:
>>> The extru instruction leaves the most significant 32 bits of the target register in an undefined
>>> state on PA 2.0 systems.  If any of these bits are nonzero, this will break the calculation of the
>>> lock pointer.
>>>
>>> Fix by using extrd,u instruction on 64-bit kernels.
>> I wonder if we shouldn't introduce an extru_safe() macro.
>> The name doesn't matter, but that way we can get rid of the ifdefs and
>> use it in other places as well, e.g. as seen below.
>> Thoughts?
> Seems like a good idea.
>
> Only question is this hunk
>
> @@ -366,17 +366,9 @@
>       */
>      .macro        L2_ptep    pmd,pte,index,va,fault
>  #if CONFIG_PGTABLE_LEVELS == 3
> -    extru        \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
> +    extru_safe    \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
>  #else
> -# if defined(CONFIG_64BIT)
> -    extrd,u        \va,63-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> -  #else
> -  # if PAGE_SIZE > 4096
> -    extru        \va,31-ASM_PGDIR_SHIFT,32-ASM_PGDIR_SHIFT,\index
> -  # else
> -    extru        \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> -  # endif
> -# endif
> +    extru_safe    \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
>  #endif
>      dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>  #if CONFIG_PGTABLE_LEVELS < 3
>
> where we lose the PAGE_SIZE > 4096 shift.

That's a left-over.
PAGE_SIZE > 4096 can only be enabled on PA20 and is currently marked broken anyway.
The #if was there so that it could theoretically be used with 32-bit kernels, where
the extru length extended left into the upper 32 bits...

Helge

Patch

diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 3f24a0af1e04..3f70528622eb 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -572,7 +572,11 @@  lws_compare_and_swap:
  	ldo	R%lws_lock_start(%r20), %r28

  	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
+#ifdef CONFIG_64BIT
+	extrd,u  %r26, 60, 8, %r20
+#else
  	extru  %r26, 28, 8, %r20
+#endif

  	/* Find lock to use, the hash is either one of 0 to
  	   15, multiplied by 16 (keep it 16-byte aligned)
@@ -762,7 +761,11 @@  cas2_lock_start:
  	ldo	R%lws_lock_start(%r20), %r28

  	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
+#ifdef CONFIG_64BIT
+	extrd,u  %r26, 60, 8, %r20
+#else
  	extru  %r26, 28, 8, %r20
+#endif

  	/* Find lock to use, the hash is either one of 0 to
  	   15, multiplied by 16 (keep it 16-byte aligned)
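
To illustrate why stray upper bits matter for the lock pointer, here is a minimal Python sketch. The scale-and-add step is not shown in the hunks above; it is assumed from the comment (hash multiplied by 16 and added to the lock table), and the addition is assumed to be a full 64-bit add. `lock_address`, `table_base` and the hash values are hypothetical.

```python
MASK64 = (1 << 64) - 1
LOCK_STRIDE = 16  # from the comment: entries are kept 16-byte aligned

def lock_address(table_base, hash_reg):
    """64-bit pointer arithmetic: table base plus hash scaled by the stride."""
    return (table_base + hash_reg * LOCK_STRIDE) & MASK64

table_base = 0x0000_0000_4010_0000   # hypothetical address of lws_lock_start
clean_hash = 0x69                    # zero-extended result, as with extrd,u
dirty_hash = (0x1 << 32) | 0x69      # same field, one stray upper bit set

good = lock_address(table_base, clean_hash)
bad = lock_address(table_base, dirty_hash)

print(hex(good))        # 0x40100690
print(hex(bad - good))  # 0x1000000000
```

A single nonzero bit above the extracted field moves the computed pointer far outside the lock table, which is the failure mode the commit message warns about.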