Message ID | df51e873-4576-d4c2-7d86-b607cbb714b4@bell.net (mailing list archive) |
---|---|
State | Accepted, archived |
Series | parisc: Fix extraction of hash lock bits in syscall.S |
On 11/18/21 18:03, John David Anglin wrote:
> The extru instruction leaves the most significant 32 bits of the target register in an undefined
> state on PA 2.0 systems. If any of these bits are nonzero, this will break the calculation of the
> lock pointer.
>
> Fix by using extrd,u instruction on 64-bit kernels.

Good catch!!
Did you check if it actually happened that the most significant 32 bits were non-zero?
If so, could this be one of the reasons we saw strange issues or even memory corruption?

Sadly I sent a pull request to Linus a few hours ago, otherwise I would have added this patch...

Helge

> Signed-off-by: John David Anglin <dave.anglin@bell.net>
> ---
> diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
> index 3f24a0af1e04..3f70528622eb 100644
> --- a/arch/parisc/kernel/syscall.S
> +++ b/arch/parisc/kernel/syscall.S
> @@ -572,7 +572,11 @@ lws_compare_and_swap:
>  	ldo	R%lws_lock_start(%r20), %r28
>
>  	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
> +#ifdef CONFIG_64BIT
> +	extrd,u %r26, 60, 8, %r20
> +#else
>  	extru  %r26, 28, 8, %r20
> +#endif
>
>  	/* Find lock to use, the hash is either one of 0 to
>  	   15, multiplied by 16 (keep it 16-byte aligned)
> @@ -762,7 +761,11 @@ cas2_lock_start:
>  	ldo	R%lws_lock_start(%r20), %r28
>
>  	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
> +#ifdef CONFIG_64BIT
> +	extrd,u %r26, 60, 8, %r20
> +#else
>  	extru  %r26, 28, 8, %r20
> +#endif
>
>  	/* Find lock to use, the hash is either one of 0 to
>  	   15, multiplied by 16 (keep it 16-byte aligned)
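For readers following the thread, here is a minimal C sketch of why the upper bits matter. It is a model under stated assumptions, not kernel code: the function names are hypothetical, `junk` stands in for whatever a PA 2.0 CPU might leave in the upper half after extru, and the lock lookup is reduced to "index times 16 plus base" as described in the comment in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of extrd,u r,pos,len,t. PA-RISC numbers bits from the most
 * significant end, so bit 63 is the LSB of a 64-bit register. The field's
 * rightmost bit sits at `pos`; the result is zero-extended. */
static uint64_t model_extrd_u(uint64_t r, unsigned pos, unsigned len)
{
    assert(pos <= 63 && len >= 1 && len <= pos + 1 && len < 64);
    return (r >> (63 - pos)) & ((UINT64_C(1) << len) - 1);
}

/* Model of extru r,pos,len,t on a PA 2.0 CPU: only the low 32 bits of the
 * result are defined. `junk` is a hypothetical stand-in for whatever the
 * hardware leaves in the upper half. */
static uint64_t model_extru_pa20(uint64_t r, unsigned pos, unsigned len,
                                 uint32_t junk)
{
    assert(pos <= 31 && len >= 1 && len <= pos + 1);
    uint32_t low = (uint32_t)(((uint32_t)r >> (31 - pos)) &
                              ((UINT64_C(1) << len) - 1));
    return ((uint64_t)junk << 32) | low;
}

/* Simplified lock lookup (assumption): the extracted index is multiplied
 * by 16 and added to the lock table base using full 64-bit arithmetic. */
static uint64_t model_lock_addr(uint64_t base, uint64_t index)
{
    return base + (index << 4);
}
```

With a clean upper half, `model_extru_pa20(addr, 28, 8, 0)` and `model_extrd_u(addr, 60, 8)` select the same eight bits, so both yield the same lock address. Any nonzero `junk` shifts the computed address far outside the lock table, which is the failure the patch guards against.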
On 2021-11-18 2:24 p.m., Helge Deller wrote:
> On 11/18/21 18:03, John David Anglin wrote:
>> The extru instruction leaves the most significant 32 bits of the target register in an undefined
>> state on PA 2.0 systems. If any of these bits are nonzero, this will break the calculation of the
>> lock pointer.
>>
>> Fix by using extrd,u instruction on 64-bit kernels.
> Good catch!!
> Did you check if it actually happened that the most
> significant 32 bits were non-zero?

No. I tend to think the bits are always zero, but the arch says they are undefined.

> If so, could this be one of the reasons we saw strange
> issues or even memory corruption?

Possibly, but I wouldn't be too hopeful that it will make a big difference.

>
> Sadly I sent a pull request to Linus a few hours ago,
> otherwise I would have added this patch...

I just noticed the problem yesterday. I was looking at the failure of glibc's tst-cleanupx4:

dave@mx3210:~/gnu/glibc/objdir$ env GCONV_PATH=/home/dave/gnu/glibc/objdir/iconvdata LOCPATH=/home/dave/gnu/glibc/objdir/localedata LC_ALL=C /home/dave/gnu/glibc/objdir/elf/ld.so.1 --library-path /home/dave/gnu/glibc/objdir:/home/dave/gnu/glibc/objdir/math:/home/dave/gnu/glibc/objdir/elf:/home/dave/gnu/glibc/objdir/dlfcn:/home/dave/gnu/glibc/objdir/nss:/home/dave/gnu/glibc/objdir/nis:/home/dave/gnu/glibc/objdir/rt:/home/dave/gnu/glibc/objdir/resolv:/home/dave/gnu/glibc/objdir/mathvec:/home/dave/gnu/glibc/objdir/support:/home/dave/gnu/glibc/objdir/crypt:/home/dave/gnu/glibc/objdir/nptl /home/dave/gnu/glibc/objdir/nptl/tst-cleanupx4
test 0
clh (2)
clh (1)
clh (3)
global = 12, expected 15
[...]

As far as I can tell, clh() is called in the wrong order - should be 1, 2, 3. This gives the expected value of 15.
2, 1, 3 yields 12.

This suggests our atomic operations are broken. I think the problem may be that atomic loads may need to be
sequenced with the LWS lock. While sequencing stores is obvious, this is not obvious for loads.

Anyway, I started hacking on syscall.S to provide lws_atomic_load and lws_atomic_store operations. Currently,
atomic stores are done using the CAS operation. This is less efficient than it could be.

Another little issue is that "because" is misspelled in a couple of places in syscall.S.

Dave
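The two totals in the test output can be reproduced with a small C sketch. The update rule used here (global = global * val + val) is an assumption picked because it matches both numbers quoted above; it is not copied from the glibc test source, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Replays cleanup handlers in the given order. The update rule
 * (global = global * val + val) is an assumption chosen because it
 * reproduces both totals quoted in the thread. */
static int replay_cleanup_handlers(const int *order, size_t count)
{
    assert(order != NULL || count == 0);
    int global = 0;
    for (size_t i = 0; i < count; i++) {
        global *= order[i];
        global += order[i];
    }
    return global;
}
```

Under that assumption the order 1, 2, 3 gives 1, then 4, then 15, while 2, 1, 3 gives 2, then 3, then 12. The result depends on the order of the calls, which is why a total of 12 points at the handlers running out of sequence.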
On 2021-11-18 2:47 p.m., John David Anglin wrote:
> I just noticed the problem yesterday. I was looking at the failure of glibc's tst-cleanupx4:
>
> dave@mx3210:~/gnu/glibc/objdir$ env GCONV_PATH=/home/dave/gnu/glibc/objdir/iconvdata LOCPATH=/home/dave/gnu/glibc/objdir/localedata LC_ALL=C
> /home/dave/gnu/glibc/objdir/elf/ld.so.1 --library-path
> /home/dave/gnu/glibc/objdir:/home/dave/gnu/glibc/objdir/math:/home/dave/gnu/glibc/objdir/elf:/home/dave/gnu/glibc/objdir/dlfcn:/home/dave/gnu/glibc/objdir/nss:/home/dave/gnu/glibc/objdir/nis:/home/dave/gnu/glibc/objdir/rt:/home/dave/gnu/glibc/objdir/resolv:/home/dave/gnu/glibc/objdir/mathvec:/home/dave/gnu/glibc/objdir/support:/home/dave/gnu/glibc/objdir/crypt:/home/dave/gnu/glibc/objdir/nptl
> /home/dave/gnu/glibc/objdir/nptl/tst-cleanupx4
> test 0
> clh (2)
> clh (1)
> clh (3)
> global = 12, expected 15
> [...]
>
> As far as I can tell, clh() is called in the wrong order - should be 1, 2, 3. This gives the expected value of 15.
> 2, 1, 3 yields 12.

I see the same order on c3750 with one CPU.

Dave
* John David Anglin <dave.anglin@bell.net>:
> The extru instruction leaves the most significant 32 bits of the target register in an undefined
> state on PA 2.0 systems. If any of these bits are nonzero, this will break the calculation of the
> lock pointer.
>
> Fix by using extrd,u instruction on 64-bit kernels.

I wonder if we shouldn't introduce an extru_safe() macro.
The name doesn't matter, but that way we can get rid of the ifdefs and
use it in other places as well, e.g. as seen below.
Thoughts?

Helge

diff --git a/arch/parisc/include/asm/assembly.h b/arch/parisc/include/asm/assembly.h
index 7085df079702..9c5f0fc67400 100644
--- a/arch/parisc/include/asm/assembly.h
+++ b/arch/parisc/include/asm/assembly.h
@@ -143,6 +143,16 @@
 	extrd,u \r, 63-(\sa), 64-(\sa), \t
 	.endm
 
+	/* The extru instruction leaves the most significant 32 bits of the
+	 * target register in an undefined state on PA 2.0 systems. */
+	.macro extru_safe r, p, len, t
+#ifdef CONFIG_64BIT
+	extrd,u	\r, 32+(\p), \len, \t
+#else
+	extru	\r, \p, \len, \t
+#endif
+	.endm
+
 	/* load 32-bit 'value' into 'reg' compensating for the ldil
 	 * sign-extension when running in wide mode.
 	 * WARNING!! neither 'value' nor 'reg' can be expressions
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index 88c188a965d8..6e9cdb269862 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -366,17 +366,9 @@
 	 */
 	.macro		L2_ptep	pmd,pte,index,va,fault
 #if CONFIG_PGTABLE_LEVELS == 3
-	extru		\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
+	extru_safe	\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
 #else
-# if defined(CONFIG_64BIT)
-	extrd,u		\va,63-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  #else
-  # if PAGE_SIZE > 4096
-	extru		\va,31-ASM_PGDIR_SHIFT,32-ASM_PGDIR_SHIFT,\index
-  # else
-	extru		\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  # endif
-# endif
+	extru_safe	\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
 #endif
 	dep		%r0,31,PAGE_SHIFT,\pmd  /* clear offset */
 #if CONFIG_PGTABLE_LEVELS < 3
@@ -386,7 +378,7 @@
 	bb,>=,n		\pmd,_PxD_PRESENT_BIT,\fault
 	dep		%r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
 	SHLREG		\pmd,PxD_VALUE_SHIFT,\pmd
-	extru		\va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
+	extru_safe	\va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
 	dep		%r0,31,PAGE_SHIFT,\pmd  /* clear offset */
 	shladd		\index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
 	.endm
diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 4fb3b6a993bf..d2497b339d13 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -566,7 +566,7 @@ lws_compare_and_swap:
 	ldo	R%lws_lock_start(%r20), %r28
 
 	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
-	extru  %r26, 28, 8, %r20
+	extru_safe  %r26, 28, 8, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
 	   15, multiplied by 16 (keep it 16-byte aligned)
@@ -751,7 +751,7 @@ cas2_lock_start:
 	ldo	R%lws_lock_start(%r20), %r28
 
 	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
-	extru  %r26, 28, 8, %r20
+	extru_safe  %r26, 28, 8, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
 	   15, multiplied by 16 (keep it 16-byte aligned)
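The position arithmetic in the proposed macro (32+(\p) on 64-bit kernels) can be checked with a short C model. This is a sketch with hypothetical function names, and it assumes the field lies entirely inside the low 32 bits (len <= p + 1), which is the case where the two forms are expected to agree.

```c
#include <assert.h>
#include <stdint.h>

/* extru on a 32-bit value: bit 31 is the LSB, the field ends at `p`. */
static uint32_t sketch_extru32(uint32_t r, unsigned p, unsigned len)
{
    assert(p <= 31 && len >= 1 && len <= p + 1);
    return (uint32_t)((r >> (31 - p)) & ((UINT64_C(1) << len) - 1));
}

/* extrd,u on a 64-bit value: bit 63 is the LSB, the field ends at `p`. */
static uint64_t sketch_extrd_u(uint64_t r, unsigned p, unsigned len)
{
    assert(p <= 63 && len >= 1 && len <= p + 1 && len < 64);
    return (r >> (63 - p)) & ((UINT64_C(1) << len) - 1);
}

/* Model of the CONFIG_64BIT branch of the proposed macro: the same field,
 * with the position moved by 32 to account for 64-bit bit numbering. */
static uint64_t sketch_extru_safe64(uint64_t r, unsigned p, unsigned len)
{
    assert(p <= 31 && len >= 1 && len <= p + 1);
    return sketch_extrd_u(r, 32 + p, len);
}
```

For any register value, `sketch_extru_safe64(reg, 28, 8)` matches `sketch_extru32((uint32_t)reg, 28, 8)` and the upper half of the result is zero, whatever the upper half of `reg` contains. A field that reaches past bit 0 of the 32-bit numbering would not satisfy the assumption.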
On 2021-11-19 10:56 a.m., Helge Deller wrote:
> * John David Anglin<dave.anglin@bell.net>:
>> The extru instruction leaves the most significant 32 bits of the target register in an undefined
>> state on PA 2.0 systems. If any of these bits are nonzero, this will break the calculation of the
>> lock pointer.
>>
>> Fix by using extrd,u instruction on 64-bit kernels.
> I wonder if we shouldn't introduce an extru_safe() macro.
> The name doesn't matter, but that way we can get rid of the ifdefs and
> use it in other places as well, e.g. as seen below.
> Thoughts?

Seems like a good idea.

The only question is this hunk

@@ -366,17 +366,9 @@
 	 */
 	.macro		L2_ptep	pmd,pte,index,va,fault
 #if CONFIG_PGTABLE_LEVELS == 3
-	extru		\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
+	extru_safe	\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
 #else
-# if defined(CONFIG_64BIT)
-	extrd,u		\va,63-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  #else
-  # if PAGE_SIZE > 4096
-	extru		\va,31-ASM_PGDIR_SHIFT,32-ASM_PGDIR_SHIFT,\index
-  # else
-	extru		\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
-  # endif
-# endif
+	extru_safe	\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
 #endif
 	dep		%r0,31,PAGE_SHIFT,\pmd  /* clear offset */
 #if CONFIG_PGTABLE_LEVELS < 3

where we lose the PAGE_SIZE > 4096 shift.

Dave
On 11/19/21 21:27, John David Anglin wrote:
> On 2021-11-19 10:56 a.m., Helge Deller wrote:
>> * John David Anglin<dave.anglin@bell.net>:
>>> The extru instruction leaves the most significant 32 bits of the target register in an undefined
>>> state on PA 2.0 systems. If any of these bits are nonzero, this will break the calculation of the
>>> lock pointer.
>>>
>>> Fix by using extrd,u instruction on 64-bit kernels.
>> I wonder if we shouldn't introduce an extru_safe() macro.
>> The name doesn't matter, but that way we can get rid of the ifdefs and
>> use it in other places as well, e.g. as seen below.
>> Thoughts?
> Seems like a good idea.
>
> The only question is this hunk
>
> @@ -366,17 +366,9 @@
>  	 */
>  	.macro		L2_ptep	pmd,pte,index,va,fault
>  #if CONFIG_PGTABLE_LEVELS == 3
> -	extru		\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
> +	extru_safe	\va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
>  #else
> -# if defined(CONFIG_64BIT)
> -	extrd,u		\va,63-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> -  #else
> -  # if PAGE_SIZE > 4096
> -	extru		\va,31-ASM_PGDIR_SHIFT,32-ASM_PGDIR_SHIFT,\index
> -  # else
> -	extru		\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> -  # endif
> -# endif
> +	extru_safe	\va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
>  #endif
>  	dep		%r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>  #if CONFIG_PGTABLE_LEVELS < 3
>
> where we lose the PAGE_SIZE > 4096 shift.

That's a left-over. PAGE_SIZE > 4096 can only be enabled on PA20 and is currently marked broken anyway.
The #if was there so that it could theoretically be used with 32-bit kernels, where the extru length
extended left into the upper 32 bits...

Helge
diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 3f24a0af1e04..3f70528622eb 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -572,7 +572,11 @@ lws_compare_and_swap:
 	ldo	R%lws_lock_start(%r20), %r28
 
 	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
+#ifdef CONFIG_64BIT
+	extrd,u %r26, 60, 8, %r20
+#else
 	extru  %r26, 28, 8, %r20
+#endif
 
 	/* Find lock to use, the hash is either one of 0 to
 	   15, multiplied by 16 (keep it 16-byte aligned)
@@ -762,7 +761,11 @@ cas2_lock_start:
 	ldo	R%lws_lock_start(%r20), %r28
 
 	/* Extract eight bits from r26 and hash lock (Bits 3-11) */
+#ifdef CONFIG_64BIT
+	extrd,u %r26, 60, 8, %r20
+#else
 	extru  %r26, 28, 8, %r20
+#endif
 
 	/* Find lock to use, the hash is either one of 0 to
 	   15, multiplied by 16 (keep it 16-byte aligned)
The extru instruction leaves the most significant 32 bits of the target register in an undefined
state on PA 2.0 systems. If any of these bits are nonzero, this will break the calculation of the
lock pointer.

Fix by using extrd,u instruction on 64-bit kernels.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
---