Message ID | 20170927155007.GA16211@arm.com (mailing list archive) |
---|---|
State | New, archived |
On 9/27/2017 9:50 AM, Will Deacon wrote:
> On Tue, Sep 26, 2017 at 06:31:12PM +0100, Will Deacon wrote:
>> On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
>>> On 9/26/2017 4:23 AM, Will Deacon wrote:
>>>> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
>>>>> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
>>>>> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
>>>>> page I was not able to reproduce. RH also reported it here: https://
>>>>> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
>>>>> (4.12) on Centriq2400 and ThunderX
>>>>>
>>>>>
>>>>> https://bugs.linaro.org/show_bug.cgi?id=3191
>>>>>
>>>>> https://bugs.linaro.org/show_bug.cgi?id=3068.
>>>> These two aren't the same bug (that's a forward progress issue that we're
>>>> currently working on). I don't have permission to look at the redhat one,
>>>> but is it just an RCU stall or actually the Oops reported by Yury?
>>>>
>>>>> I was able to bisect down to a specific commit.
>>>> I think we're chasing two different things here, so not sure I trust the
>>>> bisect!
>>>>
>>> The RCU stall is side effect. The issue I'm seeing has the same stack
>>> trace and same stimulus (rwtest). Following are the details.
>> FWIW, I think I've worked out what's going on here and I should have a patch
>> tomorrow.
> Diff below. I'm going to follow up with a separate thread about this,
> because the proper fix is going to be invasive. I'll keep you on cc.
>
> Out of curiosity: what version of GCC are you using to compile the kernel?
I'm using gcc-linaro-6.3.1-2017.02-x86_64_aarch64-linux-gnu
Thanks for the patch, test results to follow.
Richard
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index bc4e92337d16..b46e54c2399b 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>  /* Find an entry in the third-level page table. */
>  #define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
>  
> -#define pte_offset_phys(dir,addr) (pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
> +#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
>  #define pte_offset_kernel(dir,addr) ((pte_t *)__va(pte_offset_phys((dir), (addr))))
>  
>  #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))
On 9/27/2017 12:00 PM, Richard Ruigrok wrote:
>
> On 9/27/2017 9:50 AM, Will Deacon wrote:
>> On Tue, Sep 26, 2017 at 06:31:12PM +0100, Will Deacon wrote:
>>> On Tue, Sep 26, 2017 at 08:23:35AM -0600, Ruigrok, Richard wrote:
>>>> On 9/26/2017 4:23 AM, Will Deacon wrote:
>>>>> On Mon, Sep 25, 2017 at 01:54:57PM -0600, Ruigrok, Richard wrote:
>>>>>> I also found this issue with kernels from 4.11 through 4.13. In my tests, I
>>>>>> found that it reproduces only with 4K page and Transparent Huge Pages. With 64K
>>>>>> page I was not able to reproduce. RH also reported it here: https://
>>>>>> bugzilla.redhat.com/show_bug.cgi?id=1491504 Linaro reported on the RPK kernel
>>>>>> (4.12) on Centriq2400 and ThunderX
>>>>>>
>>>>>>
>>>>>> https://bugs.linaro.org/show_bug.cgi?id=3191
>>>>>>
>>>>>> https://bugs.linaro.org/show_bug.cgi?id=3068.
>>>>> These two aren't the same bug (that's a forward progress issue that we're
>>>>> currently working on). I don't have permission to look at the redhat one,
>>>>> but is it just an RCU stall or actually the Oops reported by Yury?
>>>>>
>>>>>> I was able to bisect down to a specific commit.
>>>>> I think we're chasing two different things here, so not sure I trust the
>>>>> bisect!
>>>>>
>>>> The RCU stall is side effect. The issue I'm seeing has the same stack
>>>> trace and same stimulus (rwtest). Following are the details.
>>> FWIW, I think I've worked out what's going on here and I should have a patch
>>> tomorrow.
>> Diff below. I'm going to follow up with a separate thread about this,
>> because the proper fix is going to be invasive. I'll keep you on cc.
>>
>> Out of curiosity: what version of GCC are you using to compile the kernel?
> I'm using gcc-linaro-6.3.1-2017.02-x86_64_aarch64-linux-gnu
> Thanks for the patch, test results to follow.
> Richard
With this change applied on v4.13, the LTP rwtest passed 50 iterations, so it appears to solve the issue I was seeing. This kernel was built with GCC 5.2.1; I've also started using 6.3.1.
If you think it makes a difference, I can also test with 6.3.1.

Linux version 4.13.0-00002-g8540910-dirty (rruigrok@rruigrok-lnx) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-1)) #55 SMP PREEMPT Wed Sep 27 13:37:25 MDT 2017

Richard
>> Will
>>
>> --->8
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index bc4e92337d16..b46e54c2399b 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
>>  /* Find an entry in the third-level page table. */
>>  #define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
>>  
>> -#define pte_offset_phys(dir,addr) (pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
>> +#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
>>  #define pte_offset_kernel(dir,addr) ((pte_t *)__va(pte_offset_phys((dir), (addr))))
>>  
>>  #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index bc4e92337d16..b46e54c2399b 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -401,7 +401,7 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 /* Find an entry in the third-level page table. */
 #define pte_index(addr) (((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
 
-#define pte_offset_phys(dir,addr) (pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
+#define pte_offset_phys(dir,addr) (pmd_page_paddr(READ_ONCE(*(dir))) + pte_index(addr) * sizeof(pte_t))
 #define pte_offset_kernel(dir,addr) ((pte_t *)__va(pte_offset_phys((dir), (addr))))
 
 #define pte_offset_map(dir,addr) pte_offset_kernel((dir), (addr))