Message ID | 1447858999-26665-1-git-send-email-jeremy.linton@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Jeremy, On Wed, Nov 18, 2015 at 09:03:19AM -0600, Jeremy Linton wrote: > The HP m400 fails to boot the linux 4.4rc1 kernel. Are you using defconfig? If not, can you share your config? > It usually hangs or sometimes takes an unhanded exception around the > DMA zone messages. This was bisected to the new CONT PTE changes. Do you have any examples of the unhandled exception cases? Are they a mixed bag, or a consistent exception class? > Adding an extra flush_tlb_all() in the code path which is > changing the kernel permissions allows the machine to boot > consistently. As you mention changing permissions, I take it you're using CONFIG_DEBUG_RODATA? Thanks, Mark. > Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> > --- > arch/arm64/mm/mmu.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > index e3f563c..e92fe77 100644 > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -121,6 +121,7 @@ static void __populate_init_pte(pte_t *pte, unsigned long addr, > pfn++; > addr += PAGE_SIZE; > } while (addr != end); > + flush_tlb_all(); > } > > static void alloc_init_pte(pmd_t *pmd, unsigned long addr, > -- > 2.4.3 >
On 11/18/2015 09:20 AM, Mark Rutland wrote: > Hi Jeremy, > > On Wed, Nov 18, 2015 at 09:03:19AM -0600, Jeremy Linton wrote: >> The HP m400 fails to boot the linux 4.4rc1 kernel. > > Are you using defconfig? If not, can you share your config? No, its not defconfig, its roughly the RHELSA config tossed into a mainline 4.4 tree and all the default options selected. AFAIK RHELSA is still limited access. > >> It usually hangs or sometimes takes an unhanded exception around the >> DMA zone messages. This was bisected to the new CONT PTE changes. > > Do you have any examples of the unhandled exception cases? Are they a > mixed bag, or a consistent exception class? I'm guessing about 90% of the time its a dead hang, the remaining are the faults of which there is one that happens more frequently than the others. Here is one i found in my notes.. [ 0.000000] On node 0 totalpages: 1048512 [ 0.000000] DMA zone: 64 pages used for memmap [ 0.000000] DMA zone: 0 pages reserved [ 0.000000] DMA zone: 65472 pages, LIFO batch:1 [ 0.000000] Unhandled fault: unknown 48 (0x96000070) at 0xfffffe0000d60588 >> Adding an extra flush_tlb_all() in the code path which is >> changing the kernel permissions allows the machine to boot >> consistently. > > As you mention changing permissions, I take it you're using > CONFIG_DEBUG_RODATA? The failing configuration doesn't have DEBUG_RODATA set, I might have been pretty loose with my terminology. Frankly, I wondered originally how config RODATA was working reliably because the flushes were only around the directories getting split, fixup_init() (and basically anything calling create_mapping_late()) looked like there were paths that could avoid flushing. When I added the CONT changes I didn't add flushes to paths that didn't previously have them (except in the split cont range case, which matched the spit p[mu]d case). I made the mistake of assuming someone knew about some edge case that avoided the need for the flush. Once I find/fix the console issue on that machine with 4.4rc1 (there are a small handful of issues that keep mainline from working on it, including the sata patch that was posted, and rejected), I will focus on hoisting the tlb flush into create_mapping_late() and removing the splattering of flushes in those code paths. That is unless there is a reason to be preforming them as soon as the directories are split.
On Wed, Nov 18, 2015 at 10:08:58AM -0600, Jeremy Linton wrote: > On 11/18/2015 09:20 AM, Mark Rutland wrote: > >Hi Jeremy, > > > >On Wed, Nov 18, 2015 at 09:03:19AM -0600, Jeremy Linton wrote: > >>The HP m400 fails to boot the linux 4.4rc1 kernel. > > > >Are you using defconfig? If not, can you share your config? > No, its not defconfig, its roughly the RHELSA config tossed into a > mainline 4.4 tree and all the default options selected. AFAIK RHELSA > is still limited access. That renders this extremely difficult for anyone else to reproduce... > >>It usually hangs or sometimes takes an unhanded exception around the > >>DMA zone messages. This was bisected to the new CONT PTE changes. > > > >Do you have any examples of the unhandled exception cases? Are they a > >mixed bag, or a consistent exception class? > > I'm guessing about 90% of the time its a dead hang, the remaining > are the faults of which there is one that happens more frequently > than the others. Here is one i found in my notes.. Ok. In future please provide a sample with any bug report. > [ 0.000000] On node 0 totalpages: 1048512 > [ 0.000000] DMA zone: 64 pages used for memmap > [ 0.000000] DMA zone: 0 pages reserved > [ 0.000000] DMA zone: 65472 pages, LIFO batch:1 > [ 0.000000] Unhandled fault: unknown 48 (0x96000070) at > 0xfffffe0000d60588 From a quick grep that's from do_mem_abort, where the "unknown 48" is the DFSC, the bit in brackets is the ESR, and the address is the faulting address from FAR_EL1. That 48 / 0b110000 for the DFSC decodes as "TLB conflict abort" per the ARM ARM. Other than that, the WnR bit is set in the ISS. So this is probably a break-before-make issue. Can you figure out where 0xfffffe0000d60588 pointed to, and where in the kernel the access was performed? It would be nice to know if this is consistently happening at some edge of the kernel address space. FWIW, Will had a patch [1] for detecting PTE level break-before-make violations. I gave this a go on Juno with v4.4-rc1, and saw an issue in the EFI virtmap code that I'm currently investigating. > >>Adding an extra flush_tlb_all() in the code path which is > >>changing the kernel permissions allows the machine to boot > >>consistently. > > > >As you mention changing permissions, I take it you're using > >CONFIG_DEBUG_RODATA? > > The failing configuration doesn't have DEBUG_RODATA set, I might > have been pretty loose with my terminology. Ok, good to know. > Frankly, I wondered originally how config RODATA was working > reliably because the flushes were only around the directories > getting split, fixup_init() (and basically anything calling > create_mapping_late()) looked like there were paths that could avoid > flushing. When I added the CONT changes I didn't add flushes to > paths that didn't previously have them (except in the split cont > range case, which matched the spit p[mu]d case). I made the mistake > of assuming someone knew about some edge case that avoided the need > for the flush. I'll need to page the code back into my head, but I recall I had concerns about break-before-make, so there's some auditing to be done. > Once I find/fix the console issue on that machine with 4.4rc1 (there > are a small handful of issues that keep mainline from working on it, > including the sata patch that was posted, and rejected), I will > focus on hoisting the tlb flush into create_mapping_late() and > removing the splattering of flushes in those code paths. That is > unless there is a reason to be preforming them as soon as the > directories are split. We need to figure out exactly what maintenance we actually need. Hoisting the TLB flush isn't necessarily possible if we need to perform break-before-make at the PTE level, and even that may not be possible for the kernel page tables; we might need to do something more drastic like using ASIDs and double-buffering them... We also need to figure out what's happening with the code as it is. Thanks, Mark. [1] https://git.kernel.org/cgit/linux/kernel/git/will/linux.git/commit/?h=aarch64/devel&id=372f39220ad35fa39a75419f2221ffeb6ffd78d3
On 11/18/2015 10:29 AM, Mark Rutland wrote: > On Wed, Nov 18, 2015 at 10:08:58AM -0600, Jeremy Linton wrote: >> No, its not defconfig, its roughly the RHELSA config tossed into a >> mainline 4.4 tree and all the default options selected. AFAIK RHELSA >> is still limited access. > > That renders this extremely difficult for anyone else to reproduce... Well the kernel in question boots fine on a Juno. I haven't tried any other APM based machines. And given whats happening I doubt its config related. > That 48 / 0b110000 for the DFSC decodes as "TLB conflict abort" per the > ARM ARM. Other than that, the WnR bit is set in the ISS. > > So this is probably a break-before-make issue. > > Can you figure out where 0xfffffe0000d60588 pointed to, and where in the > kernel the access was performed? It would be nice to know if this is > consistently happening at some edge of the kernel address space. I decoded everything when I initially saw it, but it didn't make a lick of sense related to what I was attempting to accomplish so I didn't keep any of it. Only later when I found out it wasn't related to the patches I was applying did I start trying to track down the regression. Even so, given some other patches that went in, it wasn't blindingly obvious where the problem was until I was sure that it was related to the linear mapping changes. AKA I didn't think anyone would be able to debug the failure with that little information, maybe i'm wrong on that point... Anyway, the kernel that produced that failure is long gone, I can in the near future attempt to reproduce the message. >> Once I find/fix the console issue on that machine with 4.4rc1 (there >> are a small handful of issues that keep mainline from working on it, >> including the sata patch that was posted, and rejected), I will >> focus on hoisting the tlb flush into create_mapping_late() and >> removing the splattering of flushes in those code paths. That is >> unless there is a reason to be preforming them as soon as the >> directories are split. > > We need to figure out exactly what maintenance we actually need. > > Hoisting the TLB flush isn't necessarily possible if we need to perform > break-before-make at the PTE level, and even that may not be possible > for the kernel page tables; we might need to do something more > drastic like using ASIDs and double-buffering them... > > We also need to figure out what's happening with the code as it is. Well, I'm suspect what is happening is that there are conflicting TLB's hanging around, one for a cont range that is overlapping a stale non cont one. This sort of implies that this has been happening all along, AKA RO regions were being "lazy" activated if you will. Its only on a core that aborts when it detects that (which i assume requires differing size entries for this core) does it cause problems. The break-before-make issue, seems like it won't cause a big problem here as long as there is some way to assure valid TLBs before the update, and then assure they are cleared following it. Hence the overly aggressive change works because it flushes following every cont block update. Which would bother me more if the code were run more than once per boot (or in the future per module load/unload if someone gets around to updating the no execute reliably).
> >We also need to figure out what's happening with the code as it is. > > Well, I'm suspect what is happening is that there are conflicting > TLB's hanging around, one for a cont range that is overlapping a stale > non cont one. This sort of implies that this has been happening all > along, AKA RO regions were being "lazy" activated if you will. Its > only on a core that aborts when it detects that (which i assume > requires differing size entries for this core) does it cause problems. I suspect that we may have believed that the TLB maintenance at the end of paging_init was sufficient, as we evidently id not consider TLB conflicts. > The break-before-make issue, seems like it won't cause a big problem > here as long as there is some way to assure valid TLBs before the > update, and then assure they are cleared following it. If at any instant in time you have a valid TLB entry for an address, and the page tables hold a value that would give rise to a conflicting TLB entry, you can encounter a TLB conflict abort. It's a race against the hardware. > Hence the overly aggressive change works because it > flushes following every cont block update. Which would bother me > more if the code were run more than once per boot (or in the future > per module load/unload if someone gets around to updating the no > execute reliably). You're racing against other parts of the CPU (the page table walker(s), I-caches, etc). The flushing only minimises the window for a race, and does not prevent the race from being possible. Given that the envelope is constantly pushing forward w.r.t. how aggressive CPUs may be in this area, we need to fix the issue by reasoning against what the architecture guarantees us. We're almost certainly going to have to revisit this code in future, and sprinkling TLB maintenance over it will only make it harder to reason about in future. Mark.
On 11/18/2015 12:04 PM, Mark Rutland wrote: > You're racing against other parts of the CPU (the page table walker(s), > I-caches, etc). The flushing only minimises the window for a race, and > does not prevent the race from being possible. > > Given that the envelope is constantly pushing forward w.r.t. how > aggressive CPUs may be in this area, we need to fix the issue by > reasoning against what the architecture guarantees us. Its also not suppose to fault on speculative access, and to me that means page table walks/etc that are the result of speculative access. Which AFAIK, closes the window significantly. I would only really worry about interrupt activity, and updates to the memory containing the PTE's themselves. Either way the simple change (rather than rewriting the whole code path) is probably to flag the fault handler to simply resume from these kinds of faults during create_mapping_late(). But that isn't what is happening here AFAIK, the faults are long after the PTE's have been updated, and are the result of failure to flush the TLB..
On Wed, Nov 18, 2015 at 01:31:18PM -0600, Jeremy Linton wrote: > On 11/18/2015 12:04 PM, Mark Rutland wrote: > > >You're racing against other parts of the CPU (the page table walker(s), > >I-caches, etc). The flushing only minimises the window for a race, and > >does not prevent the race from being possible. > > > >Given that the envelope is constantly pushing forward w.r.t. how > >aggressive CPUs may be in this area, we need to fix the issue by > >reasoning against what the architecture guarantees us. > Its also not suppose to fault on speculative access, and to me that > means page table walks/etc that are the result of speculative > access. I was under the impression that TLB conflict abort could be delivered for asynchronous events (e.g. speculative I-cache fetches rather than for speculative execution of already fetched instructions). Having looked at the ARM ARM, I appear to have been mistaken. As you say, it appears that TLB conflict aborts are always delivered synchronously. > Which AFAIK, closes the window significantly. I would only > really worry about interrupt activity, and updates to the memory > containing the PTE's themselves. Either way the simple change > (rather than rewriting the whole code path) is probably to flag the > fault handler to simply resume from these kinds of faults during > create_mapping_late(). Unfortunately that may not be sufficient. The conflicting address range might cover the current stack or the text of the exception handler, and in those cases trying to handle the exception would result in taking another TLB conflict abort recursively. > But that isn't what is happening here AFAIK, the faults are long > after the PTE's have been updated, and are the result of failure to > flush the TLB.. It's not quite the same case, certainly. It's also possible that the faults you are seeing see are also possible earlier, but simply less likely, and that we get away without seeing the other potential issues because of things that may change (i.e. the way the compiler lays out the text). I think that if we need to do something more drastic to account for the other issues above (e.g. by ensuring that we can never allocate conflicting TLB entries in the first place), and that said strategy would also fix this problem, that would be preferable, given that we're going to have to do that eventually anyway. Mark.
On Thu, Nov 19, 2015 at 11:31:34AM +0000, Mark Rutland wrote: > On Wed, Nov 18, 2015 at 01:31:18PM -0600, Jeremy Linton wrote: > > On 11/18/2015 12:04 PM, Mark Rutland wrote: > > > > >You're racing against other parts of the CPU (the page table walker(s), > > >I-caches, etc). The flushing only minimises the window for a race, and > > >does not prevent the race from being possible. > > > > > >Given that the envelope is constantly pushing forward w.r.t. how > > >aggressive CPUs may be in this area, we need to fix the issue by > > >reasoning against what the architecture guarantees us. > > Its also not suppose to fault on speculative access, and to me that > > means page table walks/etc that are the result of speculative > > access. > > I was under the impression that TLB conflict abort could be delivered > for asynchronous events (e.g. speculative I-cache fetches rather than > for speculative execution of already fetched instructions). > > Having looked at the ARM ARM, I appear to have been mistaken. As you > say, it appears that TLB conflict aborts are always delivered > synchronously. > > > Which AFAIK, closes the window significantly. I would only > > really worry about interrupt activity, and updates to the memory > > containing the PTE's themselves. Either way the simple change > > (rather than rewriting the whole code path) is probably to flag the > > fault handler to simply resume from these kinds of faults during > > create_mapping_late(). > > But that isn't what is happening here AFAIK, the faults are long > > after the PTE's have been updated, and are the result of failure to > > flush the TLB.. > I think that if we need to do something more drastic to account for the > other issues above (e.g. by ensuring that we can never allocate > conflicting TLB entries in the first place), and that said strategy > would also fix this problem, that would be preferable, given that we're > going to have to do that eventually anyway. Having looked into this further, we also have the same issue with the kasan init code. I believe that the issue is restricted to one-off init code, as I don't think that we do anything at runtime which would be problematic. If anyone knows of a counter-example, please let me know! Given that, we can restrict the problem to an early UP environment, and it won't matter if therre's some large(ish) fixed cost associated with updating the kernel page tables. I think that we can avoid the issue entirely by modifying a copy of the kernel page tables, which we can later install via some idmap code (going via a reserved table to clear the TLBs). I'm working on patches to implement the above, which I'll try to get somewhere with next week. Thanks, Mark
On Thu, Nov 19, 2015 at 11:31:34AM +0000, Mark Rutland wrote: > On Wed, Nov 18, 2015 at 01:31:18PM -0600, Jeremy Linton wrote: > > On 11/18/2015 12:04 PM, Mark Rutland wrote: > > > > >You're racing against other parts of the CPU (the page table walker(s), > > >I-caches, etc). The flushing only minimises the window for a race, and > > >does not prevent the race from being possible. > > > > > >Given that the envelope is constantly pushing forward w.r.t. how > > >aggressive CPUs may be in this area, we need to fix the issue by > > >reasoning against what the architecture guarantees us. > > Its also not suppose to fault on speculative access, and to me that > > means page table walks/etc that are the result of speculative > > access. > > I was under the impression that TLB conflict abort could be delivered > for asynchronous events (e.g. speculative I-cache fetches rather than > for speculative execution of already fetched instructions). > > Having looked at the ARM ARM, I appear to have been mistaken. As you > say, it appears that TLB conflict aborts are always delivered > synchronously. Having invesitgated further, while we may not encounter (synchronous) TLB conflict aborts, we may still encounter (asynchronous) issues from conflicting TLB entries. Per the ARM ARM, if the TLB contains multiple entries for the same address, the result of a translation may be some amalgamation of said entries (where the amalgamation could be an arbitrary function of all of said matching entries). Thus page table walks and *-cache fetches may use completely erroneous addresses and/or attributes, asynchronous to the instruction stream, and as a result of this may change the state of MMIO peripherals, trigger SError, etc. This is a much scarier proposition than the TLB conflict aborts. Thanks, Mark.
On Fri, Nov 20, 2015 at 07:52:44PM +0000, Mark Rutland wrote: > On Thu, Nov 19, 2015 at 11:31:34AM +0000, Mark Rutland wrote: > > I think that if we need to do something more drastic to account for the > > other issues above (e.g. by ensuring that we can never allocate > > conflicting TLB entries in the first place), and that said strategy > > would also fix this problem, that would be preferable, given that we're > > going to have to do that eventually anyway. > > Having looked into this further, we also have the same issue with the > kasan init code. I don't think the kasan_init() problem is that bad. We are preserving the same size mappings (PAGE_SIZE) and just changing the physical address they point at without a break-before-make (just a TTBR1 switch). I don't know how clear the ARM ARM is around this but at least so far we haven't hit any problems. The problem with the contiguous bit is that we switch from e.g. a 4KB mapping to a 64KB one and it's very likely that we would get a TLB conflict. With CONFIG_DEBUG_RODATA, we go from bigger block to a smaller one, so less chance of a TLB conflict but still present. I need to read the ARM ARM some more in this area (and maybe ask for clarification). > I believe that the issue is restricted to one-off init code, as I don't > think that we do anything at runtime which would be problematic. If > anyone knows of a counter-example, please let me know! > > Given that, we can restrict the problem to an early UP environment, and > it won't matter if therre's some large(ish) fixed cost associated with > updating the kernel page tables. I think that we can avoid the issue > entirely by modifying a copy of the kernel page tables, which we can > later install via some idmap code (going via a reserved table to clear > the TLBs). > > I'm working on patches to implement the above, which I'll try to get > somewhere with next week. That's a complete fix indeed but it would require some more testing and I don't think it's feasible for 4.4-rc. In the meantime, I propose that we revert the contiguous PTE patches and push them again once we fix the TLB conflict problems.
On Mon, Nov 23, 2015 at 12:15:15PM +0000, Catalin Marinas wrote: > On Fri, Nov 20, 2015 at 07:52:44PM +0000, Mark Rutland wrote: > > On Thu, Nov 19, 2015 at 11:31:34AM +0000, Mark Rutland wrote: > > > I think that if we need to do something more drastic to account for the > > > other issues above (e.g. by ensuring that we can never allocate > > > conflicting TLB entries in the first place), and that said strategy > > > would also fix this problem, that would be preferable, given that we're > > > going to have to do that eventually anyway. > > > > Having looked into this further, we also have the same issue with the > > kasan init code. > > I don't think the kasan_init() problem is that bad. We are preserving > the same size mappings (PAGE_SIZE) and just changing the physical > address they point at without a break-before-make (just a TTBR1 switch). Per the ARM ARM, "CONSTRAINED UNPREDICTABLE behaviors due to caching of control or data values", the result of a translation could be "an amalgamation" of the values. I believe that we have to read "amalgamation" as "arbitrary function of" here. I don't think that we're safe because we only changed the output addresses of entries. > I don't know how clear the ARM ARM is around this but at least so far we > haven't hit any problems. I assume you're talking generally here, rather than specifically about kasan. I agree that we haven't spotted any issues so far. Given that kasan itself is new and requires a relatively new compiler, it may not yet have been tested on a platform where it would fail on. Jeremy, for reference, have you tried kasan on m400? Or DEBUG_RODATA? > The problem with the contiguous bit is that we switch from e.g. a 4KB > mapping to a 64KB one and it's very likely that we would get a TLB > conflict. > > With CONFIG_DEBUG_RODATA, we go from bigger block to a smaller one, so > less chance of a TLB conflict but still present. I need to read the ARM > ARM some more in this area (and maybe ask for clarification). We should certainly try to get clarification here. > > I believe that the issue is restricted to one-off init code, as I don't > > think that we do anything at runtime which would be problematic. If > > anyone knows of a counter-example, please let me know! > > > > Given that, we can restrict the problem to an early UP environment, and > > it won't matter if therre's some large(ish) fixed cost associated with > > updating the kernel page tables. I think that we can avoid the issue > > entirely by modifying a copy of the kernel page tables, which we can > > later install via some idmap code (going via a reserved table to clear > > the TLBs). > > > > I'm working on patches to implement the above, which I'll try to get > > somewhere with next week. > > That's a complete fix indeed but it would require some more testing and > I don't think it's feasible for 4.4-rc. In the meantime, I propose that > we revert the contiguous PTE patches and push them again once we fix the > TLB conflict problems. I agree that this would be too late for v4.4-rc*. In the meantime, I guess that reverting the patches is the best thing to do given we're already at rc2. Thanks, Mark.
On 11/23/2015 06:15 AM, Catalin Marinas wrote: > That's a complete fix indeed but it would require some more testing and > I don't think it's feasible for 4.4-rc. In the meantime, I propose that > we revert the contiguous PTE patches and push them again once we fix the > TLB conflict problems. The problem as you say exists with the block mappings as well, and the explicit TLB flush there is apparently sufficient to keep this problem from occurring. The 1 line explicit flush is sufficient to keep it from happening in the late mapping case as well, and actually fixes a problem that was in the kernel from before these patches (the complete lack of flush in paths calling create_mapping_late()). So, I fail to see how reverting the patch fixes anything, it only returns to the previously (now known) to be bad code path. BTW: The M400, has other issues that keep it from booting with ACPI firmware (the 4.3 EOI changes irritate a 'bug' in the MADT table).
On 11/23/2015 07:49 AM, Mark Rutland wrote:
> Jeremy, for reference, have you tried kasan on m400? Or DEBUG_RODATA?
No, because the machine has a list of issues that keep it from booting a
mainline kernel in a functional state. Once those are cleared up I will
revisit this patch.
The goal was to create a conceptually safe fix for the problem, which
isn't all this hypothetical stuff being discussed, but the fact that the
TLBs are not being flushed properly (with or without the CONT bit stuff)
resulting in a tlb conflict fault long after this code path has finished
executing.
Jeremy, On Mon, Nov 23, 2015 at 08:48:40AM -0600, Jeremy Linton wrote: > On 11/23/2015 07:49 AM, Mark Rutland wrote: > >Jeremy, for reference, have you tried kasan on m400? Or DEBUG_RODATA? > > No, because the machine has a list of issues that keep it from booting a > mainline kernel in a functional state. Once those are cleared up I will > revisit this patch. > > The goal was to create a conceptually safe fix for the problem, which isn't > all this hypothetical stuff being discussed, but the fact that the TLBs are > not being flushed properly (with or without the CONT bit stuff) resulting in > a tlb conflict fault long after this code path has finished executing. I appreciate that you just want to get something working, and we've established that a TLBIALL probably does solve the problem, however there's more to it than that. If we start adding TLBI/DSB/ISB/NOP/cache flush/read from config space type operations in arbitrary places, we run a very real risk of masking future bugs. Then, when the original problem that prompted the temporary bodge is fixed properly, we uncover a whole raft of problems that we didn't even realise were there. Consequently, we get stuck with the bodge forever and, crucially, we lose the ability to reason about the state of the CPU under Linux. That reasoning is incredibly useful when developing new architectural features, debugging, profiling or assessing whether or not Linux is susceptible to hardware errata and is precisely why the "hypothetical stuff" matters. For these reasons, we have no viable option other than reverting the offending series until the underlying problem is fixed properly. With any luck, that's the 4.5 timeframe so we really only lose a single release (providing that you have time to rebase and repost). Sorry about this; I hope it doesn't dissuade you from reporting bugs in the future. Will
On 11/23/2015 09:41 AM, Will Deacon wrote: > For these reasons, we have no viable option other than reverting the > offending series until the underlying problem is fixed properly. With > any luck, that's the 4.5 timeframe so we really only lose a single > release (providing that you have time to rebase and repost). That is fine but, I think you need to take this fix anyway (or something similar), because there _IS_ a TLB flush bug in that code path. I saw it when I wrote the original path, but discounted it assuming that the original authors had more insight on the hardware than I did. Whether the fix takes the suggest form or another, the cont bit changes are only irritating an existing bug, not causing it.
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index e3f563c..e92fe77 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -121,6 +121,7 @@ static void __populate_init_pte(pte_t *pte, unsigned long addr, pfn++; addr += PAGE_SIZE; } while (addr != end); + flush_tlb_all(); } static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
The HP m400 fails to boot the linux 4.4rc1 kernel. It usually hangs or sometimes takes an unhanded exception around the DMA zone messages. This was bisected to the new CONT PTE changes. Adding an extra flush_tlb_all() in the code path which is changing the kernel permissions allows the machine to boot consistently. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> --- arch/arm64/mm/mmu.c | 1 + 1 file changed, 1 insertion(+)