Message ID | 20170913153242.GA11299@char.us.oracle.com (mailing list archive) |
---|---|
State | New, archived |
On 09/13/2017 11:32 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 12, 2017 at 09:19:23PM -0400, Boris Ostrovsky wrote:
>>
>> On 09/12/2017 08:01 PM, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Sep 11, 2017 at 08:45:02PM -0400, Boris Ostrovsky wrote:
>>>>
>>>> On 09/11/2017 07:55 PM, Konrad Rzeszutek Wilk wrote:
>>>>> Hey,
>>>>>
>>>>> I've only been able to reproduce this on ARM64 (trying right now ARM32
>>>>> as well), and not on x86.
>>>>>
>>>>> If I compile Xen without CONFIG_SCRUB_DEBUG it works great. But if
>>>>> enable it and try to load a livepatch it blows up in page_alloc.c:738
>>>>>
>>>>> This is with origin/staging (d0291f3391)
>>>> Can you still reproduce this if you revert 307c3be?
>>> Sadly yes - it still crashes. I didn't capture the serial output.
>>>
>>> I honestly think the issue is that on ARM64 the "sleep" loop does not
>>> wake up as often as on x86 (CC-ing Dariof who I believe observed this
>>> with Credit2 and the wakeup.. something) - maybe he remembers the
>>> details. Anyhow my theory is that the pages are not scrubbed at all
>>> when they go in the idle loop as once it goes to sleep - it stays there.
>>
>> There is no (well, should not be) any timing dependencies in how/whether
>> pages are scrubbed. If a page doesn't get scrubbed because someone didn't
>> wake up then it should be scrubbed in alloc_heap_pages(). So in this case
>> the page is thought to be clean (_PGC_need_scrub is not set), but it is not.
>>
>> Have you tried running a guest (or two), rebooting in a loop?
> No. I just cold-booted it and tried to livepatch.
>> Another thing to try is to set need_scrub to true in free_heap_pages().
> Magic!
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index dbad1e1ca0..9303eb4517 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1308,6 +1308,7 @@ static void free_heap_pages(
>      ASSERT(node >= 0);
>
>      spin_lock(&heap_lock);
> +    need_scrub = true;
>
>      for ( i = 0; i < (1 << order); i++ )
>      {
>
> Fixes it ! :-)

Well, that's not a fix. This eliminates the case that something in
ARM-specific code (which I haven't tested) accidentally clears
_PGC_need_scrub.

OK, I think I know what the problem is. You are using
CONFIG_SEPARATE_XENHEAP, are you?

-boris

Hi,

On 09/13/2017 07:05 PM, Boris Ostrovsky wrote:
> On 09/13/2017 11:32 AM, Konrad Rzeszutek Wilk wrote:
> Well, that's not a fix. This eliminates the case that something in
> ARM-specific code (which I haven't tested) accidentally clears
> _PGC_need_scrub.
>
> OK, I think I know what the problem is. You are using
> CONFIG_SEPARATE_XENHEAP, are you?

It seems the bug appears on Arm64, so CONFIG_SEPARATE_XENHEAP is not set.
Note that Arm32 is using a separate heap.

Cheers,

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index dbad1e1ca0..9303eb4517 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1308,6 +1308,7 @@ static void free_heap_pages(
     ASSERT(node >= 0);
 
     spin_lock(&heap_lock);
+    need_scrub = true;
 
     for ( i = 0; i < (1 << order); i++ )
     {
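
Illustration (not part of the thread): the invariant Boris describes, that a page either gets scrubbed in the idle loop or is scrubbed on allocation, and that trouble starts only when the need-scrub flag is lost, can be sketched with a toy allocator. This is a minimal model under stated assumptions, not Xen code: every `toy_*` name, the page size, and the all-zero scrub value are hypothetical, and `need_scrub` merely stands in for _PGC_need_scrub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only. None of these names exist in Xen. */
#define TOY_PAGE_SIZE  64
#define TOY_NR_PAGES   4
#define TOY_SCRUB_BYTE 0x00   /* assumed fill value of a scrubbed page */

struct toy_page {
    bool free;
    bool need_scrub;          /* stands in for _PGC_need_scrub */
    unsigned char data[TOY_PAGE_SIZE];
};

static struct toy_page toy_heap[TOY_NR_PAGES];

/* Start with every page free and clean. */
static void toy_heap_init(void)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        toy_heap[i].free = true;
        toy_heap[i].need_scrub = false;
        memset(toy_heap[i].data, TOY_SCRUB_BYTE, TOY_PAGE_SIZE);
    }
}

static void toy_scrub(struct toy_page *pg)
{
    memset(pg->data, TOY_SCRUB_BYTE, TOY_PAGE_SIZE);
    pg->need_scrub = false;
}

/* Debug-style check: does the page really hold only the scrub value? */
static bool toy_is_clean(const struct toy_page *pg)
{
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        if (pg->data[i] != TOY_SCRUB_BYTE)
            return false;
    return true;
}

/* Free a page, recording whether its contents still need scrubbing. */
static void toy_free_page(struct toy_page *pg, bool need_scrub)
{
    pg->need_scrub = need_scrub;
    pg->free = true;
}

/* Optional background work: scrub at most one dirty free page.
 * Returns true if a page was scrubbed. */
static bool toy_idle_scrub_one(void)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        if (toy_heap[i].free && toy_heap[i].need_scrub) {
            toy_scrub(&toy_heap[i]);
            return true;
        }
    }
    return false;
}

/* Allocate the first free page. If the idle scrubber never reached it,
 * scrub it here, so correctness does not depend on timing. */
static struct toy_page *toy_alloc_page(void)
{
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        struct toy_page *pg = &toy_heap[i];
        if (!pg->free)
            continue;
        if (pg->need_scrub)
            toy_scrub(pg);
        pg->free = false;
        return pg;
    }
    return NULL;
}
```

In this model, a dirty page freed with `need_scrub` set comes back clean from `toy_alloc_page()` whether or not `toy_idle_scrub_one()` ever ran. A dirty page freed with the flag clear is handed out dirty, which is the state a debug check like `toy_is_clean()` would trip on. Forcing the flag to true on every free, as in the diff above, hides that second case rather than explaining how the flag was lost.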