From: Stefan Bader
To: xen-devel, Linux Kernel Mailing List
Cc: Juergen Gross, Nathan Zimmer, David Vrabel, Mel Gorman
Date: Wed, 11 May 2016 10:08:58 +0200
Subject: Re: [Xen-devel] bad page flags booting 32bit dom0 on 64bit hypervisor using dom0_mem (kernel >=4.2)
Message-ID: <5732E89A.1040501@canonical.com>
In-Reply-To: <5727632C.1020209@canonical.com>

On 02.05.2016 16:24, Stefan Bader wrote:
> On 02.05.2016 13:41, Juergen Gross wrote:
>> On 02/05/16 12:47, Stefan Bader wrote:
>>> I recently tried to boot a 32bit dom0 on a 64bit Xen host which I configured
>>> to run with a limited, fixed amount of memory for dom0. It seems that somewhere
>>> between kernel versions 3.19 and 4.2 (sorry, that is still a wide range) the
>>> Linux kernel started to report bad page flags for a range of pages (which seem
>>> to be around the end of the guest pfn range). For a 4.2 kernel that was easily
>>> missed, as the boot finished ok and dom0 was accessible. However, starting with
>>> 4.4 (tested 4.5 and a 4.6-rc) the serial console output freezes after some of
>>> those bad page flag messages and then (unfortunately without any further
>>> helpful output) the host reboots (I assume there is a panic that triggers a
>>> reset).
>>>
>>> I suspect the problem is more a kernel side one. It is just possible to
>>> influence things by varying dom0_mem=#,max:#. 512M seems ok; 1024M, 2048M
>>> and 3072M cause bad page flags starting around kernel 4.2 and reboots around
>>> 4.4. Then 4096M and not clamping dom0 memory at all seem to be ok again (though
>>> not limiting dom0 memory seems to cause trouble on 32bit dom0 later when a domU
>>> tries to balloon memory, but I think that is a different problem).
>>>
>>> I have not seen this on a 64bit dom0. Below is an example of those bad page
>>> errors. Somehow it looks to be a page marked as reserved. Initially I wondered
>>> whether this could be a problem of not clearing page flags when moving mappings
>>> to match the e820. But I never looked into i386 memory setup in that detail.
>>> So I am posting this, hoping that someone may have an idea from the details
>>> about where to look next. PAE is enabled there. Usually it's bpf init that gets
>>> hit, but that is likely just because it is doing the first vmallocs.
>>
>> Could you please post the kernel config, Xen and dom0 boot parameters?
>> I'm quite sure this is no common problem as there are standard tests
>> running for each kernel version including 32 bit dom0 with limited
>> memory size.
>
> Hi Jürgen,
>
> sure. Though by doing that I realized where I actually messed the whole thing
> up. I got the max limit syntax completely wrong. :( Instead of the correct
> "dom0_mem=1024M,max:1024M" I am using "dom0_mem=1024M:max=1024M", which I guess
> is like not having max set at all. Not sure whether that is a valid use case.
>
> When I actually get the dom0_mem argument right, there are no bad page flag
> errors even in 4.4 with a 1024M limit. I was at least consistent in my
> misconfiguration, so doing the same stupid thing on 64bit seems to be handled
> more gracefully.
>
> Likely false alarm. But at least cut&pasting the config into mail made me spot
> the problem...
>

Ok, thinking that "dom0_mem=x" (without a max or min) is still a valid case, I
went ahead and did a bisect for when the bad page flag issue started. I ended up
at:

92923ca "mm: meminit: only set page reserved in the memblock region"

And with a few more printks in the new functions I finally realized why this
goes wrong. The new reserve_bootmem_region() uses unsigned long for the start
and end addresses, which just isn't working too well on 32bit. On a Xen dom0 the
problem is just more easily triggered: when dom0 memory is limited to a small
size but allowed to balloon for more, the additional system memory is put into
reserved regions.

In my case (a host with 8G of memory and, say, 1G of initial dom0 memory) this
created, among others, one reserved region which started at 4GB and covered the
remaining 4G of host memory.
reserve_bootmem_region() received that as 0-4G due to the unsigned long
conversion, which basically marked *all* memory below 4G as reserved.

The fix is relatively simple: just use phys_addr_t for start and end. I tested
this on 4.2 and 4.4 kernels. Both now boot without errors, and the 4.4 kernel no
longer crashes. Maybe it is still not 100% safe on very large memory systems
(16T, if I did not get the math wrong), but it is at least some improvement...

-Stefan

From 1588a8b3983f63f8e690b91e99fe631902e38805 Mon Sep 17 00:00:00 2001
From: Stefan Bader
Date: Tue, 10 May 2016 19:05:16 +0200
Subject: [PATCH] mm: Use phys_addr_t for reserve_bootmem_region arguments

Since 92923ca the reserved bit is set on reserved memblock regions.
However, the start and end addresses are passed as unsigned long. This
is only 32bit on i386, so it can end up marking the wrong pages reserved
for ranges at 4GB and above.

This was observed on a 32bit Xen dom0 which was booted with initial
memory set to a value below 4G but allowing it to balloon in memory
(dom0_mem=1024M for example). This would define a reserved bootmem
region for the additional memory (for example, on an 8GB system there
was a reserved region covering the 4GB-8GB range). But since the
addresses were passed on as unsigned long, this was actually marking
all pages from 0 to 4GB as reserved.
Fixes: 92923ca "mm: meminit: only set page reserved in the memblock region"
Signed-off-by: Stefan Bader
Cc: # 4.2+
---
 include/linux/mm.h | 2 +-
 mm/page_alloc.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b56ff72..4c1ff62 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1715,7 +1715,7 @@ extern void free_highmem_page(struct page *page);
 extern void adjust_managed_page_count(struct page *page, long count);
 extern void mem_init_print_info(const char *str);
 
-extern void reserve_bootmem_region(unsigned long start, unsigned long end);
+extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
 
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void __free_reserved_page(struct page *page)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c69531a..eb66f89 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -951,7 +951,7 @@ static inline void init_reserved_page(unsigned long pfn)
  * marks the pages PageReserved. The remaining valid pages are later
  * sent to the buddy page allocator.
  */
-void __meminit reserve_bootmem_region(unsigned long start, unsigned long end)
+void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long end_pfn = PFN_UP(end);
-- 
1.9.1