From patchwork Tue Jan 5 19:59:01 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steve Capper
X-Patchwork-Id: 7959481
In-Reply-To: <568BB55F.2020709@arm.com>
References: <20160104224233.GU16023@sirena.org.uk>
 <20160104150946.373ed02b8e8b81221340b7c8@linux-foundation.org>
 <20160104235512.GW16023@sirena.org.uk>
 <20160104163528.be56a4b1.akpm@linux-foundation.org>
 <20160105114549.GX16023@sirena.org.uk>
 <568BB55F.2020709@arm.com>
Date: Tue, 5 Jan 2016 19:59:01 +0000
Subject: Re: Widespread boot failures on ARM due to "mm/page_alloc.c:
 calculate zone_start_pfn at zone_spanned_pages_in_node()"
From: Steve Capper
To: Sudeep Holla
Cc: Matt Fleming, Stephen Rothwell, Tony Luck, Russell King,
 Kernel Build Reports Mailman List, Mel Gorman, Kamezawa Hiroyuki,
 Tyler Baker, Dave Hansen, Kevin.Hilman@linaro.org, Mark Brown,
 linux-next@vger.kernel.org, Taku Izumi, Xishi Qiu, Andrew Morton,
 "linux-arm-kernel@lists.infradead.org"

On 5 January 2016 at 12:21, Sudeep Holla wrote:
>
>
> On 05/01/16 11:45, Mark Brown wrote:
>>
>> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>>>
>>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown wrote:
>>>>
>>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>>
>>
>>>>> Thanks. That patch has rather a blooper if
>>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>>
>>
>>>> Seems to be what's making a difference from a quick run through, yes.
>>
>>
>>> OK, thanks.
>>
>>
>> Seems like I was mistaken here somehow or there's some other problem -
>> I've kicked off another bisect for today's -next:
>>
>>
>> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>>
>> and will follow up with any results.
>>
>
> With both patches applied(one already in today's -next), I am able to
> boot on ARM64 platform but I get huge load(for each pfn) of below warning:
>
> -->8
>
> BUG: Bad page state in process swapper pfn:900000
> page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
> flags: 0x0()
> page dumped because: nonzero mapcount
> Modules linked in:
> Hardware name: ARM Juno development board (r0) (DT)
> Call trace:
> [] dump_backtrace+0x0/0x180
> [] show_stack+0x14/0x20
> [] dump_stack+0x90/0xc8
> [] bad_page+0xd8/0x138
> [] free_pages_prepare+0x218/0x290
> [] __free_pages_ok+0x1c/0xb8
> [] __free_pages+0x30/0x50
> [] __free_pages_bootmem+0xa0/0xa8
> [] free_all_bootmem+0x11c/0x184
> [] mem_init+0x48/0x1b4
> [] start_kernel+0x224/0x3b4
> [<0000000080663000>] 0x80663000
> Disabling lock debugging due to kernel taint
>
> --

I managed to get 904769ac82ebf60cb54f225f59ae7c064772a4d7 booting on an
arm64 machine without errors with the following changes:

@@ -5328,6 +5337,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
 		(u64)start_pfn << PAGE_SHIFT,
 		end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
+#else
+	start_pfn = node_start_pfn;
 #endif
 	calculate_node_totalpages(pgdat, start_pfn, end_pfn,
 				  zones_size, zholes_size);
=====================================

My understanding is that 904769a ("mm/page_alloc.c: calculate
zone_start_pfn at zone_spanned_pages_in_node()") inadvertently discards
information when pgdat->node_start_pfn is removed from
free_area_init_core (and zone_start_pfn is no longer updated by "size"
in the loop inside free_area_init_core).

This isn't an issue with systems where CONFIG_HAVE_MEMBLOCK_NODE_MAP is
enabled as zone_start_pfn is set correctly.

On systems without CONFIG_HAVE_MEMBLOCK_NODE_MAP, zone_start_pfn is
always 0.

When I ported the above fix to linux-next
(8ef79cd05e6894c01ab9b41aa918a402fa8022a7), I was able to boot in a VM
but not on my actual machine. I'll investigate that tomorrow.

Cheers,
---
Steve

=====================================
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8bb70d..0edb608 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5013,6 +5013,15 @@ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long *zone_end_pfn,
 					unsigned long *zones_size)
 {
+	unsigned int zone;
+
+	*zone_start_pfn = node_start_pfn;
+	for (zone = 0; zone < zone_type; zone++) {
+		*zone_start_pfn += zones_size[zone];
+	}
+
+	*zone_end_pfn = *zone_start_pfn + zones_size[zone_type];
+
 	return zones_size[zone_type];
 }
 
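
For readers who want to check the arithmetic outside the kernel, the hunk above can be modelled in plain userspace C. This is only a sketch under stated assumptions: zone_span(), zone_span_selfcheck(), the three-zone layout and the pfn values are all made up for illustration and are not kernel symbols or real board data. It shows that a zone starts at node_start_pfn plus the sizes of all lower zones, and ends zones_size[zone_type] pages later.

```c
#include <assert.h>

/*
 * Userspace model of the zone_spanned_pages_in_node() hunk.
 * zone_span() is a hypothetical name for this sketch only.
 */
static unsigned long zone_span(unsigned long node_start_pfn,
			       unsigned int zone_type,
			       const unsigned long *zones_size,
			       unsigned long *zone_start_pfn,
			       unsigned long *zone_end_pfn)
{
	unsigned int zone;

	/* Start at the node's first pfn, then skip over every lower zone. */
	*zone_start_pfn = node_start_pfn;
	for (zone = 0; zone < zone_type; zone++)
		*zone_start_pfn += zones_size[zone];

	/* The zone ends one zone-size past its own start. */
	*zone_end_pfn = *zone_start_pfn + zones_size[zone_type];

	return zones_size[zone_type];
}

/* Self-check with an invented layout: node at pfn 0x80000, three zones. */
static void zone_span_selfcheck(void)
{
	const unsigned long zones_size[3] = { 0x40000, 0x80000, 0 };
	unsigned long start, end;

	assert(zone_span(0x80000, 0, zones_size, &start, &end) == 0x40000);
	assert(start == 0x80000 && end == 0xc0000);

	assert(zone_span(0x80000, 1, zones_size, &start, &end) == 0x80000);
	assert(start == 0xc0000 && end == 0x140000);
}
```

With the same invented sizes but a node start of 0, zone 1 would come out as pfns 0x40000 to 0xc0000 rather than 0xc0000 to 0x140000, which is the kind of offset that losing node_start_pfn would introduce.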