From patchwork Mon Dec 30 09:38:28 2019
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 11312805
From: "Kirill A. Shutemov"
To: Andrew Morton
Cc: Dan Williams, Michal Hocko, Vlastimil Babka, Mel Gorman, "Jin, Zhi", linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCH] mm/page_alloc: Skip non present sections on zone initialization
Date: Mon, 30 Dec 2019 12:38:28 +0300
Message-Id: <20191230093828.24613-1-kirill.shutemov@linux.intel.com>

memmap_init_zone() can be called on ranges with holes during boot. It skips any non-valid PFNs one by one. That works fine as long as the holes are not too big.
But huge holes in the memory map cause a problem. It takes over 20 seconds to walk a 32TiB hole. x86-64 with 5-level paging allows for much larger holes in the memory map, which would practically hang the system.

Deferred struct page init doesn't help here. It only works on the present ranges.

Skipping non-present sections would fix the issue.

Signed-off-by: Kirill A. Shutemov
Reviewed-by: Baoquan He
Acked-by: Michal Hocko
---

The situation can be emulated using the following QEMU patch:

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ac08e6360437..f5f2258092e1 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1159,13 +1159,14 @@ void pc_memory_init(PCMachineState *pcms,
         memory_region_add_subregion(system_memory, 0, ram_below_4g);
         e820_add_entry(0, x86ms->below_4g_mem_size, E820_RAM);
         if (x86ms->above_4g_mem_size > 0) {
+            int shift = 45;
             ram_above_4g = g_malloc(sizeof(*ram_above_4g));
             memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g",
                                      ram,
                                      x86ms->below_4g_mem_size,
                                      x86ms->above_4g_mem_size);
-            memory_region_add_subregion(system_memory, 0x100000000ULL,
+            memory_region_add_subregion(system_memory, 1ULL << shift,
                                         ram_above_4g);
-            e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
+            e820_add_entry(1ULL << shift, x86ms->above_4g_mem_size, E820_RAM);
         }

     if (!pcmc->has_reserved_memory &&
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cde2a16b941a..694c26947bf6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1928,7 +1928,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 /* XXX: This value should match the one returned by CPUID
  * and in exec.c */
 # if defined(TARGET_X86_64)
-# define TCG_PHYS_ADDR_BITS 40
+# define TCG_PHYS_ADDR_BITS 52
 # else
 # define TCG_PHYS_ADDR_BITS 36
 # endif
---
 mm/page_alloc.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index df62a49cd09e..442dc0244bb4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5873,6 +5873,30 @@
overlap_memmap_init(unsigned long zone, unsigned long *pfn)
 	return false;
 }
 
+#ifdef CONFIG_SPARSEMEM
+/* Skip PFNs that belong to non-present sections */
+static inline __meminit unsigned long next_pfn(unsigned long pfn)
+{
+	unsigned long section_nr;
+
+	section_nr = pfn_to_section_nr(++pfn);
+	if (present_section_nr(section_nr))
+		return pfn;
+
+	while (++section_nr <= __highest_present_section_nr) {
+		if (present_section_nr(section_nr))
+			return section_nr_to_pfn(section_nr);
+	}
+
+	return -1;
+}
+#else
+static inline __meminit unsigned long next_pfn(unsigned long pfn)
+{
+	return pfn + 1;
+}
+#endif
+
 /*
  * Initially all pages are reserved - free ones are freed
  * up by memblock_free_all() once the early boot process is
@@ -5912,8 +5936,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	 * function. They do not exist on hotplugged memory.
 	 */
 	if (context == MEMMAP_EARLY) {
-		if (!early_pfn_valid(pfn))
+		if (!early_pfn_valid(pfn)) {
+			pfn = next_pfn(pfn) - 1;
 			continue;
+		}
 		if (!early_pfn_in_nid(pfn, nid))
 			continue;
 		if (overlap_memmap_init(zone, &pfn))