From patchwork Thu Apr 27 00:08:45 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Yznaga
X-Patchwork-Id: 13225043
From: Anthony Yznaga <anthony.yznaga@oracle.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org,
 hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 peterz@infradead.org, rppt@kernel.org, akpm@linux-foundation.org,
 ebiederm@xmission.com, keescook@chromium.org, graf@amazon.com,
 jason.zeng@intel.com, lei.l.li@intel.com, steven.sistare@oracle.com,
 fam.zheng@bytedance.com, mgalaxy@akamai.com, kexec@lists.infradead.org
Subject: [RFC v3 09/21] PKRAM: pass a list of preserved ranges to the next kernel
Date: Wed, 26 Apr 2023 17:08:45 -0700
Message-Id: <1682554137-13938-10-git-send-email-anthony.yznaga@oracle.com>
X-Mailer: git-send-email 1.9.4
In-Reply-To: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>
References: <1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com>

In order to build a new memblock reserved list during boot that includes
ranges preserved by the previous kernel, a list of preserved ranges is
passed to the next kernel via the pkram superblock.
The ranges are stored in ascending order in a linked list of pages. A more
complete memblock list is deliberately not prepared, both to avoid possible
conflicts with changes in a newer kernel and to avoid having to allocate a
contiguous range larger than a page.

Signed-off-by: Anthony Yznaga
---
 mm/pkram.c | 184 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 177 insertions(+), 7 deletions(-)

diff --git a/mm/pkram.c b/mm/pkram.c
index e6c0f3c52465..3790e5180feb 100644
--- a/mm/pkram.c
+++ b/mm/pkram.c
@@ -84,6 +84,20 @@ struct pkram_node {
 #define PKRAM_LOAD		2
 #define PKRAM_ACCMODE_MASK	3
 
+struct pkram_region {
+	phys_addr_t base;
+	phys_addr_t size;
+};
+
+struct pkram_region_list {
+	__u64	prev_pfn;
+	__u64	next_pfn;
+
+	struct pkram_region regions[0];
+};
+
+#define PKRAM_REGIONS_LIST_MAX \
+	((PAGE_SIZE-sizeof(struct pkram_region_list))/sizeof(struct pkram_region))
 /*
  * The PKRAM super block contains data needed to restore the preserved memory
  * structure on boot. The pointer to it (pfn) should be passed via the 'pkram'
@@ -96,13 +110,21 @@ struct pkram_node {
  */
 struct pkram_super_block {
 	__u64	node_pfn;		/* first element of the node list */
+	__u64	region_list_pfn;
+	__u64	nr_regions;
 };
 
+static struct pkram_region_list *pkram_regions_list;
+static int pkram_init_regions_list(void);
+static unsigned long pkram_populate_regions_list(void);
+
 static unsigned long pkram_sb_pfn __initdata;
 static struct pkram_super_block *pkram_sb;
 
 extern int pkram_add_identity_map(struct page *page);
 extern void pkram_remove_identity_map(struct page *page);
+extern void pkram_find_preserved(unsigned long start, unsigned long end, void *private,
+	int (*callback)(unsigned long base, unsigned long size, void *private));
 
 /*
  * For convenience sake PKRAM nodes are kept in an auxiliary doubly-linked list
@@ -878,21 +900,48 @@ static void __pkram_reboot(void)
 	struct page *page;
 	struct pkram_node *node;
 	unsigned long node_pfn = 0;
+	unsigned long rl_pfn = 0;
+	unsigned long nr_regions = 0;
+	int err = 0;
 
-	list_for_each_entry_reverse(page, &pkram_nodes, lru) {
-		node = page_address(page);
-		if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK))
-			continue;
-		node->node_pfn = node_pfn;
-		node_pfn = page_to_pfn(page);
+	if (!list_empty(&pkram_nodes)) {
+		err = pkram_add_identity_map(virt_to_page(pkram_sb));
+		if (err) {
+			pr_err("PKRAM: failed to add super block to pagetable\n");
+			goto done;
+		}
+		list_for_each_entry_reverse(page, &pkram_nodes, lru) {
+			node = page_address(page);
+			if (WARN_ON(node->flags & PKRAM_ACCMODE_MASK))
+				continue;
+			node->node_pfn = node_pfn;
+			node_pfn = page_to_pfn(page);
+		}
+		err = pkram_init_regions_list();
+		if (err) {
+			pr_err("PKRAM: failed to init regions list\n");
+			goto done;
+		}
+		nr_regions = pkram_populate_regions_list();
+		if (IS_ERR_VALUE(nr_regions)) {
+			err = nr_regions;
+			pr_err("PKRAM: failed to populate regions list\n");
+			goto done;
+		}
+		rl_pfn = page_to_pfn(virt_to_page(pkram_regions_list));
 	}
 
+done:
 	/*
 	 * Zero out pkram_sb completely since it may have been passed from
 	 * the previous boot.
 	 */
 	memset(pkram_sb, 0, PAGE_SIZE);
-	pkram_sb->node_pfn = node_pfn;
+	if (!err && node_pfn) {
+		pkram_sb->node_pfn = node_pfn;
+		pkram_sb->region_list_pfn = rl_pfn;
+		pkram_sb->nr_regions = nr_regions;
+	}
 }
 
 static int pkram_reboot(struct notifier_block *notifier,
@@ -968,3 +1017,124 @@ static int __init pkram_init(void)
 	return 0;
 }
 module_init(pkram_init);
+
+static int count_region_cb(unsigned long base, unsigned long size, void *private)
+{
+	unsigned long *nr_regions = (unsigned long *)private;
+
+	(*nr_regions)++;
+	return 0;
+}
+
+static unsigned long pkram_count_regions(void)
+{
+	unsigned long nr_regions = 0;
+
+	pkram_find_preserved(0, PHYS_ADDR_MAX, &nr_regions, count_region_cb);
+
+	return nr_regions;
+}
+
+/*
+ * To facilitate rapidly building a new memblock reserved list during boot
+ * with the addition of preserved memory ranges a regions list is built
+ * before reboot.
+ * The regions list is a linked list of pages with each page containing an
+ * array of preserved memory ranges. The ranges are stored in each page
+ * and across the list in address order. A linked list is used rather than
+ * a single contiguous range to mitigate against the possibility that a
+ * larger, contiguous allocation may fail due to fragmentation.
+ *
+ * Since the pages of the regions list must be preserved and the pkram
+ * pagetable is used to determine what ranges are preserved, the list pages
+ * must be allocated and represented in the pkram pagetable before they can
+ * be populated. Rather than recounting the number of regions after
+ * allocating pages and repeating until a precise number of pages are
+ * allocated, the number of pages needed is estimated.
+ */
+static int pkram_init_regions_list(void)
+{
+	struct pkram_region_list *rl;
+	unsigned long nr_regions;
+	unsigned long nr_lpages;
+	struct page *page;
+
+	nr_regions = pkram_count_regions();
+
+	nr_lpages = DIV_ROUND_UP(nr_regions, PKRAM_REGIONS_LIST_MAX);
+	nr_regions += nr_lpages;
+	nr_lpages = DIV_ROUND_UP(nr_regions, PKRAM_REGIONS_LIST_MAX);
+
+	for (; nr_lpages; nr_lpages--) {
+		page = pkram_alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return -ENOMEM;
+		rl = page_address(page);
+		if (pkram_regions_list) {
+			rl->next_pfn = page_to_pfn(virt_to_page(pkram_regions_list));
+			pkram_regions_list->prev_pfn = page_to_pfn(page);
+		}
+		pkram_regions_list = rl;
+	}
+
+	return 0;
+}
+
+struct pkram_regions_priv {
+	struct pkram_region_list *curr;
+	struct pkram_region_list *last;
+	unsigned long nr_regions;
+	int idx;
+};
+
+static int add_region_cb(unsigned long base, unsigned long size, void *private)
+{
+	struct pkram_regions_priv *priv;
+	struct pkram_region_list *rl;
+	int i;
+
+	priv = (struct pkram_regions_priv *)private;
+	rl = priv->curr;
+	i = priv->idx;
+
+	if (!rl) {
+		WARN_ON(1);
+		return 1;
+	}
+
+	if (!i)
+		priv->last = priv->curr;
+
+	rl->regions[i].base = base;
+	rl->regions[i].size = size;
+
+	priv->nr_regions++;
+	i++;
+	if (i == PKRAM_REGIONS_LIST_MAX) {
+		u64 next_pfn = rl->next_pfn;
+
+		if (next_pfn)
+			priv->curr = pfn_to_kaddr(next_pfn);
+		else
+			priv->curr = NULL;
+
+		i = 0;
+	}
+	priv->idx = i;
+
+	return 0;
+}
+
+static unsigned long pkram_populate_regions_list(void)
+{
+	struct pkram_regions_priv priv = { .curr = pkram_regions_list };
+
+	pkram_find_preserved(0, PHYS_ADDR_MAX, &priv, add_region_cb);
+
+	/*
+	 * Link the first node to the last populated one.
+	 */
+	pkram_regions_list->prev_pfn = page_to_pfn(virt_to_page(priv.last));
+
+	return priv.nr_regions;
+}
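
The page-count estimate in pkram_init_regions_list() can be illustrated with a minimal userspace sketch. It assumes a 4096-byte page and a 64-bit phys_addr_t, so that 255 regions fit in one list page. The SKETCH_* macros and sketch_* names are hypothetical and exist only for this illustration. The region count is padded by one per list page before the second round-up, presumably because each newly allocated list page is itself preserved and may show up as one more region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SIZE instead. */
#define SKETCH_PAGE_SIZE 4096u

/* Stand-ins for struct pkram_region and the pkram_region_list header. */
struct sketch_region {
	uint64_t base;	/* assumes a 64-bit phys_addr_t */
	uint64_t size;
};

struct sketch_region_list_hdr {
	uint64_t prev_pfn;
	uint64_t next_pfn;
};

/* Mirrors PKRAM_REGIONS_LIST_MAX: regions that fit after the header. */
#define SKETCH_REGIONS_PER_PAGE \
	((SKETCH_PAGE_SIZE - sizeof(struct sketch_region_list_hdr)) / \
	 sizeof(struct sketch_region))

/* Userspace equivalent of the kernel's DIV_ROUND_UP(). */
static unsigned long sketch_div_round_up(unsigned long n, unsigned long d)
{
	return (n + d - 1) / d;
}

/*
 * Two-step estimate: size the list for the counted regions, pad the
 * count by one region per list page, then size the list again.
 */
static unsigned long sketch_estimate_list_pages(unsigned long nr_regions)
{
	unsigned long nr_lpages;

	nr_lpages = sketch_div_round_up(nr_regions, SKETCH_REGIONS_PER_PAGE);
	nr_regions += nr_lpages;
	return sketch_div_round_up(nr_regions, SKETCH_REGIONS_PER_PAGE);
}
```

Under these assumptions, sketch_estimate_list_pages(254) yields one list page, while sketch_estimate_list_pages(255) yields two, since the padding for the list page itself pushes the count past what a single page holds.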