From patchwork Thu Jul 1 01:50:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Morton X-Patchwork-Id: 12353171 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_RED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 414EAC11F6A for ; Thu, 1 Jul 2021 01:50:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E6C0761469 for ; Thu, 1 Jul 2021 01:50:13 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E6C0761469 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 68FD58D020C; Wed, 30 Jun 2021 21:50:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6406A8D0202; Wed, 30 Jun 2021 21:50:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4E1B08D020C; Wed, 30 Jun 2021 21:50:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0184.hostedemail.com [216.40.44.184]) by kanga.kvack.org (Postfix) with ESMTP id 2A89F8D0202 for ; Wed, 30 Jun 2021 21:50:13 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id F25A0252D7 for ; Thu, 1 Jul 2021 01:50:12 +0000 (UTC) X-FDA: 78312338664.05.07E4FCC Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf29.hostedemail.com (Postfix) with ESMTP id 8A8A0900032B for ; Thu, 1 Jul 2021 01:50:12 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPSA id 367606105A; Thu, 1 Jul 2021 01:50:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org; s=korg; t=1625104211; bh=U7q2SNv2XL2VMJUK6O9N7koxRTuh+HEVxidHJUWSZ/0=; h=Date:From:To:Subject:In-Reply-To:From; b=07c5W6ajiBmdBcwW/gtK3IZ3rvlvX20A+5CQKPXOY7J8E7HvfAZzvkvoJpd6BeLIL 68zF5v3jMSjDjqlCG3g3za3FpRVrggyKB5QYlPzMa4QAy+xdjcBoiKv5bDf1F6mNrN e83bgBs5gJ2BVYuLXyN/kbX5dhBTvABNgKUBNHIo= Date: Wed, 30 Jun 2021 18:50:10 -0700 From: Andrew Morton To: adobriyan@gmail.com, akpm@linux-foundation.org, alex.shi@linux.alibaba.com, david@redhat.com, guro@fb.com, haiyangz@microsoft.com, jasowang@redhat.com, jbohac@suse.cz, kys@microsoft.com, linux-mm@kvack.org, mhocko@suse.com, mike.kravetz@oracle.com, mm-commits@vger.kernel.org, mst@redhat.com, naoya.horiguchi@nec.com, osalvador@suse.de, rppt@linux.ibm.com, steven.price@arm.com, sthemmin@microsoft.com, torvalds@linux-foundation.org, wei.liu@kernel.org, willy@infradead.org, yaoaili@kingsoft.com Subject: [patch 056/192] fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages Message-ID: <20210701015010.PMff3kM77%akpm@linux-foundation.org> In-Reply-To: <20210630184624.9ca1937310b0dd5ce66b30e7@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 8A8A0900032B Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=linux-foundation.org header.s=korg header.b=07c5W6aj; spf=pass (imf29.hostedemail.com: domain of akpm@linux-foundation.org designates 198.145.29.99 as permitted sender) smtp.mailfrom=akpm@linux-foundation.org; dmarc=none X-Stat-Signature: f3isoj96nx8km1mm5x9jkssh8fegk4eh X-HE-Tag: 1625104212-697753 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: David Hildenbrand Subject: fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages Let's avoid reading: 1) Offline memory sections: the content of offline memory sections is stale as the memory is effectively unused by the kernel. On s390x with standby memory, offline memory sections (belonging to offline storage increments) are not accessible. With virtio-mem and the hyper-v balloon, we can have unavailable memory chunks that should not be accessed inside offline memory sections. Last but not least, offline memory sections might contain hwpoisoned pages which we can no longer identify because the memmap is stale. 2) PG_offline pages: logically offline pages that are documented as "The content of these pages is effectively stale. Such pages should not be touched (read/write/dump/save) except by their owner.". Examples include pages inflated in a balloon or unavailble memory ranges inside hotplugged memory sections with virtio-mem or the hyper-v balloon. 3) PG_hwpoison pages: Reading pages marked as hwpoisoned can be fatal. As documented: "Accessing is not safe since it may cause another machine check. Don't touch!" Introduce is_page_hwpoison(), adding a comment that it is inherently racy but best we can really do. Reading /proc/kcore now performs similar checks as when reading /proc/vmcore for kdump via makedumpfile: problematic pages are exclude. It's also similar to hibernation code, however, we don't skip hwpoisoned pages when processing pages in kernel/power/snapshot.c:saveable_page() yet. Note 1: we can race against memory offlining code, especially memory going offline and getting unplugged: however, we will properly tear down the identity mapping and handle faults gracefully when accessing this memory from kcore code. Note 2: we can race against drivers setting PageOffline() and turning memory inaccessible in the hypervisor. We'll handle this in a follow-up patch. Link: https://lkml.kernel.org/r/20210526093041.8800-4-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Mike Rapoport Reviewed-by: Oscar Salvador Cc: Aili Yao Cc: Alexey Dobriyan Cc: Alex Shi Cc: Haiyang Zhang Cc: Jason Wang Cc: Jiri Bohac Cc: "K. Y. Srinivasan" Cc: "Matthew Wilcox (Oracle)" Cc: "Michael S. Tsirkin" Cc: Michal Hocko Cc: Mike Kravetz Cc: Naoya Horiguchi Cc: Roman Gushchin Cc: Stephen Hemminger Cc: Steven Price Cc: Wei Liu Signed-off-by: Andrew Morton --- fs/proc/kcore.c | 14 +++++++++++++- include/linux/page-flags.h | 12 ++++++++++++ 2 files changed, 25 insertions(+), 1 deletion(-) --- a/fs/proc/kcore.c~fs-proc-kcore-dont-read-offline-sections-logically-offline-pages-and-hwpoisoned-pages +++ a/fs/proc/kcore.c @@ -465,6 +465,9 @@ read_kcore(struct file *file, char __use m = NULL; while (buflen) { + struct page *page; + unsigned long pfn; + /* * If this is the first iteration or the address is not within * the previous entry, search for a matching entry. @@ -503,7 +506,16 @@ read_kcore(struct file *file, char __use } break; case KCORE_RAM: - if (!pfn_is_ram(__pa(start) >> PAGE_SHIFT)) { + pfn = __pa(start) >> PAGE_SHIFT; + page = pfn_to_online_page(pfn); + + /* + * Don't read offline sections, logically offline pages + * (e.g., inflated in a balloon), hwpoisoned pages, + * and explicitly excluded physical ranges. + */ + if (!page || PageOffline(page) || + is_page_hwpoison(page) || !pfn_is_ram(pfn)) { if (clear_user(buffer, tsz)) { ret = -EFAULT; goto out; --- a/include/linux/page-flags.h~fs-proc-kcore-dont-read-offline-sections-logically-offline-pages-and-hwpoisoned-pages +++ a/include/linux/page-flags.h @@ -695,6 +695,18 @@ PAGEFLAG_FALSE(DoubleMap) #endif /* + * Check if a page is currently marked HWPoisoned. Note that this check is + * best effort only and inherently racy: there is no way to synchronize with + * failing hardware. + */ +static inline bool is_page_hwpoison(struct page *page) +{ + if (PageHWPoison(page)) + return true; + return PageHuge(page) && PageHWPoison(compound_head(page)); +} + +/* * For pages that are never mapped to userspace (and aren't PageSlab), * page_type may be used. Because it is initialised to -1, we invert the * sense of the bit, so __SetPageFoo *clears* the bit used for PageFoo, and