Message ID | 20210429122519.15183-5-david@redhat.com |
---|---|
State | New, archived |
Series | fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages |
On Thu, Apr 29, 2021 at 02:25:16PM +0200, David Hildenbrand wrote:
> Let's avoid reading:
>
> 1) Offline memory sections: the content of offline memory sections is stale
> as the memory is effectively unused by the kernel. On s390x with standby
> memory, offline memory sections (belonging to offline storage
> increments) are not accessible. With virtio-mem and the hyper-v balloon,
> we can have unavailable memory chunks that should not be accessed inside
> offline memory sections. Last but not least, offline memory sections
> might contain hwpoisoned pages which we can no longer identify
> because the memmap is stale.
>
> 2) PG_offline pages: logically offline pages that are documented as
> "The content of these pages is effectively stale. Such pages should not
> be touched (read/write/dump/save) except by their owner.".
> Examples include pages inflated in a balloon or unavailable memory
> ranges inside hotplugged memory sections with virtio-mem or the hyper-v
> balloon.
>
> 3) PG_hwpoison pages: Reading pages marked as hwpoisoned can be fatal.
> As documented: "Accessing is not safe since it may cause another machine
> check. Don't touch!"
>
> Reading /proc/kcore now performs similar checks as when reading
> /proc/vmcore for kdump via makedumpfile: problematic pages are excluded.
> It's also similar to hibernation code; however, we don't skip hwpoisoned
> pages when processing pages in kernel/power/snapshot.c:saveable_page() yet.
>
> Note 1: we can race against memory offlining code, especially
> memory going offline and getting unplugged: however, we will properly tear
> down the identity mapping and handle faults gracefully when accessing
> this memory from kcore code.
>
> Note 2: we can race against drivers setting PageOffline() and turning
> memory inaccessible in the hypervisor. We'll handle this in a follow-up
> patch.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  fs/proc/kcore.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index ed6fbb3bd50c..92ff1e4436cb 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -465,6 +465,9 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>
>  	m = NULL;
>  	while (buflen) {
> +		struct page *page;
> +		unsigned long pfn;
> +
>  		/*
>  		 * If this is the first iteration or the address is not within
>  		 * the previous entry, search for a matching entry.
> @@ -503,7 +506,16 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  			}
>  			break;
>  		case KCORE_RAM:
> -			if (!pfn_is_ram(__pa(start) >> PAGE_SHIFT)) {
> +			pfn = __pa(start) >> PAGE_SHIFT;
> +			page = pfn_to_online_page(pfn);
> +
> +			/*
> +			 * Don't read offline sections, logically offline pages
> +			 * (e.g., inflated in a balloon), hwpoisoned pages,
> +			 * and explicitly excluded physical ranges.
> +			 */
> +			if (!page || PageOffline(page) ||
> +			    is_page_hwpoison(page) || !pfn_is_ram(pfn)) {
>  				if (clear_user(buffer, tsz)) {
>  					ret = -EFAULT;
>  					goto out;
> --
> 2.30.2
>