From patchwork Mon Feb 4 20:18:48 2019
X-Patchwork-Submitter: Nitesh Narayan Lal
X-Patchwork-Id: 10796573
From: Nitesh Narayan Lal
To: kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, pbonzini@redhat.com, lcapitulino@redhat.com,
    pagupta@redhat.com, wei.w.wang@intel.com, yang.zhang.wz@gmail.com,
    riel@surriel.com, david@redhat.com, mst@redhat.com, dodgen@google.com,
    konrad.wilk@oracle.com, dhildenb@redhat.com, aarcange@redhat.com
Subject: [RFC][Patch v8 1/7] KVM: Support for guest free page hinting
Date: Mon, 4 Feb 2019 15:18:48 -0500
Message-Id: <20190204201854.2328-2-nitesh@redhat.com>
In-Reply-To: <20190204201854.2328-1-nitesh@redhat.com>
References: <20190204201854.2328-1-nitesh@redhat.com>
X-Mailing-List: kvm@vger.kernel.org

This patch includes the following:
1. Basic skeleton for the support.
2. Enablement of the x86 platform to use the same.

Signed-off-by: Nitesh Narayan Lal
---
 arch/x86/Kbuild              |  2 +-
 arch/x86/kvm/Kconfig         |  8 ++++++++
 arch/x86/kvm/Makefile        |  2 ++
 include/linux/gfp.h          |  9 +++++++++
 include/linux/page_hinting.h | 17 +++++++++++++++++
 virt/kvm/page_hinting.c      | 36 ++++++++++++++++++++++++++++++++++++
 6 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/page_hinting.h
 create mode 100644 virt/kvm/page_hinting.c

diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index c625f57472f7..3244df4ee311 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -2,7 +2,7 @@
 obj-y += entry/
 obj-$(CONFIG_PERF_EVENTS) += events/
 
-obj-$(CONFIG_KVM) += kvm/
+obj-$(subst m,y,$(CONFIG_KVM)) += kvm/
 
 # Xen paravirtualization support
 obj-$(CONFIG_XEN) += xen/
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 72fa955f4a15..2fae31459706 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -96,6 +96,14 @@ config KVM_MMU_AUDIT
	 This option adds a R/W kVM module parameter 'mmu_audit', which allows
	 auditing of
	 KVM MMU events at runtime.
 
+# KVM_FREE_PAGE_HINTING will allow the guest to report the free pages to the
+# host in regular interval of time.
+config KVM_FREE_PAGE_HINTING
+	def_bool y
+	depends on KVM
+	select VIRTIO
+	select VIRTIO_BALLOON
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source "drivers/vhost/Kconfig"
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 69b3a7c30013..78640a80501e 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -16,6 +16,8 @@
 kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
	 i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
	 hyperv.o page_track.o debugfs.o
 
+obj-$(CONFIG_KVM_FREE_PAGE_HINTING) += $(KVM)/page_hinting.o
+
 kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
 kvm-amd-y += svm.o pmu_amd.o
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5f5e25fd6149..e596527284ba 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 struct vm_area_struct;
 
@@ -456,6 +457,14 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
	return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
 }
 
+#ifdef CONFIG_KVM_FREE_PAGE_HINTING
+#define HAVE_ARCH_FREE_PAGE
+static inline void arch_free_page(struct page *page, int order)
+{
+	guest_free_page(page, order);
+}
+#endif
+
 #ifndef HAVE_ARCH_FREE_PAGE
 static inline void arch_free_page(struct page *page, int order) { }
 #endif
diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
new file mode 100644
index 000000000000..b54f7428f348
--- /dev/null
+++ b/include/linux/page_hinting.h
@@ -0,0 +1,17 @@
+/*
+ * Size of the array which is used to store the freed pages is defined by
+ * MAX_FGPT_ENTRIES. If possible, we have to find a better way using which
+ * we can get rid of the hardcoded array size.
+ */
+#define MAX_FGPT_ENTRIES	1000
+/*
+ * hypervisor_pages - It is a dummy structure passed with the hypercall.
+ * @pfn: page frame number for the page which needs to be sent to the host.
+ * @order: order of the page needs to be reported to the host.
+ */
+struct hypervisor_pages {
+	unsigned long pfn;
+	unsigned int order;
+};
+
+void guest_free_page(struct page *page, int order);
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
new file mode 100644
index 000000000000..818bd6b84e0c
--- /dev/null
+++ b/virt/kvm/page_hinting.c
@@ -0,0 +1,36 @@
+#include
+#include
+#include
+
+/*
+ * struct kvm_free_pages - Tracks the pages which are freed by the guest.
+ * @pfn: page frame number for the page which is freed.
+ * @order: order corresponding to the page freed.
+ * @zonenum: zone number to which the freed page belongs.
+ */
+struct kvm_free_pages {
+	unsigned long pfn;
+	unsigned int order;
+	int zonenum;
+};
+
+/*
+ * struct page_hinting - holds array objects for the structures used to track
+ * guest free pages, along with an index variable for each of them.
+ * @kvm_pt: array object for the structure kvm_free_pages.
+ * @kvm_pt_idx: index for kvm_free_pages object.
+ * @hypervisor_pagelist: array object for the structure hypervisor_pages.
+ * @hyp_idx: index for hypervisor_pages object.
+ */
+struct page_hinting {
+	struct kvm_free_pages kvm_pt[MAX_FGPT_ENTRIES];
+	int kvm_pt_idx;
+	struct hypervisor_pages hypervisor_pagelist[MAX_FGPT_ENTRIES];
+	int hyp_idx;
+};
+
+DEFINE_PER_CPU(struct page_hinting, hinting_obj);
+
+void guest_free_page(struct page *page, int order)
+{
+}

From patchwork Mon Feb 4 20:18:49 2019
X-Patchwork-Submitter: Nitesh Narayan Lal
X-Patchwork-Id: 10796529
Subject: [RFC][Patch v8 2/7] KVM: Enabling guest free page hinting via static key
Date: Mon, 4 Feb 2019 15:18:49 -0500
Message-Id: <20190204201854.2328-3-nitesh@redhat.com>
In-Reply-To: <20190204201854.2328-1-nitesh@redhat.com>

This patch allows the guest free page hinting support to be enabled or
disabled at runtime via a static key, which can be set through sysctl.
Signed-off-by: Nitesh Narayan Lal
---
 include/linux/gfp.h          |  2 ++
 include/linux/page_hinting.h |  5 +++++
 kernel/sysctl.c              |  9 +++++++++
 virt/kvm/page_hinting.c      | 23 +++++++++++++++++++++++
 4 files changed, 39 insertions(+)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index e596527284ba..8389219a076a 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -461,6 +461,8 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
 #define HAVE_ARCH_FREE_PAGE
 static inline void arch_free_page(struct page *page, int order)
 {
+	if (!static_branch_unlikely(&guest_page_hinting_key))
+		return;
	guest_free_page(page, order);
 }
 #endif
diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
index b54f7428f348..9bdcf63e1306 100644
--- a/include/linux/page_hinting.h
+++ b/include/linux/page_hinting.h
@@ -14,4 +14,9 @@ struct hypervisor_pages {
	unsigned int order;
 };
 
+extern int guest_page_hinting_flag;
+extern struct static_key_false guest_page_hinting_key;
+
+int guest_page_hinting_sysctl(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp, loff_t *ppos);
 void guest_free_page(struct page *page, int order);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ba4d9e85feb8..5d53629c9bfb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1690,6 +1690,15 @@ static struct ctl_table vm_table[] = {
		.extra1		= (void *)&mmap_rnd_compat_bits_min,
		.extra2		= (void *)&mmap_rnd_compat_bits_max,
	},
+#endif
+#ifdef CONFIG_KVM_FREE_PAGE_HINTING
+	{
+		.procname	= "guest-page-hinting",
+		.data		= &guest_page_hinting_flag,
+		.maxlen		= sizeof(guest_page_hinting_flag),
+		.mode		= 0644,
+		.proc_handler	= guest_page_hinting_sysctl,
+	},
 #endif
	{ }
 };
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index 818bd6b84e0c..4a34ea8db0c8 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -1,6 +1,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * struct kvm_free_pages - Tracks the pages which are freed by the guest.
@@ -31,6 +32,28 @@ struct page_hinting {
 
 DEFINE_PER_CPU(struct page_hinting, hinting_obj);
 
+struct static_key_false guest_page_hinting_key = STATIC_KEY_FALSE_INIT;
+EXPORT_SYMBOL(guest_page_hinting_key);
+static DEFINE_MUTEX(hinting_mutex);
+int guest_page_hinting_flag;
+EXPORT_SYMBOL(guest_page_hinting_flag);
+
+int guest_page_hinting_sysctl(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp,
+			      loff_t *ppos)
+{
+	int ret;
+
+	mutex_lock(&hinting_mutex);
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (guest_page_hinting_flag)
+		static_key_enable(&guest_page_hinting_key.key);
+	else
+		static_key_disable(&guest_page_hinting_key.key);
+	mutex_unlock(&hinting_mutex);
+	return ret;
+}
+
 void guest_free_page(struct page *page, int order)
 {
 }

From patchwork Mon Feb 4 20:18:50 2019
X-Patchwork-Submitter: Nitesh Narayan Lal
X-Patchwork-Id: 10796537
Subject: [RFC][Patch v8 3/7] KVM: Guest free page hinting functional skeleton
Date: Mon, 4 Feb 2019 15:18:50 -0500
Message-Id: <20190204201854.2328-4-nitesh@redhat.com>
In-Reply-To: <20190204201854.2328-1-nitesh@redhat.com>

This patch adds the functional skeleton for the guest implementation. It also
enables the guest to maintain a list of pages freed by the guest. Once the
list is full, guest_free_page() invokes scan_array(), which wakes up the
kernel thread responsible for further processing.
Signed-off-by: Nitesh Narayan Lal
---
 include/linux/page_hinting.h |  3 ++
 virt/kvm/page_hinting.c      | 60 +++++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
index 9bdcf63e1306..2d7ff59f3f6a 100644
--- a/include/linux/page_hinting.h
+++ b/include/linux/page_hinting.h
@@ -1,3 +1,5 @@
+#include
+
 /*
  * Size of the array which is used to store the freed pages is defined by
  * MAX_FGPT_ENTRIES. If possible, we have to find a better way using which
@@ -16,6 +18,7 @@ struct hypervisor_pages {
 
 extern int guest_page_hinting_flag;
 extern struct static_key_false guest_page_hinting_key;
+extern struct smp_hotplug_thread hinting_threads;
 
 int guest_page_hinting_sysctl(struct ctl_table *table, int write,
			       void __user *buffer, size_t *lenp, loff_t *ppos);
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index 4a34ea8db0c8..636990e7fbb3 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -1,7 +1,7 @@
 #include
 #include
-#include
 #include
+#include
 
 /*
  * struct kvm_free_pages - Tracks the pages which are freed by the guest.
@@ -37,6 +37,7 @@ EXPORT_SYMBOL(guest_page_hinting_key);
 static DEFINE_MUTEX(hinting_mutex);
 int guest_page_hinting_flag;
 EXPORT_SYMBOL(guest_page_hinting_flag);
+static DEFINE_PER_CPU(struct task_struct *, hinting_task);
 
 int guest_page_hinting_sysctl(struct ctl_table *table, int write,
			       void __user *buffer, size_t *lenp,
@@ -54,6 +55,63 @@ int guest_page_hinting_sysctl(struct ctl_table *table, int write,
	return ret;
 }
 
+static void hinting_fn(unsigned int cpu)
+{
+	struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj);
+
+	page_hinting_obj->kvm_pt_idx = 0;
+	put_cpu_var(hinting_obj);
+}
+
+void scan_array(void)
+{
+	struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj);
+
+	if (page_hinting_obj->kvm_pt_idx == MAX_FGPT_ENTRIES)
+		wake_up_process(__this_cpu_read(hinting_task));
+}
+
+static int hinting_should_run(unsigned int cpu)
+{
+	struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj);
+	int free_page_idx = page_hinting_obj->kvm_pt_idx;
+
+	if (free_page_idx == MAX_FGPT_ENTRIES)
+		return 1;
+	else
+		return 0;
+}
+
+struct smp_hotplug_thread hinting_threads = {
+	.store			= &hinting_task,
+	.thread_should_run	= hinting_should_run,
+	.thread_fn		= hinting_fn,
+	.thread_comm		= "hinting/%u",
+	.selfparking		= false,
+};
+EXPORT_SYMBOL(hinting_threads);
+
 void guest_free_page(struct page *page, int order)
 {
+	unsigned long flags;
+	struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj);
+	/*
+	 * use of global variables may trigger a race condition between irq and
+	 * process context causing unwanted overwrites. This will be replaced
+	 * with a better solution to prevent such race conditions.
+	 */
+
+	local_irq_save(flags);
+	if (page_hinting_obj->kvm_pt_idx != MAX_FGPT_ENTRIES) {
+		page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].pfn =
+					page_to_pfn(page);
+		page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].zonenum =
+					page_zonenum(page);
+		page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].order =
+					order;
+		page_hinting_obj->kvm_pt_idx += 1;
+		if (page_hinting_obj->kvm_pt_idx == MAX_FGPT_ENTRIES)
+			scan_array();
+	}
+	local_irq_restore(flags);
 }

From patchwork Mon Feb 4 20:18:51 2019
X-Patchwork-Submitter: Nitesh Narayan Lal
X-Patchwork-Id: 10796531
Subject: [RFC][Patch v8 4/7] KVM: Disabling page poisoning to prevent corruption
Date: Mon, 4 Feb 2019 15:18:51 -0500
Message-Id: <20190204201854.2328-5-nitesh@redhat.com>
In-Reply-To: <20190204201854.2328-1-nitesh@redhat.com>

This patch disables page poisoning if guest page hinting is enabled. It is
required to avoid possible guest memory corruption errors. Page poisoning is
a feature in which the page is filled with a specific pattern (0x00 or 0xaa)
after arch_free_page, and the same is verified before arch_alloc_page to
prevent the following issues:
* information leak from the freed data
* use-after-free bugs
* memory corruption
Selection of the pattern depends on CONFIG_PAGE_POISONING_ZERO. Once the
guest pages which are supposed to be freed are sent to the hypervisor, it
frees them.
After freeing the pages in the global list, the following things may happen:
* the hypervisor reallocates the freed memory back to the guest
* the hypervisor frees the memory and maps a different physical memory
In order to prevent any information leak, the hypervisor fills the memory
with zeroes before allocating it to the guest. The issue arises when the
pattern used for page poisoning is 0xaa while the newly allocated page
received from the hypervisor by the guest is filled with the pattern 0x00.
This will result in memory corruption errors.

Signed-off-by: Nitesh Narayan Lal
---
 include/linux/page_hinting.h | 8 ++++++++
 mm/page_poison.c             | 2 +-
 virt/kvm/page_hinting.c      | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h
index 2d7ff59f3f6a..e800c6b07561 100644
--- a/include/linux/page_hinting.h
+++ b/include/linux/page_hinting.h
@@ -19,7 +19,15 @@ struct hypervisor_pages {
 extern int guest_page_hinting_flag;
 extern struct static_key_false guest_page_hinting_key;
 extern struct smp_hotplug_thread hinting_threads;
+extern bool want_page_poisoning;
 
 int guest_page_hinting_sysctl(struct ctl_table *table, int write,
			       void __user *buffer, size_t *lenp, loff_t *ppos);
 void guest_free_page(struct page *page, int order);
+
+static inline void disable_page_poisoning(void)
+{
+#ifdef CONFIG_PAGE_POISONING
+	want_page_poisoning = 0;
+#endif
+}
diff --git a/mm/page_poison.c b/mm/page_poison.c
index f0c15e9017c0..9af96021133b 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -7,7 +7,7 @@
 #include
 #include
 
-static bool want_page_poisoning __read_mostly;
+bool want_page_poisoning __read_mostly;
 
 static int __init early_page_poison_param(char *buf)
 {
diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c
index 636990e7fbb3..be529f6f2bc0 100644
--- a/virt/kvm/page_hinting.c
+++ b/virt/kvm/page_hinting.c
@@ -103,6 +103,7 @@ void guest_free_page(struct page *page, int order)
	local_irq_save(flags);
	if (page_hinting_obj->kvm_pt_idx != MAX_FGPT_ENTRIES) {
+		disable_page_poisoning();
		page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].pfn =
					page_to_pfn(page);
		page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].zonenum =

From patchwork Mon Feb 4 20:18:52 2019
X-Patchwork-Submitter: Nitesh Narayan Lal
X-Patchwork-Id: 10796539
Subject: [RFC][Patch v8 5/7] virtio: Enables to add a single descriptor to the host
Date: Mon, 4 Feb 2019 15:18:52 -0500
Message-Id: <20190204201854.2328-6-nitesh@redhat.com>
In-Reply-To: <20190204201854.2328-1-nitesh@redhat.com>

This patch enables the caller to expose a single buffer to the other end
using a vring descriptor. It also allows the caller to perform this action
synchronously by using virtqueue_kick_sync.

Signed-off-by: Nitesh Narayan Lal
---
 drivers/virtio/virtio_ring.c | 72 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  4 ++
 2 files changed, 76 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..93c161ac6a28 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1695,6 +1695,52 @@ static inline int virtqueue_add(struct virtqueue *_vq,
			     out_sgs, in_sgs, data, ctx, gfp);
 }
 
+/**
+ * virtqueue_add_desc - add a buffer to a chain using a vring desc
+ * @vq: the struct virtqueue we're talking about.
+ * @addr: address of the buffer to add.
+ * @len: length of the buffer.
+ * @in: set if the buffer is for the device to write.
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_desc(struct virtqueue *_vq, u64 addr, u32 len, int in)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_desc *desc = vq->split.vring.desc;
+	u16 flags = in ? VRING_DESC_F_WRITE : 0;
+	unsigned int i;
+	void *data = (void *)addr;
+	int avail_idx;
+
+	/* Sanity check */
+	if (!_vq)
+		return -EINVAL;
+
+	START_USE(vq);
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
+	i = vq->free_head;
+	flags &= ~VRING_DESC_F_NEXT;
+	desc[i].flags = cpu_to_virtio16(_vq->vdev, flags);
+	desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+	desc[i].len = cpu_to_virtio32(_vq->vdev, len);
+
+	vq->vq.num_free--;
+	vq->free_head = virtio16_to_cpu(_vq->vdev, desc[i].next);
+	vq->split.desc_state[i].data = data;
+	vq->split.avail_idx_shadow = 1;
+	avail_idx = vq->split.avail_idx_shadow;
+	vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev, avail_idx);
+	vq->num_added = 1;
+	END_USE(vq);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_desc);
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @vq: the struct virtqueue we're talking about.
@@ -1842,6 +1888,32 @@ bool virtqueue_notify(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_notify);
 
+/**
+ * virtqueue_kick_sync - update after add_buf and busy wait till update is done
+ * @vq: the struct virtqueue
+ *
+ * After one or more virtqueue_add_* calls, invoke this to kick
+ * the other side. Busy wait till the other side is done with the update.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns false if kick failed, otherwise true.
+ */
+bool virtqueue_kick_sync(struct virtqueue *vq)
+{
+	u32 len;
+
+	if (likely(virtqueue_kick(vq))) {
+		while (!virtqueue_get_buf(vq, &len) &&
+		       !virtqueue_is_broken(vq))
+			cpu_relax();
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(virtqueue_kick_sync);
+
 /**
  * virtqueue_kick - update after add_buf
  * @vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index fa1b5da2804e..58943a3a0e8d 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -57,6 +57,10 @@ int virtqueue_add_sgs(struct virtqueue *vq,
		       unsigned int in_sgs,
		       void *data,
		       gfp_t gfp);
+/* A desc with this init id is treated as an invalid desc */
+int virtqueue_add_desc(struct virtqueue *_vq, u64 addr, u32 len, int in);
+
+bool virtqueue_kick_sync(struct virtqueue *vq);
 bool virtqueue_kick(struct virtqueue *vq);

From patchwork Mon Feb 4 20:18:53 2019
X-Patchwork-Submitter: Nitesh Narayan Lal
X-Patchwork-Id: 10796535
Subject: [RFC][Patch v8 6/7] KVM: Enables the kernel to isolate and report free pages
Date: Mon, 4 Feb 2019 15:18:53 -0500
Message-Id: <20190204201854.2328-7-nitesh@redhat.com>
In-Reply-To: <20190204201854.2328-1-nitesh@redhat.com>

This patch enables the kernel to scan the per-CPU array and compress it by
removing the repetitive/re-allocated pages.
Once the per-CPU array is completely filled with pages that are in the buddy, the per-CPU kernel thread is woken up; it re-scans the entire array, acquiring the zone lock corresponding to each page being examined. If a page is still free and present in the buddy, the thread tries to isolate it and adds it to a second per-CPU array. Once this scan is complete, and if any isolated pages were added to the new array, the kernel thread invokes hyperlist_ready(). In hyperlist_ready() a hypercall is made to report these pages to the host using the virtio-balloon framework; for this purpose another virtqueue, 'hinting_vq', is added to the balloon driver. Once the host has freed all the reported pages, the kernel thread returns them to the buddy. Signed-off-by: Nitesh Narayan Lal --- drivers/virtio/virtio_balloon.c | 56 +++++++- include/linux/page_hinting.h | 18 ++- include/uapi/linux/virtio_balloon.h | 1 + mm/page_alloc.c | 2 +- virt/kvm/page_hinting.c | 202 +++++++++++++++++++++++++++- 5 files changed, 269 insertions(+), 10 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 728ecd1eea30..8af34e0b9a32 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -57,13 +57,15 @@ enum virtio_balloon_vq { VIRTIO_BALLOON_VQ_INFLATE, VIRTIO_BALLOON_VQ_DEFLATE, VIRTIO_BALLOON_VQ_STATS, + VIRTIO_BALLOON_VQ_HINTING, VIRTIO_BALLOON_VQ_FREE_PAGE, VIRTIO_BALLOON_VQ_MAX }; struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq, + *hinting_vq; /* Balloon's own wq for cpu-intensive work items */ struct workqueue_struct *balloon_wq; @@ -122,6 +124,40 @@ static struct virtio_device_id id_table[] = { { 0 }, }; +#ifdef CONFIG_KVM_FREE_PAGE_HINTING +void virtballoon_page_hinting(struct virtio_balloon *vb, u64 gvaddr, + int hyper_entries) +{ + u64 gpaddr =
virt_to_phys((void *)gvaddr); + + virtqueue_add_desc(vb->hinting_vq, gpaddr, hyper_entries, 0); + virtqueue_kick_sync(vb->hinting_vq); +} + +static void hinting_ack(struct virtqueue *vq) +{ + struct virtio_balloon *vb = vq->vdev->priv; + + wake_up(&vb->acked); +} + +static void enable_hinting(struct virtio_balloon *vb) +{ + guest_page_hinting_flag = 1; + static_branch_enable(&guest_page_hinting_key); + request_hypercall = (void *)&virtballoon_page_hinting; + balloon_ptr = vb; + WARN_ON(smpboot_register_percpu_thread(&hinting_threads)); +} + +static void disable_hinting(void) +{ + guest_page_hinting_flag = 0; + static_branch_disable(&guest_page_hinting_key); + balloon_ptr = NULL; +} +#endif + static u32 page_to_balloon_pfn(struct page *page) { unsigned long pfn = page_to_pfn(page); @@ -481,6 +517,7 @@ static int init_vqs(struct virtio_balloon *vb) names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate"; names[VIRTIO_BALLOON_VQ_STATS] = NULL; names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; + names[VIRTIO_BALLOON_VQ_HINTING] = NULL; if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { names[VIRTIO_BALLOON_VQ_STATS] = "stats"; @@ -492,11 +529,18 @@ static int init_vqs(struct virtio_balloon *vb) callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL; } + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING)) { + names[VIRTIO_BALLOON_VQ_HINTING] = "hinting_vq"; + callbacks[VIRTIO_BALLOON_VQ_HINTING] = hinting_ack; + } err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, callbacks, names, NULL, NULL); if (err) return err; + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING)) + vb->hinting_vq = vqs[VIRTIO_BALLOON_VQ_HINTING]; + vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE]; vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE]; if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) { @@ -908,6 +952,11 @@ static int virtballoon_probe(struct virtio_device *vdev) if (err) goto out_del_balloon_wq; } + +#ifdef CONFIG_KVM_FREE_PAGE_HINTING + if
(virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING)) + enable_hinting(vb); +#endif virtio_device_ready(vdev); if (towards_target(vb)) @@ -950,6 +999,10 @@ static void virtballoon_remove(struct virtio_device *vdev) cancel_work_sync(&vb->update_balloon_size_work); cancel_work_sync(&vb->update_balloon_stats_work); +#ifdef CONFIG_KVM_FREE_PAGE_HINTING + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINTING)) + disable_hinting(); +#endif if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) { cancel_work_sync(&vb->report_free_page_work); destroy_workqueue(vb->balloon_wq); @@ -1009,6 +1062,7 @@ static unsigned int features[] = { VIRTIO_BALLOON_F_MUST_TELL_HOST, VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_F_DEFLATE_ON_OOM, + VIRTIO_BALLOON_F_HINTING, VIRTIO_BALLOON_F_FREE_PAGE_HINT, VIRTIO_BALLOON_F_PAGE_POISON, }; diff --git a/include/linux/page_hinting.h b/include/linux/page_hinting.h index e800c6b07561..3ba8c1f3b4a4 100644 --- a/include/linux/page_hinting.h +++ b/include/linux/page_hinting.h @@ -1,15 +1,12 @@ #include -/* - * Size of the array which is used to store the freed pages is defined by - * MAX_FGPT_ENTRIES. If possible, we have to find a better way using which - * we can get rid of the hardcoded array size. - */ #define MAX_FGPT_ENTRIES 1000 /* * hypervisor_pages - It is a dummy structure passed with the hypercall. - * @pfn: page frame number for the page which needs to be sent to the host. - * @order: order of the page needs to be reported to the host. + * @pfn - page frame number of the page which is to be freed. + * @order - order of the page which is to be freed. + * A global array object is used to hold the list of pfn/order pairs and is + * passed as part of the hypercall.
*/ struct hypervisor_pages { unsigned long pfn; @@ -19,11 +16,18 @@ struct hypervisor_pages { extern int guest_page_hinting_flag; extern struct static_key_false guest_page_hinting_key; extern struct smp_hotplug_thread hinting_threads; +extern void (*request_hypercall)(void *, u64, int); +extern void *balloon_ptr; extern bool want_page_poisoning; int guest_page_hinting_sysctl(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); void guest_free_page(struct page *page, int order); +extern int __isolate_free_page(struct page *page, unsigned int order); +extern void free_one_page(struct zone *zone, + struct page *page, unsigned long pfn, + unsigned int order, + int migratetype); static inline void disable_page_poisoning(void) { diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h index a1966cd7b677..2b0f62814e22 100644 --- a/include/uapi/linux/virtio_balloon.h +++ b/include/uapi/linux/virtio_balloon.h @@ -36,6 +36,7 @@ #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */ #define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */ #define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */ +#define VIRTIO_BALLOON_F_HINTING 5 /* Page hinting virtqueue */ /* Size of a PFN in the balloon interface. 
*/ #define VIRTIO_BALLOON_PFN_SHIFT 12 diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d295c9bc01a8..93224cba9243 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1199,7 +1199,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, spin_unlock(&zone->lock); } -static void free_one_page(struct zone *zone, +void free_one_page(struct zone *zone, struct page *page, unsigned long pfn, unsigned int order, int migratetype) diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c index be529f6f2bc0..315099fcda43 100644 --- a/virt/kvm/page_hinting.c +++ b/virt/kvm/page_hinting.c @@ -1,6 +1,8 @@ #include #include +#include #include +#include #include /* @@ -39,6 +41,11 @@ int guest_page_hinting_flag; EXPORT_SYMBOL(guest_page_hinting_flag); static DEFINE_PER_CPU(struct task_struct *, hinting_task); +void (*request_hypercall)(void *, u64, int); +EXPORT_SYMBOL(request_hypercall); +void *balloon_ptr; +EXPORT_SYMBOL(balloon_ptr); + int guest_page_hinting_sysctl(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) @@ -55,18 +62,201 @@ int guest_page_hinting_sysctl(struct ctl_table *table, int write, return ret; } +void hyperlist_ready(struct hypervisor_pages *guest_isolated_pages, int entries) +{ + int i = 0; + int mt = 0; + + if (balloon_ptr) + request_hypercall(balloon_ptr, (u64)&guest_isolated_pages[0], + entries); + + while (i < entries) { + struct page *page = pfn_to_page(guest_isolated_pages[i].pfn); + + mt = get_pageblock_migratetype(page); + free_one_page(page_zone(page), page, page_to_pfn(page), + guest_isolated_pages[i].order, mt); + i++; + } +} + +struct page *get_buddy_page(struct page *page) +{ + unsigned long pfn = page_to_pfn(page); + unsigned int order; + + for (order = 0; order < MAX_ORDER; order++) { + struct page *page_head = page - (pfn & ((1 << order) - 1)); + + if (PageBuddy(page_head) && page_private(page_head) >= order) + return page_head; + } + return NULL; +} + static void hinting_fn(unsigned int 
cpu) { struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj); + int idx = 0, ret = 0; + struct zone *zone_cur; + unsigned long flags = 0; + + while (idx < MAX_FGPT_ENTRIES) { + unsigned long pfn = page_hinting_obj->kvm_pt[idx].pfn; + unsigned long pfn_end = page_hinting_obj->kvm_pt[idx].pfn + + (1 << page_hinting_obj->kvm_pt[idx].order) - 1; + + while (pfn <= pfn_end) { + struct page *page = pfn_to_page(pfn); + struct page *buddy_page = NULL; + + zone_cur = page_zone(page); + spin_lock_irqsave(&zone_cur->lock, flags); + + if (PageCompound(page)) { + struct page *head_page = compound_head(page); + unsigned long head_pfn = page_to_pfn(head_page); + unsigned int alloc_pages = + 1 << compound_order(head_page); + + pfn = head_pfn + alloc_pages; + spin_unlock_irqrestore(&zone_cur->lock, flags); + continue; + } + + if (page_ref_count(page)) { + pfn++; + spin_unlock_irqrestore(&zone_cur->lock, flags); + continue; + } + + if (PageBuddy(page)) { + int buddy_order = page_private(page); + ret = __isolate_free_page(page, buddy_order); + if (!ret) { + } else { + int l_idx = page_hinting_obj->hyp_idx; + struct hypervisor_pages *l_obj = + page_hinting_obj->hypervisor_pagelist; + + l_obj[l_idx].pfn = pfn; + l_obj[l_idx].order = buddy_order; + page_hinting_obj->hyp_idx += 1; + } + pfn = pfn + (1 << buddy_order); + spin_unlock_irqrestore(&zone_cur->lock, flags); + continue; + } + + buddy_page = get_buddy_page(page); + if (buddy_page) { + int buddy_order = page_private(buddy_page); + + ret = __isolate_free_page(buddy_page, + buddy_order); + if (!ret) { + } else { + int l_idx = page_hinting_obj->hyp_idx; + struct hypervisor_pages *l_obj = + page_hinting_obj->hypervisor_pagelist; + unsigned long buddy_pfn = + page_to_pfn(buddy_page); + + l_obj[l_idx].pfn = buddy_pfn; + l_obj[l_idx].order = buddy_order; + page_hinting_obj->hyp_idx += 1; + } + pfn = page_to_pfn(buddy_page) + + (1 << buddy_order); + spin_unlock_irqrestore(&zone_cur->lock, flags); + continue; + } + 
spin_unlock_irqrestore(&zone_cur->lock, flags); + pfn++; + } + page_hinting_obj->kvm_pt[idx].pfn = 0; + page_hinting_obj->kvm_pt[idx].order = -1; + page_hinting_obj->kvm_pt[idx].zonenum = -1; + idx++; + } + if (page_hinting_obj->hyp_idx > 0) { + hyperlist_ready(page_hinting_obj->hypervisor_pagelist, + page_hinting_obj->hyp_idx); + page_hinting_obj->hyp_idx = 0; + } page_hinting_obj->kvm_pt_idx = 0; put_cpu_var(hinting_obj); } +int if_exist(struct page *page) +{ + int i = 0; + struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj); + + while (i < MAX_FGPT_ENTRIES) { + if (page_to_pfn(page) == page_hinting_obj->kvm_pt[i].pfn) + return 1; + i++; + } + return 0; +} + +void pack_array(void) +{ + int i = 0, j = 0; + struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj); + + while (i < MAX_FGPT_ENTRIES) { + if (page_hinting_obj->kvm_pt[i].pfn != 0) { + if (i != j) { + page_hinting_obj->kvm_pt[j].pfn = + page_hinting_obj->kvm_pt[i].pfn; + page_hinting_obj->kvm_pt[j].order = + page_hinting_obj->kvm_pt[i].order; + page_hinting_obj->kvm_pt[j].zonenum = + page_hinting_obj->kvm_pt[i].zonenum; + } + j++; + } + i++; + } + i = j; + page_hinting_obj->kvm_pt_idx = j; + while (j < MAX_FGPT_ENTRIES) { + page_hinting_obj->kvm_pt[j].pfn = 0; + page_hinting_obj->kvm_pt[j].order = -1; + page_hinting_obj->kvm_pt[j].zonenum = -1; + j++; + } +} + void scan_array(void) { struct page_hinting *page_hinting_obj = this_cpu_ptr(&hinting_obj); + int i = 0; + while (i < MAX_FGPT_ENTRIES) { + struct page *page = + pfn_to_page(page_hinting_obj->kvm_pt[i].pfn); + struct page *buddy_page = get_buddy_page(page); + + if (!PageBuddy(page) && buddy_page) { + if (if_exist(buddy_page)) { + page_hinting_obj->kvm_pt[i].pfn = 0; + page_hinting_obj->kvm_pt[i].order = -1; + page_hinting_obj->kvm_pt[i].zonenum = -1; + } else { + page_hinting_obj->kvm_pt[i].pfn = + page_to_pfn(buddy_page); + page_hinting_obj->kvm_pt[i].order = + page_private(buddy_page); + } + } + i++; + } + pack_array(); if 
(page_hinting_obj->kvm_pt_idx == MAX_FGPT_ENTRIES) wake_up_process(__this_cpu_read(hinting_task)); } @@ -111,8 +301,18 @@ void guest_free_page(struct page *page, int order) page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].order = order; page_hinting_obj->kvm_pt_idx += 1; - if (page_hinting_obj->kvm_pt_idx == MAX_FGPT_ENTRIES) + if (page_hinting_obj->kvm_pt_idx == MAX_FGPT_ENTRIES) { + /* + * We are depending on the buddy free-list to identify + * if a page is free or not. Hence, we are dumping all + * the per-cpu pages back into the buddy allocator. This + * will ensure fewer failures when we try to isolate + * captured free pages and hence more memory reported to + * the host. + */ + drain_local_pages(NULL); scan_array(); + } } local_irq_restore(flags); } From patchwork Mon Feb 4 20:18:54 2019 X-Patchwork-Submitter: Nitesh Narayan Lal X-Patchwork-Id: 10796533 From: Nitesh Narayan Lal To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, pbonzini@redhat.com, lcapitulino@redhat.com, pagupta@redhat.com, wei.w.wang@intel.com, yang.zhang.wz@gmail.com, riel@surriel.com, david@redhat.com, mst@redhat.com, dodgen@google.com, konrad.wilk@oracle.com, dhildenb@redhat.com, aarcange@redhat.com Subject: [RFC][Patch v8 7/7] KVM: Adding tracepoints for guest page hinting Date: Mon, 4 Feb 2019 15:18:54 -0500 Message-Id: <20190204201854.2328-8-nitesh@redhat.com> In-Reply-To: <20190204201854.2328-1-nitesh@redhat.com> References: <20190204201854.2328-1-nitesh@redhat.com> List-ID: X-Mailing-List: kvm@vger.kernel.org This patch enables tracking of the pages freed by the guest, and of the pages isolated by the page hinting code, through kernel tracepoints.
Signed-off-by: Nitesh Narayan Lal --- include/trace/events/kmem.h | 40 +++++++++++++++++++++++++++++++++++++ virt/kvm/page_hinting.c | 10 ++++++++++ 2 files changed, 50 insertions(+) diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index eb57e3037deb..69f6da9ff939 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -315,6 +315,46 @@ TRACE_EVENT(mm_page_alloc_extfrag, __entry->change_ownership) ); +TRACE_EVENT(guest_free_page, + TP_PROTO(struct page *page, unsigned int order), + + TP_ARGS(page, order), + + TP_STRUCT__entry( + __field(unsigned long, pfn) + __field(unsigned int, order) + ), + + TP_fast_assign( + __entry->pfn = page_to_pfn(page); + __entry->order = order; + ), + + TP_printk("page=%p pfn=%lu number of pages=%d", + pfn_to_page(__entry->pfn), + __entry->pfn, + (1 << __entry->order)) +); + +TRACE_EVENT(guest_isolated_pfn, + TP_PROTO(unsigned long pfn, unsigned int pages), + + TP_ARGS(pfn, pages), + + TP_STRUCT__entry( + __field(unsigned long, pfn) + __field(unsigned int, pages) + ), + + TP_fast_assign( + __entry->pfn = pfn; + __entry->pages = pages; + ), + + TP_printk("pfn=%lu number of pages=%u", + __entry->pfn, + __entry->pages) +); #endif /* _TRACE_KMEM_H */ /* This part must be outside protection */ diff --git a/virt/kvm/page_hinting.c b/virt/kvm/page_hinting.c index 315099fcda43..395d94e52c74 100644 --- a/virt/kvm/page_hinting.c +++ b/virt/kvm/page_hinting.c @@ -4,6 +4,7 @@ #include #include #include +#include /* * struct kvm_free_pages - Tracks the pages which are freed by the guest. 
@@ -140,7 +141,11 @@ static void hinting_fn(unsigned int cpu) int l_idx = page_hinting_obj->hyp_idx; struct hypervisor_pages *l_obj = page_hinting_obj->hypervisor_pagelist; + unsigned int buddy_pages = + 1 << buddy_order; + trace_guest_isolated_pfn(pfn, + buddy_pages); l_obj[l_idx].pfn = pfn; l_obj[l_idx].order = buddy_order; page_hinting_obj->hyp_idx += 1; @@ -163,7 +168,11 @@ static void hinting_fn(unsigned int cpu) page_hinting_obj->hypervisor_pagelist; unsigned long buddy_pfn = page_to_pfn(buddy_page); + unsigned int buddy_pages = + 1 << buddy_order; + trace_guest_isolated_pfn(pfn, + buddy_pages); l_obj[l_idx].pfn = buddy_pfn; l_obj[l_idx].order = buddy_order; page_hinting_obj->hyp_idx += 1; @@ -294,6 +303,7 @@ void guest_free_page(struct page *page, int order) local_irq_save(flags); if (page_hinting_obj->kvm_pt_idx != MAX_FGPT_ENTRIES) { disable_page_poisoning(); + trace_guest_free_page(page, order); page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].pfn = page_to_pfn(page); page_hinting_obj->kvm_pt[page_hinting_obj->kvm_pt_idx].zonenum =