From patchwork Mon Feb 4 18:15:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10796321 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 86DEB13B4 for ; Mon, 4 Feb 2019 18:15:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7803F2B73A for ; Mon, 4 Feb 2019 18:15:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6C15E2B84A; Mon, 4 Feb 2019 18:15:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 10DC52B73A for ; Mon, 4 Feb 2019 18:15:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729331AbfBDSPn (ORCPT ); Mon, 4 Feb 2019 13:15:43 -0500 Received: from mail-pg1-f195.google.com ([209.85.215.195]:46864 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729306AbfBDSPm (ORCPT ); Mon, 4 Feb 2019 13:15:42 -0500 Received: by mail-pg1-f195.google.com with SMTP id w7so251798pgp.13; Mon, 04 Feb 2019 10:15:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=zqElbD91wN7xrII3yrqdT2po60YKbiGuJ5EuHOboq08=; b=oJsYf7z38AP8TbCH80dOcCkJYlGDdhpEscQ3nBiPGAezSpQoznfn6yaatFIh56rLa3 WQTC59TkQ38+viuHQYO8ZTHXdI+ikg6haVJUBIfi65J1jh7YFOQScPeFfZn5FdLIwwV1 HQBqxVYcgT/LNtaebjn1fviZ+IgZBpZybIhLxGWy4QMZkk8/ERcgCeCm1h0Ahhp0HBSL ojLR6pytSqLpez0vfIpiHCesbcI1La8NGXWRe7x6NeiEsm8412Xo098elaZawQ0pGk05 5ytGec+2Ioqy0lGzOd4BXBHFOLC6p+IG7quU66Y9Uh+PJMntM5b7DnSKxSFW0Y1aRGXF smjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=zqElbD91wN7xrII3yrqdT2po60YKbiGuJ5EuHOboq08=; b=LXuOcFISuwC4gze3TDCTShTZFrOZVU+ZnCy1aNK0848sPWlZ0p9TPUr3/K/rVIEGTi tdL+kvNTZDkQ7xSxEb0oHB4u3A1fzfeoOjYzWv9lGW/1BeLXLqKdFnnCPbb1MxL79izm X5MxJTiMEh+ba+MjAcje2Kefr3f5V0Iy3mOVEnXfn9xR6A8rPSlMffYsNdSpe9J5O29H m35G0X2kuFkdC+YZFA+NavsjAhYFB+i7tuOErHgdGQIQRfiufp7+IApoEuoPQNctesCl dxjqIi0UzIH8ayYW4x2Qt7gWmMYl4as4cmgkclEHcjboS7HXhaVsAosJmBEmMrsyiQgS 6X8w== X-Gm-Message-State: AHQUAuaBssMwK/TvsA8m8S+hEizStvF1qetiRYuqZ1JKvh64FsM5HofK LpwZFW3/LUVdDGOCeDSIoHk= X-Google-Smtp-Source: AHgI3Ia491NQopE8fziC0IJZOVG35rTaJ4DGuP8qMsPV/GoMoreeKpaQJ7srkoR5qCHTRvCiosG1aw== X-Received: by 2002:a63:2bc4:: with SMTP id r187mr615506pgr.306.1549304141567; Mon, 04 Feb 2019 10:15:41 -0800 (PST) Received: from localhost.localdomain ([2001:470:b:9c3:9e5c:8eff:fe4f:f2d0]) by smtp.gmail.com with ESMTPSA id v89sm1323954pfk.12.2019.02.04.10.15.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 Feb 2019 10:15:41 -0800 (PST) Subject: [RFC PATCH 1/4] madvise: Expose ability to set dontneed from kernel From: Alexander Duyck To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: rkrcmar@redhat.com, alexander.h.duyck@linux.intel.com, x86@kernel.org, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, pbonzini@redhat.com, tglx@linutronix.de, akpm@linux-foundation.org Date: Mon, 04 Feb 2019 10:15:40 -0800 Message-ID: <20190204181540.12095.87973.stgit@localhost.localdomain> In-Reply-To: <20190204181118.12095.38300.stgit@localhost.localdomain> References: <20190204181118.12095.38300.stgit@localhost.localdomain> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Alexander Duyck In order to enable a KVM hypervisor to notify the host that a guest has freed its pages we will need to have a mechanism to update the virtual memory associated with the guest. In order to expose this functionality I am adding a new function do_madvise_dontneed that can be used to indicate a region that a given VM is done with. Signed-off-by: Alexander Duyck --- include/linux/mm.h | 2 ++ mm/madvise.c | 13 ++++++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index e04396375cf9..eb668a5b4b4f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2840,5 +2840,7 @@ static inline bool page_is_guard(struct page *page) static inline void setup_nr_node_ids(void) {} #endif +int do_madvise_dontneed(unsigned long start, size_t len_in); + #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ diff --git a/mm/madvise.c b/mm/madvise.c index 21a7881a2db4..8730f7e0081a 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -799,7 +799,7 @@ static int madvise_inject_error(int behavior, * -EBADF - map exists, but area maps something that isn't a file. * -EAGAIN - a kernel resource was temporarily unavailable. */ -SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) +static int do_madvise(unsigned long start, size_t len_in, int behavior) { unsigned long end, tmp; struct vm_area_struct *vma, *prev; @@ -894,3 +894,14 @@ static int madvise_inject_error(int behavior, return error; } + +SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior) +{ + return do_madvise(start, len_in, behavior); +} + +int do_madvise_dontneed(unsigned long start, size_t len_in) +{ + return do_madvise(start, len_in, MADV_DONTNEED); +} +EXPORT_SYMBOL_GPL(do_madvise_dontneed); From patchwork Mon Feb 4 18:15:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10796325 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1205413A4 for ; Mon, 4 Feb 2019 18:15:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0247B2B720 for ; Mon, 4 Feb 2019 18:15:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EA6C22B7F8; Mon, 4 Feb 2019 18:15:55 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 637EB2B720 for ; Mon, 4 Feb 2019 18:15:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729434AbfBDSPt (ORCPT ); Mon, 4 Feb 2019 13:15:49 -0500 Received: from mail-pg1-f194.google.com ([209.85.215.194]:40416 "EHLO mail-pg1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729077AbfBDSPs (ORCPT ); Mon, 4 Feb 2019 13:15:48 -0500 Received: by mail-pg1-f194.google.com with SMTP id z10so266708pgp.7; Mon, 04 Feb 2019 10:15:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=nq0LeNkhaffr3QT/wk9IUrhhpxaGavdHfyKoMsgvjdo=; b=L7X4qKJ6GtgkMcoLCzJQFQp5YQflLDPqBeUBgAQS/WXNVw9NqiYnEdhx6yzV/spFHf 7L6vF0PKeQfrqkdqNkzH3QVdqEBRrkySxTix4y/7CmVOEzkdHP528G8v2K4DbX4S0WV2 AhSF+d4Gsnd8UxoIwWQ8vQD/e9fxTkgZdr562N5/FfkuEGVN08h77ixyK50tK0aEZWUz S3qkPFlc6MIeg2bP73mBpc4Z++3TXyhuw9NU68HOgeGg+81CIb7tNpBUAAWspG1rQu+J wp+8/e5Z5znfNnx7la1axnMHEZ93SPljoKMEtbwXuayDg+bCSFIS50QchxDi7o75IWLR nAKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=nq0LeNkhaffr3QT/wk9IUrhhpxaGavdHfyKoMsgvjdo=; b=XJ1hxJoWsGFYO8mjoJvDuwa32Dng3ucKQeXXfF52GrBsCWvmXlLWe5+Xj+zWLyZmoi yXOy6GUkpuw6uVw55JySvWNecceP4T63LfsnohewweuniSfYAZepb3BgOwWjLpE6583S GBGMiFMIE+QGSRQ6ZBvWo9WJa4/pcVTPmCulzOpr7DiVu1DOAO/ZZ7egOMLiBsCUVJWX QqBMP5KDZrfBEQTk4oX80PDi5qIVkxmKwbpvz7D9DMVaiS+32W8Qe8iTdmbqpHkBjqEg 60GFbSs+ArTkGgg4NeXfn8XgG+ux5/9CbaKPOI4DxpwUo6Y7ZDwHSO18I4CNJVS0xax3 jNLA== X-Gm-Message-State: AHQUAub8JF9Bx1v7pFw0xQsV+ikgK8OX+oLe1vn1Ux8We61DIswTYDVf ztX9h2LBvO95JGkDlMct9jY= X-Google-Smtp-Source: AHgI3IaZMHStqmgitk+Th2B3stPdDyvz8KaztFDV/2BXNVgK22LkFei1Bc7H+nadwdx82aStWOarZQ== X-Received: by 2002:a63:4665:: with SMTP id v37mr575190pgk.425.1549304147758; Mon, 04 Feb 2019 10:15:47 -0800 (PST) Received: from localhost.localdomain ([2001:470:b:9c3:9e5c:8eff:fe4f:f2d0]) by smtp.gmail.com with ESMTPSA id x11sm1092934pfe.72.2019.02.04.10.15.46 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 Feb 2019 10:15:47 -0800 (PST) Subject: [RFC PATCH 2/4] kvm: Add host side support for free memory hints From: Alexander Duyck To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: rkrcmar@redhat.com, alexander.h.duyck@linux.intel.com, x86@kernel.org, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, pbonzini@redhat.com, tglx@linutronix.de, akpm@linux-foundation.org Date: Mon, 04 Feb 2019 10:15:46 -0800 Message-ID: <20190204181546.12095.81356.stgit@localhost.localdomain> In-Reply-To: <20190204181118.12095.38300.stgit@localhost.localdomain> References: <20190204181118.12095.38300.stgit@localhost.localdomain> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Alexander Duyck Add the host side of the KVM memory hinting support. With this we expose a feature bit indicating that the host will pass the messages along to the new madvise function. This functionality is mutually exclusive with device assignment. If a device is assigned we will disable the functionality as it could lead to a potential memory corruption if a device writes to a page after KVM has flagged it as not being used. The logic as it is currently defined limits the hint to only supporting a hugepage or larger notifications. This is meant to help prevent us from potentially breaking up huge pages by hinting that only a portion of the page is not needed. Signed-off-by: Alexander Duyck --- Documentation/virtual/kvm/cpuid.txt | 4 +++ Documentation/virtual/kvm/hypercalls.txt | 14 ++++++++++++ arch/x86/include/uapi/asm/kvm_para.h | 3 +++ arch/x86/kvm/cpuid.c | 6 ++++- arch/x86/kvm/x86.c | 35 ++++++++++++++++++++++++++++++ include/uapi/linux/kvm_para.h | 1 + 6 files changed, 62 insertions(+), 1 deletion(-) diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 97ca1940a0dc..fe3395a58b7e 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt @@ -66,6 +66,10 @@ KVM_FEATURE_PV_SEND_IPI || 11 || guest checks this feature bit || || before using paravirtualized || || send IPIs. ------------------------------------------------------------------------------ +KVM_FEATURE_PV_UNUSED_PAGE_HINT || 12 || guest checks this feature bit + || || before using paravirtualized + || || unused page hints. +------------------------------------------------------------------------------ KVM_FEATURE_CLOCKSOURCE_STABLE_BIT || 24 || host will warn if no guest-side || || per-cpu warps are expected in || || kvmclock. diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt index da24c138c8d1..b374678ac1f9 100644 --- a/Documentation/virtual/kvm/hypercalls.txt +++ b/Documentation/virtual/kvm/hypercalls.txt @@ -141,3 +141,17 @@ a0 corresponds to the APIC ID in the third argument (a2), bit 1 corresponds to the APIC ID a2+1, and so on. Returns the number of CPUs to which the IPIs were delivered successfully. + +7. KVM_HC_UNUSED_PAGE_HINT +------------------------ +Architecture: x86 +Status: active +Purpose: Send unused page hint to host + +a0: physical address of region unused, page aligned +a1: size of unused region, page aligned + +The hypercall lets a guest send notifications to the host that it will no +longer be using a given page in memory. Multiple pages can be hinted at by +using the size field to hint that a higher order page is available by +specifying the higher order page size. diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 19980ec1a316..f066c23060df 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -29,6 +29,7 @@ #define KVM_FEATURE_PV_TLB_FLUSH 9 #define KVM_FEATURE_ASYNC_PF_VMEXIT 10 #define KVM_FEATURE_PV_SEND_IPI 11 +#define KVM_FEATURE_PV_UNUSED_PAGE_HINT 12 #define KVM_HINTS_REALTIME 0 @@ -119,4 +120,6 @@ struct kvm_vcpu_pv_apf_data { #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK #define KVM_PV_EOI_DISABLED 0x0 +#define KVM_PV_UNUSED_PAGE_HINT_MIN_ORDER HUGETLB_PAGE_ORDER + #endif /* _UAPI_ASM_X86_KVM_PARA_H */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index bbffa6c54697..b82bcbfbc420 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -136,6 +136,9 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu) if (kvm_hlt_in_guest(vcpu->kvm) && best && (best->eax & (1 << KVM_FEATURE_PV_UNHALT))) best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT); + if (kvm_arch_has_assigned_device(vcpu->kvm) && best && + (best->eax & KVM_FEATURE_PV_UNUSED_PAGE_HINT)) + best->eax &= ~(1 << KVM_FEATURE_PV_UNUSED_PAGE_HINT); /* Update physical-address width */ vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); @@ -637,7 +640,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, (1 << KVM_FEATURE_PV_UNHALT) | (1 << KVM_FEATURE_PV_TLB_FLUSH) | (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) | - (1 << KVM_FEATURE_PV_SEND_IPI); + (1 << KVM_FEATURE_PV_SEND_IPI) | + (1 << KVM_FEATURE_PV_UNUSED_PAGE_HINT); if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3d27206f6c01..3ec75ab849e2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -55,6 +55,7 @@ #include #include #include +#include #include @@ -7052,6 +7053,37 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu) kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu); } +static int kvm_pv_unused_page_hint_op(struct kvm *kvm, gpa_t gpa, size_t len) +{ + unsigned long start; + + /* + * Guarantee the following: + * len meets minimum size + * len is a power of 2 + * gpa is aligned to len + */ + if (len < (PAGE_SIZE << KVM_PV_UNUSED_PAGE_HINT_MIN_ORDER)) + return -KVM_EINVAL; + if (!is_power_of_2(len) || !IS_ALIGNED(gpa, len)) + return -KVM_EINVAL; + + /* + * If a device is assigned we cannot use use madvise as memory + * is shared with the device and could lead to memory corruption + * if the device writes to it after free. + */ + if (kvm_arch_has_assigned_device(kvm)) + return -KVM_EOPNOTSUPP; + + start = gfn_to_hva(kvm, gpa_to_gfn(gpa)); + + if (kvm_is_error_hva(start + len)) + return -KVM_EFAULT; + + return do_madvise_dontneed(start, len); +} + int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) { unsigned long nr, a0, a1, a2, a3, ret; @@ -7098,6 +7130,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) case KVM_HC_SEND_IPI: ret = kvm_pv_send_ipi(vcpu->kvm, a0, a1, a2, a3, op_64_bit); break; + case KVM_HC_UNUSED_PAGE_HINT: + ret = kvm_pv_unused_page_hint_op(vcpu->kvm, a0, a1); + break; default: ret = -KVM_ENOSYS; break; diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index 6c0ce49931e5..75643b862a4e 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -28,6 +28,7 @@ #define KVM_HC_MIPS_CONSOLE_OUTPUT 8 #define KVM_HC_CLOCK_PAIRING 9 #define KVM_HC_SEND_IPI 10 +#define KVM_HC_UNUSED_PAGE_HINT 11 /* * hypercalls use architecture specific From patchwork Mon Feb 4 18:15:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10796329 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C20F61669 for ; Mon, 4 Feb 2019 18:15:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B33C92B720 for ; Mon, 4 Feb 2019 18:15:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A742C2B7F8; Mon, 4 Feb 2019 18:15:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4AE8B2B720 for ; Mon, 4 Feb 2019 18:15:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729528AbfBDSP4 (ORCPT ); Mon, 4 Feb 2019 13:15:56 -0500 Received: from mail-pf1-f195.google.com ([209.85.210.195]:41183 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726585AbfBDSPz (ORCPT ); Mon, 4 Feb 2019 13:15:55 -0500 Received: by mail-pf1-f195.google.com with SMTP id b7so282460pfi.8; Mon, 04 Feb 2019 10:15:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=YZe004olxwKUVDNhwEwrZ8AUV4iVt97unB7E5kh6nQ4=; b=IQrmXhYJn+hqxmnp2CKgxQtlZ/Plcxr3XKvJAlVd5LVl4ehyQEQyTW3i6g9aDkFdWB BlSQvYrHO04cqFRXHDFceoyzCyJjkEUYcdv8SMX3X8kEcAFyGnZZnQy3YTjYAAoKzZzL 37CyqRl4Xz3IG9WZGdn/ub1I0CoSEBtCocL0KPfAtM/Pq8+/TphetVo8ZCIOKRkSMe/e 6zdTFFr4eBZmo16fnkCw3kEFRC29XwHCOtqP8uxcfKFLPIb9OtClIu6+Ed1cZ6kCaIqz YrpTH1YSNpQoSXSdmT38pszxTEWgFqbZ+qx5I7iUCxMXnWy77QKrrIdOErtX/PYZ/ZRv s1xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=YZe004olxwKUVDNhwEwrZ8AUV4iVt97unB7E5kh6nQ4=; b=RZBzVY5Xtgk3uvGrp08r4Mv9ddknGM3TUPCwNMN11AdGNFa79LTOBs0o9xHTmIXbwr FpBQJWBerPzWEFFb0wVK1KxWRKY0V/rYbhdk/RRjF2e8TV6DO2BLlefcPu4z2st3BjZ4 ClYbUN/ex6oH95KUhs69ZZrgBy5dCE7kFms0vKPTKvSZzrn4ArjXl1fGlScP0mWamanl dMvOyIRKkpL1xFHhRdadyZ+8atzoNwfh91KGC36Q05qITTI7ywAaFTGFgXPEGljLj9kk W5hv09okoDRXrm1iJYrL9xJiCR/IqXJrf9uZKHTlaRhGoDLxIP5tqHA7miFV1Dl+1fxh DQEg== X-Gm-Message-State: AHQUAuZns8SO5zgv41qfomIxZ0zh/aIIkrXO8HOAsoxkeuHLO4BC4WR5 HTRtWqOtPeN58UVVwPW+/Cs= X-Google-Smtp-Source: AHgI3IYghO9Gyc5CdVnnZDhQJpFkPxKmWnw3rcVOoUNENx3bnkpWEbZ8P5nJPUmckNTyGdk2ogPr6Q== X-Received: by 2002:a63:1f64:: with SMTP id q36mr580276pgm.230.1549304153874; Mon, 04 Feb 2019 10:15:53 -0800 (PST) Received: from localhost.localdomain ([2001:470:b:9c3:9e5c:8eff:fe4f:f2d0]) by smtp.gmail.com with ESMTPSA id q5sm2033090pfi.165.2019.02.04.10.15.53 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 Feb 2019 10:15:53 -0800 (PST) Subject: [RFC PATCH 3/4] kvm: Add guest side support for free memory hints From: Alexander Duyck To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: rkrcmar@redhat.com, alexander.h.duyck@linux.intel.com, x86@kernel.org, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, pbonzini@redhat.com, tglx@linutronix.de, akpm@linux-foundation.org Date: Mon, 04 Feb 2019 10:15:52 -0800 Message-ID: <20190204181552.12095.46287.stgit@localhost.localdomain> In-Reply-To: <20190204181118.12095.38300.stgit@localhost.localdomain> References: <20190204181118.12095.38300.stgit@localhost.localdomain> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Alexander Duyck Add guest support for providing free memory hints to the KVM hypervisor for freed pages huge TLB size or larger. I am restricting the size to huge TLB order and larger because the hypercalls are too expensive to be performing one per 4K page. Using the huge TLB order became the obvious choice for the order to use as it allows us to avoid fragmentation of higher order memory on the host. I have limited the functionality so that it doesn't work when page poisoning is enabled. I did this because a write to the page after doing an MADV_DONTNEED would effectively negate the hint, so it would be wasting cycles to do so. Signed-off-by: Alexander Duyck --- arch/x86/include/asm/page.h | 13 +++++++++++++ arch/x86/kernel/kvm.c | 23 +++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 7555b48803a8..4487ad7a3385 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -18,6 +18,19 @@ struct page; +#ifdef CONFIG_KVM_GUEST +#include +extern struct static_key_false pv_free_page_hint_enabled; + +#define HAVE_ARCH_FREE_PAGE +void __arch_free_page(struct page *page, unsigned int order); +static inline void arch_free_page(struct page *page, unsigned int order) +{ + if (static_branch_unlikely(&pv_free_page_hint_enabled)) + __arch_free_page(page, order); +} +#endif + #include extern struct range pfn_mapped[]; extern int nr_pfn_mapped; diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 5c93a65ee1e5..09c91641c36c 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -48,6 +48,7 @@ #include static int kvmapf = 1; +DEFINE_STATIC_KEY_FALSE(pv_free_page_hint_enabled); static int __init parse_no_kvmapf(char *arg) { @@ -648,6 +649,15 @@ static void __init kvm_guest_init(void) if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) apic_set_eoi_write(kvm_guest_apic_eoi_write); + /* + * The free page hinting doesn't add much value if page poisoning + * is enabled. So we only enable the feature if page poisoning is + * no present. + */ + if (!page_poisoning_enabled() && + kvm_para_has_feature(KVM_FEATURE_PV_UNUSED_PAGE_HINT)) + static_branch_enable(&pv_free_page_hint_enabled); + #ifdef CONFIG_SMP smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus; smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu; @@ -762,6 +772,19 @@ static __init int kvm_setup_pv_tlb_flush(void) } arch_initcall(kvm_setup_pv_tlb_flush); +void __arch_free_page(struct page *page, unsigned int order) +{ + /* + * Limit hints to blocks no smaller than pageblock in + * size to limit the cost for the hypercalls. + */ + if (order < KVM_PV_UNUSED_PAGE_HINT_MIN_ORDER) + return; + + kvm_hypercall2(KVM_HC_UNUSED_PAGE_HINT, page_to_phys(page), + PAGE_SIZE << order); +} + #ifdef CONFIG_PARAVIRT_SPINLOCKS /* Kick a cpu by its apicid. Used to wake up a halted vcpu */ From patchwork Mon Feb 4 18:15:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexander Duyck X-Patchwork-Id: 10796333 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C09F313A4 for ; Mon, 4 Feb 2019 18:16:07 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B17EF2B720 for ; Mon, 4 Feb 2019 18:16:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A3DCA2B7F8; Mon, 4 Feb 2019 18:16:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 370882B720 for ; Mon, 4 Feb 2019 18:16:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729619AbfBDSQB (ORCPT ); Mon, 4 Feb 2019 13:16:01 -0500 Received: from mail-pf1-f195.google.com ([209.85.210.195]:36927 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729548AbfBDSQB (ORCPT ); Mon, 4 Feb 2019 13:16:01 -0500 Received: by mail-pf1-f195.google.com with SMTP id y126so293259pfb.4; Mon, 04 Feb 2019 10:16:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=BH3kUGOLH4rXiV4UZ/yFSk8Kv2EdG9WZAShG2rqLHTM=; b=HBVmbsGODWcw1Zzq6loJb9qa/nqzCqIjc1caWDwzjyVMZ0cmFJhvTchZVI2WOq0SGD K4mHhjv+oZT7Dk4YHXIh0FzSfuUvWLrG1BGCDIpuHv+axbdWqQYThhkdY5vAh6CubTRI 6G4HgTpmqZTTI1aEb1ZK3zYjOIvHQ9+xhErmeERXv3lpM/Hbv04CdXW8ILjza6RN2IJa q+7M7+iRHnlW3ZY88/DcYz7Uy5sz23/8oKf6rqSooCtg8X23HxZJlCOC3paGvl4fG0eN amSuhIznkGkG7AuuxMuc46cqDxIP0uc2r4qUVRoHCp5Qyz4dV046yyikJ4zcIEwnzJH3 XWKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=BH3kUGOLH4rXiV4UZ/yFSk8Kv2EdG9WZAShG2rqLHTM=; b=IrmLqhOl9Aaqfbco4/DVHqILJ0IDIU2MVTzOcmIEWNkaj3Rtql45xu2IOv4+L+Zxd2 r/H+0K07pqWa9dO5C+RhUs5ecD09RtctlMMrWpA4OznCXf70GCJqSO/AUIlo92SgT5Ae zAiQ4RcexFq93kIvkFeGMHMXSaW+NUNW8V50ZeHnmTb+S3Chts39BQ2WWgp+QK2+gBa3 SzKQvV1vCWly91rJ6vKFMoVUh6JRvdJmtyRuPzowdf8AaQHIbHAH8yu8uFBbU0cJsekY H9mWxsEkuBMQ8DuGhKvpJ1gm02JGiPhPA9Lv3jrGpIG5JnEWK1CEa0AsMgrrr58QEQcC AS7g== X-Gm-Message-State: AHQUAuaW+i3hOzDpr4jYEZzoaUmhKrjUQkrD8cxj3yx88iyhRnynvpGy I1CkcY+zhZFpTcnewjHfT1o= X-Google-Smtp-Source: AHgI3IatDmHCVIQvkPjf403DPZXddCGM1dtT3pI76p0hmmxOMfvPU6zQrFp/JIlwwbmQbdyNZzEbOA== X-Received: by 2002:a62:37c3:: with SMTP id e186mr635900pfa.251.1549304159979; Mon, 04 Feb 2019 10:15:59 -0800 (PST) Received: from localhost.localdomain ([2001:470:b:9c3:9e5c:8eff:fe4f:f2d0]) by smtp.gmail.com with ESMTPSA id d129sm991541pfc.31.2019.02.04.10.15.59 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 Feb 2019 10:15:59 -0800 (PST) Subject: [RFC PATCH 4/4] mm: Add merge page notifier From: Alexander Duyck To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: rkrcmar@redhat.com, alexander.h.duyck@linux.intel.com, x86@kernel.org, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, pbonzini@redhat.com, tglx@linutronix.de, akpm@linux-foundation.org Date: Mon, 04 Feb 2019 10:15:58 -0800 Message-ID: <20190204181558.12095.83484.stgit@localhost.localdomain> In-Reply-To: <20190204181118.12095.38300.stgit@localhost.localdomain> References: <20190204181118.12095.38300.stgit@localhost.localdomain> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Alexander Duyck Because the implementation was limiting itself to only providing hints on pages huge TLB order sized or larger we introduced the possibility for free pages to slip past us because they are freed as something less then huge TLB in size and aggregated with buddies later. To address that I am adding a new call arch_merge_page which is called after __free_one_page has merged a pair of pages to create a higher order page. By doing this I am able to fill the gap and provide full coverage for all of the pages huge TLB order or larger. Signed-off-by: Alexander Duyck --- arch/x86/include/asm/page.h | 12 ++++++++++++ arch/x86/kernel/kvm.c | 28 ++++++++++++++++++++++++++++ include/linux/gfp.h | 4 ++++ mm/page_alloc.c | 2 ++ 4 files changed, 46 insertions(+) diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 4487ad7a3385..9540a97c9997 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -29,6 +29,18 @@ static inline void arch_free_page(struct page *page, unsigned int order) if (static_branch_unlikely(&pv_free_page_hint_enabled)) __arch_free_page(page, order); } + +struct zone; + +#define HAVE_ARCH_MERGE_PAGE +void __arch_merge_page(struct zone *zone, struct page *page, + unsigned int order); +static inline void arch_merge_page(struct zone *zone, struct page *page, + unsigned int order) +{ + if (static_branch_unlikely(&pv_free_page_hint_enabled)) + __arch_merge_page(zone, page, order); +} #endif #include diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 09c91641c36c..957bb4f427bb 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -785,6 +785,34 @@ void __arch_free_page(struct page *page, unsigned int order) PAGE_SIZE << order); } +void __arch_merge_page(struct zone *zone, struct page *page, + unsigned int order) +{ + /* + * The merging logic has merged a set of buddies up to the + * KVM_PV_UNUSED_PAGE_HINT_MIN_ORDER. Since that is the case, take + * advantage of this moment to notify the hypervisor of the free + * memory. + */ + if (order != KVM_PV_UNUSED_PAGE_HINT_MIN_ORDER) + return; + + /* + * Drop zone lock while processing the hypercall. This + * should be safe as the page has not yet been added + * to the buddy list as of yet and all the pages that + * were merged have had their buddy/guard flags cleared + * and their order reset to 0. + */ + spin_unlock(&zone->lock); + + kvm_hypercall2(KVM_HC_UNUSED_PAGE_HINT, page_to_phys(page), + PAGE_SIZE << order); + + /* reacquire lock and resume freeing memory */ + spin_lock(&zone->lock); +} + #ifdef CONFIG_PARAVIRT_SPINLOCKS /* Kick a cpu by its apicid. Used to wake up a halted vcpu */ diff --git a/include/linux/gfp.h b/include/linux/gfp.h index fdab7de7490d..4746d5560193 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -459,6 +459,10 @@ static inline struct zonelist *node_zonelist(int nid, gfp_t flags) #ifndef HAVE_ARCH_FREE_PAGE static inline void arch_free_page(struct page *page, int order) { } #endif +#ifndef HAVE_ARCH_MERGE_PAGE +static inline void +arch_merge_page(struct zone *zone, struct page *page, int order) { } +#endif #ifndef HAVE_ARCH_ALLOC_PAGE static inline void arch_alloc_page(struct page *page, int order) { } #endif diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c954f8c1fbc4..7a1309b0b7c5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -913,6 +913,8 @@ static inline void __free_one_page(struct page *page, page = page + (combined_pfn - pfn); pfn = combined_pfn; order++; + + arch_merge_page(zone, page, order); } if (max_order < MAX_ORDER) { /* If we are here, it means order is >= pageblock_order.