From patchwork Mon Nov 18 12:39:43 2024
X-Patchwork-Submitter: Nikita Kalyazin
X-Patchwork-Id: 13878479
From: Nikita Kalyazin
Subject: [RFC PATCH 1/6] Documentation: KVM: add userfault KVM exit flag
Date: Mon, 18 Nov 2024 12:39:43 +0000
Message-ID: <20241118123948.4796-2-kalyazin@amazon.com>
In-Reply-To: <20241118123948.4796-1-kalyazin@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

Update the KVM documentation to reflect the change made in [1]: add the KVM_MEMORY_EXIT_FLAG_USERFAULT flag to struct memory_fault.

[1] https://lore.kernel.org/lkml/20240710234222.2333120-7-jthoughton@google.com/

Signed-off-by: Nikita Kalyazin
---
 Documentation/virt/kvm/api.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 26a98fea718c..ffe9a2d0e525 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6996,6 +6996,7 @@ spec refer, https://github.com/riscv/riscv-sbi-doc. /* KVM_EXIT_MEMORY_FAULT */ struct { #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3) + #define KVM_MEMORY_EXIT_FLAG_USERFAULT (1ULL << 4) __u64 flags; __u64 gpa; __u64 size; @@ -7009,6 +7010,8 @@ describes properties of the faulting access that are likely pertinent: - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred on a private memory access. When clear, indicates the fault occurred on a shared access. + - KVM_MEMORY_EXIT_FLAG_USERFAULT - When set, indicates the memory fault + occurred because the vCPU attempted to access a gfn marked as userfault. Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
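For context, a VMM run loop would consume this exit roughly as follows (an illustrative sketch, not part of the series; resolve_userfault() is a hypothetical VMM helper):

    /* Sketch: detecting a userfault memory-fault exit in a VMM run loop. */
    #include <linux/kvm.h>
    #include <errno.h>
    #include <sys/ioctl.h>

    int run_once(int vcpu_fd, struct kvm_run *run)
    {
            int ret = ioctl(vcpu_fd, KVM_RUN, 0);

            /* KVM_EXIT_MEMORY_FAULT accompanies a -1/EFAULT return, not 0. */
            if (ret < 0 && errno == EFAULT &&
                run->exit_reason == KVM_EXIT_MEMORY_FAULT &&
                (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_USERFAULT))
                    /* Populate [gpa, gpa + size) before re-entering the vCPU;
                     * resolve_userfault() is a hypothetical VMM helper. */
                    return resolve_userfault(run->memory_fault.gpa,
                                             run->memory_fault.size);

            return ret;
    }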
From patchwork Mon Nov 18 12:39:44 2024
X-Patchwork-Submitter: Nikita Kalyazin
X-Patchwork-Id: 13878480
From: Nikita Kalyazin
Subject: [RFC PATCH 2/6] Documentation: KVM: add async pf user doc
Date: Mon, 18 Nov 2024 12:39:44 +0000
Message-ID: <20241118123948.4796-3-kalyazin@amazon.com>
In-Reply-To: <20241118123948.4796-1-kalyazin@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

Signed-off-by: Nikita Kalyazin
---
 Documentation/virt/kvm/api.rst | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index ffe9a2d0e525..b30f9989f5c1 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6352,6 +6352,32 @@ a single guest_memfd file, but the bound ranges must not overlap). See KVM_SET_USER_MEMORY_REGION2 for additional details. +4.143 KVM_ASYNC_PF_USER_READY +----------------------------- + +:Capability: KVM_CAP_USERFAULT +:Architectures: x86 +:Type: vcpu ioctl +:Parameters: struct kvm_async_pf_user_ready (in) +:Returns: 0 on success, <0 on error + +KVM_ASYNC_PF_USER_READY notifies the kernel that the fault corresponding to the +'token' has been resolved by userspace. Userspace is expected to use this ioctl +when processing an async PF in response to a VM exit with the +KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER flag set. The 'token' must match the value +supplied by the kernel in the 'async_pf_user_token' field of +struct memory_fault. When handling the ioctl, the kernel injects the +'page present' event into the guest and wakes the vCPU up if it is halted, as +it would when completing a regular (kernel) async PF. + +:: + + struct kvm_async_pf_user_ready { + __u32 token; + }; + +This is an asynchronous vcpu ioctl and can be invoked from any thread. + 5. The kvm_run structure ======================== @@ -6997,9 +7023,11 @@ spec refer, https://github.com/riscv/riscv-sbi-doc. struct { #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3) #define KVM_MEMORY_EXIT_FLAG_USERFAULT (1ULL << 4) + #define KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER (1ULL << 5) __u64 flags; __u64 gpa; __u64 size; + __u32 async_pf_user_token; } memory_fault; KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that @@ -7012,6 +7040,10 @@ describes properties of the faulting access that are likely pertinent: shared access. - KVM_MEMORY_EXIT_FLAG_USERFAULT - When set, indicates the memory fault occurred because the vCPU attempted to access a gfn marked as userfault. + - KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER - When set, indicates the memory fault can + be processed asynchronously and 'async_pf_user_token' contains the token to + be used when notifying KVM of the completion via the KVM_ASYNC_PF_USER_READY + ioctl. Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
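The asynchronous nature of the ioctl is the key property: once the VM exit delivers the token, the vCPU can be resumed immediately, and the completion may be reported later from any thread. A minimal sketch of the completion call, assuming vcpu_fd and the token were captured at exit time:

    /* Sketch: completing an async PF user fault; may run on any thread. */
    static int complete_async_pf(int vcpu_fd, __u32 token)
    {
            struct kvm_async_pf_user_ready ready = {
                    .token = token, /* from memory_fault.async_pf_user_token */
            };

            return ioctl(vcpu_fd, KVM_ASYNC_PF_USER_READY, &ready);
    }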
From patchwork Mon Nov 18 12:39:45 2024
X-Patchwork-Submitter: Nikita Kalyazin
X-Patchwork-Id: 13878481
From: Nikita Kalyazin
Subject: [RFC PATCH 3/6] KVM: x86: add async ioctl support
Date: Mon, 18 Nov 2024 12:39:45 +0000
Message-ID: <20241118123948.4796-4-kalyazin@amazon.com>
In-Reply-To: <20241118123948.4796-1-kalyazin@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

x86 has not had support for async vcpu ioctls. Add an arch implementation, but do not wire up any of the ioctls just yet.

Signed-off-by: Nikita Kalyazin
---
 arch/x86/kvm/Kconfig | 1 +
 arch/x86/kvm/x86.c   | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index ebd1ec6600bc..191dfba3e27a 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -46,6 +46,7 @@ config KVM select KVM_GENERIC_HARDWARE_ENABLING select KVM_WERROR if WERROR select KVM_USERFAULT + select HAVE_KVM_VCPU_ASYNC_IOCTL help Support hosting fully virtualized guest machines using hardware virtualization extensions. You will need a fairly recent diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ba0ad76f53bc..800493739043 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13619,6 +13619,12 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) } #endif +long kvm_arch_vcpu_async_ioctl(struct file *filp, + unsigned int ioctl, unsigned long arg) +{ + return -ENOIOCTLCMD; +} + int kvm_spec_ctrl_test_value(u64 value) { /*
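For reference, what makes an ioctl "async" is that the generic vcpu ioctl path offers it to the arch hook before taking vcpu->mutex, so it does not serialize against a vCPU thread sitting in KVM_RUN. An abbreviated paraphrase of the dispatch in virt/kvm/kvm_main.c:

    static long kvm_vcpu_ioctl(struct file *filp,
                               unsigned int ioctl, unsigned long arg)
    {
            struct kvm_vcpu *vcpu = filp->private_data;
            long r;

            /* ...validity checks elided... */

            /* Async ioctls run without vcpu->mutex and may therefore be
             * issued while another thread is inside KVM_RUN. */
            r = kvm_arch_vcpu_async_ioctl(filp, ioctl, arg);
            if (r != -ENOIOCTLCMD)
                    return r;

            if (mutex_lock_killable(&vcpu->mutex))
                    return -EINTR;
            /* ...synchronous vcpu ioctls... */
            mutex_unlock(&vcpu->mutex);
            return r;
    }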
From patchwork Mon Nov 18 12:39:46 2024
X-Patchwork-Submitter: Nikita Kalyazin
X-Patchwork-Id: 13878482
From: Nikita Kalyazin
Subject: [RFC PATCH 4/6] KVM: trace events: add type argument to async pf
Date: Mon, 18 Nov 2024 12:39:46 +0000
Message-ID: <20241118123948.4796-5-kalyazin@amazon.com>
In-Reply-To: <20241118123948.4796-1-kalyazin@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

With async PF user being added, the existing tracepoint definitions can be reused, but the user and kernel variants need to be distinguishable. Add a new int argument `type` that is either 0 ("kernel") or 1 ("user"). For now, all users of these tracepoints supply 0 ("kernel"), as async PF user is not implemented yet. The next commits, which implement it, will pass 1 ("user") where necessary.
Signed-off-by: Nikita Kalyazin --- arch/x86/kvm/mmu/mmu.c | 4 +-- arch/x86/kvm/x86.c | 4 +-- include/trace/events/kvm.h | 50 ++++++++++++++++++++++++-------------- virt/kvm/async_pf.c | 2 +- 4 files changed, 37 insertions(+), 23 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f0dbc3c68e5c..004e068cabae 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4395,9 +4395,9 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault return RET_PF_CONTINUE; /* *pfn has correct page already */ if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) { - trace_kvm_try_async_get_page(fault->addr, fault->gfn); + trace_kvm_try_async_get_page(fault->addr, fault->gfn, 0); if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) { - trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn); + trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn, 0); kvm_make_request(KVM_REQ_APF_HALT, vcpu); return RET_PF_RETRY; } else if (kvm_arch_setup_async_pf(vcpu, fault)) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 800493739043..0a04de5dbada 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13408,7 +13408,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, { struct x86_exception fault; - trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa); + trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa, 0); kvm_add_async_pf_gfn(vcpu, work->arch.gfn); if (kvm_can_deliver_async_pf(vcpu) && @@ -13447,7 +13447,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, work->arch.token = ~0; /* broadcast wakeup */ else kvm_del_async_pf_gfn(vcpu, work->arch.gfn); - trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa); + trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa, 0); if ((work->wakeup_all || work->notpresent_injected) && kvm_pv_async_pf_enabled(vcpu) && diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index 74e40d5d4af4..a7731b62863b 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -256,90 +256,104 @@ TRACE_EVENT(kvm_fpu, ); #ifdef CONFIG_KVM_ASYNC_PF +#define kvm_async_pf_type_symbol \ + {0, "kernel"}, \ + {1, "user"} + DECLARE_EVENT_CLASS(kvm_async_get_page_class, - TP_PROTO(u64 gva, u64 gfn), + TP_PROTO(u64 gva, u64 gfn, int type), - TP_ARGS(gva, gfn), + TP_ARGS(gva, gfn, type), TP_STRUCT__entry( __field(__u64, gva) __field(u64, gfn) + __field(int, type) ), TP_fast_assign( __entry->gva = gva; __entry->gfn = gfn; + __entry->type = type; ), - TP_printk("gva = %#llx, gfn = %#llx", __entry->gva, __entry->gfn) + TP_printk("gva = %#llx, gfn = %#llx, type = %s", __entry->gva, + __entry->gfn, __print_symbolic(__entry->type, + kvm_async_pf_type_symbol)) ); DEFINE_EVENT(kvm_async_get_page_class, kvm_try_async_get_page, - TP_PROTO(u64 gva, u64 gfn), + TP_PROTO(u64 gva, u64 gfn, int type), - TP_ARGS(gva, gfn) + TP_ARGS(gva, gfn, type) ); DEFINE_EVENT(kvm_async_get_page_class, kvm_async_pf_repeated_fault, - TP_PROTO(u64 gva, u64 gfn), + TP_PROTO(u64 gva, u64 gfn, int type), - TP_ARGS(gva, gfn) + TP_ARGS(gva, gfn, type) ); DECLARE_EVENT_CLASS(kvm_async_pf_nopresent_ready, - TP_PROTO(u64 token, u64 gva), + TP_PROTO(u64 token, u64 gva, int type), - TP_ARGS(token, gva), + TP_ARGS(token, gva, type), TP_STRUCT__entry( __field(__u64, token) __field(__u64, gva) + __field(int, type) ), TP_fast_assign( __entry->token = token; __entry->gva = gva; + __entry->type = type; ), - TP_printk("token %#llx gva %#llx", __entry->token, __entry->gva) + 
TP_printk("token %#llx gva %#llx type %s", __entry->token, __entry->gva, + __print_symbolic(__entry->type, kvm_async_pf_type_symbol)) ); DEFINE_EVENT(kvm_async_pf_nopresent_ready, kvm_async_pf_not_present, - TP_PROTO(u64 token, u64 gva), + TP_PROTO(u64 token, u64 gva, int type), - TP_ARGS(token, gva) + TP_ARGS(token, gva, type) ); DEFINE_EVENT(kvm_async_pf_nopresent_ready, kvm_async_pf_ready, - TP_PROTO(u64 token, u64 gva), + TP_PROTO(u64 token, u64 gva, int type), - TP_ARGS(token, gva) + TP_ARGS(token, gva, type) ); TRACE_EVENT( kvm_async_pf_completed, - TP_PROTO(unsigned long address, u64 gva), - TP_ARGS(address, gva), + TP_PROTO(unsigned long address, u64 gva, int type), + TP_ARGS(address, gva, type), TP_STRUCT__entry( __field(unsigned long, address) __field(u64, gva) + __field(int, type) ), TP_fast_assign( __entry->address = address; __entry->gva = gva; + __entry->type = type; ), - TP_printk("gva %#llx address %#lx", __entry->gva, - __entry->address) + TP_printk("gva %#llx address %#lx type %s", __entry->gva, + __entry->address, __print_symbolic(__entry->type, + kvm_async_pf_type_symbol)) ); #endif diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c index 99a63bad0306..77c689a9b585 100644 --- a/virt/kvm/async_pf.c +++ b/virt/kvm/async_pf.c @@ -92,7 +92,7 @@ static void async_pf_execute(struct work_struct *work) if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first) kvm_arch_async_page_present_queued(vcpu); - trace_kvm_async_pf_completed(addr, cr2_or_gpa); + trace_kvm_async_pf_completed(addr, cr2_or_gpa, 0); __kvm_vcpu_wake_up(vcpu); } From patchwork Mon Nov 18 12:39:47 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nikita Kalyazin X-Patchwork-Id: 13878483 Received: from smtp-fw-2101.amazon.com (smtp-fw-2101.amazon.com [72.21.196.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4EEDB176ADE; Mon, 18 Nov 2024 12:41:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=72.21.196.25 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731933674; cv=none; b=Oj+GWa+eUA5wXtJXb4l73Is8uNWETaO9hB/uk5kJvu1PlaoAI/WnKFfnG12+qyLRjuQc4f9lY9yRht7YJdoGc09HTLmQKvH3WRvB0EVMgtFH8J3P2x+PsSlkKXVe0TVZPMBNOVtfAWIXAZYueScEKo8S2UL+kMag0Hy368oSrt0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1731933674; c=relaxed/simple; bh=8zCNhAYj0VQZXEo3nPh1Zo9y1eb0jQjv/hk5n3jEcn8=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=mBNiQxSHFAnL+jQCAUzEvmNx33cuuTskPnCjYJYaR3UnqeAhNfP1F4bKRVejtiXeGpTxJUm7LK46Qq58cY/D9LSQvfZVJ57UQNb2l6VncElm4DLhwJZSUxLRNsKezWA4ci7tWBbkI0J6TMBLjIwO4hZ0DM/USD33WbRbggQK/hY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com; spf=pass smtp.mailfrom=amazon.co.uk; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b=Z0iZxJzu; arc=none smtp.client-ip=72.21.196.25 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amazon.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=amazon.co.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="Z0iZxJzu" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; 
From patchwork Mon Nov 18 12:39:47 2024
X-Patchwork-Submitter: Nikita Kalyazin
X-Patchwork-Id: 13878483
From: Nikita Kalyazin
Subject: [RFC PATCH 5/6] KVM: x86: async_pf_user: add infrastructure
Date: Mon, 18 Nov 2024 12:39:47 +0000
Message-ID: <20241118123948.4796-6-kalyazin@amazon.com>
In-Reply-To: <20241118123948.4796-1-kalyazin@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

Add both generic and x86-specific infrastructure for async PF user. The functionality is gated by the KVM_ASYNC_PF_USER config option. The async PF user implementation is mostly isolated from the original (kernel) implementation. The only piece shared between the two is the struct apf within struct kvm_vcpu_arch (x86), which tracks guest-facing state.
Signed-off-by: Nikita Kalyazin --- arch/x86/include/asm/kvm_host.h | 12 +- arch/x86/kvm/Kconfig | 6 + arch/x86/kvm/lapic.c | 2 + arch/x86/kvm/mmu/mmu.c | 19 +++ arch/x86/kvm/x86.c | 75 ++++++++++++ include/linux/kvm_host.h | 30 +++++ include/linux/kvm_types.h | 1 + include/uapi/linux/kvm.h | 8 ++ virt/kvm/Kconfig | 3 + virt/kvm/Makefile.kvm | 1 + virt/kvm/async_pf_user.c | 197 ++++++++++++++++++++++++++++++++ virt/kvm/async_pf_user.h | 24 ++++ virt/kvm/kvm_main.c | 14 +++ 13 files changed, 391 insertions(+), 1 deletion(-) create mode 100644 virt/kvm/async_pf_user.c create mode 100644 virt/kvm/async_pf_user.h diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9bb2e164c523..36cea4c9000f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -122,6 +122,7 @@ #define KVM_REQ_HV_TLB_FLUSH \ KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34) +#define KVM_REQ_APF_USER_READY KVM_ARCH_REQ(29) #define CR0_RESERVED_BITS \ (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ @@ -164,6 +165,7 @@ #define KVM_NR_VAR_MTRR 8 #define ASYNC_PF_PER_VCPU 64 +#define ASYNC_PF_USER_PER_VCPU 64 enum kvm_reg { VCPU_REGS_RAX = __VCPU_REGS_RAX, @@ -973,7 +975,7 @@ struct kvm_vcpu_arch { struct { bool halted; - gfn_t gfns[ASYNC_PF_PER_VCPU]; + gfn_t gfns[ASYNC_PF_PER_VCPU + ASYNC_PF_USER_PER_VCPU]; struct gfn_to_hva_cache data; u64 msr_en_val; /* MSR_KVM_ASYNC_PF_EN */ u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */ @@ -983,6 +985,7 @@ struct kvm_vcpu_arch { u32 host_apf_flags; bool delivery_as_pf_vmexit; bool pageready_pending; + bool pageready_user_pending; } apf; /* OSVW MSRs (AMD only) */ @@ -2266,11 +2269,18 @@ void kvm_make_scan_ioapic_request_mask(struct kvm *kvm, bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); +bool kvm_arch_async_page_not_present_user(struct kvm_vcpu *vcpu, + struct kvm_async_pf_user *apf); void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); +void kvm_arch_async_page_present_user(struct kvm_vcpu *vcpu, + struct kvm_async_pf_user *apf); void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work); +void kvm_arch_async_page_ready_user(struct kvm_vcpu *vcpu, + struct kvm_async_pf_user *apf); void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu); +void kvm_arch_async_page_present_user_queued(struct kvm_vcpu *vcpu); bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu); extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn); diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 191dfba3e27a..255597942d59 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -209,4 +209,10 @@ config KVM_MAX_NR_VCPUS the memory footprint of each KVM guest, regardless of how many vCPUs are created for a given VM. +config KVM_ASYNC_PF_USER + bool "Support for async PF handled by userspace" + depends on KVM && KVM_USERFAULT && KVM_ASYNC_PF && X86_64 + help + Support for async PF handled by userspace. 
+ endif # VIRTUALIZATION diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index acd7d48100a1..723c9584d47a 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -515,6 +515,7 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val) /* Check if there are APF page ready requests pending */ if (enabled) { kvm_make_request(KVM_REQ_APF_READY, apic->vcpu); + kvm_make_request(KVM_REQ_APF_USER_READY, apic->vcpu); kvm_xen_sw_enable_lapic(apic->vcpu); } } @@ -2560,6 +2561,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) static_branch_slow_dec_deferred(&apic_hw_disabled); /* Check if there are APF page ready requests pending */ kvm_make_request(KVM_REQ_APF_READY, vcpu); + kvm_make_request(KVM_REQ_APF_USER_READY, vcpu); } else { static_branch_inc(&apic_hw_disabled.key); atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 004e068cabae..adf0161af894 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4304,6 +4304,25 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code, true, NULL); } +void kvm_arch_async_page_ready_user(struct kvm_vcpu *vcpu, struct kvm_async_pf_user *apf) +{ + int r; + + if ((vcpu->arch.mmu->root_role.direct != apf->arch.direct_map) || + apf->wakeup_all) + return; + + r = kvm_mmu_reload(vcpu); + if (unlikely(r)) + return; + + if (!vcpu->arch.mmu->root_role.direct && + apf->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu)) + return; + + kvm_mmu_do_page_fault(vcpu, apf->cr2_or_gpa, apf->arch.error_code, true, NULL); +} + static inline u8 kvm_max_level_for_order(int order) { BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0a04de5dbada..2b8cd3af326b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -942,6 +942,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon if ((cr0 ^ old_cr0) & X86_CR0_PG) { kvm_clear_async_pf_completion_queue(vcpu); + kvm_clear_async_pf_user_completion_queue(vcpu); kvm_async_pf_hash_reset(vcpu); /* @@ -3569,6 +3570,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data) if (!kvm_pv_async_pf_enabled(vcpu)) { kvm_clear_async_pf_completion_queue(vcpu); + kvm_clear_async_pf_user_completion_queue(vcpu); kvm_async_pf_hash_reset(vcpu); return 0; } @@ -3581,6 +3583,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data) vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT; kvm_async_pf_wakeup_all(vcpu); + kvm_async_pf_user_wakeup_all(vcpu); return 0; } @@ -4019,6 +4022,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) if (data & 0x1) { vcpu->arch.apf.pageready_pending = false; kvm_check_async_pf_completion(vcpu); + vcpu->arch.apf.pageready_user_pending = false; + kvm_check_async_pf_user_completion(vcpu); } break; case MSR_KVM_STEAL_TIME: @@ -10924,6 +10929,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_vcpu_update_apicv(vcpu); if (kvm_check_request(KVM_REQ_APF_READY, vcpu)) kvm_check_async_pf_completion(vcpu); + if (kvm_check_request(KVM_REQ_APF_USER_READY, vcpu)) + kvm_check_async_pf_user_completion(vcpu); if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu)) static_call(kvm_x86_msr_filter_changed)(vcpu); @@ -12346,6 +12353,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) kvmclock_reset(vcpu); 
kvm_clear_async_pf_completion_queue(vcpu); + kvm_clear_async_pf_user_completion_queue(vcpu); kvm_async_pf_hash_reset(vcpu); vcpu->arch.apf.halted = false; @@ -12671,6 +12679,7 @@ static void kvm_unload_vcpu_mmus(struct kvm *kvm) kvm_for_each_vcpu(i, vcpu, kvm) { kvm_clear_async_pf_completion_queue(vcpu); + kvm_clear_async_pf_user_completion_queue(vcpu); kvm_unload_vcpu_mmu(vcpu); } } @@ -13119,6 +13128,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu) if (!list_empty_careful(&vcpu->async_pf.done)) return true; + if (!list_empty_careful(&vcpu->async_pf_user.done)) + return true; + if (kvm_apic_has_pending_init_or_sipi(vcpu) && kvm_apic_init_sipi_allowed(vcpu)) return true; @@ -13435,6 +13447,37 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu, } } +bool kvm_arch_async_page_not_present_user(struct kvm_vcpu *vcpu, + struct kvm_async_pf_user *apf) +{ + struct x86_exception fault; + + trace_kvm_async_pf_not_present(apf->arch.token, apf->cr2_or_gpa, 1); + kvm_add_async_pf_gfn(vcpu, apf->arch.gfn); + + if (!apf_put_user_notpresent(vcpu)) { + fault.vector = PF_VECTOR; + fault.error_code_valid = true; + fault.error_code = 0; + fault.nested_page_fault = false; + fault.address = apf->arch.token; + fault.async_page_fault = true; + kvm_inject_page_fault(vcpu, &fault); + return true; + } else { + /* + * It is not possible to deliver a paravirtualized asynchronous + * page fault, but putting the guest in an artificial halt state + * can be beneficial nevertheless: if an interrupt arrives, we + * can deliver it timely and perhaps the guest will schedule + * another process. When the instruction that triggered a page + * fault is retried, hopefully the page will be ready in the host. + */ + kvm_make_request(KVM_REQ_APF_HALT, vcpu); + return false; + } +} + void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) { @@ -13460,6 +13503,31 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu, vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; } +void kvm_arch_async_page_present_user(struct kvm_vcpu *vcpu, + struct kvm_async_pf_user *apf) +{ + struct kvm_lapic_irq irq = { + .delivery_mode = APIC_DM_FIXED, + .vector = vcpu->arch.apf.vec + }; + + if (apf->wakeup_all) + apf->arch.token = ~0; /* broadcast wakeup */ + else + kvm_del_async_pf_gfn(vcpu, apf->arch.gfn); + trace_kvm_async_pf_ready(apf->arch.token, apf->cr2_or_gpa, 1); + + if ((apf->wakeup_all || apf->notpresent_injected) && + kvm_pv_async_pf_enabled(vcpu) && + !apf_put_user_ready(vcpu, apf->arch.token)) { + vcpu->arch.apf.pageready_user_pending = true; + kvm_apic_set_irq(vcpu, &irq, NULL); + } + + vcpu->arch.apf.halted = false; + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; +} + void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu) { kvm_make_request(KVM_REQ_APF_READY, vcpu); @@ -13467,6 +13535,13 @@ void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu) kvm_vcpu_kick(vcpu); } +void kvm_arch_async_page_present_user_queued(struct kvm_vcpu *vcpu) +{ + kvm_make_request(KVM_REQ_APF_USER_READY, vcpu); + if (!vcpu->arch.apf.pageready_user_pending) + kvm_vcpu_kick(vcpu); +} + bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu) { if (!kvm_pv_async_pf_enabled(vcpu)) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3b9780d85877..d0aa0680127a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -257,6 +257,27 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #endif 
+#ifdef CONFIG_KVM_ASYNC_PF_USER +struct kvm_async_pf_user { + struct list_head link; + struct list_head queue; + gpa_t cr2_or_gpa; + struct kvm_arch_async_pf arch; + bool wakeup_all; + bool resolved; + bool notpresent_injected; +}; + +void kvm_clear_async_pf_user_completion_queue(struct kvm_vcpu *vcpu); +void kvm_check_async_pf_user_completion(struct kvm_vcpu *vcpu); +bool kvm_setup_async_pf_user(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, + unsigned long hva, struct kvm_arch_async_pf *arch); +int kvm_async_pf_user_wakeup_all(struct kvm_vcpu *vcpu); +#endif + +int kvm_async_pf_user_ready(struct kvm_vcpu *vcpu, + struct kvm_async_pf_user_ready *apf_ready); + #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER union kvm_mmu_notifier_arg { unsigned long attributes; @@ -368,6 +389,15 @@ struct kvm_vcpu { } async_pf; #endif +#ifdef CONFIG_KVM_ASYNC_PF_USER + struct { + u32 queued; + struct list_head queue; + struct list_head done; + spinlock_t lock; + } async_pf_user; +#endif + #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT /* * Cpu relax intercept or pause loop exit optimization diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 827ecc0b7e10..149c7e48b2fb 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -5,6 +5,7 @@ struct kvm; struct kvm_async_pf; +struct kvm_async_pf_user; struct kvm_device_ops; struct kvm_gfn_range; struct kvm_interrupt; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 8cd8e08f11e1..ef3840a1c5e9 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1561,4 +1561,12 @@ struct kvm_fault { #define KVM_READ_USERFAULT _IOR(KVMIO, 0xd5, struct kvm_fault) +/* for KVM_ASYNC_PF_USER_READY */ +struct kvm_async_pf_user_ready { + /* in */ + __u32 token; +}; + +#define KVM_ASYNC_PF_USER_READY _IOW(KVMIO, 0xd6, struct kvm_async_pf_user_ready) + #endif /* __LINUX_KVM_H */ diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index f1b660d593e4..91abbd9a8e70 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -45,6 +45,9 @@ config KVM_MMIO config KVM_ASYNC_PF bool +config KVM_ASYNC_PF_USER + bool + # Toggle to switch between direct notification and batch job config KVM_ASYNC_PF_SYNC bool diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm index 724c89af78af..980217e0b03a 100644 --- a/virt/kvm/Makefile.kvm +++ b/virt/kvm/Makefile.kvm @@ -9,6 +9,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o kvm-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o +kvm-$(CONFIG_KVM_ASYNC_PF_USER) += $(KVM)/async_pf_user.o kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o diff --git a/virt/kvm/async_pf_user.c b/virt/kvm/async_pf_user.c new file mode 100644 index 000000000000..d72ce5733e1a --- /dev/null +++ b/virt/kvm/async_pf_user.c @@ -0,0 +1,197 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * kvm support for asynchronous fault in userspace + * + * Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
+ * + * Author: + * Nikita Kalyazin + */ + +#include <linux/kvm_host.h> +#include <linux/slab.h> +#include <linux/module.h> +#include <linux/mmu_context.h> + +#include "async_pf_user.h" +#include <trace/events/kvm.h> + +static struct kmem_cache *async_pf_user_cache; + +int kvm_async_pf_user_init(void) +{ + async_pf_user_cache = KMEM_CACHE(kvm_async_pf_user, 0); + + if (!async_pf_user_cache) + return -ENOMEM; + + return 0; +} + +void kvm_async_pf_user_deinit(void) +{ + kmem_cache_destroy(async_pf_user_cache); + async_pf_user_cache = NULL; +} + +void kvm_async_pf_user_vcpu_init(struct kvm_vcpu *vcpu) +{ + INIT_LIST_HEAD(&vcpu->async_pf_user.done); + INIT_LIST_HEAD(&vcpu->async_pf_user.queue); + spin_lock_init(&vcpu->async_pf_user.lock); +} + +int kvm_async_pf_user_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf_user_ready *apf_ready) +{ + struct kvm_async_pf_user *apf = NULL; + bool first; + + spin_lock(&vcpu->async_pf_user.lock); + list_for_each_entry(apf, &vcpu->async_pf_user.queue, queue) { + if (apf->arch.token == apf_ready->token) + break; + } + spin_unlock(&vcpu->async_pf_user.lock); + + if (unlikely(!apf || apf->arch.token != apf_ready->token)) + return -EINVAL; + + spin_lock(&vcpu->async_pf_user.lock); + first = list_empty(&vcpu->async_pf_user.done); + apf->resolved = true; + list_add_tail(&apf->link, &vcpu->async_pf_user.done); + spin_unlock(&vcpu->async_pf_user.lock); + + if (first) + kvm_arch_async_page_present_user_queued(vcpu); + + trace_kvm_async_pf_completed(0, apf->cr2_or_gpa, 1); + + __kvm_vcpu_wake_up(vcpu); + + return 0; +} + +void kvm_clear_async_pf_user_completion_queue(struct kvm_vcpu *vcpu) +{ + spin_lock(&vcpu->async_pf_user.lock); + + /* cancel outstanding work queue item */ + while (!list_empty(&vcpu->async_pf_user.queue)) { + struct kvm_async_pf_user *apf = + list_first_entry(&vcpu->async_pf_user.queue, + typeof(*apf), queue); + list_del(&apf->queue); + + /* + * If userspace has already notified us that the fault + * has been resolved, we will delete the item when + * iterating over the `done` list. + * Otherwise, we free it now, and if at a later point + * userspace comes back regarding this fault, it will + * be rejected due to a nonexistent token. + * Note that we do not have a way to "cancel" the work + * like with traditional (kernel) async pf. + */ + if (!apf->resolved) + kmem_cache_free(async_pf_user_cache, apf); + } + + while (!list_empty(&vcpu->async_pf_user.done)) { + struct kvm_async_pf_user *apf = + list_first_entry(&vcpu->async_pf_user.done, + typeof(*apf), link); + list_del(&apf->link); + + /* + * Unlike with traditional (kernel) async pf, + * we know for sure that once the work has been queued, + * userspace is done with it and no residual resources + * are still being held by KVM. + */ + kmem_cache_free(async_pf_user_cache, apf); + } + spin_unlock(&vcpu->async_pf_user.lock); + + vcpu->async_pf_user.queued = 0; +} + +void kvm_check_async_pf_user_completion(struct kvm_vcpu *vcpu) +{ + struct kvm_async_pf_user *apf; + + while (!list_empty_careful(&vcpu->async_pf_user.done) && + kvm_arch_can_dequeue_async_page_present(vcpu)) { + spin_lock(&vcpu->async_pf_user.lock); + apf = list_first_entry(&vcpu->async_pf_user.done, typeof(*apf), + link); + list_del(&apf->link); + spin_unlock(&vcpu->async_pf_user.lock); + + kvm_arch_async_page_ready_user(vcpu, apf); + kvm_arch_async_page_present_user(vcpu, apf); + + list_del(&apf->queue); + vcpu->async_pf_user.queued--; + + kmem_cache_free(async_pf_user_cache, apf); + } +} + +/* + * Try to schedule a job to handle page fault asynchronously. 
Returns 'true' on + * success, 'false' on failure (page fault has to be handled synchronously). + */ +bool kvm_setup_async_pf_user(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, + unsigned long hva, struct kvm_arch_async_pf *arch) +{ + struct kvm_async_pf_user *apf; + + if (vcpu->async_pf_user.queued >= ASYNC_PF_USER_PER_VCPU) + return false; + + /* + * do alloc nowait since if we are going to sleep anyway we + * may as well sleep faulting in page + */ + apf = kmem_cache_zalloc(async_pf_user_cache, GFP_NOWAIT | __GFP_NOWARN); + if (!apf) + return false; + + apf->wakeup_all = false; + apf->cr2_or_gpa = cr2_or_gpa; + apf->arch = *arch; + + list_add_tail(&apf->queue, &vcpu->async_pf_user.queue); + vcpu->async_pf_user.queued++; + apf->notpresent_injected = kvm_arch_async_page_not_present_user(vcpu, apf); + + return true; +} + +int kvm_async_pf_user_wakeup_all(struct kvm_vcpu *vcpu) +{ + struct kvm_async_pf_user *apf; + bool first; + + if (!list_empty_careful(&vcpu->async_pf_user.done)) + return 0; + + apf = kmem_cache_zalloc(async_pf_user_cache, GFP_ATOMIC); + if (!apf) + return -ENOMEM; + + apf->wakeup_all = true; + INIT_LIST_HEAD(&apf->queue); /* for list_del to work */ + + spin_lock(&vcpu->async_pf_user.lock); + first = list_empty(&vcpu->async_pf_user.done); + list_add_tail(&apf->link, &vcpu->async_pf_user.done); + spin_unlock(&vcpu->async_pf_user.lock); + + if (first) + kvm_arch_async_page_present_user_queued(vcpu); + + vcpu->async_pf_user.queued++; + return 0; +} diff --git a/virt/kvm/async_pf_user.h b/virt/kvm/async_pf_user.h new file mode 100644 index 000000000000..35fa12858c05 --- /dev/null +++ b/virt/kvm/async_pf_user.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * kvm support for asynchronous fault in userspace + * + * Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
+ * + * Author: + * Nikita Kalyazin + */ + +#ifndef __KVM_ASYNC_PF_USER_H__ +#define __KVM_ASYNC_PF_USER_H__ + +#ifdef CONFIG_KVM_ASYNC_PF_USER +int kvm_async_pf_user_init(void); +void kvm_async_pf_user_deinit(void); +void kvm_async_pf_user_vcpu_init(struct kvm_vcpu *vcpu); +#else +#define kvm_async_pf_user_init() (0) +#define kvm_async_pf_user_deinit() do {} while (0) +#define kvm_async_pf_user_vcpu_init(C) do {} while (0) +#endif + +#endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 90ce6b8ff0ab..a1a122acf93a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -59,6 +59,7 @@ #include "coalesced_mmio.h" #include "async_pf.h" +#include "async_pf_user.h" #include "kvm_mm.h" #include "vfio.h" @@ -493,6 +494,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) rcuwait_init(&vcpu->wait); #endif kvm_async_pf_vcpu_init(vcpu); + kvm_async_pf_user_vcpu_init(vcpu); kvm_vcpu_set_in_spin_loop(vcpu, false); kvm_vcpu_set_dy_eligible(vcpu, false); @@ -4059,6 +4061,11 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu) return true; #endif +#ifdef CONFIG_KVM_ASYNC_PF_USER + if (!list_empty_careful(&vcpu->async_pf_user.done)) + return true; +#endif + return false; } @@ -6613,6 +6620,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) if (r) goto err_async_pf; + r = kvm_async_pf_user_init(); + if (r) + goto err_async_pf_user; + kvm_chardev_ops.owner = module; kvm_vm_fops.owner = module; kvm_vcpu_fops.owner = module; @@ -6644,6 +6655,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) err_register: kvm_vfio_ops_exit(); err_vfio: + kvm_async_pf_user_deinit(); +err_async_pf_user: kvm_async_pf_deinit(); err_async_pf: kvm_irqfd_exit(); @@ -6677,6 +6690,7 @@ void kvm_exit(void) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); kmem_cache_destroy(kvm_vcpu_cache); kvm_vfio_ops_exit(); + kvm_async_pf_user_deinit(); kvm_async_pf_deinit(); #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING unregister_syscore_ops(&kvm_syscore_ops);
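With the infrastructure in place, enabling the feature at build time reduces to the new option plus the dependencies named in its Kconfig entry (an illustrative .config fragment for x86_64):

    CONFIG_KVM=y
    CONFIG_KVM_USERFAULT=y
    CONFIG_KVM_ASYNC_PF=y
    CONFIG_KVM_ASYNC_PF_USER=y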
From patchwork Mon Nov 18 12:39:48 2024
X-Patchwork-Submitter: Nikita Kalyazin
X-Patchwork-Id: 13878484
From: Nikita Kalyazin
Subject: [RFC PATCH 6/6] KVM: x86: async_pf_user: hook to fault handling and add ioctl
Date: Mon, 18 Nov 2024 12:39:48 +0000
Message-ID: <20241118123948.4796-7-kalyazin@amazon.com>
In-Reply-To: <20241118123948.4796-1-kalyazin@amazon.com>
X-Mailing-List: kvm@vger.kernel.org

Add interception in __kvm_faultin_pfn() for handling faults that cause an exit to userspace asynchronously. If the kernel expects userspace to handle the fault asynchronously (i.e. it can resume the vCPU while the fault is being processed), it sets the KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER flag and supplies the async PF token in struct memory_fault in the VM exit info. 
Also add the KVM_ASYNC_PF_USER_READY ioctl, which userspace uses to notify the kernel that the fault corresponding to the token has been processed.

Signed-off-by: Nikita Kalyazin
---
 arch/x86/kvm/mmu/mmu.c   | 45 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c       | 16 +++++++++++++-
 arch/x86/kvm/x86.h       |  2 ++
 include/uapi/linux/kvm.h |  4 +++-
 4 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index adf0161af894..a2b024ccbbe1 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4282,6 +4282,22 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, kvm_vcpu_gfn_to_hva(vcpu, fault->gfn), &arch); } +static bool kvm_arch_setup_async_pf_user(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault, u32 *token) +{ + struct kvm_arch_async_pf arch; + + arch.token = alloc_apf_token(vcpu); + arch.gfn = fault->gfn; + arch.error_code = fault->error_code; + arch.direct_map = vcpu->arch.mmu->root_role.direct; + arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu); + + *token = arch.token; + + return kvm_setup_async_pf_user(vcpu, 0, fault->addr, &arch); +} + void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) { int r; @@ -4396,6 +4412,35 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { bool async; + /* Pre-check for userfault and bail out early. */ + if (gfn_has_userfault(fault->slot->kvm, fault->gfn)) { + bool report_async = false; + u32 token = 0; + + if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM && + !fault->prefetch && kvm_can_do_async_pf(vcpu)) { + trace_kvm_try_async_get_page(fault->addr, fault->gfn, 1); + if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) { + trace_kvm_async_pf_repeated_fault(fault->addr, fault->gfn, 1); + kvm_make_request(KVM_REQ_APF_HALT, vcpu); + return RET_PF_RETRY; + } else if (kvm_can_deliver_async_pf(vcpu) && + kvm_arch_setup_async_pf_user(vcpu, fault, &token)) { + report_async = true; + } + } + + fault->pfn = KVM_PFN_ERR_USERFAULT; + kvm_mmu_prepare_memory_fault_exit(vcpu, fault); + + if (report_async) { + vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER; + vcpu->run->memory_fault.async_pf_user_token = token; + } + + return -EFAULT; + } + if (fault->is_private) return kvm_faultin_pfn_private(vcpu, fault); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2b8cd3af326b..30b22904859f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13372,7 +13372,7 @@ static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu) return !val; } -static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu) +bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu) { if (!kvm_pv_async_pf_enabled(vcpu)) @@ -13697,6 +13697,20 @@ void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) long kvm_arch_vcpu_async_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { + void __user *argp = (void __user *)arg; + struct kvm_vcpu *vcpu = filp->private_data; + +#ifdef CONFIG_KVM_ASYNC_PF_USER + if (ioctl == KVM_ASYNC_PF_USER_READY) { + struct kvm_async_pf_user_ready apf_ready; + + if (copy_from_user(&apf_ready, argp, sizeof(apf_ready))) + return -EFAULT; + + return kvm_async_pf_user_ready(vcpu, &apf_ready); + } +#endif + return -ENOIOCTLCMD; } diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index d80a4c6b5a38..66ece51ee94b 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -325,6 +325,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t 
cr2_or_gpa, int emulation_type, void *insn, int insn_len); fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu); +bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu); + extern u64 host_xcr0; extern u64 host_xss; extern u64 host_arch_capabilities; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index ef3840a1c5e9..8aa5ce347bdf 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -430,12 +430,14 @@ struct kvm_run { struct { #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3) #define KVM_MEMORY_EXIT_FLAG_USERFAULT (1ULL << 4) +#define KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER (1ULL << 5) __u64 flags; __u64 gpa; __u64 size; + __u32 async_pf_user_token; } memory_fault; /* Fix the size of the union. */ - char padding[256]; + char padding[252]; }; /* 2048 is the size of the char array used to bound/pad the size of the union that holds sync regs. */
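Taken together, the series implies a split VMM design: the vCPU thread converts ASYNC_PF_USER-flagged exits into queued work and re-enters the guest immediately, while a worker populates memory and notifies KVM. A sketch under the series' uAPI (queue_fault(), dequeue_fault() and populate_gpa() are hypothetical VMM helpers):

    #include <linux/kvm.h>
    #include <errno.h>
    #include <sys/ioctl.h>

    /* vCPU thread: do not block on faults that KVM marked asynchronous. */
    static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
    {
            for (;;) {
                    int ret = ioctl(vcpu_fd, KVM_RUN, 0);

                    if (ret < 0 && errno == EFAULT &&
                        run->exit_reason == KVM_EXIT_MEMORY_FAULT &&
                        (run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_ASYNC_PF_USER)) {
                            /* KVM has injected 'page not present' into the
                             * guest, so the vCPU can be resumed right away. */
                            queue_fault(run->memory_fault.gpa,
                                        run->memory_fault.async_pf_user_token);
                            continue;
                    }
                    /* ...synchronous userfaults and all other exits... */
            }
    }

    /* Worker thread: resolve queued faults, then report each token back. */
    static void *fault_worker(void *arg)
    {
            int vcpu_fd = *(int *)arg;
            struct { __u64 gpa; __u32 token; } f;

            while (dequeue_fault(&f)) {
                    populate_gpa(f.gpa); /* e.g. fetch and install contents */

                    struct kvm_async_pf_user_ready ready = { .token = f.token };

                    ioctl(vcpu_fd, KVM_ASYNC_PF_USER_READY, &ready);
            }
            return NULL;
    }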