From patchwork Thu Sep 26 01:34:56 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13812683
Date: Thu, 26 Sep 2024 01:34:56 +0000
In-Reply-To: <20240926013506.860253-1-jthoughton@google.com>
References: <20240926013506.860253-1-jthoughton@google.com>
Message-ID: <20240926013506.860253-9-jthoughton@google.com>
Subject: [PATCH v7 08/18] KVM: x86/mmu: Add infrastructure to allow walking
 rmaps outside of mmu_lock
From: James Houghton
To: Sean Christopherson, Paolo Bonzini
Cc: Andrew Morton, David Matlack, David Rientjes, James Houghton,
 Jason Gunthorpe, Jonathan Corbet, Marc Zyngier, Oliver Upton, Wei Xu,
 Yu Zhao, Axel Rasmussen, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
From: Sean Christopherson

Steal another bit from rmap entries (which are word aligned pointers, i.e.
have 2 free bits on 32-bit KVM, and 3 free bits on 64-bit KVM), and use the
bit to implement a *very* rudimentary per-rmap spinlock.  The only
anticipated usage of the lock outside of mmu_lock is for aging gfns, and
collisions between aging and other MMU rmap operations are quite rare, e.g.
unless userspace is being silly and aging a tiny range over and over in a
tight loop, the time between contention when aging an actively running VM is
O(seconds).  In short, a more sophisticated locking scheme shouldn't be
necessary.

Note, the lock only protects the rmap structure itself; SPTEs that are
pointed at by a locked rmap can still be modified and zapped by another
task, as KVM drops/zaps SPTEs before deleting the rmap entries.

Signed-off-by: Sean Christopherson
Co-developed-by: James Houghton
Signed-off-by: James Houghton
---
 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/mmu/mmu.c          | 129 +++++++++++++++++++++++++++++---
 2 files changed, 120 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index adc814bad4bb..d1164ca3e840 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -401,7 +402,7 @@ union kvm_cpu_role {
 };
 
 struct kvm_rmap_head {
-	unsigned long val;
+	atomic_long_t val;
 };
 
 struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 17de470f542c..79676798ba77 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -909,11 +909,117 @@ static struct kvm_memory_slot *gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu
  * About rmap_head encoding:
  *
  * If the bit zero of rmap_head->val is clear, then it points to the only spte
- * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
+ * in this rmap chain. Otherwise, (rmap_head->val & ~3) points to a struct
  * pte_list_desc containing more mappings.
  */
 #define KVM_RMAP_MANY	BIT(0)
 
+/*
+ * rmaps and PTE lists are mostly protected by mmu_lock (the shadow MMU always
+ * operates with mmu_lock held for write), but rmaps can be walked without
+ * holding mmu_lock so long as the caller can tolerate SPTEs in the rmap chain
+ * being zapped/dropped _while the rmap is locked_.
+ *
+ * Other than the KVM_RMAP_LOCKED flag, modifications to rmap entries must be
+ * done while holding mmu_lock for write.  This allows a task walking rmaps
+ * without holding mmu_lock to concurrently walk the same entries as a task
+ * that is holding mmu_lock but _not_ the rmap lock.  Neither task will modify
+ * the rmaps, thus the walks are stable.
+ *
+ * As alluded to above, SPTEs in rmaps are _not_ protected by KVM_RMAP_LOCKED,
+ * only the rmap chains themselves are protected.  E.g. holding an rmap's lock
+ * ensures all "struct pte_list_desc" fields are stable.
+ */
+#define KVM_RMAP_LOCKED	BIT(1)
+
+static unsigned long kvm_rmap_lock(struct kvm_rmap_head *rmap_head)
+{
+	unsigned long old_val, new_val;
+
+	/*
+	 * Elide the lock if the rmap is empty, as lockless walkers (read-only
+	 * mode) don't need to (and can't) walk an empty rmap, nor can they add
+	 * entries to the rmap.  I.e. the only paths that process empty rmaps
+	 * do so while holding mmu_lock for write, and are mutually exclusive.
+	 */
+	old_val = atomic_long_read(&rmap_head->val);
+	if (!old_val)
+		return 0;
+
+	do {
+		/*
+		 * If the rmap is locked, wait for it to be unlocked before
+		 * trying to acquire the lock, e.g. to bounce the cache line.
+		 */
+		while (old_val & KVM_RMAP_LOCKED) {
+			old_val = atomic_long_read(&rmap_head->val);
+			cpu_relax();
+		}
+
+		/*
+		 * Recheck for an empty rmap, it may have been purged by the
+		 * task that held the lock.
+		 */
+		if (!old_val)
+			return 0;
+
+		new_val = old_val | KVM_RMAP_LOCKED;
+		/*
+		 * Use try_cmpxchg_acquire() to prevent reads and writes to the
+		 * rmap from being reordered outside of the critical section
+		 * created by kvm_rmap_lock().
+		 *
+		 * Pairs with the atomic_long_set_release() in
+		 * kvm_rmap_unlock().
+		 *
+		 * For the !old_val case, no ordering is needed, as there is no
+		 * rmap to walk.
+		 */
+	} while (!atomic_long_try_cmpxchg_acquire(&rmap_head->val, &old_val,
+						  new_val));
+
+	/* Return the old value, i.e. _without_ the LOCKED bit set. */
+	return old_val;
+}
+
+static void kvm_rmap_unlock(struct kvm_rmap_head *rmap_head,
+			    unsigned long new_val)
+{
+	WARN_ON_ONCE(new_val & KVM_RMAP_LOCKED);
+	/*
+	 * Ensure that all accesses to the rmap have completed before the rmap
+	 * is actually unlocked.
+	 *
+	 * Pairs with the atomic_long_try_cmpxchg_acquire() in
+	 * kvm_rmap_lock().
+	 */
+	atomic_long_set_release(&rmap_head->val, new_val);
+}
+
+static unsigned long kvm_rmap_get(struct kvm_rmap_head *rmap_head)
+{
+	return atomic_long_read(&rmap_head->val) & ~KVM_RMAP_LOCKED;
+}
+
+/*
+ * If mmu_lock isn't held, rmaps can only be locked in read-only mode.  The
+ * actual locking is the same, but the caller is disallowed from modifying
+ * the rmap, and so the unlock flow is a nop if the rmap is/was empty.
+ */
+__maybe_unused
+static unsigned long kvm_rmap_lock_readonly(struct kvm_rmap_head *rmap_head)
+{
+	return kvm_rmap_lock(rmap_head);
+}
+
+__maybe_unused
+static void kvm_rmap_unlock_readonly(struct kvm_rmap_head *rmap_head,
+				     unsigned long old_val)
+{
+	if (!old_val)
+		return;
+
+	KVM_MMU_WARN_ON(old_val != kvm_rmap_get(rmap_head));
+	atomic_long_set(&rmap_head->val, old_val);
+}
+
 /*
  * Returns the number of pointers in the rmap chain, not counting the new one.
  */
@@ -924,7 +1030,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
 	struct pte_list_desc *desc;
 	int count = 0;
 
-	old_val = rmap_head->val;
+	old_val = kvm_rmap_lock(rmap_head);
 
 	if (!old_val) {
 		new_val = (unsigned long)spte;
@@ -956,7 +1062,7 @@ static int pte_list_add(struct kvm_mmu_memory_cache *cache, u64 *spte,
 		desc->sptes[desc->spte_count++] = spte;
 	}
 
-	rmap_head->val = new_val;
+	kvm_rmap_unlock(rmap_head, new_val);
 
 	return count;
 }
@@ -1004,7 +1110,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 	unsigned long rmap_val;
 	int i;
 
-	rmap_val = rmap_head->val;
+	rmap_val = kvm_rmap_lock(rmap_head);
 	if (KVM_BUG_ON_DATA_CORRUPTION(!rmap_val, kvm))
 		goto out;
 
@@ -1030,7 +1136,7 @@ static void pte_list_remove(struct kvm *kvm, u64 *spte,
 	}
 
 out:
-	rmap_head->val = rmap_val;
+	kvm_rmap_unlock(rmap_head, rmap_val);
 }
 
 static void kvm_zap_one_rmap_spte(struct kvm *kvm,
@@ -1048,7 +1154,7 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
 	unsigned long rmap_val;
 	int i;
 
-	rmap_val = rmap_head->val;
+	rmap_val = kvm_rmap_lock(rmap_head);
 	if (!rmap_val)
 		return false;
 
@@ -1067,13 +1173,13 @@ static bool kvm_zap_all_rmap_sptes(struct kvm *kvm,
 	}
 out:
 	/* rmap_head is meaningless now, remember to reset it */
-	rmap_head->val = 0;
+	kvm_rmap_unlock(rmap_head, 0);
 	return true;
 }
 
 unsigned int pte_list_count(struct kvm_rmap_head *rmap_head)
 {
-	unsigned long rmap_val = rmap_head->val;
+	unsigned long rmap_val = kvm_rmap_get(rmap_head);
 	struct pte_list_desc *desc;
 
 	if (!rmap_val)
@@ -1139,7 +1245,7 @@ struct rmap_iterator {
 static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
 			   struct rmap_iterator *iter)
 {
-	unsigned long rmap_val = rmap_head->val;
+	unsigned long rmap_val = kvm_rmap_get(rmap_head);
 	u64 *sptep;
 
 	if (!rmap_val)
@@ -1483,7 +1589,7 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
 	while (++iterator->rmap <= iterator->end_rmap) {
 		iterator->gfn += KVM_PAGES_PER_HPAGE(iterator->level);
 
-		if (iterator->rmap->val)
+		if (atomic_long_read(&iterator->rmap->val))
 			return;
 	}
 
@@ -2513,7 +2619,8 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 		 * avoids retaining a large number of stale nested SPs.
 		 */
 		if (tdp_enabled && invalid_list &&
-		    child->role.guest_mode && !child->parent_ptes.val)
+		    child->role.guest_mode &&
+		    !atomic_long_read(&child->parent_ptes.val))
 			return kvm_mmu_prepare_zap_page(kvm, child,
 							invalid_list);
 	}
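
For readers following along outside the kernel tree, the core trick above can be sketched in portable C11: because rmap entries are word-aligned pointers, their low bits are always zero, so one low bit can double as a spinlock while the remaining bits still encode the pointer. This is a minimal standalone sketch, not kernel code; the names rmap_head, rmap_lock, and RMAP_LOCKED are illustrative stand-ins, and it substitutes C11 atomics for the kernel's atomic_long_* API and omits cpu_relax():

```c
/*
 * Sketch of a low-bit spinlock on a word-aligned pointer: bit 0 of the
 * atomic word is the lock, the rest of the word is the pointer value.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RMAP_LOCKED ((uintptr_t)1)	/* stand-in for KVM_RMAP_LOCKED */

struct rmap_head {
	_Atomic uintptr_t val;
};

/* Lock the rmap and return its unlocked value; 0 means "empty, not locked". */
static uintptr_t rmap_lock(struct rmap_head *r)
{
	uintptr_t old = atomic_load(&r->val);

	if (!old)	/* elide the lock for empty rmaps, as the patch does */
		return 0;

	for (;;) {
		while (old & RMAP_LOCKED)	/* wait for the holder to release */
			old = atomic_load(&r->val);
		if (!old)	/* rmap was purged by the previous holder */
			return 0;
		/* Acquire CAS: sets the lock bit, orders the critical section. */
		if (atomic_compare_exchange_weak_explicit(
			    &r->val, &old, old | RMAP_LOCKED,
			    memory_order_acquire, memory_order_relaxed))
			return old;	/* old value, without the lock bit */
	}
}

static void rmap_unlock(struct rmap_head *r, uintptr_t new_val)
{
	assert(!(new_val & RMAP_LOCKED));
	/* Release store pairs with the acquire CAS in rmap_lock(). */
	atomic_store_explicit(&r->val, new_val, memory_order_release);
}
```

As in the patch, unlock doubles as the write-back path: a writer locks, computes the new chain head, and passes it to rmap_unlock(), so publishing the update and releasing the lock are a single release store.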