diff mbox series

[v6,21/29] KVM: Resolve memslot ID via a hash table instead of via a static array

Message ID a6b62e0bdba2a82bbc31dcad3c8525ccc5ff0bff.1638304316.git.maciej.szmigiero@oracle.com (mailing list archive)
State New, archived
Headers show
Series KVM: Scalable memslots implementation | expand

Commit Message

Maciej S. Szmigiero Nov. 30, 2021, 9:41 p.m. UTC
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

Memslot ID to the corresponding memslot mappings are currently kept as
indices in static id_to_index array.
The size of this array depends on the maximum allowed memslot count
(regardless of the number of memslots actually in use).

This has become especially problematic recently, when memslot count cap was
removed, so the maximum count is now full 32k memslots - the maximum
allowed by the current KVM API.

Keeping these IDs in a hash table (instead of an array) avoids this
problem.

Resolving a memslot ID to the actual memslot (instead of its index) will
also enable transitioning away from an array-based implementation of the
whole memslots structure in a later commit.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/kvm_host.h | 25 +++++++----
 virt/kvm/kvm_main.c      | 95 +++++++++++++++++++++++++++++++---------
 2 files changed, 91 insertions(+), 29 deletions(-)

Comments

Sean Christopherson Dec. 1, 2021, 2:54 a.m. UTC | #1
On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> Memslot ID to the corresponding memslot mappings are currently kept as
> indices in static id_to_index array.
> The size of this array depends on the maximum allowed memslot count
> (regardless of the number of memslots actually in use).
> 
> This has become especially problematic recently, when memslot count cap was
> removed, so the maximum count is now full 32k memslots - the maximum
> allowed by the current KVM API.
> 
> Keeping these IDs in a hash table (instead of an array) avoids this
> problem.
> 
> Resolving a memslot ID to the actual memslot (instead of its index) will
> also enable transitioning away from an array-based implementation of the
> whole memslots structure in a later commit.
> 
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Nit, your SoB should come last since you were the last person to handle the patch.
Maciej S. Szmigiero Dec. 1, 2021, 3:45 p.m. UTC | #2
On 01.12.2021 03:54, Sean Christopherson wrote:
> On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> Memslot ID to the corresponding memslot mappings are currently kept as
>> indices in static id_to_index array.
>> The size of this array depends on the maximum allowed memslot count
>> (regardless of the number of memslots actually in use).
>>
>> This has become especially problematic recently, when memslot count cap was
>> removed, so the maximum count is now full 32k memslots - the maximum
>> allowed by the current KVM API.
>>
>> Keeping these IDs in a hash table (instead of an array) avoids this
>> problem.
>>
>> Resolving a memslot ID to the actual memslot (instead of its index) will
>> also enable transitioning away from an array-based implementation of the
>> whole memslots structure in a later commit.
>>
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>> Co-developed-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Nit, your SoB should come last since you were the last person to handle the patch.
> 

Thought that my SoB should come first as coming from the author of this
patch.

Documentation/process/submitting-patches.rst says that:
> Any further SoBs (Signed-off-by:'s) following the author's SoB are from
> people handling and transporting the patch, but were not involved in its
> development. SoB chains should reflect the **real** route a patch took
> as it was propagated to the maintainers and ultimately to Linus, with
> the first SoB entry signalling primary authorship of a single author.

So "further SoBs follow[] the author's SoB" and "the first SoB entry
signal[s] primary authorship".
But at the same time "SoB chains should reflect the **real** route a
patch took" - these rules contradict each other in our case.

Thanks,
Maciej
Sean Christopherson Dec. 1, 2021, 4:23 p.m. UTC | #3
On Wed, Dec 01, 2021, Maciej S. Szmigiero wrote:
> On 01.12.2021 03:54, Sean Christopherson wrote:
> > On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > > 
> > > Memslot ID to the corresponding memslot mappings are currently kept as
> > > indices in static id_to_index array.
> > > The size of this array depends on the maximum allowed memslot count
> > > (regardless of the number of memslots actually in use).
> > > 
> > > This has become especially problematic recently, when memslot count cap was
> > > removed, so the maximum count is now full 32k memslots - the maximum
> > > allowed by the current KVM API.
> > > 
> > > Keeping these IDs in a hash table (instead of an array) avoids this
> > > problem.
> > > 
> > > Resolving a memslot ID to the actual memslot (instead of its index) will
> > > also enable transitioning away from an array-based implementation of the
> > > whole memslots structure in a later commit.
> > > 
> > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> > > Co-developed-by: Sean Christopherson <seanjc@google.com>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > 
> > Nit, your SoB should come last since you were the last person to handle the patch.
> > 
> 
> Thought that my SoB should come first as coming from the author of this
> patch.
> 
> Documentation/process/submitting-patches.rst says that:
> > Any further SoBs (Signed-off-by:'s) following the author's SoB are from
> > people handling and transporting the patch, but were not involved in its
> > development. SoB chains should reflect the **real** route a patch took
> > as it was propagated to the maintainers and ultimately to Linus, with
> > the first SoB entry signalling primary authorship of a single author.
> 
> So "further SoBs follow[] the author's SoB" and "the first SoB entry
> signal[s] primary authorship".
> But at the same time "SoB chains should reflect the **real** route a
> patch took" - these rules contradict each other in our case.

Yeah, this is a unusual case.  If we wanted to be super strict, for patches written
by you without a Co-developed-by, the SoB chain should be:

  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>

but that's a bit ridiculous and probably unnecessary since my changes were little
more than code review feedback, which is why I think it's ok to just drop my SoB
for patches authored solely by you.

Co-developed-by is a slightly different case.  Because patches with multiple
authors are likely handed back and forth multiple times, as was the case here,
and because each author needs a SoB anyways, the normal rules are tweaked slightly
to require that the person submitting the patch is always last to capture that they
were the person that did the actual submission.  

There's another "When to use Acked-by:, Cc:, and Co-developed-by:" section in
submitting-patches.rst that covers this:

  Co-developed-by: states that the patch was co-created by multiple developers;
  it is used to give attribution to co-authors (in addition to the author
  attributed by the From: tag) when several people work on a single patch.  Since
  Co-developed-by: denotes authorship, every Co-developed-by: must be immediately
  followed by a Signed-off-by: of the associated co-author.  Standard sign-off
  procedure applies, i.e. the ordering of Signed-off-by: tags should reflect the
  chronological history of the patch insofar as possible, regardless of whether
  the author is attributed via From: or Co-developed-by:.  Notably, the last
  Signed-off-by: must always be that of the developer submitting the patch.
  
  Note, the From: tag is optional when the From: author is also the person (and
  email) listed in the From: line of the email header.
  
  Example of a patch submitted by the From: author::
  
          <changelog>
  
          Co-developed-by: First Co-Author <first@coauthor.example.org>
          Signed-off-by: First Co-Author <first@coauthor.example.org>
          Co-developed-by: Second Co-Author <second@coauthor.example.org>
          Signed-off-by: Second Co-Author <second@coauthor.example.org>
          Signed-off-by: From Author <from@author.example.org>
  
  Example of a patch submitted by a Co-developed-by: author::
  
          From: From Author <from@author.example.org>
  
          <changelog>
  
          Co-developed-by: Random Co-Author <random@coauthor.example.org>
          Signed-off-by: Random Co-Author <random@coauthor.example.org>
          Signed-off-by: From Author <from@author.example.org>
          Co-developed-by: Submitting Co-Author <sub@coauthor.example.org>
          Signed-off-by: Submitting Co-Author <sub@coauthor.example.org>
diff mbox series

Patch

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 86562ffb6ea4..f3be79fb7d74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -29,6 +29,7 @@ 
 #include <linux/refcount.h>
 #include <linux/nospec.h>
 #include <linux/notifier.h>
+#include <linux/hashtable.h>
 #include <linux/xarray.h>
 #include <asm/signal.h>
 
@@ -428,6 +429,7 @@  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 #define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
 
 struct kvm_memory_slot {
+	struct hlist_node id_node;
 	gfn_t base_gfn;
 	unsigned long npages;
 	unsigned long *dirty_bitmap;
@@ -529,8 +531,15 @@  static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
  */
 struct kvm_memslots {
 	u64 generation;
-	/* The mapping table from slot id to the index in memslots[]. */
-	short id_to_index[KVM_MEM_SLOTS_NUM];
+	/*
+	 * The mapping table from slot id to the index in memslots[].
+	 *
+	 * 7-bit bucket count matches the size of the old id to index array for
+	 * 512 slots, while giving good performance with this slot count.
+	 * Higher bucket counts bring only small performance improvements but
+	 * always result in higher memory usage (even for lower memslot counts).
+	 */
+	DECLARE_HASHTABLE(id_hash, 7);
 	atomic_t last_used_slot;
 	int used_slots;
 	struct kvm_memory_slot memslots[];
@@ -798,16 +807,14 @@  static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
 static inline
 struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
 {
-	int index = slots->id_to_index[id];
 	struct kvm_memory_slot *slot;
 
-	if (index < 0)
-		return NULL;
-
-	slot = &slots->memslots[index];
+	hash_for_each_possible(slots->id_hash, slot, id_node, id) {
+		if (slot->id == id)
+			return slot;
+	}
 
-	WARN_ON(slot->id != id);
-	return slot;
+	return NULL;
 }
 
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bbc0110224f3..f2eca32fbca8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -869,15 +869,13 @@  static void kvm_destroy_pm_notifier(struct kvm *kvm)
 
 static struct kvm_memslots *kvm_alloc_memslots(void)
 {
-	int i;
 	struct kvm_memslots *slots;
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
 	if (!slots)
 		return NULL;
 
-	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-		slots->id_to_index[i] = -1;
+	hash_init(slots->id_hash);
 
 	return slots;
 }
@@ -1276,17 +1274,48 @@  static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
 	return 0;
 }
 
+static void kvm_replace_memslot(struct kvm_memslots *slots,
+				struct kvm_memory_slot *old,
+				struct kvm_memory_slot *new)
+{
+	/*
+	 * Remove the old memslot from the hash list, copying the node data
+	 * would corrupt the list.
+	 */
+	if (old) {
+		hash_del(&old->id_node);
+
+		if (!new)
+			return;
+
+		/* Copy the source *data*, not the pointer, to the destination. */
+		*new = *old;
+	}
+
+	/* (Re)Add the new memslot. */
+	hash_add(slots->id_hash, &new->id_node, new->id);
+}
+
+static void kvm_shift_memslot(struct kvm_memslots *slots, int dst, int src)
+{
+	struct kvm_memory_slot *mslots = slots->memslots;
+
+	kvm_replace_memslot(slots, &mslots[src], &mslots[dst]);
+}
+
 /*
  * Delete a memslot by decrementing the number of used slots and shifting all
  * other entries in the array forward one spot.
+ * @memslot is a detached dummy struct with just .id and .as_id filled.
  */
 static inline void kvm_memslot_delete(struct kvm_memslots *slots,
 				      struct kvm_memory_slot *memslot)
 {
 	struct kvm_memory_slot *mslots = slots->memslots;
+	struct kvm_memory_slot *oldslot = id_to_memslot(slots, memslot->id);
 	int i;
 
-	if (WARN_ON(slots->id_to_index[memslot->id] == -1))
+	if (WARN_ON(!oldslot))
 		return;
 
 	slots->used_slots--;
@@ -1294,12 +1323,17 @@  static inline void kvm_memslot_delete(struct kvm_memslots *slots,
 	if (atomic_read(&slots->last_used_slot) >= slots->used_slots)
 		atomic_set(&slots->last_used_slot, 0);
 
-	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots; i++) {
-		mslots[i] = mslots[i + 1];
-		slots->id_to_index[mslots[i].id] = i;
-	}
+	/*
+	 * Remove the to-be-deleted memslot from the list _before_ shifting
+	 * the trailing memslots forward, its data will be overwritten.
+	 * Defer the (somewhat pointless) copying of the memslot until after
+	 * the last slot has been shifted to avoid overwriting said last slot.
+	 */
+	kvm_replace_memslot(slots, oldslot, NULL);
+
+	for (i = oldslot - mslots; i < slots->used_slots; i++)
+		kvm_shift_memslot(slots, i, i + 1);
 	mslots[i] = *memslot;
-	slots->id_to_index[memslot->id] = -1;
 }
 
 /*
@@ -1317,30 +1351,39 @@  static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
  * itself is not preserved in the array, i.e. not swapped at this time, only
  * its new index into the array is tracked.  Returns the changed memslot's
  * current index into the memslots array.
+ * The memslot at the returned index will not be in @slots->id_hash by then.
+ * @memslot is a detached struct with desired final data of the changed slot.
  */
 static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
 					    struct kvm_memory_slot *memslot)
 {
 	struct kvm_memory_slot *mslots = slots->memslots;
+	struct kvm_memory_slot *oldslot = id_to_memslot(slots, memslot->id);
 	int i;
 
-	if (slots->id_to_index[memslot->id] == -1 || !slots->used_slots)
+	if (!oldslot || !slots->used_slots)
 		return -1;
 
+	/*
+	 * Delete the slot from the hash table before sorting the remaining
+	 * slots, the slot's data may be overwritten when copying slots as part
+	 * of the sorting proccess.  update_memslots() will unconditionally
+	 * rewrite the entire slot and re-add it to the hash table.
+	 */
+	kvm_replace_memslot(slots, oldslot, NULL);
+
 	/*
 	 * Move the target memslot backward in the array by shifting existing
 	 * memslots with a higher GFN (than the target memslot) towards the
 	 * front of the array.
 	 */
-	for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
+	for (i = oldslot - mslots; i < slots->used_slots - 1; i++) {
 		if (memslot->base_gfn > mslots[i + 1].base_gfn)
 			break;
 
 		WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);
 
-		/* Shift the next memslot forward one and update its index. */
-		mslots[i] = mslots[i + 1];
-		slots->id_to_index[mslots[i].id] = i;
+		kvm_shift_memslot(slots, i, i + 1);
 	}
 	return i;
 }
@@ -1351,6 +1394,10 @@  static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
  * is not preserved in the array, i.e. not swapped at this time, only its new
  * index into the array is tracked.  Returns the changed memslot's final index
  * into the memslots array.
+ * The memslot at the returned index will not be in @slots->id_hash by then.
+ * @memslot is a detached struct with desired final data of the new or
+ * changed slot.
+ * Assumes that the memslot at @start index is not in @slots->id_hash.
  */
 static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
 					   struct kvm_memory_slot *memslot,
@@ -1365,9 +1412,7 @@  static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
 
 		WARN_ON_ONCE(memslot->base_gfn == mslots[i - 1].base_gfn);
 
-		/* Shift the next memslot back one and update its index. */
-		mslots[i] = mslots[i - 1];
-		slots->id_to_index[mslots[i].id] = i;
+		kvm_shift_memslot(slots, i, i - 1);
 	}
 	return i;
 }
@@ -1412,6 +1457,9 @@  static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
  * most likely to be referenced, sorting it to the front of the array was
  * advantageous.  The current binary search starts from the middle of the array
  * and uses an LRU pointer to improve performance for all memslots and GFNs.
+ *
+ * @memslot is a detached struct, not a part of the current or new memslot
+ * array.
  */
 static void update_memslots(struct kvm_memslots *slots,
 			    struct kvm_memory_slot *memslot,
@@ -1436,7 +1484,7 @@  static void update_memslots(struct kvm_memslots *slots,
 		 * its index accordingly.
 		 */
 		slots->memslots[i] = *memslot;
-		slots->id_to_index[memslot->id] = i;
+		kvm_replace_memslot(slots, NULL, &slots->memslots[i]);
 	}
 }
 
@@ -1529,6 +1577,7 @@  static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
 {
 	struct kvm_memslots *slots;
 	size_t new_size;
+	struct kvm_memory_slot *memslot;
 
 	if (change == KVM_MR_CREATE)
 		new_size = kvm_memslots_size(old->used_slots + 1);
@@ -1536,8 +1585,14 @@  static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
 		new_size = kvm_memslots_size(old->used_slots);
 
 	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
-	if (likely(slots))
-		memcpy(slots, old, kvm_memslots_size(old->used_slots));
+	if (unlikely(!slots))
+		return NULL;
+
+	memcpy(slots, old, kvm_memslots_size(old->used_slots));
+
+	hash_init(slots->id_hash);
+	kvm_for_each_memslot(memslot, slots)
+		hash_add(slots->id_hash, &memslot->id_node, memslot->id);
 
 	return slots;
 }