[v5,13/13] KVM: Optimize overlapping memslots check

Message ID	4f8718fc8da57ab799e95ef7c2060f8be0f2391f.1632171479.git.maciej.szmigiero@oracle.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> To: Paolo Bonzini <pbonzini@redhat.com>, Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Sean Christopherson <seanjc@google.com>, Wanpeng Li <wanpengli@tencent.com>, Jim Mattson <jmattson@google.com>, Igor Mammedov <imammedo@redhat.com>, Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>, Julien Thierry <julien.thierry.kdev@gmail.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Huacai Chen <chenhuacai@kernel.org>, Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>, Paul Mackerras <paulus@ozlabs.org>, Christian Borntraeger <borntraeger@de.ibm.com>, Janosch Frank <frankja@linux.ibm.com>, David Hildenbrand <david@redhat.com>, Cornelia Huck <cohuck@redhat.com>, Claudio Imbrenda <imbrenda@linux.ibm.com>, Joerg Roedel <joro@8bytes.org>, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v5 13/13] KVM: Optimize overlapping memslots check Date: Mon, 20 Sep 2021 23:39:01 +0200 Message-Id: <4f8718fc8da57ab799e95ef7c2060f8be0f2391f.1632171479.git.maciej.szmigiero@oracle.com> In-Reply-To: <cover.1632171478.git.maciej.szmigiero@oracle.com> References: <cover.1632171478.git.maciej.szmigiero@oracle.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	KVM: Scalable memslots implementation \| expand [v5,00/13] KVM: Scalable memslots implementation [v5,01/13] KVM: x86: Cache total page count to avoid traversing the memslot array [v5,02/13] KVM: x86: Don't call kvm_mmu_change_mmu_pages() if the count hasn't changed [v5,03/13] KVM: Add "old" memslot parameter to kvm_arch_prepare_memory_region() [v5,04/13] KVM: x86: Move n_memslots_pages recalc to kvm_arch_prepare_memory_region() [v5,05/13] KVM: Integrate gfn_to_memslot_approx() into search_memslots() [v5,06/13] KVM: Move WARN on invalid memslot index to update_memslots() [v5,07/13] KVM: Just resync arch fields when slots_arch_lock gets reacquired [v5,08/13] KVM: Resolve memslot ID via a hash table instead of via a static array [v5,09/13] KVM: Use interval tree to do fast hva lookup in memslots [v5,10/13] KVM: s390: Introduce kvm_s390_get_gfn_end() [v5,11/13] KVM: Keep memslots in tree-based structures instead of array-based ones [v5,12/13] KVM: Optimize gfn lookup in kvm_zap_gfn_range() [v5,13/13] KVM: Optimize overlapping memslots check

Maciej S. Szmigiero Sept. 20, 2021, 9:39 p.m. UTC

From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

Do a quick lookup for possibly overlapping gfns when creating or moving
a memslot instead of performing a linear scan of the whole memslot set.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
 virt/kvm/kvm_main.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

Sean Christopherson Oct. 26, 2021, 6:59 p.m. UTC | #1

On Mon, Sep 20, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> Do a quick lookup for possibly overlapping gfns when creating or moving
> a memslot instead of performing a linear scan of the whole memslot set.
> 
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
>  virt/kvm/kvm_main.c | 36 +++++++++++++++++++++++++++---------
>  1 file changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 5fea467d6fec..78dad8c6376f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1667,6 +1667,30 @@ static int kvm_delete_memslot(struct kvm *kvm,
>  	return kvm_set_memslot(kvm, mem, old, &new, as_id, KVM_MR_DELETE);
>  }
>  
> +static bool kvm_check_memslot_overlap(struct kvm_memslots *slots,
> +				      struct kvm_memory_slot *nslot)
> +{
> +	int idx = slots->node_idx;
> +	gfn_t nend = nslot->base_gfn + nslot->npages;
> +	struct rb_node *node;
> +
> +	kvm_for_each_memslot_in_gfn_range(node, slots, nslot->base_gfn, nend) {
> +		struct kvm_memory_slot *cslot;
> +		gfn_t cend;
> +
> +		cslot = container_of(node, struct kvm_memory_slot, gfn_node[idx]);
> +		cend = cslot->base_gfn + cslot->npages;
> +		if (cslot->id == nslot->id)
> +			continue;
> +
> +		/* kvm_for_each_in_gfn_no_more() guarantees that cslot->base_gfn < nend */
> +		if (cend > nslot->base_gfn)

Hmm, IMO the need for this check means that kvm_for_each_memslot_in_gfn_range()
is flawed.  The user of kvm_for_each...() should not be responsible for skipping
memslots that do not actually overlap the requested range.  I.e. this function
should be no more than:

static bool kvm_check_memslot_overlap(struct kvm_memslots *slots,
				      struct kvm_memory_slot *slot)
{
	gfn_t start = slot->base_gfn;
	gfn_t end = start + slot->npages;

	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
		if (iter.slot->id != slot->id)
			return true;
	}

	return false;
}


and I suspect kvm_zap_gfn_range() could be further simplified as well.

Looking back at the introduction of the helper, its comment's highlighting of
"possibily" now makes sense.

  /* Iterate over each memslot *possibly* intersecting [start, end) range */
  #define kvm_for_each_memslot_in_gfn_range(node, slots, start, end)	\

That's an unnecessarily bad API.  It's a very solvable problem for the iterator
helpers to advance until there's actually overlap, not doing so violates the
principle of least surprise, and unless I'm missing something, there's no use
case for an "approximate" iteration.

> +			return true;
> +	}
> +
> +	return false;
> +}
> +
>  /*
>   * Allocate some memory and give it an address in the guest physical address
>   * space.
> @@ -1752,16 +1776,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	}
>  
>  	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> -		int bkt;
> -
>  		/* Check for overlaps */

This comment can be dropped, the new function is fairly self-documenting.

> -		kvm_for_each_memslot(tmp, bkt, __kvm_memslots(kvm, as_id)) {
> -			if (tmp->id == id)
> -				continue;
> -			if (!((new.base_gfn + new.npages <= tmp->base_gfn) ||
> -			      (new.base_gfn >= tmp->base_gfn + tmp->npages)))
> -				return -EEXIST;
> -		}
> +		if (kvm_check_memslot_overlap(__kvm_memslots(kvm, as_id),
> +					      &new))

And then with the comment dropped, the wrap can be avoided by folding the check
into the outer if statement, e.g.

	if (((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) &&
	    kvm_check_memslot_overlap(__kvm_memslots(kvm, as_id), &new))
		return -EEXIST;

> +			return -EEXIST;
>  	}
>  
>  	/* Allocate/free page dirty bitmap as needed */

Maciej S. Szmigiero Oct. 27, 2021, 1:48 p.m. UTC | #2

On 26.10.2021 20:59, Sean Christopherson wrote:
> On Mon, Sep 20, 2021, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> Do a quick lookup for possibly overlapping gfns when creating or moving
>> a memslot instead of performing a linear scan of the whole memslot set.
>>
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>> ---
>>   virt/kvm/kvm_main.c | 36 +++++++++++++++++++++++++++---------
>>   1 file changed, 27 insertions(+), 9 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 5fea467d6fec..78dad8c6376f 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1667,6 +1667,30 @@ static int kvm_delete_memslot(struct kvm *kvm,
>>   	return kvm_set_memslot(kvm, mem, old, &new, as_id, KVM_MR_DELETE);
>>   }
>>   
>> +static bool kvm_check_memslot_overlap(struct kvm_memslots *slots,
>> +				      struct kvm_memory_slot *nslot)
>> +{
>> +	int idx = slots->node_idx;
>> +	gfn_t nend = nslot->base_gfn + nslot->npages;
>> +	struct rb_node *node;
>> +
>> +	kvm_for_each_memslot_in_gfn_range(node, slots, nslot->base_gfn, nend) {
>> +		struct kvm_memory_slot *cslot;
>> +		gfn_t cend;
>> +
>> +		cslot = container_of(node, struct kvm_memory_slot, gfn_node[idx]);
>> +		cend = cslot->base_gfn + cslot->npages;
>> +		if (cslot->id == nslot->id)
>> +			continue;
>> +
>> +		/* kvm_for_each_in_gfn_no_more() guarantees that cslot->base_gfn < nend */
>> +		if (cend > nslot->base_gfn)
> 
> Hmm, IMO the need for this check means that kvm_for_each_memslot_in_gfn_range()
> is flawed.  The user of kvm_for_each...() should not be responsible for skipping
> memslots that do not actually overlap the requested range.  I.e. this function
> should be no more than:
> 
> static bool kvm_check_memslot_overlap(struct kvm_memslots *slots,
> 				      struct kvm_memory_slot *slot)
> {
> 	gfn_t start = slot->base_gfn;
> 	gfn_t end = start + slot->npages;
> 
> 	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
> 		if (iter.slot->id != slot->id)
> 			return true;
> 	}
> 
> 	return false;
> }
> 
> 
> and I suspect kvm_zap_gfn_range() could be further simplified as well.
> 
> Looking back at the introduction of the helper, its comment's highlighting of
> "possibily" now makes sense.
> 
>    /* Iterate over each memslot *possibly* intersecting [start, end) range */
>    #define kvm_for_each_memslot_in_gfn_range(node, slots, start, end)	\
> 
> That's an unnecessarily bad API.  It's a very solvable problem for the iterator
> helpers to advance until there's actually overlap, not doing so violates the
> principle of least surprise, and unless I'm missing something, there's no use
> case for an "approximate" iteration.

In principle this can be done, however this will complicate the gfn
iterator logic - especially the kvm_memslot_iter_start() part, which
will already get messier from open-coding kvm_memslots_gfn_upper_bound()
there.

At the same kvm_zap_gfn_range() will still need to do the memslot range
<-> request range merging by itself as it does not want to process the
whole returned memslot, but rather just the part that's actually
overlapping its requested range.

In the worst case, the current code can return one memslot too much, so
I don't think it's worth bringing additional complexity just to detect
and skip it - it's not that uncommon to design an API that needs extra
checking from its caller to cover some corner cases.

For example, see pthread_cond_wait() or kernel waitqueues with their
spurious wakeups or atomic_compare_exchange_weak() from C11.
And these are higher level APIs than a very limited internal KVM one
with just two callers.
In case of kvm_zap_gfn_range() the necessary checking is already
there and has to be kept due to the above range merging.

Also, a code that is simpler is easier to understand, maintain and
so less prone to subtle bugs.

>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /*
>>    * Allocate some memory and give it an address in the guest physical address
>>    * space.
>> @@ -1752,16 +1776,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
>>   	}
>>   
>>   	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
>> -		int bkt;
>> -
>>   		/* Check for overlaps */
> 
> This comment can be dropped, the new function is fairly self-documenting.

Will drop it.

>> -		kvm_for_each_memslot(tmp, bkt, __kvm_memslots(kvm, as_id)) {
>> -			if (tmp->id == id)
>> -				continue;
>> -			if (!((new.base_gfn + new.npages <= tmp->base_gfn) ||
>> -			      (new.base_gfn >= tmp->base_gfn + tmp->npages)))
>> -				return -EEXIST;
>> -		}
>> +		if (kvm_check_memslot_overlap(__kvm_memslots(kvm, as_id),
>> +					      &new))
> 
> And then with the comment dropped, the wrap can be avoided by folding the check
> into the outer if statement, e.g.
> 
> 	if (((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) &&
> 	    kvm_check_memslot_overlap(__kvm_memslots(kvm, as_id), &new))
> 		return -EEXIST;
> 

Will fold it.

Thanks,
Maciej

Sean Christopherson Oct. 28, 2021, 5:53 p.m. UTC | #3

On Wed, Oct 27, 2021, Maciej S. Szmigiero wrote:
> On 26.10.2021 20:59, Sean Christopherson wrote:
> > > +		/* kvm_for_each_in_gfn_no_more() guarantees that cslot->base_gfn < nend */
> > > +		if (cend > nslot->base_gfn)
> > 
> > Hmm, IMO the need for this check means that kvm_for_each_memslot_in_gfn_range()
> > is flawed.  The user of kvm_for_each...() should not be responsible for skipping
> > memslots that do not actually overlap the requested range.  I.e. this function
> > should be no more than:
> > 
> > static bool kvm_check_memslot_overlap(struct kvm_memslots *slots,
> > 				      struct kvm_memory_slot *slot)
> > {
> > 	gfn_t start = slot->base_gfn;
> > 	gfn_t end = start + slot->npages;
> > 
> > 	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
> > 		if (iter.slot->id != slot->id)
> > 			return true;
> > 	}
> > 
> > 	return false;
> > }
> > 
> > 
> > and I suspect kvm_zap_gfn_range() could be further simplified as well.
> > 
> > Looking back at the introduction of the helper, its comment's highlighting of
> > "possibily" now makes sense.
> > 
> >    /* Iterate over each memslot *possibly* intersecting [start, end) range */
> >    #define kvm_for_each_memslot_in_gfn_range(node, slots, start, end)	\
> > 
> > That's an unnecessarily bad API.  It's a very solvable problem for the iterator
> > helpers to advance until there's actually overlap, not doing so violates the
> > principle of least surprise, and unless I'm missing something, there's no use
> > case for an "approximate" iteration.
> 
> In principle this can be done, however this will complicate the gfn
> iterator logic - especially the kvm_memslot_iter_start() part, which
> will already get messier from open-coding kvm_memslots_gfn_upper_bound()
> there.

Hmm, no, this is trivial to handle, though admittedly a bit unpleasant.

/*
 * Note, kvm_memslot_iter_start() finds the first memslot that _may_ overlap
 * the range, it does not verify that there is actual overlap.  The check in
 * the loop body filters out the case where the highest memslot with a base_gfn
 * below start doesn't actually overlap.
 */
#define kvm_for_each_memslot_in_gfn_range(iter, node, slots, start, end) \
        for (kvm_memslot_iter_start(iter, node, slots, start, end);      \
             kvm_memslot_iter_is_valid(iter);                            \
             kvm_memslot_iter_next(node))                                \
		if (iter->slot->base_gfn + iter->slot->npages < start) { \
		} else



> At the same kvm_zap_gfn_range() will still need to do the memslot range
> <-> request range merging by itself as it does not want to process the
> whole returned memslot, but rather just the part that's actually
> overlapping its requested range.

That's purely coincidental though.  IMO, kvm_zap_gfn_range() would be well within
its rights to sanity the memslot, e.g.

	if (WARN_ON(memslot->base_gfn + memslot->npages < gfn_start))
		continue;
 
> In the worst case, the current code can return one memslot too much, so
> I don't think it's worth bringing additional complexity just to detect
> and skip it

I strongly disagree.  This is very much a case of one chunk of code that knows
the internal details of what it's doing taking on all the pain and complexity
so that users of the helper

> it's not that uncommon to design an API that needs extra checking from its
> caller to cover some corner cases.

That doesn't mean it's desirable.

> For example, see pthread_cond_wait() or kernel waitqueues with their
> spurious wakeups or atomic_compare_exchange_weak() from C11.
> And these are higher level APIs than a very limited internal KVM one
> with just two callers.

Two _existing_ callers.  Odds are very, very high that future usage of
kvm_for_each_memslot_in_gfn_range() will overlook the detail about the helper
not actually doing what it says it does.  That could be addressed to some extent
by renaming it kvm_for_each_memslot_in_gfn_range_approx() or whatever, but as
above this isn't difficult to handle, just gross.

> In case of kvm_zap_gfn_range() the necessary checking is already
> there and has to be kept due to the above range merging.
> 
> Also, a code that is simpler is easier to understand, maintain and
> so less prone to subtle bugs.

Heh, and IMO that's an argument for putting all the complexity into a single
location.  :-)

Maciej S. Szmigiero Oct. 29, 2021, 4:23 p.m. UTC | #4

On 28.10.2021 19:53, Sean Christopherson wrote:
> On Wed, Oct 27, 2021, Maciej S. Szmigiero wrote:
>> On 26.10.2021 20:59, Sean Christopherson wrote:
>>>> +		/* kvm_for_each_in_gfn_no_more() guarantees that cslot->base_gfn < nend */
>>>> +		if (cend > nslot->base_gfn)
>>>
>>> Hmm, IMO the need for this check means that kvm_for_each_memslot_in_gfn_range()
>>> is flawed.  The user of kvm_for_each...() should not be responsible for skipping
>>> memslots that do not actually overlap the requested range.  I.e. this function
>>> should be no more than:
>>>
>>> static bool kvm_check_memslot_overlap(struct kvm_memslots *slots,
>>> 				      struct kvm_memory_slot *slot)
>>> {
>>> 	gfn_t start = slot->base_gfn;
>>> 	gfn_t end = start + slot->npages;
>>>
>>> 	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
>>> 		if (iter.slot->id != slot->id)
>>> 			return true;
>>> 	}
>>>
>>> 	return false;
>>> }
>>>
>>>
>>> and I suspect kvm_zap_gfn_range() could be further simplified as well.
>>>
>>> Looking back at the introduction of the helper, its comment's highlighting of
>>> "possibily" now makes sense.
>>>
>>>     /* Iterate over each memslot *possibly* intersecting [start, end) range */
>>>     #define kvm_for_each_memslot_in_gfn_range(node, slots, start, end)	\
>>>
>>> That's an unnecessarily bad API.  It's a very solvable problem for the iterator
>>> helpers to advance until there's actually overlap, not doing so violates the
>>> principle of least surprise, and unless I'm missing something, there's no use
>>> case for an "approximate" iteration.
>>
>> In principle this can be done, however this will complicate the gfn
>> iterator logic - especially the kvm_memslot_iter_start() part, which
>> will already get messier from open-coding kvm_memslots_gfn_upper_bound()
>> there.
> 
> Hmm, no, this is trivial to handle, though admittedly a bit unpleasant.
> 
> /*
>   * Note, kvm_memslot_iter_start() finds the first memslot that _may_ overlap
>   * the range, it does not verify that there is actual overlap.  The check in
>   * the loop body filters out the case where the highest memslot with a base_gfn
>   * below start doesn't actually overlap.
>   */
> #define kvm_for_each_memslot_in_gfn_range(iter, node, slots, start, end) \
>          for (kvm_memslot_iter_start(iter, node, slots, start, end);      \
>               kvm_memslot_iter_is_valid(iter);                            \
>               kvm_memslot_iter_next(node))                                \
> 		if (iter->slot->base_gfn + iter->slot->npages < start) { \
> 		} else

As you say, that's rather unpleasant, since we know that the first
returned memslot is the only one that's possibly *not* overlapping
(and then it only happens sometimes).
Yet with the above change we'll pay the price of this check for every
loop iteration (for every returned memslot).
That's definitely not optimizing for the most common case.

Also, the above code has a bug - using a [start, end) notation compatible
with what kvm_for_each_memslot_in_gfn_range() expects,  where [1, 4)
means a range consisting of { 1, 2, 3 }, consider a tree consisting of the
following two memslots: [1, 2), [3, 5)

When kvm_for_each_memslot_in_gfn_range() is then called to "return"
memslots overlapping range [2, 4) it will "return" the [1, 2) memslot, too -
even though it does *not*  actually overlap the requested range.

While this bug is easy to fix (just use "<=" instead of "<") it serves to
underline that one has to be very careful with working with this type of
code as it is very easy to introduce subtle mistakes here.

Here, how many lines of code a function or macro has is not a good proxy
metric for how hard is to build a strictly accurate mental model of it.

>> At the same kvm_zap_gfn_range() will still need to do the memslot range
>> <-> request range merging by itself as it does not want to process the
>> whole returned memslot, but rather just the part that's actually
>> overlapping its requested range.
> 
> That's purely coincidental though.  IMO, kvm_zap_gfn_range() would be well within
> its rights to sanity the memslot, e.g.
> 
> 	if (WARN_ON(memslot->base_gfn + memslot->npages < gfn_start))
> 		continue;
>   
>> In the worst case, the current code can return one memslot too much, so
>> I don't think it's worth bringing additional complexity just to detect
>> and skip it
> 
> I strongly disagree.  This is very much a case of one chunk of code that knows
> the internal details of what it's doing taking on all the pain and complexity
> so that users of the helper
> 
>> it's not that uncommon to design an API that needs extra checking from its
>> caller to cover some corner cases.
> 
> That doesn't mean it's desirable.
> 
>> For example, see pthread_cond_wait() or kernel waitqueues with their
>> spurious wakeups or atomic_compare_exchange_weak() from C11.
>> And these are higher level APIs than a very limited internal KVM one
>> with just two callers.
> 
> Two _existing_ callers.  Odds are very, very high that future usage of
> kvm_for_each_memslot_in_gfn_range() will overlook the detail about the helper
> not actually doing what it says it does.  That could be addressed to some extent
> by renaming it kvm_for_each_memslot_in_gfn_range_approx() or whatever, but as
> above this isn't difficult to handle, just gross.

What kind of future users of this API do you envision?

I've pointed out above that adding this extra check means essentially
optimizing for an uncommon case.

One of the callers of this function already has the necessary code to
reject non-overlapping memslots and have to keep it to calculate the
effective overlapping range for each memslot.
For the second caller (which, by the way, is called much less often than
kvm_zap_gfn_range()) it's a matter of one extra condition.

>> In case of kvm_zap_gfn_range() the necessary checking is already
>> there and has to be kept due to the above range merging.
>>
>> Also, a code that is simpler is easier to understand, maintain and
>> so less prone to subtle bugs.
> 
> Heh, and IMO that's an argument for putting all the complexity into a single
> location.  :-)
> 

If you absolutely insist then obviously I can change the code to return
only memslots strictly overlapping the requested range in the next
patchset version.

However, I do want to know that this will be the final logic here,
since I am concerned that yet another bug might slip in, this time
under my radar, too.

Thanks,
Maciej

Sean Christopherson Oct. 30, 2021, 12:32 a.m. UTC | #5

On Fri, Oct 29, 2021, Maciej S. Szmigiero wrote:
> On 28.10.2021 19:53, Sean Christopherson wrote:
> > Hmm, no, this is trivial to handle, though admittedly a bit unpleasant.
> > 
> > /*
> >   * Note, kvm_memslot_iter_start() finds the first memslot that _may_ overlap
> >   * the range, it does not verify that there is actual overlap.  The check in
> >   * the loop body filters out the case where the highest memslot with a base_gfn
> >   * below start doesn't actually overlap.
> >   */
> > #define kvm_for_each_memslot_in_gfn_range(iter, node, slots, start, end) \
> >          for (kvm_memslot_iter_start(iter, node, slots, start, end);      \
> >               kvm_memslot_iter_is_valid(iter);                            \
> >               kvm_memslot_iter_next(node))                                \
> > 		if (iter->slot->base_gfn + iter->slot->npages < start) { \
> > 		} else
> 
> As you say, that's rather unpleasant, since we know that the first
> returned memslot is the only one that's possibly *not* overlapping
> (and then it only happens sometimes).
> Yet with the above change we'll pay the price of this check for every
> loop iteration (for every returned memslot).

I'm definitely not saying that it's the best/right/only way to handle this,
merely pointing out that it's not _that_ complex, modulo off-by-one bugs :-)

> That's definitely not optimizing for the most common case.

Meh, it's a nop for kvm_check_memslot_overlap() and completely in the noise for
kvm_zap_gfn_range().  Not saying I disagree that's a flawed way to handle this
just saying that even the quick-and-dirty solution is extremely unlikely to be
relevant to performance.

> Also, the above code has a bug - using a [start, end) notation compatible
> with what kvm_for_each_memslot_in_gfn_range() expects,  where [1, 4)
> means a range consisting of { 1, 2, 3 }, consider a tree consisting of the
> following two memslots: [1, 2), [3, 5)
> 
> When kvm_for_each_memslot_in_gfn_range() is then called to "return"
> memslots overlapping range [2, 4) it will "return" the [1, 2) memslot, too -
> even though it does *not*  actually overlap the requested range.
> 
> While this bug is easy to fix (just use "<=" instead of "<") it serves to
> underline that one has to be very careful with working with this type of
> code as it is very easy to introduce subtle mistakes here.

Yes, and that's exactly why I want to write this _once_.

> > Two _existing_ callers.  Odds are very, very high that future usage of
> > kvm_for_each_memslot_in_gfn_range() will overlook the detail about the helper
> > not actually doing what it says it does.  That could be addressed to some extent
> > by renaming it kvm_for_each_memslot_in_gfn_range_approx() or whatever, but as
> > above this isn't difficult to handle, just gross.
> 
> What kind of future users of this API do you envision?
> 
> I've pointed out above that adding this extra check means essentially
> optimizing for an uncommon case.

Usage similar to kvm_zap_gfn_range() where KVM wants to take action on a specific
gfn range.  I'm actually somewhat surprised that none of the other architectures
have a use case in their MMUs, though I don't know the story for things like
shadow paging that "necessitate" x86's behavior.

> One of the callers of this function already has the necessary code to
> reject non-overlapping memslots and have to keep it to calculate the
> effective overlapping range for each memslot.
> For the second caller (which, by the way, is called much less often than
> kvm_zap_gfn_range()) it's a matter of one extra condition.
> 
> > > In case of kvm_zap_gfn_range() the necessary checking is already
> > > there and has to be kept due to the above range merging.
> > > 
> > > Also, a code that is simpler is easier to understand, maintain and
> > > so less prone to subtle bugs.
> > 
> > Heh, and IMO that's an argument for putting all the complexity into a single
> > location.  :-)
> > 
> 
> If you absolutely insist then obviously I can change the code to return
> only memslots strictly overlapping the requested range in the next
> patchset version.

I feel pretty strongly that the risk vs. reward is heavily in favor of returning
only strictly overlapping memslots.  The risk being that a few years down the road
someone runs afoul of this and we end up with a bug in production.  The reward is
avoiding writing moderately complex code and at best shave a few uops in an x86
slooow path.  It's entirely possible there's never a third user, but IMO there
isn't enough reward to justify even the smallest amount of risk.

Paolo, any opinion?

[v5,13/13] KVM: Optimize overlapping memslots check

Commit Message

Comments

Patch