[1/1] KVM: restrict kvm_gfn_to_hva_cache_init() to only accept address ranges within one page

Message ID 20240423024933.80143-1-kele@cs.columbia.edu (mailing list archive)
State New, archived
Series [1/1] KVM: restrict kvm_gfn_to_hva_cache_init() to only accept address ranges within one page

Commit Message

Kele Huang April 23, 2024, 2:49 a.m. UTC
Function kvm_gfn_to_hva_cache_init() is exported and used to initialize
a gfn-to-hva cache in various places, e.g. in kvm_pv_enable_async_pf().
However, it tail-calls __kvm_gfn_to_hva_cache_init(), which sets
ghc->memslot to NULL and returns 0 when asked to initialize a cache
whose range crosses pages.  This is unsafe, as 0 typically means a
successful return, yet no valid ghc->memslot is established.  Callers
of kvm_gfn_to_hva_cache_init(), such as kvm_lapic_set_vapic_addr(), do
not check ghc->memslot further when the function returns 0.  Moreover,
other developers may try to initialize a cross-page cache through this
function and get a success return value even though no cache was set
up.

This patch fixes the issue by explicitly restricting
kvm_gfn_to_hva_cache_init() to address ranges that lie within a single
page, and adds a comment to the function accordingly.
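
For reference, the tail of __kvm_gfn_to_hva_cache_init() behaves
roughly as follows (an abridged sketch of virt/kvm/kvm_main.c; exact
code may differ across kernel versions):

	/* gpa, len, offset and nr_pages_needed are derived from the
	 * requested range earlier in the function.
	 */

	/* Use the slow path for cross page reads and writes. */
	if (nr_pages_needed == 1)
		ghc->hva += offset;
	else
		ghc->memslot = NULL;	/* caller must take the slow path */

	ghc->gpa = gpa;
	ghc->len = len;
	return 0;	/* returned even when the range crosses pages */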

Signed-off-by: Kele Huang <kele@cs.columbia.edu>
---
 virt/kvm/kvm_main.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Zide Chen April 26, 2024, 1:16 a.m. UTC | #1
On 4/22/2024 7:49 PM, Kele Huang wrote:
> Function kvm_gfn_to_hva_cache_init() is exported and used to initialize
> a gfn-to-hva cache in various places, e.g. in kvm_pv_enable_async_pf().
> However, it tail-calls __kvm_gfn_to_hva_cache_init(), which sets
> ghc->memslot to NULL and returns 0 when asked to initialize a cache
> whose range crosses pages.  This is unsafe, as 0 typically means a
> successful return, yet no valid ghc->memslot is established.  Callers
> of kvm_gfn_to_hva_cache_init(), such as kvm_lapic_set_vapic_addr(), do
> not check ghc->memslot further when the function returns 0.  Moreover,
> other developers may try to initialize a cross-page cache through this
> function and get a success return value even though no cache was set
> up.
> 
> This patch fixes the issue by explicitly restricting
> kvm_gfn_to_hva_cache_init() to address ranges that lie within a single
> page, and adds a comment to the function accordingly.
> 

For a cross-page cache, returning zero is not really a mistake, since
it verifies the entire region against the memslot and does initialize
ghc.

The nullified memslot indicates that the read or write should follow
the slow path, and it's the caller's responsibility to check
ghc->memslot if needed; see, e.g., kvm_read_guest_offset_cached() and
kvm_write_guest_offset_cached().  Please refer to commit fcfbc617547f
("KVM: Check for a bad hva before dropping into the ghc slow path").
Kele Huang April 26, 2024, 4:18 a.m. UTC | #2
Thanks for the feedback!  Yes, callers including
kvm_read_guest_offset_cached() and kvm_write_guest_offset_cached()
check whether the read or write should follow the slow path by testing
whether ghc->memslot has been nullified.  However, please note that
they call the static function `__kvm_gfn_to_hva_cache_init()` rather
than the exported function `kvm_gfn_to_hva_cache_init()`.

Although `__kvm_gfn_to_hva_cache_init()` has detailed comments about
using a slow path for cross-page ranges, that "slow path" is for
reading/writing guest memory, not for setting up a cross-page cache.
The difference, I think, is that `kvm_gfn_to_hva_cache_init()` is an
exported function whose name indicates it initializes a gfn-to-hva
cache, while a return value of 0 does not actually guarantee the cache
is initialized when the address range crosses pages.  For example,
kvm_lapic_set_vapic_addr() calls `kvm_gfn_to_hva_cache_init()` and
simply assumes the cache is successfully initialized based on the
return value.

My thought is that there probably should be another function that
provides correct cross-page cache initialization, but I am not sure
this is really needed.  Nevertheless, I think we could at least make
the existing function more accurate?

-- Kele
Christoph Hellwig April 26, 2024, 6:25 a.m. UTC | #3
> +	if (likely(nr_pages_needed == 1))
> +		return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
> +	else
> +		return -EINVAL;
>  }

I don't really have enough knowledge to comment on the functionality,
but this code structure is really odd to read.

Try to handle error conditions with early returns first, please:

	if (unlikely(nr_pages_needed != 1))
		return -EINVAL;
	return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
Sean Christopherson April 26, 2024, 4:17 p.m. UTC | #4
On Fri, Apr 26, 2024, Kele Huang wrote:

Please don't top post.  https://people.kernel.org/tglx/notes-about-netiquette

> > On Thu, Apr 25, 2024 at 9:16 PM Chen, Zide <zide.chen@intel.com> wrote:
> >> On 4/22/2024 7:49 PM, Kele Huang wrote:
> >> > Function kvm_gfn_to_hva_cache_init() is exported and used to
> >> > initialize a gfn-to-hva cache in various places, e.g. in
> >> > kvm_pv_enable_async_pf().  However, it tail-calls
> >> > __kvm_gfn_to_hva_cache_init(), which sets ghc->memslot to NULL and
> >> > returns 0 when asked to initialize a cache whose range crosses
> >> > pages.  This is unsafe, as 0 typically means a successful return,
> >> > yet no valid ghc->memslot is established.  Callers of
> >> > kvm_gfn_to_hva_cache_init(), such as kvm_lapic_set_vapic_addr(),
> >> > do not check ghc->memslot further when the function returns 0.
> >> > Moreover, other developers may try to initialize a cross-page
> >> > cache through this function and get a success return value even
> >> > though no cache was set up.
> >> >
> >> > This patch fixes the issue by explicitly restricting
> >> > kvm_gfn_to_hva_cache_init() to address ranges that lie within a
> >> > single page, and adds a comment to the function accordingly.
> >> >
> >>
> >> For a cross-page cache, returning zero is not really a mistake, since
> >> it verifies the entire region against the memslot and does initialize
> >> ghc.
> >>
> >> The nullified memslot indicates that the read or write should follow
> >> the slow path, and it's the caller's responsibility to check
> >> ghc->memslot if needed; see, e.g., kvm_read_guest_offset_cached() and
> >> kvm_write_guest_offset_cached().  Please refer to commit fcfbc617547f
> >> ("KVM: Check for a bad hva before dropping into the ghc slow path").
>
> Thanks for the feedback!  Yes, callers including
> kvm_read_guest_offset_cached() and kvm_write_guest_offset_cached()
> check whether the read or write should follow the slow path by testing
> whether ghc->memslot has been nullified.  However, please note that
> they call the static function `__kvm_gfn_to_hva_cache_init()` rather
> than the exported function `kvm_gfn_to_hva_cache_init()`.
> 
> Although `__kvm_gfn_to_hva_cache_init()` has detailed comments about
> using a slow path for cross-page ranges, that "slow path" is for
> reading/writing guest memory, not for setting up a cross-page cache.
> The difference, I think, is that `kvm_gfn_to_hva_cache_init()` is an
> exported function whose name indicates it initializes a gfn-to-hva
> cache, while a return value of 0 does not actually guarantee the cache
> is initialized when the address range crosses pages.

The exports from kvm.ko are intended for use only by KVM itself, e.g. by
kvm-intel.ko and kvm-amd.ko on x86.  Anyone trying to use KVM's exports in random
drivers is in for a world of hurt, as there are many, many subtleties throughout KVM
that bleed all over the exports.

It's gross that KVM has "internal" exports, and we have line of sight to removing
them for most architectures, including x86, but that isn't the easiest of changes.

If there is a real problem with in-tree upstream KVM, then we'll fix it, but
changing the behavior of KVM functions just because they are exported isn't going
to happen.

> For example, kvm_lapic_set_vapic_addr() calls
> `kvm_gfn_to_hva_cache_init()` and simply assumes the cache is
> successfully initialized based on the return value.

I don't follow the concern here.  It's userspace's responsibility to provide a
page-aligned address.  KVM needs to not explode on an unaligned address, but
ensuring that KVM can actually use the fast path isn't KVM's problem.
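
To illustrate, kvm_lapic_set_vapic_addr() caches a 4-byte object, so
the cache can only split a page if userspace passes in an unaligned
address (a lightly abridged sketch of arch/x86/kvm/lapic.c; details
may vary by kernel version):

	int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
	{
		if (vapic_addr) {
			/* A bad gpa fails here; an unaligned one does not. */
			if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
						&vcpu->arch.apic->vapic_cache,
						vapic_addr, sizeof(u32)))
				return -EINVAL;
			__set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
		} else {
			__clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
		}

		vcpu->arch.apic->vapic_addr = vapic_addr;
		return 0;
	}

The subsequent vAPIC sync goes through kvm_read_guest_cached() and
kvm_write_guest_cached(), which handle the NULL-memslot case.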

> My thought is that there probably should be another function that
> provides correct cross-page cache initialization, but I am not sure
> this is really needed.

If there were a legitimate use case where it was truly valuable, then we could
add that functionality.  But all of the usage in KVM either deals with assets
that are exactly one page in size and must be page aligned, or with assets that
are userspace or guest controlled and not worth optimizing for page splits because
a well-behaved userspace/guest should ensure the asset doesn't split a page.

> Nevertheless, I think we could at least make the existing function
> more accurate?

More accurate with respect to what?
Kele Huang April 27, 2024, 3:50 a.m. UTC | #5
On Fri, Apr 26, 2024 at 2:26 AM Christoph Hellwig <hch@infradead.org> wrote:

>
> I don't really have enough knowledge to comment on the functionality,
> but this code structure is really odd to read.
>
> Try to handle error conditions with early returns first, please:
>
>         if (unlikely(nr_pages_needed != 1))
>                 return -EINVAL;
>         return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);

Thanks for the tip!  This makes sense to me, and I will restructure the
code accordingly.
Kele Huang April 27, 2024, 4:27 a.m. UTC | #6
On Fri, Apr 26, 2024 at 12:17 PM Sean Christopherson <seanjc@google.com> wrote:

> Please don't top post.  https://people.kernel.org/tglx/notes-about-netiquette

Thanks for the tip!

> The exports from kvm.ko are intended for use only by KVM itself, e.g. by
> kvm-intel.ko and kvm-amd.ko on x86.  Anyone trying to use KVM's exports in random
> drivers is in for a world of hurt, as there are many, many subtleties throughout KVM
> that bleed all over the exports.
>
> It's gross that KVM has "internal" exports, and we have line of sight to removing
> them for most architectures, including x86, but that isn't the easiest of changes.
>
> If there is a real problem with in-tree upstream KVM, then we'll fix it, but
> changing the behavior of KVM functions just because they are exported isn't going
> to happen.

Yes, I agree with this.

> > For example, kvm_lapic_set_vapic_addr() calls
> > `kvm_gfn_to_hva_cache_init()` and simply assumes the cache is
> > successfully initialized based on the return value.
>
> I don't follow the concern here.  It's userspace's responsibility to provide a
> page-aligned address.  KVM needs to not explode on an unaligned address, but
> ensuring that KVM can actually use the fast path isn't KVM's problem.

Yes, you are right.  For a cross-page address range, a return value of
0 does not mean the cache was successfully initialized, but subsequent
reads and writes of the corresponding guest memory still check whether
ghc->memslot is nullified before directly using the cached hva.

> > My thought is that there probably should be another function that
> > provides correct cross-page cache initialization, but I am not sure
> > this is really needed.
>
> If there were a legitimate use case where it was truly valuable, then we could
> add that functionality.  But all of the usage in KVM either deals with assets
> that are exactly one page in size and must be page aligned, or with assets that
> are userspace or guest controlled and not worth optimizing for page splits because
> a well-behaved userspace/guest should ensure the asset doesn't split a page.

Yes, I agree.

> > Nevertheless, I think we could at least make the existing function
> > more accurate?
>
> More accurate with respect to what?

The correctness of the whole gfn-to-hva cache init logic looks good to
me now.  Thanks for the clarifications from all of you.

Still, it is a bit unintuitive to me that callers must go through the
designated read/write helpers even when they assume the cache is
initialized, or must check ghc->memslot manually before using the fast
path.
Sean Christopherson April 29, 2024, 8:13 p.m. UTC | #7
On Sat, Apr 27, 2024, Kele Huang wrote:
> Still, it is a bit unintuitive to me that callers must go through the
> designated read/write helpers even when they assume the cache is
> initialized, or must check ghc->memslot manually before using the fast
> path.

Oh, don't get me wrong, I completely agree that the code isn't straightforward.
But I don't think this is a problem that can be reasonably solved in the caching
code, at least not easily, as a lot of the oddities stem from KVM's memslots, and
KVM's contracts with userspace and the guest.
Kele Huang April 30, 2024, 7:29 p.m. UTC | #8
On Mon, Apr 29, 2024 at 4:13 PM Sean Christopherson <seanjc@google.com> wrote:

> Oh, don't get me wrong, I completely agree that the code isn't straightforward.
> But I don't think this is a problem that can be reasonably solved in the caching
> code, at least not easily, as a lot of the oddities stem from KVM's memslots, and
> KVM's contracts with userspace and the guest.

Thanks!  I agree with you.  Using a flag to indicate that cross-page
cache initialization failed, and checking that flag on subsequent guest
memory reads and writes, would also mess up the code.  On the other
hand, simply refusing to set up a cross-page cache would confuse users.
I think it is good to check the validity of assets before using them.
Not sure if a brief comment would help here.
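
For what it's worth, the intended usage pattern then looks roughly
like this (a hypothetical sketch; the gpa and `val` here are made up
for illustration):

	struct gfn_to_hva_cache ghc;
	u32 val = 0;

	/* Validates the whole range; a cross-page range still returns
	 * 0 here but leaves ghc.memslot == NULL.
	 */
	if (kvm_gfn_to_hva_cache_init(kvm, &ghc, gpa, sizeof(val)))
		return -EINVAL;

	/* Always go through the cached helpers: they check ghc.memslot
	 * and transparently fall back to the uncached slow path when
	 * the range crosses a page.
	 */
	if (kvm_write_guest_cached(kvm, &ghc, &val, sizeof(val)))
		return -EFAULT;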

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9230ebe1753f..6efe579f6b5f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3476,11 +3476,22 @@  static int __kvm_gfn_to_hva_cache_init(struct kvm_memslots *slots,
 	return 0;
 }
 
+/*
+ * Please note that this function only supports gfn_to_hva_cache
+ * initialization within a single page.
+ */
 int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			      gpa_t gpa, unsigned long len)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
-	return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
+	gfn_t start_gfn = gpa >> PAGE_SHIFT;
+	gfn_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;
+	gfn_t nr_pages_needed = end_gfn - start_gfn + 1;
+
+	if (likely(nr_pages_needed == 1))
+		return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
+	else
+		return -EINVAL;
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_hva_cache_init);