Message ID: 20240423024933.80143-1-kele@cs.columbia.edu (mailing list archive)
State: New, archived
Series: [1/1] KVM: restrict kvm_gfn_to_hva_cache_init() to only accept address ranges within one page
On 4/22/2024 7:49 PM, Kele Huang wrote:
> Function kvm_gfn_to_hva_cache_init() is exported and used to initialize
> the gfn-to-hva cache in various places, such as in
> kvm_pv_enable_async_pf(). However, this function directly tail calls
> __kvm_gfn_to_hva_cache_init(), which assigns NULL to ghc->memslot and
> returns 0 when asked to initialize a cache that crosses pages. This is
> unsafe, as 0 typically means success, yet no valid ghc->memslot is
> returned. Functions that call kvm_gfn_to_hva_cache_init(), such as
> kvm_lapic_set_vapic_addr(), do not check ghc->memslot afterwards if
> kvm_gfn_to_hva_cache_init() returns 0. Moreover, other developers may
> try to initialize a cross-page cache by calling this function and fail
> while still getting a success return value.
>
> This patch fixes this issue by explicitly restricting
> kvm_gfn_to_hva_cache_init() to only accept address ranges within one
> page, and adds comments to the function accordingly.

For a cross-page cache, returning zero is not really a mistake, since the
function verifies the entire region against the memslot and does initialize
ghc. The nullified memslot indicates that the read or write should follow the
slow path, and it's the caller's responsibility to check ghc->memslot if
needed, e.g. kvm_read_guest_offset_cached() and kvm_write_guest_offset_cached().
Please refer to commit fcfbc617547f ("KVM: Check for a bad hva before dropping
into the ghc slow path").
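For reference, the caller-side pattern Zide describes looks roughly like the
following. This is a condensed paraphrase of kvm_write_guest_offset_cached() in
virt/kvm/kvm_main.c (written from memory, not the verbatim upstream code): when
ghc->memslot was nullified at init time, the access falls back to the uncached
slow path instead of failing.

int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
				  void *data, unsigned int offset,
				  unsigned long len)
{
	struct kvm_memslots *slots = kvm_memslots(kvm);
	gpa_t gpa = ghc->gpa + offset;

	if (WARN_ON_ONCE(len + offset > ghc->len))
		return -EINVAL;

	/* Re-validate the cache if the memslots changed underneath it. */
	if (slots->generation != ghc->generation) {
		if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
			return -EFAULT;
	}

	if (kvm_is_error_hva(ghc->hva))
		return -EFAULT;

	/* Cross-page/cross-slot cache: fall back to the slow, uncached path. */
	if (unlikely(!ghc->memslot))
		return kvm_write_guest(kvm, gpa, data, len);

	/* Fast path: write through the cached hva. */
	if (__copy_to_user((void __user *)ghc->hva + offset, data, len))
		return -EFAULT;
	mark_page_dirty_in_slot(kvm, ghc->memslot, gpa >> PAGE_SHIFT);

	return 0;
}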
Thanks for the feedback! Yes, callers including kvm_read_guest_offset_cached()
and kvm_write_guest_offset_cached() check whether the read or write should fall
back to the slow path by testing whether ghc->memslot has been nullified.
However, please note that they call the static function
`__kvm_gfn_to_hva_cache_init()`, not the exported function
`kvm_gfn_to_hva_cache_init()`.

Although `__kvm_gfn_to_hva_cache_init()` has detailed comments about using a
slow path for the cross-page case, that "slow path" is for reading/writing
guest memory, not for setting up a cross-page cache. The difference, I think,
is that `kvm_gfn_to_hva_cache_init()` is an exported function whose name
indicates it is used for gfn-to-hva cache init, while a return value of 0 does
not really guarantee the cache is initialized when the address range crosses
pages. For example, kvm_lapic_set_vapic_addr() calls
`kvm_gfn_to_hva_cache_init()` and simply assumes the cache is successfully
initialized by checking the return value.

My thought is that there should probably be another function to provide
correct cross-page cache initialization, but I am not sure this is really
needed. Nevertheless, I think we could at least make the existing function
more accurate?

-- Kele

On Thu, Apr 25, 2024 at 9:16 PM Chen, Zide <zide.chen@intel.com> wrote:
> [...]
> +	if (likely(nr_pages_needed == 1))
> +		return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
> +	else
> +		return -EINVAL;
>  }

I don't really have enough knowledge to comment on the functionality,
but this code structure is really odd to read.

Try to handle error conditions with early returns first, please:

	if (unlikely(nr_pages_needed != 1))
		return -EINVAL;

	return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
On Fri, Apr 26, 2024, Kele Huang wrote:

Please don't top post. https://people.kernel.org/tglx/notes-about-netiquette

> On Thu, Apr 25, 2024 at 9:16 PM Chen, Zide <zide.chen@intel.com> wrote:
> > [...]
>
> Thanks for the feedback! Yes, callers including kvm_read_guest_offset_cached()
> and kvm_write_guest_offset_cached() check whether the read or write should
> fall back to the slow path by testing whether ghc->memslot has been nullified.
> However, please note that they call the static function
> `__kvm_gfn_to_hva_cache_init()`, not the exported function
> `kvm_gfn_to_hva_cache_init()`.
>
> Although `__kvm_gfn_to_hva_cache_init()` has detailed comments about using a
> slow path for the cross-page case, that "slow path" is for reading/writing
> guest memory, not for setting up a cross-page cache. The difference, I think,
> is that `kvm_gfn_to_hva_cache_init()` is an exported function whose name
> indicates it is used for gfn-to-hva cache init, while a return value of 0
> does not really guarantee the cache is initialized when the address range
> crosses pages.

The exports from kvm.ko are intended for use only by KVM itself, e.g. by
kvm-intel.ko and kvm-amd.ko on x86. Anyone trying to use KVM's exports in
random drivers is in for a world of hurt, as there are many, many subtleties
throughout KVM that bleed all over the exports.

It's gross that KVM has "internal" exports, and we have line of sight to
removing them for most architectures, including x86, but that isn't the
easiest of changes.

If there is a real problem with in-tree upstream KVM, then we'll fix it, but
changing the behavior of KVM functions just because they are exported isn't
going to happen.

> For example, kvm_lapic_set_vapic_addr() calls `kvm_gfn_to_hva_cache_init()`
> and simply assumes the cache is successfully initialized by checking the
> return value.

I don't follow the concern here. It's userspace's responsibility to provide a
page-aligned address. KVM needs to not explode on an unaligned address, but
ensuring that KVM can actually use the fast path isn't KVM's problem.

> My thought is that there should probably be another function to provide
> correct cross-page cache initialization, but I am not sure this is really
> needed.

If there were a legitimate use case where it was truly valuable, then we could
add that functionality. But all of the usage in KVM either deals with assets
that are exactly one page in size and must be page aligned, or with assets
that are userspace or guest controlled and not worth optimizing for page
splits, because a well-behaved userspace/guest should ensure the asset doesn't
split a page.

> Nevertheless, I think we could at least make the existing function more
> accurate?

More accurate with respect to what?
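For context, the call site Sean refers to looks roughly like this. It is a
simplified paraphrase of kvm_lapic_set_vapic_addr() in arch/x86/kvm/lapic.c,
reconstructed from memory with error handling and unrelated details trimmed;
the point is that the cached object is only sizeof(u32) bytes, so it never
splits a page when userspace passes a sanely aligned address:

int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
{
	if (vapic_addr) {
		/*
		 * A 0 return is treated as "cache ready"; with an aligned
		 * address from userspace the 4-byte vapic area cannot cross
		 * a page boundary.
		 */
		if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
					      &vcpu->arch.apic->vapic_cache,
					      vapic_addr, sizeof(u32)))
			return -EINVAL;
		__set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
	} else {
		__clear_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
	}

	vcpu->arch.apic->vapic_addr = vapic_addr;
	return 0;
}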
On Fri, Apr 26, 2024 at 2:26 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> I don't really have enough knowledge to comment on the functionality,
> but this code structure is really odd to read.
>
> Try to handle error conditions with early returns first, please:
>
>	if (unlikely(nr_pages_needed != 1))
>		return -EINVAL;
>
>	return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);

Thanks for the tip! This makes sense to me and I will take care of this code
structure.
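Applied to the hunk Christoph quoted, the early-return form might look like the
following (a hypothetical reworked version of the patch for illustration, not a
posted v2):

int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
			      gpa_t gpa, unsigned long len)
{
	struct kvm_memslots *slots = kvm_memslots(kvm);
	gfn_t start_gfn = gpa >> PAGE_SHIFT;
	gfn_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;

	/* Reject ranges that span more than one guest page. */
	if (unlikely(end_gfn != start_gfn))
		return -EINVAL;

	return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
}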
On Fri, Apr 26, 2024 at 12:17 PM Sean Christopherson <seanjc@google.com> wrote:
> Please don't top post. https://people.kernel.org/tglx/notes-about-netiquette

Thanks for the tip!

> The exports from kvm.ko are intended for use only by KVM itself, e.g. by
> kvm-intel.ko and kvm-amd.ko on x86. Anyone trying to use KVM's exports in
> random drivers is in for a world of hurt, as there are many, many subtleties
> throughout KVM that bleed all over the exports.
>
> It's gross that KVM has "internal" exports, and we have line of sight to
> removing them for most architectures, including x86, but that isn't the
> easiest of changes.
>
> If there is a real problem with in-tree upstream KVM, then we'll fix it, but
> changing the behavior of KVM functions just because they are exported isn't
> going to happen.

Yes, I agree with this.

> > For example, kvm_lapic_set_vapic_addr() calls `kvm_gfn_to_hva_cache_init()`
> > and simply assumes the cache is successfully initialized by checking the
> > return value.
>
> I don't follow the concern here. It's userspace's responsibility to provide a
> page-aligned address. KVM needs to not explode on an unaligned address, but
> ensuring that KVM can actually use the fast path isn't KVM's problem.

Yes, you are right. For a cross-page address range, a return value of 0 does
not mean the cache was successfully initialized, but the subsequent reads and
writes to the corresponding guest memory still check whether ghc->memslot is
nullified before using the cached hva directly.

> > My thought is that there should probably be another function to provide
> > correct cross-page cache initialization, but I am not sure this is really
> > needed.
>
> If there were a legitimate use case where it was truly valuable, then we could
> add that functionality. But all of the usage in KVM either deals with assets
> that are exactly one page in size and must be page aligned, or with assets
> that are userspace or guest controlled and not worth optimizing for page
> splits, because a well-behaved userspace/guest should ensure the asset doesn't
> split a page.

Yes, I agree.

> > Nevertheless, I think we could at least make the existing function more
> > accurate?
>
> More accurate with respect to what?

The correctness of the whole gfn-to-hva cache init logic looks good to me now.
Thanks for the clarifications from all of you. Although, it is a bit
unintuitive to me that guest memory reads and writes have to go through the
dedicated accessor functions even when the cache is assumed to be initialized,
or that the ghc->memslot check has to be done manually before using the fast
path.
On Sat, Apr 27, 2024, Kele Huang wrote:
> Although, it is a bit unintuitive to me that guest memory reads and writes
> have to go through the dedicated accessor functions even when the cache is
> assumed to be initialized, or that the ghc->memslot check has to be done
> manually before using the fast path.

Oh, don't get me wrong, I completely agree that the code isn't straightforward.
But I don't think this is a problem that can be reasonably solved in the caching
code, at least not easily, as a lot of the oddities stem from KVM's memslots,
and KVM's contracts with userspace and the guest.
On Mon, Apr 29, 2024 at 4:13 PM Sean Christopherson <seanjc@google.com> wrote:
> Oh, don't get me wrong, I completely agree that the code isn't straightforward.
> But I don't think this is a problem that can be reasonably solved in the caching
> code, at least not easily, as a lot of the oddities stem from KVM's memslots,
> and KVM's contracts with userspace and the guest.

Thanks! I agree with you. Using a flag to indicate that cross-page cache
initialization failed, and checking that flag on subsequent guest memory reads
and writes, would also clutter the code. On the other hand, simply refusing to
set up a cross-page cache would confuse users. I think it is good to check the
validity of the assets before using them; I am not sure whether a brief comment
would help here.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9230ebe1753f..6efe579f6b5f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3476,11 +3476,22 @@ static int __kvm_gfn_to_hva_cache_init(struct kvm_memslots *slots,
 	return 0;
 }
 
+/*
+ * Please note that this function only supports gfn_to_hva_cache
+ * initialization within a single page.
+ */
 int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			      gpa_t gpa, unsigned long len)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 
-	return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
+	gfn_t start_gfn = gpa >> PAGE_SHIFT;
+	gfn_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;
+	gfn_t nr_pages_needed = end_gfn - start_gfn + 1;
+
+	if (likely(nr_pages_needed == 1))
+		return __kvm_gfn_to_hva_cache_init(slots, ghc, gpa, len);
+	else
+		return -EINVAL;
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_hva_cache_init);
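The page-crossing computation in the hunk above can be sanity-checked in
isolation. The following standalone userspace program is an illustrative
sketch, not part of the patch; it assumes PAGE_SHIFT is 12 (4 KiB pages, as on
x86) and shows which (gpa, len) ranges the new check would reject:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumed 4 KiB pages */

typedef uint64_t gpa_t;
typedef uint64_t gfn_t;

/* Mirrors the patch: a range is acceptable only if it stays in one page. */
static int within_one_page(gpa_t gpa, unsigned long len)
{
	gfn_t start_gfn = gpa >> PAGE_SHIFT;
	gfn_t end_gfn = (gpa + len - 1) >> PAGE_SHIFT;

	return (end_gfn - start_gfn + 1) == 1;
}

int main(void)
{
	/* 4-byte object at a page-aligned address: accepted. */
	printf("%d\n", within_one_page(0x1000, 4));		/* prints 1 */
	/* 4-byte object starting 2 bytes before a page boundary: rejected. */
	printf("%d\n", within_one_page(0x1ffe, 4));		/* prints 0 */
	/* Exactly one full page: accepted. */
	printf("%d\n", within_one_page(0x2000, 0x1000));	/* prints 1 */
	return 0;
}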
Function kvm_gfn_to_hva_cache_init() is exported and used to initialize
the gfn-to-hva cache in various places, such as in
kvm_pv_enable_async_pf(). However, this function directly tail calls
__kvm_gfn_to_hva_cache_init(), which assigns NULL to ghc->memslot and
returns 0 when asked to initialize a cache that crosses pages. This is
unsafe, as 0 typically means success, yet no valid ghc->memslot is
returned. Functions that call kvm_gfn_to_hva_cache_init(), such as
kvm_lapic_set_vapic_addr(), do not check ghc->memslot afterwards if
kvm_gfn_to_hva_cache_init() returns 0. Moreover, other developers may
try to initialize a cross-page cache by calling this function and fail
while still getting a success return value.

This patch fixes this issue by explicitly restricting
kvm_gfn_to_hva_cache_init() to only accept address ranges within one
page, and adds comments to the function accordingly.

Signed-off-by: Kele Huang <kele@cs.columbia.edu>
---
 virt/kvm/kvm_main.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)