Message ID: 1590712553-7298-1-git-send-email-igor.druzhinin@citrix.com
State: Superseded
Series: x86/svm: do not try to handle recalc NPT faults immediately
On Fri, May 29, 2020 at 01:35:53AM +0100, Igor Druzhinin wrote:
> A recalculation NPT fault doesn't always require additional handling
> in hvm_hap_nested_page_fault(), moreover in general case if there is no
> explicit handling done there - the fault is wrongly considered fatal.
>
> Instead of trying to be opportunistic - use safer approach and handle
> P2M recalculation in a separate NPT fault by attempting to retry after
> making the necessary adjustments. This is aligned with Intel behavior
> where there are separate VMEXITs for recalculation and EPT violations
> (faults) and only faults are handled in hvm_hap_nested_page_fault().
> Do it by also unifying do_recalc return code with Intel implementation
> where returning 1 means P2M was actually changed.

That seems like a good approach IMO.

Do you know whether this will make the code slower? (since there are
cases previously handled in a single vmexit that would take two
vmexits now)

> This covers a specific case of migration with vGPU assigned on AMD:
> global log-dirty is enabled and causes immediate recalculation NPT
> fault in MMIO area upon access.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> This is a safer alternative to:
> https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
> and more correct approach from my PoV.
> ---
>  xen/arch/x86/hvm/svm/svm.c | 5 +++--
>  xen/arch/x86/mm/p2m-pt.c   | 8 ++++++--
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 46a1aac..7f6f578 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>          v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>          rc = vmcb->exitinfo1 & PFEC_page_present
>              ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
> -        if ( rc >= 0 )
> +        if ( rc == 0 )
> +            /* If no recal adjustments were being made - handle this fault */
>              svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
> -        else
> +        else if ( rc < 0 )
>          {
>              printk(XENLOG_G_ERR
>                     "%pv: Error %d handling NPF (gpa=%08lx ec=%04lx)\n",
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index 5c05017..377565b 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -340,7 +340,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>      unsigned long gfn_remainder = gfn;
>      unsigned int level = 4;
>      l1_pgentry_t *pent;
> -    int err = 0;
> +    int err = 0, rc = 0;
>
>      table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
>      while ( --level )
> @@ -402,6 +402,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>                  clear_recalc(l1, e);
>                  err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>                  ASSERT(!err);
> +
> +                rc = 1;
>              }
>          }
>          unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
> @@ -448,12 +450,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>          clear_recalc(l1, e);
>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>          ASSERT(!err);
> +
> +        rc = 1;
>      }
>
>   out:
>      unmap_domain_page(table);
>
> -    return err;
> +    return err ? err : rc;

Nit: you can use the elvis operator here: return err ?: rc;

Also I couldn't spot any caller that would have troubles with the
function now returning 1 in certain conditions, can you confirm the
callers have been audited?

Thanks, Roger.
On 29.05.2020 02:35, Igor Druzhinin wrote:
> A recalculation NPT fault doesn't always require additional handling
> in hvm_hap_nested_page_fault(), moreover in general case if there is no
> explicit handling done there - the fault is wrongly considered fatal.
>
> Instead of trying to be opportunistic - use safer approach and handle
> P2M recalculation in a separate NPT fault by attempting to retry after
> making the necessary adjustments. This is aligned with Intel behavior
> where there are separate VMEXITs for recalculation and EPT violations
> (faults) and only faults are handled in hvm_hap_nested_page_fault().
> Do it by also unifying do_recalc return code with Intel implementation
> where returning 1 means P2M was actually changed.
>
> This covers a specific case of migration with vGPU assigned on AMD:
> global log-dirty is enabled and causes immediate recalculation NPT
> fault in MMIO area upon access.

To be honest, from this last paragraph I still can't really derive
what goes wrong exactly why, before this change.

> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
> This is a safer alternative to:
> https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
> and more correct approach from my PoV.

Indeed - I was about to reply there, but then I thought I'd first
look at this patch, in case it was a replacement.

> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>          v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>          rc = vmcb->exitinfo1 & PFEC_page_present
>              ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
> -        if ( rc >= 0 )
> +        if ( rc == 0 )
> +            /* If no recal adjustments were being made - handle this fault */
>              svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
> -        else
> +        else if ( rc < 0 )

So from going through the code and judging by the comment in
finish_type_change() (which btw you will need to update, to avoid
it becoming stale) the >= here was there just in case, without
there actually being any case where a positive value would be
returned. If that's also the conclusion you've drawn, then I
think it would help mentioning this in the description.

It is also desirable to mention finish_type_change() not being
affected, as already dealing with the > 0 case.

> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -340,7 +340,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>      unsigned long gfn_remainder = gfn;
>      unsigned int level = 4;
>      l1_pgentry_t *pent;
> -    int err = 0;
> +    int err = 0, rc = 0;
>
>      table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
>      while ( --level )
> @@ -402,6 +402,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>                  clear_recalc(l1, e);
>                  err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>                  ASSERT(!err);
> +
> +                rc = 1;
>              }
>          }
>          unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
> @@ -448,12 +450,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>          clear_recalc(l1, e);
>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>          ASSERT(!err);
> +
> +        rc = 1;
>      }
>
>   out:
>      unmap_domain_page(table);
>
> -    return err;
> +    return err ? err : rc;

Typically we write this as "err ?: rc". I'd like to ask that "rc" also
be renamed, to something like "recalc_done", and then to become bool.

Jan
On 29/05/2020 15:33, Roger Pau Monné wrote:
> On Fri, May 29, 2020 at 01:35:53AM +0100, Igor Druzhinin wrote:
>> A recalculation NPT fault doesn't always require additional handling
>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
>> explicit handling done there - the fault is wrongly considered fatal.
>>
>> Instead of trying to be opportunistic - use safer approach and handle
>> P2M recalculation in a separate NPT fault by attempting to retry after
>> making the necessary adjustments. This is aligned with Intel behavior
>> where there are separate VMEXITs for recalculation and EPT violations
>> (faults) and only faults are handled in hvm_hap_nested_page_fault().
>> Do it by also unifying do_recalc return code with Intel implementation
>> where returning 1 means P2M was actually changed.
>
> That seems like a good approach IMO.
>
> Do you know whether this will make the code slower? (since there are
> cases previously handled in a single vmexit that would take two
> vmexits now)

The only case I could think of is memory writes during migration - the
first fault would propagate P2M type recalculation while the second
actually logs a dirty page. The slowdown would be only during the live
phase obviously but should be marginal and in line with what we
currently have on Intel.

>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
>> ---
>> This is a safer alternative to:
>> https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
>> and more correct approach from my PoV.
>> ---
>>  xen/arch/x86/hvm/svm/svm.c | 5 +++--
>>  xen/arch/x86/mm/p2m-pt.c   | 8 ++++++--
>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
>> index 46a1aac..7f6f578 100644
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>          v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>>          rc = vmcb->exitinfo1 & PFEC_page_present
>>              ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
>> -        if ( rc >= 0 )
>> +        if ( rc == 0 )
>> +            /* If no recal adjustments were being made - handle this fault */
>>              svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
>> -        else
>> +        else if ( rc < 0 )
>>          {
>>              printk(XENLOG_G_ERR
>>                     "%pv: Error %d handling NPF (gpa=%08lx ec=%04lx)\n",
>> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
>> index 5c05017..377565b 100644
>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -340,7 +340,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>      unsigned long gfn_remainder = gfn;
>>      unsigned int level = 4;
>>      l1_pgentry_t *pent;
>> -    int err = 0;
>> +    int err = 0, rc = 0;
>>
>>      table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
>>      while ( --level )
>> @@ -402,6 +402,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>                  clear_recalc(l1, e);
>>                  err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>                  ASSERT(!err);
>> +
>> +                rc = 1;
>>              }
>>          }
>>          unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
>> @@ -448,12 +450,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>          clear_recalc(l1, e);
>>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>          ASSERT(!err);
>> +
>> +        rc = 1;
>>      }
>>
>>   out:
>>      unmap_domain_page(table);
>>
>> -    return err;
>> +    return err ? err : rc;
>
> Nit: you can use the elvis operator here: return err ?: rc;
>
> Also I couldn't spot any caller that would have troubles with the
> function now returning 1 in certain conditions, can you confirm the
> callers have been audited?

Yes, I checked all the callers before making the change. That's actually
where I spotted the Intel side is doing exactly the same already.

Igor
On 29/05/2020 15:34, Jan Beulich wrote:
> On 29.05.2020 02:35, Igor Druzhinin wrote:
>> A recalculation NPT fault doesn't always require additional handling
>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
>> explicit handling done there - the fault is wrongly considered fatal.
>>
>> Instead of trying to be opportunistic - use safer approach and handle
>> P2M recalculation in a separate NPT fault by attempting to retry after
>> making the necessary adjustments. This is aligned with Intel behavior
>> where there are separate VMEXITs for recalculation and EPT violations
>> (faults) and only faults are handled in hvm_hap_nested_page_fault().
>> Do it by also unifying do_recalc return code with Intel implementation
>> where returning 1 means P2M was actually changed.
>>
>> This covers a specific case of migration with vGPU assigned on AMD:
>> global log-dirty is enabled and causes immediate recalculation NPT
>> fault in MMIO area upon access.
>
> To be honest, from this last paragraph I still can't really derive
> what goes wrong exactly why, before this change.

I admit it might require some knowledge of how vGPU is implemented. I will try
to give more info in this paragraph.

>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>> This is a safer alternative to:
>> https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
>> and more correct approach from my PoV.
>
> Indeed - I was about to reply there, but then I thought I'd first
> look at this patch, in case it was a replacement.
>
>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>          v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>>          rc = vmcb->exitinfo1 & PFEC_page_present
>>              ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
>> -        if ( rc >= 0 )
>> +        if ( rc == 0 )
>> +            /* If no recal adjustments were being made - handle this fault */
>>              svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
>> -        else
>> +        else if ( rc < 0 )
>
> So from going through the code and judging by the comment in
> finish_type_change() (which btw you will need to update, to avoid
> it becoming stale) the >= here was there just in case, without
> there actually being any case where a positive value would be
> returned. If that's also the conclusion you've drawn, then I
> think it would help mentioning this in the description.

I re-read the comments in finish_type_change() and to me they look
pretty much in line with the now common interface between EPT and NPT
recalc calls. Ok, I will point out that I concluded there was no
additional intent of necessarily calling svm_do_nested_pgfault() after
recalc.

> It is also desirable to mention finish_type_change() not being
> affected, as already dealing with the > 0 case.

Will mention that as well.

>> --- a/xen/arch/x86/mm/p2m-pt.c
>> +++ b/xen/arch/x86/mm/p2m-pt.c
>> @@ -340,7 +340,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>      unsigned long gfn_remainder = gfn;
>>      unsigned int level = 4;
>>      l1_pgentry_t *pent;
>> -    int err = 0;
>> +    int err = 0, rc = 0;
>>
>>      table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
>>      while ( --level )
>> @@ -402,6 +402,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>                  clear_recalc(l1, e);
>>                  err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>                  ASSERT(!err);
>> +
>> +                rc = 1;
>>              }
>>          }
>>          unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
>> @@ -448,12 +450,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>          clear_recalc(l1, e);
>>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>          ASSERT(!err);
>> +
>> +        rc = 1;
>>      }
>>
>>   out:
>>      unmap_domain_page(table);
>>
>> -    return err;
>> +    return err ? err : rc;
>
> Typically we write this as "err ?: rc". I'd like to ask that "rc" also
> be renamed, to something like "recalc_done", and then to become bool.

Sure.

Igor
On 29/05/2020 16:17, Igor Druzhinin wrote:
> On 29/05/2020 15:34, Jan Beulich wrote:
>> On 29.05.2020 02:35, Igor Druzhinin wrote:
>>> A recalculation NPT fault doesn't always require additional handling
>>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
>>> explicit handling done there - the fault is wrongly considered fatal.
>>>
>>> Instead of trying to be opportunistic - use safer approach and handle
>>> P2M recalculation in a separate NPT fault by attempting to retry after
>>> making the necessary adjustments. This is aligned with Intel behavior
>>> where there are separate VMEXITs for recalculation and EPT violations
>>> (faults) and only faults are handled in hvm_hap_nested_page_fault().
>>> Do it by also unifying do_recalc return code with Intel implementation
>>> where returning 1 means P2M was actually changed.
>>>
>>> This covers a specific case of migration with vGPU assigned on AMD:
>>> global log-dirty is enabled and causes immediate recalculation NPT
>>> fault in MMIO area upon access.
>>
>> To be honest, from this last paragraph I still can't really derive
>> what goes wrong exactly why, before this change.
>
> I admit it might require some knowledge of how vGPU is implemented. I will try
> to give more info in this paragraph.
>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>> ---
>>> This is a safer alternative to:
>>> https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
>>> and more correct approach from my PoV.
>>
>> Indeed - I was about to reply there, but then I thought I'd first
>> look at this patch, in case it was a replacement.
>>
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>>>          v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>>>          rc = vmcb->exitinfo1 & PFEC_page_present
>>>              ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
>>> -        if ( rc >= 0 )
>>> +        if ( rc == 0 )
>>> +            /* If no recal adjustments were being made - handle this fault */
>>>              svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
>>> -        else
>>> +        else if ( rc < 0 )
>>
>> So from going through the code and judging by the comment in
>> finish_type_change() (which btw you will need to update, to avoid
>> it becoming stale) the >= here was there just in case, without
>> there actually being any case where a positive value would be
>> returned. If that's also the conclusion you've drawn, then I
>> think it would help mentioning this in the description.
>
> I re-read the comments in finish_type_change() and to me they look
> pretty much in line with the now common interface between EPT and NPT
> recalc calls.

Sorry, upon close examination there is indeed a new case missed.
Thanks for pointing out.

Igor
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 46a1aac..7f6f578 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
         rc = vmcb->exitinfo1 & PFEC_page_present
             ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
-        if ( rc >= 0 )
+        if ( rc == 0 )
+            /* If no recal adjustments were being made - handle this fault */
             svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
-        else
+        else if ( rc < 0 )
         {
             printk(XENLOG_G_ERR
                    "%pv: Error %d handling NPF (gpa=%08lx ec=%04lx)\n",
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 5c05017..377565b 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -340,7 +340,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
     unsigned long gfn_remainder = gfn;
     unsigned int level = 4;
     l1_pgentry_t *pent;
-    int err = 0;
+    int err = 0, rc = 0;
 
     table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
     while ( --level )
@@ -402,6 +402,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                 clear_recalc(l1, e);
                 err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
                 ASSERT(!err);
+
+                rc = 1;
             }
         }
         unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
@@ -448,12 +450,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
         clear_recalc(l1, e);
         err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
         ASSERT(!err);
+
+        rc = 1;
     }
 
  out:
     unmap_domain_page(table);
 
-    return err;
+    return err ? err : rc;
 }
 
 int p2m_pt_handle_deferred_changes(uint64_t gpa)
A recalculation NPT fault doesn't always require additional handling
in hvm_hap_nested_page_fault(), moreover in general case if there is no
explicit handling done there - the fault is wrongly considered fatal.

Instead of trying to be opportunistic - use safer approach and handle
P2M recalculation in a separate NPT fault by attempting to retry after
making the necessary adjustments. This is aligned with Intel behavior
where there are separate VMEXITs for recalculation and EPT violations
(faults) and only faults are handled in hvm_hap_nested_page_fault().
Do it by also unifying do_recalc return code with Intel implementation
where returning 1 means P2M was actually changed.

This covers a specific case of migration with vGPU assigned on AMD:
global log-dirty is enabled and causes immediate recalculation NPT
fault in MMIO area upon access.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
This is a safer alternative to:
https://lists.xenproject.org/archives/html/xen-devel/2020-05/msg01662.html
and more correct approach from my PoV.
---
 xen/arch/x86/hvm/svm/svm.c | 5 +++--
 xen/arch/x86/mm/p2m-pt.c   | 8 ++++++--
 2 files changed, 9 insertions(+), 4 deletions(-)