Message ID | 1591224108-564-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Series | [for-4.14,v3] x86/svm: do not try to handle recalc NPT faults immediately |
> -----Original Message-----
> From: Igor Druzhinin <igor.druzhinin@citrix.com>
> Sent: 03 June 2020 23:42
> To: xen-devel@lists.xenproject.org
> Cc: jbeulich@suse.com; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
> george.dunlap@citrix.com; paul@xen.org; Igor Druzhinin <igor.druzhinin@citrix.com>
> Subject: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
>
> A recalculation NPT fault doesn't always require additional handling
> in hvm_hap_nested_page_fault(), moreover in general case if there is no
> explicit handling done there - the fault is wrongly considered fatal.
>
> This covers a specific case of migration with vGPU assigned which
> uses direct MMIO mappings made by XEN_DOMCTL_memory_mapping hypercall:
> at a moment log-dirty is enabled globally, recalculation is requested
> for the whole guest memory including those mapped MMIO regions

I still think it is odd to put this in the commit comment since, as I said before, Xen ensures that this situation cannot happen at the moment.

> which causes a page fault being raised at the first access to them;
> but due to MMIO P2M type not having any explicit handling in
> hvm_hap_nested_page_fault() a domain is erroneously crashed with unhandled
> SVM violation.
>
> Instead of trying to be opportunistic - use safer approach and handle
> P2M recalculation in a separate NPT fault by attempting to retry after
> making the necessary adjustments. This is aligned with Intel behavior
> where there are separate VMEXITs for recalculation and EPT violations
> (faults) and only faults are handled in hvm_hap_nested_page_fault().
> Do it by also unifying do_recalc return code with Intel implementation
> where returning 1 means P2M was actually changed.
>
> Since there was no case previously where p2m_pt_handle_deferred_changes()
> could return a positive value - it's safe to replace ">= 0" with just "== 0"
> in VMEXIT_NPF handler. finish_type_change() is also not affected by the
> change as being able to deal with >0 return value of p2m->recalc from
> EPT implementation.
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

However, it's a worthy fix so...

Release-acked-by: Paul Durrant <paul@xen.org>

> ---
> Changes in v2:
> - replace rc with recalc_done bool
> - updated comment in finish_type_change()
> - significantly extended commit description
> Changes in v3:
> - covert bool to int implicitly
> - a little bit more info of the usecase in the message
> ---
>  xen/arch/x86/hvm/svm/svm.c | 5 +++--
>  xen/arch/x86/mm/p2m-pt.c   | 7 ++++++-
>  xen/arch/x86/mm/p2m.c      | 2 +-
>  3 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 46a1aac..7f6f578 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>              v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
>          rc = vmcb->exitinfo1 & PFEC_page_present
>               ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
> -        if ( rc >= 0 )
> +        if ( rc == 0 )
> +            /* If no recal adjustments were being made - handle this fault */
>              svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
> -        else
> +        else if ( rc < 0 )
>          {
>              printk(XENLOG_G_ERR
>                     "%pv: Error %d handling NPF (gpa=%08lx ec=%04lx)\n",
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index 5c05017..070389e 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -341,6 +341,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>      unsigned int level = 4;
>      l1_pgentry_t *pent;
>      int err = 0;
> +    bool recalc_done = false;
>
>      table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
>      while ( --level )
> @@ -402,6 +403,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>                      clear_recalc(l1, e);
>                      err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>                      ASSERT(!err);
> +
> +                    recalc_done = true;
>                  }
>              }
>              unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
> @@ -448,12 +451,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>          clear_recalc(l1, e);
>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>          ASSERT(!err);
> +
> +        recalc_done = true;
>      }
>
>   out:
>      unmap_domain_page(table);
>
> -    return err;
> +    return err ?: recalc_done;
>  }
>
>  int p2m_pt_handle_deferred_changes(uint64_t gpa)
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 17f320b..db7bde0 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1197,7 +1197,7 @@ static int finish_type_change(struct p2m_domain *p2m,
>          rc = p2m->recalc(p2m, gfn);
>          /*
>           * ept->recalc could return 0/1/-ENOMEM. pt->recalc could return
> -         * 0/-ENOMEM/-ENOENT, -ENOENT isn't an error as we are looping
> +         * 0/1/-ENOMEM/-ENOENT, -ENOENT isn't an error as we are looping
>           * gfn here. If rc is 1 we need to have it 0 for success.
>           */
>          if ( rc == -ENOENT || rc > 0 )
> --
> 2.7.4

On 04.06.2020 09:49, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
>> Sent: 03 June 2020 23:42
>> To: xen-devel@lists.xenproject.org
>> Cc: jbeulich@suse.com; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
>> george.dunlap@citrix.com; paul@xen.org; Igor Druzhinin <igor.druzhinin@citrix.com>
>> Subject: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
>>
>> A recalculation NPT fault doesn't always require additional handling
>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
>> explicit handling done there - the fault is wrongly considered fatal.
>>
>> This covers a specific case of migration with vGPU assigned which
>> uses direct MMIO mappings made by XEN_DOMCTL_memory_mapping hypercall:
>> at a moment log-dirty is enabled globally, recalculation is requested
>> for the whole guest memory including those mapped MMIO regions
>
> I still think it is odd to put this in the commit comment since, as I
> said before, Xen ensures that this situation cannot happen at
> the moment.

Aiui Igor had replaced reference to passed-through devices by reference
to mere handing of an MMIO range to a guest. Are you saying we suppress
log-dirty enabling in this case as well? I didn't think we do:

    if ( has_arch_pdevs(d) && log_global )
    {
        /*
         * Refuse to turn on global log-dirty mode
         * if the domain is sharing the P2M with the IOMMU.
         */
        return -EINVAL;
    }

Seeing this code I wonder about the non-sharing case: If what the
comment says was true, the condition would need to change, but I
think it's the comment which is wrong, and we don't want global
log-dirty as long as an IOMMU is in use at all for a domain.

Jan

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 04 June 2020 11:34
> To: paul@xen.org
> Cc: 'Igor Druzhinin' <igor.druzhinin@citrix.com>; xen-devel@lists.xenproject.org;
> andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com; george.dunlap@citrix.com
> Subject: Re: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
>
> On 04.06.2020 09:49, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Igor Druzhinin <igor.druzhinin@citrix.com>
> >> Sent: 03 June 2020 23:42
> >> To: xen-devel@lists.xenproject.org
> >> Cc: jbeulich@suse.com; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
> >> george.dunlap@citrix.com; paul@xen.org; Igor Druzhinin <igor.druzhinin@citrix.com>
> >> Subject: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
> >>
> >> A recalculation NPT fault doesn't always require additional handling
> >> in hvm_hap_nested_page_fault(), moreover in general case if there is no
> >> explicit handling done there - the fault is wrongly considered fatal.
> >>
> >> This covers a specific case of migration with vGPU assigned which
> >> uses direct MMIO mappings made by XEN_DOMCTL_memory_mapping hypercall:
> >> at a moment log-dirty is enabled globally, recalculation is requested
> >> for the whole guest memory including those mapped MMIO regions
> >
> > I still think it is odd to put this in the commit comment since, as I
> > said before, Xen ensures that this situation cannot happen at
> > the moment.
>
> Aiui Igor had replaced reference to passed-through devices by reference
> to mere handing of an MMIO range to a guest. Are you saying we suppress
> log-dirty enabling in this case as well? I didn't think we do:

No, but the comment says "migration with vGPU *assigned*" (my emphasis), which surely means has_arch_pdevs() will be true.

>
>     if ( has_arch_pdevs(d) && log_global )
>     {
>         /*
>          * Refuse to turn on global log-dirty mode
>          * if the domain is sharing the P2M with the IOMMU.
>          */
>         return -EINVAL;
>     }
>
> Seeing this code I wonder about the non-sharing case: If what the
> comment says was true, the condition would need to change, but I
> think it's the comment which is wrong, and we don't want global
> log-dirty as long as an IOMMU is in use at all for a domain.

I think it is the comment that is correct, not the condition. It is only when using shared EPT that enabling logdirty is clearly an unsafe thing to do. Using sync-ed IOMMU mappings should be ok.

  Paul

On 04/06/2020 11:50, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 04 June 2020 11:34
>> To: paul@xen.org
>> Cc: 'Igor Druzhinin' <igor.druzhinin@citrix.com>; xen-devel@lists.xenproject.org;
>> andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com; george.dunlap@citrix.com
>> Subject: Re: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
>>
>> On 04.06.2020 09:49, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> Sent: 03 June 2020 23:42
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: jbeulich@suse.com; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
>>>> george.dunlap@citrix.com; paul@xen.org; Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> Subject: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
>>>>
>>>> A recalculation NPT fault doesn't always require additional handling
>>>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
>>>> explicit handling done there - the fault is wrongly considered fatal.
>>>>
>>>> This covers a specific case of migration with vGPU assigned which
>>>> uses direct MMIO mappings made by XEN_DOMCTL_memory_mapping hypercall:
>>>> at a moment log-dirty is enabled globally, recalculation is requested
>>>> for the whole guest memory including those mapped MMIO regions
>>>
>>> I still think it is odd to put this in the commit comment since, as I
>>> said before, Xen ensures that this situation cannot happen at
>>> the moment.
>>
>> Aiui Igor had replaced reference to passed-through devices by reference
>> to mere handing of an MMIO range to a guest. Are you saying we suppress
>> log-dirty enabling in this case as well? I didn't think we do:
>
> No, but the comment says "migration with vGPU *assigned*" (my emphasis), which surely means has_arch_pdevs() will be true.

You may replace it with 'associated' or something if you don't like this word.

>>
>>     if ( has_arch_pdevs(d) && log_global )
>>     {
>>         /*
>>          * Refuse to turn on global log-dirty mode
>>          * if the domain is sharing the P2M with the IOMMU.
>>          */
>>         return -EINVAL;
>>     }
>>
>> Seeing this code I wonder about the non-sharing case: If what the
>> comment says was true, the condition would need to change, but I
>> think it's the comment which is wrong, and we don't want global
>> log-dirty as long as an IOMMU is in use at all for a domain.
>
> I think it is the comment that is correct, not the condition. It is only when using shared EPT that enabling logdirty is clearly an unsafe thing to do. Using sync-ed IOMMU mappings should be ok.

It seems that the case of simple MMIO mappings made without an IOMMU being enabled for a domain is, in fact, irrelevant to this condition. I take it as a separate discussion on a different topic.

Igor

On 04.06.2020 12:50, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 04 June 2020 11:34
>>
>> On 04.06.2020 09:49, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> Sent: 03 June 2020 23:42
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: jbeulich@suse.com; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
>>>> george.dunlap@citrix.com; paul@xen.org; Igor Druzhinin <igor.druzhinin@citrix.com>
>>>> Subject: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
>>>>
>>>> A recalculation NPT fault doesn't always require additional handling
>>>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
>>>> explicit handling done there - the fault is wrongly considered fatal.
>>>>
>>>> This covers a specific case of migration with vGPU assigned which
>>>> uses direct MMIO mappings made by XEN_DOMCTL_memory_mapping hypercall:
>>>> at a moment log-dirty is enabled globally, recalculation is requested
>>>> for the whole guest memory including those mapped MMIO regions
>>>
>>> I still think it is odd to put this in the commit comment since, as I
>>> said before, Xen ensures that this situation cannot happen at
>>> the moment.
>>
>> Aiui Igor had replaced reference to passed-through devices by reference
>> to mere handing of an MMIO range to a guest. Are you saying we suppress
>> log-dirty enabling in this case as well? I didn't think we do:
>
> No, but the comment says "migration with vGPU *assigned*" (my emphasis), which surely means has_arch_pdevs() will be true.
>
>>
>>     if ( has_arch_pdevs(d) && log_global )
>>     {
>>         /*
>>          * Refuse to turn on global log-dirty mode
>>          * if the domain is sharing the P2M with the IOMMU.
>>          */
>>         return -EINVAL;
>>     }
>>
>> Seeing this code I wonder about the non-sharing case: If what the
>> comment says was true, the condition would need to change, but I
>> think it's the comment which is wrong, and we don't want global
>> log-dirty as long as an IOMMU is in use at all for a domain.
>
> I think it is the comment that is correct, not the condition. It is
> only when using shared EPT that enabling logdirty is clearly an
> unsafe thing to do. Using sync-ed IOMMU mappings should be ok.

Even with sync-ed IOMMU mappings dirtying happening by I/O won't be
noticed, and hence the purpose of global log-dirty is undermined.

Jan

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 04 June 2020 12:47
> To: paul@xen.org
> Cc: 'Igor Druzhinin' <igor.druzhinin@citrix.com>; xen-devel@lists.xenproject.org;
> andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com; george.dunlap@citrix.com
> Subject: Re: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
>
> On 04.06.2020 12:50, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 04 June 2020 11:34
> >>
> >> On 04.06.2020 09:49, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Igor Druzhinin <igor.druzhinin@citrix.com>
> >>>> Sent: 03 June 2020 23:42
> >>>> To: xen-devel@lists.xenproject.org
> >>>> Cc: jbeulich@suse.com; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
> >>>> george.dunlap@citrix.com; paul@xen.org; Igor Druzhinin <igor.druzhinin@citrix.com>
> >>>> Subject: [PATCH for-4.14 v3] x86/svm: do not try to handle recalc NPT faults immediately
> >>>>
> >>>> A recalculation NPT fault doesn't always require additional handling
> >>>> in hvm_hap_nested_page_fault(), moreover in general case if there is no
> >>>> explicit handling done there - the fault is wrongly considered fatal.
> >>>>
> >>>> This covers a specific case of migration with vGPU assigned which
> >>>> uses direct MMIO mappings made by XEN_DOMCTL_memory_mapping hypercall:
> >>>> at a moment log-dirty is enabled globally, recalculation is requested
> >>>> for the whole guest memory including those mapped MMIO regions
> >>>
> >>> I still think it is odd to put this in the commit comment since, as I
> >>> said before, Xen ensures that this situation cannot happen at
> >>> the moment.
> >>
> >> Aiui Igor had replaced reference to passed-through devices by reference
> >> to mere handing of an MMIO range to a guest. Are you saying we suppress
> >> log-dirty enabling in this case as well? I didn't think we do:
> >
> > No, but the comment says "migration with vGPU *assigned*" (my emphasis), which surely means has_arch_pdevs() will be true.
> >
> >>
> >>     if ( has_arch_pdevs(d) && log_global )
> >>     {
> >>         /*
> >>          * Refuse to turn on global log-dirty mode
> >>          * if the domain is sharing the P2M with the IOMMU.
> >>          */
> >>         return -EINVAL;
> >>     }
> >>
> >> Seeing this code I wonder about the non-sharing case: If what the
> >> comment says was true, the condition would need to change, but I
> >> think it's the comment which is wrong, and we don't want global
> >> log-dirty as long as an IOMMU is in use at all for a domain.
> >
> > I think it is the comment that is correct, not the condition. It is
> > only when using shared EPT that enabling logdirty is clearly an
> > unsafe thing to do. Using sync-ed IOMMU mappings should be ok.
>
> Even with sync-ed IOMMU mappings dirtying happening by I/O won't be
> noticed, and hence the purpose of global log-dirty is undermined.

It is, but there are point solutions in some devices and, if not in the device, in the emulator managing the device. This is why migration with assigned h/w is currently feasible even without IOMMU faulting.

  Paul

>
> Jan

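The disagreement in the thread about the log-dirty gate can be made concrete with a small stand-alone model. This is only a sketch under stated assumptions: `struct domain_model`, `gate_as_coded()` and `gate_as_commented()` are hypothetical names invented for illustration and are not Xen code. The `has_pdevs` field stands in for `has_arch_pdevs(d)`, and `iommu_shares_p2m` is an assumed flag for "the domain is sharing the P2M with the IOMMU". The sketch takes no side on which reading is right; it only shows where the two readings differ.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a domain; not Xen's struct domain. */
struct domain_model {
    bool has_pdevs;        /* stands in for has_arch_pdevs(d) */
    bool iommu_shares_p2m; /* assumption: P2M is shared with the IOMMU */
};

/* The gate as the quoted condition is written: any assigned device refuses it. */
static int gate_as_coded(const struct domain_model *d, bool log_global)
{
    if ( d->has_pdevs && log_global )
        return -EINVAL;
    return 0;
}

/* The gate as the quoted comment describes it: only P2M sharing refuses it. */
static int gate_as_commented(const struct domain_model *d, bool log_global)
{
    if ( d->iommu_shares_p2m && log_global )
        return -EINVAL;
    return 0;
}
```

In this model the two gates disagree only for a domain that has an assigned device but does not share its P2M with the IOMMU. A domain that merely has an MMIO range mapped, with no assigned device, passes both gates, which is the case the patch description is concerned with.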
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 46a1aac..7f6f578 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2923,9 +2923,10 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
             v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
         rc = vmcb->exitinfo1 & PFEC_page_present
              ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
-        if ( rc >= 0 )
+        if ( rc == 0 )
+            /* If no recal adjustments were being made - handle this fault */
             svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
-        else
+        else if ( rc < 0 )
         {
             printk(XENLOG_G_ERR
                    "%pv: Error %d handling NPF (gpa=%08lx ec=%04lx)\n",
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 5c05017..070389e 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -341,6 +341,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
     unsigned int level = 4;
     l1_pgentry_t *pent;
     int err = 0;
+    bool recalc_done = false;
 
     table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
     while ( --level )
@@ -402,6 +403,8 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                     clear_recalc(l1, e);
                     err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
                     ASSERT(!err);
+
+                    recalc_done = true;
                 }
             }
             unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
@@ -448,12 +451,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
         clear_recalc(l1, e);
         err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
         ASSERT(!err);
+
+        recalc_done = true;
     }
 
  out:
     unmap_domain_page(table);
 
-    return err;
+    return err ?: recalc_done;
 }
 
 int p2m_pt_handle_deferred_changes(uint64_t gpa)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 17f320b..db7bde0 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1197,7 +1197,7 @@ static int finish_type_change(struct p2m_domain *p2m,
         rc = p2m->recalc(p2m, gfn);
         /*
          * ept->recalc could return 0/1/-ENOMEM. pt->recalc could return
-         * 0/-ENOMEM/-ENOENT, -ENOENT isn't an error as we are looping
+         * 0/1/-ENOMEM/-ENOENT, -ENOENT isn't an error as we are looping
          * gfn here. If rc is 1 we need to have it 0 for success.
          */
         if ( rc == -ENOENT || rc > 0 )
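
The return-code convention this patch introduces can be modelled in a few lines of stand-alone C. What follows is a minimal sketch, not the Xen implementation: `recalc_result()`, `classify_npf()`, `normalise_recalc_rc()` and the `npf_action` enum are hypothetical names invented here. Three assumptions are worth stating. First, the standard `err ? err : recalc_done` is used in place of the GNU `?:` shorthand from the patch so that the sketch builds with any C99 compiler. Second, the hunk does not show what follows the error `printk`, so that branch is labelled only as "report error". Third, the body of `if ( rc == -ENOENT || rc > 0 )` is not visible in the hunk; the sketch assumes it resets `rc` to 0, as the adjacent comment suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the three outcomes of the NPF exit path. */
enum npf_action {
    NPF_HANDLE_FAULT, /* rc == 0: nothing was recalculated, treat as a real fault */
    NPF_RETRY,        /* rc > 0: a P2M entry was adjusted, let the guest retry */
    NPF_REPORT_ERROR  /* rc < 0: recalculation itself failed */
};

/*
 * Model of the value do_recalc() returns after the patch: a negative
 * errno takes precedence; otherwise 1 if an entry was changed, else 0.
 * The bool converts to int implicitly.
 */
static int recalc_result(int err, bool recalc_done)
{
    assert(err <= 0); /* errors are negative errno values in this model */
    return err ? err : recalc_done;
}

/* Model of the three-way dispatch in the VMEXIT_NPF handler. */
static enum npf_action classify_npf(int rc)
{
    if ( rc == 0 )
        return NPF_HANDLE_FAULT;
    if ( rc < 0 )
        return NPF_REPORT_ERROR;
    return NPF_RETRY;
}

/* Model of the normalisation in finish_type_change() (body assumed). */
static int normalise_recalc_rc(int rc)
{
    if ( rc == -ENOENT || rc > 0 )
        return 0;
    return rc;
}
```

Under this model a recalculation that changed an entry yields 1, which the NPF path treats as "retry" rather than passing the fault on, while the type-change loop folds both 1 and -ENOENT into success.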