
[v2] x86/svm: do not try to handle recalc NPT faults immediately

Message ID 1591116981-30162-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State Superseded

Commit Message

Igor Druzhinin June 2, 2020, 4:56 p.m. UTC
A recalculation NPT fault doesn't always require additional handling
in hvm_hap_nested_page_fault(); moreover, in the general case, if no
explicit handling is done there, the fault is wrongly considered fatal.

This covers a specific case of migration with a vGPU assigned on AMD:
the moment log-dirty is enabled globally, recalculation is requested
for the whole of guest memory, including the directly mapped MMIO regions
of the vGPU, which causes a page fault to be raised on the first access
to those; but because the MMIO P2M type has no explicit handling in
hvm_hap_nested_page_fault(), the domain is erroneously crashed with an
unhandled SVM violation.

Instead of trying to be opportunistic, use the safer approach and handle
P2M recalculation in a separate NPT fault, retrying the access after
making the necessary adjustments. This is aligned with Intel behavior,
where there are separate VMEXITs for recalculation and EPT violations
(faults) and only faults are handled in hvm_hap_nested_page_fault().
Do it by also unifying the do_recalc return code with the Intel
implementation, where returning 1 means the P2M was actually changed.

Since there was previously no case where p2m_pt_handle_deferred_changes()
could return a positive value, it's safe to replace ">= 0" with "== 0"
in the VMEXIT_NPF handler. finish_type_change() is also unaffected by the
change, as it is already able to deal with a positive return value of
p2m->recalc from the EPT implementation.

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
Changes in v2:
- replace rc with recalc_done bool
- updated comment in finish_type_change()
- significantly extended commit description
---
 xen/arch/x86/hvm/svm/svm.c | 5 +++--
 xen/arch/x86/mm/p2m-pt.c   | 7 ++++++-
 xen/arch/x86/mm/p2m.c      | 2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)

Comments

Jan Beulich June 3, 2020, 10:03 a.m. UTC | #1
On 02.06.2020 18:56, Igor Druzhinin wrote:
> [...]

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit preferably with ...

> @@ -448,12 +451,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>              clear_recalc(l1, e);
>          err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>          ASSERT(!err);
> +
> +        recalc_done = true;
>      }
>  
>   out:
>      unmap_domain_page(table);
>  
> -    return err;
> +    return err ?: (recalc_done ? 1 : 0);

... this shrunk to

    return err ?: recalc_done;

(easily doable while committing).

Also Cc Paul.

Jan
Paul Durrant June 3, 2020, 10:26 a.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 03 June 2020 11:03
> To: Igor Druzhinin <igor.druzhinin@citrix.com>
> Cc: xen-devel@lists.xenproject.org; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
> george.dunlap@citrix.com; Paul Durrant <paul@xen.org>
> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults immediately
> 
> On 02.06.2020 18:56, Igor Druzhinin wrote:
> > [...]
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> albeit preferably with ...
> 
> [...]
> 
> Also Cc Paul.
> 

paging_log_dirty_enable() still fails global enable if has_arch_pdevs() is true, so presumably there's no desperate need for this to go in 4.14?

  Paul

> Jan
Jan Beulich June 3, 2020, 11:22 a.m. UTC | #3
On 03.06.2020 12:26, Paul Durrant wrote:
>> [...]
> 
> paging_log_dirty_enable() still fails global enable if has_arch_pdevs()
> is true, so presumably there's no desperate need for this to go in 4.14?

The MMIO case is just the particular situation here. Aiui the same issue
could potentially surface with other p2m types. Also given I'd consider
this a backporting candidate, while it may not be desperately needed for
the release, I think it deserves considering beyond the specific aspect
you mention.

Jan
Paul Durrant June 3, 2020, 11:28 a.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 03 June 2020 12:22
> To: paul@xen.org
> Cc: 'Igor Druzhinin' <igor.druzhinin@citrix.com>; xen-devel@lists.xenproject.org;
> andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com; george.dunlap@citrix.com
> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults immediately
> 
> On 03.06.2020 12:26, Paul Durrant wrote:
> >> [...]
> >
> > paging_log_dirty_enable() still fails global enable if has_arch_pdevs()
> > is true, so presumably there's no desperate need for this to go in 4.14?
> 
> The MMIO case is just the particular situation here. Aiui the same issue
> could potentially surface with other p2m types. Also given I'd consider
> this a backporting candidate, while it may not be desperately needed for
> the release, I think it deserves considering beyond the specific aspect
> you mention.
> 

In which case I think the commit message probably ought to be rephrased, since it appears to focus on a case that cannot currently happen.

  Paul
Igor Druzhinin June 3, 2020, 11:45 a.m. UTC | #5
On 03/06/2020 12:28, Paul Durrant wrote:
>>>> [...]
>>> paging_log_dirty_enable() still fails global enable if has_arch_pdevs()
>>> is true, so presumably there's no desperate need for this to go in 4.14?
>>
>> The MMIO case is just the particular situation here. Aiui the same issue
>> could potentially surface with other p2m types. Also given I'd consider
>> this a backporting candidate, while it may not be desperately needed for
>> the release, I think it deserves considering beyond the specific aspect
>> you mention.
>>
> 
> In which case I think the commit message probably ought to be rephrased, since it appears to focus on a case that cannot currently happen.

This can happen without has_arch_pdevs() being true. It's enough to just
directly map some physical memory into a guest to get the p2m_mmio_direct
type present in the page tables.

Igor
Paul Durrant June 3, 2020, 11:48 a.m. UTC | #6
> -----Original Message-----
> From: Igor Druzhinin <igor.druzhinin@citrix.com>
> Sent: 03 June 2020 12:45
> To: paul@xen.org; 'Jan Beulich' <jbeulich@suse.com>
> Cc: xen-devel@lists.xenproject.org; andrew.cooper3@citrix.com; wl@xen.org; roger.pau@citrix.com;
> george.dunlap@citrix.com
> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults immediately
> 
> On 03/06/2020 12:28, Paul Durrant wrote:
> >> [...]
> >
> > In which case I think the commit message probably ought to be rephrased, since it appears to focus on a case that cannot currently happen.
> 
> This can happen without has_arch_pdevs() being true. It's enough to just
> directly map some physical memory into a guest to get p2m_direct_mmio
> type present in the page tables.

OK, that's fine, but when will that happen without pass-through? If we can have a commit message justifying the change based on what can actually happen now, then I would not be opposed to it going in 4.14.

  Paul

> 
> Igor
Igor Druzhinin June 3, 2020, 12:10 p.m. UTC | #7
On 03/06/2020 12:48, Paul Durrant wrote:
>>>> [...]
>>>
>>> In which case I think the commit message probably ought to be rephrased, since it appears to focus on a case that cannot currently happen.
>>
>> This can happen without has_arch_pdevs() being true. It's enough to just
>> directly map some physical memory into a guest to get the p2m_mmio_direct
>> type present in the page tables.
> 
> OK, that's fine, but when will that happen without pass-through? If we can have a commit message justifying the change based on what can actually happen now, then I would not be opposed to it going in 4.14.

Yes, it can happen now - we had regular (non-SR-IOV) vGPU migration totally
broken because of this on AMD; it had never been tested before at all. You
don't need special patches on top of Xen or anything.

To get p2m_mmio_direct you just need to call XEN_DOMCTL_memory_mapping on a domain.
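
As an illustration (a hypothetical xl guest config fragment; the frame number and page count are made up), the `iomem` option makes xl issue XEN_DOMCTL_memory_mapping for the given range, which installs p2m_mmio_direct entries without any PCI device being passed through:

```
# Map 16 machine page frames of MMIO starting at host frame 0xf8000
# into the guest (values are hexadecimal page frame numbers); xl uses
# XEN_DOMCTL_memory_mapping for this, i.e. p2m_mmio_direct mappings.
iomem = [ "f8000,16" ]
```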


Igor

Patch

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 46a1aac..7f6f578 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2923,9 +2923,10 @@  void svm_vmexit_handler(struct cpu_user_regs *regs)
             v->arch.hvm.svm.cached_insn_len = vmcb->guest_ins_len & 0xf;
         rc = vmcb->exitinfo1 & PFEC_page_present
              ? p2m_pt_handle_deferred_changes(vmcb->exitinfo2) : 0;
-        if ( rc >= 0 )
+        if ( rc == 0 )
+            /* If no recalc adjustments were made - handle this fault */
             svm_do_nested_pgfault(v, regs, vmcb->exitinfo1, vmcb->exitinfo2);
-        else
+        else if ( rc < 0 )
         {
             printk(XENLOG_G_ERR
                    "%pv: Error %d handling NPF (gpa=%08lx ec=%04lx)\n",
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 5c05017..070389e 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -341,6 +341,7 @@  static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
     unsigned int level = 4;
     l1_pgentry_t *pent;
     int err = 0;
+    bool recalc_done = false;
 
     table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
     while ( --level )
@@ -402,6 +403,8 @@  static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                 clear_recalc(l1, e);
                 err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
                 ASSERT(!err);
+
+                recalc_done = true;
             }
         }
         unmap_domain_page((void *)((unsigned long)pent & PAGE_MASK));
@@ -448,12 +451,14 @@  static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
             clear_recalc(l1, e);
         err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
         ASSERT(!err);
+
+        recalc_done = true;
     }
 
  out:
     unmap_domain_page(table);
 
-    return err;
+    return err ?: (recalc_done ? 1 : 0);
 }
 
 int p2m_pt_handle_deferred_changes(uint64_t gpa)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 17f320b..db7bde0 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1197,7 +1197,7 @@  static int finish_type_change(struct p2m_domain *p2m,
         rc = p2m->recalc(p2m, gfn);
         /*
          * ept->recalc could return 0/1/-ENOMEM. pt->recalc could return
-         * 0/-ENOMEM/-ENOENT, -ENOENT isn't an error as we are looping
+         * 0/1/-ENOMEM/-ENOENT, -ENOENT isn't an error as we are looping
          * gfn here. If rc is 1 we need to have it 0 for success.
          */
         if ( rc == -ENOENT || rc > 0 )