mbox series

[v5,00/10] runstate/time area registration by (guest) physical address

Message ID 20231002151127.71020-1-roger.pau@citrix.com (mailing list archive)
Headers show
Series runstate/time area registration by (guest) physical address | expand

Message

Roger Pau Monné Oct. 2, 2023, 3:11 p.m. UTC
Since it was indicated that introducing specific new vCPU ops may be
beneficial independent of the introduction of a fully physical-
address-based ABI flavor, here we go. There continue to be a few open
questions throughout the series, resolving of which was one of the main
goals of the earlier postings.

v5 adds one vm-fork specific pre-patch that does simply the introduced
code later on.  It does also fix a vm-fork bug.

Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.

Whole series will need a Release-Ack.

Thanks, Roger.

Jan Beulich (9):
  x86/shim: zap runstate and time area handles during shutdown
  domain: GADDR based shared guest area registration alternative -
    teardown
  domain: update GADDR based runstate guest area
  x86: update GADDR based secondary time area
  x86/mem-sharing: copy GADDR based shared guest areas
  domain: map/unmap GADDR based shared guest areas
  domain: introduce GADDR based runstate area registration alternative
  x86: introduce GADDR based secondary time area registration
    alternative
  common: convert vCPU info area registration

Roger Pau Monne (1):
  mem_sharing/fork: do not attempt to populate vcpu_info page

 xen/arch/x86/domain.c             |  33 +++
 xen/arch/x86/include/asm/domain.h |   3 +
 xen/arch/x86/include/asm/shared.h |  19 +-
 xen/arch/x86/mm/mem_sharing.c     |  73 +++----
 xen/arch/x86/pv/shim.c            |  10 +-
 xen/arch/x86/time.c               |  34 +++-
 xen/arch/x86/x86_64/asm-offsets.c |   2 +-
 xen/arch/x86/x86_64/domain.c      |  36 ++++
 xen/arch/x86/x86_64/traps.c       |   2 +-
 xen/common/compat/domain.c        |   2 +-
 xen/common/domain.c               | 324 ++++++++++++++++++++++--------
 xen/include/public/vcpu.h         |  19 ++
 xen/include/xen/domain.h          |  12 +-
 xen/include/xen/sched.h           |   8 +-
 xen/include/xen/shared.h          |   3 +-
 15 files changed, 440 insertions(+), 140 deletions(-)

Comments

Henry Wang Oct. 5, 2023, 1:27 a.m. UTC | #1
Hi Roger,

> On Oct 2, 2023, at 23:11, Roger Pau Monne <roger.pau@citrix.com> wrote:
> 
> Since it was indicated that introducing specific new vCPU ops may be
> beneficial independent of the introduction of a fully physical-
> address-based ABI flavor, here we go. There continue to be a few open
> questions throughout the series, resolving of which was one of the main
> goals of the earlier postings.
> 
> v5 adds one vm-fork specific pre-patch that does simply the introduced
> code later on.  It does also fix a vm-fork bug.
> 
> Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.
> 
> Whole series will need a Release-Ack.

We agreed in [1] that this series is a good candidate for 4.18, so for the whole
series,

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

[1] https://lore.kernel.org/xen-devel/0be1e32f-5600-7b3a-8d72-84297a1ebee0@suse.com/

Kind regards,
Henry

> 
> Thanks, Roger.
> 
> Jan Beulich (9):
>  x86/shim: zap runstate and time area handles during shutdown
>  domain: GADDR based shared guest area registration alternative -
>    teardown
>  domain: update GADDR based runstate guest area
>  x86: update GADDR based secondary time area
>  x86/mem-sharing: copy GADDR based shared guest areas
>  domain: map/unmap GADDR based shared guest areas
>  domain: introduce GADDR based runstate area registration alternative
>  x86: introduce GADDR based secondary time area registration
>    alternative
>  common: convert vCPU info area registration
> 
> Roger Pau Monne (1):
>  mem_sharing/fork: do not attempt to populate vcpu_info page
> 
> xen/arch/x86/domain.c             |  33 +++
> xen/arch/x86/include/asm/domain.h |   3 +
> xen/arch/x86/include/asm/shared.h |  19 +-
> xen/arch/x86/mm/mem_sharing.c     |  73 +++----
> xen/arch/x86/pv/shim.c            |  10 +-
> xen/arch/x86/time.c               |  34 +++-
> xen/arch/x86/x86_64/asm-offsets.c |   2 +-
> xen/arch/x86/x86_64/domain.c      |  36 ++++
> xen/arch/x86/x86_64/traps.c       |   2 +-
> xen/common/compat/domain.c        |   2 +-
> xen/common/domain.c               | 324 ++++++++++++++++++++++--------
> xen/include/public/vcpu.h         |  19 ++
> xen/include/xen/domain.h          |  12 +-
> xen/include/xen/sched.h           |   8 +-
> xen/include/xen/shared.h          |   3 +-
> 15 files changed, 440 insertions(+), 140 deletions(-)
> 
> -- 
> 2.42.0
>
Julien Grall Oct. 5, 2023, 1:12 p.m. UTC | #2
Hi,

On 05/10/2023 02:27, Henry Wang wrote:
> Hi Roger,
> 
>> On Oct 2, 2023, at 23:11, Roger Pau Monne <roger.pau@citrix.com> wrote:
>>
>> Since it was indicated that introducing specific new vCPU ops may be
>> beneficial independent of the introduction of a fully physical-
>> address-based ABI flavor, here we go. There continue to be a few open
>> questions throughout the series, resolving of which was one of the main
>> goals of the earlier postings.
>>
>> v5 adds one vm-fork specific pre-patch that does simply the introduced
>> code later on.  It does also fix a vm-fork bug.
>>
>> Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.
>>
>> Whole series will need a Release-Ack.
> 
> We agreed in [1] that this series is a good candidate for 4.18, so for the whole
> series,
> 
> Release-acked-by: Henry Wang <Henry.Wang@arm.com>

I have now committed the series.

Cheers,
Andrew Cooper Oct. 5, 2023, 6:58 p.m. UTC | #3
I see this series has been committed.  But it's broken in a really
fundamental way.


This is a new extension with persistent side effects to an existing part
of the guest ABI.

Yet there doesn't appear to be any enumeration that the interface is
available to begin with.  Requiring the guest to probe subops, and
having no way to disable it on a per-domain basis is unacceptable, and
has exploded on us more times than I care to count in security fixes
alone, and that doesn't even cover the issues Amazon have reported over
the years.


Henry: Blocker for 4.18.   The absolutely bare minimum necessary to
avoid reversion is some kind of positive enumeration that the two new
hypercalls are available.

Otherwise I will be #if 0'ing out the new hypercalls before this ABI
mistake gets set in stone.


If this were x86-only it would need to be a CPUID flag, but it will need
to be something arch-agnostic in this case.  The series should not have
come without a proper per-domain control and toolstack integration, but
everything else can be retrofitted in an emergency.

And on a related note, where is the documentation describing this new
feature?  Some tests perhaps, or any single implementation of the guest
side interface?

This is engineering principles so basic that they do go without saying.

~Andrew
Julien Grall Oct. 5, 2023, 10:40 p.m. UTC | #4
Hi Andrew,

On 05/10/2023 19:58, Andrew Cooper wrote:
> I see this series has been committed.  But it's broken in a really
> fundamental way.

Thanks for pointing out. But I'd like to understand how I come to only 
hear about those concerns on the series after committing. Did I miss any 
thread? Even if this series has been pending for over 6 months, should 
have I waited longer before committing?

Furthermore, Jan pointed out that this series was discussed recently 
during the x86 meeting. Did you raise any concern during the call and 
they were not carried on the ML?

> This is a new extension with persistent side effects to an existing part
> of the guest ABI.
> 
> Yet there doesn't appear to be any enumeration that the interface is
> available to begin with.  Requiring the guest to probe subops, and
> having no way to disable it on a per-domain basis is unacceptable, and
> has exploded on us more times than I care to count in security fixes
> alone, and that doesn't even cover the issues Amazon have reported over
> the years.

Indeed. But, AFAIR, all those patches got stuck because of diverging 
opinions between you and you. Can we finally come to an agreement on how 
to disable/expose a new hypercall/feature so we can move on?

> 
> Henry: Blocker for 4.18.   The absolutely bare minimum necessary to
> avoid reversion is some kind of positive enumeration that the two new
> hypercalls are available.

So to clarify, you would like both an interface for the guest and the 
toolstack for 4.18. Is this correct?

> 
> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
> mistake gets set in stone.
> 
> 
> If this were x86-only it would need to be a CPUID flag, but it will need
> to be something arch-agnostic in this case. 

I think we can add two new flags in XENVER_get_features to indicate to 
the guest that the feature is supported. What do you think?

> The series should not have
> come without a proper per-domain control and toolstack integration,

I think this requirement should be written down in a document we can use 
for future reference and not expect people to remember what may have 
been said on the ML for previous hypercall addition.

Cheers,
Julien Grall Oct. 5, 2023, 10:43 p.m. UTC | #5
On 05/10/2023 23:40, Julien Grall wrote:
> Hi Andrew,
> 
> On 05/10/2023 19:58, Andrew Cooper wrote:
>> I see this series has been committed.  But it's broken in a really
>> fundamental way.
> 
> Thanks for pointing out. But I'd like to understand how I come to only 
> hear about those concerns on the series after committing. Did I miss any 
> thread? Even if this series has been pending for over 6 months, should 
> have I waited longer before committing?
> 
> Furthermore, Jan pointed out that this series was discussed recently 
> during the x86 meeting. Did you raise any concern during the call and 
> they were not carried on the ML?
> 
>> This is a new extension with persistent side effects to an existing part
>> of the guest ABI.
>>
>> Yet there doesn't appear to be any enumeration that the interface is
>> available to begin with.  Requiring the guest to probe subops, and
>> having no way to disable it on a per-domain basis is unacceptable, and
>> has exploded on us more times than I care to count in security fixes
>> alone, and that doesn't even cover the issues Amazon have reported over
>> the years.
> 
> Indeed. But, AFAIR, all those patches got stuck because of diverging 
> opinions between you and you. Can we finally come to an agreement on how 

The second 'you' was meant to be Jan. I didn't intend to say you were 
disagreing with yourself.

Cheers,
Roger Pau Monné Oct. 6, 2023, 8 a.m. UTC | #6
On Thu, Oct 05, 2023 at 07:58:50PM +0100, Andrew Cooper wrote:
> I see this series has been committed.  But it's broken in a really
> fundamental way.
> 
> 
> This is a new extension with persistent side effects to an existing part
> of the guest ABI.

The only change in the ABI is the different return code for multiple
attempts to map the vcpu_info page, it used to be -EINVAL and it's
-EBUSY now, which seems more descriptive.

The added hypercalls are an extension of the ABI, not not a
modification of an existing part.  Or maybe I'm not understanding the
complaint.

> Yet there doesn't appear to be any enumeration that the interface is
> available to begin with.  Requiring the guest to probe subops, and
> having no way to disable it on a per-domain basis is unacceptable,

We have never mandated such disables to be part of the series adding
the new hypercalls, those have always been retro fitted in case of
need.  Not saying we shouldn't do it, but it's not something we have
asked submitters to do.

> and
> has exploded on us more times than I care to count in security fixes
> alone, and that doesn't even cover the issues Amazon have reported over
> the years.

That's fine, I can add the enumeration.  A CHANGELOG entry should also
be added.

> 
> Henry: Blocker for 4.18.   The absolutely bare minimum necessary to
> avoid reversion is some kind of positive enumeration that the two new
> hypercalls are available.
> 
> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
> mistake gets set in stone.
> 
> 
> If this were x86-only it would need to be a CPUID flag, but it will need
> to be something arch-agnostic in this case.  The series should not have
> come without a proper per-domain control and toolstack integration, but
> everything else can be retrofitted in an emergency.
> 
> And on a related note, where is the documentation describing this new
> feature?  Some tests perhaps, or any single implementation of the guest
> side interface?

Not that I know, I was expecting Jan to post that once he gets back
from PTO.

I already noted somewhere that I wasn't able to test myself because I
couldn't find any Linux side patches to test the feature with, and I
didn't have time to write ones myself (was expecting Jan to have the
Linux side done already for testing reasons).

> This is engineering principles so basic that they do go without saying.
> 
> ~Andrew
>
Roger Pau Monné Oct. 6, 2023, 8:22 a.m. UTC | #7
On Fri, Oct 06, 2023 at 10:00:35AM +0200, Roger Pau Monné wrote:
> On Thu, Oct 05, 2023 at 07:58:50PM +0100, Andrew Cooper wrote:
> > I see this series has been committed.  But it's broken in a really
> > fundamental way.
> > 
> > 
> > This is a new extension with persistent side effects to an existing part
> > of the guest ABI.
> 
> The only change in the ABI is the different return code for multiple
> attempts to map the vcpu_info page, it used to be -EINVAL and it's
> -EBUSY now, which seems more descriptive.
> 
> The added hypercalls are an extension of the ABI, not not a
> modification of an existing part.  Or maybe I'm not understanding the
> complaint.
> 
> > Yet there doesn't appear to be any enumeration that the interface is
> > available to begin with.  Requiring the guest to probe subops, and
> > having no way to disable it on a per-domain basis is unacceptable,
> 
> We have never mandated such disables to be part of the series adding
> the new hypercalls, those have always been retro fitted in case of
> need.  Not saying we shouldn't do it, but it's not something we have
> asked submitters to do.

I've been thinking about this, and I assume that we would like some
kind of tools interface to list supported features by the hypervisor,
and then a way to disable them.  We could then use such information to
level the hypercall interface across a pool of hosts, and make sure
it's always the same.

In principle guest should cope with some features/hypercalls not being
able on resume, but we all know this is not always the case.

I'm going to punt this, as adding such interface would be too
disruptive at this point in the release, and in any case it's
unlikely we could reach an agreement on how the interface should look
like in a very short time frame.

Roger.
Andrew Cooper Oct. 6, 2023, 10:21 a.m. UTC | #8
On 06/10/2023 9:00 am, Roger Pau Monné wrote:
> On Thu, Oct 05, 2023 at 07:58:50PM +0100, Andrew Cooper wrote:
>> Henry: Blocker for 4.18.   The absolutely bare minimum necessary to
>> avoid reversion is some kind of positive enumeration that the two new
>> hypercalls are available.
>>
>> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
>> mistake gets set in stone.
>>
>>
>> If this were x86-only it would need to be a CPUID flag, but it will need
>> to be something arch-agnostic in this case.  The series should not have
>> come without a proper per-domain control and toolstack integration, but
>> everything else can be retrofitted in an emergency.
>>
>> And on a related note, where is the documentation describing this new
>> feature?  Some tests perhaps, or any single implementation of the guest
>> side interface?
> Not that I know, I was expecting Jan to post that once he gets back
> from PTO.
>
> I already noted somewhere that I wasn't able to test myself because I
> couldn't find any Linux side patches to test the feature with, and I
> didn't have time to write ones myself (was expecting Jan to have the
> Linux side done already for testing reasons).

Big new feature, getting merged after RC1 and feature freeze, with no
enumeration and no disable.

How did this not set off massive alarm bells?


I was about to say that this is a disaster waiting to happen, except
OSSTest has beaten me to it.

This "new feature" managed to regress existing behaviour of something it
was trying not to change, and that's in an area that even I wouldn't
expect to put in a disable.


I'm very sorely tempted to insist that this gets disabled by default in
4.18.  It's clear that it isn't in a fit state, and the absence of any
testing whatsoever means it is likely to explode on the first guest
kernel which tries to use the new interface.

And to be clear, disabled-by-default is a question for Henry and Henry
alone, as the person ultimately responsible for 4.18.

~Andrew
Jan Beulich Oct. 16, 2023, 10:04 a.m. UTC | #9
On 05.10.2023 20:58, Andrew Cooper wrote:
> I see this series has been committed.  But it's broken in a really
> fundamental way.
> 
> 
> This is a new extension with persistent side effects to an existing part
> of the guest ABI.
> 
> Yet there doesn't appear to be any enumeration that the interface is
> available to begin with.  Requiring the guest to probe subops, and
> having no way to disable it on a per-domain basis is unacceptable, and
> has exploded on us more times than I care to count in security fixes
> alone, and that doesn't even cover the issues Amazon have reported over
> the years.

This has never been a requirement. Plus you had ample time to raise such
a request.

> Henry: Blocker for 4.18.   The absolutely bare minimum necessary to
> avoid reversion is some kind of positive enumeration that the two new
> hypercalls are available.

I disagree; to me this is a nice-to-have, not a requirement.

> Otherwise I will be #if 0'ing out the new hypercalls before this ABI
> mistake gets set in stone.
> 
> 
> If this were x86-only it would need to be a CPUID flag, but it will need
> to be something arch-agnostic in this case.  The series should not have
> come without a proper per-domain control and toolstack integration, but
> everything else can be retrofitted in an emergency.

To be honest, had it been clear that you expect a per-domain control, I
probably would not have taken on this piece of work.

> And on a related note, where is the documentation describing this new
> feature?  Some tests perhaps, or any single implementation of the guest
> side interface?

Documentation is as for sibling interfaces - as much or as little as
we have in the public headers. I did test all of this with XTF, but I've
pretty much given up posting XTF patches, seeing how even XSA tests and
alike never made it anywhere.

Jan
Jan Beulich Oct. 16, 2023, 10:07 a.m. UTC | #10
On 02.10.2023 17:11, Roger Pau Monne wrote:
> Since it was indicated that introducing specific new vCPU ops may be
> beneficial independent of the introduction of a fully physical-
> address-based ABI flavor, here we go. There continue to be a few open
> questions throughout the series, resolving of which was one of the main
> goals of the earlier postings.
> 
> v5 adds one vm-fork specific pre-patch that does simply the introduced
> code later on.  It does also fix a vm-fork bug.
> 
> Patches 1 and 6 are missing and Ack from the mem-sharing maintainer.
> 
> Whole series will need a Release-Ack.
> 
> Thanks, Roger.
> 
> Jan Beulich (9):
>   x86/shim: zap runstate and time area handles during shutdown
>   domain: GADDR based shared guest area registration alternative -
>     teardown
>   domain: update GADDR based runstate guest area
>   x86: update GADDR based secondary time area
>   x86/mem-sharing: copy GADDR based shared guest areas
>   domain: map/unmap GADDR based shared guest areas
>   domain: introduce GADDR based runstate area registration alternative
>   x86: introduce GADDR based secondary time area registration
>     alternative
>   common: convert vCPU info area registration
> 
> Roger Pau Monne (1):
>   mem_sharing/fork: do not attempt to populate vcpu_info page

Thanks much for picking this up during my absence.

Jan