diff mbox series

ns16550c: avoid crash in ns16550_endboot in PV shim mode

Message ID de07d56188f13e222ddaa1b963c20f8d7d61350e.camel@infradead.org (mailing list archive)
State New, archived
Headers show
Series ns16550c: avoid crash in ns16550_endboot in PV shim mode | expand

Commit Message

David Woodhouse Oct. 19, 2023, 4:21 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

In shim mode there is no hardware_domain. Dereferencing the pointer
doesn't end well.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
This is about as far as I got in my abortive attempt to use the PV shim
without an actual PV console being provided by the HVM hosting
environment. It still doesn't pass the guest's console through to
serial; that only seems to shim to an actual PV console.

 xen/drivers/char/ns16550.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jan Beulich Oct. 20, 2023, 6:17 a.m. UTC | #1
On 19.10.2023 18:21, David Woodhouse wrote:
> --- a/xen/drivers/char/ns16550.c
> +++ b/xen/drivers/char/ns16550.c
> @@ -586,6 +586,8 @@ static void __init cf_check ns16550_endboot(struct serial_port *port)
>  
>      if ( uart->remapped_io_base )
>          return;
> +    if (!hardware_domain)
> +        return;
>      rv = ioports_deny_access(hardware_domain, uart->io_base, uart->io_base + 7);
>      if ( rv != 0 )
>          BUG();

Considering this is the only use of hardware_domain in the driver, fixing
it like this (with style adjusted) is certainly an option. I'd like to
ask though whether it makes sense for execution to make it here in shim
mode, where there's no serial port anyway (unless we start supporting
pass-through [of possibly even non-PCI devices] for the shim). I wonder
whether we wouldn't want to bypass ns16550_init() - likely as well as the
EHCI and XHCI ones - instead, until (if ever) such passing through was
actually deemed necessary to support.

Jan
Andrew Cooper Oct. 20, 2023, 10:14 a.m. UTC | #2
On 19/10/2023 5:21 pm, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> In shim mode there is no hardware_domain. Dereferencing the pointer
> doesn't end well.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> This is about as far as I got in my abortive attempt to use the PV shim
> without an actual PV console being provided by the HVM hosting
> environment. It still doesn't pass the guest's console through to
> serial; that only seems to shim to an actual PV console.

There's no such thing as a Xen VM without a PV console.

And yes, this is an error, but that horse bolted 2 decades ago.


It would be nice if having a "real" serial didn't crash like this, but
PV Shim is specialised to transplant one normal-looking PV guest, and
the interposition logic is tied to the PV console.

~Andrew
David Woodhouse Oct. 20, 2023, 10:29 a.m. UTC | #3
On Fri, 2023-10-20 at 11:14 +0100, Andrew Cooper wrote:
> On 19/10/2023 5:21 pm, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > In shim mode there is no hardware_domain. Dereferencing the pointer
> > doesn't end well.
> > 
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > ---
> > This is about as far as I got in my abortive attempt to use the PV shim
> > without an actual PV console being provided by the HVM hosting
> > environment. It still doesn't pass the guest's console through to
> > serial; that only seems to shim to an actual PV console.
> 
> There's no such thing as a Xen VM without a PV console.

Huh? There are literally millions of them. Every EC2 Xen HVM instance
boots with a serial device but no Xen console. That's true of the ones
running on true Xen, as well as the newer launches which are running on
top of Linux/KVM.

We implemented Xen console support in the Nitro hypervisor, then had to
disable it because it wasn't faithful to the production environment
that guests previously experienced on Xen, and eventually ripped it out
because it was dead code.

Likewise, upstream Qemu's Xen emulation mode doesn't currently have
console support (although I did just add it to get the shim working).

> And yes, this is an error, but that horse bolted 2 decades ago.
> 
> 
> It would be nice if having a "real" serial didn't crash like this, but
> PV Shim is specialised to transplant one normal-looking PV guest, and
> the interposition logic is tied to the PV console.

That's a nicer model. When Spectre/Meltdown broke and we posted the
'Vixen' version of a shim, it actually implemented a console backend
and would output to the serial port. But I do agree it's nicer not to.

Even with a PV console though, it might still be useful to have the
shim's output going to a serial port while the guest's output goes to
the console though, to keep them separate.
Andrew Cooper Oct. 20, 2023, 1:25 p.m. UTC | #4
On 20/10/2023 11:29 am, David Woodhouse wrote:
> On Fri, 2023-10-20 at 11:14 +0100, Andrew Cooper wrote:
>> On 19/10/2023 5:21 pm, David Woodhouse wrote:
>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>
>>> In shim mode there is no hardware_domain. Dereferencing the pointer
>>> doesn't end well.
>>>
>>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>>> ---
>>> This is about as far as I got in my abortive attempt to use the PV shim
>>> without an actual PV console being provided by the HVM hosting
>>> environment. It still doesn't pass the guest's console through to
>>> serial; that only seems to shim to an actual PV console.
>> There's no such thing as a Xen VM without a PV console.
> Huh? There are literally millions of them.

I'm very prepared to believe there are millions which don't overtly
malfunction when you don't fill in the HVM Params. 

Which is very different from saying "there's a way in the Xen guest ABI
to express 'you don't have a PV console' ".

~Andrew
Roger Pau Monné Oct. 20, 2023, 1:29 p.m. UTC | #5
On Fri, Oct 20, 2023 at 02:25:35PM +0100, Andrew Cooper wrote:
> On 20/10/2023 11:29 am, David Woodhouse wrote:
> > On Fri, 2023-10-20 at 11:14 +0100, Andrew Cooper wrote:
> >> On 19/10/2023 5:21 pm, David Woodhouse wrote:
> >>> From: David Woodhouse <dwmw@amazon.co.uk>
> >>>
> >>> In shim mode there is no hardware_domain. Dereferencing the pointer
> >>> doesn't end well.
> >>>
> >>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> >>> ---
> >>> This is about as far as I got in my abortive attempt to use the PV shim
> >>> without an actual PV console being provided by the HVM hosting
> >>> environment. It still doesn't pass the guest's console through to
> >>> serial; that only seems to shim to an actual PV console.
> >> There's no such thing as a Xen VM without a PV console.
> > Huh? There are literally millions of them.
> 
> I'm very prepared to believe there are millions which don't overtly
> malfunction when you don't fill in the HVM Params. 
> 
> Which is very different from saying "there's a way in the Xen guest ABI
> to express 'you don't have a PV console' ".

FWIW, Linux assumes that either the console page or the event channel
being 0 implies no console available [0], so I guess that's the ABI
now.

Roger.

[0] https://elixir.bootlin.com/linux/latest/source/drivers/tty/hvc/hvc_xen.c#L258
David Woodhouse Oct. 20, 2023, 1:37 p.m. UTC | #6
On Fri, 2023-10-20 at 15:29 +0200, Roger Pau Monné wrote:
> On Fri, Oct 20, 2023 at 02:25:35PM +0100, Andrew Cooper wrote:
> > On 20/10/2023 11:29 am, David Woodhouse wrote:
> > > On Fri, 2023-10-20 at 11:14 +0100, Andrew Cooper wrote:
> > > > On 19/10/2023 5:21 pm, David Woodhouse wrote:
> > > > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > > > 
> > > > > In shim mode there is no hardware_domain. Dereferencing the pointer
> > > > > doesn't end well.
> > > > > 
> > > > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > > > > ---
> > > > > This is about as far as I got in my abortive attempt to use the PV shim
> > > > > without an actual PV console being provided by the HVM hosting
> > > > > environment. It still doesn't pass the guest's console through to
> > > > > serial; that only seems to shim to an actual PV console.
> > > > There's no such thing as a Xen VM without a PV console.
> > > Huh? There are literally millions of them.
> > 
> > I'm very prepared to believe there are millions which don't overtly
> > malfunction when you don't fill in the HVM Params. 
> > 
> > Which is very different from saying "there's a way in the Xen guest ABI
> > to express 'you don't have a PV console' ".
> 
> FWIW, Linux assumes that either the console page or the event channel
> being 0 implies no console available [0], so I guess that's the ABI
> now.

Or if the HVMOP_get_param call returns an error.

> Roger.
> 
> [0] https://elixir.bootlin.com/linux/latest/source/drivers/tty/hvc/hvc_xen.c#L258

I'm not convinced I believe what the comment says there about evtchn 0
being theoretically valid. I don't believe zero is a valid evtchn#, is
it?
Paul Durrant Oct. 20, 2023, 2:50 p.m. UTC | #7
On 20/10/2023 14:37, David Woodhouse wrote:
[snip]
>>
>> [0] https://elixir.bootlin.com/linux/latest/source/drivers/tty/hvc/hvc_xen.c#L258
> 
> I'm not convinced I believe what the comment says there about evtchn 0
> being theoretically valid. I don't believe zero is a valid evtchn#, is
> it?

gfn 0 might be valid, but I'm also pretty sure evtchn 0 is not valid.

   Paul
Andrew Cooper Oct. 20, 2023, 3:16 p.m. UTC | #8
On 20/10/2023 3:50 pm, Durrant, Paul wrote:
> On 20/10/2023 14:37, David Woodhouse wrote:
> [snip]
>>>
>>> [0]
>>> https://elixir.bootlin.com/linux/latest/source/drivers/tty/hvc/hvc_xen.c#L258
>>
>> I'm not convinced I believe what the comment says there about evtchn 0
>> being theoretically valid. I don't believe zero is a valid evtchn#, is
>> it?
>
> gfn 0 might be valid, but I'm also pretty sure evtchn 0 is not valid.

GFN 0 very much is valid.

evtchn 0 OTOH is explicitly not valid.  From evtchn_init():

    evtchn_from_port(d, 0)->state = ECS_RESERVED;


However, the fields being 0 doesn't mean not available.  That's the
signal to saying "not connected yet", because that's what dom0 gets
before xenconsoled starts up.

~Andrew
Roger Pau Monné Oct. 23, 2023, 7:52 a.m. UTC | #9
On Fri, Oct 20, 2023 at 04:16:16PM +0100, Andrew Cooper wrote:
> On 20/10/2023 3:50 pm, Durrant, Paul wrote:
> > On 20/10/2023 14:37, David Woodhouse wrote:
> > [snip]
> >>>
> >>> [0]
> >>> https://elixir.bootlin.com/linux/latest/source/drivers/tty/hvc/hvc_xen.c#L258
> >>
> >> I'm not convinced I believe what the comment says there about evtchn 0
> >> being theoretically valid. I don't believe zero is a valid evtchn#, is
> >> it?
> >
> > gfn 0 might be valid, but I'm also pretty sure evtchn 0 is not valid.
> 
> GFN 0 very much is valid.
> 
> evtchn 0 OTOH is explicitly not valid.  From evtchn_init():
> 
>     evtchn_from_port(d, 0)->state = ECS_RESERVED;
> 
> 
> However, the fields being 0 doesn't mean not available.  That's the
> signal to saying "not connected yet", because that's what dom0 gets
> before xenconsoled starts up.

Someone asked me the same a while back, and IIRC we don't state
anywhere in the public headers that event channel 0 is reserved,
however that has always? been part of the implementation.

If we intend this to be reliable, we should add a define to the public
headers in order to signal that 0 will always be reserved.

Roger.
Jan Beulich Oct. 23, 2023, 8:05 a.m. UTC | #10
On 23.10.2023 09:52, Roger Pau Monné wrote:
> On Fri, Oct 20, 2023 at 04:16:16PM +0100, Andrew Cooper wrote:
>> On 20/10/2023 3:50 pm, Durrant, Paul wrote:
>>> On 20/10/2023 14:37, David Woodhouse wrote:
>>> [snip]
>>>>>
>>>>> [0]
>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/tty/hvc/hvc_xen.c#L258
>>>>
>>>> I'm not convinced I believe what the comment says there about evtchn 0
>>>> being theoretically valid. I don't believe zero is a valid evtchn#, is
>>>> it?
>>>
>>> gfn 0 might be valid, but I'm also pretty sure evtchn 0 is not valid.
>>
>> GFN 0 very much is valid.
>>
>> evtchn 0 OTOH is explicitly not valid.  From evtchn_init():
>>
>>     evtchn_from_port(d, 0)->state = ECS_RESERVED;
>>
>>
>> However, the fields being 0 doesn't mean not available.  That's the
>> signal to saying "not connected yet", because that's what dom0 gets
>> before xenconsoled starts up.
> 
> Someone asked me the same a while back, and IIRC we don't state
> anywhere in the public headers that event channel 0 is reserved,
> however that has always? been part of the implementation.
> 
> If we intend this to be reliable, we should add a define to the public
> headers in order to signal that 0 will always be reserved.

I agree a comment should have been there; it's not clear to me what
useful #define we could add.

Jan
Roger Pau Monné Oct. 23, 2023, 8:17 a.m. UTC | #11
On Mon, Oct 23, 2023 at 10:05:48AM +0200, Jan Beulich wrote:
> On 23.10.2023 09:52, Roger Pau Monné wrote:
> > On Fri, Oct 20, 2023 at 04:16:16PM +0100, Andrew Cooper wrote:
> >> On 20/10/2023 3:50 pm, Durrant, Paul wrote:
> >>> On 20/10/2023 14:37, David Woodhouse wrote:
> >>> [snip]
> >>>>>
> >>>>> [0]
> >>>>> https://elixir.bootlin.com/linux/latest/source/drivers/tty/hvc/hvc_xen.c#L258
> >>>>
> >>>> I'm not convinced I believe what the comment says there about evtchn 0
> >>>> being theoretically valid. I don't believe zero is a valid evtchn#, is
> >>>> it?
> >>>
> >>> gfn 0 might be valid, but I'm also pretty sure evtchn 0 is not valid.
> >>
> >> GFN 0 very much is valid.
> >>
> >> evtchn 0 OTOH is explicitly not valid.  From evtchn_init():
> >>
> >>     evtchn_from_port(d, 0)->state = ECS_RESERVED;
> >>
> >>
> >> However, the fields being 0 doesn't mean not available.  That's the
> >> signal to saying "not connected yet", because that's what dom0 gets
> >> before xenconsoled starts up.
> > 
> > Someone asked me the same a while back, and IIRC we don't state
> > anywhere in the public headers that event channel 0 is reserved,
> > however that has always? been part of the implementation.
> > 
> > If we intend this to be reliable, we should add a define to the public
> > headers in order to signal that 0 will always be reserved.
> 
> I agree a comment should have been there; it's not clear to me what
> useful #define we could add.

`EVTCHN_PORT_INVALID 0` or some such, but a comment would also be
fine, the point is to be part of the public interface.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index 28ddedd50d..0818f5578c 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -586,6 +586,8 @@  static void __init cf_check ns16550_endboot(struct serial_port *port)
 
     if ( uart->remapped_io_base )
         return;
+    if (!hardware_domain)
+        return;
     rv = ioports_deny_access(hardware_domain, uart->io_base, uart->io_base + 7);
     if ( rv != 0 )
         BUG();