diff mbox series

[for-4.14,v2] x86/rtc: provide mediated access to RTC for PVH dom0

Message ID 20200605110240.52545-1-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series [for-4.14,v2] x86/rtc: provide mediated access to RTC for PVH dom0 | expand

Commit Message

Roger Pau Monné June 5, 2020, 11:02 a.m. UTC
Mediated access to the RTC was provided for PVHv1 dom0 using the PV
code paths (guest_io_{write/read}), but those accesses where never
implemented for PVHv2 dom0. This patch provides such mediated accesses
to the RTC for PVH dom0, just like it's provided for a classic PV
dom0.

Pull out some of the RTC logic from guest_io_{read/write} into
specific helpers that can be used by both PV and HVM guests. The
setup of the handlers for PVH is done in rtc_init, which is already
used to initialize the fully emulated RTC.

Without this a Linux PVH dom0 will read garbage when trying to access
the RTC, and one vCPU will be constantly looping in
rtc_timer_do_work.

Note that such issue doesn't happen on domUs because the ACPI
NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
accesses are not emulated but rather forwarded to the physical
hardware.

No functional change expected for classic PV dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
for-4.14 reasoning: the fix is mostly isolated to PVH dom0, and as
such the risk is very low of causing issues to other guests types, but
without this fix one vCPU when using a Linux dom0 will be constantly
looping over rtc_timer_do_work with 100% CPU usage, at least when
using Linux 4.19 or newer.
---
Changes since v1:
 - Share the code with PV.
 - Add access checks to the IO ports.
---
 xen/arch/x86/hvm/rtc.c            | 34 ++++++++++++++++++
 xen/arch/x86/pv/emul-priv-op.c    | 28 +++------------
 xen/arch/x86/time.c               | 59 +++++++++++++++++++++++++++++++
 xen/include/asm-x86/mc146818rtc.h |  3 ++
 4 files changed, 100 insertions(+), 24 deletions(-)

Comments

Wei Liu June 5, 2020, 11:43 a.m. UTC | #1
On Fri, Jun 05, 2020 at 01:02:39PM +0200, Roger Pau Monne wrote:
> Mediated access to the RTC was provided for PVHv1 dom0 using the PV
> code paths (guest_io_{write/read}), but those accesses where never
> implemented for PVHv2 dom0. This patch provides such mediated accesses
> to the RTC for PVH dom0, just like it's provided for a classic PV
> dom0.
> 
> Pull out some of the RTC logic from guest_io_{read/write} into
> specific helpers that can be used by both PV and HVM guests. The
> setup of the handlers for PVH is done in rtc_init, which is already
> used to initialize the fully emulated RTC.
> 
> Without this a Linux PVH dom0 will read garbage when trying to access
> the RTC, and one vCPU will be constantly looping in
> rtc_timer_do_work.
> 
> Note that such issue doesn't happen on domUs because the ACPI
> NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
> the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
> accesses are not emulated but rather forwarded to the physical
> hardware.
> 
> No functional change expected for classic PV dom0.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Wei Liu <wl@xen.org>
Paul Durrant June 5, 2020, 11:57 a.m. UTC | #2
> -----Original Message-----
> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: 05 June 2020 12:03
> To: xen-devel@lists.xenproject.org
> Cc: paul@xen.org; Roger Pau Monne <roger.pau@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>
> Subject: [PATCH for-4.14 v2] x86/rtc: provide mediated access to RTC for PVH dom0
> 
> Mediated access to the RTC was provided for PVHv1 dom0 using the PV
> code paths (guest_io_{write/read}), but those accesses where never
> implemented for PVHv2 dom0. This patch provides such mediated accesses
> to the RTC for PVH dom0, just like it's provided for a classic PV
> dom0.
> 
> Pull out some of the RTC logic from guest_io_{read/write} into
> specific helpers that can be used by both PV and HVM guests. The
> setup of the handlers for PVH is done in rtc_init, which is already
> used to initialize the fully emulated RTC.
> 
> Without this a Linux PVH dom0 will read garbage when trying to access
> the RTC, and one vCPU will be constantly looping in
> rtc_timer_do_work.
> 
> Note that such issue doesn't happen on domUs because the ACPI
> NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
> the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
> accesses are not emulated but rather forwarded to the physical
> hardware.
> 
> No functional change expected for classic PV dom0.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Release-acked-by: Paul Durrant <paul@xen.org>

> ---
> for-4.14 reasoning: the fix is mostly isolated to PVH dom0, and as
> such the risk is very low of causing issues to other guests types, but
> without this fix one vCPU when using a Linux dom0 will be constantly
> looping over rtc_timer_do_work with 100% CPU usage, at least when
> using Linux 4.19 or newer.
> ---
> Changes since v1:
>  - Share the code with PV.
>  - Add access checks to the IO ports.
> ---
>  xen/arch/x86/hvm/rtc.c            | 34 ++++++++++++++++++
>  xen/arch/x86/pv/emul-priv-op.c    | 28 +++------------
>  xen/arch/x86/time.c               | 59 +++++++++++++++++++++++++++++++
>  xen/include/asm-x86/mc146818rtc.h |  3 ++
>  4 files changed, 100 insertions(+), 24 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
> index 5bbbdc0e0f..9f304a0fb4 100644
> --- a/xen/arch/x86/hvm/rtc.c
> +++ b/xen/arch/x86/hvm/rtc.c
> @@ -22,6 +22,7 @@
>   * IN THE SOFTWARE.
>   */
> 
> +#include <asm/iocap.h>
>  #include <asm/mc146818rtc.h>
>  #include <asm/hvm/vpt.h>
>  #include <asm/hvm/io.h>
> @@ -808,10 +809,43 @@ void rtc_reset(struct domain *d)
>      s->pt.source = PTSRC_isa;
>  }
> 
> +/* RTC mediator for HVM hardware domain. */
> +static int hw_rtc_io(int dir, unsigned int port, unsigned int size,
> +                     uint32_t *val)
> +{
> +    if ( dir == IOREQ_READ )
> +        *val = ~0;
> +
> +    if ( size != 1 )
> +    {
> +        gdprintk(XENLOG_WARNING, "bad RTC access size (%u)\n", size);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !ioports_access_permitted(current->domain, port, port) )
> +    {
> +        gdprintk(XENLOG_WARNING, "access to RTC port %x not allowed\n", port);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    if ( dir == IOREQ_WRITE )
> +        rtc_guest_write(port, *val);
> +    else
> +        *val = rtc_guest_read(port);
> +
> +    return X86EMUL_OKAY;
> +}
> +
>  void rtc_init(struct domain *d)
>  {
>      RTCState *s = domain_vrtc(d);
> 
> +    if ( is_hardware_domain(d) )
> +    {
> +        /* Hardware domain gets mediated access to the physical RTC. */
> +        register_portio_handler(d, RTC_PORT(0), 2, hw_rtc_io);
> +        return;
> +    }
> +
>      if ( !has_vrtc(d) )
>          return;
> 
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index fad6c754ad..1597f27ed9 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -280,19 +280,10 @@ static uint32_t guest_io_read(unsigned int port, unsigned int bytes,
>          {
>              sub_data = pv_pit_handler(port, 0, 0);
>          }
> -        else if ( port == RTC_PORT(0) )
> -        {
> -            sub_data = currd->arch.cmos_idx;
> -        }
> -        else if ( (port == RTC_PORT(1)) &&
> +        else if ( (port == RTC_PORT(0) || port == RTC_PORT(1)) &&
>                    ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
>          {
> -            unsigned long flags;
> -
> -            spin_lock_irqsave(&rtc_lock, flags);
> -            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> -            sub_data = inb(RTC_PORT(1));
> -            spin_unlock_irqrestore(&rtc_lock, flags);
> +            sub_data = rtc_guest_read(port);
>          }
>          else if ( (port == 0xcf8) && (bytes == 4) )
>          {
> @@ -413,21 +404,10 @@ static void guest_io_write(unsigned int port, unsigned int bytes,
>          {
>              pv_pit_handler(port, (uint8_t)data, 1);
>          }
> -        else if ( port == RTC_PORT(0) )
> -        {
> -            currd->arch.cmos_idx = data;
> -        }
> -        else if ( (port == RTC_PORT(1)) &&
> +        else if ( (port == RTC_PORT(0) || port == RTC_PORT(1)) &&
>                    ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
>          {
> -            unsigned long flags;
> -
> -            if ( pv_rtc_handler )
> -                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
> -            spin_lock_irqsave(&rtc_lock, flags);
> -            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> -            outb(data, RTC_PORT(1));
> -            spin_unlock_irqrestore(&rtc_lock, flags);
> +            rtc_guest_write(port, data);
>          }
>          else if ( (port == 0xcf8) && (bytes == 4) )
>          {
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index bbaea3aa65..e1c81067ce 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -27,6 +27,7 @@
>  #include <xen/keyhandler.h>
>  #include <xen/guest_access.h>
>  #include <asm/io.h>
> +#include <asm/iocap.h>
>  #include <asm/msr.h>
>  #include <asm/mpspec.h>
>  #include <asm/processor.h>
> @@ -1110,6 +1111,64 @@ static unsigned long get_cmos_time(void)
>      return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
>  }
> 
> +/* Helpers for guest accesses to the physical RTC. */
> +unsigned int rtc_guest_read(unsigned int port)
> +{
> +    const struct domain *currd = current->domain;
> +    unsigned long flags;
> +    unsigned int data = ~0;
> +
> +    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));
> +    if ( !ioports_access_permitted(currd, port, port) )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return data;
> +    }
> +
> +    switch ( port )
> +    {
> +    case RTC_PORT(0):
> +        data = currd->arch.cmos_idx;
> +        break;
> +
> +    case RTC_PORT(1):
> +        spin_lock_irqsave(&rtc_lock, flags);
> +        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> +        data = inb(RTC_PORT(1));
> +        spin_unlock_irqrestore(&rtc_lock, flags);
> +        break;
> +    }
> +
> +    return data;
> +}
> +
> +void rtc_guest_write(unsigned int port, unsigned int data)
> +{
> +    struct domain *currd = current->domain;
> +    unsigned long flags;
> +
> +    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));
> +    if ( !ioports_access_permitted(currd, port, port) )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
> +
> +    switch ( port )
> +    {
> +    case RTC_PORT(0):
> +        currd->arch.cmos_idx = data;
> +        break;
> +
> +    case RTC_PORT(1):
> +        spin_lock_irqsave(&rtc_lock, flags);
> +        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> +        outb(data, RTC_PORT(1));
> +        spin_unlock_irqrestore(&rtc_lock, flags);
> +        break;
> +    }
> +}
> +
>  static unsigned long get_wallclock_time(void)
>  {
>  #ifdef CONFIG_XEN_GUEST
> diff --git a/xen/include/asm-x86/mc146818rtc.h b/xen/include/asm-x86/mc146818rtc.h
> index 8758528f7c..803b236c0a 100644
> --- a/xen/include/asm-x86/mc146818rtc.h
> +++ b/xen/include/asm-x86/mc146818rtc.h
> @@ -110,4 +110,7 @@ outb_p((val),RTC_PORT(1)); \
> 
>  #define RTC_IRQ 8
> 
> +unsigned int rtc_guest_read(unsigned int port);
> +void rtc_guest_write(unsigned int port, unsigned int data);
> +
>  #endif /* _ASM_MC146818RTC_H */
> --
> 2.26.2
Jan Beulich June 5, 2020, 2:44 p.m. UTC | #3
On 05.06.2020 13:02, Roger Pau Monne wrote:
> Mediated access to the RTC was provided for PVHv1 dom0 using the PV
> code paths (guest_io_{write/read}), but those accesses where never
> implemented for PVHv2 dom0. This patch provides such mediated accesses
> to the RTC for PVH dom0, just like it's provided for a classic PV
> dom0.
> 
> Pull out some of the RTC logic from guest_io_{read/write} into
> specific helpers that can be used by both PV and HVM guests. The
> setup of the handlers for PVH is done in rtc_init, which is already
> used to initialize the fully emulated RTC.
> 
> Without this a Linux PVH dom0 will read garbage when trying to access
> the RTC, and one vCPU will be constantly looping in
> rtc_timer_do_work.
> 
> Note that such issue doesn't happen on domUs because the ACPI
> NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
> the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
> accesses are not emulated but rather forwarded to the physical
> hardware.
> 
> No functional change expected for classic PV dom0.

But there is, in whether (virtual) port 0x71 can be read/written (even
by a DomU). I'm afraid of being called guilty in splitting hair, though.

> @@ -808,10 +809,43 @@ void rtc_reset(struct domain *d)
>      s->pt.source = PTSRC_isa;
>  }
>  
> +/* RTC mediator for HVM hardware domain. */
> +static int hw_rtc_io(int dir, unsigned int port, unsigned int size,
> +                     uint32_t *val)
> +{
> +    if ( dir == IOREQ_READ )
> +        *val = ~0;
> +
> +    if ( size != 1 )
> +    {
> +        gdprintk(XENLOG_WARNING, "bad RTC access size (%u)\n", size);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !ioports_access_permitted(current->domain, port, port) )

This wants to move into the helper, such that the PV side can have
it moved too.

>  void rtc_init(struct domain *d)
>  {
>      RTCState *s = domain_vrtc(d);
>  
> +    if ( is_hardware_domain(d) )
> +    {
> +        /* Hardware domain gets mediated access to the physical RTC. */
> +        register_portio_handler(d, RTC_PORT(0), 2, hw_rtc_io);
> +        return;

Any reason for this explicit return, rather than ...

> +    }
> +
>      if ( !has_vrtc(d) )
>          return;

... making use of this one? In fact wouldn't it be more correct
to have

    if ( !has_vrtc(d) )
    {
        /* Hardware domain gets mediated access to the physical RTC. */
        if ( is_hardware_domain(d) )
            register_portio_handler(d, RTC_PORT(0), 2, hw_rtc_io);
        return;
    }

such that eventual (perhaps optional) enabling of vRTC for hwdom
would have it properly work without changing this function again?

> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -280,19 +280,10 @@ static uint32_t guest_io_read(unsigned int port, unsigned int bytes,
>          {
>              sub_data = pv_pit_handler(port, 0, 0);
>          }
> -        else if ( port == RTC_PORT(0) )
> -        {
> -            sub_data = currd->arch.cmos_idx;

Note how there was no permission check here. Having one or more
I/O ports that can be used to simply latch a value can, as I've
learned, be quite valuable as a debugging vehicle, and there
aren't many (if any) ports beyond this one that a PV DomU might
use for such a purpose. Arguably the value is somewhat limited
here, as the value wouldn't survive a crash, but I'd still
prefer if we could retain prior functionality.

> @@ -1110,6 +1111,64 @@ static unsigned long get_cmos_time(void)
>      return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
>  }
>  
> +/* Helpers for guest accesses to the physical RTC. */
> +unsigned int rtc_guest_read(unsigned int port)
> +{
> +    const struct domain *currd = current->domain;
> +    unsigned long flags;
> +    unsigned int data = ~0;
> +
> +    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));

Instead of this, how about ...

> +    if ( !ioports_access_permitted(currd, port, port) )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return data;
> +    }
> +
> +    switch ( port )
> +    {
> +    case RTC_PORT(0):
> +        data = currd->arch.cmos_idx;
> +        break;
> +
> +    case RTC_PORT(1):
> +        spin_lock_irqsave(&rtc_lock, flags);
> +        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> +        data = inb(RTC_PORT(1));
> +        spin_unlock_irqrestore(&rtc_lock, flags);
> +        break;

    default:
        ASSERT_UNREACHABLE();
        break;

?

Jan
Roger Pau Monné June 5, 2020, 3:17 p.m. UTC | #4
On Fri, Jun 05, 2020 at 04:44:32PM +0200, Jan Beulich wrote:
> On 05.06.2020 13:02, Roger Pau Monne wrote:
> > Mediated access to the RTC was provided for PVHv1 dom0 using the PV
> > code paths (guest_io_{write/read}), but those accesses where never
> > implemented for PVHv2 dom0. This patch provides such mediated accesses
> > to the RTC for PVH dom0, just like it's provided for a classic PV
> > dom0.
> > 
> > Pull out some of the RTC logic from guest_io_{read/write} into
> > specific helpers that can be used by both PV and HVM guests. The
> > setup of the handlers for PVH is done in rtc_init, which is already
> > used to initialize the fully emulated RTC.
> > 
> > Without this a Linux PVH dom0 will read garbage when trying to access
> > the RTC, and one vCPU will be constantly looping in
> > rtc_timer_do_work.
> > 
> > Note that such issue doesn't happen on domUs because the ACPI
> > NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
> > the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
> > accesses are not emulated but rather forwarded to the physical
> > hardware.
> > 
> > No functional change expected for classic PV dom0.
> 
> But there is, in whether (virtual) port 0x71 can be read/written (even
> by a DomU). I'm afraid of being called guilty in splitting hair, though.

Urg, OK, I realized that but considered it a harmless mistake.

> > @@ -808,10 +809,43 @@ void rtc_reset(struct domain *d)
> >      s->pt.source = PTSRC_isa;
> >  }
> >  
> > +/* RTC mediator for HVM hardware domain. */
> > +static int hw_rtc_io(int dir, unsigned int port, unsigned int size,
> > +                     uint32_t *val)
> > +{
> > +    if ( dir == IOREQ_READ )
> > +        *val = ~0;
> > +
> > +    if ( size != 1 )
> > +    {
> > +        gdprintk(XENLOG_WARNING, "bad RTC access size (%u)\n", size);
> > +        return X86EMUL_OKAY;
> > +    }
> > +    if ( !ioports_access_permitted(current->domain, port, port) )
> 
> This wants to move into the helper, such that the PV side can have
> it moved too.
> 
> >  void rtc_init(struct domain *d)
> >  {
> >      RTCState *s = domain_vrtc(d);
> >  
> > +    if ( is_hardware_domain(d) )
> > +    {
> > +        /* Hardware domain gets mediated access to the physical RTC. */
> > +        register_portio_handler(d, RTC_PORT(0), 2, hw_rtc_io);
> > +        return;
> 
> Any reason for this explicit return, rather than ...
> 
> > +    }
> > +
> >      if ( !has_vrtc(d) )
> >          return;
> 
> ... making use of this one? In fact wouldn't it be more correct
> to have
> 
>     if ( !has_vrtc(d) )
>     {
>         /* Hardware domain gets mediated access to the physical RTC. */
>         if ( is_hardware_domain(d) )
>             register_portio_handler(d, RTC_PORT(0), 2, hw_rtc_io);
>         return;
>     }
> 
> such that eventual (perhaps optional) enabling of vRTC for hwdom
> would have it properly work without changing this function again?

Right, that seems fine to me.

> > --- a/xen/arch/x86/pv/emul-priv-op.c
> > +++ b/xen/arch/x86/pv/emul-priv-op.c
> > @@ -280,19 +280,10 @@ static uint32_t guest_io_read(unsigned int port, unsigned int bytes,
> >          {
> >              sub_data = pv_pit_handler(port, 0, 0);
> >          }
> > -        else if ( port == RTC_PORT(0) )
> > -        {
> > -            sub_data = currd->arch.cmos_idx;
> 
> Note how there was no permission check here. Having one or more
> I/O ports that can be used to simply latch a value can, as I've
> learned, be quite valuable as a debugging vehicle, and there
> aren't many (if any) ports beyond this one that a PV DomU might
> use for such a purpose. Arguably the value is somewhat limited
> here, as the value wouldn't survive a crash, but I'd still
> prefer if we could retain prior functionality.

OK, as said above I considered this a harmless mistake, but seeing as
you find it valuable I will make sure to keep the behavior.

> > @@ -1110,6 +1111,64 @@ static unsigned long get_cmos_time(void)
> >      return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
> >  }
> >  
> > +/* Helpers for guest accesses to the physical RTC. */
> > +unsigned int rtc_guest_read(unsigned int port)
> > +{
> > +    const struct domain *currd = current->domain;
> > +    unsigned long flags;
> > +    unsigned int data = ~0;
> > +
> > +    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));
> 
> Instead of this, how about ...
> 
> > +    if ( !ioports_access_permitted(currd, port, port) )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return data;
> > +    }
> > +
> > +    switch ( port )
> > +    {
> > +    case RTC_PORT(0):
> > +        data = currd->arch.cmos_idx;
> > +        break;
> > +
> > +    case RTC_PORT(1):
> > +        spin_lock_irqsave(&rtc_lock, flags);
> > +        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> > +        data = inb(RTC_PORT(1));
> > +        spin_unlock_irqrestore(&rtc_lock, flags);
> > +        break;
> 
>     default:
>         ASSERT_UNREACHABLE();
>         break;
> 
> ?

Sure.

Thanks, Roger.
Roman Shaposhnik June 5, 2020, 11:43 p.m. UTC | #5
On Fri, Jun 5, 2020 at 4:03 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>
> Mediated access to the RTC was provided for PVHv1 dom0 using the PV
> code paths (guest_io_{write/read}), but those accesses where never
> implemented for PVHv2 dom0. This patch provides such mediated accesses
> to the RTC for PVH dom0, just like it's provided for a classic PV
> dom0.
>
> Pull out some of the RTC logic from guest_io_{read/write} into
> specific helpers that can be used by both PV and HVM guests. The
> setup of the handlers for PVH is done in rtc_init, which is already
> used to initialize the fully emulated RTC.
>
> Without this a Linux PVH dom0 will read garbage when trying to access
> the RTC, and one vCPU will be constantly looping in
> rtc_timer_do_work.
>
> Note that such issue doesn't happen on domUs because the ACPI
> NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
> the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
> accesses are not emulated but rather forwarded to the physical
> hardware.
>
> No functional change expected for classic PV dom0.

For the dense guys like me: what is the user-visible feature that is now being
offered with this? Would really appreciate a pointer or two.

Thanks,
Roman.

>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> for-4.14 reasoning: the fix is mostly isolated to PVH dom0, and as
> such the risk is very low of causing issues to other guests types, but
> without this fix one vCPU when using a Linux dom0 will be constantly
> looping over rtc_timer_do_work with 100% CPU usage, at least when
> using Linux 4.19 or newer.
> ---
> Changes since v1:
>  - Share the code with PV.
>  - Add access checks to the IO ports.
> ---
>  xen/arch/x86/hvm/rtc.c            | 34 ++++++++++++++++++
>  xen/arch/x86/pv/emul-priv-op.c    | 28 +++------------
>  xen/arch/x86/time.c               | 59 +++++++++++++++++++++++++++++++
>  xen/include/asm-x86/mc146818rtc.h |  3 ++
>  4 files changed, 100 insertions(+), 24 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
> index 5bbbdc0e0f..9f304a0fb4 100644
> --- a/xen/arch/x86/hvm/rtc.c
> +++ b/xen/arch/x86/hvm/rtc.c
> @@ -22,6 +22,7 @@
>   * IN THE SOFTWARE.
>   */
>
> +#include <asm/iocap.h>
>  #include <asm/mc146818rtc.h>
>  #include <asm/hvm/vpt.h>
>  #include <asm/hvm/io.h>
> @@ -808,10 +809,43 @@ void rtc_reset(struct domain *d)
>      s->pt.source = PTSRC_isa;
>  }
>
> +/* RTC mediator for HVM hardware domain. */
> +static int hw_rtc_io(int dir, unsigned int port, unsigned int size,
> +                     uint32_t *val)
> +{
> +    if ( dir == IOREQ_READ )
> +        *val = ~0;
> +
> +    if ( size != 1 )
> +    {
> +        gdprintk(XENLOG_WARNING, "bad RTC access size (%u)\n", size);
> +        return X86EMUL_OKAY;
> +    }
> +    if ( !ioports_access_permitted(current->domain, port, port) )
> +    {
> +        gdprintk(XENLOG_WARNING, "access to RTC port %x not allowed\n", port);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    if ( dir == IOREQ_WRITE )
> +        rtc_guest_write(port, *val);
> +    else
> +        *val = rtc_guest_read(port);
> +
> +    return X86EMUL_OKAY;
> +}
> +
>  void rtc_init(struct domain *d)
>  {
>      RTCState *s = domain_vrtc(d);
>
> +    if ( is_hardware_domain(d) )
> +    {
> +        /* Hardware domain gets mediated access to the physical RTC. */
> +        register_portio_handler(d, RTC_PORT(0), 2, hw_rtc_io);
> +        return;
> +    }
> +
>      if ( !has_vrtc(d) )
>          return;
>
> diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> index fad6c754ad..1597f27ed9 100644
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -280,19 +280,10 @@ static uint32_t guest_io_read(unsigned int port, unsigned int bytes,
>          {
>              sub_data = pv_pit_handler(port, 0, 0);
>          }
> -        else if ( port == RTC_PORT(0) )
> -        {
> -            sub_data = currd->arch.cmos_idx;
> -        }
> -        else if ( (port == RTC_PORT(1)) &&
> +        else if ( (port == RTC_PORT(0) || port == RTC_PORT(1)) &&
>                    ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
>          {
> -            unsigned long flags;
> -
> -            spin_lock_irqsave(&rtc_lock, flags);
> -            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> -            sub_data = inb(RTC_PORT(1));
> -            spin_unlock_irqrestore(&rtc_lock, flags);
> +            sub_data = rtc_guest_read(port);
>          }
>          else if ( (port == 0xcf8) && (bytes == 4) )
>          {
> @@ -413,21 +404,10 @@ static void guest_io_write(unsigned int port, unsigned int bytes,
>          {
>              pv_pit_handler(port, (uint8_t)data, 1);
>          }
> -        else if ( port == RTC_PORT(0) )
> -        {
> -            currd->arch.cmos_idx = data;
> -        }
> -        else if ( (port == RTC_PORT(1)) &&
> +        else if ( (port == RTC_PORT(0) || port == RTC_PORT(1)) &&
>                    ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
>          {
> -            unsigned long flags;
> -
> -            if ( pv_rtc_handler )
> -                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
> -            spin_lock_irqsave(&rtc_lock, flags);
> -            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> -            outb(data, RTC_PORT(1));
> -            spin_unlock_irqrestore(&rtc_lock, flags);
> +            rtc_guest_write(port, data);
>          }
>          else if ( (port == 0xcf8) && (bytes == 4) )
>          {
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index bbaea3aa65..e1c81067ce 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -27,6 +27,7 @@
>  #include <xen/keyhandler.h>
>  #include <xen/guest_access.h>
>  #include <asm/io.h>
> +#include <asm/iocap.h>
>  #include <asm/msr.h>
>  #include <asm/mpspec.h>
>  #include <asm/processor.h>
> @@ -1110,6 +1111,64 @@ static unsigned long get_cmos_time(void)
>      return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
>  }
>
> +/* Helpers for guest accesses to the physical RTC. */
> +unsigned int rtc_guest_read(unsigned int port)
> +{
> +    const struct domain *currd = current->domain;
> +    unsigned long flags;
> +    unsigned int data = ~0;
> +
> +    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));
> +    if ( !ioports_access_permitted(currd, port, port) )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return data;
> +    }
> +
> +    switch ( port )
> +    {
> +    case RTC_PORT(0):
> +        data = currd->arch.cmos_idx;
> +        break;
> +
> +    case RTC_PORT(1):
> +        spin_lock_irqsave(&rtc_lock, flags);
> +        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> +        data = inb(RTC_PORT(1));
> +        spin_unlock_irqrestore(&rtc_lock, flags);
> +        break;
> +    }
> +
> +    return data;
> +}
> +
> +void rtc_guest_write(unsigned int port, unsigned int data)
> +{
> +    struct domain *currd = current->domain;
> +    unsigned long flags;
> +
> +    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));
> +    if ( !ioports_access_permitted(currd, port, port) )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
> +
> +    switch ( port )
> +    {
> +    case RTC_PORT(0):
> +        currd->arch.cmos_idx = data;
> +        break;
> +
> +    case RTC_PORT(1):
> +        spin_lock_irqsave(&rtc_lock, flags);
> +        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
> +        outb(data, RTC_PORT(1));
> +        spin_unlock_irqrestore(&rtc_lock, flags);
> +        break;
> +    }
> +}
> +
>  static unsigned long get_wallclock_time(void)
>  {
>  #ifdef CONFIG_XEN_GUEST
> diff --git a/xen/include/asm-x86/mc146818rtc.h b/xen/include/asm-x86/mc146818rtc.h
> index 8758528f7c..803b236c0a 100644
> --- a/xen/include/asm-x86/mc146818rtc.h
> +++ b/xen/include/asm-x86/mc146818rtc.h
> @@ -110,4 +110,7 @@ outb_p((val),RTC_PORT(1)); \
>
>  #define RTC_IRQ 8
>
> +unsigned int rtc_guest_read(unsigned int port);
> +void rtc_guest_write(unsigned int port, unsigned int data);
> +
>  #endif /* _ASM_MC146818RTC_H */
> --
> 2.26.2
>
>
Jan Beulich June 8, 2020, 8:30 a.m. UTC | #6
On 06.06.2020 01:43, Roman Shaposhnik wrote:
> On Fri, Jun 5, 2020 at 4:03 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>>
>> Mediated access to the RTC was provided for PVHv1 dom0 using the PV
>> code paths (guest_io_{write/read}), but those accesses where never
>> implemented for PVHv2 dom0. This patch provides such mediated accesses
>> to the RTC for PVH dom0, just like it's provided for a classic PV
>> dom0.
>>
>> Pull out some of the RTC logic from guest_io_{read/write} into
>> specific helpers that can be used by both PV and HVM guests. The
>> setup of the handlers for PVH is done in rtc_init, which is already
>> used to initialize the fully emulated RTC.
>>
>> Without this a Linux PVH dom0 will read garbage when trying to access
>> the RTC, and one vCPU will be constantly looping in
>> rtc_timer_do_work.
>>
>> Note that such issue doesn't happen on domUs because the ACPI
>> NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
>> the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
>> accesses are not emulated but rather forwarded to the physical
>> hardware.
>>
>> No functional change expected for classic PV dom0.
> 
> For the dense guys like me: what is the user-visible feature that is now being
> offered with this? Would really appreciate a pointer or two.

PV Dom0 has always been permitted direct access to the hardware RTC.
This change makes PVH Dom0 follow suit.

Jan
Roger Pau Monné June 8, 2020, 8:51 a.m. UTC | #7
On Fri, Jun 05, 2020 at 04:43:12PM -0700, Roman Shaposhnik wrote:
> On Fri, Jun 5, 2020 at 4:03 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >
> > Mediated access to the RTC was provided for PVHv1 dom0 using the PV
> > code paths (guest_io_{write/read}), but those accesses where never
> > implemented for PVHv2 dom0. This patch provides such mediated accesses
> > to the RTC for PVH dom0, just like it's provided for a classic PV
> > dom0.
> >
> > Pull out some of the RTC logic from guest_io_{read/write} into
> > specific helpers that can be used by both PV and HVM guests. The
> > setup of the handlers for PVH is done in rtc_init, which is already
> > used to initialize the fully emulated RTC.
> >
> > Without this a Linux PVH dom0 will read garbage when trying to access
> > the RTC, and one vCPU will be constantly looping in
> > rtc_timer_do_work.
> >
> > Note that such issue doesn't happen on domUs because the ACPI
> > NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
> > the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
> > accesses are not emulated but rather forwarded to the physical
> > hardware.
> >
> > No functional change expected for classic PV dom0.
> 
> For the dense guys like me: what is the user-visible feature that is now being
> offered with this? Would really appreciate a pointer or two.

Without this dom0 is not able to change the date. Note that
XENPF_settime{32/64} doesn't write the changes to the RTC (at least on
x86), so dom0 needs to write such changes to the RTC so they can
survive a poweroff.

However dom0 cannot be given direct access to the registers, since the
RTC uses an indirect access interface using two IO registers, so Xen
needs to trap such accesses by dom0 in order to serialize them and
prevent conflicts with Xen accesses to the RTC.

Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 5bbbdc0e0f..9f304a0fb4 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -22,6 +22,7 @@ 
  * IN THE SOFTWARE.
  */
 
+#include <asm/iocap.h>
 #include <asm/mc146818rtc.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/io.h>
@@ -808,10 +809,43 @@  void rtc_reset(struct domain *d)
     s->pt.source = PTSRC_isa;
 }
 
+/* RTC mediator for HVM hardware domain. */
+static int hw_rtc_io(int dir, unsigned int port, unsigned int size,
+                     uint32_t *val)
+{
+    if ( dir == IOREQ_READ )
+        *val = ~0;
+
+    if ( size != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "bad RTC access size (%u)\n", size);
+        return X86EMUL_OKAY;
+    }
+    if ( !ioports_access_permitted(current->domain, port, port) )
+    {
+        gdprintk(XENLOG_WARNING, "access to RTC port %x not allowed\n", port);
+        return X86EMUL_OKAY;
+    }
+
+    if ( dir == IOREQ_WRITE )
+        rtc_guest_write(port, *val);
+    else
+        *val = rtc_guest_read(port);
+
+    return X86EMUL_OKAY;
+}
+
 void rtc_init(struct domain *d)
 {
     RTCState *s = domain_vrtc(d);
 
+    if ( is_hardware_domain(d) )
+    {
+        /* Hardware domain gets mediated access to the physical RTC. */
+        register_portio_handler(d, RTC_PORT(0), 2, hw_rtc_io);
+        return;
+    }
+
     if ( !has_vrtc(d) )
         return;
 
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index fad6c754ad..1597f27ed9 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -280,19 +280,10 @@  static uint32_t guest_io_read(unsigned int port, unsigned int bytes,
         {
             sub_data = pv_pit_handler(port, 0, 0);
         }
-        else if ( port == RTC_PORT(0) )
-        {
-            sub_data = currd->arch.cmos_idx;
-        }
-        else if ( (port == RTC_PORT(1)) &&
+        else if ( (port == RTC_PORT(0) || port == RTC_PORT(1)) &&
                   ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
         {
-            unsigned long flags;
-
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            sub_data = inb(RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
+            sub_data = rtc_guest_read(port);
         }
         else if ( (port == 0xcf8) && (bytes == 4) )
         {
@@ -413,21 +404,10 @@  static void guest_io_write(unsigned int port, unsigned int bytes,
         {
             pv_pit_handler(port, (uint8_t)data, 1);
         }
-        else if ( port == RTC_PORT(0) )
-        {
-            currd->arch.cmos_idx = data;
-        }
-        else if ( (port == RTC_PORT(1)) &&
+        else if ( (port == RTC_PORT(0) || port == RTC_PORT(1)) &&
                   ioports_access_permitted(currd, RTC_PORT(0), RTC_PORT(1)) )
         {
-            unsigned long flags;
-
-            if ( pv_rtc_handler )
-                pv_rtc_handler(currd->arch.cmos_idx & 0x7f, data);
-            spin_lock_irqsave(&rtc_lock, flags);
-            outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
-            outb(data, RTC_PORT(1));
-            spin_unlock_irqrestore(&rtc_lock, flags);
+            rtc_guest_write(port, data);
         }
         else if ( (port == 0xcf8) && (bytes == 4) )
         {
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index bbaea3aa65..e1c81067ce 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -27,6 +27,7 @@ 
 #include <xen/keyhandler.h>
 #include <xen/guest_access.h>
 #include <asm/io.h>
+#include <asm/iocap.h>
 #include <asm/msr.h>
 #include <asm/mpspec.h>
 #include <asm/processor.h>
@@ -1110,6 +1111,64 @@  static unsigned long get_cmos_time(void)
     return mktime(rtc.year, rtc.mon, rtc.day, rtc.hour, rtc.min, rtc.sec);
 }
 
+/* Helpers for guest accesses to the physical RTC. */
+unsigned int rtc_guest_read(unsigned int port)
+{
+    const struct domain *currd = current->domain;
+    unsigned long flags;
+    unsigned int data = ~0;
+
+    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));
+    if ( !ioports_access_permitted(currd, port, port) )
+    {
+        ASSERT_UNREACHABLE();
+        return data;
+    }
+
+    switch ( port )
+    {
+    case RTC_PORT(0):
+        data = currd->arch.cmos_idx;
+        break;
+
+    case RTC_PORT(1):
+        spin_lock_irqsave(&rtc_lock, flags);
+        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+        data = inb(RTC_PORT(1));
+        spin_unlock_irqrestore(&rtc_lock, flags);
+        break;
+    }
+
+    return data;
+}
+
+void rtc_guest_write(unsigned int port, unsigned int data)
+{
+    struct domain *currd = current->domain;
+    unsigned long flags;
+
+    ASSERT(port == RTC_PORT(0) || port == RTC_PORT(1));
+    if ( !ioports_access_permitted(currd, port, port) )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    switch ( port )
+    {
+    case RTC_PORT(0):
+        currd->arch.cmos_idx = data;
+        break;
+
+    case RTC_PORT(1):
+        spin_lock_irqsave(&rtc_lock, flags);
+        outb(currd->arch.cmos_idx & 0x7f, RTC_PORT(0));
+        outb(data, RTC_PORT(1));
+        spin_unlock_irqrestore(&rtc_lock, flags);
+        break;
+    }
+}
+
 static unsigned long get_wallclock_time(void)
 {
 #ifdef CONFIG_XEN_GUEST
diff --git a/xen/include/asm-x86/mc146818rtc.h b/xen/include/asm-x86/mc146818rtc.h
index 8758528f7c..803b236c0a 100644
--- a/xen/include/asm-x86/mc146818rtc.h
+++ b/xen/include/asm-x86/mc146818rtc.h
@@ -110,4 +110,7 @@  outb_p((val),RTC_PORT(1)); \
 
 #define RTC_IRQ 8
 
+unsigned int rtc_guest_read(unsigned int port);
+void rtc_guest_write(unsigned int port, unsigned int data);
+
 #endif /* _ASM_MC146818RTC_H */