diff mbox series

x86/rtc: Avoid UIP flag being set for longer than expected

Message ID 20240408162620.1633551-1-ross.lagerwall@citrix.com (mailing list archive)
State New, archived
Headers show
Series x86/rtc: Avoid UIP flag being set for longer than expected | expand

Commit Message

Ross Lagerwall April 8, 2024, 4:26 p.m. UTC
In a test, OVMF reported an error initializing the RTC without
indicating the precise nature of the error. The only plausible
explanation I can find is as follows:

As part of the initialization, OVMF reads register C and then reads
register A repatedly until the UIP flag is not set. If this takes longer
than 100 ms, OVMF fails and reports an error. This may happen with the
following sequence of events:

At guest time=0s, rtc_init() calls check_update_timer() which schedules
update_timer for t=(1 - 244us).

At t=1s, the update_timer function happens to have been called >= 244us
late. In the timer callback, it sets the UIP flag and schedules
update_timer2 for t=1s.

Before update_timer2 runs, the guest reads register C which calls
check_update_timer(). check_update_timer() stops the scheduled
update_timer2 and since the guest time is now outside of the update
cycle, it schedules update_timer for t=(2 - 244us).

The UIP flag will therefore be set for a whole second from t=1 to t=2
while the guest repeatedly reads register A waiting for the UIP flag to
clear. Fix it by clearing the UIP flag when scheduling update_timer.

I was able to reproduce this issue with a synthetic test and this
resolves the issue.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/hvm/rtc.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jan Beulich April 18, 2024, 1:24 p.m. UTC | #1
On 08.04.2024 18:26, Ross Lagerwall wrote:
> In a test, OVMF reported an error initializing the RTC without
> indicating the precise nature of the error. The only plausible
> explanation I can find is as follows:
> 
> As part of the initialization, OVMF reads register C and then reads
> register A repatedly until the UIP flag is not set. If this takes longer
> than 100 ms, OVMF fails and reports an error. This may happen with the
> following sequence of events:
> 
> At guest time=0s, rtc_init() calls check_update_timer() which schedules
> update_timer for t=(1 - 244us).
> 
> At t=1s, the update_timer function happens to have been called >= 244us
> late.

Any theory on why this timer runs so late? (It needs dealing with anyway,
but I'm still curious.)

> In the timer callback, it sets the UIP flag and schedules
> update_timer2 for t=1s.
> 
> Before update_timer2 runs, the guest reads register C which calls
> check_update_timer(). check_update_timer() stops the scheduled
> update_timer2 and since the guest time is now outside of the update
> cycle, it schedules update_timer for t=(2 - 244us).
> 
> The UIP flag will therefore be set for a whole second from t=1 to t=2
> while the guest repeatedly reads register A waiting for the UIP flag to
> clear. Fix it by clearing the UIP flag when scheduling update_timer.

I can accept this as a workaround (perhaps with a tweak, see below), but
I wonder whether we couldn't do this in a less ad hoc way. If stop_timer()
returned whether the timer was actually stopped, couldn't the clearing of
UIP be made conditional upon that?

> --- a/xen/arch/x86/hvm/rtc.c
> +++ b/xen/arch/x86/hvm/rtc.c
> @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s)
>          }
>          else
>          {
> +            s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
>              next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
>              expire_time = NOW() + next_update_time;
>              s->next_update_time = expire_time;

Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling
here? Hmm, yes, it is; the question is rather why the function calls
check_update_timer() when all that'll do (due to UF now being set) is stop
the other timer (in case it's also running) and clear ->use_timer.

Jan
Ross Lagerwall April 22, 2024, 4:38 p.m. UTC | #2
On Thu, Apr 18, 2024 at 2:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 08.04.2024 18:26, Ross Lagerwall wrote:
> > In a test, OVMF reported an error initializing the RTC without
> > indicating the precise nature of the error. The only plausible
> > explanation I can find is as follows:
> >
> > As part of the initialization, OVMF reads register C and then reads
> > register A repatedly until the UIP flag is not set. If this takes longer
> > than 100 ms, OVMF fails and reports an error. This may happen with the
> > following sequence of events:
> >
> > At guest time=0s, rtc_init() calls check_update_timer() which schedules
> > update_timer for t=(1 - 244us).
> >
> > At t=1s, the update_timer function happens to have been called >= 244us
> > late.
>
> Any theory on why this timer runs so late? (It needs dealing with anyway,
> but I'm still curious.)

I don't know specifically why our testcase spotted it. I reproduced the
issue by repeatedly running "xl debug-key h" so that a lot of time was
spent writing to the serial console. That caused timer interrupts to
run late but I suppose there could be a lot of reasons why timer
interrupts occasionally run late (e.g. a live patch critical region).

>
> > In the timer callback, it sets the UIP flag and schedules
> > update_timer2 for t=1s.
> >
> > Before update_timer2 runs, the guest reads register C which calls
> > check_update_timer(). check_update_timer() stops the scheduled
> > update_timer2 and since the guest time is now outside of the update
> > cycle, it schedules update_timer for t=(2 - 244us).
> >
> > The UIP flag will therefore be set for a whole second from t=1 to t=2
> > while the guest repeatedly reads register A waiting for the UIP flag to
> > clear. Fix it by clearing the UIP flag when scheduling update_timer.
>
> I can accept this as a workaround (perhaps with a tweak, see below), but
> I wonder whether we couldn't do this in a less ad hoc way. If stop_timer()
> returned whether the timer was actually stopped, couldn't the clearing of
> UIP be made conditional upon that?

I think that would work though I'd need to spend some time convincing
myself that it doesn't introduce any other race conditions. I'm not
convinced it is really any better than just unconditionally clearing
the flag though.

>
> > --- a/xen/arch/x86/hvm/rtc.c
> > +++ b/xen/arch/x86/hvm/rtc.c
> > @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s)
> >          }
> >          else
> >          {
> > +            s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
> >              next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
> >              expire_time = NOW() + next_update_time;
> >              s->next_update_time = expire_time;
>
> Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling
> here? Hmm, yes, it is; the question is rather why the function calls
> check_update_timer() when all that'll do (due to UF now being set) is stop
> the other timer (in case it's also running) and clear ->use_timer.
>

Are you suggesting something like this?

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 4bb1c7505534..eb4901bf129a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -240,7 +240,8 @@ static void cf_check rtc_update_timer2(void *opaque)
         s->hw.cmos_data[RTC_REG_C] |= RTC_UF;
         s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
         rtc_update_irq(s);
-        check_update_timer(s);
+        stop_timer(&s->update_timer);
+        s->use_timer = 0;
     }
     spin_unlock(&s->lock);
 }

That may indeed be an improvement but I'm not sure it is really related
to this patch?

Thanks,
Ross
Jan Beulich April 23, 2024, 8:04 a.m. UTC | #3
On 22.04.2024 18:38, Ross Lagerwall wrote:
> On Thu, Apr 18, 2024 at 2:24 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 08.04.2024 18:26, Ross Lagerwall wrote:
>>> --- a/xen/arch/x86/hvm/rtc.c
>>> +++ b/xen/arch/x86/hvm/rtc.c
>>> @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s)
>>>          }
>>>          else
>>>          {
>>> +            s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
>>>              next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
>>>              expire_time = NOW() + next_update_time;
>>>              s->next_update_time = expire_time;
>>
>> Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling
>> here? Hmm, yes, it is; the question is rather why the function calls
>> check_update_timer() when all that'll do (due to UF now being set) is stop
>> the other timer (in case it's also running) and clear ->use_timer.
> 
> Are you suggesting something like this?
> 
> diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
> index 4bb1c7505534..eb4901bf129a 100644
> --- a/xen/arch/x86/hvm/rtc.c
> +++ b/xen/arch/x86/hvm/rtc.c
> @@ -240,7 +240,8 @@ static void cf_check rtc_update_timer2(void *opaque)
>          s->hw.cmos_data[RTC_REG_C] |= RTC_UF;
>          s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
>          rtc_update_irq(s);
> -        check_update_timer(s);
> +        stop_timer(&s->update_timer);
> +        s->use_timer = 0;
>      }
>      spin_unlock(&s->lock);
>  }
> 
> That may indeed be an improvement but I'm not sure it is really related
> to this patch?

Well, the connection is the initial question of this part of my earlier
reply. That's not to say the further change needs (or even wants) doing
right here. However, upon looking again I get the impression that I was
mixing up timer and timer2. The code path you alter is one where timer
is set, not timer2.

Overall:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 4b31382619f4..4bb1c7505534 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -202,6 +202,7 @@  static void check_update_timer(RTCState *s)
         }
         else
         {
+            s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
             next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
             expire_time = NOW() + next_update_time;
             s->next_update_time = expire_time;