Message ID | 20240408162620.1633551-1-ross.lagerwall@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/rtc: Avoid UIP flag being set for longer than expected | expand |
On 08.04.2024 18:26, Ross Lagerwall wrote: > In a test, OVMF reported an error initializing the RTC without > indicating the precise nature of the error. The only plausible > explanation I can find is as follows: > > As part of the initialization, OVMF reads register C and then reads > register A repatedly until the UIP flag is not set. If this takes longer > than 100 ms, OVMF fails and reports an error. This may happen with the > following sequence of events: > > At guest time=0s, rtc_init() calls check_update_timer() which schedules > update_timer for t=(1 - 244us). > > At t=1s, the update_timer function happens to have been called >= 244us > late. Any theory on why this timer runs so late? (It needs dealing with anyway, but I'm still curious.) > In the timer callback, it sets the UIP flag and schedules > update_timer2 for t=1s. > > Before update_timer2 runs, the guest reads register C which calls > check_update_timer(). check_update_timer() stops the scheduled > update_timer2 and since the guest time is now outside of the update > cycle, it schedules update_timer for t=(2 - 244us). > > The UIP flag will therefore be set for a whole second from t=1 to t=2 > while the guest repeatedly reads register A waiting for the UIP flag to > clear. Fix it by clearing the UIP flag when scheduling update_timer. I can accept this as a workaround (perhaps with a tweak, see below), but I wonder whether we couldn't do this in a less ad hoc way. If stop_timer() returned whether the timer was actually stopped, couldn't the clearing of UIP be made conditional upon that? > --- a/xen/arch/x86/hvm/rtc.c > +++ b/xen/arch/x86/hvm/rtc.c > @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s) > } > else > { > + s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP; > next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC; > expire_time = NOW() + next_update_time; > s->next_update_time = expire_time; Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling here? Hmm, yes, it is; the question is rather why the function calls check_update_timer() when all that'll do (due to UF now being set) is stop the other timer (in case it's also running) and clear ->use_timer. Jan
On Thu, Apr 18, 2024 at 2:24 PM Jan Beulich <jbeulich@suse.com> wrote: > > On 08.04.2024 18:26, Ross Lagerwall wrote: > > In a test, OVMF reported an error initializing the RTC without > > indicating the precise nature of the error. The only plausible > > explanation I can find is as follows: > > > > As part of the initialization, OVMF reads register C and then reads > > register A repatedly until the UIP flag is not set. If this takes longer > > than 100 ms, OVMF fails and reports an error. This may happen with the > > following sequence of events: > > > > At guest time=0s, rtc_init() calls check_update_timer() which schedules > > update_timer for t=(1 - 244us). > > > > At t=1s, the update_timer function happens to have been called >= 244us > > late. > > Any theory on why this timer runs so late? (It needs dealing with anyway, > but I'm still curious.) I don't know specifically why our testcase spotted it. I reproduced the issue by repeatedly running "xl debug-key h" so that a lot of time was spent writing to the serial console. That caused timer interrupts to run late but I suppose there could be a lot of reasons why timer interrupts occasionally run late (e.g. a live patch critical region). > > > In the timer callback, it sets the UIP flag and schedules > > update_timer2 for t=1s. > > > > Before update_timer2 runs, the guest reads register C which calls > > check_update_timer(). check_update_timer() stops the scheduled > > update_timer2 and since the guest time is now outside of the update > > cycle, it schedules update_timer for t=(2 - 244us). > > > > The UIP flag will therefore be set for a whole second from t=1 to t=2 > > while the guest repeatedly reads register A waiting for the UIP flag to > > clear. Fix it by clearing the UIP flag when scheduling update_timer. > > I can accept this as a workaround (perhaps with a tweak, see below), but > I wonder whether we couldn't do this in a less ad hoc way. If stop_timer() > returned whether the timer was actually stopped, couldn't the clearing of > UIP be made conditional upon that? I think that would work though I'd need to spend some time convincing myself that it doesn't introduce any other race conditions. I'm not convinced it is really any better than just unconditionally clearing the flag though. > > > --- a/xen/arch/x86/hvm/rtc.c > > +++ b/xen/arch/x86/hvm/rtc.c > > @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s) > > } > > else > > { > > + s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP; > > next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC; > > expire_time = NOW() + next_update_time; > > s->next_update_time = expire_time; > > Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling > here? Hmm, yes, it is; the question is rather why the function calls > check_update_timer() when all that'll do (due to UF now being set) is stop > the other timer (in case it's also running) and clear ->use_timer. > Are you suggesting something like this? diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c index 4bb1c7505534..eb4901bf129a 100644 --- a/xen/arch/x86/hvm/rtc.c +++ b/xen/arch/x86/hvm/rtc.c @@ -240,7 +240,8 @@ static void cf_check rtc_update_timer2(void *opaque) s->hw.cmos_data[RTC_REG_C] |= RTC_UF; s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP; rtc_update_irq(s); - check_update_timer(s); + stop_timer(&s->update_timer); + s->use_timer = 0; } spin_unlock(&s->lock); } That may indeed be an improvement but I'm not sure it is really related to this patch? Thanks, Ross
On 22.04.2024 18:38, Ross Lagerwall wrote: > On Thu, Apr 18, 2024 at 2:24 PM Jan Beulich <jbeulich@suse.com> wrote: >> On 08.04.2024 18:26, Ross Lagerwall wrote: >>> --- a/xen/arch/x86/hvm/rtc.c >>> +++ b/xen/arch/x86/hvm/rtc.c >>> @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s) >>> } >>> else >>> { >>> + s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP; >>> next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC; >>> expire_time = NOW() + next_update_time; >>> s->next_update_time = expire_time; >> >> Is rtc_update_timer2() clearing UIP then still necessary, ahead of calling >> here? Hmm, yes, it is; the question is rather why the function calls >> check_update_timer() when all that'll do (due to UF now being set) is stop >> the other timer (in case it's also running) and clear ->use_timer. > > Are you suggesting something like this? > > diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c > index 4bb1c7505534..eb4901bf129a 100644 > --- a/xen/arch/x86/hvm/rtc.c > +++ b/xen/arch/x86/hvm/rtc.c > @@ -240,7 +240,8 @@ static void cf_check rtc_update_timer2(void *opaque) > s->hw.cmos_data[RTC_REG_C] |= RTC_UF; > s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP; > rtc_update_irq(s); > - check_update_timer(s); > + stop_timer(&s->update_timer); > + s->use_timer = 0; > } > spin_unlock(&s->lock); > } > > That may indeed be an improvement but I'm not sure it is really related > to this patch? Well, the connection is the initial question of this part of my earlier reply. That's not to say the further change needs (or even wants) doing right here. However, upon looking again I get the impression that I was mixing up timer and timer2. The code path you alter is one where timer is set, not timer2. Overall: Reviewed-by: Jan Beulich <jbeulich@suse.com> Jan
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c index 4b31382619f4..4bb1c7505534 100644 --- a/xen/arch/x86/hvm/rtc.c +++ b/xen/arch/x86/hvm/rtc.c @@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s) } else { + s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP; next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC; expire_time = NOW() + next_update_time; s->next_update_time = expire_time;
In a test, OVMF reported an error initializing the RTC without indicating the precise nature of the error. The only plausible explanation I can find is as follows: As part of the initialization, OVMF reads register C and then reads register A repatedly until the UIP flag is not set. If this takes longer than 100 ms, OVMF fails and reports an error. This may happen with the following sequence of events: At guest time=0s, rtc_init() calls check_update_timer() which schedules update_timer for t=(1 - 244us). At t=1s, the update_timer function happens to have been called >= 244us late. In the timer callback, it sets the UIP flag and schedules update_timer2 for t=1s. Before update_timer2 runs, the guest reads register C which calls check_update_timer(). check_update_timer() stops the scheduled update_timer2 and since the guest time is now outside of the update cycle, it schedules update_timer for t=(2 - 244us). The UIP flag will therefore be set for a whole second from t=1 to t=2 while the guest repeatedly reads register A waiting for the UIP flag to clear. Fix it by clearing the UIP flag when scheduling update_timer. I was able to reproduce this issue with a synthetic test and this resolves the issue. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> --- xen/arch/x86/hvm/rtc.c | 1 + 1 file changed, 1 insertion(+)