Message ID | ZS-1wVlZjdAJyUz6@mail.soc.lip6.fr (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Xen 4.18 pvshim console issue (with patch) | expand |
On Wed, Oct 18, 2023 at 12:38:57PM +0200, Manuel Bouyer wrote: > Hello, > With Xen 4.18, a PV domain running under pvshim doesn't get console input. > This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > hardwired to 0), so console_input_domain() will never select that domain > as input. > > The attached patch fixes it by translating 0 to the real domain id for > pvshim, but there may be a better way to do this. Small improvement, print the real domain ID instead of DOM0
On 18/10/2023 11:38 am, Manuel Bouyer wrote: > Hello, > With Xen 4.18, a PV domain running under pvshim doesn't get console input. > This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > hardwired to 0), so console_input_domain() will never select that domain > as input. > > The attached patch fixes it by translating 0 to the real domain id for > pvshim, but there may be a better way to do this. > Thankyou for the report. First, CC'ing Henry as 4.18 release manager. There have been changes in how this works recently in 4.18, notably c/s c2581c58bec96. However, it's not obvious whether this worked in 4.17 or not. i.e. whether it's a regression in 4.18, or whether it's been broken since PV Shim was introduced. ~Andrew
On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote: > On 18/10/2023 11:38 am, Manuel Bouyer wrote: > > Hello, > > With Xen 4.18, a PV domain running under pvshim doesn't get console input. > > This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > > hardwired to 0), so console_input_domain() will never select that domain > > as input. > > > > The attached patch fixes it by translating 0 to the real domain id for > > pvshim, but there may be a better way to do this. > > > > Thankyou for the report. > > First, CC'ing Henry as 4.18 release manager. > > There have been changes in how this works recently in 4.18, notably c/s > c2581c58bec96. Yes, it looks like this one introduced the problem. Before this, we would switch console_rx to 1 without checking if domain (console_rx - 1) exists, and console_rx == 1 is a special case in __serial_rx() > > However, it's not obvious whether this worked in 4.17 or not. i.e. > whether it's a regression in 4.18, or whether it's been broken since PV > Shim was introduced. I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15
On 18/10/2023 12:20 pm, Manuel Bouyer wrote: > On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote: >> On 18/10/2023 11:38 am, Manuel Bouyer wrote: >>> Hello, >>> With Xen 4.18, a PV domain running under pvshim doesn't get console input. >>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is >>> hardwired to 0), so console_input_domain() will never select that domain >>> as input. >>> >>> The attached patch fixes it by translating 0 to the real domain id for >>> pvshim, but there may be a better way to do this. >>> >> Thankyou for the report. >> >> First, CC'ing Henry as 4.18 release manager. >> >> There have been changes in how this works recently in 4.18, notably c/s >> c2581c58bec96. > Yes, it looks like this one introduced the problem. > Before this, we would switch console_rx to 1 without checking if > domain (console_rx - 1) exists, and console_rx == 1 is a special case > in __serial_rx() > >> However, it's not obvious whether this worked in 4.17 or not. i.e. >> whether it's a regression in 4.18, or whether it's been broken since PV >> Shim was introduced. > I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15 > That commit is new in 4.18, so Henry - this is a release critical/blocker owing to it being a regression vs 4.17. I'll try and put some brainpower towards it when I get some other 4.18 work sorted. ~Andrew
Hi Andrew, > On Oct 18, 2023, at 19:23, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > On 18/10/2023 12:20 pm, Manuel Bouyer wrote: >> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote: >>> On 18/10/2023 11:38 am, Manuel Bouyer wrote: >>>> Hello, >>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input. >>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is >>>> hardwired to 0), so console_input_domain() will never select that domain >>>> as input. >>>> >>>> The attached patch fixes it by translating 0 to the real domain id for >>>> pvshim, but there may be a better way to do this. >>>> >>> Thankyou for the report. >>> >>> First, CC'ing Henry as 4.18 release manager. >>> >>> There have been changes in how this works recently in 4.18, notably c/s >>> c2581c58bec96. >> Yes, it looks like this one introduced the problem. >> Before this, we would switch console_rx to 1 without checking if >> domain (console_rx - 1) exists, and console_rx == 1 is a special case >> in __serial_rx() >> >>> However, it's not obvious whether this worked in 4.17 or not. i.e. >>> whether it's a regression in 4.18, or whether it's been broken since PV >>> Shim was introduced. >> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15 >> > > That commit is new in 4.18, so Henry - this is a release > critical/blocker owing to it being a regression vs 4.17. Noted down. Out of curiosity, do we have maintainers for that specific driver? It would be good to looping them in. Kind regards, Henry > > I'll try and put some brainpower towards it when I get some other 4.18 > work sorted. > > ~Andrew
On 18.10.2023 13:20, Manuel Bouyer wrote: > On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote: >> On 18/10/2023 11:38 am, Manuel Bouyer wrote: >>> Hello, >>> With Xen 4.18, a PV domain running under pvshim doesn't get console input. >>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is >>> hardwired to 0), so console_input_domain() will never select that domain >>> as input. >>> >>> The attached patch fixes it by translating 0 to the real domain id for >>> pvshim, but there may be a better way to do this. >>> >> >> Thankyou for the report. >> >> First, CC'ing Henry as 4.18 release manager. >> >> There have been changes in how this works recently in 4.18, notably c/s >> c2581c58bec96. > > Yes, it looks like this one introduced the problem. > Before this, we would switch console_rx to 1 without checking if > domain (console_rx - 1) exists, and console_rx == 1 is a special case > in __serial_rx() > >> >> However, it's not obvious whether this worked in 4.17 or not. i.e. >> whether it's a regression in 4.18, or whether it's been broken since PV >> Shim was introduced. > > I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15 From looking at the code, it doesn't look like it would: There switch_serial_input() toggles console_rx between 0 and 1 afaict, and console_input_domain() handles value 0 as "Xen" (like in 4.18), while otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying to get hold of Dom0's domain structure, not Dom1's). Jan
On 18.10.2023 12:38, Manuel Bouyer wrote: > Hello, > With Xen 4.18, a PV domain running under pvshim doesn't get console input. > This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > hardwired to 0), so console_input_domain() will never select that domain > as input. > > The attached patch fixes it by translating 0 to the real domain id for > pvshim, but there may be a better way to do this. My primary observation with the patch is that it presumably won't build for other than x86. There are also indentation and other style issues, no S-o-b, and no description. But I wonder whether a different approach doesn't want taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no need for any other changes? Also Cc-ing Michal as the author of the (possibly) offending patch. Jan
On Wed, Oct 18, 2023 at 03:24:22PM +0200, Jan Beulich wrote: > On 18.10.2023 13:20, Manuel Bouyer wrote: > > On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote: > >> On 18/10/2023 11:38 am, Manuel Bouyer wrote: > >>> Hello, > >>> With Xen 4.18, a PV domain running under pvshim doesn't get console input. > >>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > >>> hardwired to 0), so console_input_domain() will never select that domain > >>> as input. > >>> > >>> The attached patch fixes it by translating 0 to the real domain id for > >>> pvshim, but there may be a better way to do this. > >>> > >> > >> Thankyou for the report. > >> > >> First, CC'ing Henry as 4.18 release manager. > >> > >> There have been changes in how this works recently in 4.18, notably c/s > >> c2581c58bec96. > > > > Yes, it looks like this one introduced the problem. > > Before this, we would switch console_rx to 1 without checking if > > domain (console_rx - 1) exists, and console_rx == 1 is a special case > > in __serial_rx() > > > >> > >> However, it's not obvious whether this worked in 4.17 or not. i.e. > >> whether it's a regression in 4.18, or whether it's been broken since PV > >> Shim was introduced. > > > > I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15 > > >From looking at the code, it doesn't look like it would: There > switch_serial_input() toggles console_rx between 0 and 1 afaict, and > console_input_domain() handles value 0 as "Xen" (like in 4.18), while > otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying > to get hold of Dom0's domain structure, not Dom1's). Well, I have a 4.15.5 installation in production and I can tell you that with PV+PVSHIM, the console is working, for sure. AFAIK console_input_domain() is called only on ARM, from vpl011_write_data_xen(). It's never called for x86.
On Wed, Oct 18, 2023 at 03:29:08PM +0200, Jan Beulich wrote: > On 18.10.2023 12:38, Manuel Bouyer wrote: > > Hello, > > With Xen 4.18, a PV domain running under pvshim doesn't get console input. > > This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > > hardwired to 0), so console_input_domain() will never select that domain > > as input. > > > > The attached patch fixes it by translating 0 to the real domain id for > > pvshim, but there may be a better way to do this. > > My primary observation with the patch is that it presumably won't build for > other than x86. It's possible, I don't have other platform to test > There are also indentation and other style issues, no S-o-b, > and no description. I'll let whoever commit it deal with this > But I wonder whether a different approach doesn't want > taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no > need for any other changes? max_init_domid=1 won't help, because we're using the real domid from the real hypervisor here. So it can be arbitrary large (in my tests I did a printk here, and the domid was matching the id from 'xl list')
On 18.10.2023 15:36, Manuel Bouyer wrote: > On Wed, Oct 18, 2023 at 03:24:22PM +0200, Jan Beulich wrote: >> On 18.10.2023 13:20, Manuel Bouyer wrote: >>> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote: >>>> On 18/10/2023 11:38 am, Manuel Bouyer wrote: >>>>> Hello, >>>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input. >>>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is >>>>> hardwired to 0), so console_input_domain() will never select that domain >>>>> as input. >>>>> >>>>> The attached patch fixes it by translating 0 to the real domain id for >>>>> pvshim, but there may be a better way to do this. >>>>> >>>> >>>> Thankyou for the report. >>>> >>>> First, CC'ing Henry as 4.18 release manager. >>>> >>>> There have been changes in how this works recently in 4.18, notably c/s >>>> c2581c58bec96. >>> >>> Yes, it looks like this one introduced the problem. >>> Before this, we would switch console_rx to 1 without checking if >>> domain (console_rx - 1) exists, and console_rx == 1 is a special case >>> in __serial_rx() >>> >>>> >>>> However, it's not obvious whether this worked in 4.17 or not. i.e. >>>> whether it's a regression in 4.18, or whether it's been broken since PV >>>> Shim was introduced. >>> >>> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15 >> >> >From looking at the code, it doesn't look like it would: There >> switch_serial_input() toggles console_rx between 0 and 1 afaict, and >> console_input_domain() handles value 0 as "Xen" (like in 4.18), while >> otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying >> to get hold of Dom0's domain structure, not Dom1's). > > Well, I have a 4.15.5 installation in production and I can tell you that > with PV+PVSHIM, the console is working, for sure. > > AFAIK console_input_domain() is called only on ARM, from > vpl011_write_data_xen(). It's never called for x86. Oh, indeed. I took your patch touching the function as meaning it's used on x86. This then means my earlier suggestion won't work, as we need console_rx to have the value 1 for input to be accepted from Dom1. Which in turn means your change to switch_serial_input(), suitably cleaned up, is then likely the best we can do, despite me not really liking the shim special casing. As to cleaning up - besides the build, indentation, and style issues I think you want to drop the "&& pv_shim" part of the condition (as get_initial_domain_id() takes care of that already), and ideally you'd also move the new "domid" variable into the more narrow scope. If you don't feel like providing a proper (updated) patch, I'm happy to take over, but to retain indication of your initial work I'd need you to permit me to add your S-o-b (alongside mine). Jan
On Wed, Oct 18, 2023 at 03:51:54PM +0200, Jan Beulich wrote: > On 18.10.2023 15:36, Manuel Bouyer wrote: > > On Wed, Oct 18, 2023 at 03:24:22PM +0200, Jan Beulich wrote: > >> On 18.10.2023 13:20, Manuel Bouyer wrote: > >>> On Wed, Oct 18, 2023 at 11:44:22AM +0100, Andrew Cooper wrote: > >>>> On 18/10/2023 11:38 am, Manuel Bouyer wrote: > >>>>> Hello, > >>>>> With Xen 4.18, a PV domain running under pvshim doesn't get console input. > >>>>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > >>>>> hardwired to 0), so console_input_domain() will never select that domain > >>>>> as input. > >>>>> > >>>>> The attached patch fixes it by translating 0 to the real domain id for > >>>>> pvshim, but there may be a better way to do this. > >>>>> > >>>> > >>>> Thankyou for the report. > >>>> > >>>> First, CC'ing Henry as 4.18 release manager. > >>>> > >>>> There have been changes in how this works recently in 4.18, notably c/s > >>>> c2581c58bec96. > >>> > >>> Yes, it looks like this one introduced the problem. > >>> Before this, we would switch console_rx to 1 without checking if > >>> domain (console_rx - 1) exists, and console_rx == 1 is a special case > >>> in __serial_rx() > >>> > >>>> > >>>> However, it's not obvious whether this worked in 4.17 or not. i.e. > >>>> whether it's a regression in 4.18, or whether it's been broken since PV > >>>> Shim was introduced. > >>> > >>> I don't know for 4.16 or 4.17 but I can tell that it's working in 4.15 > >> > >> >From looking at the code, it doesn't look like it would: There > >> switch_serial_input() toggles console_rx between 0 and 1 afaict, and > >> console_input_domain() handles value 0 as "Xen" (like in 4.18), while > >> otherwise it calls rcu_lock_domain_by_id(console_rx - 1) (i.e. trying > >> to get hold of Dom0's domain structure, not Dom1's). > > > > Well, I have a 4.15.5 installation in production and I can tell you that > > with PV+PVSHIM, the console is working, for sure. > > > > AFAIK console_input_domain() is called only on ARM, from > > vpl011_write_data_xen(). It's never called for x86. > > Oh, indeed. I took your patch touching the function as meaning it's used > on x86. This then means my earlier suggestion won't work, as we need > console_rx to have the value 1 for input to be accepted from Dom1. Which > in turn means your change to switch_serial_input(), suitably cleaned up, > is then likely the best we can do, despite me not really liking the shim > special casing. > > As to cleaning up - besides the build, indentation, and style issues I > think you want to drop the "&& pv_shim" part of the condition (as > get_initial_domain_id() takes care of that already), and ideally you'd > also move the new "domid" variable into the more narrow scope. If you > don't feel like providing a proper (updated) patch, I'm happy to take > over, but to retain indication of your initial work I'd need you to > permit me to add your S-o-b (alongside mine). No problem, please do !
Hi, On 18/10/2023 15:29, Jan Beulich wrote: > > > On 18.10.2023 12:38, Manuel Bouyer wrote: >> Hello, >> With Xen 4.18, a PV domain running under pvshim doesn't get console input. >> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is >> hardwired to 0), so console_input_domain() will never select that domain >> as input. >> >> The attached patch fixes it by translating 0 to the real domain id for >> pvshim, but there may be a better way to do this. > > My primary observation with the patch is that it presumably won't build for > other than x86. There are also indentation and other style issues, no S-o-b, > and no description. But I wonder whether a different approach doesn't want > taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no > need for any other changes? > > Also Cc-ing Michal as the author of the (possibly) offending patch. What if we set max_init_domid in pvshim case to the value returned by get_initial_domain_id() in create_dom0()? The drawback is that we would waste some time in a loop if the value is large. ~Michal
On Wed, Oct 18, 2023 at 04:18:26PM +0200, Michal Orzel wrote: > Hi, > > On 18/10/2023 15:29, Jan Beulich wrote: > > > > > > On 18.10.2023 12:38, Manuel Bouyer wrote: > >> Hello, > >> With Xen 4.18, a PV domain running under pvshim doesn't get console input. > >> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is > >> hardwired to 0), so console_input_domain() will never select that domain > >> as input. > >> > >> The attached patch fixes it by translating 0 to the real domain id for > >> pvshim, but there may be a better way to do this. > > > > My primary observation with the patch is that it presumably won't build for > > other than x86. There are also indentation and other style issues, no S-o-b, > > and no description. But I wonder whether a different approach doesn't want > > taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no > > need for any other changes? > > > > Also Cc-ing Michal as the author of the (possibly) offending patch. > What if we set max_init_domid in pvshim case to the value returned by get_initial_domain_id() > in create_dom0()? The drawback is that we would waste some time in a loop if the value is large. I considered this too, but max_init_domid is a constant on x86; it looked like a more intrusive change.
On 18.10.2023 16:18, Michal Orzel wrote: > On 18/10/2023 15:29, Jan Beulich wrote: >> On 18.10.2023 12:38, Manuel Bouyer wrote: >>> Hello, >>> With Xen 4.18, a PV domain running under pvshim doesn't get console input. >>> This is because the domain id in pvshim isn't 0 (and on x86 max_init_domid is >>> hardwired to 0), so console_input_domain() will never select that domain >>> as input. >>> >>> The attached patch fixes it by translating 0 to the real domain id for >>> pvshim, but there may be a better way to do this. >> >> My primary observation with the patch is that it presumably won't build for >> other than x86. There are also indentation and other style issues, no S-o-b, >> and no description. But I wonder whether a different approach doesn't want >> taking: Wouldn't it help if max_init_domid was 1 in the shim case, with no >> need for any other changes? >> >> Also Cc-ing Michal as the author of the (possibly) offending patch. > What if we set max_init_domid in pvshim case to the value returned by get_initial_domain_id() > in create_dom0()? The drawback is that we would waste some time in a loop if the value is large. I'd like to avoid specifically that. Jan
--- drivers/char/console.c.orig 2023-10-18 12:24:57.221891748 +0200 +++ drivers/char/console.c 2023-10-18 12:30:26.072844802 +0200 @@ -478,14 +478,20 @@ /* Make sure to rcu_unlock_domain after use */ struct domain *console_input_domain(void) { + domid_t domid; if ( console_rx == 0 ) return NULL; - return rcu_lock_domain_by_id(console_rx - 1); + if (console_rx == 1 && pv_shim) + domid = get_initial_domain_id(); + else + domid = console_rx - 1; + return rcu_lock_domain_by_id(domid); } static void switch_serial_input(void) { unsigned int next_rx = console_rx; + domid_t domid; /* * Rotate among Xen, dom0 and boot-time created domUs while skipping @@ -502,7 +508,11 @@ break; } - d = rcu_lock_domain_by_id(next_rx - 1); + if (next_rx == 1 && pv_shim) + domid = get_initial_domain_id(); + else + domid = next_rx - 1; + d = rcu_lock_domain_by_id(domid); if ( d ) { rcu_unlock_domain(d);