diff mbox series

console: generalize the ability for domU access

Message ID 20230801160608.19219-1-dpsmith@apertussolutions.com (mailing list archive)
State New, archived
Headers show
Series console: generalize the ability for domU access | expand

Commit Message

Daniel P. Smith Aug. 1, 2023, 4:06 p.m. UTC
This patch reworks the console rotation logic to provide a general mechanism to
incorporate domU in to the rotation. It does so by walking the domain list
using the XSM console privlege check to determine if the domain is given access.

In reworking the rotation logic, the assumption that the hardware domain is the
first domain created is removed and is changed to explicitly locate the
hardware domain at boot. As part of this removal, the reliance on the
`max_init_domid` global is eliminated, allowing the removal of a compatibility
`#define` for the x86 arch.

Suggested-by: Stefano Stabellini <stefano.stabellini@amd.com>
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---

This patch was developed as part of enabling PVH domain construction for
hyperlaunch. There will be a corresponding Linux patch to enable the kernel to
make use of console_io when running as a DomU. For x86, this will only be
usable by leveraging FLASK to assign the privilege to a label that is applied
to domains desired to be able to use the Xen console. When hyperlaunch is
merged, it will become possible to assign the privilege without FLASK. This
should work for Arm environments with or without vp1011 backend.

 xen/arch/x86/include/asm/setup.h |   2 -
 xen/drivers/char/console.c       | 134 +++++++++++++++++++++++--------
 2 files changed, 99 insertions(+), 37 deletions(-)

Comments

Jan Beulich Aug. 2, 2023, 11:01 a.m. UTC | #1
On 01.08.2023 18:06, Daniel P. Smith wrote:
> This patch reworks the console rotation logic to provide a general mechanism to
> incorporate domU in to the rotation. It does so by walking the domain list
> using the XSM console privlege check to determine if the domain is given access.
> 
> In reworking the rotation logic, the assumption that the hardware domain is the
> first domain created is removed and is changed to explicitly locate the
> hardware domain at boot.

I guess I'm unable to identify any "at boot only" code. I'm also
puzzled by this indicating there is a need to do so, when the global
variable "hardware_domain" is available, and you actually use it.

> --- a/xen/drivers/char/console.c
> +++ b/xen/drivers/char/console.c
> @@ -473,45 +473,102 @@ static void cf_check dump_console_ring_key(unsigned char key)
>   */
>  static unsigned int __read_mostly console_rx = 0;
>  
> -#define max_console_rx (max_init_domid + 1)
> +#define CON_RX_DOMID (console_rx - 1)
>  
>  /* Make sure to rcu_unlock_domain after use */
>  struct domain *console_input_domain(void)
>  {
>      if ( console_rx == 0 )
>              return NULL;
> -    return rcu_lock_domain_by_id(console_rx - 1);
> +    return rcu_lock_domain_by_id(CON_RX_DOMID);
>  }
>  
>  static void switch_serial_input(void)
>  {
> -    unsigned int next_rx = console_rx;
> +    struct domain *next, *d = console_input_domain();

Looks like "next" cannot be pointer-to-const just because the XSM hooks
still aren't properly const-ified. Oh well.

> -    /*
> -     * Rotate among Xen, dom0 and boot-time created domUs while skipping
> -     * switching serial input to non existing domains.
> -     */

While it would need adjustment, I don't think the comment wants deleting
altogether.

> -    for ( ; ; )
> +    if ( d == NULL )

This covers both Xen receiving input and the domain receiving input having
gone away. Originally in the latter case the next sequential (in domid
numbering) domain would be switched to. In the new logic you start over
from the beginning of the domain list. Such a change in behavior (if
deemed acceptable at all, which I'm not convinced of) needs calling out in
the description.

>      {
> -        struct domain *d;
> +        if ( hardware_domain )
> +        {
> +            console_rx = hardware_domain->domain_id + 1;
> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);

Here and elsewhere - why %d when original code properly used %u? I also
think there are now quite a few too many of these all identical
printk()s.

> +            goto out; //print switch_code statement & newline

Leftover development comment? (There's at least one more.)

> +        }
> +        else

Please avoid "else" after an if() that ends in "return", "goto", or
alike.

> +        {
> +            for_each_domain(next)

What guarantees that the list won't change behind your back? You don't
hold domlist_read_lock here afaict. It might be that you're safe because
that lock is an RCU one and this function is only invoked at init time
or from some form of interrupt handler. But that's far from obvious and
will hence need both properly confirming and stating in a comment. (It
is actually this concern, iirc, which so far had us avoid iterating the
domain list here.)

> +            {
> +                if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
> +                {
> +                    console_rx = next->domain_id + 1;
> +                    printk("*** Serial input to DOM%d", CON_RX_DOMID);
> +                    goto out; //print switch_code statement & newline
> +                }
> +            }
>  
> -        if ( next_rx++ >= max_console_rx )
> +            console_rx = 0;
> +            printk("*** Serial input to Xen");
> +            goto out;
> +        }
> +    }
> +
> +    for ( next = rcu_dereference(d->next_in_list); next != NULL;
> +          next = rcu_dereference(next->next_in_list) )

This looks like an open-coded continuation of for_each_domain() - I'm
afraid I'm wary of introducing anything like this.

> +    {
> +        if ( hardware_domain && next == hardware_domain )
>          {
>              console_rx = 0;
>              printk("*** Serial input to Xen");
> -            break;
> +            goto out;

Since you use "goto" anyway, this wants introducing a 2nd label (maybe
"xen"?) ahead of the identical code you add further down (immediately
ahead of the "out" label), to avoid code duplication.

>          }
>  
> -        d = rcu_lock_domain_by_id(next_rx - 1);
> -        if ( d )
> +        if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>          {
> -            rcu_unlock_domain(d);
> -            console_rx = next_rx;
> -            printk("*** Serial input to DOM%u", next_rx - 1);
> -            break;
> +            console_rx = next->domain_id + 1;
> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);
> +            goto out;
> +        }
> +    }
> +
> +    /*
> +     * Hit the end of the domain list and instead of assuming that the
> +     * hardware domain is the first in the list, get the first domain
> +     * in the domain list and then if it is not the hardware domain or
> +     * does not have console privilege, iterate the list until we find
> +     * the hardware domain or a domain with console privilege.
> +     */
> +    if ( next == NULL )
> +    {
> +        for_each_domain(next)
> +        {
> +            if ( hardware_domain && next == hardware_domain )
> +            {
> +                console_rx = 0;
> +                printk("*** Serial input to Xen");
> +                goto out;
> +            }
> +
> +            if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
> +            {
> +                console_rx = next->domain_id + 1;
> +                printk("*** Serial input to DOM%d", CON_RX_DOMID);
> +                goto out;
> +            }
>          }
>      }
>  
> +    /*
> +     * If we got here, could not find a domain with console io privilege.
> +     * Default to Xen.
> +     */

"Default to" is a little odd when there are no other options.

> +    console_rx = 0;
> +    printk("*** Serial input to Xen");
> +
> +out:

Labels indented by at least one blank please.

> @@ -520,12 +577,11 @@ static void switch_serial_input(void)
>  
>  static void __serial_rx(char c, struct cpu_user_regs *regs)
>  {
> -    switch ( console_rx )
> -    {
> -    case 0:
> +    if ( console_rx == 0 )

By using CON_RX_DOMID everywhere else you try to carefully avoid and
open-coded assumptions on the precise biasing used there. With this
it would seem to me that here "CON_RX_DOMID > DOMID_MASK" would be
more in line with that model then.

>          return handle_keypress(c, regs);
>  
> -    case 1:
> +    if ( hardware_domain->domain_id == CON_RX_DOMID )

No check of hardware_domain against NULL?

> +    {
>          /*
>           * Deliver input to the hardware domain buffer, unless it is
>           * already full.
> @@ -538,31 +594,37 @@ static void __serial_rx(char c, struct cpu_user_regs *regs)
>           * getting stuck.
>           */
>          send_global_virq(VIRQ_CONSOLE);
> -        break;
> -
> -#ifdef CONFIG_SBSA_VUART_CONSOLE
> -    default:
> +    }
> +    else
>      {
> -        struct domain *d = rcu_lock_domain_by_id(console_rx - 1);
> +        struct domain *d = rcu_lock_domain_by_any_id(CON_RX_DOMID);
>  
> +        if ( d == NULL )
> +            goto unlock_out;
> +
> +#ifdef CONFIG_SBSA_VUART_CONSOLE
>          /*
>           * If we have a properly initialized vpl011 console for the
>           * domain, without a full PV ring to Dom0 (in that case input
>           * comes from the PV ring), then send the character to it.
>           */
> -        if ( d != NULL &&
> -             !d->arch.vpl011.backend_in_domain &&
> +        if ( !d->arch.vpl011.backend_in_domain &&
>               d->arch.vpl011.backend.xen != NULL )
> +        {
>              vpl011_rx_char_xen(d, c);
> -        else
> -            printk("Cannot send chars to Dom%d: no UART available\n",
> -                   console_rx - 1);
> +            goto unlock_out;
> +        }
> +#endif
> +
> +        if ( (serial_rx_prod - serial_rx_cons) != SERIAL_RX_SIZE )
> +            serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c;

This is Dom0's buffer; I don't think sharing with DomU-s is correct.
You also cannot ...

> @@ -717,6 +779,8 @@ long do_console_io(
>          rc = -E2BIG;
>          if ( count > INT_MAX )
>              break;
> +        if ( CON_RX_DOMID != current->domain->domain_id )
> +            return 0;
>  
>          rc = 0;
>          while ( (serial_rx_cons != serial_rx_prod) && (rc < count) )

... assume that by the time this hypercall is invoked input focus
hasn't switched. I think there's no way around a per-domain input
buffer, which of course would need setting up only for console-io-
capable domains.

> @@ -1107,7 +1171,7 @@ void __init console_endboot(void)
>       * a useful 'how to switch' message.
>       */
>      if ( opt_conswitch[1] == 'x' )
> -        console_rx = max_console_rx;
> +        console_rx = 0;

I can't bring this change in line with the comment ahead of the if():
Won't this result in switch_serial_input() switching to Dom0?

Jan
Stefano Stabellini Aug. 2, 2023, 11:58 p.m. UTC | #2
On Wed, 2 Aug 2023, Jan Beulich wrote:
> > -    for ( ; ; )
> > +    if ( d == NULL )
> 
> This covers both Xen receiving input and the domain receiving input having
> gone away. Originally in the latter case the next sequential (in domid
> numbering) domain would be switched to. In the new logic you start over
> from the beginning of the domain list. Such a change in behavior (if
> deemed acceptable at all, which I'm not convinced of) needs calling out in
> the description.

I think it would be best to keep the current behavior as we already
have people using it unless we have strong reasons to change it.
Daniel P. Smith Aug. 3, 2023, 12:56 p.m. UTC | #3
On 8/2/23 07:01, Jan Beulich wrote:
> On 01.08.2023 18:06, Daniel P. Smith wrote:
>> This patch reworks the console rotation logic to provide a general mechanism to
>> incorporate domU in to the rotation. It does so by walking the domain list
>> using the XSM console privlege check to determine if the domain is given access.
>>
>> In reworking the rotation logic, the assumption that the hardware domain is the
>> first domain created is removed and is changed to explicitly locate the
>> hardware domain at boot.
> 
> I guess I'm unable to identify any "at boot only" code. I'm also
> puzzled by this indicating there is a need to do so, when the global
> variable "hardware_domain" is available, and you actually use it.

As to the "at boot only", there is a switch, which upon closer look I 
have broken and will now have to fix, that if the 2nd char of the 
'conswitch' boot param is 'x', then the console is not supposed to auto 
switch to "dom0".

I was trying to keep the console_rx tracking logic the same, but I have 
already diverted far from what was there, I might as well rewrite that 
as well. Doing so would allow the value 0 can actually reflect domain id 
0 and not have to continue the off-by-one game played mapping domid 
from/to console_rx.

>> --- a/xen/drivers/char/console.c
>> +++ b/xen/drivers/char/console.c
>> @@ -473,45 +473,102 @@ static void cf_check dump_console_ring_key(unsigned char key)
>>    */
>>   static unsigned int __read_mostly console_rx = 0;
>>   
>> -#define max_console_rx (max_init_domid + 1)
>> +#define CON_RX_DOMID (console_rx - 1)
>>   
>>   /* Make sure to rcu_unlock_domain after use */
>>   struct domain *console_input_domain(void)
>>   {
>>       if ( console_rx == 0 )
>>               return NULL;
>> -    return rcu_lock_domain_by_id(console_rx - 1);
>> +    return rcu_lock_domain_by_id(CON_RX_DOMID);
>>   }
>>   
>>   static void switch_serial_input(void)
>>   {
>> -    unsigned int next_rx = console_rx;
>> +    struct domain *next, *d = console_input_domain();
> 
> Looks like "next" cannot be pointer-to-const just because the XSM hooks
> still aren't properly const-ified. Oh well.

Something I can look at another point as XSM maintainer.

>> -    /*
>> -     * Rotate among Xen, dom0 and boot-time created domUs while skipping
>> -     * switching serial input to non existing domains.
>> -     */
> 
> While it would need adjustment, I don't think the comment wants deleting
> altogether.

Ack.

>> -    for ( ; ; )
>> +    if ( d == NULL )
> 
> This covers both Xen receiving input and the domain receiving input having
> gone away. Originally in the latter case the next sequential (in domid
> numbering) domain would be switched to. In the new logic you start over
> from the beginning of the domain list. Such a change in behavior (if
> deemed acceptable at all, which I'm not convinced of) needs calling out in
> the description.

No, the intent is to keep the existing behavior. Again, this is a 
fallout from trying to track by domid with the assumptions that 1. only 
boot time constructed domains would be given access, 2. those domains 
where created with sequential domids and 3.) those domains never went 
away (or if they did, it would result in a dead spot in the rotation). 
As stated above, just dropping the pretense of tracking by domid will 
provide a simpler solution and maintain the existing behavior, minus the 
possibility of having dead spots in the rotation.

>>       {
>> -        struct domain *d;
>> +        if ( hardware_domain )
>> +        {
>> +            console_rx = hardware_domain->domain_id + 1;
>> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);
> 
> Here and elsewhere - why %d when original code properly used %u? I also
> think there are now quite a few too many of these all identical
> printk()s.

Good question, I did not write the line, it was copy/paste from 
elsewhere in the file and then continued to replicate from there.

>> +            goto out; //print switch_code statement & newline
> 
> Leftover development comment? (There's at least one more.)

Yes and no, the comment came from elsewhere in the file and early in 
development it I place it here to later decide if it should stay (and 
get converted into a compliant comment). I will drop it in the next 
iteration.

>> +        }
>> +        else
> 
> Please avoid "else" after an if() that ends in "return", "goto", or
> alike.

Really? How would you propose handling common finalization when
completion happens during the execution of two branches of the logical 
purpose of the function? Do you want to see two separate if statements 
of `if ( condition A )` and `if ( ! condition A )`?

>> +        {
>> +            for_each_domain(next)
> 
> What guarantees that the list won't change behind your back? You don't
> hold domlist_read_lock here afaict. It might be that you're safe because
> that lock is an RCU one and this function is only invoked at init time
> or from some form of interrupt handler. But that's far from obvious and
> will hence need both properly confirming and stating in a comment. (It
> is actually this concern, iirc, which so far had us avoid iterating the
> domain list here.)

It is better to error on the side of caution instead of assuming this 
will always be invoked in a safe manner. I will add a read lock for the 
domain list.

>> +            {
>> +                if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>> +                {
>> +                    console_rx = next->domain_id + 1;
>> +                    printk("*** Serial input to DOM%d", CON_RX_DOMID);
>> +                    goto out; //print switch_code statement & newline
>> +                }
>> +            }
>>   
>> -        if ( next_rx++ >= max_console_rx )
>> +            console_rx = 0;
>> +            printk("*** Serial input to Xen");
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    for ( next = rcu_dereference(d->next_in_list); next != NULL;
>> +          next = rcu_dereference(next->next_in_list) )
> 
> This looks like an open-coded continuation of for_each_domain() - I'm
> afraid I'm wary of introducing anything like this.

Not exactly, for_each_domain() always starts with beginning of the 
domain list and walks from that point forward. This open coded version 
stats at domain d and walks from there to the end of the list. Which is 
why there is logic below, which uses for_each_domain(), to walk from the 
beginning of the list until the next domain with console_io or the 
hardware domain, whichever occurs first.

What I did not want to do is potentially waste a lot of cycles doing 
for_each_domain() with a continue until it reached the current domain 
and then start checking for the privilege.

I could take this and introduce a new macro, for_each_domain_from (or a 
better name if there are suggestions) and use it here.

>> +    {
>> +        if ( hardware_domain && next == hardware_domain )
>>           {
>>               console_rx = 0;
>>               printk("*** Serial input to Xen");
>> -            break;
>> +            goto out;
> 
> Since you use "goto" anyway, this wants introducing a 2nd label (maybe
> "xen"?) ahead of the identical code you add further down (immediately
> ahead of the "out" label), to avoid code duplication.

Ack.

>>           }
>>   
>> -        d = rcu_lock_domain_by_id(next_rx - 1);
>> -        if ( d )
>> +        if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>>           {
>> -            rcu_unlock_domain(d);
>> -            console_rx = next_rx;
>> -            printk("*** Serial input to DOM%u", next_rx - 1);
>> -            break;
>> +            console_rx = next->domain_id + 1;
>> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);
>> +            goto out;
>> +        }
>> +    }
>> +
>> +    /*
>> +     * Hit the end of the domain list and instead of assuming that the
>> +     * hardware domain is the first in the list, get the first domain
>> +     * in the domain list and then if it is not the hardware domain or
>> +     * does not have console privilege, iterate the list until we find
>> +     * the hardware domain or a domain with console privilege.
>> +     */
>> +    if ( next == NULL )
>> +    {
>> +        for_each_domain(next)
>> +        {
>> +            if ( hardware_domain && next == hardware_domain )
>> +            {
>> +                console_rx = 0;
>> +                printk("*** Serial input to Xen");
>> +                goto out;
>> +            }
>> +
>> +            if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>> +            {
>> +                console_rx = next->domain_id + 1;
>> +                printk("*** Serial input to DOM%d", CON_RX_DOMID);
>> +                goto out;
>> +            }
>>           }
>>       }
>>   
>> +    /*
>> +     * If we got here, could not find a domain with console io privilege.
>> +     * Default to Xen.
>> +     */
> 
> "Default to" is a little odd when there are no other options.

Fallback to?

>> +    console_rx = 0;
>> +    printk("*** Serial input to Xen");
>> +
>> +out:
> 
> Labels indented by at least one blank please.

Ack.

>> @@ -520,12 +577,11 @@ static void switch_serial_input(void)
>>   
>>   static void __serial_rx(char c, struct cpu_user_regs *regs)
>>   {
>> -    switch ( console_rx )
>> -    {
>> -    case 0:
>> +    if ( console_rx == 0 )
> 
> By using CON_RX_DOMID everywhere else you try to carefully avoid and
> open-coded assumptions on the precise biasing used there. With this
> it would seem to me that here "CON_RX_DOMID > DOMID_MASK" would be
> more in line with that model then.

Yep and it results in not correctly implementing the commandline 
parameter behavior. Since I am already doing a domain list walk, instead 
of having to convert from domids, this really should just be tracked 
with a domain ref and use the rcu lock to catch if the domain drops at 
any point.

>>           return handle_keypress(c, regs);
>>   
>> -    case 1:
>> +    if ( hardware_domain->domain_id == CON_RX_DOMID )
> 
> No check of hardware_domain against NULL?

Missed this one, will fix.

>> +    {
>>           /*
>>            * Deliver input to the hardware domain buffer, unless it is
>>            * already full.
>> @@ -538,31 +594,37 @@ static void __serial_rx(char c, struct cpu_user_regs *regs)
>>            * getting stuck.
>>            */
>>           send_global_virq(VIRQ_CONSOLE);
>> -        break;
>> -
>> -#ifdef CONFIG_SBSA_VUART_CONSOLE
>> -    default:
>> +    }
>> +    else
>>       {
>> -        struct domain *d = rcu_lock_domain_by_id(console_rx - 1);
>> +        struct domain *d = rcu_lock_domain_by_any_id(CON_RX_DOMID);
>>   
>> +        if ( d == NULL )
>> +            goto unlock_out;
>> +
>> +#ifdef CONFIG_SBSA_VUART_CONSOLE
>>           /*
>>            * If we have a properly initialized vpl011 console for the
>>            * domain, without a full PV ring to Dom0 (in that case input
>>            * comes from the PV ring), then send the character to it.
>>            */
>> -        if ( d != NULL &&
>> -             !d->arch.vpl011.backend_in_domain &&
>> +        if ( !d->arch.vpl011.backend_in_domain &&
>>                d->arch.vpl011.backend.xen != NULL )
>> +        {
>>               vpl011_rx_char_xen(d, c);
>> -        else
>> -            printk("Cannot send chars to Dom%d: no UART available\n",
>> -                   console_rx - 1);
>> +            goto unlock_out;
>> +        }
>> +#endif
>> +
>> +        if ( (serial_rx_prod - serial_rx_cons) != SERIAL_RX_SIZE )
>> +            serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c;
> 
> This is Dom0's buffer; I don't think sharing with DomU-s is correct.

I would disagree, it is the hypervisor's buffer that it decides to share 
with domains it trust. It just so happens that it always trusts the 
hardware domain. This is why I explicitly changed this to the XSM call, 
to express that when the system manager, by enabling this privilege on 
the domain, has decided to trust these domains to have access to the 
hypervisor's buffer.

> You also cannot ...
> 
>> @@ -717,6 +779,8 @@ long do_console_io(
>>           rc = -E2BIG;
>>           if ( count > INT_MAX )
>>               break;
>> +        if ( CON_RX_DOMID != current->domain->domain_id )
>> +            return 0;
>>   
>>           rc = 0;
>>           while ( (serial_rx_cons != serial_rx_prod) && (rc < count) )
> 
> ... assume that by the time this hypercall is invoked input focus
> hasn't switched. I think there's no way around a per-domain input
> buffer, which of course would need setting up only for console-io-
> capable domains.

Let's explore the exact concern here, the scenarios as far as I can see 
it is as follows.

A person at the serial/console types keys for the current console domain 
(domA), then enters the console switch sequence, switching to another 
domain (domB). DomA's CONSOLEIO_read hypercall arrives after the switch 
and thus is not sent the rx buffer contents. Then domB's CONSOLEIO_read 
arrives and then because `serial_rx_cons` and `serial_rx_prod` are not 
the same, domB is sent the bytes that were intended for domA.

While a per domain console_io buffer would address this issue, I believe 
there is a simpler solution that can be extended depending on whether it 
is acceptable for the undelivered bytes to be dropped or not.

Simply upon switching, if serial_rx_cons and serial_rx_prod are set the 
same, then no bytes will be leaked to domB from domA above. An extra 
precaution could be taken to zero the serial_rx buffer. If guaranteed 
delivery is desired, a list of buffer remnants could be drained on 
hypercall and console switching.

IMHO I believe the reality is while there is potential that the scenario 
could happen, the probability is low. Doing a per domain buffer will 
always incur the resource overhead even if the event never happens, 
while the above approach would only incur the resource overhead when the 
situation occurs.

>> @@ -1107,7 +1171,7 @@ void __init console_endboot(void)
>>        * a useful 'how to switch' message.
>>        */
>>       if ( opt_conswitch[1] == 'x' )
>> -        console_rx = max_console_rx;
>> +        console_rx = 0;
> 
> I can't bring this change in line with the comment ahead of the if():
> Won't this result in switch_serial_input() switching to Dom0?

Correct, if there is a Dom0 and it is the hardware domain. As noted, I 
missed that I broke the behavior this was intended to create.

v/r.
dps
Daniel P. Smith Aug. 3, 2023, 1:12 p.m. UTC | #4
On 8/2/23 19:58, Stefano Stabellini wrote:
> On Wed, 2 Aug 2023, Jan Beulich wrote:
>>> -    for ( ; ; )
>>> +    if ( d == NULL )
>>
>> This covers both Xen receiving input and the domain receiving input having
>> gone away. Originally in the latter case the next sequential (in domid
>> numbering) domain would be switched to. In the new logic you start over
>> from the beginning of the domain list. Such a change in behavior (if
>> deemed acceptable at all, which I'm not convinced of) needs calling out in
>> the description.
> 
> I think it would be best to keep the current behavior as we already
> have people using it unless we have strong reasons to change it.

I agree and intended to keep the order of switching but I disagree on 
keeping the complete current behavior. I mean that by the complete 
current behavior being defined, at least for Arm, as meaning only the 
domains created at boot. The is_console flag in struct domain is the DAC 
equivalent to granting the FLASK access XEN__READCONSOLE to a domain, it 
was just never implemented/used until domoless enable it. An intended 
consequence of this patch is to ensure any domain granted the privilege, 
either through the DAC is_console or FLASK XEN__READCONSOLE, is included 
in the rotation regardless if the domain was created at boot or at runtime.

v/r,
dps
Jan Beulich Aug. 3, 2023, 3:56 p.m. UTC | #5
On 03.08.2023 14:56, Daniel P. Smith wrote:
> On 8/2/23 07:01, Jan Beulich wrote:
>> On 01.08.2023 18:06, Daniel P. Smith wrote:
>>> +        if ( hardware_domain )
>>> +        {
>>> +            console_rx = hardware_domain->domain_id + 1;
>>> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>
>> Here and elsewhere - why %d when original code properly used %u? I also
>> think there are now quite a few too many of these all identical
>> printk()s.
> 
> Good question, I did not write the line, it was copy/paste from 
> elsewhere in the file and then continued to replicate from there.

There's exactly one such line right now, using DOM%u. If I'm not
mistaken, it's not all that long ago that this was changed, so I
would suspect an incomplete rebase.

>>> +            goto out; //print switch_code statement & newline
>>
>> Leftover development comment? (There's at least one more.)
> 
> Yes and no, the comment came from elsewhere in the file

Did it?

> and early in 
> development it I place it here to later decide if it should stay (and 
> get converted into a compliant comment). I will drop it in the next 
> iteration.

Thanks.

>>> +        }
>>> +        else
>>
>> Please avoid "else" after an if() that ends in "return", "goto", or
>> alike.
> 
> Really? How would you propose handling common finalization when
> completion happens during the execution of two branches of the logical 
> purpose of the function? Do you want to see two separate if statements 
> of `if ( condition A )` and `if ( ! condition A )`?

What would you need the 2nd if() for when the first one ends in "return",
"goto", or alike?

>>> +        {
>>> +            for_each_domain(next)
>>
>> What guarantees that the list won't change behind your back? You don't
>> hold domlist_read_lock here afaict. It might be that you're safe because
>> that lock is an RCU one and this function is only invoked at init time
>> or from some form of interrupt handler. But that's far from obvious and
>> will hence need both properly confirming and stating in a comment. (It
>> is actually this concern, iirc, which so far had us avoid iterating the
>> domain list here.)
> 
> It is better to error on the side of caution instead of assuming this 
> will always be invoked in a safe manner. I will add a read lock for the 
> domain list.

I'm not firm enough in RCU to be certain whether acquiring that lock is
permissible here.

>>> +            {
>>> +                if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>>> +                {
>>> +                    console_rx = next->domain_id + 1;
>>> +                    printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>> +                    goto out; //print switch_code statement & newline
>>> +                }
>>> +            }
>>>   
>>> -        if ( next_rx++ >= max_console_rx )
>>> +            console_rx = 0;
>>> +            printk("*** Serial input to Xen");
>>> +            goto out;
>>> +        }
>>> +    }
>>> +
>>> +    for ( next = rcu_dereference(d->next_in_list); next != NULL;
>>> +          next = rcu_dereference(next->next_in_list) )
>>
>> This looks like an open-coded continuation of for_each_domain() - I'm
>> afraid I'm wary of introducing anything like this.
> 
> Not exactly, for_each_domain() always starts with beginning of the 
> domain list and walks from that point forward.

Right, hence my use of the word "continuation".

> This open coded version 
> stats at domain d and walks from there to the end of the list. Which is 
> why there is logic below, which uses for_each_domain(), to walk from the 
> beginning of the list until the next domain with console_io or the 
> hardware domain, whichever occurs first.
> 
> What I did not want to do is potentially waste a lot of cycles doing 
> for_each_domain() with a continue until it reached the current domain 
> and then start checking for the privilege.
> 
> I could take this and introduce a new macro, for_each_domain_from (or a 
> better name if there are suggestions) and use it here.

That's effectively what I would like to be done, yes.

>>> +    {
>>> +        if ( hardware_domain && next == hardware_domain )
>>>           {
>>>               console_rx = 0;
>>>               printk("*** Serial input to Xen");
>>> -            break;
>>> +            goto out;
>>
>> Since you use "goto" anyway, this wants introducing a 2nd label (maybe
>> "xen"?) ahead of the identical code you add further down (immediately
>> ahead of the "out" label), to avoid code duplication.
> 
> Ack.
> 
>>>           }
>>>   
>>> -        d = rcu_lock_domain_by_id(next_rx - 1);
>>> -        if ( d )
>>> +        if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>>>           {
>>> -            rcu_unlock_domain(d);
>>> -            console_rx = next_rx;
>>> -            printk("*** Serial input to DOM%u", next_rx - 1);
>>> -            break;
>>> +            console_rx = next->domain_id + 1;
>>> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>> +            goto out;
>>> +        }
>>> +    }
>>> +
>>> +    /*
>>> +     * Hit the end of the domain list and instead of assuming that the
>>> +     * hardware domain is the first in the list, get the first domain
>>> +     * in the domain list and then if it is not the hardware domain or
>>> +     * does not have console privilege, iterate the list until we find
>>> +     * the hardware domain or a domain with console privilege.
>>> +     */
>>> +    if ( next == NULL )
>>> +    {
>>> +        for_each_domain(next)
>>> +        {
>>> +            if ( hardware_domain && next == hardware_domain )
>>> +            {
>>> +                console_rx = 0;
>>> +                printk("*** Serial input to Xen");
>>> +                goto out;
>>> +            }
>>> +
>>> +            if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>>> +            {
>>> +                console_rx = next->domain_id + 1;
>>> +                printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>> +                goto out;
>>> +            }
>>>           }
>>>       }
>>>   
>>> +    /*
>>> +     * If we got here, could not find a domain with console io privilege.
>>> +     * Default to Xen.
>>> +     */
>>
>> "Default to" is a little odd when there are no other options.
> 
> Fallback to?

Yes.

>>> @@ -538,31 +594,37 @@ static void __serial_rx(char c, struct cpu_user_regs *regs)
>>>            * getting stuck.
>>>            */
>>>           send_global_virq(VIRQ_CONSOLE);
>>> -        break;
>>> -
>>> -#ifdef CONFIG_SBSA_VUART_CONSOLE
>>> -    default:
>>> +    }
>>> +    else
>>>       {
>>> -        struct domain *d = rcu_lock_domain_by_id(console_rx - 1);
>>> +        struct domain *d = rcu_lock_domain_by_any_id(CON_RX_DOMID);
>>>   
>>> +        if ( d == NULL )
>>> +            goto unlock_out;
>>> +
>>> +#ifdef CONFIG_SBSA_VUART_CONSOLE
>>>           /*
>>>            * If we have a properly initialized vpl011 console for the
>>>            * domain, without a full PV ring to Dom0 (in that case input
>>>            * comes from the PV ring), then send the character to it.
>>>            */
>>> -        if ( d != NULL &&
>>> -             !d->arch.vpl011.backend_in_domain &&
>>> +        if ( !d->arch.vpl011.backend_in_domain &&
>>>                d->arch.vpl011.backend.xen != NULL )
>>> +        {
>>>               vpl011_rx_char_xen(d, c);
>>> -        else
>>> -            printk("Cannot send chars to Dom%d: no UART available\n",
>>> -                   console_rx - 1);
>>> +            goto unlock_out;
>>> +        }
>>> +#endif
>>> +
>>> +        if ( (serial_rx_prod - serial_rx_cons) != SERIAL_RX_SIZE )
>>> +            serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c;
>>
>> This is Dom0's buffer; I don't think sharing with DomU-s is correct.
> 
> I would disagree, it is the hypervisor's buffer that it decides to share 
> with domains it trust. It just so happens that it always trusts the 
> hardware domain. This is why I explicitly changed this to the XSM call, 
> to express that when the system manager, by enabling this privilege on 
> the domain, has decided to trust these domains to have access to the 
> hypervisor's buffer.

I don't think such trust can be assumed to allow the domains to see e.g.
each others root passwords.

>>> @@ -717,6 +779,8 @@ long do_console_io(
>>>           rc = -E2BIG;
>>>           if ( count > INT_MAX )
>>>               break;
>>> +        if ( CON_RX_DOMID != current->domain->domain_id )
>>> +            return 0;
>>>   
>>>           rc = 0;
>>>           while ( (serial_rx_cons != serial_rx_prod) && (rc < count) )
>>
>> ... assume that by the time this hypercall is invoked input focus
>> hasn't switched. I think there's no way around a per-domain input
>> buffer, which of course would need setting up only for console-io-
>> capable domains.
> 
> Let's explore the exact concern here, the scenarios as far as I can see 
> it is as follows.
> 
> A person at the serial/console types keys for the current console domain 
> (domA), then enters the console switch sequence, switching to another 
> domain (domB). DomA's CONSOLEIO_read hypercall arrives after the switch 
> and thus is not sent the rx buffer contents. Then domB's CONSOLEIO_read 
> arrives and then because `serial_rx_cons` and `serial_rx_prod` are not 
> the same, domB is sent the bytes that were intended for domA.
> 
> While a per domain console_io buffer would address this issue, I believe 
> there is a simpler solution that can be extended depending on whether it 
> is acceptable for the undelivered bytes to be dropped or not.
> 
> Simply upon switching, if serial_rx_cons and serial_rx_prod are set the 
> same, then no bytes will be leaked to domB from domA above. An extra 
> precaution could be taken to zero the serial_rx buffer. If guaranteed 
> delivery is desired, a list of buffer remnants could be drained on 
> hypercall and console switching.
> 
> IMHO I believe the reality is while there is potential that the scenario 
> could happen, the probability is low. Doing a per domain buffer will 
> always incur the resource overhead even if the event never happens, 
> while the above approach would only incur the resource overhead when the 
> situation occurs.

Which, afaict, would then result in stuff typed in for one domain being
discarded, without there being any indication how much of it was discarded
(and hence would need typing again).

I don't share the resource concern: If an admin wants multiple domains
to take input this way, they can't assume this comes at no price at all.

Jan
Daniel P. Smith Aug. 3, 2023, 4:31 p.m. UTC | #6
On 8/3/23 11:56, Jan Beulich wrote:
> On 03.08.2023 14:56, Daniel P. Smith wrote:
>> On 8/2/23 07:01, Jan Beulich wrote:
>>> On 01.08.2023 18:06, Daniel P. Smith wrote:
>>>> +        if ( hardware_domain )
>>>> +        {
>>>> +            console_rx = hardware_domain->domain_id + 1;
>>>> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>>
>>> Here and elsewhere - why %d when original code properly used %u? I also
>>> think there are now quite a few too many of these all identical
>>> printk()s.
>>
>> Good question, I did not write the line, it was copy/paste from
>> elsewhere in the file and then continued to replicate from there.
> 
> There's exactly one such line right now, using DOM%u. If I'm not
> mistaken, it's not all that long ago that this was changed, so I
> would suspect an incomplete rebase.
> 
>>>> +            goto out; //print switch_code statement & newline
>>>
>>> Leftover development comment? (There's at least one more.)
>>
>> Yes and no, the comment came from elsewhere in the file
> 
> Did it?

I thought it did.

>> and early in
>> development it I place it here to later decide if it should stay (and
>> get converted into a compliant comment). I will drop it in the next
>> iteration.
> 
> Thanks.

No problem.

>>>> +        }
>>>> +        else
>>>
>>> Please avoid "else" after an if() that ends in "return", "goto", or
>>> alike.
>>
>> Really? How would you propose handling common finalization when
>> completion happens during the execution of two branches of the logical
>> purpose of the function? Do you want to see two separate if statements
>> of `if ( condition A )` and `if ( ! condition A )`?
> 
> What would you need the 2nd if() for when the first one ends in "return",
> "goto", or alike?

Good point, not sure why I wasn't thinking this way.

>>>> +        {
>>>> +            for_each_domain(next)
>>>
>>> What guarantees that the list won't change behind your back? You don't
>>> hold domlist_read_lock here afaict. It might be that you're safe because
>>> that lock is an RCU one and this function is only invoked at init time
>>> or from some form of interrupt handler. But that's far from obvious and
>>> will hence need both properly confirming and stating in a comment. (It
>>> is actually this concern, iirc, which so far had us avoid iterating the
>>> domain list here.)
>>
>> It is better to error on the side of caution instead of assuming this
>> will always be invoked in a safe manner. I will add a read lock for the
>> domain list.
> 
> I'm not firm enough in RCU to be certain whether acquiring that lock is
> permissible here.

Same and I took your statements to suggest that I should.

>>>> +            {
>>>> +                if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>>>> +                {
>>>> +                    console_rx = next->domain_id + 1;
>>>> +                    printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>>> +                    goto out; //print switch_code statement & newline
>>>> +                }
>>>> +            }
>>>>    
>>>> -        if ( next_rx++ >= max_console_rx )
>>>> +            console_rx = 0;
>>>> +            printk("*** Serial input to Xen");
>>>> +            goto out;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    for ( next = rcu_dereference(d->next_in_list); next != NULL;
>>>> +          next = rcu_dereference(next->next_in_list) )
>>>
>>> This looks like an open-coded continuation of for_each_domain() - I'm
>>> afraid I'm wary of introducing anything like this.
>>
>> Not exactly, for_each_domain() always starts with beginning of the
>> domain list and walks from that point forward.
> 
> Right, hence my use of the word "continuation".

Ah. okay.

>> This open coded version
>> stats at domain d and walks from there to the end of the list. Which is
>> why there is logic below, which uses for_each_domain(), to walk from the
>> beginning of the list until the next domain with console_io or the
>> hardware domain, whichever occurs first.
>>
>> What I did not want to do is potentially waste a lot of cycles doing
>> for_each_domain() with a continue until it reached the current domain
>> and then start checking for the privilege.
>>
>> I could take this and introduce a new macro, for_each_domain_from (or a
>> better name if there are suggestions) and use it here.
> 
> That's effectively what I would like to be done, yes.

Okay, unless you (or anyone else) have another name suggestion, I will 
add it as the above name.

>>>> +    {
>>>> +        if ( hardware_domain && next == hardware_domain )
>>>>            {
>>>>                console_rx = 0;
>>>>                printk("*** Serial input to Xen");
>>>> -            break;
>>>> +            goto out;
>>>
>>> Since you use "goto" anyway, this wants introducing a 2nd label (maybe
>>> "xen"?) ahead of the identical code you add further down (immediately
>>> ahead of the "out" label), to avoid code duplication.
>>
>> Ack.
>>
>>>>            }
>>>>    
>>>> -        d = rcu_lock_domain_by_id(next_rx - 1);
>>>> -        if ( d )
>>>> +        if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>>>>            {
>>>> -            rcu_unlock_domain(d);
>>>> -            console_rx = next_rx;
>>>> -            printk("*** Serial input to DOM%u", next_rx - 1);
>>>> -            break;
>>>> +            console_rx = next->domain_id + 1;
>>>> +            printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>>> +            goto out;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Hit the end of the domain list and instead of assuming that the
>>>> +     * hardware domain is the first in the list, get the first domain
>>>> +     * in the domain list and then if it is not the hardware domain or
>>>> +     * does not have console privilege, iterate the list until we find
>>>> +     * the hardware domain or a domain with console privilege.
>>>> +     */
>>>> +    if ( next == NULL )
>>>> +    {
>>>> +        for_each_domain(next)
>>>> +        {
>>>> +            if ( hardware_domain && next == hardware_domain )
>>>> +            {
>>>> +                console_rx = 0;
>>>> +                printk("*** Serial input to Xen");
>>>> +                goto out;
>>>> +            }
>>>> +
>>>> +            if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
>>>> +            {
>>>> +                console_rx = next->domain_id + 1;
>>>> +                printk("*** Serial input to DOM%d", CON_RX_DOMID);
>>>> +                goto out;
>>>> +            }
>>>>            }
>>>>        }
>>>>    
>>>> +    /*
>>>> +     * If we got here, could not find a domain with console io privilege.
>>>> +     * Default to Xen.
>>>> +     */
>>>
>>> "Default to" is a little odd when there are no other options.
>>
>> Fallback to?
> 
> Yes.

Ack.

>>>> @@ -538,31 +594,37 @@ static void __serial_rx(char c, struct cpu_user_regs *regs)
>>>>             * getting stuck.
>>>>             */
>>>>            send_global_virq(VIRQ_CONSOLE);
>>>> -        break;
>>>> -
>>>> -#ifdef CONFIG_SBSA_VUART_CONSOLE
>>>> -    default:
>>>> +    }
>>>> +    else
>>>>        {
>>>> -        struct domain *d = rcu_lock_domain_by_id(console_rx - 1);
>>>> +        struct domain *d = rcu_lock_domain_by_any_id(CON_RX_DOMID);
>>>>    
>>>> +        if ( d == NULL )
>>>> +            goto unlock_out;
>>>> +
>>>> +#ifdef CONFIG_SBSA_VUART_CONSOLE
>>>>            /*
>>>>             * If we have a properly initialized vpl011 console for the
>>>>             * domain, without a full PV ring to Dom0 (in that case input
>>>>             * comes from the PV ring), then send the character to it.
>>>>             */
>>>> -        if ( d != NULL &&
>>>> -             !d->arch.vpl011.backend_in_domain &&
>>>> +        if ( !d->arch.vpl011.backend_in_domain &&
>>>>                 d->arch.vpl011.backend.xen != NULL )
>>>> +        {
>>>>                vpl011_rx_char_xen(d, c);
>>>> -        else
>>>> -            printk("Cannot send chars to Dom%d: no UART available\n",
>>>> -                   console_rx - 1);
>>>> +            goto unlock_out;
>>>> +        }
>>>> +#endif
>>>> +
>>>> +        if ( (serial_rx_prod - serial_rx_cons) != SERIAL_RX_SIZE )
>>>> +            serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c;
>>>
>>> This is Dom0's buffer; I don't think sharing with DomU-s is correct.
>>
>> I would disagree, it is the hypervisor's buffer that it decides to share
>> with domains it trust. It just so happens that it always trusts the
>> hardware domain. This is why I explicitly changed this to the XSM call,
>> to express that when the system manager, by enabling this privilege on
>> the domain, has decided to trust these domains to have access to the
>> hypervisor's buffer.
> 
> I don't think such trust can be assumed to allow the domains to see e.g.
> each others root passwords.

Should or can are two different things, it does not change the fact what 
the action is being done here. Another aspect here to your "what if" is 
the question of what really is the likelihood that an admin is going to 
enter the root password and then send the console switch immediately to 
another domain via the serial interface on a production system that they 
made the conscious decision to trust an untrustworthy domain with the a 
highly trusted privilege of serial/console access.

>>>> @@ -717,6 +779,8 @@ long do_console_io(
>>>>            rc = -E2BIG;
>>>>            if ( count > INT_MAX )
>>>>                break;
>>>> +        if ( CON_RX_DOMID != current->domain->domain_id )
>>>> +            return 0;
>>>>    
>>>>            rc = 0;
>>>>            while ( (serial_rx_cons != serial_rx_prod) && (rc < count) )
>>>
>>> ... assume that by the time this hypercall is invoked input focus
>>> hasn't switched. I think there's no way around a per-domain input
>>> buffer, which of course would need setting up only for console-io-
>>> capable domains.
>>
>> Let's explore the exact concern here, the scenarios as far as I can see
>> it is as follows.
>>
>> A person at the serial/console types keys for the current console domain
>> (domA), then enters the console switch sequence, switching to another
>> domain (domB). DomA's CONSOLEIO_read hypercall arrives after the switch
>> and thus is not sent the rx buffer contents. Then domB's CONSOLEIO_read
>> arrives and then because `serial_rx_cons` and `serial_rx_prod` are not
>> the same, domB is sent the bytes that were intended for domA.
>>
>> While a per domain console_io buffer would address this issue, I believe
>> there is a simpler solution that can be extended depending on whether it
>> is acceptable for the undelivered bytes to be dropped or not.
>>
>> Simply upon switching, if serial_rx_cons and serial_rx_prod are set the
>> same, then no bytes will be leaked to domB from domA above. An extra
>> precaution could be taken to zero the serial_rx buffer. If guaranteed
>> delivery is desired, a list of buffer remnants could be drained on
>> hypercall and console switching.
>>
>> IMHO I believe the reality is while there is potential that the scenario
>> could happen, the probability is low. Doing a per domain buffer will
>> always incur the resource overhead even if the event never happens,
>> while the above approach would only incur the resource overhead when the
>> situation occurs.
> 
> Which, afaict, would then result in stuff typed in for one domain being
> discarded, without there being any indication how much of it was discarded
> (and hence would need typing again).

Again, I find the scenario unlikely but agree that good behavior is not 
to have a lossy console. That is why I suggested that a simple list (or 
other representation) of pending buffer fragments can be held for 
deliver to previous domains.

> I don't share the resource concern: If an admin wants multiple domains
> to take input this way, they can't assume this comes at no price at all.

Except its not just the admins that pay the price. I had already 
considered a per domain buffer, but realized that this is a more 
complicated approach requiring code in multiple places within the 
hypervisor. Whereas I believe the proposal above can exist solely within 
console.c and will not require having to hook in to domain 
create/destruction.

v/r,
dps
Stefano Stabellini Aug. 3, 2023, 7:30 p.m. UTC | #7
On Thu, 3 Aug 2023, Daniel P. Smith wrote:
> On 8/2/23 19:58, Stefano Stabellini wrote:
> > On Wed, 2 Aug 2023, Jan Beulich wrote:
> > > > -    for ( ; ; )
> > > > +    if ( d == NULL )
> > > 
> > > This covers both Xen receiving input and the domain receiving input having
> > > gone away. Originally in the latter case the next sequential (in domid
> > > numbering) domain would be switched to. In the new logic you start over
> > > from the beginning of the domain list. Such a change in behavior (if
> > > deemed acceptable at all, which I'm not convinced of) needs calling out in
> > > the description.
> > 
> > I think it would be best to keep the current behavior as we already
> > have people using it unless we have strong reasons to change it.
> 
> I agree and intended to keep the order of switching but I disagree on keeping
> the complete current behavior. I mean that by the complete current behavior
> being defined, at least for Arm, as meaning only the domains created at boot.
> The is_console flag in struct domain is the DAC equivalent to granting the
> FLASK access XEN__READCONSOLE to a domain, it was just never implemented/used
> until domoless enable it. An intended consequence of this patch is to ensure
> any domain granted the privilege, either through the DAC is_console or FLASK
> XEN__READCONSOLE, is included in the rotation regardless if the domain was
> created at boot or at runtime.

I think that's fine.

Let's say that we have Xen, Dom0, and 2 Dom0less DomUs at boot.  The
console will start on Dom0 and the rotation would go:

Dom0->Dom1->Dom2->Xen->Dom0...

If a new domain comes up at runtime with is_console, it would become:

Dom0->Dom1->Dom2->Dom3->Xen->Dom0...
Daniel P. Smith Aug. 3, 2023, 7:42 p.m. UTC | #8
On 8/3/23 15:30, Stefano Stabellini wrote:
> On Thu, 3 Aug 2023, Daniel P. Smith wrote:
>> On 8/2/23 19:58, Stefano Stabellini wrote:
>>> On Wed, 2 Aug 2023, Jan Beulich wrote:
>>>>> -    for ( ; ; )
>>>>> +    if ( d == NULL )
>>>>
>>>> This covers both Xen receiving input and the domain receiving input having
>>>> gone away. Originally in the latter case the next sequential (in domid
>>>> numbering) domain would be switched to. In the new logic you start over
>>>> from the beginning of the domain list. Such a change in behavior (if
>>>> deemed acceptable at all, which I'm not convinced of) needs calling out in
>>>> the description.
>>>
>>> I think it would be best to keep the current behavior as we already
>>> have people using it unless we have strong reasons to change it.
>>
>> I agree and intended to keep the order of switching but I disagree on keeping
>> the complete current behavior. I mean that by the complete current behavior
>> being defined, at least for Arm, as meaning only the domains created at boot.
>> The is_console flag in struct domain is the DAC equivalent to granting the
>> FLASK access XEN__READCONSOLE to a domain, it was just never implemented/used
>> until domoless enable it. An intended consequence of this patch is to ensure
>> any domain granted the privilege, either through the DAC is_console or FLASK
>> XEN__READCONSOLE, is included in the rotation regardless if the domain was
>> created at boot or at runtime.
> 
> I think that's fine.
> 
> Let's say that we have Xen, Dom0, and 2 Dom0less DomUs at boot.  The
> console will start on Dom0 and the rotation would go:
> 
> Dom0->Dom1->Dom2->Xen->Dom0...
> 
> If a new domain comes up at runtime with is_console, it would become:
> 
> Dom0->Dom1->Dom2->Dom3->Xen->Dom0...

Correct, that is the sequence I was attempting to achieve, but then I 
ran into the dom0less CI configuration of booting a system with Xen and 
a DomU. The above rotation will stay the same less dom0/hwdom, but it 
did quickly point out that I made an assumption about the hardware 
domain always being present.

v/r,
dps
Jan Beulich Aug. 4, 2023, 7:49 a.m. UTC | #9
On 03.08.2023 18:31, Daniel P. Smith wrote:
> On 8/3/23 11:56, Jan Beulich wrote:
>> On 03.08.2023 14:56, Daniel P. Smith wrote:
>>> On 8/2/23 07:01, Jan Beulich wrote:
>>>> On 01.08.2023 18:06, Daniel P. Smith wrote:
>>>>> +        {
>>>>> +            for_each_domain(next)
>>>>
>>>> What guarantees that the list won't change behind your back? You don't
>>>> hold domlist_read_lock here afaict. It might be that you're safe because
>>>> that lock is an RCU one and this function is only invoked at init time
>>>> or from some form of interrupt handler. But that's far from obvious and
>>>> will hence need both properly confirming and stating in a comment. (It
>>>> is actually this concern, iirc, which so far had us avoid iterating the
>>>> domain list here.)
>>>
>>> It is better to error on the side of caution instead of assuming this
>>> will always be invoked in a safe manner. I will add a read lock for the
>>> domain list.
>>
>> I'm not firm enough in RCU to be certain whether acquiring that lock is
>> permissible here.
> 
> Same and I took your statements to suggest that I should.

Actually I wasn't paying close enough attention here: The code already
uses rcu_lock_domain_by_id(), which acquires domlist_read_lock.

Jan
Daniel P. Smith Aug. 8, 2023, 4:58 p.m. UTC | #10
On 8/4/23 03:49, Jan Beulich wrote:
> On 03.08.2023 18:31, Daniel P. Smith wrote:
>> On 8/3/23 11:56, Jan Beulich wrote:
>>> On 03.08.2023 14:56, Daniel P. Smith wrote:
>>>> On 8/2/23 07:01, Jan Beulich wrote:
>>>>> On 01.08.2023 18:06, Daniel P. Smith wrote:
>>>>>> +        {
>>>>>> +            for_each_domain(next)
>>>>>
>>>>> What guarantees that the list won't change behind your back? You don't
>>>>> hold domlist_read_lock here afaict. It might be that you're safe because
>>>>> that lock is an RCU one and this function is only invoked at init time
>>>>> or from some form of interrupt handler. But that's far from obvious and
>>>>> will hence need both properly confirming and stating in a comment. (It
>>>>> is actually this concern, iirc, which so far had us avoid iterating the
>>>>> domain list here.)
>>>>
>>>> It is better to error on the side of caution instead of assuming this
>>>> will always be invoked in a safe manner. I will add a read lock for the
>>>> domain list.
>>>
>>> I'm not firm enough in RCU to be certain whether acquiring that lock is
>>> permissible here.
>>
>> Same and I took your statements to suggest that I should.
> 
> Actually I wasn't paying close enough attention here: The code already
> uses rcu_lock_domain_by_id(), which acquires domlist_read_lock.
> 

Right, it grabs the lock while iterating through domain_hash[], I 
thought your concern was with regard to the iterating with 
for_each_domain and the embedded open coded version. Because of your 
inquiry, I have been thinking about it and I should be grabbing the lock 
as I iterate to be sure that I don't get deceived into believing the end 
of list was hit because a domain was being removed as I walked the list. 
And if it so happens that the context is always safe, then there should 
be no contention on grabbing the lock. Do you disagree?

v/r,
dps
Jan Beulich Aug. 9, 2023, 6:39 a.m. UTC | #11
On 08.08.2023 18:58, Daniel P. Smith wrote:
> On 8/4/23 03:49, Jan Beulich wrote:
>> On 03.08.2023 18:31, Daniel P. Smith wrote:
>>> On 8/3/23 11:56, Jan Beulich wrote:
>>>> On 03.08.2023 14:56, Daniel P. Smith wrote:
>>>>> On 8/2/23 07:01, Jan Beulich wrote:
>>>>>> On 01.08.2023 18:06, Daniel P. Smith wrote:
>>>>>>> +        {
>>>>>>> +            for_each_domain(next)
>>>>>>
>>>>>> What guarantees that the list won't change behind your back? You don't
>>>>>> hold domlist_read_lock here afaict. It might be that you're safe because
>>>>>> that lock is an RCU one and this function is only invoked at init time
>>>>>> or from some form of interrupt handler. But that's far from obvious and
>>>>>> will hence need both properly confirming and stating in a comment. (It
>>>>>> is actually this concern, iirc, which so far had us avoid iterating the
>>>>>> domain list here.)
>>>>>
>>>>> It is better to error on the side of caution instead of assuming this
>>>>> will always be invoked in a safe manner. I will add a read lock for the
>>>>> domain list.
>>>>
>>>> I'm not firm enough in RCU to be certain whether acquiring that lock is
>>>> permissible here.
>>>
>>> Same and I took your statements to suggest that I should.
>>
>> Actually I wasn't paying close enough attention here: The code already
>> uses rcu_lock_domain_by_id(), which acquires domlist_read_lock.
>>
> 
> Right, it grabs the lock while iterating through domain_hash[], I 
> thought your concern was with regard to the iterating with 
> for_each_domain and the embedded open coded version. Because of your 
> inquiry, I have been thinking about it and I should be grabbing the lock 
> as I iterate to be sure that I don't get deceived into believing the end 
> of list was hit because a domain was being removed as I walked the list. 
> And if it so happens that the context is always safe, then there should 
> be no contention on grabbing the lock. Do you disagree?

Well, RCU locks aren't real locks, so there's no contention in the first
place. As to the context being safe - I continue to be uncertain, but
the pre-existing use of the lock means you adding the necessary locking
to your code won't regress in that regard.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index 51fce66607..5242dfcf93 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -64,6 +64,4 @@  extern bool opt_dom0_verbose;
 extern bool opt_dom0_cpuid_faulting;
 extern bool opt_dom0_msr_relaxed;
 
-#define max_init_domid (0)
-
 #endif
diff --git a/xen/drivers/char/console.c b/xen/drivers/char/console.c
index 0e410fa086..f5b759898e 100644
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -473,45 +473,102 @@  static void cf_check dump_console_ring_key(unsigned char key)
  */
 static unsigned int __read_mostly console_rx = 0;
 
-#define max_console_rx (max_init_domid + 1)
+#define CON_RX_DOMID (console_rx - 1)
 
 /* Make sure to rcu_unlock_domain after use */
 struct domain *console_input_domain(void)
 {
     if ( console_rx == 0 )
             return NULL;
-    return rcu_lock_domain_by_id(console_rx - 1);
+    return rcu_lock_domain_by_id(CON_RX_DOMID);
 }
 
 static void switch_serial_input(void)
 {
-    unsigned int next_rx = console_rx;
+    struct domain *next, *d = console_input_domain();
 
-    /*
-     * Rotate among Xen, dom0 and boot-time created domUs while skipping
-     * switching serial input to non existing domains.
-     */
-    for ( ; ; )
+    if ( d == NULL )
     {
-        struct domain *d;
+        if ( hardware_domain )
+        {
+            console_rx = hardware_domain->domain_id + 1;
+            printk("*** Serial input to DOM%d", CON_RX_DOMID);
+            goto out; //print switch_code statement & newline
+        }
+        else
+        {
+            for_each_domain(next)
+            {
+                if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
+                {
+                    console_rx = next->domain_id + 1;
+                    printk("*** Serial input to DOM%d", CON_RX_DOMID);
+                    goto out; //print switch_code statement & newline
+                }
+            }
 
-        if ( next_rx++ >= max_console_rx )
+            console_rx = 0;
+            printk("*** Serial input to Xen");
+            goto out;
+        }
+    }
+
+    for ( next = rcu_dereference(d->next_in_list); next != NULL;
+          next = rcu_dereference(next->next_in_list) )
+    {
+        if ( hardware_domain && next == hardware_domain )
         {
             console_rx = 0;
             printk("*** Serial input to Xen");
-            break;
+            goto out;
         }
 
-        d = rcu_lock_domain_by_id(next_rx - 1);
-        if ( d )
+        if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
         {
-            rcu_unlock_domain(d);
-            console_rx = next_rx;
-            printk("*** Serial input to DOM%u", next_rx - 1);
-            break;
+            console_rx = next->domain_id + 1;
+            printk("*** Serial input to DOM%d", CON_RX_DOMID);
+            goto out;
+        }
+    }
+
+    /*
+     * Hit the end of the domain list and instead of assuming that the
+     * hardware domain is the first in the list, get the first domain
+     * in the domain list and then if it is not the hardware domain or
+     * does not have console privilege, iterate the list until we find
+     * the hardware domain or a domain with console privilege.
+     */
+    if ( next == NULL )
+    {
+        for_each_domain(next)
+        {
+            if ( hardware_domain && next == hardware_domain )
+            {
+                console_rx = 0;
+                printk("*** Serial input to Xen");
+                goto out;
+            }
+
+            if ( xsm_console_io(XSM_OTHER, next, CONSOLEIO_read) == 0 )
+            {
+                console_rx = next->domain_id + 1;
+                printk("*** Serial input to DOM%d", CON_RX_DOMID);
+                goto out;
+            }
         }
     }
 
+    /*
+     * If we got here, could not find a domain with console io privilege.
+     * Default to Xen.
+     */
+    console_rx = 0;
+    printk("*** Serial input to Xen");
+
+out:
+    if ( d != NULL)
+        rcu_unlock_domain(d);
+
     if ( switch_code )
         printk(" (type 'CTRL-%c' three times to switch input)",
                opt_conswitch[0]);
@@ -520,12 +577,11 @@  static void switch_serial_input(void)
 
 static void __serial_rx(char c, struct cpu_user_regs *regs)
 {
-    switch ( console_rx )
-    {
-    case 0:
+    if ( console_rx == 0 )
         return handle_keypress(c, regs);
 
-    case 1:
+    if ( hardware_domain->domain_id == CON_RX_DOMID )
+    {
         /*
          * Deliver input to the hardware domain buffer, unless it is
          * already full.
@@ -538,31 +594,37 @@  static void __serial_rx(char c, struct cpu_user_regs *regs)
          * getting stuck.
          */
         send_global_virq(VIRQ_CONSOLE);
-        break;
-
-#ifdef CONFIG_SBSA_VUART_CONSOLE
-    default:
+    }
+    else
     {
-        struct domain *d = rcu_lock_domain_by_id(console_rx - 1);
+        struct domain *d = rcu_lock_domain_by_any_id(CON_RX_DOMID);
 
+        if ( d == NULL )
+            goto unlock_out;
+
+#ifdef CONFIG_SBSA_VUART_CONSOLE
         /*
          * If we have a properly initialized vpl011 console for the
          * domain, without a full PV ring to Dom0 (in that case input
          * comes from the PV ring), then send the character to it.
          */
-        if ( d != NULL &&
-             !d->arch.vpl011.backend_in_domain &&
+        if ( !d->arch.vpl011.backend_in_domain &&
              d->arch.vpl011.backend.xen != NULL )
+        {
             vpl011_rx_char_xen(d, c);
-        else
-            printk("Cannot send chars to Dom%d: no UART available\n",
-                   console_rx - 1);
+            goto unlock_out;
+        }
+#endif
+
+        if ( (serial_rx_prod - serial_rx_cons) != SERIAL_RX_SIZE )
+            serial_rx_ring[SERIAL_RX_MASK(serial_rx_prod++)] = c;
 
+        send_guest_global_virq(d, VIRQ_CONSOLE);
+
+unlock_out:
         if ( d != NULL )
             rcu_unlock_domain(d);
     }
-#endif
-    }
 
 #ifdef CONFIG_X86
     if ( pv_shim && pv_console )
@@ -627,7 +689,7 @@  static long guest_console_write(XEN_GUEST_HANDLE_PARAM(char) buffer,
         if ( copy_from_guest(kbuf, buffer, kcount) )
             return -EFAULT;
 
-        if ( is_hardware_domain(cd) )
+        if ( cd->domain_id == CON_RX_DOMID )
         {
             /* Use direct console output as it could be interactive */
             spin_lock_irq(&console_lock);
@@ -717,6 +779,8 @@  long do_console_io(
         rc = -E2BIG;
         if ( count > INT_MAX )
             break;
+        if ( CON_RX_DOMID != current->domain->domain_id )
+            return 0;
 
         rc = 0;
         while ( (serial_rx_cons != serial_rx_prod) && (rc < count) )
@@ -1107,7 +1171,7 @@  void __init console_endboot(void)
      * a useful 'how to switch' message.
      */
     if ( opt_conswitch[1] == 'x' )
-        console_rx = max_console_rx;
+        console_rx = 0;
 
     register_keyhandler('w', dump_console_ring_key,
                         "synchronously dump console ring buffer (dmesg)", 0);