diff mbox series

[2/5] xen/gnttab: Rework resource acquisition

Message ID 20200728113712.22966-3-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Headers show
Series Multiple fixes to XENMEM_acquire_resource | expand

Commit Message

Andrew Cooper July 28, 2020, 11:37 a.m. UTC
The existing logic doesn't function in the general case for mapping a guests
grant table, due to arbitrary 32 frame limit, and the default grant table
limit being 64.

In order to start addressing this, rework the existing grant table logic by
implementing a single gnttab_acquire_resource().  This is far more efficient
than the previous acquire_grant_table() in memory.c because it doesn't take
the grant table write lock, and attempt to grow the table, for every single
frame.

The new gnttab_acquire_resource() function subsumes the previous two
gnttab_get_{shared,status}_frame() helpers.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Michał Leszczyński <michal.leszczynski@cert.pl>
CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
---
 xen/common/grant_table.c      | 93 ++++++++++++++++++++++++++++++-------------
 xen/common/memory.c           | 42 +------------------
 xen/include/xen/grant_table.h | 19 ++++-----
 3 files changed, 75 insertions(+), 79 deletions(-)

Comments

Andrew Cooper July 28, 2020, 2:11 p.m. UTC | #1
On 28/07/2020 12:37, Andrew Cooper wrote:
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 9f0cae52c0..122d1e7596 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>      return 0;
>  }
>  
> +int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
> +{
> +    struct grant_table *gt = d->grant_table;
> +    unsigned int i = nr_frames, tot_frames;
> +    void **vaddrs;

I've folded an "= NULL" here to fix the CentOS 6 build.

~Andrew
Jan Beulich July 29, 2020, 8:02 p.m. UTC | #2
On 28.07.2020 13:37, Andrew Cooper wrote:
> The existing logic doesn't function in the general case for mapping a guests
> grant table, due to arbitrary 32 frame limit, and the default grant table
> limit being 64.
> 
> In order to start addressing this, rework the existing grant table logic by
> implementing a single gnttab_acquire_resource().  This is far more efficient
> than the previous acquire_grant_table() in memory.c because it doesn't take
> the grant table write lock, and attempt to grow the table, for every single
> frame.

Among the code you replace there is a comment "Iterate backwards in case
table needs to grow" explaining why what you say about growing the grant
table didn't actually happen.

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>       return 0;
>   }
>   
> +int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
> +{
> +    struct grant_table *gt = d->grant_table;
> +    unsigned int i = nr_frames, tot_frames;
> +    void **vaddrs;
> +    int rc = 0;
> +
> +    /* Input sanity. */
> +    if ( !nr_frames )
> +        return -EINVAL;

I can't seem to be able to find an equivalent of this in the old logic,
and hence this looks like an unwarranted change in behavior to me. We
have quite a few hypercall ops where some count being zero is simply
a no-op, i.e. yielding success without doing anything.

> +    /* Overflow checks */
> +    if ( frame + nr_frames < frame )
> +        return -EINVAL;
> +
> +    tot_frames = frame + nr_frames;
> +    if ( tot_frames != frame + nr_frames )
> +        return -EINVAL;

I find the naming here quite confusing. I realize part of this stems
from the code you replace, but anyway: "unsigned long frame" typically
represents a memory frame number of some sort, making the calculation
look as if it was wrong. (Initially I merely meant to ask whether this
check isn't redundant with the prior one, or vice versa.)

> +    /* Grow table if necessary. */
> +    grant_write_lock(gt);
> +    switch ( id )
> +    {
> +        mfn_t tmp;
> +
> +    case XENMEM_resource_grant_table_id_shared:
> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        if ( gt->gt_version != 2 )
> +        {
> +    default:
> +            rc = -EINVAL;
> +            break;
> +        }
> +        rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
> +        break;
> +    }
> +
> +    /* Any errors from growing the table? */
> +    if ( rc )
> +        goto out;
> +
> +    switch ( id )
> +    {
> +    case XENMEM_resource_grant_table_id_shared:
> +        vaddrs = gt->shared_raw;
> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        vaddrs = (void **)gt->status;

Now this is the kind of cast that I consider really dangerous, and hence
worth trying hard to avoid. With the code structure as is, I don't see
an immediate solution though.

> +        break;
> +    }

Worth having an ASSERT_UNREACHABLE() default case here?

Jan
Paul Durrant July 30, 2020, 8:14 a.m. UTC | #3
> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 28 July 2020 12:37
> To: Xen-devel <xen-devel@lists.xenproject.org>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap <George.Dunlap@eu.citrix.com>; Ian
> Jackson <ian.jackson@citrix.com>; Jan Beulich <JBeulich@suse.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Julien
> Grall <julien@xen.org>; Paul Durrant <paul@xen.org>; Michał Leszczyński <michal.leszczynski@cert.pl>;
> Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
> Subject: [PATCH 2/5] xen/gnttab: Rework resource acquisition
> 
> The existing logic doesn't function in the general case for mapping a guests
> grant table, due to arbitrary 32 frame limit, and the default grant table
> limit being 64.
> 
> In order to start addressing this, rework the existing grant table logic by
> implementing a single gnttab_acquire_resource().  This is far more efficient
> than the previous acquire_grant_table() in memory.c because it doesn't take
> the grant table write lock, and attempt to grow the table, for every single
> frame.
> 

But that should not have happened before because the code deliberately iterates backwards, thereby starting with the last frame, thereby growing the table at most once. (I agree that dropping and re-acquiring the lock every time was sub-optimal).

> The new gnttab_acquire_resource() function subsumes the previous two
> gnttab_get_{shared,status}_frame() helpers.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: George Dunlap <George.Dunlap@eu.citrix.com>
> CC: Ian Jackson <ian.jackson@citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Wei Liu <wl@xen.org>
> CC: Julien Grall <julien@xen.org>
> CC: Paul Durrant <paul@xen.org>
> CC: Michał Leszczyński <michal.leszczynski@cert.pl>
> CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
> ---
>  xen/common/grant_table.c      | 93 ++++++++++++++++++++++++++++++-------------
>  xen/common/memory.c           | 42 +------------------
>  xen/include/xen/grant_table.h | 19 ++++-----
>  3 files changed, 75 insertions(+), 79 deletions(-)
> 
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 9f0cae52c0..122d1e7596 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>      return 0;
>  }
> 
> +int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
> +{
> +    struct grant_table *gt = d->grant_table;
> +    unsigned int i = nr_frames, tot_frames;
> +    void **vaddrs;
> +    int rc = 0;
> +
> +    /* Input sanity. */
> +    if ( !nr_frames )
> +        return -EINVAL;

This was not an error before. Does mapping 0 frames really need to be a failure?

> +
> +    /* Overflow checks */
> +    if ( frame + nr_frames < frame )
> +        return -EINVAL;
> +
> +    tot_frames = frame + nr_frames;

That name is confusing. 'last_frame' perhaps (and then make the -1 implicit)?

> +    if ( tot_frames != frame + nr_frames )
> +        return -EINVAL;
> +
> +    /* Grow table if necessary. */
> +    grant_write_lock(gt);
> +    switch ( id )
> +    {
> +        mfn_t tmp;
> +
> +    case XENMEM_resource_grant_table_id_shared:
> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        if ( gt->gt_version != 2 )
> +        {
> +    default:
> +            rc = -EINVAL;
> +            break;
> +        }
> +        rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
> +        break;
> +    }
> +
> +    /* Any errors from growing the table? */
> +    if ( rc )
> +        goto out;
> +
> +    switch ( id )
> +    {
> +    case XENMEM_resource_grant_table_id_shared:
> +        vaddrs = gt->shared_raw;
> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        vaddrs = (void **)gt->status;

Erk. Could we avoid this casting nastiness by putting a loop in each switch case. I know it could be considered code duplication but this is a bit icky.

> +        break;
> +    }
> +
> +    for ( i = 0; i < nr_frames; ++i )
> +        mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
> +
> + out:
> +    grant_write_unlock(gt);

Since you deliberately grew the table first, could you not drop the write lock and acquire it a read lock before looping over the frames?

  Paul

> +
> +    return rc;
> +}
> +
>  int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
>  {
>      int rc = 0;
> @@ -4047,33 +4113,6 @@ int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t
> *mfn)
>      return rc;
>  }
> 
> -int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
> -                            mfn_t *mfn)
> -{
> -    struct grant_table *gt = d->grant_table;
> -    int rc;
> -
> -    grant_write_lock(gt);
> -    rc = gnttab_get_shared_frame_mfn(d, idx, mfn);
> -    grant_write_unlock(gt);
> -
> -    return rc;
> -}
> -
> -int gnttab_get_status_frame(struct domain *d, unsigned long idx,
> -                            mfn_t *mfn)
> -{
> -    struct grant_table *gt = d->grant_table;
> -    int rc;
> -
> -    grant_write_lock(gt);
> -    rc = (gt->gt_version == 2) ?
> -        gnttab_get_status_frame_mfn(d, idx, mfn) : -EINVAL;
> -    grant_write_unlock(gt);
> -
> -    return rc;
> -}
> -
>  static void gnttab_usage_print(struct domain *rd)
>  {
>      int first = 1;
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 714077c1e5..dc3a7248e3 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1007,44 +1007,6 @@ static long xatp_permission_check(struct domain *d, unsigned int space)
>      return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
>  }
> 
> -static int acquire_grant_table(struct domain *d, unsigned int id,
> -                               unsigned long frame,
> -                               unsigned int nr_frames,
> -                               xen_pfn_t mfn_list[])
> -{
> -    unsigned int i = nr_frames;
> -
> -    /* Iterate backwards in case table needs to grow */
> -    while ( i-- != 0 )
> -    {
> -        mfn_t mfn = INVALID_MFN;
> -        int rc;
> -
> -        switch ( id )
> -        {
> -        case XENMEM_resource_grant_table_id_shared:
> -            rc = gnttab_get_shared_frame(d, frame + i, &mfn);
> -            break;
> -
> -        case XENMEM_resource_grant_table_id_status:
> -            rc = gnttab_get_status_frame(d, frame + i, &mfn);
> -            break;
> -
> -        default:
> -            rc = -EINVAL;
> -            break;
> -        }
> -
> -        if ( rc )
> -            return rc;
> -
> -        ASSERT(!mfn_eq(mfn, INVALID_MFN));
> -        mfn_list[i] = mfn_x(mfn);
> -    }
> -
> -    return 0;
> -}
> -
>  static int acquire_resource(
>      XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
>  {
> @@ -1091,8 +1053,8 @@ static int acquire_resource(
>      switch ( xmar.type )
>      {
>      case XENMEM_resource_grant_table:
> -        rc = acquire_grant_table(d, xmar.id, xmar.frame, xmar.nr_frames,
> -                                 mfn_list);
> +        rc = gnttab_acquire_resource(d, xmar.id, xmar.frame, xmar.nr_frames,
> +                                     mfn_list);
>          break;
> 
>      default:
> diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
> index 98603604b8..5a2c75b880 100644
> --- a/xen/include/xen/grant_table.h
> +++ b/xen/include/xen/grant_table.h
> @@ -56,10 +56,10 @@ int mem_sharing_gref_to_gfn(struct grant_table *gt, grant_ref_t ref,
> 
>  int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn,
>                       mfn_t *mfn);
> -int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
> -                            mfn_t *mfn);
> -int gnttab_get_status_frame(struct domain *d, unsigned long idx,
> -                            mfn_t *mfn);
> +
> +int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[]);
> 
>  #else
> 
> @@ -93,14 +93,9 @@ static inline int gnttab_map_frame(struct domain *d, unsigned long idx,
>      return -EINVAL;
>  }
> 
> -static inline int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
> -                                          mfn_t *mfn)
> -{
> -    return -EINVAL;
> -}
> -
> -static inline int gnttab_get_status_frame(struct domain *d, unsigned long idx,
> -                                          mfn_t *mfn)
> +static inline int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>  {
>      return -EINVAL;
>  }
> --
> 2.11.0
Julien Grall July 30, 2020, 10:56 a.m. UTC | #4
Hi Andrew,

On 28/07/2020 12:37, Andrew Cooper wrote:
> The existing logic doesn't function in the general case for mapping a guests
> grant table, due to arbitrary 32 frame limit, and the default grant table
> limit being 64.
> 
> In order to start addressing this, rework the existing grant table logic by
> implementing a single gnttab_acquire_resource().  This is far more efficient
> than the previous acquire_grant_table() in memory.c because it doesn't take
> the grant table write lock, and attempt to grow the table, for every single
> frame.
> 
> The new gnttab_acquire_resource() function subsumes the previous two
> gnttab_get_{shared,status}_frame() helpers.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: George Dunlap <George.Dunlap@eu.citrix.com>
> CC: Ian Jackson <ian.jackson@citrix.com>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Wei Liu <wl@xen.org>
> CC: Julien Grall <julien@xen.org>
> CC: Paul Durrant <paul@xen.org>
> CC: Michał Leszczyński <michal.leszczynski@cert.pl>
> CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
> ---
>   xen/common/grant_table.c      | 93 ++++++++++++++++++++++++++++++-------------
>   xen/common/memory.c           | 42 +------------------
>   xen/include/xen/grant_table.h | 19 ++++-----
>   3 files changed, 75 insertions(+), 79 deletions(-)
> 
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 9f0cae52c0..122d1e7596 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>       return 0;
>   }
>   
> +int gnttab_acquire_resource(
> +    struct domain *d, unsigned int id, unsigned long frame,
> +    unsigned int nr_frames, xen_pfn_t mfn_list[])

Any reasons for changing the way we usually indent parameters?

> +{
> +    struct grant_table *gt = d->grant_table;
> +    unsigned int i = nr_frames, tot_frames;
> +    void **vaddrs;
> +    int rc = 0;

AFAICT rc will always be initialized when used. So is it necessary?

> +
> +    /* Input sanity. */
> +    if ( !nr_frames )
> +        return -EINVAL;
> +
> +    /* Overflow checks */
> +    if ( frame + nr_frames < frame )
> +        return -EINVAL;
> +
> +    tot_frames = frame + nr_frames;
> +    if ( tot_frames != frame + nr_frames )
> +        return -EINVAL;
> +
> +    /* Grow table if necessary. */
> +    grant_write_lock(gt);
> +    switch ( id )
> +    {
> +        mfn_t tmp;
> +
> +    case XENMEM_resource_grant_table_id_shared:
> +        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        if ( gt->gt_version != 2 )
> +        {
> +    default:
> +            rc = -EINVAL;
> +            break;
I am aware this is valid C code, but I find this construct confusing. 
Could you just duplicate the 2 lines and have a separate default case?

> +        }
> +        rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
> +        break;
> +    }
> +
> +    /* Any errors from growing the table? */
> +    if ( rc )
> +        goto out;
> +
> +    switch ( id )
> +    {
> +    case XENMEM_resource_grant_table_id_shared:
> +        vaddrs = gt->shared_raw;
> +        break;
> +
> +    case XENMEM_resource_grant_table_id_status:
> +        vaddrs = (void **)gt->status;
> +        break;
> +    }
> +

NIT: Would it make sense to check vaddrs? This would avoid NULL 
dereference pointers if something went wrong.

> +    for ( i = 0; i < nr_frames; ++i )
> +        mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
> +
> + out:
> +    grant_write_unlock(gt);
> +
> +    return rc;
> +}
> +

Cheers,
Andrew Cooper Sept. 22, 2020, 1:10 p.m. UTC | #5
On 29/07/2020 21:02, Jan Beulich wrote:
> On 28.07.2020 13:37, Andrew Cooper wrote:
>> The existing logic doesn't function in the general case for mapping a
>> guests
>> grant table, due to arbitrary 32 frame limit, and the default grant
>> table
>> limit being 64.
>>
>> In order to start addressing this, rework the existing grant table
>> logic by
>> implementing a single gnttab_acquire_resource().  This is far more
>> efficient
>> than the previous acquire_grant_table() in memory.c because it
>> doesn't take
>> the grant table write lock, and attempt to grow the table, for every
>> single
>> frame.
>
> Among the code you replace there is a comment "Iterate backwards in case
> table needs to grow" explaining why what you say about growing the grant
> table didn't actually happen.

What I have said is accurate.

Iterating backwards caused it to actually grow once, but every iteration
still takes the lock and attempts to grow the table.

>
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct
>> domain *d,
>>       return 0;
>>   }
>>   +int gnttab_acquire_resource(
>> +    struct domain *d, unsigned int id, unsigned long frame,
>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>> +{
>> +    struct grant_table *gt = d->grant_table;
>> +    unsigned int i = nr_frames, tot_frames;
>> +    void **vaddrs;
>> +    int rc = 0;
>> +
>> +    /* Input sanity. */
>> +    if ( !nr_frames )
>> +        return -EINVAL;
>
> I can't seem to be able to find an equivalent of this in the old logic,
> and hence this looks like an unwarranted change in behavior to me. We
> have quite a few hypercall ops where some count being zero is simply
> a no-op, i.e. yielding success without doing anything.

There is no possible circumstance when getting here requesting 0 is
legitimate.

Tolerating a zero input for a mapping request is a very expensive way of
hiding caller bugs.

Most importantly however, the correctness of the logic does depends on
nr_frames being nonzero.

>
>> +    /* Overflow checks */
>> +    if ( frame + nr_frames < frame )
>> +        return -EINVAL;
>> +
>> +    tot_frames = frame + nr_frames;
>> +    if ( tot_frames != frame + nr_frames )
>> +        return -EINVAL;
>
> I find the naming here quite confusing. I realize part of this stems
> from the code you replace, but anyway: "unsigned long frame" typically
> represents a memory frame number of some sort, making the calculation
> look as if it was wrong. (Initially I merely meant to ask whether this
> check isn't redundant with the prior one, or vice versa.)

Both checks are strictly necessary.  The top one checks 64bit wrapping,
while the second is checking 32bit truncation.

This code is horrible, both in terms of the acquire_resource API, and
the grant API being off by one compared to normal, and needing the
max_frame index, not the number of frames.

~Andrew
Andrew Cooper Sept. 22, 2020, 1:13 p.m. UTC | #6
On 30/07/2020 09:14, Paul Durrant wrote:
>> -----Original Message-----
>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>> Sent: 28 July 2020 12:37
>> To: Xen-devel <xen-devel@lists.xenproject.org>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap <George.Dunlap@eu.citrix.com>; Ian
>> Jackson <ian.jackson@citrix.com>; Jan Beulich <JBeulich@suse.com>; Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Julien
>> Grall <julien@xen.org>; Paul Durrant <paul@xen.org>; Michał Leszczyński <michal.leszczynski@cert.pl>;
>> Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
>> Subject: [PATCH 2/5] xen/gnttab: Rework resource acquisition
>>
>> The existing logic doesn't function in the general case for mapping a guests
>> grant table, due to arbitrary 32 frame limit, and the default grant table
>> limit being 64.
>>
>> In order to start addressing this, rework the existing grant table logic by
>> implementing a single gnttab_acquire_resource().  This is far more efficient
>> than the previous acquire_grant_table() in memory.c because it doesn't take
>> the grant table write lock, and attempt to grow the table, for every single
>> frame.
>>
> But that should not have happened before because the code deliberately iterates backwards, thereby starting with the last frame, thereby growing the table at most once. (I agree that dropping and re-acquiring the lock every time was sub-optimal).

It still attempts to grow on every iteration.  Its just growing to a
smaller size than already succeeded.

>
>> The new gnttab_acquire_resource() function subsumes the previous two
>> gnttab_get_{shared,status}_frame() helpers.
>>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: George Dunlap <George.Dunlap@eu.citrix.com>
>> CC: Ian Jackson <ian.jackson@citrix.com>
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Wei Liu <wl@xen.org>
>> CC: Julien Grall <julien@xen.org>
>> CC: Paul Durrant <paul@xen.org>
>> CC: Michał Leszczyński <michal.leszczynski@cert.pl>
>> CC: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
>> ---
>>  xen/common/grant_table.c      | 93 ++++++++++++++++++++++++++++++-------------
>>  xen/common/memory.c           | 42 +------------------
>>  xen/include/xen/grant_table.h | 19 ++++-----
>>  3 files changed, 75 insertions(+), 79 deletions(-)
>>
>> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
>> index 9f0cae52c0..122d1e7596 100644
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct domain *d,
>>      return 0;
>>  }
>>
>> +int gnttab_acquire_resource(
>> +    struct domain *d, unsigned int id, unsigned long frame,
>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>> +{
>> +    struct grant_table *gt = d->grant_table;
>> +    unsigned int i = nr_frames, tot_frames;
>> +    void **vaddrs;
>> +    int rc = 0;
>> +
>> +    /* Input sanity. */
>> +    if ( !nr_frames )
>> +        return -EINVAL;
> This was not an error before. Does mapping 0 frames really need to be a failure?

Yes.

You have spotted the -1 which depends on nr_frames being nonzero to
function correctly.

>> +
>> +    /* Overflow checks */
>> +    if ( frame + nr_frames < frame )
>> +        return -EINVAL;
>> +
>> +    tot_frames = frame + nr_frames;
> That name is confusing. 'last_frame' perhaps (and then make the -1 implicit)?

How is that naming any less confusing?

>> +        break;
>> +    }
>> +
>> +    for ( i = 0; i < nr_frames; ++i )
>> +        mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
>> +
>> + out:
>> +    grant_write_unlock(gt);
> Since you deliberately grew the table first, could you not drop the write lock and acquire it a read lock before looping over the frames?

I tried originally.  That's not an operation supported by percpu
read/write locks, and this isn't a fastpath.

~Andrew
Jan Beulich Sept. 22, 2020, 1:34 p.m. UTC | #7
On 22.09.2020 15:10, Andrew Cooper wrote:
> On 29/07/2020 21:02, Jan Beulich wrote:
>> On 28.07.2020 13:37, Andrew Cooper wrote:
>>> --- a/xen/common/grant_table.c
>>> +++ b/xen/common/grant_table.c
>>> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct
>>> domain *d,
>>>       return 0;
>>>   }
>>>   +int gnttab_acquire_resource(
>>> +    struct domain *d, unsigned int id, unsigned long frame,
>>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>>> +{
>>> +    struct grant_table *gt = d->grant_table;
>>> +    unsigned int i = nr_frames, tot_frames;
>>> +    void **vaddrs;
>>> +    int rc = 0;
>>> +
>>> +    /* Input sanity. */
>>> +    if ( !nr_frames )
>>> +        return -EINVAL;
>>
>> I can't seem to be able to find an equivalent of this in the old logic,
>> and hence this looks like an unwarranted change in behavior to me. We
>> have quite a few hypercall ops where some count being zero is simply
>> a no-op, i.e. yielding success without doing anything.
> 
> There is no possible circumstance when getting here requesting 0 is
> legitimate.
> 
> Tolerating a zero input for a mapping request is a very expensive way of
> hiding caller bugs.

That's just one possible view. There are people thinking that
some extra runtime overhead doesn't outweigh the "clutter" of
extra conditionals in their code, and hence would rather avoid
checking if some calculation of theirs yielded zero before
making a hypercall. I think we should try to consistently
regard counts of zero as "nothing to do" in all our hypercalls.

> Most importantly however, the correctness of the logic does depends on
> nr_frames being nonzero.

I didn't mean to ask to drop the conditional if it's needed;
all I'm opposed to is raising an error in this case.

Jan
Andrew Cooper Sept. 22, 2020, 2:50 p.m. UTC | #8
On 22/09/2020 14:34, Jan Beulich wrote:
> On 22.09.2020 15:10, Andrew Cooper wrote:
>> On 29/07/2020 21:02, Jan Beulich wrote:
>>> On 28.07.2020 13:37, Andrew Cooper wrote:
>>>> --- a/xen/common/grant_table.c
>>>> +++ b/xen/common/grant_table.c
>>>> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct
>>>> domain *d,
>>>>       return 0;
>>>>   }
>>>>   +int gnttab_acquire_resource(
>>>> +    struct domain *d, unsigned int id, unsigned long frame,
>>>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>>>> +{
>>>> +    struct grant_table *gt = d->grant_table;
>>>> +    unsigned int i = nr_frames, tot_frames;
>>>> +    void **vaddrs;
>>>> +    int rc = 0;
>>>> +
>>>> +    /* Input sanity. */
>>>> +    if ( !nr_frames )
>>>> +        return -EINVAL;
>>> I can't seem to be able to find an equivalent of this in the old logic,
>>> and hence this looks like an unwarranted change in behavior to me. We
>>> have quite a few hypercall ops where some count being zero is simply
>>> a no-op, i.e. yielding success without doing anything.
>> There is no possible circumstance when getting here requesting 0 is
>> legitimate.
>>
>> Tolerating a zero input for a mapping request is a very expensive way of
>> hiding caller bugs.
> That's just one possible view.

... that is fully enforced in the ecosystem we work in.

You can't get a 0 to here in the first place.

However, when it comes to the XLAT and the chunking fixes, getting 0
here is a hard error, and I'm not interested in breaking the that logic
for a non-existent corner case.

~Andrew
Jan Beulich Sept. 22, 2020, 4:01 p.m. UTC | #9
On 22.09.2020 16:50, Andrew Cooper wrote:
> On 22/09/2020 14:34, Jan Beulich wrote:
>> On 22.09.2020 15:10, Andrew Cooper wrote:
>>> On 29/07/2020 21:02, Jan Beulich wrote:
>>>> On 28.07.2020 13:37, Andrew Cooper wrote:
>>>>> --- a/xen/common/grant_table.c
>>>>> +++ b/xen/common/grant_table.c
>>>>> @@ -4013,6 +4013,72 @@ static int gnttab_get_shared_frame_mfn(struct
>>>>> domain *d,
>>>>>       return 0;
>>>>>   }
>>>>>   +int gnttab_acquire_resource(
>>>>> +    struct domain *d, unsigned int id, unsigned long frame,
>>>>> +    unsigned int nr_frames, xen_pfn_t mfn_list[])
>>>>> +{
>>>>> +    struct grant_table *gt = d->grant_table;
>>>>> +    unsigned int i = nr_frames, tot_frames;
>>>>> +    void **vaddrs;
>>>>> +    int rc = 0;
>>>>> +
>>>>> +    /* Input sanity. */
>>>>> +    if ( !nr_frames )
>>>>> +        return -EINVAL;
>>>> I can't seem to be able to find an equivalent of this in the old logic,
>>>> and hence this looks like an unwarranted change in behavior to me. We
>>>> have quite a few hypercall ops where some count being zero is simply
>>>> a no-op, i.e. yielding success without doing anything.
>>> There is no possible circumstance when getting here requesting 0 is
>>> legitimate.
>>>
>>> Tolerating a zero input for a mapping request is a very expensive way of
>>> hiding caller bugs.
>> That's just one possible view.
> 
> ... that is fully enforced in the ecosystem we work in.
> 
> You can't get a 0 to here in the first place.

Would you mind pointing me at why this is? All I can see is

    if ( xmar.nr_frames > ARRAY_SIZE(mfn_list) )
        return -E2BIG;

as far as the caller's argument checking goes here. And
acquire_grant_table(), which is what you're replacing, also is dealing
with zero quite fine afaics. As is arch_acquire_resource().

> However, when it comes to the XLAT and the chunking fixes, getting 0
> here is a hard error, and I'm not interested in breaking the that logic
> for a non-existent corner case.

I don't share this view. It's also interesting that you had to address
similar questions by Paul, isn't it?

Jan
diff mbox series

Patch

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 9f0cae52c0..122d1e7596 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -4013,6 +4013,72 @@  static int gnttab_get_shared_frame_mfn(struct domain *d,
     return 0;
 }
 
+int gnttab_acquire_resource(
+    struct domain *d, unsigned int id, unsigned long frame,
+    unsigned int nr_frames, xen_pfn_t mfn_list[])
+{
+    struct grant_table *gt = d->grant_table;
+    unsigned int i = nr_frames, tot_frames;
+    void **vaddrs;
+    int rc = 0;
+
+    /* Input sanity. */
+    if ( !nr_frames )
+        return -EINVAL;
+
+    /* Overflow checks */
+    if ( frame + nr_frames < frame )
+        return -EINVAL;
+
+    tot_frames = frame + nr_frames;
+    if ( tot_frames != frame + nr_frames )
+        return -EINVAL;
+
+    /* Grow table if necessary. */
+    grant_write_lock(gt);
+    switch ( id )
+    {
+        mfn_t tmp;
+
+    case XENMEM_resource_grant_table_id_shared:
+        rc = gnttab_get_shared_frame_mfn(d, tot_frames - 1, &tmp);
+        break;
+
+    case XENMEM_resource_grant_table_id_status:
+        if ( gt->gt_version != 2 )
+        {
+    default:
+            rc = -EINVAL;
+            break;
+        }
+        rc = gnttab_get_status_frame_mfn(d, tot_frames - 1, &tmp);
+        break;
+    }
+
+    /* Any errors from growing the table? */
+    if ( rc )
+        goto out;
+
+    switch ( id )
+    {
+    case XENMEM_resource_grant_table_id_shared:
+        vaddrs = gt->shared_raw;
+        break;
+
+    case XENMEM_resource_grant_table_id_status:
+        vaddrs = (void **)gt->status;
+        break;
+    }
+
+    for ( i = 0; i < nr_frames; ++i )
+        mfn_list[i] = virt_to_mfn(vaddrs[frame + i]);
+
+ out:
+    grant_write_unlock(gt);
+
+    return rc;
+}
+
 int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
 {
     int rc = 0;
@@ -4047,33 +4113,6 @@  int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn, mfn_t *mfn)
     return rc;
 }
 
-int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn)
-{
-    struct grant_table *gt = d->grant_table;
-    int rc;
-
-    grant_write_lock(gt);
-    rc = gnttab_get_shared_frame_mfn(d, idx, mfn);
-    grant_write_unlock(gt);
-
-    return rc;
-}
-
-int gnttab_get_status_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn)
-{
-    struct grant_table *gt = d->grant_table;
-    int rc;
-
-    grant_write_lock(gt);
-    rc = (gt->gt_version == 2) ?
-        gnttab_get_status_frame_mfn(d, idx, mfn) : -EINVAL;
-    grant_write_unlock(gt);
-
-    return rc;
-}
-
 static void gnttab_usage_print(struct domain *rd)
 {
     int first = 1;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c1e5..dc3a7248e3 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1007,44 +1007,6 @@  static long xatp_permission_check(struct domain *d, unsigned int space)
     return xsm_add_to_physmap(XSM_TARGET, current->domain, d);
 }
 
-static int acquire_grant_table(struct domain *d, unsigned int id,
-                               unsigned long frame,
-                               unsigned int nr_frames,
-                               xen_pfn_t mfn_list[])
-{
-    unsigned int i = nr_frames;
-
-    /* Iterate backwards in case table needs to grow */
-    while ( i-- != 0 )
-    {
-        mfn_t mfn = INVALID_MFN;
-        int rc;
-
-        switch ( id )
-        {
-        case XENMEM_resource_grant_table_id_shared:
-            rc = gnttab_get_shared_frame(d, frame + i, &mfn);
-            break;
-
-        case XENMEM_resource_grant_table_id_status:
-            rc = gnttab_get_status_frame(d, frame + i, &mfn);
-            break;
-
-        default:
-            rc = -EINVAL;
-            break;
-        }
-
-        if ( rc )
-            return rc;
-
-        ASSERT(!mfn_eq(mfn, INVALID_MFN));
-        mfn_list[i] = mfn_x(mfn);
-    }
-
-    return 0;
-}
-
 static int acquire_resource(
     XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -1091,8 +1053,8 @@  static int acquire_resource(
     switch ( xmar.type )
     {
     case XENMEM_resource_grant_table:
-        rc = acquire_grant_table(d, xmar.id, xmar.frame, xmar.nr_frames,
-                                 mfn_list);
+        rc = gnttab_acquire_resource(d, xmar.id, xmar.frame, xmar.nr_frames,
+                                     mfn_list);
         break;
 
     default:
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 98603604b8..5a2c75b880 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -56,10 +56,10 @@  int mem_sharing_gref_to_gfn(struct grant_table *gt, grant_ref_t ref,
 
 int gnttab_map_frame(struct domain *d, unsigned long idx, gfn_t gfn,
                      mfn_t *mfn);
-int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn);
-int gnttab_get_status_frame(struct domain *d, unsigned long idx,
-                            mfn_t *mfn);
+
+int gnttab_acquire_resource(
+    struct domain *d, unsigned int id, unsigned long frame,
+    unsigned int nr_frames, xen_pfn_t mfn_list[]);
 
 #else
 
@@ -93,14 +93,9 @@  static inline int gnttab_map_frame(struct domain *d, unsigned long idx,
     return -EINVAL;
 }
 
-static inline int gnttab_get_shared_frame(struct domain *d, unsigned long idx,
-                                          mfn_t *mfn)
-{
-    return -EINVAL;
-}
-
-static inline int gnttab_get_status_frame(struct domain *d, unsigned long idx,
-                                          mfn_t *mfn)
+static inline int gnttab_acquire_resource(
+    struct domain *d, unsigned int id, unsigned long frame,
+    unsigned int nr_frames, xen_pfn_t mfn_list[])
 {
     return -EINVAL;
 }