
[v3,1/2] Interface for grant copy operation in libs.

Message ID 20160622132918.GD1790@citrix.com (mailing list archive)
State New, archived

Commit Message

Wei Liu June 22, 2016, 1:29 p.m. UTC
On Wed, Jun 22, 2016 at 01:37:50PM +0100, David Vrabel wrote:
> On 22/06/16 12:21, Wei Liu wrote:
> > On Wed, Jun 22, 2016 at 10:37:24AM +0100, David Vrabel wrote:
> >> On 22/06/16 09:38, Paulina Szubarczyk wrote:
> > >>> In the Linux part, an ioctl(gntdev, IOCTL_GNTDEV_GRANT_COPY, ..)
> > >>> system call is invoked. In Mini-OS the operation is not yet
> > >>> implemented. For other OSes there is a dummy implementation.
> >> [...]
> >>> --- a/tools/libs/gnttab/linux.c
> >>> +++ b/tools/libs/gnttab/linux.c
> >>> @@ -235,6 +235,51 @@ int osdep_gnttab_unmap(xengnttab_handle *xgt,
> >>>      return 0;
> >>>  }
> >>>  
> >>> +int osdep_gnttab_grant_copy(xengnttab_handle *xgt,
> >>> +                            uint32_t count,
> >>> +                            xengnttab_grant_copy_segment_t *segs)
> >>> +{
> >>> +    int i, rc;
> >>> +    int fd = xgt->fd;
> >>> +    struct ioctl_gntdev_grant_copy copy;
> >>> +
> >>> +    copy.segments = calloc(count, sizeof(struct ioctl_gntdev_grant_copy_segment));
> >>> +    copy.count = count;
> >>> +    for (i = 0; i < count; i++)
> >>> +    {
> >>> +        copy.segments[i].flags = segs[i].flags;
> >>> +        copy.segments[i].len = segs[i].len;
> >>> +        if (segs[i].flags == GNTCOPY_dest_gref) 
> >>> +        {
> >>> +            copy.segments[i].dest.foreign.ref = segs[i].dest.foreign.ref;
> >>> +            copy.segments[i].dest.foreign.domid = segs[i].dest.foreign.domid;
> >>> +            copy.segments[i].dest.foreign.offset = segs[i].dest.foreign.offset;
> >>> +            copy.segments[i].source.virt = segs[i].source.virt;
> >>> +        } 
> >>> +        else 
> >>> +        {
> >>> +            copy.segments[i].source.foreign.ref = segs[i].source.foreign.ref;
> >>> +            copy.segments[i].source.foreign.domid = segs[i].source.foreign.domid;
> >>> +            copy.segments[i].source.foreign.offset = segs[i].source.foreign.offset;
> >>> +            copy.segments[i].dest.virt = segs[i].dest.virt;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    rc = ioctl(fd, IOCTL_GNTDEV_GRANT_COPY, &copy);
> >>> +    if (rc) 
> >>> +    {
> >>> +        GTERROR(xgt->logger, "ioctl GRANT COPY failed %d ", errno);
> >>> +    }
> >>> +    else 
> >>> +    {
> >>> +        for (i = 0; i < count; i++)
> >>> +            segs[i].status = copy.segments[i].status;
> >>> +    }
> >>> +
> >>> +    free(copy.segments);
> >>> +    return rc;
> >>> +}
> >>
> >> I know Wei asked for this but you've replaced what should be a single
> >> pointer assignment with a memory allocation and two loops over all the
> >> segments.
> >>
> >> This is a hot path and the two structures (the libxengnttab one and the
> >> Linux kernel one) are both part of their respective ABIs and won't
> >> change so Wei's concern that they might change in the future is unfounded.
> >>
> > 
> > The fundamental question is: will the ABI between the library and the
> > kernel ever mismatch?
> >
> > My answer is "maybe". My rationale is that everything that crosses a
> > component boundary needs to be treated with caution, and I tend to
> > assume the worst will happen.
> >
> > The only way to guarantee that they never mismatch is to have
> >
> >    typedef struct ioctl_gntdev_grant_copy_segment xengnttab_grant_copy_segment_t;
> >
> > But that's not how the code is written.
> > 
> > I would like to hear a third opinion. Is my concern unfounded? Am I too
> > cautious? Is there any compelling argument that I missed?
> > 
> > Somewhat related, can we have some numbers please? It could well be that
> > the two loops are much cheaper than whatever is going on inside the
> > kernel / hypervisor, in which case the numbers would render this issue
> > moot.
> 
> I did some (very) ad hoc measurements and, in the worst case of a single
> short segment per ioctl, the optimized version of
> osdep_gnttab_grant_copy() looks to be ~5% faster.
> 
> This is enough of a difference that we should use the optimized version.
> 
> The unoptimized version also adds an additional failure path (the
> calloc) which would be best avoided.
> 

Your test case includes a lot of noise from the libc allocator, so...

Can you give the following patch a try (apply it on top of Paulina's patch)?
The basic idea is to provide scratch space for the structures. Note: the
patch is only compile-tested.

---8<---
From e72c1abb9852f40db5eeee48ef208492c3283884 Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Wed, 22 Jun 2016 14:22:48 +0100
Subject: [PATCH] xengnttab: provide osdep cache and use it in Linux grant copy

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 tools/libs/gnttab/linux.c   | 35 +++++++++++++++++++++++++++++------
 tools/libs/gnttab/private.h |  2 ++
 2 files changed, 31 insertions(+), 6 deletions(-)

Comments

David Vrabel June 22, 2016, 1:52 p.m. UTC | #1
On 22/06/16 14:29, Wei Liu wrote:
[...]
> +#define COPY_SEGMENT_CACHE_SIZE 1024

Arbitrary limit on number of segments.

> +    copy.segments = xgt->osdep_data;

Not thread safe.

I tried using alloca(), which has a <1% performance penalty, but the
failure mode for alloca() is really bad so I would not recommend it.

I think the best solution is to allow the osdep code to provide the
implementation of xengnttab_grant_copy_segment_t, allowing the Linux
code to do:

typedef struct ioctl_gntdev_grant_copy_segment xengnttab_grant_copy_segment_t;

You should still provide the generic structure as well, for those
platforms that don't provide their own optimized version.
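A minimal sketch of what this proposal could look like, using hypothetical `demo_*` names and a stand-in for the kernel's segment structure (the layout below is illustrative, not copied from the gntdev header). When the osdep code supplies the type, building the ioctl request reduces to a single pointer assignment; platforms without their own type fall back to a generic definition and keep their translation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ioctl segment structure. */
struct demo_ioctl_segment {
    union {
        void *virt;
        struct { uint32_t ref; uint16_t offset; uint16_t domid; } foreign;
    } source, dest;
    uint16_t len;
    uint16_t flags;
    int16_t status;
};

struct demo_ioctl_copy {
    unsigned int count;
    struct demo_ioctl_segment *segments;
};

/* Pretend we are building the Linux osdep code; 0 selects the generic type. */
#define DEMO_OSDEP_PROVIDES_SEGMENT 1

#if DEMO_OSDEP_PROVIDES_SEGMENT
/* The library's public segment type *is* the ioctl type. */
typedef struct demo_ioctl_segment demo_segment_t;

/* No allocation and no copy loops: one pointer assignment. */
static void demo_build_request(struct demo_ioctl_copy *copy,
                               demo_segment_t *segs, uint32_t count)
{
    assert(segs != NULL || count == 0);
    copy->segments = segs;
    copy->count = count;
}
#else
/* Generic definition for platforms without an optimized type; their
 * osdep code still has to translate each segment into its native layout. */
typedef struct {
    union {
        void *virt;
        struct { uint32_t ref; uint16_t offset; uint16_t domid; } foreign;
    } source, dest;
    uint16_t len;
    uint16_t flags;
    int16_t status;
} demo_segment_t;
#endif
```

The trade-off, which Wei raises in his reply, is that the public segment type then depends on the platform the library was built for.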

David
Wei Liu June 22, 2016, 2:52 p.m. UTC | #2
On Wed, Jun 22, 2016 at 02:52:47PM +0100, David Vrabel wrote:
> On 22/06/16 14:29, Wei Liu wrote:
> [...]
> > +#define COPY_SEGMENT_CACHE_SIZE 1024
> 
> Arbitrary limit on number of segments.
> 
> > +    copy.segments = xgt->osdep_data;
> 
> Not thread safe.
> 

Both issues are real, but this is just a gross hack to try to get some
numbers.

> I tried using alloca() which has <1% performance penalty but the failure
> mode for alloca() is really bad so I would not recommend it.
> 

Agreed.

But if you want to use the stack, maybe a C99 variable length array would
do?

> I think the best solution is to allow the osdep code to provide the
> implementation of xengnttab_grant_copy_segment_t, allowing the Linux
> code to do:
> 
> typedef struct ioctl_gntdev_grant_copy_segment xengnttab_grant_copy_segment_t;
> 
> You should still provide the generic structure as well, for those
> platforms that don't provide their own optimized version.
> 

We can't do that (yet): it would open the door to divergence across
platforms.

Basically this approach requires each platform to do the same thing
(the typedef). This implies that any application that uses libxengnttab
will need to test which platform it runs on. It is just pushing the
issue somewhere else.

Still, I think I would wait a bit for other people to weigh in, because
I'm not sure whether my concern is wrongheaded.

Wei.

> David
Wei Liu June 22, 2016, 4:49 p.m. UTC | #3
On Wed, Jun 22, 2016 at 03:52:43PM +0100, Wei Liu wrote:
> On Wed, Jun 22, 2016 at 02:52:47PM +0100, David Vrabel wrote:
> > On 22/06/16 14:29, Wei Liu wrote:
> > [...]
> > > +#define COPY_SEGMENT_CACHE_SIZE 1024
> > 
> > Arbitrary limit on number of segments.
> > 
> > > +    copy.segments = xgt->osdep_data;
> > 
> > Not thread safe.
> > 
> 
> Both issues are real, but this is just a gross hack to try to get some
> numbers.
> 
> > I tried using alloca() which has <1% performance penalty but the failure
> > mode for alloca() is really bad so I would not recommend it.
> > 
> 
> Agreed.
> 
> But if you want to use the stack, maybe a C99 variable length array would
> do?
> 

The numbers (stack based < 1% overhead, heap based ~5% overhead) suggest
that all the assignments are fast. It is the malloc / free pair that is
slow.

Actually, we can just combine a fixed-size stack-based array with a
heap-based array. Say we have an X-element array (picking X to match the
number used in the hypervisor's preemption check); if count > X, use a
heap-based array instead (in the hope that the libc allocation / free
overhead is masked by the copying overhead in the hypervisor).

That would achieve both safety and performance, and render a lot of the
other discussions (application expectations, the interface on other
platforms, etc.) moot. This looks like a good solution to me.
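A minimal sketch of the stack/heap combination, assuming hypothetical simplified segment types, a stand-in for the ioctl, and an illustrative threshold of 16 (the real X would be taken from the hypervisor's preemption check, which is not reproduced here). Requests up to the threshold never touch the allocator; larger ones pay for one calloc/free pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold standing in for the thread's "X". */
#define DEMO_STACK_SEGS 16

/* Hypothetical, simplified segment layouts used only for illustration. */
struct demo_lib_segment { uint64_t src, dst; uint16_t len; int16_t status; };
struct demo_ioctl_segment { uint64_t src, dst; uint16_t len; int16_t status; };

static unsigned int demo_heap_allocs;   /* counts slow-path allocations */

/* Stand-in for the ioctl: marks every segment as successful. */
static int demo_submit(struct demo_ioctl_segment *segs, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        segs[i].status = 0;
    return 0;
}

static int demo_grant_copy(struct demo_lib_segment *segs, uint32_t count)
{
    struct demo_ioctl_segment stack_buf[DEMO_STACK_SEGS];
    struct demo_ioctl_segment *buf = stack_buf;
    int rc;

    assert(segs != NULL || count == 0);

    /* Only requests larger than the stack buffer use the heap. */
    if (count > DEMO_STACK_SEGS) {
        buf = calloc(count, sizeof(*buf));
        if (buf == NULL) {
            errno = ENOMEM;
            return -1;
        }
        demo_heap_allocs++;
    }

    for (uint32_t i = 0; i < count; i++) {
        buf[i].src = segs[i].src;
        buf[i].dst = segs[i].dst;
        buf[i].len = segs[i].len;
        buf[i].status = 0;
    }

    rc = demo_submit(buf, count);
    if (rc == 0)
        for (uint32_t i = 0; i < count; i++)
            segs[i].status = buf[i].status;

    if (buf != stack_buf)
        free(buf);
    return rc;
}
```

Each call owns its buffer, so this avoids both the thread-safety problem and the hard segment limit of the scratch-space hack; the cost on the hot path is one extra comparison.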

David, what do you think?

Wei.
George Dunlap July 5, 2016, 4:27 p.m. UTC | #4
On Wed, Jun 22, 2016 at 3:52 PM, Wei Liu <wei.liu2@citrix.com> wrote:
>> I think the best solution is to allow the osdep code to provide the
>> implementation of xengnttab_grant_copy_segment_t, allowing the Linux
>> code to do:
>>
>> typedef struct ioctl_gntdev_grant_copy_segment xengnttab_grant_copy_segment_t;
>>
>> You should still provide the generic structure as well, for those
>> platforms that don't provide their own optimized version.
>>
>
> We can't do that (yet): it would open the door to divergence across
> platforms.
>
> Basically this approach requires each platform to do the same thing
> (the typedef). This implies that any application that uses libxengnttab
> will need to test which platform it runs on. It is just pushing the
> issue somewhere else.
>
> Still, I think I would wait a bit for other people to weigh in, because
> I'm not sure whether my concern is wrongheaded.

I tend to be sympathetic to David's argument here.  The library has to
provide some ABI to callers; and it has to know the appropriate Linux
ABI in order to translate from the library ABI to the Linux ABI.  If
it happens to know these are the same, I don't see a reason not to
"translate" it just by casting the pointer.

If we want to declare the library ABI in a stand-alone fashion (i.e.,
instead of just doing a typedef, so that the library definition is the
same on all platforms), then having some compile-time checking to make
sure that the layouts of the two structures are identical makes sense.
Beyond that, I'm not sure what the extra copying really buys us.
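A sketch of such compile-time checking with C11 `_Static_assert` and `offsetof`, using hypothetical `demo_*` types in place of the library and kernel structures. If any field moves or changes size, the build fails rather than the library silently handing the kernel a mismatched layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-alone library definition (hypothetical name). */
typedef struct {
    union {
        void *virt;
        struct { uint32_t ref; uint16_t offset; uint16_t domid; } foreign;
    } source, dest;
    uint16_t len;
    uint16_t flags;
    int16_t status;
} demo_lib_segment_t;

/* Stand-in for the kernel ioctl structure (illustrative only). */
struct demo_ioctl_segment {
    union {
        void *virt;
        struct { uint32_t ref; uint16_t offset; uint16_t domid; } foreign;
    } source, dest;
    uint16_t len;
    uint16_t flags;
    int16_t status;
};

/* Fail the build if a field's offset or size differs between the types. */
#define DEMO_SAME_FIELD(f)                                              \
    _Static_assert(offsetof(demo_lib_segment_t, f) ==                   \
                       offsetof(struct demo_ioctl_segment, f) &&        \
                   sizeof(((demo_lib_segment_t *)0)->f) ==              \
                       sizeof(((struct demo_ioctl_segment *)0)->f),     \
                   "layout mismatch: " #f)

_Static_assert(sizeof(demo_lib_segment_t) == sizeof(struct demo_ioctl_segment),
               "segment size mismatch");
DEMO_SAME_FIELD(source.virt);
DEMO_SAME_FIELD(source.foreign.ref);
DEMO_SAME_FIELD(source.foreign.offset);
DEMO_SAME_FIELD(source.foreign.domid);
DEMO_SAME_FIELD(dest.virt);
DEMO_SAME_FIELD(dest.foreign.ref);
DEMO_SAME_FIELD(dest.foreign.offset);
DEMO_SAME_FIELD(dest.foreign.domid);
DEMO_SAME_FIELD(len);
DEMO_SAME_FIELD(flags);
DEMO_SAME_FIELD(status);
```

With checks like these in the Linux osdep code, the public definition can stay identical on every platform while the Linux path still passes the caller's array through with a cast.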

 -George
Roger Pau Monné July 6, 2016, 3:49 p.m. UTC | #5
On Wed, Jun 22, 2016 at 05:49:59PM +0100, Wei Liu wrote:
> [...]
>
> The numbers (stack based < 1% overhead, heap based ~5% overhead) suggest
> that all the assignments are fast. It is the malloc / free pair that is
> slow.
> 
> Actually, we can just combine a fixed-size stack-based array with a
> heap-based array. Say we have an X-element array (picking X to match the
> number used in the hypervisor's preemption check); if count > X, use a
> heap-based array instead (in the hope that the libc allocation / free
> overhead is masked by the copying overhead in the hypervisor).
>
> That would achieve both safety and performance, and render a lot of the
> other discussions (application expectations, the interface on other
> platforms, etc.) moot. This looks like a good solution to me.

IMHO, the cast isn't particularly bad, and even if we do the copy
there's no guarantee that the ioctl structure we are copying into is the
one the kernel expects if someone changed it (the gntdev header lives in
the Xen tree; it's not picked up from the host).

In any case, changing the ioctl in any way is not an option AFAICT, and it 
would be a bug if someone tried to do it. If we ever need to change the 
grant copy ioctl we would have to introduce a new one, like we did with the 
privcmd foreign memory mapping ioctl.

What I think should be avoided is a typedef that exposes the Linux ioctl
structure in the public library headers. As Wei says, the library
interface is OS agnostic, and although other OSes try to follow suit
there's no guarantee that the exact same structure can be reused. I
think the full structure definition has to live in the library itself
(it will just be a copy of ioctl_gntdev_grant_copy), so other OSes can
reuse it even if their ioctl structure is slightly different. The
Linux-specific implementation can add some static asserts to make sure
the layout doesn't change, and then just do a straight cast.
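A minimal sketch of that arrangement, assuming hypothetical `demo_*` names and a stand-in for `ioctl()` so it runs without a gntdev device: the public type is owned by the library, the Linux path asserts that the layouts agree, and the ioctl argument is built with a single cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Library-owned public definition (hypothetical name), shared by all OSes. */
typedef struct {
    union {
        void *virt;
        struct { uint32_t ref; uint16_t offset; uint16_t domid; } foreign;
    } source, dest;
    uint16_t len;
    uint16_t flags;
    int16_t status;
} demo_lib_segment_t;

/* Stand-ins for the Linux ioctl structures (illustrative only). */
struct demo_ioctl_segment {
    union {
        void *virt;
        struct { uint32_t ref; uint16_t offset; uint16_t domid; } foreign;
    } source, dest;
    uint16_t len;
    uint16_t flags;
    int16_t status;
};

struct demo_ioctl_copy {
    unsigned int count;
    struct demo_ioctl_segment *segments;
};

/* Linux osdep only: refuse to build if the layouts ever diverge.
 * A complete version would compare every field's offset as well. */
_Static_assert(sizeof(demo_lib_segment_t) == sizeof(struct demo_ioctl_segment),
               "segment size mismatch");
_Static_assert(_Alignof(demo_lib_segment_t) == _Alignof(struct demo_ioctl_segment),
               "segment alignment mismatch");
_Static_assert(offsetof(demo_lib_segment_t, status) ==
                   offsetof(struct demo_ioctl_segment, status),
               "status offset mismatch");

/* Stand-in for ioctl(fd, IOCTL_GNTDEV_GRANT_COPY, &copy): records its input. */
static struct demo_ioctl_copy demo_last_request;

static int demo_fake_ioctl(const struct demo_ioctl_copy *copy)
{
    demo_last_request = *copy;
    return 0;
}

/* Linux path: no allocation and no per-segment loops. The kernel would
 * write each status straight into the caller's array, so the copy-back
 * loop disappears too. */
static int demo_linux_grant_copy(uint32_t count, demo_lib_segment_t *segs)
{
    struct demo_ioctl_copy copy;

    copy.count = count;
    copy.segments = (struct demo_ioctl_segment *)segs;
    return demo_fake_ioctl(&copy);
}
```

Other OSes would keep the same public definition and translate into their own ioctl layout where it differs.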

Roger.

Patch

diff --git a/tools/libs/gnttab/linux.c b/tools/libs/gnttab/linux.c
index 62ad7bd..17d4d29 100644
--- a/tools/libs/gnttab/linux.c
+++ b/tools/libs/gnttab/linux.c
@@ -47,13 +47,28 @@ 
 #define O_CLOEXEC 0
 #endif
 
+#define COPY_SEGMENT_CACHE_SIZE 1024
+
 int osdep_gnttab_open(xengnttab_handle *xgt)
 {
-    int fd = open(DEVXEN "gntdev", O_RDWR|O_CLOEXEC);
-    if ( fd == -1 )
-        return -1;
-    xgt->fd = fd;
+    size_t s = COPY_SEGMENT_CACHE_SIZE *
+        sizeof(struct ioctl_gntdev_grant_copy_segment);
+
+    xgt->fd = open(DEVXEN "gntdev", O_RDWR|O_CLOEXEC);
+    if (xgt->fd == -1) goto err;
+
+    xgt->osdep_data = malloc(s);
+    if (!xgt->osdep_data) goto err;
+    xgt->osdep_data_size = s;
+
     return 0;
+err:
+    if (xgt->fd != -1) {
+        close(xgt->fd);
+        xgt->fd = -1;
+    }
+
+    return -1;
 }
 
 int osdep_gnttab_close(xengnttab_handle *xgt)
@@ -61,6 +76,10 @@  int osdep_gnttab_close(xengnttab_handle *xgt)
     if ( xgt->fd == -1 )
         return 0;
 
+    free(xgt->osdep_data);
+    xgt->osdep_data = NULL;
+    xgt->osdep_data_size = 0;
+
     return close(xgt->fd);
 }
 
@@ -243,7 +262,12 @@  int osdep_gnttab_grant_copy(xengnttab_handle *xgt,
     int fd = xgt->fd;
     struct ioctl_gntdev_grant_copy copy;
 
-    copy.segments = calloc(count, sizeof(struct ioctl_gntdev_grant_copy_segment));
+    if (count > COPY_SEGMENT_CACHE_SIZE) {
+        errno = E2BIG;
+        return -1;
+    }
+
+    copy.segments = xgt->osdep_data;
     copy.count = count;
     for (i = 0; i < count; i++)
     {
@@ -276,7 +300,6 @@  int osdep_gnttab_grant_copy(xengnttab_handle *xgt,
             segs[i].status = copy.segments[i].status;
     }
 
-    free(copy.segments);
     return rc;
 }
 
diff --git a/tools/libs/gnttab/private.h b/tools/libs/gnttab/private.h
index d6c5594..e99a80d 100644
--- a/tools/libs/gnttab/private.h
+++ b/tools/libs/gnttab/private.h
@@ -7,6 +7,8 @@ 
 struct xengntdev_handle {
     xentoollog_logger *logger, *logger_tofree;
     int fd;
+    void *osdep_data;              /* osdep private data */
+    size_t osdep_data_size;        /* osdep private data size */
 };
 
 int osdep_gnttab_open(xengnttab_handle *xgt);