mbox series

[RFC,v1,0/2] vhost: ring format independence

Message ID 20191011134358.16912-1-mst@redhat.com (mailing list archive)
Headers show
Series vhost: ring format independence | expand

Message

Michael S. Tsirkin Oct. 11, 2019, 1:45 p.m. UTC
So the idea is as follows: we convert descriptors to an
independent format first, and process that converting to
iov later.

The point is that we have a tight loop that fetches
descriptors, which is good for cache utilization.
This will also allow all kind of batching tricks -
e.g. it seems possible to keep SMAP disabled while
we are fetching multiple descriptors.

And perhaps more importantly, this is a very good fit for the packed
ring layout, where we get and put descriptors in order.

This patchset seems to already perform exactly the same as the original
code already based on a microbenchmark.  More testing would be very much
appreciated.

Biggest TODO before this first step is ready to go in is to
batch indirect descriptors as well.

Integrating into vhost-net is basically
s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
or add a module parameter like I did in the test module.



Michael S. Tsirkin (2):
  vhost: option to fetch descriptors through an independent struct
  vhost: batching fetches

 drivers/vhost/test.c  |  19 ++-
 drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
 drivers/vhost/vhost.h |  20 ++-
 3 files changed, 365 insertions(+), 7 deletions(-)

Comments

Jason Wang Oct. 12, 2019, 7:31 a.m. UTC | #1
On 2019/10/11 下午9:45, Michael S. Tsirkin wrote:
> So the idea is as follows: we convert descriptors to an
> independent format first, and process that converting to
> iov later.
>
> The point is that we have a tight loop that fetches
> descriptors, which is good for cache utilization.
> This will also allow all kind of batching tricks -
> e.g. it seems possible to keep SMAP disabled while
> we are fetching multiple descriptors.
>
> And perhaps more importantly, this is a very good fit for the packed
> ring layout, where we get and put descriptors in order.
>
> This patchset seems to already perform exactly the same as the original
> code already based on a microbenchmark.  More testing would be very much
> appreciated.
>
> Biggest TODO before this first step is ready to go in is to
> batch indirect descriptors as well.
>
> Integrating into vhost-net is basically
> s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> or add a module parameter like I did in the test module.


It would be better to convert vhost_net then I can do some benchmark on 
that.

Thanks


>
>
>
> Michael S. Tsirkin (2):
>    vhost: option to fetch descriptors through an independent struct
>    vhost: batching fetches
>
>   drivers/vhost/test.c  |  19 ++-
>   drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
>   drivers/vhost/vhost.h |  20 ++-
>   3 files changed, 365 insertions(+), 7 deletions(-)
>
Jason Wang Oct. 12, 2019, 8:15 a.m. UTC | #2
On 2019/10/11 下午9:45, Michael S. Tsirkin wrote:
> So the idea is as follows: we convert descriptors to an
> independent format first, and process that converting to
> iov later.
>
> The point is that we have a tight loop that fetches
> descriptors, which is good for cache utilization.
> This will also allow all kind of batching tricks -
> e.g. it seems possible to keep SMAP disabled while
> we are fetching multiple descriptors.


I wonder this may help for performance:

- another indirection layer, increased footprint

- won't help or even degrade when there's no batch

- an extra overhead in the case of in order where we should already had 
tight loop

- need carefully deal with indirect and chain or make it only work for 
packet sit just in a single descriptor

Thanks


>
> And perhaps more importantly, this is a very good fit for the packed
> ring layout, where we get and put descriptors in order.
>
> This patchset seems to already perform exactly the same as the original
> code already based on a microbenchmark.  More testing would be very much
> appreciated.
>
> Biggest TODO before this first step is ready to go in is to
> batch indirect descriptors as well.
>
> Integrating into vhost-net is basically
> s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> or add a module parameter like I did in the test module.
>
>
>
> Michael S. Tsirkin (2):
>    vhost: option to fetch descriptors through an independent struct
>    vhost: batching fetches
>
>   drivers/vhost/test.c  |  19 ++-
>   drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
>   drivers/vhost/vhost.h |  20 ++-
>   3 files changed, 365 insertions(+), 7 deletions(-)
>
Michael S. Tsirkin Oct. 12, 2019, 7:26 p.m. UTC | #3
On Sat, Oct 12, 2019 at 04:15:42PM +0800, Jason Wang wrote:
> 
> On 2019/10/11 下午9:45, Michael S. Tsirkin wrote:
> > So the idea is as follows: we convert descriptors to an
> > independent format first, and process that converting to
> > iov later.
> > 
> > The point is that we have a tight loop that fetches
> > descriptors, which is good for cache utilization.
> > This will also allow all kind of batching tricks -
> > e.g. it seems possible to keep SMAP disabled while
> > we are fetching multiple descriptors.
> 
> 
> I wonder this may help for performance:

Could you try it out and report please?
Would be very much appreciated.

> - another indirection layer, increased footprint

Seems to be offset off by improved batching.
For sure will be even better if we can move stac/clac out,
or replace some get/put user with bigger copy to/from.

> - won't help or even degrade when there's no batch

I couldn't measure a difference. I'm guessing

> - an extra overhead in the case of in order where we should already had
> tight loop

it's not so tight with translation in there.
this exactly makes the loop tight.

> - need carefully deal with indirect and chain or make it only work for
> packet sit just in a single descriptor
> 
> Thanks

I don't understand this last comment.

> 
> > 
> > And perhaps more importantly, this is a very good fit for the packed
> > ring layout, where we get and put descriptors in order.
> > 
> > This patchset seems to already perform exactly the same as the original
> > code already based on a microbenchmark.  More testing would be very much
> > appreciated.
> > 
> > Biggest TODO before this first step is ready to go in is to
> > batch indirect descriptors as well.
> > 
> > Integrating into vhost-net is basically
> > s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> > or add a module parameter like I did in the test module.
> > 
> > 
> > 
> > Michael S. Tsirkin (2):
> >    vhost: option to fetch descriptors through an independent struct
> >    vhost: batching fetches
> > 
> >   drivers/vhost/test.c  |  19 ++-
> >   drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
> >   drivers/vhost/vhost.h |  20 ++-
> >   3 files changed, 365 insertions(+), 7 deletions(-)
> >
Michael S. Tsirkin Oct. 12, 2019, 8:36 p.m. UTC | #4
On Sat, Oct 12, 2019 at 03:31:50PM +0800, Jason Wang wrote:
> 
> On 2019/10/11 下午9:45, Michael S. Tsirkin wrote:
> > So the idea is as follows: we convert descriptors to an
> > independent format first, and process that converting to
> > iov later.
> > 
> > The point is that we have a tight loop that fetches
> > descriptors, which is good for cache utilization.
> > This will also allow all kind of batching tricks -
> > e.g. it seems possible to keep SMAP disabled while
> > we are fetching multiple descriptors.
> > 
> > And perhaps more importantly, this is a very good fit for the packed
> > ring layout, where we get and put descriptors in order.
> > 
> > This patchset seems to already perform exactly the same as the original
> > code already based on a microbenchmark.  More testing would be very much
> > appreciated.
> > 
> > Biggest TODO before this first step is ready to go in is to
> > batch indirect descriptors as well.
> > 
> > Integrating into vhost-net is basically
> > s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> > or add a module parameter like I did in the test module.
> 
> 
> It would be better to convert vhost_net then I can do some benchmark on
> that.
> 
> Thanks

Sure, I post a small patch that does this.

> 
> > 
> > 
> > 
> > Michael S. Tsirkin (2):
> >    vhost: option to fetch descriptors through an independent struct
> >    vhost: batching fetches
> > 
> >   drivers/vhost/test.c  |  19 ++-
> >   drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
> >   drivers/vhost/vhost.h |  20 ++-
> >   3 files changed, 365 insertions(+), 7 deletions(-)
> >