Message ID | 20191011134358.16912-1-mst@redhat.com (mailing list archive) |
---|---|
Series | vhost: ring format independence |
On 2019/10/11 9:45 PM, Michael S. Tsirkin wrote:
> So the idea is as follows: we convert descriptors to an
> independent format first, and process that converting to
> iov later.
>
> The point is that we have a tight loop that fetches
> descriptors, which is good for cache utilization.
> This will also allow all kind of batching tricks -
> e.g. it seems possible to keep SMAP disabled while
> we are fetching multiple descriptors.
>
> And perhaps more importantly, this is a very good fit for the packed
> ring layout, where we get and put descriptors in order.
>
> This patchset seems to already perform exactly the same as the original
> code already based on a microbenchmark. More testing would be very much
> appreciated.
>
> Biggest TODO before this first step is ready to go in is to
> batch indirect descriptors as well.
>
> Integrating into vhost-net is basically
> s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> or add a module parameter like I did in the test module.

It would be better to convert vhost_net then I can do some benchmark on
that.

Thanks

>
> Michael S. Tsirkin (2):
>   vhost: option to fetch descriptors through an independent struct
>   vhost: batching fetches
>
>  drivers/vhost/test.c  |  19 ++-
>  drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
>  drivers/vhost/vhost.h |  20 ++-
>  3 files changed, 365 insertions(+), 7 deletions(-)
>
On 2019/10/11 9:45 PM, Michael S. Tsirkin wrote:
> So the idea is as follows: we convert descriptors to an
> independent format first, and process that converting to
> iov later.
>
> The point is that we have a tight loop that fetches
> descriptors, which is good for cache utilization.
> This will also allow all kind of batching tricks -
> e.g. it seems possible to keep SMAP disabled while
> we are fetching multiple descriptors.

I wonder this may help for performance:

- another indirection layer, increased footprint
- won't help or even degrade when there's no batch
- an extra overhead in the case of in order where we should already had
  tight loop
- need carefully deal with indirect and chain or make it only work for
  packet sit just in a single descriptor

Thanks

>
> And perhaps more importantly, this is a very good fit for the packed
> ring layout, where we get and put descriptors in order.
>
> This patchset seems to already perform exactly the same as the original
> code already based on a microbenchmark. More testing would be very much
> appreciated.
>
> Biggest TODO before this first step is ready to go in is to
> batch indirect descriptors as well.
>
> Integrating into vhost-net is basically
> s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> or add a module parameter like I did in the test module.
>
> Michael S. Tsirkin (2):
>   vhost: option to fetch descriptors through an independent struct
>   vhost: batching fetches
>
>  drivers/vhost/test.c  |  19 ++-
>  drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
>  drivers/vhost/vhost.h |  20 ++-
>  3 files changed, 365 insertions(+), 7 deletions(-)
>
On Sat, Oct 12, 2019 at 04:15:42PM +0800, Jason Wang wrote:
>
> On 2019/10/11 9:45 PM, Michael S. Tsirkin wrote:
> > So the idea is as follows: we convert descriptors to an
> > independent format first, and process that converting to
> > iov later.
> >
> > The point is that we have a tight loop that fetches
> > descriptors, which is good for cache utilization.
> > This will also allow all kind of batching tricks -
> > e.g. it seems possible to keep SMAP disabled while
> > we are fetching multiple descriptors.
>
>
> I wonder this may help for performance:

Could you try it out and report please?
Would be very much appreciated.

> - another indirection layer, increased footprint

Seems to be offset by improved batching.
For sure will be even better if we can move stac/clac out,
or replace some get/put user with bigger copy to/from.

> - won't help or even degrade when there's no batch

I couldn't measure a difference. I'm guessing

> - an extra overhead in the case of in order where we should already had
> tight loop

it's not so tight with translation in there.
this exactly makes the loop tight.

> - need carefully deal with indirect and chain or make it only work for
> packet sit just in a single descriptor
>
> Thanks

I don't understand this last comment.

> >
> > And perhaps more importantly, this is a very good fit for the packed
> > ring layout, where we get and put descriptors in order.
> >
> > This patchset seems to already perform exactly the same as the original
> > code already based on a microbenchmark. More testing would be very much
> > appreciated.
> >
> > Biggest TODO before this first step is ready to go in is to
> > batch indirect descriptors as well.
> >
> > Integrating into vhost-net is basically
> > s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> > or add a module parameter like I did in the test module.
> >
> > Michael S. Tsirkin (2):
> >   vhost: option to fetch descriptors through an independent struct
> >   vhost: batching fetches
> >
> >  drivers/vhost/test.c  |  19 ++-
> >  drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
> >  drivers/vhost/vhost.h |  20 ++-
> >  3 files changed, 365 insertions(+), 7 deletions(-)
> >
On Sat, Oct 12, 2019 at 03:31:50PM +0800, Jason Wang wrote:
>
> On 2019/10/11 9:45 PM, Michael S. Tsirkin wrote:
> > So the idea is as follows: we convert descriptors to an
> > independent format first, and process that converting to
> > iov later.
> >
> > The point is that we have a tight loop that fetches
> > descriptors, which is good for cache utilization.
> > This will also allow all kind of batching tricks -
> > e.g. it seems possible to keep SMAP disabled while
> > we are fetching multiple descriptors.
> >
> > And perhaps more importantly, this is a very good fit for the packed
> > ring layout, where we get and put descriptors in order.
> >
> > This patchset seems to already perform exactly the same as the original
> > code already based on a microbenchmark. More testing would be very much
> > appreciated.
> >
> > Biggest TODO before this first step is ready to go in is to
> > batch indirect descriptors as well.
> >
> > Integrating into vhost-net is basically
> > s/vhost_get_vq_desc/vhost_get_vq_desc_batch/ -
> > or add a module parameter like I did in the test module.
>
>
> It would be better to convert vhost_net then I can do some benchmark on
> that.
>
> Thanks

Sure, I post a small patch that does this.

> >
> > Michael S. Tsirkin (2):
> >   vhost: option to fetch descriptors through an independent struct
> >   vhost: batching fetches
> >
> >  drivers/vhost/test.c  |  19 ++-
> >  drivers/vhost/vhost.c | 333 +++++++++++++++++++++++++++++++++++++++++-
> >  drivers/vhost/vhost.h |  20 ++-
> >  3 files changed, 365 insertions(+), 7 deletions(-)
> >
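
Editorial illustration, not part of the thread: the two-step scheme from the cover letter (fetch descriptors into a ring-independent format in a tight loop, then convert them to iov) might be sketched in userspace C roughly as below. Every name here (`ring_desc`, `plain_desc`, `fetch_chain`, `descs_to_iov`, the `DESC_F_*` flags) is hypothetical and chosen for illustration; this is not the code from the patches, and it leaves out guest address translation, indirect descriptors, and user-access (SMAP) handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical ring-layout descriptor, loosely modelled on a split-ring entry. */
struct ring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

#define DESC_F_NEXT  1u   /* chain continues at .next (assumed flag value) */
#define DESC_F_WRITE 2u   /* buffer is written by the device (assumed flag value) */

/* Hypothetical ring-independent descriptor: no chaining information left. */
struct plain_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
};

/*
 * Step 1: tight loop that only copies a descriptor chain out of the ring.
 * Returns the number of descriptors fetched, or 0 if the chain is malformed
 * or longer than max (which also bounds looping chains).
 */
static size_t fetch_chain(const struct ring_desc *ring, size_t ring_size,
                          uint16_t head, struct plain_desc *out, size_t max)
{
    size_t n = 0;
    uint16_t i = head;

    assert(ring != NULL && out != NULL);
    for (;;) {
        if (i >= ring_size || n == max)
            return 0;
        out[n].addr = ring[i].addr;
        out[n].len = ring[i].len;
        out[n].flags = ring[i].flags;
        n++;
        if (!(ring[i].flags & DESC_F_NEXT))
            break;
        i = ring[i].next;
    }
    return n;
}

/*
 * Step 2: convert the fetched descriptors to iovecs in a separate pass.
 * A real implementation would translate guest addresses here; this sketch
 * simply treats addr as a pointer. Counts device-readable (out_num) and
 * device-writable (in_num) buffers and returns the total.
 */
static size_t descs_to_iov(const struct plain_desc *descs, size_t n,
                           struct iovec *iov, size_t *out_num, size_t *in_num)
{
    *out_num = 0;
    *in_num = 0;
    for (size_t k = 0; k < n; k++) {
        iov[k].iov_base = (void *)(uintptr_t)descs[k].addr;
        iov[k].iov_len = descs[k].len;
        if (descs[k].flags & DESC_F_WRITE)
            (*in_num)++;
        else
            (*out_num)++;
    }
    return n;
}
```

Keeping the first loop free of translation work is the property discussed in the replies above: the fetch pass touches only ring memory, and the heavier per-buffer work happens afterwards on the already-copied array.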