Message ID | 20240619132827.51306-1-anand.a.khoje@oracle.com
---|---
State | Superseded
Series | [v4] net/mlx5: Reclaim max 50K pages at once
On Wed, Jun 19, 2024 at 06:58:27PM +0530, Anand Khoje wrote:
> In non-FLR context, at times CX-5 requests the release of ~8 million FW
> pages. This needs a huge number of cmd mailboxes, all of which have to be
> released once the pages are reclaimed. Releasing this many cmd mailboxes
> consumes CPU time running into many seconds, which with non-preemptible
> kernels leads to critical processes starving on that CPU's runqueue.
> To alleviate this, this change restricts the number of pages a worker
> will try to reclaim to a maximum of 50K in one go.
> The 50K limit is aligned with the current firmware capacity/limit of
> releasing 50K pages at once per MLX5_CMD_OP_MANAGE_PAGES + MLX5_PAGES_TAKE
> device command.
>
> Our tests have shown a significant benefit from this change in terms of
> the time consumed by dma_pool_free(). During a test where an event was
> raised by the HCA to release 1.3 million pages, the following
> observations were made:
>
> - Without this change:
>   Around 20K mailbox messages were allocated, to accommodate the DMA
>   addresses of 1.3 million pages.
>   The average time spent by dma_pool_free() to free the DMA pool is
>   between 16 usec and 32 usec.
>
>     value  ------------- Distribution ------------- count
>       256 |                                         0
>       512 |@                                        287
>      1024 |@@@                                      1332
>      2048 |@                                        656
>      4096 |@@@@@                                    2599
>      8192 |@@@@@@@@@@                               4755
>     16384 |@@@@@@@@@@@@@@@                          7545
>     32768 |@@@@@                                    2501
>     65536 |                                         0
>
> - With this change:
>   Around 800 mailbox messages were allocated; this was to accommodate
>   the DMA addresses of only 50K pages.
>   The average time spent by dma_pool_free() to free the DMA pool in this
>   case lies between 1 usec and 2 usec.
>
>     value  ------------- Distribution ------------- count
>       256 |                                         0
>       512 |@@@@@@@@@@@@@@@@@@                       346
>      1024 |@@@@@@@@@@@@@@@@@@@@@@                   435
>      2048 |                                         0
>      4096 |                                         0
>      8192 |                                         1
>     16384 |                                         0
>
> Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Changes in v4:
>   - Fixed a nit in the patch subject.
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> index dcf58ef..06eee3a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> @@ -608,6 +608,7 @@ enum {
>  	RELEASE_ALL_PAGES_MASK = 0x4000,
>  };
>
> +#define MAX_RECLAIM_NPAGES -50000
>  static int req_pages_handler(struct notifier_block *nb,
>  			     unsigned long type, void *data)
>  {
> @@ -639,9 +640,13 @@ static int req_pages_handler(struct notifier_block *nb,
>
>  	req->dev = dev;
>  	req->func_id = func_id;
> -	req->npages = npages;
>  	req->ec_function = ec_function;
>  	req->release_all = release_all;
> +	if (npages < MAX_RECLAIM_NPAGES)
> +		req->npages = MAX_RECLAIM_NPAGES;
> +	else
> +		req->npages = npages;
> +

BTW, this can be written as:
	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);

Thanks

>  	INIT_WORK(&req->work, pages_work_handler);
>  	queue_work(dev->priv.pg_wq, &req->work);
>  	return NOTIFY_OK;
> --
> 1.8.3.1
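Leon's `max_t()` one-liner and the patch's if/else are equivalent; the sketch below shows this in a minimal userspace form. It is not the kernel code itself: `max_t` is a kernel macro, so a plain helper stands in for it, and the reclaim convention (negative `npages` means "release this many pages") is taken from the patch context.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace sketch of the patch's clamping logic: reclaim requests
 * arrive as negative page counts, and any single request is capped
 * at 50K pages. */
#define MAX_RECLAIM_NPAGES -50000

static int32_t clamp_npages(int32_t npages)
{
	/* The if/else form from the patch: a request for more than
	 * 50K pages is truncated to exactly 50K. */
	if (npages < MAX_RECLAIM_NPAGES)
		return MAX_RECLAIM_NPAGES;
	return npages;
}

/* Stand-in for the kernel's max_t(s32, a, b): both forms pick the
 * larger (less negative) value, i.e. the smaller reclaim. */
static int32_t max_s32(int32_t a, int32_t b)
{
	return a > b ? a : b;
}
```

With this, the ~8 million page request from the commit message is split into 50K-page chunks: `clamp_npages(-8000000)` yields `-50000`, matching `max_s32(-8000000, MAX_RECLAIM_NPAGES)`.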
On 6/24/24 15:27, Leon Romanovsky wrote:
> On Wed, Jun 19, 2024 at 06:58:27PM +0530, Anand Khoje wrote:

[...]

>> +	if (npages < MAX_RECLAIM_NPAGES)
>> +		req->npages = MAX_RECLAIM_NPAGES;
>> +	else
>> +		req->npages = npages;
>> +
>
> BTW, this can be written as:
> 	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
>
> Thanks

Hi Leon,

Thanks for the suggestion. I have sent a v5 with this change.

-Anand
From: Leon Romanovsky
> Sent: 24 June 2024 10:58
>
> On Wed, Jun 19, 2024 at 06:58:27PM +0530, Anand Khoje wrote:

[...]

> > +#define MAX_RECLAIM_NPAGES -50000

It would be traditional to enclose a negative value in ().
(Although only 30+ year old compilers would generate unexpected code for
	foo-MAX_RECLAIM_NPAGES
and you have to go back into the 1970s for
	foo=MAX_RECLAIM_NPAGES
to be a problem.)

> > +	if (npages < MAX_RECLAIM_NPAGES)
> > +		req->npages = MAX_RECLAIM_NPAGES;
> > +	else
> > +		req->npages = npages;
> > +
>
> BTW, this can be written as:
> 	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);

That shouldn't need all the (s32) casts.
(I don't think it even needed them before I relaxed the type check.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
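David's parenthesization point can be made concrete with a small snippet. For a bare negative literal like -50000 the parentheses are purely stylistic on any modern compiler, as he notes; the `SUM_*` macros below are hypothetical, added only to show why the "wrap macro values in ()" rule exists at all for multi-token expansions:

```c
#include <assert.h>

/* A bare negative literal: the preprocessor keeps the '-' and '50000'
 * as separate tokens, so "100 - NEG_BARE" expands to "100 - -50000"
 * and both forms behave identically on modern compilers. */
#define NEG_BARE   -50000
#define NEG_PAREN  (-50000)

/* Hypothetical multi-token macros (not from the thread): here the
 * missing parentheses really do change the result, because the
 * expansion interacts with operator precedence at the use site. */
#define SUM_BARE   1 + 2
#define SUM_PAREN  (1 + 2)
```

Here `SUM_BARE * 3` expands to `1 + 2 * 3` and evaluates to 7, while `SUM_PAREN * 3` is `(1 + 2) * 3` and evaluates to 9; the negative-literal case stays the same either way, which is why the parentheses there are a style convention rather than a correctness fix.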
On Fri, Jun 28, 2024 at 03:13:45PM +0000, David Laight wrote:
> From: Leon Romanovsky
> > Sent: 24 June 2024 10:58

[...]

> > > +#define MAX_RECLAIM_NPAGES -50000
>
> It would be traditional to enclose a negative value in ().

[...]

> > BTW, this can be written as:
> > 	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
>
> That shouldn't need all the (s32) casts.

#define doesn't have a type, so it is better to be explicit here.

> (I don't think it even needed them before I relaxed the type check.)
From: Leon Romanovsky
> Sent: 01 July 2024 12:50
> To: David Laight <David.Laight@ACULAB.COM>
...
> > > BTW, this can be written as:
> > > 	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
> >
> > That shouldn't need all the (s32) casts.
>
> #define doesn't have a type, so it is better to be explicit here.

The constant has a type; the cast just hides any checking that might be done.
Would you really write:
	if ((s32)npages > (s32)-50000)
		...
Because that is what you are generating.

Here it probably doesn't matter, but it is really bad practice.
If, for example, 'npages' is 64-bit, then the cast can change the value.
There are places where the type of the result has been used and caused bugs
because a value got masked to 16 bits.

In reality there should be a big trawl through the code to try to remove all
the 'pointless' min_t() and max_t(). Adding new ones just makes it worse.

	David
On Mon, Jul 15, 2024 at 08:22:19AM +0000, David Laight wrote:
> From: Leon Romanovsky
> > Sent: 01 July 2024 12:50
> > To: David Laight <David.Laight@ACULAB.COM>
> ...
> > > > BTW, this can be written as:
> > > > 	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
> > >
> > > That shouldn't need all the (s32) casts.
> >
> > #define doesn't have a type, so it is better to be explicit here.
>
> The constant has a type, the cast just hides any checking that might be done.

According to the C standard, the type can be int, long int or long long int:
"The type of an integer constant is the first of the corresponding list in
which its value can be represented."

> Would you really write:
> 	if ((s32)npages > (s32)-50000)
> 		...
> Because that is what you are generating.

So instead of comparing s32 with int, which can be misleading, I prefer to
compare s32 with s32.

> Here it probably doesn't matter, but it is really bad practise.
> If, for example, 'npages' is 64 bit then the cast can change the value.

In our case npages is s32.

Thanks
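Both sides of this exchange can be checked in a few lines. C11 `_Generic` confirms Leon's citation (the literal -50000 fits in `int`, so that is its type), and a simplified userspace stand-in for `max_t` shows the type-forcing behaviour under discussion. This macro is only an analogue: the kernel's real `max_t` also guards against double evaluation of its arguments, which this one-line version does not.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified analogue of the kernel's max_t(type, a, b): both operands
 * are cast to the named type before comparison. The casts are the part
 * David objects to and Leon defends. */
#define max_t(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

#define MAX_RECLAIM_NPAGES -50000
```

Inspecting the literal with `_Generic(-50000, int: 1, default: 0)` yields 1, since per C11 6.4.4.1 an integer constant takes the first of int, long int, long long int that can represent it; so on every common platform the comparison is s32 against int anyway, and the casts document rather than change the arithmetic.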
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index dcf58ef..06eee3a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -608,6 +608,7 @@ enum {
 	RELEASE_ALL_PAGES_MASK = 0x4000,
 };
 
+#define MAX_RECLAIM_NPAGES -50000
 static int req_pages_handler(struct notifier_block *nb,
 			     unsigned long type, void *data)
 {
@@ -639,9 +640,13 @@ static int req_pages_handler(struct notifier_block *nb,
 
 	req->dev = dev;
 	req->func_id = func_id;
-	req->npages = npages;
 	req->ec_function = ec_function;
 	req->release_all = release_all;
+	if (npages < MAX_RECLAIM_NPAGES)
+		req->npages = MAX_RECLAIM_NPAGES;
+	else
+		req->npages = npages;
+
 	INIT_WORK(&req->work, pages_work_handler);
 	queue_work(dev->priv.pg_wq, &req->work);
 	return NOTIFY_OK;