Message ID | cover.1640862842.git.leonro@nvidia.com (mailing list archive)
---|---
Series | MR cache enhancement
On Sun, Jan 02, 2022 at 11:03:10AM +0800, Hillf Danton wrote:
> On Thu, 30 Dec 2021 13:23:23 +0200
> > From: Aharon Landau <aharonl@nvidia.com>
> >
> > When restarting an application with many non-cached mkeys, all the
> > mkeys will be destroyed and then recreated.
> >
> > This process takes a long time (about 20 seconds for deregistration
> > and 28 seconds for registration of 100,000 MRs).
> >
> > To shorten the restart runtime, insert the mkeys temporarily into the
> > cache and schedule a delayed work to destroy them later. If there is
> > no fitting entry for these mkeys, create a temporary entry that fits
> > them.
> >
> > If 30 seconds have passed and no user reclaimed the temporarily
> > cached mkeys, the scheduled work will destroy the mkeys and the
> > temporary entries.
> >
> > When restarting an application, the mkeys will still be in the cache
> > when trying to reg them again, therefore, the registration will be
> > faster (4 seconds for deregistration and 5 seconds for registration
> > of 100,000 MRs).
> >
> > Signed-off-by: Aharon Landau <aharonl@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  drivers/infiniband/hw/mlx5/mlx5_ib.h |   3 +
> >  drivers/infiniband/hw/mlx5/mr.c      | 131 ++++++++++++++++++++++++++-
> >  2 files changed, 132 insertions(+), 2 deletions(-)

<...>

> > +	if (!ent->is_tmp)
> > +		mr->mmkey.cache_ent = ent;
> > +	else {
> > +		ent->total_mrs--;
> > +		cancel_delayed_work(&ent->dev->cache.remove_ent_dwork);
> > +		queue_delayed_work(ent->dev->cache.wq,
> > +				   &ent->dev->cache.remove_ent_dwork,
> > +				   msecs_to_jiffies(30 * 1000));
> > +	}
>
> Nit: collapse cancel and queue into mod_delayed_work().
>
> > 	}

<...>

> > +	INIT_WORK(&ent->work, cache_work_func);
> > +	INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);
>
> More important IMHO is to cut work in a separate patch, given that dwork
> can be queued with zero delay and both work callbacks are simple
> wrappers of __cache_work_func().

Thanks, I'll collect more feedback and resubmit.

> Hillf
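For reference, Hillf's nit applied to the quoted hunk would read roughly as
below. This is only a sketch reusing the field names from the patch above,
not the resubmitted code; mod_delayed_work() cancels any pending timer and
re-queues the work with the new delay in one call, matching the
cancel-then-queue pair it replaces.

	if (!ent->is_tmp)
		mr->mmkey.cache_ent = ent;
	else {
		ent->total_mrs--;
		/* re-arm the 30s cleanup in one call instead of cancel+queue */
		mod_delayed_work(ent->dev->cache.wq,
				 &ent->dev->cache.remove_ent_dwork,
				 msecs_to_jiffies(30 * 1000));
	}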
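Similarly, the consolidation Hillf suggests (dropping ent->work and keeping
only the delayed work) could look roughly like the sketch below. It assumes
the struct mlx5_cache_ent layout from mlx5_ib.h and hypothetical call
sites; since a delayed work can be queued with zero delay, the plain work
item and its cache_work_func() wrapper become redundant.

	static void delayed_cache_work_func(struct work_struct *work)
	{
		struct mlx5_cache_ent *ent;

		/* dwork.work embeds the work_struct inside the delayed_work */
		ent = container_of(work, struct mlx5_cache_ent, dwork.work);
		__cache_work_func(ent);
	}

	...

	/* setup: only one work item left to initialize */
	INIT_DELAYED_WORK(&ent->dwork, delayed_cache_work_func);

	/* former queue_work(cache->wq, &ent->work) call sites become: */
	queue_delayed_work(cache->wq, &ent->dwork, 0);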
From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v1:
 * Based on DM revert https://lore.kernel.org/all/20211222101312.1358616-1-maorg@nvidia.com
v0: https://lore.kernel.org/all/cover.1638781506.git.leonro@nvidia.com
---------------------------------------------------------
Hi,

This series from Aharon refactors the mlx5 MR cache management logic to
speed up deregistration significantly.

Thanks

Aharon Landau (7):
  RDMA/mlx5: Merge similar flows of allocating MR from the cache
  RDMA/mlx5: Replace cache list with Xarray
  RDMA/mlx5: Store in the cache mkeys instead of mrs
  RDMA/mlx5: Reorder calls to pcie_relaxed_ordering_enabled()
  RDMA/mlx5: Change the cache structure to an RB-tree
  RDMA/mlx5: Delay the deregistration of a non-cache mkey
  RDMA/mlx5: Rename the mkey cache variables and functions

 drivers/infiniband/hw/mlx5/main.c    |    4 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   76 +-
 drivers/infiniband/hw/mlx5/mr.c      | 1021 +++++++++++++++++---------
 drivers/infiniband/hw/mlx5/odp.c     |   72 +-
 include/linux/mlx5/driver.h          |    7 +-
 5 files changed, 741 insertions(+), 439 deletions(-)