[0/8] Add Host control mode to HPB

Message ID 20210127151217.24760-1-avri.altman@wdc.com (mailing list archive)

Message

Avri Altman Jan. 27, 2021, 3:12 p.m. UTC
The HPB spec defines 2 control modes - device control mode and host
control mode. As opposed to device control mode, in which the host obeys
whatever recommendation it receives from the device, in host control
mode the host uses its own algorithms to decide which regions should
be activated or inactivated.

We kept the host-managed heuristic simple and concise.

Aside from adding by-spec functionality, host control mode brings
some further potential benefits: it makes the HPB logic transparent and
readable, allows tuning / scaling of its various parameters, and
can utilize system-wide info to optimize HPB potential.
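
As a rough illustration of how small such a heuristic can be, here is a single-threaded sketch modeled on the read-counter patch quoted later in this thread: count reads per region, reset the count on write, request activation at a threshold of 4 reads, and halve the counts periodically so that idle active regions drop out. The struct and function names are hypothetical, and the `>=` check combined with a state test is a simplification, not the driver's logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as ACTIVATION_THRSHLD in the read-counter patch. */
#define ACTIVATION_THRESHOLD 4

/* Hypothetical, simplified region model (no locking, no sub-regions). */
enum rgn_state { RGN_INACTIVE, RGN_ACTIVE };

struct region {
	enum rgn_state state;
	uint64_t reads;
};

/* A write resets the score: only "clean" reads are counted. */
void region_on_write(struct region *r)
{
	assert(r != NULL);
	r->reads = 0;
}

/* Count one read. Returns 1 if this read activated the region. */
int region_on_read(struct region *r)
{
	assert(r != NULL);
	r->reads++;
	if (r->state == RGN_INACTIVE && r->reads >= ACTIVATION_THRESHOLD) {
		r->state = RGN_ACTIVE;
		return 1;
	}
	return 0;
}

/* Normalization pass: halve the score, retire active regions at zero. */
void region_normalize(struct region *r)
{
	assert(r != NULL);
	r->reads >>= 1;
	if (r->state == RGN_ACTIVE && r->reads == 0)
		r->state = RGN_INACTIVE;
}
```

The threshold and the halving step are the kind of parameters that could be tuned or scaled.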

This series is based on Samsung's V18 device-control HPB1.0 driver, see
msg-id: 20201222015704epcms2p643f0c5011064a7ce56b08331811a8509@epcms2p6
on lore.kernel.org. The patches are also available in the WDC UFS repo:
https://github.com/westerndigitalcorporation/WDC-UFS-REPO/tree/hpb-v18

This version was tested on a Galaxy S20 and a Xiaomi Mi10 Pro.
Your meticulous review and testing are most welcome and appreciated.

Thanks,
Avri

Avri Altman (8):
  scsi: ufshpb: Cache HPB Control mode on init
  scsi: ufshpb: Add host control mode support to rsp_upiu
  scsi: ufshpb: Add region's reads counter
  scsi: ufshpb: Make eviction depends on region's reads
  scsi: ufshpb: Region inactivation in host mode
  scsi: ufshpb: Add hpb dev reset response
  scsi: ufshpb: Add "Cold" regions timer
  scsi: ufshpb: Add support for host control mode

 drivers/scsi/ufs/ufshpb.c | 430 +++++++++++++++++++++++++++++++++-----
 drivers/scsi/ufs/ufshpb.h |  23 ++
 2 files changed, 406 insertions(+), 47 deletions(-)

Comments

Greg Kroah-Hartman Jan. 27, 2021, 3:19 p.m. UTC | #1
On Wed, Jan 27, 2021 at 05:12:10PM +0200, Avri Altman wrote:
> We will use it later, when we'll need to differentiate between device
> and host control modes.
> 
> Signed-off-by: Avri Altman <avri.altman@wdc.com>
> ---
>  drivers/scsi/ufs/ufshpb.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> index d3e6c5b32328..183bdf35f2d0 100644
> --- a/drivers/scsi/ufs/ufshpb.c
> +++ b/drivers/scsi/ufs/ufshpb.c
> @@ -26,6 +26,8 @@ static int tot_active_srgn_pages;
>  
>  static struct workqueue_struct *ufshpb_wq;
>  
> +static enum UFSHPB_MODE ufshpb_mode;

How are you allowed to have a single variable for a device-specific
thing?  What happens when you have two controllers or disks or whatever
you are binding to here?  How does this work at all?

This should be per-device, right?

thanks,

greg k-h
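
For illustration, a minimal sketch of the per-device alternative raised in this review: the control mode lives in the per-device structure instead of a file-scope static, so two devices can report different modes. The names `struct hpb_dev`, `hpb_dev_init`, `hpb_is_host_control` and `HPB_DEVICE_CONTROL` are hypothetical; only `HPB_HOST_CONTROL` appears in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Control modes; HPB_DEVICE_CONTROL is an assumed name for the other mode. */
enum hpb_mode {
	HPB_DEVICE_CONTROL,
	HPB_HOST_CONTROL,
};

/* Hypothetical stand-in for the driver's per-device HPB state. */
struct hpb_dev {
	enum hpb_mode mode;	/* cached once per device at init */
};

/* Cache the mode reported by this particular device. */
void hpb_dev_init(struct hpb_dev *dev, enum hpb_mode reported)
{
	assert(dev != NULL);
	dev->mode = reported;
}

/* Callers query the device instance, never a file-scope variable. */
int hpb_is_host_control(const struct hpb_dev *dev)
{
	assert(dev != NULL);
	return dev->mode == HPB_HOST_CONTROL;
}
```

With this layout, a check such as `ufshpb_mode == HPB_HOST_CONTROL` would become a lookup through whatever per-device pointer the caller already holds.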
Greg Kroah-Hartman Jan. 27, 2021, 3:21 p.m. UTC | #2
On Wed, Jan 27, 2021 at 05:12:12PM +0200, Avri Altman wrote:
> In host control mode, reads are the major source of activation trials.
> Keep track of those read counters, for both active as well as inactive
> regions.
> 
> We reset the read counter upon write - we are only interested in "clean"
> reads.  Less intuitive, however, is that we also reset it upon a region's
> deactivation.  Region deactivation is often due to the fact that
> eviction took place: a region becomes active at the expense of another.
> This happens when the max-active-regions limit has been crossed. If we
> don't reset the counter, we will trigger a lot of thrashing of the HPB
> database, since a few reads (or even one) to the region that was
> deactivated will trigger a re-activation trial.
> 
> Keep those counters normalized, as we are using those reads as a
> comparative score, to make various decisions.
> If during consecutive normalizations an active region has exhausted its
> reads - inactivate it.
> 
> Signed-off-by: Avri Altman <avri.altman@wdc.com>
> ---
>  drivers/scsi/ufs/ufshpb.c | 96 +++++++++++++++++++++++++++++++++------
>  drivers/scsi/ufs/ufshpb.h |  5 ++
>  2 files changed, 86 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
> index 5fa1f5bc08e6..51c3607166bc 100644
> --- a/drivers/scsi/ufs/ufshpb.c
> +++ b/drivers/scsi/ufs/ufshpb.c
> @@ -16,6 +16,9 @@
>  #include "ufshpb.h"
>  #include "../sd.h"
>  
> +#define WORK_PENDING 0
> +#define ACTIVATION_THRSHLD 4 /* 4 IOs */
> +
>  /* memory management */
>  static struct kmem_cache *ufshpb_mctx_cache;
>  static mempool_t *ufshpb_mctx_pool;
> @@ -261,6 +264,21 @@ ufshpb_set_hpb_read_to_upiu(struct ufshpb_lu *hpb, struct ufshcd_lrb *lrbp,
>  	cdb[14] = transfer_len;
>  }
>  
> +static void ufshpb_update_active_info(struct ufshpb_lu *hpb, int rgn_idx,
> +				      int srgn_idx)
> +{
> +	struct ufshpb_region *rgn;
> +	struct ufshpb_subregion *srgn;
> +
> +	rgn = hpb->rgn_tbl + rgn_idx;
> +	srgn = rgn->srgn_tbl + srgn_idx;
> +
> +	list_del_init(&rgn->list_inact_rgn);
> +
> +	if (list_empty(&srgn->list_act_srgn))
> +		list_add_tail(&srgn->list_act_srgn, &hpb->lh_act_srgn);
> +}
> +
>  /*
>   * This function will set up HPB read command using host-side L2P map data.
>   * In HPB v1.0, maximum size of HPB read command is 4KB.
> @@ -276,6 +294,7 @@ void ufshpb_prep(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
>  	unsigned long flags;
>  	int transfer_len, rgn_idx, srgn_idx, srgn_offset;
>  	int err = 0;
> +	u64 reads;
>  
>  	hpb = ufshpb_get_hpb_data(cmd->device);
>  	if (!hpb)
> @@ -306,12 +325,39 @@ void ufshpb_prep(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
>  		ufshpb_set_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
>  				 transfer_len);
>  		spin_unlock_irqrestore(&hpb->rgn_state_lock, flags);
> +
> +		if (ufshpb_mode == HPB_HOST_CONTROL)
> +			atomic64_set(&rgn->reads, 0);
> +
>  		return;
>  	}
>  
> +	if (ufshpb_mode == HPB_HOST_CONTROL)
> +		reads = atomic64_inc_return(&rgn->reads);
> +
>  	if (!ufshpb_is_support_chunk(transfer_len))
>  		return;
>  
> +	if (ufshpb_mode == HPB_HOST_CONTROL) {
> +		/*
> +		 * in host control mode, reads are the main source for
> +		 * activation trials.
> +		 */
> +		if (reads == ACTIVATION_THRSHLD) {
> +			spin_lock_irqsave(&hpb->rsp_list_lock, flags);
> +			ufshpb_update_active_info(hpb, rgn_idx, srgn_idx);
> +			hpb->stats.rb_active_cnt++;
> +			spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
> +			dev_dbg(&hpb->sdev_ufs_lu->sdev_dev,
> +				"activate region %d-%d\n", rgn_idx, srgn_idx);
> +		}
> +
> +		/* keep those counters normalized */
> +		if (reads > hpb->entries_per_srgn &&
> +		    !test_and_set_bit(WORK_PENDING, &hpb->work_data_bits))
> +			schedule_work(&hpb->ufshpb_normalization_work);
> +	}
> +
>  	spin_lock_irqsave(&hpb->rgn_state_lock, flags);
>  	if (ufshpb_test_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
>  				   transfer_len)) {
> @@ -396,21 +442,6 @@ static int ufshpb_clear_dirty_bitmap(struct ufshpb_lu *hpb,
>  	return 0;
>  }
>  
> -static void ufshpb_update_active_info(struct ufshpb_lu *hpb, int rgn_idx,
> -				      int srgn_idx)
> -{
> -	struct ufshpb_region *rgn;
> -	struct ufshpb_subregion *srgn;
> -
> -	rgn = hpb->rgn_tbl + rgn_idx;
> -	srgn = rgn->srgn_tbl + srgn_idx;
> -
> -	list_del_init(&rgn->list_inact_rgn);
> -
> -	if (list_empty(&srgn->list_act_srgn))
> -		list_add_tail(&srgn->list_act_srgn, &hpb->lh_act_srgn);
> -}
> -
>  static void ufshpb_update_inactive_info(struct ufshpb_lu *hpb, int rgn_idx)
>  {
>  	struct ufshpb_region *rgn;
> @@ -646,6 +677,9 @@ static void __ufshpb_evict_region(struct ufshpb_lu *hpb,
>  
>  	ufshpb_cleanup_lru_info(lru_info, rgn);
>  
> +	if (ufshpb_mode == HPB_HOST_CONTROL)
> +		atomic64_set(&rgn->reads, 0);
> +
>  	for_each_sub_region(rgn, srgn_idx, srgn)
>  		ufshpb_purge_active_subregion(hpb, srgn);
>  }
> @@ -1044,6 +1078,33 @@ static void ufshpb_run_inactive_region_list(struct ufshpb_lu *hpb)
>  	spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
>  }
>  
> +static void ufshpb_normalization_work_handler(struct work_struct *work)
> +{
> +	struct ufshpb_lu *hpb;
> +	int rgn_idx;
> +
> +	hpb = container_of(work, struct ufshpb_lu, ufshpb_normalization_work);
> +
> +	for (rgn_idx = 0; rgn_idx < hpb->rgns_per_lu; rgn_idx++) {
> +		struct ufshpb_region *rgn = hpb->rgn_tbl + rgn_idx;
> +		u64 reads = atomic64_read(&rgn->reads);
> +
> +		if (reads)
> +			atomic64_set(&rgn->reads, reads >> 1);
> +
> +		if (rgn->rgn_state != HPB_RGN_ACTIVE ||
> +			atomic64_read(&rgn->reads))
> +			continue;
> +
> +		/* if region is active but has no reads - inactivate it */
> +		spin_lock(&hpb->rsp_list_lock);
> +		ufshpb_update_inactive_info(hpb, rgn->rgn_idx);
> +		spin_unlock(&hpb->rsp_list_lock);
> +	}
> +
> +	clear_bit(WORK_PENDING, &hpb->work_data_bits);
> +}
> +
>  static void ufshpb_map_work_handler(struct work_struct *work)
>  {
>  	struct ufshpb_lu *hpb = container_of(work, struct ufshpb_lu, map_work);
> @@ -1308,6 +1369,9 @@ static int ufshpb_lu_hpb_init(struct ufs_hba *hba, struct ufshpb_lu *hpb)
>  	INIT_LIST_HEAD(&hpb->list_hpb_lu);
>  
>  	INIT_WORK(&hpb->map_work, ufshpb_map_work_handler);
> +	if (ufshpb_mode == HPB_HOST_CONTROL)
> +		INIT_WORK(&hpb->ufshpb_normalization_work,
> +			  ufshpb_normalization_work_handler);
>  
>  	hpb->map_req_cache = kmem_cache_create("ufshpb_req_cache",
>  			  sizeof(struct ufshpb_req), 0, 0, NULL);
> @@ -1394,6 +1458,8 @@ static void ufshpb_discard_rsp_lists(struct ufshpb_lu *hpb)
>  
>  static void ufshpb_cancel_jobs(struct ufshpb_lu *hpb)
>  {
> +	if (ufshpb_mode == HPB_HOST_CONTROL)
> +		cancel_work_sync(&hpb->ufshpb_normalization_work);
>  	cancel_work_sync(&hpb->map_work);
>  }
>  
> diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
> index 8a34b0f42754..b0e78728af38 100644
> --- a/drivers/scsi/ufs/ufshpb.h
> +++ b/drivers/scsi/ufs/ufshpb.h
> @@ -115,6 +115,9 @@ struct ufshpb_region {
>  	/* below information is used by lru */
>  	struct list_head list_lru_rgn;
>  	unsigned long rgn_flags;
> +
> +	/* region reads - for host mode */
> +	atomic64_t reads;

Why do you need an atomic variable for this?  What are you trying to
"protect" here by flushing the cpus all the time?  What protects this
variable from changing right after you have read from it (hint, you do
that above...)

atomics are not race-free, use a real lock if you need that.

thanks,

greg k-h
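
The read-then-write pattern questioned above can be illustrated in userspace with C11 atomics. This is a sketch rather than the driver's code: `halve_racy` and `halve_cas` are hypothetical names, and `<stdatomic.h>` stands in for the kernel's `atomic64_t` API. The first function mirrors the load followed by a store used in the normalization handler; the second shows one lock-free way to make the halving a single atomic update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Racy: the load and the store are each atomic, but the pair is not.
 * An increment that lands between them is overwritten and lost.
 */
void halve_racy(_Atomic uint64_t *reads)
{
	uint64_t v;

	assert(reads != NULL);
	v = atomic_load(reads);
	if (v)
		atomic_store(reads, v >> 1);
}

/*
 * Compare-and-swap loop: the new value is stored only if the counter
 * still holds the value that was halved. Returns the stored value.
 */
uint64_t halve_cas(_Atomic uint64_t *reads)
{
	uint64_t old;

	assert(reads != NULL);
	old = atomic_load(reads);
	/* On failure, the call refreshes 'old' with the current value. */
	while (!atomic_compare_exchange_weak(reads, &old, old >> 1))
		;

	return old >> 1;
}
```

Whether a lost increment matters for an approximate score is a separate design question; the sketch only shows where the window is.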
Avri Altman Jan. 31, 2021, 7:08 a.m. UTC | #3
> >
> > +static enum UFSHPB_MODE ufshpb_mode;
> 
> How are you allowed to have a single variable for a device-specific
> thing?  What happens when you have two controllers or disks or whatever
> you are binding to here?  How does this work at all?
> 
> This should be per-device, right?
Right. Done.

Not to bicker, but AFAIK there aren't, nor will there be in the foreseeable future, any multi-UFS-host designs.
There was some talk in the past about UFS cards, but this is officially off the table.

Thanks,
Avri
Greg Kroah-Hartman Jan. 31, 2021, 7:13 a.m. UTC | #4
On Sun, Jan 31, 2021 at 07:08:00AM +0000, Avri Altman wrote:
> > >
> > > +static enum UFSHPB_MODE ufshpb_mode;
> > 
> > How are you allowed to have a single variable for a device-specific
> > thing?  What happens when you have two controllers or disks or whatever
> > you are binding to here?  How does this work at all?
> > 
> > This should be per-device, right?
> Right. Done.
> 
> Not to bicker, but AFAIK there aren't, nor will there be in the foreseeable future, any multi-UFS-host designs.

Why not?  What prevents someone from putting 2 PCI ufs host controllers
in a system tomorrow?

> There was some talk in the past about UFS cards, but this is officially off the table.

Never say never :)

Seriously, how can you somehow ensure that a random manufacturer will
not do this?

thanks,

greg k-h
Avri Altman Jan. 31, 2021, 7:17 a.m. UTC | #5
> On Sun, Jan 31, 2021 at 07:08:00AM +0000, Avri Altman wrote:
> > > >
> > > > +static enum UFSHPB_MODE ufshpb_mode;
> > >
> > > How are you allowed to have a single variable for a device-specific
> > > thing?  What happens when you have two controllers or disks or whatever
> > > you are binding to here?  How does this work at all?
> > >
> > > This should be per-device, right?
> > Right. Done.
> >
> > Not to bicker, but AFAIK there aren't, nor will there be in the foreseeable
> > future, any multi-UFS-host designs.
> 
> Why not?  What prevents someone from putting 2 PCI ufs host controllers
> in a system tomorrow?
> 
> > There was some talk in the past about UFS cards, but this is officially off
> > the table.
> 
> Never say never :)
> 
> Seriously, how can you somehow ensure that a random manufacturer will
> not do this?
Better let the platform vendors answer this.

As for your comment - you are obviously right - I will fix this.

Thanks,
Avri
Avri Altman Jan. 31, 2021, 7:25 a.m. UTC | #6
> >
> > +     if (ufshpb_mode == HPB_HOST_CONTROL)
> > +             reads = atomic64_inc_return(&rgn->reads);
> > +
> >       if (!ufshpb_is_support_chunk(transfer_len))
> >               return;
> >
> > +     if (ufshpb_mode == HPB_HOST_CONTROL) {
> > +             /*
> > +              * in host control mode, reads are the main source for
> > +              * activation trials.
> > +              */
> > +             if (reads == ACTIVATION_THRSHLD) {
Oops - this is a bug...

> > +
> > +     /* region reads - for host mode */
> > +     atomic64_t reads;
> 
> Why do you need an atomic variable for this?  What are you trying to
> "protect" here by flushing the cpus all the time?  What protects this
> variable from changing right after you have read from it (hint, you do
> that above...)
> 
> atomics are not race-free, use a real lock if you need that.
We are on the data path here - this is called from queuecommand.
The "reads" counter is being symmetrically read and written,
so adding a spin lock here might become a source of contention.

Also, I am not worried about the exact value of this counter, except in one place -
see above.  Will fix it.

Thanks,
Avri
Greg Kroah-Hartman Jan. 31, 2021, 7:35 a.m. UTC | #7
On Sun, Jan 31, 2021 at 07:25:37AM +0000, Avri Altman wrote:
> > >
> > > +     if (ufshpb_mode == HPB_HOST_CONTROL)
> > > +             reads = atomic64_inc_return(&rgn->reads);
> > > +
> > >       if (!ufshpb_is_support_chunk(transfer_len))
> > >               return;
> > >
> > > +     if (ufshpb_mode == HPB_HOST_CONTROL) {
> > > +             /*
> > > +              * in host control mode, reads are the main source for
> > > +              * activation trials.
> > > +              */
> > > +             if (reads == ACTIVATION_THRSHLD) {
> Oops - this is a bug...
> 
> > > +
> > > +     /* region reads - for host mode */
> > > +     atomic64_t reads;
> > 
> > Why do you need an atomic variable for this?  What are you trying to
> > "protect" here by flushing the cpus all the time?  What protects this
> > variable from changing right after you have read from it (hint, you do
> > that above...)
> > 
> > atomics are not race-free, use a real lock if you need that.
> We are on the data path here - this is called from queuecommand.
> The "reads" counter is being symmetrically read and written,
> so adding a spin lock here might become a source of contention.

And an atomic variable is not?  You do know what spinlocks are made of,
right?  :)

> Also, I am not worried about the exact value of this counter, except in one place -
> see above.  Will fix it.

So it's not really needed?

thanks,

greg k-h
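
To make the "what spinlocks are made of" remark concrete, here is a toy test-and-set spinlock written with C11 atomics. It is a userspace sketch with hypothetical names (`struct toy_spinlock`, `toy_spin_lock`, and so on); real kernel spinlocks are considerably more elaborate, but they are likewise built on atomic read-modify-write operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy lock: one atomic flag, set while the lock is held. */
struct toy_spinlock {
	atomic_flag locked;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns 1 if the lock was acquired, 0 if it was already held. */
int toy_spin_trylock(struct toy_spinlock *l)
{
	assert(l != NULL);
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

void toy_spin_lock(struct toy_spinlock *l)
{
	/* Every acquisition attempt is itself an atomic read-modify-write. */
	while (!toy_spin_trylock(l))
		;
}

void toy_spin_unlock(struct toy_spinlock *l)
{
	assert(l != NULL);
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

An uncontended lock/unlock pair therefore costs about as much as the atomic operations it is made of, which is the comparison being drawn against a bare atomic counter.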