
[v2] dm pref-path: provides preferred path load balance policy

Message ID 1453469502-15606-1-git-send-email-ravikanth.nalla@hpe.com (mailing list archive)
State Rejected, archived
Delegated to: Mike Snitzer

Commit Message

Ravikanth Nalla Jan. 22, 2016, 1:31 p.m. UTC
v2:
  - changes merged with latest mainline and functionality re-verified.
  - performed additional tests to illustrate performance benefits of
    using this feature in certain configurations.

In a dm multipath environment, it is useful to give the end user an
option to select a preferred path for I/O in the SAN based on path
speed, health status and user preference. This allows a user to select
a reliable path over flaky/bad paths, thereby achieving a higher I/O
success rate. The specific scenario in which this is useful is where a
user needs to eliminate paths that experience frequent I/O errors due
to SAN failures and use the best performing path for I/O whenever it
is available.

Another scenario where it is useful is in letting the user select a
high speed path (say 16GB/8GB FC) over alternative low speed paths
(4GB/2GB FC).

A new dm path selector kernel loadable module named "dm_pref_path"
is introduced to handle the preferred path load balance policy
(pref-path) operations. The key operation of this policy is to select
and return the user specified path from the currently discovered
online/healthy paths. If the user specified path does not exist in the
online/healthy path list, because the path is currently in a failed
state or the user has given wrong device information, the selector
falls back to a round-robin policy, where all the online/healthy
paths are given equal preference.
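
For illustration, with multipath-tools the selector would typically be
requested through a multipath.conf stanza of roughly the following form,
using the syntax documented in dm-pref-path.txt below (the wwid is a
placeholder and 67:176 stands for the preferred path's major:minor
number):

multipaths {
	multipath {
		wwid		<wwid of the LUN>
		path_selector	"pref-path 1 67:176"
	}
}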

Functionality provided in this module has been verified on a wide
variety of servers (with 2, 4 and 8 CPU sockets). Additionally, in some
specific multipathing configurations involving varied path speeds, the
proposed preferred path policy provided some performance improvement
over the existing round-robin and service-time load balance policies.

Signed-off-by: Ravikanth Nalla <ravikanth.nalla@hpe.com>
---
 Documentation/device-mapper/dm-pref-path.txt |  52 ++++++
 drivers/md/Makefile                          |   6 +-
 drivers/md/dm-pref-path.c                    | 249 +++++++++++++++++++++++++++
 3 files changed, 304 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/device-mapper/dm-pref-path.txt
 create mode 100644 drivers/md/dm-pref-path.c

Comments

Hannes Reinecke Jan. 22, 2016, 1:42 p.m. UTC | #1
On 01/22/2016 02:31 PM, Ravikanth Nalla wrote:
> v2:
>   - changes merged with latest mainline and functionality re-verified.
>   - performed additional tests to illustrate performance benefits of
>     using this feature in certain configuration.
> 
> In a dm multipath environment, providing end user with an option of
> selecting preferred path for an I/O in the SAN based on path speed,
> health status and user preference is found to be useful. This allows
> a user to select a reliable path over flakey/bad paths thereby
> achieving higher I/O success rate. The specific scenario in which
> it is found to be useful is where a user has a need to eliminate
> the paths experiencing frequent I/O errors due to SAN failures and
> use the best performing path for I/O whenever it is available.
> 
> Another scenario where it is found to be useful is in providing
> option for user to select a high speed path (say 16GB/8GB FC)
> over alternative low speed paths (4GB/2GB FC).
> 
> A new dm path selector kernel loadable module named "dm_pref_path"
> is introduced to handle preferred path load balance policy
> (pref-path) operations. The key operations of this policy is to
> select and return user specified path from the current discovered
> online/ healthy paths. If the user specified path do not exist in
> the online/ healthy paths list due to path being currently in
> failed state or user has mentioned wrong device information, it
> will fall back to round-robin policy, where all the online/ healthy
> paths are given equal preference.
> 
> Functionality provided in this module is verified on wide variety
> of servers ( with 2 CPU sockets, 4 CPU sockets and 8 CPU sockets).
> Additionally in some specific multipathing configurations involving
> varied path speeds, proposed preferred path policy provided some
> performance improvements over existing round-robin and service-time
> load balance policies.
> 
Shouldn't service-time provide similar functionality?
After all, for all scenarios described above the preferred path
would have a lower service time, so they should be selected
automatically, no?

Cheers,

Hannes
Mike Snitzer Jan. 22, 2016, 4:59 p.m. UTC | #2
[Hannes please fix your mail client, seems you dropped all the original CCs]

On Fri, Jan 22 2016 at  8:42am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 01/22/2016 02:31 PM, Ravikanth Nalla wrote:
> > v2:
> >   - changes merged with latest mainline and functionality re-verified.
> >   - performed additional tests to illustrate performance benefits of
> >     using this feature in certain configuration.
> > 
> > In a dm multipath environment, providing end user with an option of
> > selecting preferred path for an I/O in the SAN based on path speed,
> > health status and user preference is found to be useful. This allows
> > a user to select a reliable path over flakey/bad paths thereby
> > achieving higher I/O success rate. The specific scenario in which
> > it is found to be useful is where a user has a need to eliminate
> > the paths experiencing frequent I/O errors due to SAN failures and
> > use the best performing path for I/O whenever it is available.
> > 
> > Another scenario where it is found to be useful is in providing
> > option for user to select a high speed path (say 16GB/8GB FC)
> > over alternative low speed paths (4GB/2GB FC).
> > 
> > A new dm path selector kernel loadable module named "dm_pref_path"
> > is introduced to handle preferred path load balance policy
> > (pref-path) operations. The key operations of this policy is to
> > select and return user specified path from the current discovered
> > online/ healthy paths. If the user specified path do not exist in
> > the online/ healthy paths list due to path being currently in
> > failed state or user has mentioned wrong device information, it
> > will fall back to round-robin policy, where all the online/ healthy
> > paths are given equal preference.
> > 
> > Functionality provided in this module is verified on wide variety
> > of servers ( with 2 CPU sockets, 4 CPU sockets and 8 CPU sockets).
> > Additionally in some specific multipathing configurations involving
> > varied path speeds, proposed preferred path policy provided some
> > performance improvements over existing round-robin and service-time
> > load balance policies.
> > 
> Shouldn't service-time provide similar functionality?
> After all, for all scenarios described above the preferred path
> would have a lower service time, so they should be selected
> automatically, no?

Yes, I'm thinking the same thing.  In fact the service-time also has the
ability to specify 'relative_throughput'
(see: Documentation/device-mapper/dm-service-time.txt).
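
A service-time table line that uses relative_throughput looks roughly
like this (per that document's example format; device numbers and values
here are purely illustrative):

  # dmsetup table test
  0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4

i.e. two paths with the same repeat_count (128) but relative throughputs
of 1 and 4, so the second path is sent proportionally more I/O.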

I'm also missing why different path groups couldn't be used for fast vs
slow paths...

Basically: Ravikanth, we need _proof_ that you've exhausted the
capabilities of the existing path selector policies (in conjunction with
path groups).

Not understanding why you verified "some performance improvements" but
stopped short of showing them.

Benjamin Marzinski Jan. 22, 2016, 5:06 p.m. UTC | #3
On Fri, Jan 22, 2016 at 06:31:42AM -0700, Ravikanth Nalla wrote:
> v2:
>   - changes merged with latest mainline and functionality re-verified.
>   - performed additional tests to illustrate performance benefits of
>     using this feature in certain configuration.
> 
> In a dm multipath environment, providing end user with an option of
> selecting preferred path for an I/O in the SAN based on path speed,
> health status and user preference is found to be useful. This allows
> a user to select a reliable path over flakey/bad paths thereby
> achieving higher I/O success rate. The specific scenario in which
> it is found to be useful is where a user has a need to eliminate
> the paths experiencing frequent I/O errors due to SAN failures and
> use the best performing path for I/O whenever it is available.
> 
> Another scenario where it is found to be useful is in providing
> option for user to select a high speed path (say 16GB/8GB FC)
> over alternative low speed paths (4GB/2GB FC).
> 
> A new dm path selector kernel loadable module named "dm_pref_path"
> is introduced to handle preferred path load balance policy
> (pref-path) operations. The key operations of this policy is to
> select and return user specified path from the current discovered
> online/ healthy paths. If the user specified path do not exist in
> the online/ healthy paths list due to path being currently in
> failed state or user has mentioned wrong device information, it
> will fall back to round-robin policy, where all the online/ healthy
> paths are given equal preference.

This seems like a problem that has already been solved with path groups.
If the path(s) in your preferred path group are there, multipath will
use them.  If not, then it will use your less preferred path(s), and
load balance across them however you choose with the path_selectors.

I admit that we don't have a path prioritizer that does a good job of
allowing users to manually pick a specific path to prefer.  But it seems
to me that that is where we should be solving the issue.
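
As a rough sketch of what that looks like at the dm level (reusing the
example device numbers from the patch's documentation, purely for
illustration), a table with the preferred path alone in the
highest-priority group would be something like:

  0 20971520 multipath 0 0 2 1 round-robin 0 1 1 67:176 1000
  round-robin 0 3 1 66:80 1000 67:160 1000 68:240 1000

and the kernel only falls back to the second group when 67:176 fails.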

-Ben
 
> Functionality provided in this module is verified on wide variety
> of servers ( with 2 CPU sockets, 4 CPU sockets and 8 CPU sockets).
> Additionally in some specific multipathing configurations involving
> varied path speeds, proposed preferred path policy provided some
> performance improvements over existing round-robin and service-time
> load balance policies.
> 
> Signed-off-by: Ravikanth Nalla <ravikanth.nalla@hpe.com>
> ---
>  Documentation/device-mapper/dm-pref-path.txt |  52 ++++++
>  drivers/md/Makefile                          |   6 +-
>  drivers/md/dm-pref-path.c                    | 249 +++++++++++++++++++++++++++
>  3 files changed, 304 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/device-mapper/dm-pref-path.txt
>  create mode 100644 drivers/md/dm-pref-path.c
> 
> diff --git a/Documentation/device-mapper/dm-pref-path.txt b/Documentation/device-mapper/dm-pref-path.txt
> new file mode 100644
> index 0000000..0efb156b
> --- /dev/null
> +++ b/Documentation/device-mapper/dm-pref-path.txt
> @@ -0,0 +1,52 @@
> +dm-pref-path
> +============
> +
> +dm-pref-path is a path selector module for device-mapper targets, which
> +selects a user specified path for the incoming I/O.
> +
> +The key operations of this policy to select and return user specified
> +path from the current discovered online/ healthy paths. If the user
> +specified path do not exist in the online/ healthy path list due to
> +path being currently in failed state or user has mentioned wrong device
> +information, it will fall back to round-robin policy, where all the
> +online/ healthy paths are given equal preference.
> +
> +The path selector name is 'pref-path'.
> +
> +Table parameters for each path: [<repeat_count>]
> +
> +Status for each path: <status> <fail-count>
> +	<status>: 'A' if the path is active, 'F' if the path is failed.
> +	<fail-count>: The number of path failures.
> +
> +Algorithm
> +=========
> +User is provided with an option to specify preferred path in DM
> +Multipath configuration file (/etc/multipath.conf) under multipath{}
> +section with a syntax "path_selector "pref-path 1 <device major>:<device minor>"".
> +
> +	1. The pref-path selector would search and return the matching user
> +        preferred path from the online/ healthy path list for incoming I/O.
> +
> +	2. If the user preferred path do not exist in the online/ healthy
> +        path list due to path being currently in failed state or user
> +        has mentioned wrong device information, it will fall back to
> +        round-robin policy, where all the online/ healthy paths are given
> +        equal preference.
> +
> +	3. If the user preferred path comes back online/ healthy, pref-path
> +        selector would find and return this path for incoming I/O.
> +
> +Examples
> +========
> +Consider 4 paths sdq, sdam, sdbh and sdcc, if user prefers path sdbh
> +with major:minor number 67:176 which has throughput of 8GB/s over other
> +paths of 4GB/s, pref-path policy will chose this sdbh path for all the
> +incoming I/O's.
> +
> +# dmsetup table Test_Lun_2
> +0 20971520 multipath 0 0 1 1 pref-path 0 4 1 66:80 10000 67:160 10000
> +68:240 10000 8:240 10000
> +
> +# dmsetup status Test_Lun_2
> +0 20971520 multipath 2 0 0 0 1 1 A 0 4 0 66:80 A 0 67:160 A 0 68:240 A
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index f34979c..5c9f4e9 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -20,8 +20,8 @@ md-mod-y	+= md.o bitmap.o
>  raid456-y	+= raid5.o raid5-cache.o
>  
>  # Note: link order is important.  All raid personalities
> -# and must come before md.o, as they each initialise 
> -# themselves, and md.o may use the personalities when it 
> +# and must come before md.o, as they each initialise
> +# themselves, and md.o may use the personalities when it
>  # auto-initialised.
>  
>  obj-$(CONFIG_MD_LINEAR)		+= linear.o
> @@ -41,7 +41,7 @@ obj-$(CONFIG_DM_BIO_PRISON)	+= dm-bio-prison.o
>  obj-$(CONFIG_DM_CRYPT)		+= dm-crypt.o
>  obj-$(CONFIG_DM_DELAY)		+= dm-delay.o
>  obj-$(CONFIG_DM_FLAKEY)		+= dm-flakey.o
> -obj-$(CONFIG_DM_MULTIPATH)	+= dm-multipath.o dm-round-robin.o
> +obj-$(CONFIG_DM_MULTIPATH)	+= dm-multipath.o dm-round-robin.o dm-pref-path.o
>  obj-$(CONFIG_DM_MULTIPATH_QL)	+= dm-queue-length.o
>  obj-$(CONFIG_DM_MULTIPATH_ST)	+= dm-service-time.o
>  obj-$(CONFIG_DM_SWITCH)		+= dm-switch.o
> diff --git a/drivers/md/dm-pref-path.c b/drivers/md/dm-pref-path.c
> new file mode 100644
> index 0000000..6bf1c76
> --- /dev/null
> +++ b/drivers/md/dm-pref-path.c
> @@ -0,0 +1,249 @@
> +/*
> + * (C) Copyright 2015 Hewlett Packard Enterprise Development LP.
> + *
> + * dm-pref-path.c
> + *
> + * Module Author: Ravikanth Nalla
> + *
> + * This program is free software; you can redistribute it
> + * and/or modify it under the terms of the GNU General Public
> + * License, version 2 as published by the Free Software Foundation;
> + * either version 2 of the License, or (at your option) any later
> + * version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * dm-pref-path path selector:
> + * Handles preferred path load balance policy operations. The key
> + * operations of this policy is to select and return user specified
> + * path from the current discovered online/ healthy paths(valid_paths).
> + * If the user specified path do not exist in the valid_paths list due
> + * to path being currently in failed state or user has mentioned wrong
> + * device information, it will fall back to round-robin policy, where
> + * all the valid-paths are given equal preference.
> + *
> + */
> +
> +#include "dm.h"
> +#include "dm-path-selector.h"
> +
> +#include <linux/slab.h>
> +#include <linux/ctype.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +#include <linux/atomic.h>
> +
> +#define DM_MSG_PREFIX	"multipath pref-path"
> +#define PP_MIN_IO       10000
> +#define PP_VERSION	"1.0.0"
> +#define BUFF_LEN         16
> +
> +/* Flag for pref_path enablement */
> +unsigned pref_path_enabled;
> +
> +/* pref_path major:minor number */
> +char pref_path[BUFF_LEN];
> +
> +struct selector {
> +	struct list_head	valid_paths;
> +	struct list_head	failed_paths;
> +};
> +
> +struct path_info {
> +	struct list_head	list;
> +	struct dm_path		*path;
> +	unsigned		repeat_count;
> +};
> +
> +static struct selector *alloc_selector(void)
> +{
> +	struct selector *s = kmalloc(sizeof(*s), GFP_KERNEL);
> +
> +	if (s) {
> +		INIT_LIST_HEAD(&s->valid_paths);
> +		INIT_LIST_HEAD(&s->failed_paths);
> +	}
> +
> +	return s;
> +}
> +
> +static int pf_create(struct path_selector *ps, unsigned argc, char
> +**argv) {
> +	struct selector *s = alloc_selector();
> +
> +	if (!s)
> +		return -ENOMEM;
> +
> +	if ((argc == 1) && strlen(argv[0]) < BUFF_LEN) {
> +		pref_path_enabled = 1;
> +		snprintf(pref_path, (BUFF_LEN-1), "%s", argv[0]);
> +	}
> +
> +	ps->context = s;
> +	return 0;
> +}
> +
> +static void pf_free_paths(struct list_head *paths)
> +{
> +	struct path_info *pi, *next;
> +
> +	list_for_each_entry_safe(pi, next, paths, list) {
> +		list_del(&pi->list);
> +		kfree(pi);
> +	}
> +}
> +
> +static void pf_destroy(struct path_selector *ps)
> +{
> +	struct selector *s = ps->context;
> +
> +	pf_free_paths(&s->valid_paths);
> +	pf_free_paths(&s->failed_paths);
> +	kfree(s);
> +	ps->context = NULL;
> +}
> +
> +static int pf_status(struct path_selector *ps, struct dm_path *path,
> +		     status_type_t type, char *result, unsigned maxlen) {
> +	unsigned sz = 0;
> +	struct path_info *pi;
> +
> +	/* When called with NULL path, return selector status/args. */
> +	if (!path)
> +		DMEMIT("0 ");
> +	else {
> +		pi = path->pscontext;
> +
> +		if (type == STATUSTYPE_TABLE)
> +			DMEMIT("%u ", pi->repeat_count);
> +	}
> +
> +	return sz;
> +}
> +
> +static int pf_add_path(struct path_selector *ps, struct dm_path *path,
> +		       int argc, char **argv, char **error) {
> +	struct selector *s = ps->context;
> +	struct path_info *pi;
> +
> +	/*
> +	 * Arguments: [<pref-path>]
> +	 */
> +	if (argc > 1) {
> +		*error = "pref-path ps: incorrect number of arguments";
> +		return -EINVAL;
> +	}
> +
> +	/* Allocate the path information structure */
> +	pi = kmalloc(sizeof(*pi), GFP_KERNEL);
> +	if (!pi) {
> +		*error = "pref-path ps: Error allocating path information";
> +		return -ENOMEM;
> +	}
> +
> +	pi->path = path;
> +	pi->repeat_count = PP_MIN_IO;
> +
> +	path->pscontext = pi;
> +
> +	list_add_tail(&pi->list, &s->valid_paths);
> +
> +	return 0;
> +}
> +
> +static void pf_fail_path(struct path_selector *ps, struct dm_path
> +*path) {
> +	struct selector *s = ps->context;
> +	struct path_info *pi = path->pscontext;
> +
> +	list_move(&pi->list, &s->failed_paths); }
> +
> +static int pf_reinstate_path(struct path_selector *ps, struct dm_path
> +*path) {
> +	struct selector *s = ps->context;
> +	struct path_info *pi = path->pscontext;
> +
> +	list_move_tail(&pi->list, &s->valid_paths);
> +
> +	return 0;
> +}
> +
> +/*
> + * Return user preferred path for an I/O.
> + */
> +static struct dm_path *pf_select_path(struct path_selector *ps,
> +				      unsigned *repeat_count, size_t nr_bytes) {
> +	struct selector *s = ps->context;
> +	struct path_info *pi = NULL, *best = NULL;
> +
> +	if (list_empty(&s->valid_paths))
> +		return NULL;
> +
> +	if (pref_path_enabled) {
> +		/* search for preferred path in the
> +		*  valid list and then return.
> +		*/
> +		list_for_each_entry(pi, &s->valid_paths, list) {
> +			if (!strcmp(pi->path->dev->name, pref_path)) {
> +				best = pi;
> +				*repeat_count = best->repeat_count;
> +				break;
> +			}
> +		}
> +	}
> +
> +	/* If preferred path is not enabled/ not available/
> +	*  offline chose the next path in the list.
> +	*/
> +	if (best == NULL && !list_empty(&s->valid_paths)) {
> +		pi = list_entry(s->valid_paths.next,
> +			struct path_info, list);
> +		list_move_tail(&pi->list, &s->valid_paths);
> +		best = pi;
> +		*repeat_count = best->repeat_count;
> +	}
> +
> +	return best ? best->path : NULL;
> +}
> +
> +static struct path_selector_type pf_ps = {
> +	.name		= "pref-path",
> +	.module		= THIS_MODULE,
> +	.table_args	= 1,
> +	.info_args	= 0,
> +	.create		= pf_create,
> +	.destroy	= pf_destroy,
> +	.status		= pf_status,
> +	.add_path	= pf_add_path,
> +	.fail_path	= pf_fail_path,
> +	.reinstate_path	= pf_reinstate_path,
> +	.select_path	= pf_select_path,
> +};
> +
> +static int __init dm_pf_init(void)
> +{
> +	int r = dm_register_path_selector(&pf_ps);
> +
> +	if (r < 0) {
> +		DMERR("register failed %d", r);
> +		return r;
> +	}
> +
> +	DMINFO("version " PP_VERSION " loaded");
> +	return r;
> +}
> +
> +static void __exit dm_pf_exit(void)
> +{
> +	dm_unregister_path_selector(&pf_ps);
> +}
> +
> +module_init(dm_pf_init);
> +module_exit(dm_pf_exit);
> +
> +MODULE_DESCRIPTION(DM_NAME "pref-path multipath path selector");
> +MODULE_AUTHOR("ravikanth.nalla@hpe.com");
> +MODULE_LICENSE("GPL");
> -- 
> 1.8.3.1
> 

Ravikanth Nalla Jan. 29, 2016, 2:10 p.m. UTC | #4
Hi Mike, Hannes, Ben

On 1/22/2016 10:29 PM, Mike Snitzer wrote:

 [Hannes please fix your mail client, seems you dropped all the original CCs]

On Fri, Jan 22 2016 at  8:42am -0500,
Hannes Reinecke <hare@suse.de> wrote:

> On 01/22/2016 02:31 PM, Ravikanth Nalla wrote:
> > 
> > Functionality provided in this module is verified on wide variety of 
> > servers ( with 2 CPU sockets, 4 CPU sockets and 8 CPU sockets).
> > Additionally in some specific multipathing configurations involving 
> > varied path speeds, proposed preferred path policy provided some 
> > performance improvements over existing round-robin and service-time 
> > load balance policies.
> > 
> Shouldn't service-time provide similar functionality?
> After all, for all scenarios described above the preferred path would 
> have a lower service time, so they should be selected automatically, 
> no?

Yes, you are right that if the user prefers a path because of its path
speed, that path will have a lower service time, so the behavior of the
service-time policy will be similar to the preferred path policy that we
proposed. However, another reason for proposing this policy is the
scenario where the user wants to totally eliminate a particular path
which is flaky and behaving in an unpredictable manner. In this case the
service-time policy may still schedule I/O on this path, since it may
randomly demonstrate a better service time, but the overall I/O
performance over a period of time could be affected; our thinking was
that selecting a known good preferred path would be beneficial in this
scenario. In fact, when we did comparative testing of our policy against
service-time in a setup with varying path speeds (8Gig FC and 4Gig FC),
we saw the following results, which showed that our policy fared
marginally better than the service-time policy:

service-time:
    io/sec    MB/sec    (msec)    Max Response Time
    ======    ======    ======    =================
    1383.2    1450.3    23.132    44.7

pref-path:
    io/sec    MB/sec    (msec)    Max Response Time
    ======    ======    ======    =================
    1444.3    1514.5    22.152    37.4

> Yes, I'm thinking the same thing.  In fact the service-time also has the ability to specify 'relative_throughput'
> (see: Documentation/device-mapper/dm-service-time.txt).

Thanks for the suggestion to look at the 'relative_throughput' feature
associated with service-time. After your comment I could see from the
code how this feature works, but unfortunately we were not able to find
documentation on how to specify it in the multipath.conf file. Maybe we
are not looking in the right place; if you have further pointers on how
to use this feature, that would be helpful.

> I'm also missing why different path groups couldn't be used for fast vs slow paths...
> Basically: Ravikanth we need _proof_ that you've exhausted the capabilities of the existing path selector policies (in conjunction with path groups)

I assume you are referring to specifying the following in the
multipath.conf file (with prio_args naming the path to be preferred):

multipath{
	prio "weightedpath"
	prio_args "devname sdp 1"
	path_grouping_policy group_by_prio
}

Yes, this way it is possible to lock I/O to a specific path, similar to
what we have proposed. However, having used this feature, it appears
that it is not that intuitive to use and also needs multiple
configuration parameters to be specified in the conf file. Hence the
main intention with which we provided this feature was to provide a more
intuitive way to specify a user preferred path.

>Not understanding why you verified "some performance improvements" but stopped short of showing them.

The performance improvements that we observed are shown above. Actually
we never intended this to be a performance feature; our intention was to
provide users with an easier option to specify a preferred path in
scenarios like flaky SAN environments, which is why we did not mention
the numbers earlier.

>On 1/22/2016 10:36 PM, Benjamin Marzinski wrote:

> This seems like a problem that has already been solved with path groups.
> If the path(s) in your preferred path group are there, multipath will use them.  If not, then it will use your less preferred path(s), and load balance across them however you choose with the path_selectors.

> I admit that we don't have a path prioritizer that does a good job of allowing users to manually pick a specific path to prefer.  But it seems to me that that is where we should be solving the issue.

Yes, as mentioned, it appears that we will be able to achieve the same
result using the above multipath{...} configuration. However, as you
noted, I felt that it is not that user friendly for specifying the path
to prefer. So when you mentioned solving the problem there, could you
please clarify what you had in mind, and whether there is anything
specific from our implementation that can be used there?



Benjamin Marzinski Jan. 29, 2016, 5:50 p.m. UTC | #5
On Fri, Jan 29, 2016 at 02:10:52PM +0000, Nalla, Ravikanth wrote:
> Hi Mike, Hannes, Ben
> > This seems like a problem that has already been solved with path groups.
> > If the path(s) in your preferred path group are there, multipath will use them.  If not, then it will use your less preferred path(s), and load balance across them however you choose with the path_selectors.
> 
> > I admit that we don't have a path prioritizer that does a good job of allowing users to manually pick a specific path to prefer.  But it seems to me that that is where we should be solving the issue.
> 
> Yes as  mentioned , it appears that we will be able to achieve the same result using the above multipath{...} configuration. However as you mentioned I felt that it is not that user friendly in specify the path to prefer. So when you mentioned about solving the problem there, could you please clarify on what you had in mind and is there anything specific from our implementation that can be used there ?
> 

There are two changes that I'm working on.

1. I'm adding an option for the alua prioritizer so that setting the
ALUA TPG Preferred Bit will cause the alua prioritizer to put that path
in a group by itself (with the highest priority). Currently if the
preferred bit is set for an active/optimized path, and there are other
active/optimized paths, they are all grouped together, and there is no
way to change that. So, for people with ALUA enabled hardware, they can
just enable the option, and set the Preferred Bit.

2. For people that need to be able to control the exact priority, I'm
redoing the weighted handler to allow better ways to specify the paths
in a persistent manner.  It won't be as simple as the alua method, but
it will be actually usable, unlike its current state.

-Ben

Hannes Reinecke Jan. 30, 2016, 8:32 a.m. UTC | #6
On 01/29/2016 06:50 PM, Benjamin Marzinski wrote:
> On Fri, Jan 29, 2016 at 02:10:52PM +0000, Nalla, Ravikanth wrote:
>> Hi Mike, Hannes, Ben
>>> This seems like a problem that has already been solved with path groups.
>>> If the path(s) in your preferred path group are there, multipath will
>>> use them.  If not, then it will use your less preferred path(s), and
>>> load balance across them however you choose with the path_selectors.
>>>
>>> I admit that we don't have a path prioritizer that does a good job of
>>> allowing users to manually pick a specific path to prefer.  But it seems
>>> to me that that is where we should be solving the issue.
>>>
>> Yes, as mentioned, it appears that we will be able to achieve the same
>> result using the above multipath{...} configuration. However, as you
>> noted, I felt that it is not that user friendly for specifying the path
>> to prefer. So when you mentioned solving the problem there, could you
>> please clarify what you had in mind, and whether there is anything
>> specific from our implementation that can be used there?
>>
>
> There are two changes that I'm working on.
>
> 1. I'm adding an option for the alua prioritizer so that setting the
> ALUA TPG Preferred Bit will cause the alau prioritizer to put that path
> in a group by itself (with the highest priority). Currently if the
> preferred bit is set for an active/optimized path, and there are other
> active/optimized paths, they are all grouped together, and there is no
> way to change that. So, for people with ALUA enabled hardware, they can
> just enable the option, and set the Preferred Bit.
>
Hmm? I was under the distinct impression that it's exactly the other way 
round; at least in my code I have this:

                 switch(aas) {
                         case AAS_OPTIMIZED:
                                 rc = 50;
                                 break;
                         case AAS_NON_OPTIMIZED:
                                 rc = 10;
                                 break;
                         case AAS_LBA_DEPENDENT:
                                 rc = 5;
                                 break;
                         case AAS_STANDBY:
                                 rc = 1;
                                 break;
                         default:
                                 rc = 0;
                 }
                 if (priopath && aas != AAS_OPTIMIZED)
                         rc += 80;

i.e. any path with the 'prio' bit set will be getting a different
priority than those without. Consequently they'll be grouped into
different priority groups.
I'd be surprised if your code is different, but what do I know ...

> 2. For people that need to be able to control the exact priority, I'm
> redoing the weighted handler to allow better ways to specify the paths
> in a presistent manner.  It won't be as simple as the alua method, but
> it will be actually usable, unlike it's current state.
>
That, however, is greatly appreciated :-)

Cheers,

Hannes
Benjamin Marzinski Jan. 30, 2016, 8:25 p.m. UTC | #7
On Sat, Jan 30, 2016 at 09:32:53AM +0100, Hannes Reinecke wrote:
> On 01/29/2016 06:50 PM, Benjamin Marzinski wrote:
> >On Fri, Jan 29, 2016 at 02:10:52PM +0000, Nalla, Ravikanth wrote:
> >>Hi Mike, Hannes, Ben
> >>>This seems like a problem that has already been solved with path groups.
> >>>If the path(s) in your preferred path group are there, multipath will
> >>> use them.  If not, then it will use your less preferred path(s), and
> >>> load balance across them however you choose with the path_selectors.
> >>>
> >>>I admit that we don't have a path prioritizer that does a good job of
> >>> allowing users to manually pick a specific path to prefer.  But it seems
> >>> to me that that is where we should be solving the issue.
> >>>
> >>Yes as  mentioned , it appears that we will be able to achieve the same
> >> result using the above multipath{...} configuration. However as you
> >> mentioned I felt that it is not that user friendly in specify the path
> >> to prefer. So when you mentioned about solving the problem there, could
> >> you please clarify on what you had in mind and is there anything specific
> >> from our implementation that can be used there ?
> >>
> >
> >There are two changes that I'm working on.
> >
> >1. I'm adding an option for the alua prioritizer so that setting the
> >ALUA TPG Preferred Bit will cause the alau prioritizer to put that path
> >in a group by itself (with the highest priority). Currently if the
> >preferred bit is set for an active/optimized path, and there are other
> >active/optimized paths, they are all grouped together, and there is no
> >way to change that. So, for people with ALUA enabled hardware, they can
> >just enable the option, and set the Preferred Bit.
> >
> Hmm? I was under the distinct impression that it's exactly the other way
> round; at least in my code I have this:
> 
>                 switch(aas) {
>                         case AAS_OPTIMIZED:
>                                 rc = 50;
>                                 break;
>                         case AAS_NON_OPTIMIZED:
>                                 rc = 10;
>                                 break;
>                         case AAS_LBA_DEPENDENT:
>                                 rc = 5;
>                                 break;
>                         case AAS_STANDBY:
>                                 rc = 1;
>                                 break;
>                         default:
>                                 rc = 0;
>                 }
>                 if (priopath && aas != AAS_OPTIMIZED)
>                         rc += 80;
> 
> ie any path with the 'prio' bit set will be getting a differen priority than
> those without. Consequently they'll be grouped into different priority
> groups.
> I'd be surprised if your code is different, but what do I know ...

No. That's only true if the path doesn't have AAS_OPTIMIZED set.  So if
you have a non-optimized path with the pref bit set, it will be in a
group by itself. If the path is AAS_OPTIMIZED, the pref bit is ignored.

Like I mentioned before, you are the one who changed this

commit b330bf8a5e6a29b51af0d8b4088e0d8554e5cfb4
Author: Hannes Reinecke <hare@suse.de>
Date:   Tue Jul 16 09:12:54 2013 +0200

    alua: Do not add preferred path priority for active/optimized
    
    When a path is in active/optimized we should disregard the
    'preferred path' bit when calculating the priority.
    Otherwise we'll end up with having two different priorities
    (one for 'active/optimized (preferred)' and one for
    'active/optimized (non-preferred)').
    Which will result in two different path groups and a
    sub-optimal path usage.
    
    Signed-off-by: Hannes Reinecke <hare@suse.de>


Before this commit, it always used the pref bit. Again, like I said
before, I'm saying that this was the wrong thing to do.  The Spec is
pretty vague on what you should expect to happen when you set the pref
bit.  When the path was in a group by itself, I got complaints. Now that
the path is in a group with other active/optimized paths, I get
complaints.  The only answer is to allow the user to say what they want
the pref bit to mean.

-Ben 

 
> >2. For people that need to be able to control the exact priority, I'm
> >redoing the weighted handler to allow better ways to specify the paths
> >in a presistent manner.  It won't be as simple as the alua method, but
> >it will be actually usable, unlike it's current state.
> >
> That, however, is greatly appreciated :-)
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

Benjamin Marzinski Jan. 30, 2016, 11:32 p.m. UTC | #8
On Sat, Jan 30, 2016 at 02:25:25PM -0600, Benjamin Marzinski wrote:
> Before this commit, it always used the pref bit. Again, like I said
> before, I'm saying that this was the wrong thing to do.  The Spec is

Oops. I meant: "I'm NOT saying that this was the wrong thing to do".

I am also fine with changing the default back to making the pref bit
always create its own path group. As long as there is a way for users to
get either behavior, I'm happy.

-Ben

> pretty vague on what you should expect to happen when you set to pref
> bit.  When the path was in a group by itself, I got complaints. Now that
> the path is is a group with other active/optimized paths, I get
> complaints.  The only answer is to allow the user to say what they want
> the pref bit to mean.
> 
> -Ben 


Patch

diff --git a/Documentation/device-mapper/dm-pref-path.txt b/Documentation/device-mapper/dm-pref-path.txt
new file mode 100644
index 0000000..0efb156b
--- /dev/null
+++ b/Documentation/device-mapper/dm-pref-path.txt
@@ -0,0 +1,52 @@ 
+dm-pref-path
+============
+
+dm-pref-path is a path selector module for device-mapper targets, which
+selects a user specified path for the incoming I/O.
+
+The key operation of this policy is to select and return the user
+specified path from the currently discovered online/healthy paths. If
+the user specified path does not exist in the online/healthy path list,
+because the path is currently in a failed state or the user has given
+wrong device information, it falls back to a round-robin policy, where
+all the online/healthy paths are given equal preference.
+
+The path selector name is 'pref-path'.
+
+Table parameters for each path: [<repeat_count>]
+
+Status for each path: <status> <fail-count>
+	<status>: 'A' if the path is active, 'F' if the path is failed.
+	<fail-count>: The number of path failures.
+
+Algorithm
+=========
+The user is given an option to specify the preferred path in the DM
+Multipath configuration file (/etc/multipath.conf) under the multipath{}
+section with the syntax "path_selector "pref-path 1 <device major>:<device minor>"".
+
+	1. The pref-path selector searches for and returns the matching user
+        preferred path from the online/healthy path list for incoming I/O.
+
+	2. If the user preferred path does not exist in the online/healthy
+        path list, because the path is currently in a failed state or the
+        user has given wrong device information, the selector falls back
+        to a round-robin policy, where all the online/healthy paths are
+        given equal preference.
+
+	3. If the user preferred path comes back online/healthy, the pref-path
+        selector finds and returns this path for incoming I/O.
+
+Examples
+========
+Consider 4 paths sdq, sdam, sdbh and sdcc. If the user prefers path sdbh
+with major:minor number 67:176, which has a throughput of 8GB/s over
+other paths with 4GB/s, the pref-path policy will choose this sdbh path
+for all incoming I/Os.
+
+# dmsetup table Test_Lun_2
+0 20971520 multipath 0 0 1 1 pref-path 0 4 1 66:80 10000 67:160 10000
+68:240 10000 8:240 10000
+
+# dmsetup status Test_Lun_2
+0 20971520 multipath 2 0 0 0 1 1 A 0 4 0 66:80 A 0 67:160 A 0 68:240 A
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f34979c..5c9f4e9 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -20,8 +20,8 @@  md-mod-y	+= md.o bitmap.o
 raid456-y	+= raid5.o raid5-cache.o
 
 # Note: link order is important.  All raid personalities
-# and must come before md.o, as they each initialise 
-# themselves, and md.o may use the personalities when it 
+# and must come before md.o, as they each initialise
+# themselves, and md.o may use the personalities when it
 # auto-initialised.
 
 obj-$(CONFIG_MD_LINEAR)		+= linear.o
@@ -41,7 +41,7 @@  obj-$(CONFIG_DM_BIO_PRISON)	+= dm-bio-prison.o
 obj-$(CONFIG_DM_CRYPT)		+= dm-crypt.o
 obj-$(CONFIG_DM_DELAY)		+= dm-delay.o
 obj-$(CONFIG_DM_FLAKEY)		+= dm-flakey.o
-obj-$(CONFIG_DM_MULTIPATH)	+= dm-multipath.o dm-round-robin.o
+obj-$(CONFIG_DM_MULTIPATH)	+= dm-multipath.o dm-round-robin.o dm-pref-path.o
 obj-$(CONFIG_DM_MULTIPATH_QL)	+= dm-queue-length.o
 obj-$(CONFIG_DM_MULTIPATH_ST)	+= dm-service-time.o
 obj-$(CONFIG_DM_SWITCH)		+= dm-switch.o
diff --git a/drivers/md/dm-pref-path.c b/drivers/md/dm-pref-path.c
new file mode 100644
index 0000000..6bf1c76
--- /dev/null
+++ b/drivers/md/dm-pref-path.c
@@ -0,0 +1,249 @@ 
+/*
+ * (C) Copyright 2015 Hewlett Packard Enterprise Development LP.
+ *
+ * dm-pref-path.c
+ *
+ * Module Author: Ravikanth Nalla
+ *
+ * This program is free software; you can redistribute it
+ * and/or modify it under the terms of the GNU General Public
+ * License, version 2 as published by the Free Software Foundation;
+ * either version 2 of the License, or (at your option) any later
+ * version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * dm-pref-path path selector:
+ * Handles preferred path load balance policy operations. The key
+ * operation of this policy is to select and return the user specified
+ * path from the currently discovered online/healthy paths (valid_paths).
+ * If the user specified path does not exist in the valid_paths list,
+ * because the path is currently in a failed state or the user has given
+ * wrong device information, it falls back to a round-robin policy, where
+ * all the valid paths are given equal preference.
+ *
+ */
+
+#include "dm.h"
+#include "dm-path-selector.h"
+
+#include <linux/slab.h>
+#include <linux/ctype.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/atomic.h>
+
+#define DM_MSG_PREFIX	"multipath pref-path"
+#define PP_MIN_IO       10000
+#define PP_VERSION	"1.0.0"
+#define BUFF_LEN         16
+
+/* Flag for pref_path enablement */
+unsigned pref_path_enabled;
+
+/* pref_path major:minor number */
+char pref_path[BUFF_LEN];
+
+struct selector {
+	struct list_head	valid_paths;
+	struct list_head	failed_paths;
+};
+
+struct path_info {
+	struct list_head	list;
+	struct dm_path		*path;
+	unsigned		repeat_count;
+};
+
+static struct selector *alloc_selector(void)
+{
+	struct selector *s = kmalloc(sizeof(*s), GFP_KERNEL);
+
+	if (s) {
+		INIT_LIST_HEAD(&s->valid_paths);
+		INIT_LIST_HEAD(&s->failed_paths);
+	}
+
+	return s;
+}
+
+static int pf_create(struct path_selector *ps, unsigned argc, char **argv)
+{
+	struct selector *s = alloc_selector();
+
+	if (!s)
+		return -ENOMEM;
+
+	if ((argc == 1) && strlen(argv[0]) < BUFF_LEN) {
+		pref_path_enabled = 1;
+		snprintf(pref_path, (BUFF_LEN-1), "%s", argv[0]);
+	}
+
+	ps->context = s;
+	return 0;
+}
+
+static void pf_free_paths(struct list_head *paths)
+{
+	struct path_info *pi, *next;
+
+	list_for_each_entry_safe(pi, next, paths, list) {
+		list_del(&pi->list);
+		kfree(pi);
+	}
+}
+
+static void pf_destroy(struct path_selector *ps)
+{
+	struct selector *s = ps->context;
+
+	pf_free_paths(&s->valid_paths);
+	pf_free_paths(&s->failed_paths);
+	kfree(s);
+	ps->context = NULL;
+}
+
+static int pf_status(struct path_selector *ps, struct dm_path *path,
+		     status_type_t type, char *result, unsigned maxlen)
+{
+	unsigned sz = 0;
+	struct path_info *pi;
+
+	/* When called with NULL path, return selector status/args. */
+	if (!path)
+		DMEMIT("0 ");
+	else {
+		pi = path->pscontext;
+
+		if (type == STATUSTYPE_TABLE)
+			DMEMIT("%u ", pi->repeat_count);
+	}
+
+	return sz;
+}
+
+static int pf_add_path(struct path_selector *ps, struct dm_path *path,
+		       int argc, char **argv, char **error)
+{
+	struct selector *s = ps->context;
+	struct path_info *pi;
+
+	/*
+	 * Arguments: [<pref-path>]
+	 */
+	if (argc > 1) {
+		*error = "pref-path ps: incorrect number of arguments";
+		return -EINVAL;
+	}
+
+	/* Allocate the path information structure */
+	pi = kmalloc(sizeof(*pi), GFP_KERNEL);
+	if (!pi) {
+		*error = "pref-path ps: Error allocating path information";
+		return -ENOMEM;
+	}
+
+	pi->path = path;
+	pi->repeat_count = PP_MIN_IO;
+
+	path->pscontext = pi;
+
+	list_add_tail(&pi->list, &s->valid_paths);
+
+	return 0;
+}
+
+static void pf_fail_path(struct path_selector *ps, struct dm_path *path)
+{
+	struct selector *s = ps->context;
+	struct path_info *pi = path->pscontext;
+
+	list_move(&pi->list, &s->failed_paths);
+}
+
+static int pf_reinstate_path(struct path_selector *ps, struct dm_path *path)
+{
+	struct selector *s = ps->context;
+	struct path_info *pi = path->pscontext;
+
+	list_move_tail(&pi->list, &s->valid_paths);
+
+	return 0;
+}
+
+/*
+ * Return user preferred path for an I/O.
+ */
+static struct dm_path *pf_select_path(struct path_selector *ps,
+				      unsigned *repeat_count, size_t nr_bytes)
+{
+	struct selector *s = ps->context;
+	struct path_info *pi = NULL, *best = NULL;
+
+	if (list_empty(&s->valid_paths))
+		return NULL;
+
+	if (pref_path_enabled) {
+		/* Search for the preferred path in the
+		 * valid list and then return it.
+		 */
+		list_for_each_entry(pi, &s->valid_paths, list) {
+			if (!strcmp(pi->path->dev->name, pref_path)) {
+				best = pi;
+				*repeat_count = best->repeat_count;
+				break;
+			}
+		}
+	}
+
+	/* If the preferred path is not enabled, not available or
+	 * offline, choose the next path in the list.
+	 */
+	if (best == NULL && !list_empty(&s->valid_paths)) {
+		pi = list_entry(s->valid_paths.next,
+			struct path_info, list);
+		list_move_tail(&pi->list, &s->valid_paths);
+		best = pi;
+		*repeat_count = best->repeat_count;
+	}
+
+	return best ? best->path : NULL;
+}
+
+static struct path_selector_type pf_ps = {
+	.name		= "pref-path",
+	.module		= THIS_MODULE,
+	.table_args	= 1,
+	.info_args	= 0,
+	.create		= pf_create,
+	.destroy	= pf_destroy,
+	.status		= pf_status,
+	.add_path	= pf_add_path,
+	.fail_path	= pf_fail_path,
+	.reinstate_path	= pf_reinstate_path,
+	.select_path	= pf_select_path,
+};
+
+static int __init dm_pf_init(void)
+{
+	int r = dm_register_path_selector(&pf_ps);
+
+	if (r < 0) {
+		DMERR("register failed %d", r);
+		return r;
+	}
+
+	DMINFO("version " PP_VERSION " loaded");
+	return r;
+}
+
+static void __exit dm_pf_exit(void)
+{
+	dm_unregister_path_selector(&pf_ps);
+}
+
+module_init(dm_pf_init);
+module_exit(dm_pf_exit);
+
+MODULE_DESCRIPTION(DM_NAME "pref-path multipath path selector");
+MODULE_AUTHOR("ravikanth.nalla@hpe.com");
+MODULE_LICENSE("GPL");