Message ID: 20180713080602.31602-4-ming.lei@redhat.com (mailing list archive)
State: RFC, archived
On Fri, Jul 13, 2018 at 04:06:02PM +0800, Ming Lei wrote:
> Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq
> core for enabling block runtime PM.

I still think enabling this unconditionally for any SCSI device was a mistake, and it is even more so for blk-mq.

Please opt in only for UFS and ATA first, adding others if wanted by maintainers.
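A rough sketch of what such a per-driver opt-in could look like (the supports_rpm host-template field is hypothetical, invented here for illustration; it is not an existing scsi_host_template member):

	/* in scsi_mq_setup_tags(): set the flag only when the LLD asked for it */
	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
	if (shost->hostt->supports_rpm)		/* hypothetical opt-in field */
		shost->tag_set.flags |= BLK_MQ_F_SUPPORT_RPM;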
On Tue, Jul 17, 2018 at 03:24:11PM +0200, Christoph Hellwig wrote:
> I still think enabling this unconditionally for any SCSI device was
> a mistake, and it is even more so for blk-mq.
>
> Please opt in only for UFS and ATA first, adding others if wanted by
> maintainers.

No, that approach would cause a regression: runtime PM actually works for all sd/sr devices today, and it isn't tied to the SCSI host.

Thanks,
Ming
On Tue, 2018-07-17 at 23:30 +0800, Ming Lei wrote:
> No, that approach would cause a regression: runtime PM actually works
> for all sd/sr devices today, and it isn't tied to the SCSI host.

For which SCSI devices other than UFS and ATA do we need runtime PM support?

Bart.
On Tue, Jul 17, 2018 at 03:34:35PM +0000, Bart Van Assche wrote:
> For which SCSI devices other than UFS and ATA do we need runtime PM
> support?

As I said, it is any sd/sr device: runtime PM can spin it down via the START STOP UNIT command if it hasn't been used for a while.

Thanks,
Ming
On Tue, Jul 17, 2018 at 11:38:53PM +0800, Ming Lei wrote:
> As I said, it is any sd/sr device: runtime PM can spin it down via the
> START STOP UNIT command if it hasn't been used for a while.

And in which case do we care that this happens?
On Tue, 17 Jul 2018, hch@lst.de wrote:
> And in which case do we care that this happens?

Consider a low-power system (e.g., embedded applications, a tablet computer, etc.) using a USB flash drive. Consider a high-power system with lots of disk drives that consume a lot of power while they are running but not while they are stopped.

It's safe to assume that if the kernel can provide a capability like this for users, some people will take advantage of it.

Alan Stern
On 7/17/18 9:38 AM, Ming Lei wrote:
> As I said, it is any sd/sr device: runtime PM can spin it down via the
> START STOP UNIT command if it hasn't been used for a while.

Christoph is basically echoing my concerns. Why don't we just enable it on slower devices, similarly to what we do for adding randomness? Nobody wants to pay this overhead for faster devices, since most people won't care.

That said, we should still make it as efficient as we can, as per previous discussions.
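For reference, the randomness precedent works roughly like this in sd_read_block_characteristics() (lightly simplified from the sd.c of this era); an analogous per-device check could gate a runtime-PM flag:

	rot = get_unaligned_be16(&buffer[4]);	/* MEDIUM ROTATION RATE */
	if (rot == 1) {				/* 0001h means non-rotating medium */
		blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
		/* fast, non-rotational devices don't feed the entropy pool */
		blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
	}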
On Wed, Jul 18, 2018 at 5:49 AM, Jens Axboe <axboe@kernel.dk> wrote:
> Christoph is basically echoing my concerns. Why don't we just enable
> it on slower devices, similarly to what we do for adding randomness?
> Nobody wants to pay this overhead for faster devices, since most
> people won't care.

IMO the problem isn't whether the device is slow or fast; it is about the system, especially a system that cares about power consumption, such as a mobile phone, a laptop, or a server with lots of disks attached. And fast disks often ship in laptops these days, such as NVMe or other SSDs.

Thanks,
Ming Lei
On Wed, Jul 18, 2018 at 08:06:10PM +0800, Ming Lei wrote:
> IMO the problem isn't whether the device is slow or fast; it is about
> the system, especially a system that cares about power consumption,
> such as a mobile phone, a laptop, or a server with lots of disks
> attached.

Yes, but you're only taking locally attached devices into account. This is very likely harmful for sd devices attached to an SRP/FC/iSCSI/SAS target in a SAN environment. These can be really fast, and people investing this kind of money will likely not appreciate additional performance penalties.

Furthermore, I don't know if any array vendors actually implement START STOP UNIT, etc.
On Wed, Jul 18, 2018 at 8:28 PM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> Yes, but you're only taking locally attached devices into account.
> This is very likely harmful for sd devices attached to an
> SRP/FC/iSCSI/SAS target in a SAN environment. These can be really
> fast, and people investing this kind of money will likely not
> appreciate additional performance penalties.

Runtime PM provides options for all of these cases. If you care about performance, just keep the default setting (the device stays active). If you care about power consumption, you can switch runtime PM on for the devices you are interested in (for example via each device's power/control sysfs attribute, which accepts "on" and "auto").

Definitely we are working on a solution in which the runtime PM code doesn't introduce extra, obvious cost in the fast path, so there is no need to worry about its effect if you never want to use it.

> Furthermore, I don't know if any array vendors actually implement
> START STOP UNIT, etc.

That won't be an issue, as I mentioned above.

Thanks,
Ming Lei
On 07/18/2018 02:06 PM, Ming Lei wrote:
> IMO the problem isn't whether the device is slow or fast; it is about
> the system, especially a system that cares about power consumption,
> such as a mobile phone, a laptop, or a server with lots of disks
> attached. And fast disks often ship in laptops these days, such as
> NVMe or other SSDs.

But those typically have dedicated (transport/driver) mechanisms to put the device to sleep, _and_ START STOP UNIT will be emulated for those devices anyway.

I'd rather advocate moving runtime PM into the individual subsystems/drivers, and _not_ abuse START STOP UNIT here.

Cheers,
Hannes
On Wed, Jul 18, 2018 at 8:43 PM, Hannes Reinecke <hare@suse.de> wrote:
> But those typically have dedicated (transport/driver) mechanisms to put
> the device to sleep, _and_ START STOP UNIT will be emulated for those
> devices anyway.
> I'd rather advocate moving runtime PM into the individual
> subsystems/drivers, and _not_ abuse START STOP UNIT here.

First, START STOP UNIT isn't an abuse for PM purposes; see the description in t10.org:

"The START STOP UNIT command (see 5.17) allows an application client to control the power condition of a logical unit. This method includes specifying that the logical unit transition to a power condition."

Second, runtime PM at the LUN level doesn't conflict with a subsystem's or driver's own runtime PM; the two can coexist, for example with USB storage hosts, ufshcd, ...

Thanks,
Ming Lei
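A minimal sketch of how runtime PM can spin a LUN down with this command, loosely modeled on sd's START STOP UNIT handling (simplified; the helper name and the timeout/retry values are illustrative, not the exact sd.c code):

	static int lun_set_power(struct scsi_device *sdp, int start)
	{
		unsigned char cmd[6] = { START_STOP };	/* opcode 0x1b */
		struct scsi_sense_hdr sshdr;

		cmd[4] |= start ? 1 : 0;	/* START bit: 1 = spin up, 0 = stop */

		/* RQF_PM lets the command through while the queue is suspended */
		return scsi_execute(sdp, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
				    30 * HZ, 5, 0, RQF_PM, NULL);
	}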
On Wed, 18 Jul 2018, Johannes Thumshirn wrote:
> Yes, but you're only taking locally attached devices into account.
> This is very likely harmful for sd devices attached to an
> SRP/FC/iSCSI/SAS target in a SAN environment. These can be really
> fast, and people investing this kind of money will likely not
> appreciate additional performance penalties.

As one extra reminder... People who care about extreme performance can perfectly well disable CONFIG_PM in their kernels. That should remove any overhead due to runtime PM.

Alan Stern
On Wed, Jul 18, 2018 at 10:12:37AM -0400, Alan Stern wrote:
> As one extra reminder... People who care about extreme performance can
> perfectly well disable CONFIG_PM in their kernels. That should remove
> any overhead due to runtime PM.

Yes, but how likely is that for people who are running their data center with a distribution kernel?
On 7/18/18 8:12 AM, Alan Stern wrote:
> As one extra reminder... People who care about extreme performance can
> perfectly well disable CONFIG_PM in their kernels. That should remove
> any overhead due to runtime PM.

This is a kernel fallacy that needs to be shot down. Most folks just run what the distro provides, so no, they can't just turn off various kernel options. This is NEVER an argument that carries any weight at all for me. If we have the implementation, it needs to be fast enough to be on by default. No "well, just turn off the option if you think the overhead is too high". Never.

That's exactly why I didn't like the first implementation that Ming did.
On Wed, 18 Jul 2018, Johannes Thumshirn wrote:
> Yes, but how likely is that for people who are running their data
> center with a distribution kernel?

Strictly speaking, that's a nonsense question. People who run distribution kernels are 100% unlikely to change the kernel configuration, by definition.

That's not the point. If people really care about getting the utmost performance, to the extent that they resent the minimal overhead required for runtime PM, then they should not be running distribution kernels in the first place.

Alan Stern
On Wed, 18 Jul 2018, Jens Axboe wrote:
> This is a kernel fallacy that needs to be shot down. Most folks just
> run what the distro provides, so no, they can't just turn off various
> kernel options. [...] If we have the implementation, it needs to be
> fast enough to be on by default.

I certainly agree that it needs to be fast enough to be on by default. That was never an issue, and it is the goal we are working toward.

But that isn't what Johannes wrote. Ruling out all improvements that carry even the smallest runtime overhead is an extreme attitude. It does have its place, though -- and people who are in that place can afford to build their own kernels with their own configurations.

> That's exactly why I didn't like the first implementation that Ming did.

Fine. I'm happy to leave the technical details up to the two of you, since I'm not really up to speed on this part of the kernel and Ming does have a keen understanding of the runtime PM implementation. And hopefully -- perhaps with a little outside brainstorming inspiration -- you'll be able to come up with something that is acceptable to everyone.

Alan Stern
On Wed, Jul 18, 2018 at 08:50:39AM -0600, Jens Axboe wrote:
> If we have the implementation, it needs to be fast enough to be on by
> default. No "well, just turn off the option if you think the overhead
> is too high". Never.
>
> That's exactly why I didn't like the first implementation that Ming did.

Hi Jens,

As far as I can tell, pm_runtime_mark_last_busy() may have to be called when completing a request, to help figure out whether the queue is idle. Is this the kind of cost you can accept?

static inline void pm_runtime_mark_last_busy(struct device *dev)
{
	WRITE_ONCE(dev->power.last_busy, jiffies);
}

BTW, last time you mentioned that we might reuse the request-timeout mechanism for deciding idleness, but that period is too long.

Thanks,
Ming
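A sketch of where such a call might sit in the completion path (the helper below is hypothetical, not mainline code; q->dev is only non-NULL once a driver has registered the queue for block runtime PM via blk_pm_runtime_init()):

	static inline void blk_mq_pm_mark_last_busy(struct request *rq)
	{
		struct request_queue *q = rq->q;

		/* stamp last_busy on every completion so the autosuspend
		 * timer only fires after the queue has been truly idle */
		if (q->dev)
			pm_runtime_mark_last_busy(q->dev);
	}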
On Wed, Jul 18, 2018 at 11:01:52AM -0400, Alan Stern wrote:
> That's not the point. If people really care about getting the utmost
> performance, to the extent that they resent the minimal overhead
> required for runtime PM, then they should not be running distribution
> kernels in the first place.

Can we agree to disagree here?

Some of the people I know in HPC, HFT, etc. run stock SLE/RHEL, and they do file bugs for every single 3-5% performance regression. This is not something I made up; this is the kind of thing I have to deal with in my day job.

So long,
Johannes
On Thu, 19 Jul 2018, Johannes Thumshirn wrote:
> Some of the people I know in HPC, HFT, etc. run stock SLE/RHEL, and
> they do file bugs for every single 3-5% performance regression.

The performance effect of runtime PM ought to be less than 1%. If it is larger, we'll figure out a different approach.

Alan Stern
On Thu, Jul 19, 2018 at 10:35:39AM -0400, Alan Stern wrote:
> The performance effect of runtime PM ought to be less than 1%. If it
> is larger, we'll figure out a different approach.

OK, let's see then.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 41e9ac9fc138..fa4667aa4732 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2306,7 +2306,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.queue_depth = shost->can_queue;
 	shost->tag_set.cmd_size = cmd_size;
 	shost->tag_set.numa_node = NUMA_NO_NODE;
-	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE |
+		BLK_MQ_F_SG_MERGE | BLK_MQ_F_SUPPORT_RPM;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
 	shost->tag_set.driver_data = shost;
Usually SCSI supports runtime PM, so pass BLK_MQ_F_SUPPORT_RPM to blk-mq core for enabling block runtime PM.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-pm@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)