Message ID | 871staffus.fsf@notabene.neil.brown.name (mailing list archive)
---|---
State | New, archived
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
But if you actually care about performance in any way I'd suggest
to use the loop device in direct I/O mode..
On Mon 03-04-17 11:18:51, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable. 52-460 seconds. Half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.

Yes, this makes sense to me.

> Signed-off-by: NeilBrown <neilb@suse.com>

Acked-by: Michal Hocko <mhocko@suse.com>

One nit below.

> ---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>  {
>  	struct loop_cmd *cmd =
>  		container_of(work, struct loop_cmd, work);
> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>
> +	current->flags |= PF_LESS_THROTTLE;
>  	loop_handle_cmd(cmd);
> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;

We have a helper for this, tsk_restore_flags(). It is not used consistently,
and maybe we want a dedicated API like the one we have for the scoped
NOIO/NOFS, but that is a separate thing. I would find tsk_restore_flags()
easier to read.
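For reference, a minimal sketch of how the hunk might look with that helper,
assuming the tsk_restore_flags() definition from include/linux/sched.h of
this era (it clears the masked bits and then restores them from the saved
value); this is only an illustration of the suggestion, not part of the
posted patch:

static void loop_queue_work(struct kthread_work *work)
{
	struct loop_cmd *cmd =
		container_of(work, struct loop_cmd, work);
	unsigned int oldflags = current->flags & PF_LESS_THROTTLE;

	current->flags |= PF_LESS_THROTTLE;
	loop_handle_cmd(cmd);
	/* clear PF_LESS_THROTTLE, then put back whatever was set before */
	tsk_restore_flags(current, oldflags, PF_LESS_THROTTLE);
}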
On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <neilb@suse.com> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable. 52-460 seconds. Half below 100 seconds,
> half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  drivers/block/loop.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>  {
>  	struct loop_cmd *cmd =
>  		container_of(work, struct loop_cmd, work);
> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>
> +	current->flags |= PF_LESS_THROTTLE;
>  	loop_handle_cmd(cmd);
> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
> }

You can do it against 'lo->worker_task' instead of doing it in each
loop_queue_work(), and the flag needn't be restored because the kernel
thread is dedicated to loop.

Thanks,
Ming Lei
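A sketch of what that alternative might look like: mark the dedicated worker
once when it is created rather than toggling the flag per command. This
assumes the worker is set up in loop_prepare_queue(), as in the loop.c of
this period; the exact hook may differ in other trees:

static int loop_prepare_queue(struct loop_device *lo)
{
	kthread_init_worker(&lo->worker);
	lo->worker_task = kthread_run(kthread_worker_fn,
			&lo->worker, "loop%d", lo->lo_number);
	if (IS_ERR(lo->worker_task))
		return -ENOMEM;
	set_user_nice(lo->worker_task, MIN_NICE);
	/*
	 * The kthread only ever services loop requests, so the flag
	 * never needs to be cleared or restored.
	 */
	lo->worker_task->flags |= PF_LESS_THROTTLE;
	return 0;
}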
On Tue, Apr 04 2017, Christoph Hellwig wrote:

> Looks fine,
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> But if you actually care about performance in any way I'd suggest
> to use the loop device in direct I/O mode..

The losetup on my test VM is too old to support that :-(
I guess it might be time to upgrade.

It seems that there is no "mount -o direct_loop" or similar, so you
have to do the losetup and the mount separately. Any thoughts on
whether that should be changed?

Thanks,
NeilBrown
On Tue, Apr 04 2017, Ming Lei wrote:

> On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <neilb@suse.com> wrote:
>>
>> When a filesystem is mounted from a loop device, writes are
>> throttled by balance_dirty_pages() twice: once when writing
>> to the filesystem and once when the loop_handle_cmd() writes
>> to the backing file. This double-throttling can trigger
>> positive feedback loops that create significant delays. The
>> throttling at the lower level is seen by the upper level as
>> a slow device, so it throttles extra hard.
>>
>> The PF_LESS_THROTTLE flag was created to handle exactly this
>> circumstance, though with an NFS filesystem mounted from a
>> local NFS server. It reduces the throttling on the lower
>> layer so that it can proceed largely unthrottled.
>>
>> To demonstrate this, create a filesystem on a loop device
>> and write (e.g. with dd) several large files which combine
>> to consume significantly more than the limit set by
>> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
>> time taken.
>>
>> When I do this directly on a device (no loop device) the
>> total time for several runs (mkfs, mount, write 200 files,
>> umount) is fairly stable: 28-35 seconds.
>> When I do this over a loop device the times are much worse
>> and less stable. 52-460 seconds. Half below 100 seconds,
>> half above.
>> When I apply this patch, the times become stable again,
>> though not as fast as the no-loop-back case: 53-72 seconds.
>>
>> There may be room for further improvement as the total overhead still
>> seems too high, but this is a big improvement.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>  drivers/block/loop.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> index 0ecb6461ed81..a7e1dd215fc2 100644
>> --- a/drivers/block/loop.c
>> +++ b/drivers/block/loop.c
>> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>>  {
>>  	struct loop_cmd *cmd =
>>  		container_of(work, struct loop_cmd, work);
>> +	int oldflags = current->flags & PF_LESS_THROTTLE;
>>
>> +	current->flags |= PF_LESS_THROTTLE;
>>  	loop_handle_cmd(cmd);
>> +	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
>> }
>
> You can do it against 'lo->worker_task' instead of doing it in each
> loop_queue_work(), and the flag needn't be restored because the kernel
> thread is dedicated to loop.

Good point. I'll do that.

Thanks,
NeilBrown
On Wed, Apr 5, 2017 at 12:27 PM, NeilBrown <neilb@suse.com> wrote:
> On Tue, Apr 04 2017, Christoph Hellwig wrote:
>
>> Looks fine,
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>
>> But if you actually care about performance in any way I'd suggest
>> to use the loop device in direct I/O mode..
>
> The losetup on my test VM is too old to support that :-(
> I guess it might be time to upgrade.
>
> It seems that there is no "mount -o direct_loop" or similar, so you
> have to do the losetup and the mount separately. Any thoughts on

I guess 'direct_loop' could be added to 'mount' directly, but I am not
familiar with the mount utility.

> whether that should be changed?

There was a sysfs interface for controlling direct I/O in the initial
submission, but it looks like it was reviewed out. :-)

Thanks,
Ming Lei
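For readers following along: the direct I/O mode Christoph mentions can be
toggled on an already-bound loop device with the LOOP_SET_DIRECT_IO ioctl on
kernels that support the loop dio path (newer losetup versions expose this as
a --direct-io option). A minimal userspace sketch, with error handling
reduced to the essentials and the device path left to the caller:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

int main(int argc, char **argv)
{
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/loopN\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* 1 = enable direct I/O against the backing file, 0 = disable */
	if (ioctl(fd, LOOP_SET_DIRECT_IO, 1UL) < 0) {
		perror("LOOP_SET_DIRECT_IO");
		close(fd);
		return 1;
	}

	close(fd);
	return 0;
}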
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..a7e1dd215fc2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
 {
 	struct loop_cmd *cmd =
 		container_of(work, struct loop_cmd, work);
+	int oldflags = current->flags & PF_LESS_THROTTLE;
 
+	current->flags |= PF_LESS_THROTTLE;
 	loop_handle_cmd(cmd);
+	current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
 }
 
 static int loop_init_request(void *data, struct request *rq,
When a filesystem is mounted from a loop device, writes are throttled
by balance_dirty_pages() twice: once when writing to the filesystem and
once when the loop_handle_cmd() writes to the backing file. This
double-throttling can trigger positive feedback loops that create
significant delays. The throttling at the lower level is seen by the
upper level as a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a local NFS
server. It reduces the throttling on the lower layer so that it can
proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device and write
(e.g. with dd) several large files which combine to consume
significantly more than the limit set by /proc/sys/vm/dirty_ratio or
dirty_bytes. Measure the total time taken.

When I do this directly on a device (no loop device) the total time for
several runs (mkfs, mount, write 200 files, umount) is fairly stable:
28-35 seconds.
When I do this over a loop device the times are much worse and less
stable. 52-460 seconds. Half below 100 seconds, half above.
When I apply this patch, the times become stable again, though not as
fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total overhead still
seems too high, but this is a big improvement.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/loop.c | 3 +++
 1 file changed, 3 insertions(+)
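To make the effect described above concrete, here is a deliberately
simplified, self-contained illustration. It is not the kernel's
mm/page-writeback.c code and the real limit calculation is more involved;
the point is only that a task carrying a less-throttle style flag gets
extra headroom above the normal dirty threshold, so the lower layer of a
loop (or loopback-NFS) stack keeps draining while ordinary writers are
still held at the normal limit:

/*
 * Simplified illustration only -- not the actual kernel code.
 * A "less throttle" task is given ~25% extra headroom over the normal
 * dirty limit, so it is throttled later than ordinary writers.
 */
#include <stdbool.h>
#include <stdio.h>

static unsigned long task_dirty_limit(unsigned long limit_pages,
				      bool less_throttle)
{
	return less_throttle ? limit_pages + limit_pages / 4 : limit_pages;
}

int main(void)
{
	unsigned long limit = 100000;	/* pages allowed to be dirty (example) */

	printf("ordinary writer limit:       %lu pages\n",
	       task_dirty_limit(limit, false));
	printf("less-throttled writer limit: %lu pages\n",
	       task_dirty_limit(limit, true));
	return 0;
}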