Message ID | 20230309213742.572091-1-nhuck@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fsverity: Remove WQ_UNBOUND from fsverity read workqueue | expand |
On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote: > WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This > is problematic for latency sensitive workloads like I/O post-processing. > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related > scheduler latency and improves app cold startup times by ~30ms. Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue too, for the same reason? I'm still amazed that it's such a big improvement! I don't really need it to apply this patch, but it would be very interesting to know exactly why the latency is so bad with WQ_UNBOUND. > > This code was tested by running Android app startup benchmarks and > measuring how long the fsverity workqueue spent in the ready queue. > > Before > Total workqueue scheduler latency: 553800us > After > Total workqueue scheduler latency: 18962us > > Change-Id: I693efee541757851ed6d229430111cd763d39067 No Change-Id in upstream patches, please. > diff --git a/fs/verity/verify.c b/fs/verity/verify.c > index f50e3b5b52c9..e8ec37774d63 100644 > --- a/fs/verity/verify.c > +++ b/fs/verity/verify.c > @@ -395,7 +395,7 @@ int __init fsverity_init_workqueue(void) > * which blocks reads from completing, over regular application tasks. > */ > fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue", > - WQ_UNBOUND | WQ_HIGHPRI, > + WQ_HIGHPRI, > num_online_cpus()); There's a comment just above here that explains why WQ_UNBOUND is being used. It needs to be updated to explain why WQ_UNBOUND is *not* being used. - Eric
On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@kernel.org> > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote: > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This > > is problematic for latency sensitive workloads like I/O post-processing. > > > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related > > scheduler latency and improves app cold startup times by ~30ms. > > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue > too, for the same reason? > > I'm still amazed that it's such a big improvement! I don't really need it to > apply this patch, but it would be very interesting to know exactly why the > latency is so bad with WQ_UNBOUND. > > > This code was tested by running Android app startup benchmarks and > > measuring how long the fsverity workqueue spent in the ready queue. > > > > Before > > Total workqueue scheduler latency: 553800us > > After > > Total workqueue scheduler latency: 18962us Given the gap between data above and the 15253 us in diagram[1], and the SHA instructions[2], could you specify a bit on your test? [1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@google.com/ [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com/
On Fri, Mar 10, 2023 at 12:01 AM Hillf Danton <hdanton@sina.com> wrote: > > On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@kernel.org> > > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote: > > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This > > > is problematic for latency sensitive workloads like I/O post-processing. > > > > > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related > > > scheduler latency and improves app cold startup times by ~30ms. > > > > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue > > too, for the same reason? > > > > I'm still amazed that it's such a big improvement! I don't really need it to > > apply this patch, but it would be very interesting to know exactly why the > > latency is so bad with WQ_UNBOUND. My current guess for the root cause is excessing saving/restoring of the FPSIMD state. > > > > > This code was tested by running Android app startup benchmarks and > > > measuring how long the fsverity workqueue spent in the ready queue. > > > > > > Before > > > Total workqueue scheduler latency: 553800us > > > After > > > Total workqueue scheduler latency: 18962us > > Given the gap between data above and the 15253 us in diagram[1], and > the SHA instructions[2], could you specify a bit on your test? The test I'm running opens the Android messaging APK which is validated with fsverity. It opens the messaging app 25 times, dropping caches each time. The benchmark produces a Perfetto trace which we use to compute the scheduler latency. We sum up the amount of time that each fsverity worker spent in the ready state. The test in [1] is similar, but may be using a different APK. These tests are not in AOSP, so I can't share a link to them, but I would expect that fio on a ramdisk would produce similarly good results. > > [1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@google.com/ > [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com/
On 10 Mar 2023 11:09:55 -0800 Nathan Huckleberry <nhuck@google.com> > On Fri, Mar 10, 2023 at 12:01=E2=80=AFAM Hillf Danton <hdanton@sina.com> wr= > ote: > > > > On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@kernel.org> > > > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote: > > > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android. Th= > is > > > > is problematic for latency sensitive workloads like I/O post-processi= > ng. > > > > > > > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue relat= > ed > > > > scheduler latency and improves app cold startup times by ~30ms. > > > > > > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity w= > orkqueue > > > too, for the same reason? > > > > > > I'm still amazed that it's such a big improvement! I don't really need= > it to > > > apply this patch, but it would be very interesting to know exactly why = > the > > > latency is so bad with WQ_UNBOUND. > > My current guess for the root cause is excessing saving/restoring of > the FPSIMD state. > > > > > > > > This code was tested by running Android app startup benchmarks and > > > > measuring how long the fsverity workqueue spent in the ready queue. > > > > > > > > Before > > > > Total workqueue scheduler latency: 553800us > > > > After > > > > Total workqueue scheduler latency: 18962us > > > > Given the gap between data above and the 15253 us in diagram[1], and > > the SHA instructions[2], could you specify a bit on your test? > > The test I'm running opens the Android messaging APK which is > validated with fsverity. It opens the messaging app 25 times, dropping > caches each time. The benchmark produces a Perfetto trace which we use > to compute the scheduler latency. We sum up the amount of time that > each fsverity worker spent in the ready state. The test in [1] is > similar, but may be using a different APK. These tests are not in > AOSP, so I can't share a link to them, but I would expect that fio on > a ramdisk would produce similarly good results. Thanks for your introduction to the test. Are the similar results ARM64/Android specific? Could this patch help X86? > > > > > [1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@= > google.com/ > > [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279L= > miRxKt7aQYg@mail.gmail.com/
diff --git a/fs/verity/verify.c b/fs/verity/verify.c index f50e3b5b52c9..e8ec37774d63 100644 --- a/fs/verity/verify.c +++ b/fs/verity/verify.c @@ -395,7 +395,7 @@ int __init fsverity_init_workqueue(void) * which blocks reads from completing, over regular application tasks. */ fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue", - WQ_UNBOUND | WQ_HIGHPRI, + WQ_HIGHPRI, num_online_cpus()); if (!fsverity_read_workqueue) return -ENOMEM;
WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This is problematic for latency sensitive workloads like I/O post-processing. Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related scheduler latency and improves app cold startup times by ~30ms. This code was tested by running Android app startup benchmarks and measuring how long the fsverity workqueue spent in the ready queue. Before Total workqueue scheduler latency: 553800us After Total workqueue scheduler latency: 18962us Change-Id: I693efee541757851ed6d229430111cd763d39067 Signed-off-by: Nathan Huckleberry <nhuck@google.com> --- fs/verity/verify.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)