Message ID | 20230310193325.620493-1-nhuck@google.com (mailing list archive) |
---|---|
State | Accepted |
Series | [v2] fsverity: Remove WQ_UNBOUND from fsverity read workqueue |
On Fri, Mar 10, 2023 at 11:33:25AM -0800, Nathan Huckleberry wrote:
> WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
> is problematic for latency sensitive workloads, like I/O
> post-processing.
>
> Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> scheduler latency and improves app cold startup times by ~30ms.
> WQ_UNBOUND was also removed from the dm-verity workqueue for the same
> reason [1].
>
> This code was tested by running Android app startup benchmarks and
> measuring how long the fsverity workqueue spent in the runnable state.
>
> Before
> Total workqueue scheduler latency: 553800us
> After
> Total workqueue scheduler latency: 18962us
>
> [1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/
>
> Signed-off-by: Nathan Huckleberry <nhuck@google.com>
> ---
> Changelog:
> v1 -> v2:
> - Added comment about WQ_UNBOUND
> - Added info about related dm-verity patches in commit message
>
>  fs/verity/verify.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> index f50e3b5b52c9..782b8b4a24c1 100644
> --- a/fs/verity/verify.c
> +++ b/fs/verity/verify.c
> @@ -387,15 +387,15 @@ EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
>  int __init fsverity_init_workqueue(void)
>  {
>  	/*
> -	 * Use an unbound workqueue to allow bios to be verified in parallel
> -	 * even when they happen to complete on the same CPU. This sacrifices
> -	 * locality, but it's worthwhile since hashing is CPU-intensive.
> -	 *
>  	 * Also use a high-priority workqueue to prioritize verification work,
>  	 * which blocks reads from completing, over regular application tasks.
> +	 *
> +	 * This workqueue is not marked as unbound for performance reasons.
> +	 * Using an unbound workqueue for crypto operations causes excessive
> +	 * scheduler latency on ARM64.
>  	 */
>  	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> -						  WQ_UNBOUND | WQ_HIGHPRI,
> +						  WQ_HIGHPRI,
>  						  num_online_cpus());

Applied to https://git.kernel.org/pub/scm/fs/fsverity/linux.git/log/?h=for-next

I adjusted the comment slightly so that the first paragraph doesn't start with
"Also".

- Eric
Thanks Eric.

On Mon, Mar 13, 2023 at 3:59 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Fri, Mar 10, 2023 at 11:33:25AM -0800, Nathan Huckleberry wrote:
> > WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
> > is problematic for latency sensitive workloads, like I/O
> > post-processing.
> >
> > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> > scheduler latency and improves app cold startup times by ~30ms.
> > WQ_UNBOUND was also removed from the dm-verity workqueue for the same
> > reason [1].
> >
> > This code was tested by running Android app startup benchmarks and
> > measuring how long the fsverity workqueue spent in the runnable state.
> >
> > Before
> > Total workqueue scheduler latency: 553800us
> > After
> > Total workqueue scheduler latency: 18962us
> >
> > [1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/
> >
> > Signed-off-by: Nathan Huckleberry <nhuck@google.com>
> > ---
> > Changelog:
> > v1 -> v2:
> > - Added comment about WQ_UNBOUND
> > - Added info about related dm-verity patches in commit message
> >
> >  fs/verity/verify.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> > index f50e3b5b52c9..782b8b4a24c1 100644
> > --- a/fs/verity/verify.c
> > +++ b/fs/verity/verify.c
> > @@ -387,15 +387,15 @@ EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
> >  int __init fsverity_init_workqueue(void)
> >  {
> >  	/*
> > -	 * Use an unbound workqueue to allow bios to be verified in parallel
> > -	 * even when they happen to complete on the same CPU. This sacrifices
> > -	 * locality, but it's worthwhile since hashing is CPU-intensive.
> > -	 *
> >  	 * Also use a high-priority workqueue to prioritize verification work,
> >  	 * which blocks reads from completing, over regular application tasks.
> > +	 *
> > +	 * This workqueue is not marked as unbound for performance reasons.
> > +	 * Using an unbound workqueue for crypto operations causes excessive
> > +	 * scheduler latency on ARM64.
> >  	 */
> >  	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> > -						  WQ_UNBOUND | WQ_HIGHPRI,
> > +						  WQ_HIGHPRI,
> >  						  num_online_cpus());
>
> Applied to https://git.kernel.org/pub/scm/fs/fsverity/linux.git/log/?h=for-next
>
> I adjusted the comment slightly so that the first paragraph doesn't start with
> "Also".
>
> - Eric
On Mon, Mar 13, 2023 at 04:07:59PM -0700, Nathan Huckleberry wrote:
> >
> > Applied to https://git.kernel.org/pub/scm/fs/fsverity/linux.git/log/?h=for-next
> >
> > I adjusted the comment slightly so that the first paragraph doesn't start with
> > "Also".
> >

As discussed, this seems to be a significant enough improvement to call this a
"fix". So I've also added:

    Fixes: 8a1d0f9cacc9 ("fs-verity: add data verification hooks for ->readpages()")
    Cc: stable@vger.kernel.org

and moved the commit to:
https://git.kernel.org/pub/scm/fs/fsverity/linux.git/log/?h=for-current

- Eric
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index f50e3b5b52c9..782b8b4a24c1 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -387,15 +387,15 @@ EXPORT_SYMBOL_GPL(fsverity_enqueue_verify_work);
 int __init fsverity_init_workqueue(void)
 {
 	/*
-	 * Use an unbound workqueue to allow bios to be verified in parallel
-	 * even when they happen to complete on the same CPU. This sacrifices
-	 * locality, but it's worthwhile since hashing is CPU-intensive.
-	 *
 	 * Also use a high-priority workqueue to prioritize verification work,
 	 * which blocks reads from completing, over regular application tasks.
+	 *
+	 * This workqueue is not marked as unbound for performance reasons.
+	 * Using an unbound workqueue for crypto operations causes excessive
+	 * scheduler latency on ARM64.
 	 */
 	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
-						  WQ_UNBOUND | WQ_HIGHPRI,
+						  WQ_HIGHPRI,
 						  num_online_cpus());
 	if (!fsverity_read_workqueue)
 		return -ENOMEM;
WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
is problematic for latency sensitive workloads, like I/O
post-processing.

Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
scheduler latency and improves app cold startup times by ~30ms.
WQ_UNBOUND was also removed from the dm-verity workqueue for the same
reason [1].

This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the runnable state.

Before
Total workqueue scheduler latency: 553800us
After
Total workqueue scheduler latency: 18962us

[1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
---
Changelog:
v1 -> v2:
- Added comment about WQ_UNBOUND
- Added info about related dm-verity patches in commit message

 fs/verity/verify.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
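
As a quick sanity check of the 96% figure in the commit message, the before/after totals can be plugged into a small userspace C helper. This is only an illustrative sketch: the function names are hypothetical and nothing here is part of the patch or the kernel tree.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): percentage reduction when
 * going from before_us to after_us. Returns 0.0 when before_us is not
 * positive, to avoid dividing by zero.
 */
double latency_reduction_percent(double before_us, double after_us)
{
	if (before_us <= 0.0)
		return 0.0;
	return 100.0 * (before_us - after_us) / before_us;
}

/* Print the reduction for the two totals quoted in the commit message. */
void print_reported_reduction(void)
{
	const double before_us = 553800.0; /* total with WQ_UNBOUND */
	const double after_us = 18962.0;   /* total without WQ_UNBOUND */

	printf("reduction: %.1f%%\n",
	       latency_reduction_percent(before_us, after_us));
}
```

Calling print_reported_reduction() from a main() gives roughly 96.6%, which is consistent with the "96% reduction" quoted in the commit message.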