diff mbox series

[3/3] proc: use idr tgid tag hint to iterate pids in readdir

Message ID 20220614180949.102914-4-bfoster@redhat.com (mailing list archive)
State New, archived
Headers show
Series proc: improve root readdir latency with many threads | expand

Commit Message

Brian Foster June 14, 2022, 6:09 p.m. UTC
The tgid pid/task scan in proc_pid_readdir() is rather inefficient.
It linearly walks the pid_namespace and checks each allocated pid
for an associated PIDTYPE_TGID task. This has shown to impact
getdents() latency in environments that might have processes with
very large thread counts.

For example, on a mostly idle 2.4GHz Intel Xeon running Fedora on
5.19.0-rc2, 'strace -T xfs_io -c readdir /proc' shows the following:

  getdents64(... /* 814 entries */, 32768) = 20624 <0.000568>

With the addition of a dummy (i.e. idle) process running that
creates an additional 100k threads, that latency increases to:

  getdents64(... /* 815 entries */, 32768) = 20656 <0.011315>

While this may not be noticeable in one off /proc scans or simple
usage of ps or top, we have users that report problems caused by
this latency increase in these sort of scaled environments with
custom tooling that makes heavier use of task monitoring.

Optimize the tgid task scanning in proc_pid_readdir() by using
IDR_TGID tag lookups in the pid namespace tree. Tagged pids are not
guaranteed to have an associated PIDTYPE_TGID task, but pids that do
are always tagged. This significantly improves readdir() latency
when the pid namespace is populated with group leader tasks with
unusually large thread counts. For example, the above 100k idle task
test against a patched kernel now results in the following:

Idle:
  getdents64(... /* 861 entries */, 32768) = 21048 <0.000670>

"" + 100k threads:
  getdents64(... /* 862 entries */, 32768) = 21096 <0.000959>

... which is a much smaller latency hit after the high thread count
task is started.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/proc/base.c      |  2 +-
 include/linux/idr.h | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox June 15, 2022, 1:44 p.m. UTC | #1
On Tue, Jun 14, 2022 at 02:09:49PM -0400, Brian Foster wrote:
> +++ b/include/linux/idr.h
> @@ -185,6 +185,20 @@ static inline bool idr_is_group_lead(struct idr *idr, unsigned long id)
>  	return radix_tree_tag_get(&idr->idr_rt, id, IDR_TGID);
>  }
>  
> +/*
> + * Find the next id with a potentially associated TGID task using the internal
> + * tag. Task association is not guaranteed and must be checked explicitly.
> + */
> +static inline struct pid *find_tgid_pid(struct idr *idr, unsigned long id)
> +{
> +	struct pid *pid;
> +
> +	if (radix_tree_gang_lookup_tag(&idr->idr_rt, (void **) &pid, id, 1,
> +				       IDR_TGID) != 1)
> +		return NULL;
> +	return pid;
> +}

The IDR is a generic data structure, and shouldn't know anything about
PIDs, TGIDs or tasks.
Brian Foster June 15, 2022, 2:44 p.m. UTC | #2
On Wed, Jun 15, 2022 at 02:44:27PM +0100, Matthew Wilcox wrote:
> On Tue, Jun 14, 2022 at 02:09:49PM -0400, Brian Foster wrote:
> > +++ b/include/linux/idr.h
> > @@ -185,6 +185,20 @@ static inline bool idr_is_group_lead(struct idr *idr, unsigned long id)
> >  	return radix_tree_tag_get(&idr->idr_rt, id, IDR_TGID);
> >  }
> >  
> > +/*
> > + * Find the next id with a potentially associated TGID task using the internal
> > + * tag. Task association is not guaranteed and must be checked explicitly.
> > + */
> > +static inline struct pid *find_tgid_pid(struct idr *idr, unsigned long id)
> > +{
> > +	struct pid *pid;
> > +
> > +	if (radix_tree_gang_lookup_tag(&idr->idr_rt, (void **) &pid, id, 1,
> > +				       IDR_TGID) != 1)
> > +		return NULL;
> > +	return pid;
> > +}
> 
> The IDR is a generic data structure, and shouldn't know anything about
> PIDs, TGIDs or tasks.
> 

Hm, Ok. So I suppose this interface should probably be split up a bit
more. The idr wrappers can be more generic along the lines of
idr_set_tag(), idr_is_tagged(), idr_get_next_tagged(), etc. This would
use something like an IDR_TAG1 internally (instead of IDR_TGID) to
basically expose support for a single, generic, idr-iternal (but
external radix-tree) tag for idr consumers. (Presumably this would map
directly over to xarray marks per your reply to patch 1). Then, the
find_tgid_pid() bits can be lifted into the pid code along the lines of
find_ge_pid() (along with perhaps some other simple "tgid" wrappers over
the idr tag mechanism).

If that sounds generally reasonable, I'll wait for any feedback on the
functional approach and follow up with something like that in v2..

Brian
diff mbox series

Patch

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 8dfa36a99c74..fd3c8a5f8c2d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3436,7 +3436,7 @@  static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
 	rcu_read_lock();
 retry:
 	iter.task = NULL;
-	pid = find_ge_pid(iter.tgid, ns);
+	pid = find_tgid_pid(&ns->idr, iter.tgid);
 	if (pid) {
 		iter.tgid = pid_nr_ns(pid, ns);
 		iter.task = pid_task(pid, PIDTYPE_TGID);
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 11e0ccedfc92..5ef32311b232 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -185,6 +185,20 @@  static inline bool idr_is_group_lead(struct idr *idr, unsigned long id)
 	return radix_tree_tag_get(&idr->idr_rt, id, IDR_TGID);
 }
 
+/*
+ * Find the next id with a potentially associated TGID task using the internal
+ * tag. Task association is not guaranteed and must be checked explicitly.
+ */
+static inline struct pid *find_tgid_pid(struct idr *idr, unsigned long id)
+{
+	struct pid *pid;
+
+	if (radix_tree_gang_lookup_tag(&idr->idr_rt, (void **) &pid, id, 1,
+				       IDR_TGID) != 1)
+		return NULL;
+	return pid;
+}
+
 /**
  * idr_for_each_entry() - Iterate over an IDR's elements of a given type.
  * @idr: IDR handle.