Message ID | 20180508140640.0e312dba025df75cbf205cdb@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Kim Phillips <kim.phillips@arm.com> writes: > This patch is in the context of allowing the Coresight h/w > trace driver suite to be loaded as modules. Coresight uses > find_task_by_vpid when running in direct capture mode (via sysfs) > when getting/setting the context ID comparator to trigger on > (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> There is no way to implement a sysfs file that takes a pid correctly. Don't do it. Pids are tied to pid namespaces and sysfs deliberately does not do anything remotely resembly a pid namespace. Eric > Cc: Mathieu Poirier <mathieu.poirier@linaro.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Oleg Nesterov <oleg@redhat.com> > Cc: Gargi Sharma <gs051095@gmail.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: Pavel Tatashin <pasha.tatashin@oracle.com> > Cc: Kefeng Wang <wangkefeng.wang@huawei.com> > Cc: Kirill Tkhai <ktkhai@virtuozzo.com> > Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> > Cc: David Howells <dhowells@redhat.com> > Cc: "Eric W. Biederman" <ebiederm@xmission.com> > Signed-off-by: Kim Phillips <kim.phillips@arm.com> > --- > Current CoreSight callsite: > > https://lxr.missinglinkelectronics.com/linux/include/linux/coresight.h#L285 > > A quick look didn't find anything, but if Coresight needs to do > something differently, please comment. > > kernel/pid.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/pid.c b/kernel/pid.c > index 157fe4b19971..92b1b623f3e0 100644 > --- a/kernel/pid.c > +++ b/kernel/pid.c > @@ -342,6 +342,7 @@ struct task_struct *find_task_by_vpid(pid_t vnr) > { > return find_task_by_pid_ns(vnr, task_active_pid_ns(current)); > } > +EXPORT_SYMBOL_GPL(find_task_by_vpid); > > struct task_struct *find_get_task_by_vpid(pid_t nr) > {
Kim Phillips <kim.phillips@arm.com> writes: > This patch is in the context of allowing the Coresight h/w > trace driver suite to be loaded as modules. Coresight uses > find_task_by_vpid when running in direct capture mode (via sysfs) > when getting/setting the context ID comparator to trigger on > (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). Aside from my objection about how bad an interface a pid in sysfs is. The implementation of coresight_vpid_to_pid is horrible. The code should be just: static inline pid_t coresight_vpid_to_pid(pid_t vpid) { rcu_read_lock(); pid = pid_nr(find_vpid(vpid)); rcu_read_unlock(); return pid; } Which takes find_task_by_vpid out of the picture. But reading further I am seeing code writing a pid to hardware. That is broken. That is a layering violation of the first order. Giving implementation details like that to hardware. Any chance while you are working on this you can modify this code so that it does something sensible and defensible instead of every line of code I read be wrong in at least one detail? Thank you, Eric
Hi Eric, On 09/05/18 05:59, Eric W. Biederman wrote: > Kim Phillips <kim.phillips@arm.com> writes: > >> This patch is in the context of allowing the Coresight h/w >> trace driver suite to be loaded as modules. Coresight uses >> find_task_by_vpid when running in direct capture mode (via sysfs) >> when getting/setting the context ID comparator to trigger on >> (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). > > Aside from my objection about how bad an interface a pid in sysfs is. > The implementation of coresight_vpid_to_pid is horrible. > > The code should be just: > > static inline pid_t coresight_vpid_to_pid(pid_t vpid) > { > rcu_read_lock(); > pid = pid_nr(find_vpid(vpid)); > rcu_read_unlock(); > > return pid; > } > Which takes find_task_by_vpid out of the picture. > > But reading further I am seeing code writing a pid to hardware. That is > broken. That is a layering violation of the first order. Giving > implementation details like that to hardware. Note that the value here is nothing more than a token - the CoreSight hardware doesn't actually *do* anything with it other than match it against the same value which we also stash in the CPU in much the same fashion - see CONFIG_PID_IN_CONTEXTIDR for, if you'll pardon the pun, context. TL;DR: the CPU has a special register whose only purpose is to allow the OS help external debug tools identify the currently executing process, by writing some arbitrary identifier in there. The trace hardware can spit that identifier out into the trace stream whenever it changes, such that the user can see context switches easily. Newer trace hardware can also use it to actively filter what the capture at source, such that only the portions of interest are traced at all. We could in theory make up any old value, but as I understand it the PID is/was the most user-friendly and easily correlatable thing to hand, and it's now probably too well-established to reasonably change. Robin. > Any chance while you are working on this you can modify this code so > that it does something sensible and defensible instead of every line of > code I read be wrong in at least one detail? > > Thank you, > Eric >
On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote: > Kim Phillips <kim.phillips@arm.com> writes: > > > This patch is in the context of allowing the Coresight h/w > > trace driver suite to be loaded as modules. Coresight uses > > find_task_by_vpid when running in direct capture mode (via sysfs) > > when getting/setting the context ID comparator to trigger on > > (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). > > Aside from my objection about how bad an interface a pid in sysfs is. > The implementation of coresight_vpid_to_pid is horrible. > > The code should be just: > > static inline pid_t coresight_vpid_to_pid(pid_t vpid) > { > rcu_read_lock(); > pid = pid_nr(find_vpid(vpid)); > rcu_read_unlock(); > > return pid; > } > Which takes find_task_by_vpid out of the picture. Many thanks for pointing out the right way to do this. When Chunyan added this feature she broadly published her work and find_task_by_vpid() is the function she was asked to used. > > But reading further I am seeing code writing a pid to hardware. That is > broken. That is a layering violation of the first order. Giving > implementation details like that to hardware. This is how the feature works - as Robin pointed out tracers are designed to match pid values with the CPU's contextID register. The input value has no other effect than triggering trace collection, which has absolutely no baring on the CPU. > > Any chance while you are working on this you can modify this code so > that it does something sensible and defensible instead of every line of > code I read be wrong in at least one detail? > > Thank you, > Eric >
Mathieu Poirier <mathieu.poirier@linaro.org> writes: > On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote: >> Kim Phillips <kim.phillips@arm.com> writes: >> >> > This patch is in the context of allowing the Coresight h/w >> > trace driver suite to be loaded as modules. Coresight uses >> > find_task_by_vpid when running in direct capture mode (via sysfs) >> > when getting/setting the context ID comparator to trigger on >> > (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). >> >> Aside from my objection about how bad an interface a pid in sysfs is. >> The implementation of coresight_vpid_to_pid is horrible. >> >> The code should be just: >> >> static inline pid_t coresight_vpid_to_pid(pid_t vpid) >> { >> rcu_read_lock(); >> pid = pid_nr(find_vpid(vpid)); >> rcu_read_unlock(); >> >> return pid; >> } >> Which takes find_task_by_vpid out of the picture. > > Many thanks for pointing out the right way to do this. When Chunyan added > this feature she broadly published her work and find_task_by_vpid() is the > function she was asked to used. Clearly no one was thinking through the implications of a sysfs file which does not have pid namespace support on namespacing. I am quite upset at this mess of an API. It is not a maintainable way to do things. >> But reading further I am seeing code writing a pid to hardware. That is >> broken. That is a layering violation of the first order. Giving >> implementation details like that to hardware. > > This is how the feature works - as Robin pointed out tracers are designed to > match pid values with the CPU's contextID register. The input value has no > other effect than triggering trace collection, which has absolutely no baring on > the CPU. So please tell me how we make the tracer pid namespace aware. Or is it guaranteed that only the global root user will use this functionality? As you are taking a vpid it looks like users with lesser privileges are able to request this. From the other reply it appears this is the value the tracer returns to put in logs. Perhaps I missed it but I didn't see anything that translated from the global pid to something else. Which would make using this feature in a pid namespace confusing and a problematic information leak if I have understood what has been said so far. Eric
On Wed, May 09, 2018 at 09:35:07PM -0500, Eric W. Biederman wrote: > Mathieu Poirier <mathieu.poirier@linaro.org> writes: > > > On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote: > >> Kim Phillips <kim.phillips@arm.com> writes: > >> > >> > This patch is in the context of allowing the Coresight h/w > >> > trace driver suite to be loaded as modules. Coresight uses > >> > find_task_by_vpid when running in direct capture mode (via sysfs) > >> > when getting/setting the context ID comparator to trigger on > >> > (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). > >> > >> Aside from my objection about how bad an interface a pid in sysfs is. > >> The implementation of coresight_vpid_to_pid is horrible. > >> > >> The code should be just: > >> > >> static inline pid_t coresight_vpid_to_pid(pid_t vpid) > >> { > >> rcu_read_lock(); > >> pid = pid_nr(find_vpid(vpid)); > >> rcu_read_unlock(); > >> > >> return pid; > >> } > >> Which takes find_task_by_vpid out of the picture. > > > > Many thanks for pointing out the right way to do this. When Chunyan added > > this feature she broadly published her work and find_task_by_vpid() is the > > function she was asked to used. > > Clearly no one was thinking through the implications of a sysfs file > which does not have pid namespace support on namespacing. I am quite > upset at this mess of an API. It is not a maintainable way to do things. > > >> But reading further I am seeing code writing a pid to hardware. That is > >> broken. That is a layering violation of the first order. Giving > >> implementation details like that to hardware. > > > > This is how the feature works - as Robin pointed out tracers are designed to > > match pid values with the CPU's contextID register. The input value has no > > other effect than triggering trace collection, which has absolutely no baring on > > the CPU. > > So please tell me how we make the tracer pid namespace aware. Or is it > guaranteed that only the global root user will use this functionality? > > As you are taking a vpid it looks like users with lesser privileges are > able to request this. From the other reply it appears this is the > value the tracer returns to put in logs. Perhaps I missed it but I > didn't see anything that translated from the global pid to something > else. Which would make using this feature in a pid namespace confusing > and a problematic information leak if I have understood what has been > said so far. Let's look to see what's placed into the context ID register - this is done by arch/arm/mm/context.c::contextidr_notifier(): pid = task_pid_nr(thread->task) << ASID_BITS; This is documented in linux/sched.h as: * task_xid_nr() : global id, i.e. the id seen from the init namespace; So, what ends up in the context ID register is the _global_ PID, not a namespace specific PID. This means the hardware deals with global PID values. It seems quite logical to use the global PID value for the hardware, because that is a globally unique value - especially as the hardware uses this for filtering events. So asking for a namespace's pid 1 gets mapped to the global pid value, which won't match some other namespace's pid 1. The problem comes _if_ the event stream delivered to userspace contains the global PID values and the event stream is being looked at from within a namespace. This does not leak information from other namespaces because of the uniqueness of the global PID. However, what it does leak is the value of the global PID which is meaningless in the namespace. So, before the event stream is delivered to userspace, this value needs to be re-written to the namespace's PID value. Things get more yucky with this when you look at the ctxid_masks stuff - which looks to me like it implements a mask on the PID value. Masks on the pid value are irrelevant from within a namespace, because the mask is applied to the global PID value, not the namespace's PID value. You can't really define how a set of namespace PIDs will map to global PIDs, so masking the context ID PID value in the presence of namespaces is pretty useless - and potentially ends up being an information leak. As for the sysfs file thing, I think the simple solution to that is the sysfs file should accept a PID value in the current namespace, and translate that to the global namespace - and the global PID value should be stored. When reading, the global PID value should be translated back to the current namespace, or an error/empty given if the PID doesn't exist in that namespace. The current solution to store the vpid and simply return it irrespective of the namespace is just nonsense.
On 10 May 2018 at 02:40, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Wed, May 09, 2018 at 09:35:07PM -0500, Eric W. Biederman wrote: >> Mathieu Poirier <mathieu.poirier@linaro.org> writes: >> >> > On Tue, May 08, 2018 at 11:59:38PM -0500, Eric W. Biederman wrote: >> >> Kim Phillips <kim.phillips@arm.com> writes: >> >> >> >> > This patch is in the context of allowing the Coresight h/w >> >> > trace driver suite to be loaded as modules. Coresight uses >> >> > find_task_by_vpid when running in direct capture mode (via sysfs) >> >> > when getting/setting the context ID comparator to trigger on >> >> > (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). >> >> >> >> Aside from my objection about how bad an interface a pid in sysfs is. >> >> The implementation of coresight_vpid_to_pid is horrible. >> >> >> >> The code should be just: >> >> >> >> static inline pid_t coresight_vpid_to_pid(pid_t vpid) >> >> { >> >> rcu_read_lock(); >> >> pid = pid_nr(find_vpid(vpid)); >> >> rcu_read_unlock(); >> >> >> >> return pid; >> >> } >> >> Which takes find_task_by_vpid out of the picture. >> > >> > Many thanks for pointing out the right way to do this. When Chunyan added >> > this feature she broadly published her work and find_task_by_vpid() is the >> > function she was asked to used. >> >> Clearly no one was thinking through the implications of a sysfs file >> which does not have pid namespace support on namespacing. I am quite >> upset at this mess of an API. It is not a maintainable way to do things. >> >> >> But reading further I am seeing code writing a pid to hardware. That is >> >> broken. That is a layering violation of the first order. Giving >> >> implementation details like that to hardware. >> > >> > This is how the feature works - as Robin pointed out tracers are designed to >> > match pid values with the CPU's contextID register. The input value has no >> > other effect than triggering trace collection, which has absolutely no baring on >> > the CPU. >> >> So please tell me how we make the tracer pid namespace aware. Or is it >> guaranteed that only the global root user will use this functionality? >> >> As you are taking a vpid it looks like users with lesser privileges are >> able to request this. From the other reply it appears this is the >> value the tracer returns to put in logs. Perhaps I missed it but I >> didn't see anything that translated from the global pid to something >> else. Which would make using this feature in a pid namespace confusing >> and a problematic information leak if I have understood what has been >> said so far. > Hi Russell, > Let's look to see what's placed into the context ID register - this is > done by arch/arm/mm/context.c::contextidr_notifier(): > > pid = task_pid_nr(thread->task) << ASID_BITS; > > This is documented in linux/sched.h as: > > * task_xid_nr() : global id, i.e. the id seen from the init namespace; > > So, what ends up in the context ID register is the _global_ PID, not a > namespace specific PID. This means the hardware deals with global PID > values. Correct. > > It seems quite logical to use the global PID value for the hardware, > because that is a globally unique value - especially as the hardware > uses this for filtering events. So asking for a namespace's pid 1 > gets mapped to the global pid value, which won't match some other > namespace's pid 1. > > The problem comes _if_ the event stream delivered to userspace contains > the global PID values and the event stream is being looked at from > within a namespace. Correct. > > This does not leak information from other namespaces because of the > uniqueness of the global PID. However, what it does leak is the value > of the global PID which is meaningless in the namespace. So, before > the event stream is delivered to userspace, this value needs to be > re-written to the namespace's PID value. Unfortunately that can't be done. The trace stream is compressed and needs to be decompressed using an external library. I think the only option is to return an error if a user is trying to use this feature from a namespace. > > Things get more yucky with this when you look at the ctxid_masks stuff > - which looks to me like it implements a mask on the PID value. Masks > on the pid value are irrelevant from within a namespace, because the > mask is applied to the global PID value, not the namespace's PID value. > You can't really define how a set of namespace PIDs will map to global > PIDs, so masking the context ID PID value in the presence of namespaces > is pretty useless - and potentially ends up being an information leak. Also correct. Since there is no point in trying to use contextID tracing from a namespace (see above) there is also no point in trying to apply a mask to the contextIDs. Once again I think it is best to return an error when using from a namespace. Thanks, Mathieu > > As for the sysfs file thing, I think the simple solution to that is > the sysfs file should accept a PID value in the current namespace, > and translate that to the global namespace - and the global PID value > should be stored. When reading, the global PID value should be > translated back to the current namespace, or an error/empty given if > the PID doesn't exist in that namespace. The current solution to > store the vpid and simply return it irrespective of the namespace is > just nonsense. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up > According to speedtest.net: 8.21Mbps down 510kbps up
On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote: > Hi Russell, > > On 10 May 2018 at 02:40, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > > This does not leak information from other namespaces because of the > > uniqueness of the global PID. However, what it does leak is the value > > of the global PID which is meaningless in the namespace. So, before > > the event stream is delivered to userspace, this value needs to be > > re-written to the namespace's PID value. > > Unfortunately that can't be done. The trace stream is compressed and > needs to be decompressed using an external library. I think the only > option is to return an error if a user is trying to use this feature > from a namespace. That sounds like a sensible approach, and that should get rid of the vpid stuff too. Eric, would this solve all your concerns?
Russell King - ARM Linux <linux@armlinux.org.uk> writes: > On Thu, May 10, 2018 at 01:39:18PM -0600, Mathieu Poirier wrote: >> Hi Russell, >> >> On 10 May 2018 at 02:40, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: >> > This does not leak information from other namespaces because of the >> > uniqueness of the global PID. However, what it does leak is the value >> > of the global PID which is meaningless in the namespace. So, before >> > the event stream is delivered to userspace, this value needs to be >> > re-written to the namespace's PID value. >> >> Unfortunately that can't be done. The trace stream is compressed and >> needs to be decompressed using an external library. I think the only >> option is to return an error if a user is trying to use this feature >> from a namespace. > > That sounds like a sensible approach, and that should get rid of the > vpid stuff too. > > Eric, would this solve all your concerns? It does, and I have given my ack to the respin. I am moderately concerned about using the global pid with hardware. But pids are a core abstraction that aren't going anywhere. As long as hardware does not impose constraints on pids that software already does not we should be fine. Eric
diff --git a/kernel/pid.c b/kernel/pid.c index 157fe4b19971..92b1b623f3e0 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -342,6 +342,7 @@ struct task_struct *find_task_by_vpid(pid_t vnr) { return find_task_by_pid_ns(vnr, task_active_pid_ns(current)); } +EXPORT_SYMBOL_GPL(find_task_by_vpid); struct task_struct *find_get_task_by_vpid(pid_t nr) {
This patch is in the context of allowing the Coresight h/w trace driver suite to be loaded as modules. Coresight uses find_task_by_vpid when running in direct capture mode (via sysfs) when getting/setting the context ID comparator to trigger on (/sys/bus/coresight/devices/<x>.etm/ctxid_pid). Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Kim Phillips <kim.phillips@arm.com> --- Current CoreSight callsite: https://lxr.missinglinkelectronics.com/linux/include/linux/coresight.h#L285 A quick look didn't find anything, but if Coresight needs to do something differently, please comment. kernel/pid.c | 1 + 1 file changed, 1 insertion(+)