Message ID | 1466694434-1420-1-git-send-email-toiwoton@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, Jun 23, 2016 at 8:07 AM, Topi Miettinen <toiwoton@gmail.com> wrote:
> There are many basic ways to control processes, including capabilities,
> cgroups and resource limits. However, there are far fewer ways to find
> out useful values for the limits, except blind trial and error.
>
> Currently, there is no way to know which capabilities are actually used.
> Even the source code is only implicit, in-depth knowledge of each
> capability must be used when analyzing a program to judge which
> capabilities the program will exercise.
>
> Add a new cgroup controller for monitoring of capabilities
> in the cgroup.
>
> Test case demonstrating basic capability monitoring and how the
> capabilities are combined at next level (boot to rdshell):
>
> (initramfs) cd /sys/fs
> (initramfs) mount -t cgroup2 cgroup cgroup
> (initramfs) cd cgroup
> (initramfs) echo +capability > cgroup.subtree_control
> (initramfs) mkdir test; cd test
> (initramfs) echo +capability > cgroup.subtree_control
> (initramfs) ls
> capability.used cgroup.events cgroup.subtree_control
> cgroup.controllers cgroup.procs
> (initramfs) mkdir first second
> (initramfs) sh
>
> BusyBox v1.22.1 (Debian 1:1.22.0-19) built-in shell (ash)
> Enter 'help' for a list of built-in commands.
>
> (initramfs) cd first
> (initramfs) echo $$ >cgroup.procs
> (initramfs) cat capability.used
> 0000000000000000 # nothing so far
> (initramfs) mknod /dev/z_$$ c 1 2
> (initramfs) cat capability.used
> 0000000008000000 # CAP_MKNOD
> (initramfs) cat ../capability.used
> 0000000008000000 # also seen at next higher level
> (initramfs) exit
> (initramfs) sh
>
> BusyBox v1.22.1 (Debian 1:1.22.0-19) built-in shell (ash)
> Enter 'help' for a list of built-in commands.
>
> (initramfs) cd second
> (initramfs) echo $$ >cgroup.procs
> (initramfs) cat capability.used
> 0000000000000000 # nothing so far
> (initramfs) chown 1234 /dev/z_*
> (initramfs) cat capability.used
> 0000000000000001 # CAP_CHROOT

nitpick: this is CAP_CHOWN, not CAP_CHROOT

-Kees

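For readers who want to check the masks in the transcript by hand, here is a minimal Python sketch that decodes a hexadecimal capability mask into names. The name table follows the bit order in include/uapi/linux/capability.h for kernels of that era (bits 0 to 37). The `capability.used` file exists only in the proposed patch, and the `combine` helper assumes that a parent cgroup reports the bitwise OR of its children, which is what the transcript suggests but does not state outright. The function names are made up for this illustration.

```python
# Capability names indexed by bit number, in the order used by
# include/uapi/linux/capability.h (bits 0..37).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_caps(hex_mask):
    """Return the capability names whose bits are set in a hex mask string."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits beyond the table are reported by number rather than guessed.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"bit{bit}")
        mask >>= 1
        bit += 1
    return names


def combine(*hex_masks):
    """OR several masks together (assumed parent-level aggregation)."""
    total = 0
    for hex_mask in hex_masks:
        total |= int(hex_mask, 16)
    return f"{total:016x}"


print(decode_caps("0000000008000000"))  # mask seen after mknod
print(decode_caps("0000000000000001"))  # mask seen after chown
print(combine("0000000008000000", "0000000000000001"))
```

Bit 0 decodes to CAP_CHOWN, which agrees with the nitpick above.
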
Hello,

On Thu, Jun 23, 2016 at 06:07:10PM +0300, Topi Miettinen wrote:
> There are many basic ways to control processes, including capabilities,
> cgroups and resource limits. However, there are far fewer ways to find
> out useful values for the limits, except blind trial and error.
>
> Currently, there is no way to know which capabilities are actually used.
> Even the source code is only implicit, in-depth knowledge of each
> capability must be used when analyzing a program to judge which
> capabilities the program will exercise.
>
> Add a new cgroup controller for monitoring of capabilities
> in the cgroup.
>
> Test case demonstrating basic capability monitoring and how the
> capabilities are combined at next level (boot to rdshell):

This doesn't have anything to do with resource control and I don't
think it's a good idea to add arbitrary monitoring mechanisms to
cgroup just because it's easy to add interface there. Given that
capabilities are inherited and modified through the process hierarchy,
shouldn't this be part of that?

Thanks.

On Thu, 23 Jun 2016 18:07:10 +0300 Topi Miettinen <toiwoton@gmail.com> wrote:
> There are many basic ways to control processes, including capabilities,
> cgroups and resource limits. However, there are far fewer ways to find
> out useful values for the limits, except blind trial and error.
>
> Currently, there is no way to know which capabilities are actually used.
> Even the source code is only implicit, in-depth knowledge of each
> capability must be used when analyzing a program to judge which
> capabilities the program will exercise.
>
> Add a new cgroup controller for monitoring of capabilities
> in the cgroup.

I'm having trouble understanding how valuable this feature is to our
users, and that's a rather important thing!

Perhaps it would help if you were to explain your motivation:
particular use cases which benefited from this, for example.

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 06/23/16 21:38, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 23, 2016 at 06:07:10PM +0300, Topi Miettinen wrote:
>> There are many basic ways to control processes, including capabilities,
>> cgroups and resource limits. However, there are far fewer ways to find
>> out useful values for the limits, except blind trial and error.
>>
>> Currently, there is no way to know which capabilities are actually used.
>> Even the source code is only implicit, in-depth knowledge of each
>> capability must be used when analyzing a program to judge which
>> capabilities the program will exercise.
>>
>> Add a new cgroup controller for monitoring of capabilities
>> in the cgroup.
>>
>> Test case demonstrating basic capability monitoring and how the
>> capabilities are combined at next level (boot to rdshell):
>
> This doesn't have anything to do with resource control and I don't
> think it's a good idea to add arbitrary monitoring mechanisms to
> cgroup just because it's easy to add interface there. Given that
> capabilities are inherited and modified through the process hierarchy,
> shouldn't this be part of that?

With per process tracking, it's easy to miss if a short-lived process
exercised capabilities. Especially with ambient capabilities, the parent
process could be a shell script which might not use capabilities at all,
but its children do the heavy lifting.

Per process tracking (like in the version I sent earlier) could still be
added on top of this to complement cgroup level tracking, but I think
cgroup approach is more flexible as it can cover anything from a single
task to a collection of processes.

-Topi

>
> Thanks.
>

On 06/23/16 23:46, Andrew Morton wrote:
> On Thu, 23 Jun 2016 18:07:10 +0300 Topi Miettinen <toiwoton@gmail.com> wrote:
>
>> There are many basic ways to control processes, including capabilities,
>> cgroups and resource limits. However, there are far fewer ways to find
>> out useful values for the limits, except blind trial and error.
>>
>> Currently, there is no way to know which capabilities are actually used.
>> Even the source code is only implicit, in-depth knowledge of each
>> capability must be used when analyzing a program to judge which
>> capabilities the program will exercise.
>>
>> Add a new cgroup controller for monitoring of capabilities
>> in the cgroup.
>
> I'm having trouble understanding how valuable this feature is to our
> users, and that's a rather important thing!
>
> Perhaps it would help if you were to explain your motivation:
> particular use cases which benefited from this, for example.
>

It's easy to control, with for example systemd or many other tools, which
capabilities a service should have at the start. But how should a system
administrator, application developer or distro maintainer ever determine
a suitable value for this? Currently the only way seems to be to become
an expert on capabilities, make an educated guess how the set of
programs in question happen to work in this context and especially how
they could exercise the capabilities in all possible use cases. Even
then, the outcome is to just try something to see if that happens to
work. Reading the source code (if available) does not help very much,
because the use of capabilities is anything but explicit there.

This is way too difficult, there must be some easier way. The
information which capabilities actually were used in a trial run gives a
much better starting point. The users can just use the list of used
capabilities when configuring the service or when developing or
maintaining the application. Of course, even that could still fail
eventually, but then you simply copy the new value of used capabilities
to the configuration, whereas currently you have to reconsider your
understanding of the capabilities and the programs in light of the
failure, which by itself might give no new useful information.

One way to solve this for good would be to make the use of capabilities
explicit in the ABI. For example, there could be a system call
dac_override() which would be the only possible way ever to use the
capability CAP_DAC_OVERRIDE and so forth. Then reading source code,
tracing and many other approaches would be useful. But the OS with that
kind of ABI (not Linux) would not be Unix-like at all for any
(potentially) capability using programs, like find(1) or cat(1).

-Topi

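The feedback loop described in this message (run a trial, compare the observed mask with the configured one, widen the configuration if needed) comes down to two bit operations. The Python sketch below illustrates that under stated assumptions: both masks are plain integers, the observed mask comes from an unrestricted trial run, and the helper names `missing_caps` and `widen_config` are hypothetical. It is not part of the patch or of any tool.

```python
def missing_caps(configured_mask, used_mask):
    """Bits that the trial run exercised but the configuration does not grant."""
    return used_mask & ~configured_mask


def widen_config(configured_mask, used_mask):
    """New configuration value after copying in the observed usage."""
    return configured_mask | used_mask


# Example values: the service was configured with bit 27 (CAP_MKNOD) only,
# but a later trial run also exercised bit 0 (CAP_CHOWN).
configured = 1 << 27
observed = (1 << 27) | (1 << 0)

print(f"missing:    {missing_caps(configured, observed):016x}")
print(f"new config: {widen_config(configured, observed):016x}")
```

A non-zero result from `missing_caps` marks exactly the bits that would have to be added, which is the "copy the new value" step described above.
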
On Thu, Jun 23, 2016 at 6:14 PM, Topi Miettinen <toiwoton@gmail.com> wrote:
> On 06/23/16 23:46, Andrew Morton wrote:
>> On Thu, 23 Jun 2016 18:07:10 +0300 Topi Miettinen <toiwoton@gmail.com> wrote:
>>
>>> There are many basic ways to control processes, including capabilities,
>>> cgroups and resource limits. However, there are far fewer ways to find
>>> out useful values for the limits, except blind trial and error.
>>>
>>> Currently, there is no way to know which capabilities are actually used.
>>> Even the source code is only implicit, in-depth knowledge of each
>>> capability must be used when analyzing a program to judge which
>>> capabilities the program will exercise.
>>>
>>> Add a new cgroup controller for monitoring of capabilities
>>> in the cgroup.
>>
>> I'm having trouble understanding how valuable this feature is to our
>> users, and that's a rather important thing!
>>
>> Perhaps it would help if you were to explain your motivation:
>> particular use cases which benefited from this, for example.
>>
>
> It's easy to control, with for example systemd or many other tools, which
> capabilities a service should have at the start. But how should a system
> administrator, application developer or distro maintainer ever determine
> a suitable value for this? Currently the only way seems to be to become
> an expert on capabilities, make an educated guess how the set of
> programs in question happen to work in this context and especially how
> they could exercise the capabilities in all possible use cases. Even
> then, the outcome is to just try something to see if that happens to
> work. Reading the source code (if available) does not help very much,
> because the use of capabilities is anything but explicit there.
>
> This is way too difficult, there must be some easier way. The
> information which capabilities actually were used in a trial run gives a
> much better starting point. The users can just use the list of used
> capabilities when configuring the service or when developing or
> maintaining the application. Of course, even that could still fail
> eventually, but then you simply copy the new value of used capabilities
> to the configuration, whereas currently you have to reconsider your
> understanding of the capabilities and the programs in light of the
> failure, which by itself might give no new useful information.
>
> One way to solve this for good would be to make the use of capabilities
> explicit in the ABI. For example, there could be a system call
> dac_override() which would be the only possible way ever to use the
> capability CAP_DAC_OVERRIDE and so forth. Then reading source code,
> tracing and many other approaches would be useful. But the OS with that
> kind of ABI (not Linux) would not be Unix-like at all for any
> (potentially) capability using programs, like find(1) or cat(1).

The problem is that most of the capabilities are so powerful on their
own that limiting services to just a few may be all but useless.

--Andy

Hello,

On Fri, Jun 24, 2016 at 12:22:54AM +0000, Topi Miettinen wrote:
> > This doesn't have anything to do with resource control and I don't
> > think it's a good idea to add arbitrary monitoring mechanisms to
> > cgroup just because it's easy to add interface there. Given that
> > capabilities are inherited and modified through the process hierarchy,
> > shouldn't this be part of that?
>
> With per process tracking, it's easy to miss if a short-lived process
> exercised capabilities. Especially with ambient capabilities, the parent
> process could be a shell script which might not use capabilities at all,
> but its children do the heavy lifting.

But isn't being recursive orthogonal to using cgroup? Why not account
usages recursively along the process hierarchy? Capabilities don't
have much to do with cgroup but everything with process hierarchy.
That's how they're distributed and modified. If monitoring their
usages is necessary, it makes sense to do it in the same structure.

Thanks.

Quoting Tejun Heo (tj@kernel.org):
> Hello,
>
> On Fri, Jun 24, 2016 at 12:22:54AM +0000, Topi Miettinen wrote:
> > > This doesn't have anything to do with resource control and I don't
> > > think it's a good idea to add arbitrary monitoring mechanisms to
> > > cgroup just because it's easy to add interface there. Given that
> > > capabilities are inherited and modified through the process hierarchy,
> > > shouldn't this be part of that?
> >
> > With per process tracking, it's easy to miss if a short-lived process
> > exercised capabilities. Especially with ambient capabilities, the parent
> > process could be a shell script which might not use capabilities at all,
> > but its children do the heavy lifting.
>
> But isn't being recursive orthogonal to using cgroup? Why not account
> usages recursively along the process hierarchy? Capabilities don't
> have much to do with cgroup but everything with process hierarchy.
> That's how they're distributed and modified. If monitoring their
> usages is necessary, it makes sense to do it in the same structure.

That was my argument against using cgroups to enforce a new bounding
set. For tracking though, the cgroup process tracking seems as applicable
to this as it does to systemd tracking of services. It tracks a task and
the children it forks.

Hello,

On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
> Quoting Tejun Heo (tj@kernel.org):
> > But isn't being recursive orthogonal to using cgroup? Why not account
> > usages recursively along the process hierarchy? Capabilities don't
> > have much to do with cgroup but everything with process hierarchy.
> > That's how they're distributed and modified. If monitoring their
> > usages is necessary, it makes sense to do it in the same structure.
>
> That was my argument against using cgroups to enforce a new bounding
> set. For tracking though, the cgroup process tracking seems as applicable
> to this as it does to systemd tracking of services. It tracks a task and
> the children it forks.

Just monitoring is less jarring than implementing security enforcement
via cgroup, but it is still jarring. What's wrong with recursive
process hierarchy monitoring, which is in line with how the whole
facility is implemented anyway?

Thanks.

Quoting Tejun Heo (tj@kernel.org):
> Hello,
>
> On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
> > Quoting Tejun Heo (tj@kernel.org):
> > > But isn't being recursive orthogonal to using cgroup? Why not account
> > > usages recursively along the process hierarchy? Capabilities don't
> > > have much to do with cgroup but everything with process hierarchy.
> > > That's how they're distributed and modified. If monitoring their
> > > usages is necessary, it makes sense to do it in the same structure.
> >
> > That was my argument against using cgroups to enforce a new bounding
> > set. For tracking though, the cgroup process tracking seems as applicable
> > to this as it does to systemd tracking of services. It tracks a task and
> > the children it forks.
>
> Just monitoring is less jarring than implementing security enforcement
> via cgroup, but it is still jarring. What's wrong with recursive
> process hierarchy monitoring, which is in line with how the whole
> facility is implemented anyway?

As I think Topi pointed out, one shortcoming is that if there is a short-lived
child task, using its /proc/self/status is racy. You might just miss that it
ever even existed, let alone that the "application" needed it.

Another alternative we've both mentioned is to use systemtap. That's not
as nice a solution as a cgroup, but then again this isn't really a common
case, so maybe it is precisely what a tracing infrastructure is meant for.

-serge

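To make the race concrete, here is a small Python sketch of what polling a task's status file looks like from user space. The `CapInh`, `CapPrm`, `CapEff`, `CapBnd` and `CapAmb` fields are the ones documented for /proc/[pid]/status. They describe the capability sets a task holds, not the capabilities it has actually used. The sample text and the helper names are made up for illustration.

```python
# Illustrative excerpt of a status file; the values are invented.
SAMPLE_STATUS = """\
Name:\tcat
Pid:\t4242
CapInh:\t0000000000000000
CapPrm:\t0000000008000001
CapEff:\t0000000008000001
CapBnd:\t0000003fffffffff
CapAmb:\t0000000000000000
"""


def parse_cap_sets(status_text):
    """Extract the Cap* lines of a status file as {field: int mask}."""
    caps = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("Cap"):
            caps[key] = int(value.strip(), 16)
    return caps


def read_cap_sets(pid):
    """Read the capability sets of a live task, or None if it is already gone."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            return parse_cap_sets(status_file.read())
    except (FileNotFoundError, ProcessLookupError):
        # The task exited (or never existed) before it could be sampled;
        # this is the window a poller cannot close.
        return None


print(parse_cap_sets(SAMPLE_STATUS))
```

Whenever `read_cap_sets` returns None for a child that did run, the observer has missed it entirely, which is the shortcoming described above.
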
"Serge E. Hallyn" <serge@hallyn.com> writes:
> Quoting Tejun Heo (tj@kernel.org):
>> Hello,
>>
>> On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
>> > Quoting Tejun Heo (tj@kernel.org):
>> > > But isn't being recursive orthogonal to using cgroup? Why not account
>> > > usages recursively along the process hierarchy? Capabilities don't
>> > > have much to do with cgroup but everything with process hierarchy.
>> > > That's how they're distributed and modified. If monitoring their
>> > > usages is necessary, it makes sense to do it in the same structure.
>> >
>> > That was my argument against using cgroups to enforce a new bounding
>> > set. For tracking though, the cgroup process tracking seems as applicable
>> > to this as it does to systemd tracking of services. It tracks a task and
>> > the children it forks.
>>
>> Just monitoring is less jarring than implementing security enforcement
>> via cgroup, but it is still jarring. What's wrong with recursive
>> process hierarchy monitoring, which is in line with how the whole
>> facility is implemented anyway?
>
> As I think Topi pointed out, one shortcoming is that if there is a short-lived
> child task, using its /proc/self/status is racy. You might just miss that it
> ever even existed, let alone that the "application" needed it.
>
> Another alternative we've both mentioned is to use systemtap. That's not
> as nice a solution as a cgroup, but then again this isn't really a common
> case, so maybe it is precisely what a tracing infrastructure is meant for.

Hmm.

We have capability use wired up into auditing. So we might be able to
get away with just adding an appropriate audit message in
commoncap.c:cap_capable that honors the audit flag and logs an audit
message. The hook in selinux already appears to do that.

Certainly audit sounds like the subsystem for this kind of work, as its
whole point in life is logging things; then something in userspace can
just run over the audit logs and build a nice summary.

Eric

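The userspace summarizer suggested here could be quite small. The Python sketch below assumes the audit records have already been reduced to `(command name, capability bit number)` pairs. How such pairs would be extracted is left open on purpose, because the audit message proposed in this thread does not exist and its format is unknown. The function name `summarise` and the sample events are hypothetical.

```python
from collections import defaultdict


def summarise(events):
    """Fold (comm, capability_bit) pairs into one hex mask per command."""
    used = defaultdict(int)
    for comm, cap_bit in events:
        used[comm] |= 1 << cap_bit
    return {comm: f"{mask:016x}" for comm, mask in used.items()}


# Hypothetical events, as if extracted from an audit log by some other step.
events = [
    ("mknod", 27),  # CAP_MKNOD
    ("chown", 0),   # CAP_CHOWN
    ("mknod", 27),  # repeated use collapses into the same bit
    ("sh", 0),
    ("sh", 27),
]

for comm, mask in sorted(summarise(events).items()):
    print(comm, mask)
```

Repeated uses of one capability collapse into a single bit, so the summary stays small no matter how many records the log holds.
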
Hello, Serge.

On Fri, Jun 24, 2016 at 11:59:10AM -0500, Serge E. Hallyn wrote:
> > Just monitoring is less jarring than implementing security enforcement
> > via cgroup, but it is still jarring. What's wrong with recursive
> > process hierarchy monitoring, which is in line with how the whole
> > facility is implemented anyway?
>
> As I think Topi pointed out, one shortcoming is that if there is a short-lived
> child task, using its /proc/self/status is racy. You might just miss that it
> ever even existed, let alone that the "application" needed it.

But the parent can collect whatever its children used. We already do
that with other stats.

Thanks.

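One existing example of a parent collecting statistics from its children is resource usage accounting: once a child has been waited for, its CPU time is folded into the parent's `RUSAGE_CHILDREN` totals. The Python sketch below shows only that existing mechanism through the standard `resource` module (Unix only). A capability-usage counterpart is what is being proposed in this message and does not exist, so nothing here reads capability data.

```python
import resource
import subprocess
import sys


def children_cpu_seconds():
    """CPU time (user + system) of all children that were waited for."""
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_utime + usage.ru_stime


before = children_cpu_seconds()

# A short-lived child: it is gone by the time we look, yet its usage
# remains visible in the parent's accumulated totals.
subprocess.run([sys.executable, "-c", "sum(range(200000))"], check=True)

after = children_cpu_seconds()
print(f"CPU seconds accumulated from children: {after - before:.3f}")
```

The child no longer exists when the totals are read, so no polling race is involved. That is the property the per-process-hierarchy approach would rely on.
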
Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
>
> > Quoting Tejun Heo (tj@kernel.org):
> >> Hello,
> >>
> >> On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
> >> > Quoting Tejun Heo (tj@kernel.org):
> >> > > But isn't being recursive orthogonal to using cgroup? Why not account
> >> > > usages recursively along the process hierarchy? Capabilities don't
> >> > > have much to do with cgroup but everything with process hierarchy.
> >> > > That's how they're distributed and modified. If monitoring their
> >> > > usages is necessary, it makes sense to do it in the same structure.
> >> >
> >> > That was my argument against using cgroups to enforce a new bounding
> >> > set. For tracking though, the cgroup process tracking seems as applicable
> >> > to this as it does to systemd tracking of services. It tracks a task and
> >> > the children it forks.
> >>
> >> Just monitoring is less jarring than implementing security enforcement
> >> via cgroup, but it is still jarring. What's wrong with recursive
> >> process hierarchy monitoring, which is in line with how the whole
> >> facility is implemented anyway?
> >
> > As I think Topi pointed out, one shortcoming is that if there is a short-lived
> > child task, using its /proc/self/status is racy. You might just miss that it
> > ever even existed, let alone that the "application" needed it.
> >
> > Another alternative we've both mentioned is to use systemtap. That's not
> > as nice a solution as a cgroup, but then again this isn't really a common
> > case, so maybe it is precisely what a tracing infrastructure is meant for.
>
> Hmm.
>
> We have capability use wired up into auditing. So we might be able to
> get away with just adding an appropriate audit message in
> commoncap.c:cap_capable that honors the audit flag and logs an audit
> message. The hook in selinux already appears to do that.
>
> Certainly audit sounds like the subsystem for this kind of work, as its
> whole point in life is logging things; then something in userspace can
> just run over the audit logs and build a nice summary.

Good point, so long as we can also track ppid or fork info (using
taskstats?) that would seem the best way.

On Fri, Jun 24, 2016 at 6:15 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Jun 23, 2016 at 6:14 PM, Topi Miettinen <toiwoton@gmail.com> wrote:
>> On 06/23/16 23:46, Andrew Morton wrote:
>>> On Thu, 23 Jun 2016 18:07:10 +0300 Topi Miettinen <toiwoton@gmail.com> wrote:
>>>
>>>> There are many basic ways to control processes, including capabilities,
>>>> cgroups and resource limits. However, there are far fewer ways to find
>>>> out useful values for the limits, except blind trial and error.
>>>>
>>>> Currently, there is no way to know which capabilities are actually used.
>>>> Even the source code is only implicit, in-depth knowledge of each
>>>> capability must be used when analyzing a program to judge which
>>>> capabilities the program will exercise.
>>>>
>>>> Add a new cgroup controller for monitoring of capabilities
>>>> in the cgroup.
>>>
>>> I'm having trouble understanding how valuable this feature is to our
>>> users, and that's a rather important thing!
>>>
>>> Perhaps it would help if you were to explain your motivation:
>>> particular use cases which benefited from this, for example.
>>>
>>
>> It's easy to control, with for example systemd or many other tools, which
>> capabilities a service should have at the start. But how should a system
>> administrator, application developer or distro maintainer ever determine
>> a suitable value for this? Currently the only way seems to be to become
>> an expert on capabilities, make an educated guess how the set of
>> programs in question happen to work in this context and especially how
>> they could exercise the capabilities in all possible use cases. Even
>> then, the outcome is to just try something to see if that happens to
>> work. Reading the source code (if available) does not help very much,
>> because the use of capabilities is anything but explicit there.
>>
>> This is way too difficult, there must be some easier way. The
>> information which capabilities actually were used in a trial run gives a
>> much better starting point. The users can just use the list of used
>> capabilities when configuring the service or when developing or
>> maintaining the application. Of course, even that could still fail
>> eventually, but then you simply copy the new value of used capabilities
>> to the configuration, whereas currently you have to reconsider your
>> understanding of the capabilities and the programs in light of the
>> failure, which by itself might give no new useful information.
>>
>> One way to solve this for good would be to make the use of capabilities
>> explicit in the ABI. For example, there could be a system call
>> dac_override() which would be the only possible way ever to use the
>> capability CAP_DAC_OVERRIDE and so forth. Then reading source code,
>> tracing and many other approaches would be useful. But the OS with that
>> kind of ABI (not Linux) would not be Unix-like at all for any
>> (potentially) capability using programs, like find(1) or cat(1).
>
> The problem is that most of the capabilities are so powerful on their
> own that limiting services to just a few may be all but useless.

Maybe there is some gain _if_ the resources that a process interacts
with _can_ also be made invisible with namespaces.

> --Andy

On 06/24/16 17:21, Eric W. Biederman wrote:
> "Serge E. Hallyn" <serge@hallyn.com> writes:
>
>> Quoting Tejun Heo (tj@kernel.org):
>>> Hello,
>>>
>>> On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
>>>> Quoting Tejun Heo (tj@kernel.org):
>>>>> But isn't being recursive orthogonal to using cgroup? Why not account
>>>>> usages recursively along the process hierarchy? Capabilities don't
>>>>> have much to do with cgroup but everything with process hierarchy.
>>>>> That's how they're distributed and modified. If monitoring their
>>>>> usages is necessary, it makes sense to do it in the same structure.
>>>>
>>>> That was my argument against using cgroups to enforce a new bounding
>>>> set. For tracking though, the cgroup process tracking seems as applicable
>>>> to this as it does to systemd tracking of services. It tracks a task and
>>>> the children it forks.
>>>
>>> Just monitoring is less jarring than implementing security enforcement
>>> via cgroup, but it is still jarring. What's wrong with recursive
>>> process hierarchy monitoring, which is in line with how the whole
>>> facility is implemented anyway?
>>
>> As I think Topi pointed out, one shortcoming is that if there is a short-lived
>> child task, using its /proc/self/status is racy. You might just miss that it
>> ever even existed, let alone that the "application" needed it.
>>
>> Another alternative we've both mentioned is to use systemtap. That's not
>> as nice a solution as a cgroup, but then again this isn't really a common
>> case, so maybe it is precisely what a tracing infrastructure is meant for.
>
> Hmm.
>
> We have capability use wired up into auditing. So we might be able to
> get away with just adding an appropriate audit message in
> commoncap.c:cap_capable that honors the audit flag and logs an audit
> message. The hook in selinux already appears to do that.
>
> Certainly audit sounds like the subsystem for this kind of work, as its
> whole point in life is logging things; then something in userspace can
> just run over the audit logs and build a nice summary.

Even simpler would be to avoid the complexity of the audit subsystem and
just printk() when a task starts using a capability for the first time
(not on further uses by the same task). There are not that many
capability bits nor privileged processes, meaning not too many log
entries. I know, as this was actually my first approach. But it's also
far less user friendly than just reading a summarized value which could
be directly fed back to configuration.

The logging/auditing approach also doesn't work well for the other
things for which I'd like to present meaningful values to the user. For
example, consider RLIMIT_AS, where my goal is also to enable the users
to configure this limit for a service. Should there be an audit message
whenever the address space grows (i.e. on each mmap())? What about when
it shrinks? For RLIMIT_NOFILE we'd have to report each
open()/close()/dup()/socket()/etc. and track how many are opened at the
same time. I think it's better to store the fully cooked (meaningful to
user) value in kernel and present it only when asked.

-Topi

>
> Eric
>

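The "fully cooked value" argument for RLIMIT_NOFILE can be illustrated with a high-water mark: instead of logging every open and close, keep only the current count and the peak, and report the peak on request. The Python sketch below is a user-space illustration of that bookkeeping. It is not kernel code, and the class name `PeakCounter` and the event list are hypothetical.

```python
class PeakCounter:
    """Track a current count and the highest value it has ever reached."""

    def __init__(self):
        self.current = 0
        self.peak = 0

    def add(self, amount=1):
        self.current += amount
        if self.current > self.peak:
            self.peak = self.current

    def remove(self, amount=1):
        self.current -= amount


# Replay a made-up sequence of descriptor events.
fds = PeakCounter()
for event in ["open", "open", "close", "open", "open", "close", "close"]:
    if event == "open":
        fds.add()
    else:
        fds.remove()

# The peak is the number a user would feed back into the limit setting.
print(f"currently open: {fds.current}, peak: {fds.peak}")
```

Seven events reduce to one number, the peak, and that is the only value needed to choose a limit.
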
On 06/24/16 17:24, Tejun Heo wrote:
> Hello, Serge.
>
> On Fri, Jun 24, 2016 at 11:59:10AM -0500, Serge E. Hallyn wrote:
>>> Just monitoring is less jarring than implementing security enforcement
>>> via cgroup, but it is still jarring. What's wrong with recursive
>>> process hierarchy monitoring, which is in line with how the whole
>>> facility is implemented anyway?
>>
>> As I think Topi pointed out, one shortcoming is that if there is a short-lived
>> child task, using its /proc/self/status is racy. You might just miss that it
>> ever even existed, let alone that the "application" needed it.
>
> But the parent can collect whatever its children used. We already do
> that with other stats.

The parent might be able to do it if /proc/pid/xyz files are still
accessible after child exit but before its exit status is collected. But
if the parent doesn't do it (and you are not able to change it to do it)
and it collects the exit status without collecting other info, can you
suggest a different way how another process could collect it 100% reliably?

-Topi

>
> Thanks.
>

Hello, Topi.

On Sun, Jun 26, 2016 at 3:14 PM, Topi Miettinen <toiwoton@gmail.com> wrote:
> The parent might be able to do it if /proc/pid/xyz files are still
> accessible after child exit but before its exit status is collected. But
> if the parent doesn't do it (and you are not able to change it to do it)
> and it collects the exit status without collecting other info, can you
> suggest a different way how another process could collect it 100% reliably?

I'm not saying that there's such mechanism now. I'm suggesting that
that'd be a more fitting way of implementing a new mechanism to track
capability usages.

Thanks.

Quoting Tejun Heo (tj@kernel.org):
> Hello, Topi.
>
> On Sun, Jun 26, 2016 at 3:14 PM, Topi Miettinen <toiwoton@gmail.com> wrote:
> > The parent might be able to do it if /proc/pid/xyz files are still
> > accessible after child exit but before its exit status is collected. But
> > if the parent doesn't do it (and you are not able to change it to do it)
> > and it collects the exit status without collecting other info, can you
> > suggest a different way how another process could collect it 100% reliably?
>
> I'm not saying that there's such mechanism now. I'm suggesting that
> that'd be a more fitting way of implementing a new mechanism to track
> capability usages.

Hi Topi,

I think Eric was right a few emails earlier that the audit subsystem is
really the most appropriate answer to this. (Perhaps sysctl-controlled?)
Combined with taskstats it would give you what you need. Or you could even
use an empty new named cgroup controller, say 'none,name=caps', and then
look only at audit results for cgroup '/myapp' in the caps hierarchy.

On 06/27/16 14:54, Serge E. Hallyn wrote:
> Quoting Tejun Heo (tj@kernel.org):
>> Hello, Topi.
>>
>> On Sun, Jun 26, 2016 at 3:14 PM, Topi Miettinen <toiwoton@gmail.com> wrote:
>>> The parent might be able to do it if /proc/pid/xyz files are still
>>> accessible after child exit but before its exit status is collected. But
>>> if the parent doesn't do it (and you are not able to change it to do it)
>>> and it collects the exit status without collecting other info, can you
>>> suggest a different way how another process could collect it 100% reliably?
>>
>> I'm not saying that there's such a mechanism now. I'm suggesting that
>> that'd be a more fitting way of implementing a new mechanism to track
>> capability usage.
>
> Hi Topi,
>
> I think Eric was right a few emails earlier that the audit subsystem is
> really the most appropriate answer to this. (Perhaps sysctl-controlled?)
> Combined with taskstats it would give you what you need. Or you could even
> use an empty new named cgroup controller, say 'none,name=caps', and then
> look only at audit results for cgroup '/myapp' in the caps hierarchy.
>

I'll have to study these more. But from what I saw so far, it looks to
me that a separate tool would be needed to read taskstats and if that
tool is not picked up by distros, the users would not be any wiser, right?
With cgroup (or /proc), no new tools would be needed.

-Topi
Hello,

On Mon, Jun 27, 2016 at 3:10 PM, Topi Miettinen <toiwoton@gmail.com> wrote:
> I'll have to study these more. But from what I saw so far, it looks to
> me that a separate tool would be needed to read taskstats and if that
> tool is not picked up by distros, the users would not be any wiser, right?
> With cgroup (or /proc), no new tools would be needed.

That is a factor but shouldn't be a deciding factor in designing our
user-facing interfaces. Please also note that the kernel source tree
already has a tools/ subdirectory which contains userland tools that
are distributed along with the kernel.

Thanks.
Quoting Tejun Heo (tj@kernel.org):
> Hello,
>
> On Mon, Jun 27, 2016 at 3:10 PM, Topi Miettinen <toiwoton@gmail.com> wrote:
> > I'll have to study these more. But from what I saw so far, it looks to
> > me that a separate tool would be needed to read taskstats and if that
> > tool is not picked up by distros, the users would not be any wiser, right?
> > With cgroup (or /proc), no new tools would be needed.
>
> That is a factor but shouldn't be a deciding factor in designing our
> user-facing interfaces. Please also note that the kernel source tree
> already has a tools/ subdirectory which contains userland tools that
> are distributed along with the kernel.

And if you take the audit+cgroup approach, then no tools are needed, so
long as you can have audit print out the cgroups for a task as part of
the capability audit record.
Topi Miettinen <toiwoton@gmail.com> writes:

> On 06/24/16 17:21, Eric W. Biederman wrote:
>> "Serge E. Hallyn" <serge@hallyn.com> writes:
>>
>>> Quoting Tejun Heo (tj@kernel.org):
>>>> Hello,
>>>>
>>>> On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
>>>>> Quoting Tejun Heo (tj@kernel.org):
>>>>>> But isn't being recursive orthogonal to using cgroup? Why not account
>>>>>> usages recursively along the process hierarchy? Capabilities don't
>>>>>> have much to do with cgroup but everything with process hierarchy.
>>>>>> That's how they're distributed and modified. If monitoring their
>>>>>> usages is necessary, it makes sense to do it in the same structure.
>>>>>
>>>>> That was my argument against using cgroups to enforce a new bounding
>>>>> set. For tracking though, the cgroup process tracking seems as applicable
>>>>> to this as it does to systemd tracking of services. It tracks a task and
>>>>> the children it forks.
>>>>
>>>> Just monitoring is less jarring than implementing security enforcement
>>>> via cgroup, but it is still jarring. What's wrong with recursive
>>>> process hierarchy monitoring which is in line with how the whole
>>>> facility is implemented anyway?
>>>
>>> As I think Topi pointed out, one shortcoming is that if there is a short-lived
>>> child task, using its /proc/self/status is racy. You might just miss that it
>>> ever even existed, let alone that the "application" needed it.
>>>
>>> Another alternative we've both mentioned is to use systemtap. That's not
>>> as nice a solution as a cgroup, but then again this isn't really a common
>>> case, so maybe it is precisely what a tracing infrastructure is meant for.
>>
>> Hmm.
>>
>> We have capability use wired up into auditing. So we might be able to
>> get away with just adding an appropriate audit message in
>> commoncap.c:cap_capable that honors the audit flag and logs an audit
>> message. The hook in selinux already appears to do that.
>>
>> Certainly audit sounds like the subsystem for this kind of work, as its
>> whole point in life is logging things, then something in userspace can
>> just run over the audit logs and build a nice summary.
>
> Even simpler would be to avoid the complexity of audit subsystem and
> just printk() when a task starts using a capability first time (not on
> further uses by same task). There are not that many capability bits nor
> privileged processes, meaning not too many log entries. I know as this
> was actually my first approach. But it's also far less user friendly
> than just reading a summarized value which could be directly fed back to
> configuration.

Your loss.

> Logging/auditing approach also doesn't work well for other things I'd
> like to present meaningful values for the user. For example, consider
> RLIMIT_AS, where my goal is also to enable the users to be able to
> configure this limit for a service. Should there be an audit message
> whenever the address space limit grows (i.e. each mmap())? What about
> when it shrinks? For RLIMIT_NOFILE we'd have to report each
> open()/close()/dup()/socket()/etc. and track how many are opened at the
> same time. I think it's better to store the fully cooked (meaningful to
> user) value in kernel and present it only when asked.

That doesn't have anything to do with anything.

My suggestion was very much to do with capabilities which are already
logged with the audit subsystem with selinux. The idea was to move
those audit calls into commoncap where they arguably belong and allow
anyone to use them for anything.

That is a non-controversial code cleanup that happens to cover your
special case. That is enough to build a tool in userspace that will
tell you which capabilities you need without penalizing the kernel, or
the vast majority of everyone who does not use your feature.

From what I have seen of this conversation there is not and will not be
one interface to rule them all.

Eric
On 06/28/16 04:57, Eric W. Biederman wrote:
> Topi Miettinen <toiwoton@gmail.com> writes:
>
>> On 06/24/16 17:21, Eric W. Biederman wrote:
>>> "Serge E. Hallyn" <serge@hallyn.com> writes:
>>>
>>>> Quoting Tejun Heo (tj@kernel.org):
>>>>> Hello,
>>>>>
>>>>> On Fri, Jun 24, 2016 at 10:59:16AM -0500, Serge E. Hallyn wrote:
>>>>>> Quoting Tejun Heo (tj@kernel.org):
>>>>>>> But isn't being recursive orthogonal to using cgroup? Why not account
>>>>>>> usages recursively along the process hierarchy? Capabilities don't
>>>>>>> have much to do with cgroup but everything with process hierarchy.
>>>>>>> That's how they're distributed and modified. If monitoring their
>>>>>>> usages is necessary, it makes sense to do it in the same structure.
>>>>>>
>>>>>> That was my argument against using cgroups to enforce a new bounding
>>>>>> set. For tracking though, the cgroup process tracking seems as applicable
>>>>>> to this as it does to systemd tracking of services. It tracks a task and
>>>>>> the children it forks.
>>>>>
>>>>> Just monitoring is less jarring than implementing security enforcement
>>>>> via cgroup, but it is still jarring. What's wrong with recursive
>>>>> process hierarchy monitoring which is in line with how the whole
>>>>> facility is implemented anyway?
>>>>
>>>> As I think Topi pointed out, one shortcoming is that if there is a short-lived
>>>> child task, using its /proc/self/status is racy. You might just miss that it
>>>> ever even existed, let alone that the "application" needed it.
>>>>
>>>> Another alternative we've both mentioned is to use systemtap. That's not
>>>> as nice a solution as a cgroup, but then again this isn't really a common
>>>> case, so maybe it is precisely what a tracing infrastructure is meant for.
>>>
>>> Hmm.
>>>
>>> We have capability use wired up into auditing. So we might be able to
>>> get away with just adding an appropriate audit message in
>>> commoncap.c:cap_capable that honors the audit flag and logs an audit
>>> message. The hook in selinux already appears to do that.
>>>
>>> Certainly audit sounds like the subsystem for this kind of work, as its
>>> whole point in life is logging things, then something in userspace can
>>> just run over the audit logs and build a nice summary.
>>
>> Even simpler would be to avoid the complexity of audit subsystem and
>> just printk() when a task starts using a capability first time (not on
>> further uses by same task). There are not that many capability bits nor
>> privileged processes, meaning not too many log entries. I know as this
>> was actually my first approach. But it's also far less user friendly
>> than just reading a summarized value which could be directly fed back to
>> configuration.
>
> Your loss.
>
>> Logging/auditing approach also doesn't work well for other things I'd
>> like to present meaningful values for the user. For example, consider
>> RLIMIT_AS, where my goal is also to enable the users to be able to
>> configure this limit for a service. Should there be an audit message
>> whenever the address space limit grows (i.e. each mmap())? What about
>> when it shrinks? For RLIMIT_NOFILE we'd have to report each
>> open()/close()/dup()/socket()/etc. and track how many are opened at the
>> same time. I think it's better to store the fully cooked (meaningful to
>> user) value in kernel and present it only when asked.
>
> That doesn't have anything to do with anything.
>
> My suggestion was very much to do with capabilities which are already
> logged with the audit subsystem with selinux. The idea was to move
> those audit calls into commoncap where they arguably belong and allow
> anyone to use them for anything.
>
> That is a non-controversial code cleanup that happens to cover your
> special case. That is enough to build a tool in userspace that will
> tell you which capabilities you need without penalizing the kernel, or
> the vast majority of everyone who does not use your feature.
>
> From what I have seen of this conversation there is not and will not be
> one interface to rule them all.

Now that I know taskstats better, it looks like a good choice for most of
the highwater marks, complemented with audit logging. The taskstats
interface is only available to privileged processes but that's OK. I'll
make new patches based on this approach.

-Topi

>
> Eric
>
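To make the "highwater mark" idea concrete (the "fully cooked" value argued for earlier in the thread, as opposed to one log record per open() or mmap()), here is a small user-space model. It is only a sketch under assumptions: struct highwater and both helpers are hypothetical names for illustration and do not correspond to any existing taskstats field or kernel API.

```c
#include <assert.h>

/* Hypothetical model of a per-task resource high-water mark. */
struct highwater {
	unsigned long cur;	/* current usage, e.g. open file descriptors */
	unsigned long peak;	/* largest value 'cur' has ever reached */
};

/* Account for 'n' newly acquired units and update the peak if needed. */
void highwater_charge(struct highwater *hw, unsigned long n)
{
	hw->cur += n;
	if (hw->cur > hw->peak)
		hw->peak = hw->cur;
}

/* Account for 'n' released units; the peak is never lowered. */
void highwater_uncharge(struct highwater *hw, unsigned long n)
{
	assert(n <= hw->cur);	/* cannot release more than is held */
	hw->cur -= n;
}
```

Only the peak needs to be reported when asked, and it is the value an administrator could feed back into a limit such as RLIMIT_NOFILE.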
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 4cc07ce..2b3d277 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1118,6 +1118,23 @@ writeback as follows.
 	total available memory and applied the same way as
 	vm.dirty[_background]_ratio.
 
+5-4. Capabilities
+
+The "capability" controller is used to monitor capability use in the
+cgroup. This can be used to discover a starting point for capability
+bounding sets, even when running a shell script under ambient
+capabilities, with only short-lived helper processes exercising the
+capabilities.
+
+
+5-4-1. Capability Interface Files
+
+  capability.used
+
+	A read-only file which exists on all cgroups.
+
+	This reports the combined value of capability use in the
+	current cgroup and all its children.
 
 6. Namespace
 
diff --git a/include/linux/capability_cgroup.h b/include/linux/capability_cgroup.h
new file mode 100644
index 0000000..c03b58d
--- /dev/null
+++ b/include/linux/capability_cgroup.h
@@ -0,0 +1,7 @@
+#ifdef CONFIG_CGROUP_CAPABILITY
+void capability_cgroup_update_used(int cap);
+#else
+static inline void capability_cgroup_update_used(int cap)
+{
+}
+#endif
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 0df0336a..a5161d0 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -56,6 +56,10 @@ SUBSYS(hugetlb)
 SUBSYS(pids)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_CAPABILITY)
+SUBSYS(capability)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index f755a60..25d17ef 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1141,6 +1141,12 @@ config CGROUP_PERF
 
 	  Say N if unsure.
 
+config CGROUP_CAPABILITY
+	bool "Capability controller"
+	help
+	  Provides a simple controller for monitoring of capabilities in the
+	  cgroup.
+
 config CGROUP_DEBUG
 	bool "Example controller"
 	default n
diff --git a/kernel/capability.c b/kernel/capability.c
index 45432b5..b57d7f9 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -17,6 +17,7 @@
 #include <linux/syscalls.h>
 #include <linux/pid_namespace.h>
 #include <linux/user_namespace.h>
+#include <linux/capability_cgroup.h>
 #include <asm/uaccess.h>
 
 /*
@@ -380,6 +381,7 @@ bool ns_capable(struct user_namespace *ns, int cap)
 	}
 
 	if (security_capable(current_cred(), ns, cap) == 0) {
+		capability_cgroup_update_used(cap);
 		current->flags |= PF_SUPERPRIV;
 		return true;
 	}
diff --git a/security/Makefile b/security/Makefile
index f2d71cd..2bb04f1 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)		+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
 obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
+obj-$(CONFIG_CGROUP_CAPABILITY)		+= capability_cgroup.o
 
 # Object integrity file lists
 subdir-$(CONFIG_INTEGRITY)		+= integrity
diff --git a/security/capability_cgroup.c b/security/capability_cgroup.c
new file mode 100644
index 0000000..f002477
--- /dev/null
+++ b/security/capability_cgroup.c
@@ -0,0 +1,99 @@
+/*
+ * Capability cgroup
+ *
+ * Copyright 2016 Topi Miettinen
+ *
+ * This file is subject to the terms and conditions of the GNU General
+ * Public License. See the file COPYING in the main directory of the
+ * Linux distribution for more details.
+ */
+
+#include <linux/capability.h>
+#include <linux/capability_cgroup.h>
+#include <linux/cgroup.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+
+static DEFINE_MUTEX(capcg_mutex);
+
+struct capcg_cgroup {
+	struct cgroup_subsys_state css;
+	kernel_cap_t cap_used; /* Capabilities actually used */
+};
+
+static inline struct capcg_cgroup *css_to_capcg(struct cgroup_subsys_state *s)
+{
+	return s ? container_of(s, struct capcg_cgroup, css) : NULL;
+}
+
+static inline struct capcg_cgroup *task_to_capcg(struct task_struct *task)
+{
+	return css_to_capcg(task_css(task, capability_cgrp_id));
+}
+
+static struct cgroup_subsys_state *capcg_css_alloc(struct cgroup_subsys_state
+						   *parent)
+{
+	struct capcg_cgroup *caps;
+
+	caps = kzalloc(sizeof(*caps), GFP_KERNEL);
+	if (!caps)
+		return ERR_PTR(-ENOMEM);
+
+	cap_clear(caps->cap_used);
+	return &caps->css;
+}
+
+static void capcg_css_free(struct cgroup_subsys_state *css)
+{
+	kfree(css_to_capcg(css));
+}
+
+static int capcg_seq_show_used(struct seq_file *m, void *v)
+{
+	struct capcg_cgroup *capcg = css_to_capcg(seq_css(m));
+	struct cgroup_subsys_state *pos;
+	u32 capi;
+	kernel_cap_t subsys_caps = capcg->cap_used;
+
+	rcu_read_lock();
+
+	css_for_each_child(pos, &capcg->css) {
+		struct capcg_cgroup *pos_capcg = css_to_capcg(pos);
+
+		subsys_caps = cap_combine(subsys_caps, pos_capcg->cap_used);
+	}
+
+	rcu_read_unlock();
+
+	CAP_FOR_EACH_U32(capi) {
+		seq_printf(m, "%08x",
+			   subsys_caps.cap[CAP_LAST_U32 - capi]);
+	}
+	seq_putc(m, '\n');
+
+	return 0;
+}
+
+static struct cftype capcg_files[] = {
+	{
+		.name = "used",
+		.seq_show = capcg_seq_show_used,
+	},
+	{ }	/* terminate */
+};
+
+struct cgroup_subsys capability_cgrp_subsys = {
+	.css_alloc = capcg_css_alloc,
+	.css_free = capcg_css_free,
+	.dfl_cftypes = capcg_files,
+};
+
+void capability_cgroup_update_used(int cap)
+{
+	struct capcg_cgroup *caps = task_to_capcg(current);
+
+	mutex_lock(&capcg_mutex);
+	cap_raise(caps->cap_used, cap);
+	mutex_unlock(&capcg_mutex);
+}
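The accounting in the patch above can be modelled in a few lines of user-space C: each cgroup keeps its own mask, a successful ns_capable() raises one bit in the current task's cgroup, and reading capability.used ORs in the masks of the child cgroups. This is a simplified sketch, not the kernel code: struct model_cgroup and both functions are hypothetical, and locking and RCU are left out. It mirrors css_for_each_child(), which visits direct children only, so in this model a grandchild's bits would not show up two levels higher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_CHILDREN 4

/* Hypothetical stand-in for struct capcg_cgroup plus its child links. */
struct model_cgroup {
	uint64_t cap_used;	/* capabilities used by tasks in this cgroup */
	struct model_cgroup *children[MODEL_MAX_CHILDREN];
	size_t nr_children;
};

/* Mirrors capability_cgroup_update_used(): raise one bit in the own mask. */
void model_update_used(struct model_cgroup *cg, int cap)
{
	assert(cap >= 0 && cap < 64);
	cg->cap_used |= UINT64_C(1) << cap;
}

/* Mirrors capcg_seq_show_used(): own mask combined with direct children. */
uint64_t model_read_used(const struct model_cgroup *cg)
{
	uint64_t combined = cg->cap_used;
	size_t i;

	for (i = 0; i < cg->nr_children; i++)
		combined |= cg->children[i]->cap_used;
	return combined;
}
```

With cgroups "first" and "second" under "test", raising bit 27 (CAP_MKNOD) in the first and bit 0 (CAP_CHOWN) in the second reproduces the combined value 0000000008000001 from the test case in the commit message.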
There are many basic ways to control processes, including capabilities,
cgroups and resource limits. However, there are far fewer ways to find
out useful values for the limits, except blind trial and error.

Currently, there is no way to know which capabilities are actually used.
Even the source code is only implicit: in-depth knowledge of each
capability must be used when analyzing a program to judge which
capabilities the program will exercise.

Add a new cgroup controller for monitoring of capabilities
in the cgroup.

Test case demonstrating basic capability monitoring and how the
capabilities are combined at next level (boot to rdshell):

(initramfs) cd /sys/fs
(initramfs) mount -t cgroup2 cgroup cgroup
(initramfs) cd cgroup
(initramfs) echo +capability > cgroup.subtree_control
(initramfs) mkdir test; cd test
(initramfs) echo +capability > cgroup.subtree_control
(initramfs) ls
capability.used cgroup.events cgroup.subtree_control
cgroup.controllers cgroup.procs
(initramfs) mkdir first second
(initramfs) sh

BusyBox v1.22.1 (Debian 1:1.22.0-19) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs) cd first
(initramfs) echo $$ >cgroup.procs
(initramfs) cat capability.used
0000000000000000 # nothing so far
(initramfs) mknod /dev/z_$$ c 1 2
(initramfs) cat capability.used
0000000008000000 # CAP_MKNOD
(initramfs) cat ../capability.used
0000000008000000 # also seen at next higher level
(initramfs) exit
(initramfs) sh

BusyBox v1.22.1 (Debian 1:1.22.0-19) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs) cd second
(initramfs) echo $$ >cgroup.procs
(initramfs) cat capability.used
0000000000000000 # nothing so far
(initramfs) chown 1234 /dev/z_*
(initramfs) cat capability.used
0000000000000001 # CAP_CHOWN
(initramfs) cat ../capability.used
0000000008000001 # combined at next higher level
(initramfs) exit

Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
---
 Documentation/cgroup-v2.txt       | 17 +++++++
 include/linux/capability_cgroup.h |  7 +++
 include/linux/cgroup_subsys.h     |  4 ++
 init/Kconfig                      |  6 +++
 kernel/capability.c               |  2 +
 security/Makefile                 |  1 +
 security/capability_cgroup.c      | 99 +++++++++++++++++++++++++++++++++++++++
 7 files changed, 136 insertions(+)
 create mode 100644 include/linux/capability_cgroup.h
 create mode 100644 security/capability_cgroup.c
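The hex values in the test case are plain bit masks indexed by capability number (CAP_CHOWN is 0, CAP_SYS_CHROOT is 18 and CAP_MKNOD is 27 in <linux/capability.h>), so they can be decoded with a few lines of user-space C. The sketch below is illustrative only: the helper names are made up for this example and the name table is deliberately incomplete.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* A few capability numbers from <linux/capability.h>; not a full table. */
struct cap_name {
	int bit;
	const char *name;
};

static const struct cap_name cap_names[] = {
	{ 0, "CAP_CHOWN" },
	{ 18, "CAP_SYS_CHROOT" },
	{ 27, "CAP_MKNOD" },
};

/* Parse the hex text read from capability.used; 0 on success, -1 on error. */
int parse_cap_mask(const char *text, uint64_t *mask)
{
	char *end;
	unsigned long long v = strtoull(text, &end, 16);

	if (end == text)
		return -1;	/* no hex digits found */
	*mask = v;
	return 0;
}

/* Return 1 if capability number 'cap' is set in 'mask', else 0. */
int cap_is_used(uint64_t mask, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (int)((mask >> cap) & 1);
}

/* Print the known names of all set bits; returns how many were printed. */
int print_used_caps(uint64_t mask, FILE *out)
{
	size_t i;
	int n = 0;

	for (i = 0; i < sizeof(cap_names) / sizeof(cap_names[0]); i++) {
		if (cap_is_used(mask, cap_names[i].bit)) {
			fprintf(out, "%s\n", cap_names[i].name);
			n++;
		}
	}
	return n;
}
```

Feeding it the combined value 0000000008000001 from the parent cgroup reports CAP_CHOWN and CAP_MKNOD, which matches the chown and mknod calls made in the two child cgroups.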