Message ID | 20220801180146.1157914-1-fred@cloudflare.com (mailing list archive) |
---|---|
Headers | show |
Series | Introduce security_create_user_ns() | expand |
Frederick Lawler <fred@cloudflare.com> writes: > While creating a LSM BPF MAC policy to block user namespace creation, we > used the LSM cred_prepare hook because that is the closest hook to prevent > a call to create_user_ns(). Re-nack for all of the same reasons. AKA This can only break the users of the user namespace. Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> You aren't fixing what your problem you are papering over it by denying access to the user namespace. Nack Nack Nack. Stop. Go back to the drawing board. Do not pass go. Do not collect $200. > The calls look something like this: > > cred = prepare_creds() > security_prepare_creds() > call_int_hook(cred_prepare, ... > if (cred) > create_user_ns(cred) > > We noticed that error codes were not propagated from this hook and > introduced a patch [1] to propagate those errors. > > The discussion notes that security_prepare_creds() > is not appropriate for MAC policies, and instead the hook is > meant for LSM authors to prepare credentials for mutation. [2] > > Ultimately, we concluded that a better course of action is to introduce > a new security hook for LSM authors. [3] > > This patch set first introduces a new security_create_user_ns() function > and userns_create LSM hook, then marks the hook as sleepable in BPF. > > Links: > 1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/ > 2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/ > 3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare.com/ > > Past discussions: > V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/ > V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/ > V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/ > > Changes since v3: > - Explicitly set CAP_SYS_ADMIN to test namespace is created given > permission > - Simplify BPF test to use sleepable hook only > - Prefer unshare() over clone() for tests > Changes since v2: > - Rename create_user_ns hook to userns_create > - Use user_namespace as an object opposed to a generic namespace object > - s/domB_t/domA_t in commit message > Changes since v1: > - Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch > - Add selinux: Implement create_user_ns hook patch > - Change function signature of security_create_user_ns() to only take > struct cred > - Move security_create_user_ns() call after id mapping check in > create_user_ns() > - Update documentation to reflect changes > > Frederick Lawler (4): > security, lsm: Introduce security_create_user_ns() > bpf-lsm: Make bpf_lsm_userns_create() sleepable > selftests/bpf: Add tests verifying bpf lsm userns_create hook > selinux: Implement userns_create hook > > include/linux/lsm_hook_defs.h | 1 + > include/linux/lsm_hooks.h | 4 + > include/linux/security.h | 6 ++ > kernel/bpf/bpf_lsm.c | 1 + > kernel/user_namespace.c | 5 + > security/security.c | 5 + > security/selinux/hooks.c | 9 ++ > security/selinux/include/classmap.h | 2 + > .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++ > .../selftests/bpf/progs/test_deny_namespace.c | 33 ++++++ > 10 files changed, 168 insertions(+) > create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c > create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c Eric
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > Frederick Lawler <fred@cloudflare.com> writes: > > > While creating a LSM BPF MAC policy to block user namespace creation, we > > used the LSM cred_prepare hook because that is the closest hook to prevent > > a call to create_user_ns(). > > Re-nack for all of the same reasons. > AKA This can only break the users of the user namespace. > > Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > You aren't fixing what your problem you are papering over it by denying > access to the user namespace. > > Nack Nack Nack. > > Stop. > > Go back to the drawing board. > > Do not pass go. > > Do not collect $200. If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that. We've heard from different users now that there are very real use cases for this LSM hook. I understand you are concerned about adding additional controls to user namespaces, but these are controls requested by real users, and the controls being requested (LSM hooks, with BPF and SELinux implementations) are configurable by the *users* at *runtime*. This patchset does not force additional restrictions on user namespaces, it provides a mechanism that *users* can leverage to add additional granularity to the access controls surrounding user namespaces. Eric, if you have a different approach in mind to adding a LSM hook to user namespace creation I think we would all very much like to hear about it. However, if you do not have any suggestions along those lines, and simply want to NACK any effort to add a LSM hook to user namespace creation, I think we all understand your point of view and respectfully disagree. Barring any new approaches or suggestions, I think Frederick's patches look reasonable and I still plan on merging them into the LSM next branch when the merge window closes.
Paul Moore <paul@paul-moore.com> writes: > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> Frederick Lawler <fred@cloudflare.com> writes: >> >> > While creating a LSM BPF MAC policy to block user namespace creation, we >> > used the LSM cred_prepare hook because that is the closest hook to prevent >> > a call to create_user_ns(). >> >> Re-nack for all of the same reasons. >> AKA This can only break the users of the user namespace. >> >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> >> >> You aren't fixing what your problem you are papering over it by denying >> access to the user namespace. >> >> Nack Nack Nack. >> >> Stop. >> >> Go back to the drawing board. >> >> Do not pass go. >> >> Do not collect $200. > > If you want us to take your comments seriously Eric, you need to > provide the list with some constructive feedback that would allow > Frederick to move forward with a solution to the use case that has > been proposed. You response above may be many things, but it is > certainly not that. I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel. It is not a constructive approach to shoot the messenger and is not a constructive approach to blow me off every time you reply. I have proposed that is there is a subsystem that is unduly buggy we stop it from being enabled with a user-namespaces. Further this is a hook really should have extra-ordinary requirements, as all it can do is add additional failure modes to something that does not really fail. AKA all it can do is break-userspace. As such I need to see a justification on why it makes sense to break-userspace. > We've heard from different users now that there are very real use > cases for this LSM hook. I understand you are concerned about adding > additional controls to user namespaces, but these are controls > requested by real users, and the controls being requested (LSM hooks, > with BPF and SELinux implementations) are configurable by the *users* > at *runtime*. This patchset does not force additional restrictions on > user namespaces, it provides a mechanism that *users* can leverage to > add additional granularity to the access controls surrounding user > namespaces. But that is not the problem that cloudfare encountered and are trying to solve. At least that is not what I was told when I asked early in the review cycle. All saying that is user-configurable does is shift the blame from the kernel maintainers to the users. Shift the responsibility from people who should have enough expertise to know what is going on to people who are by definition have other concerns, so are less likely to be as well informed, and less likely to come up with good solutions. > Eric, if you have a different approach in mind to adding a LSM hook to > user namespace creation I think we would all very much like to hear > about it. However, if you do not have any suggestions along those > lines, and simply want to NACK any effort to add a LSM hook to user > namespace creation, I think we all understand your point of view and > respectfully disagree. Barring any new approaches or suggestions, I > think Frederick's patches look reasonable and I still plan on merging > them into the LSM next branch when the merge window closes. But it is my code you are planning to merge this into, and your are asking me to support something. I admit I have not had time to read everything. I am sick and tired and quite frankly very tired that people are busy wanting to shoot the messenger to the fact that there are bugs in the kernel. I am speaking up and engaging as best as I can with objections that are not hot-air. You are very much proposing to merge code that can only cause regressions and cause me grief. At least that is all I see. I don't see anything in the change descriptions of the change that refutes that. I don't see any interaction in fact with my concerns. In fact your last reply was to completely blow off my request on how to address the concerns that inspired this patch and to say other people have a use too. At this point I am happy to turn your request around and ask that you address my concerns and not blow them off. As I have seen no constructive engagement with my concerns. I think that is reasonable as by definition I will get the support issues when some LSM has some ill-thought out idea of how things should work and I get the bug report. Eric
On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > Paul Moore <paul@paul-moore.com> writes: > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > >> Frederick Lawler <fred@cloudflare.com> writes: > >> > >> > While creating a LSM BPF MAC policy to block user namespace creation, we > >> > used the LSM cred_prepare hook because that is the closest hook to prevent > >> > a call to create_user_ns(). > >> > >> Re-nack for all of the same reasons. > >> AKA This can only break the users of the user namespace. > >> > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> > >> > >> You aren't fixing what your problem you are papering over it by denying > >> access to the user namespace. > >> > >> Nack Nack Nack. > >> > >> Stop. > >> > >> Go back to the drawing board. > >> > >> Do not pass go. > >> > >> Do not collect $200. > > > > If you want us to take your comments seriously Eric, you need to > > provide the list with some constructive feedback that would allow > > Frederick to move forward with a solution to the use case that has > > been proposed. You response above may be many things, but it is > > certainly not that. > > I did provide constructive feedback. My feedback to his problem > was to address the real problem of bugs in the kernel. We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Paul Moore <paul@paul-moore.com> writes: >> I did provide constructive feedback. My feedback to his problem >> was to address the real problem of bugs in the kernel. > > We've heard from several people who have use cases which require > adding LSM-level access controls and observability to user namespace > creation. This is the problem we are trying to solve here; if you do > not like the approach proposed in this patchset please suggest another > implementation that allows LSMs visibility into user namespace > creation. Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users. You just artificially constrained the problems, so that no other solution is acceptable. On that basis alone I am object to this whole approach to steam roll over me and my code. Eric
"Eric W. Biederman" <ebiederm@xmission.com> writes: > Paul Moore <paul@paul-moore.com> writes: > >>> I did provide constructive feedback. My feedback to his problem >>> was to address the real problem of bugs in the kernel. >> >> We've heard from several people who have use cases which require >> adding LSM-level access controls and observability to user namespace >> creation. This is the problem we are trying to solve here; if you do >> not like the approach proposed in this patchset please suggest another >> implementation that allows LSMs visibility into user namespace >> creation. > > Please stop, ignoring my feedback, not detailing what problem or > problems you are actually trying to be solved, and threatening to merge > code into files that I maintain that has the express purpose of breaking > my users. > > You just artificially constrained the problems, so that no other > solution is acceptable. On that basis alone I am object to this whole > approach to steam roll over me and my code. If you want an example of what kind of harm it can cause to introduce a failure where no failure was before I invite you to look at what happened with sendmail when setuid was modified to fail, when changing the user of a process would cause RLIMIT_NPROC to be exceeded. I am not arguing that what you are proposing is that bad but unexpected failures cause real problems, and at a minimum that needs a better response than: "There is at least one user that wants a failure here". Frankly I would love to see an argument that semantically it ever makes sense for creating a user namespace to fail. If that argument has already been made, my apologies to the person who made as I missed it, in being sick and tired, and frustrated at being blown off, when I asked for a proper discuss of the problem at hand. Eric
On Mon, Aug 8, 2022 at 3:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > Paul Moore <paul@paul-moore.com> writes: > >> I did provide constructive feedback. My feedback to his problem > >> was to address the real problem of bugs in the kernel. > > > > We've heard from several people who have use cases which require > > adding LSM-level access controls and observability to user namespace > > creation. This is the problem we are trying to solve here; if you do > > not like the approach proposed in this patchset please suggest another > > implementation that allows LSMs visibility into user namespace > > creation. > > Please stop, ignoring my feedback, not detailing what problem or > problems you are actually trying to be solved, and threatening to merge > code into files that I maintain that has the express purpose of breaking > my users. I've heard you talk about bugs being the only reason why people would want to ever block user namespaces, but I think we've all seen use cases now where it goes beyond that. However, even if it didn't, the need to build high confidence/assurance systems where big chunks of functionality can be disabled based on a security policy is a very real use case, and this patchset would help enable that. I've noticed you like to talk about these hooks being a source of "regressions", but access controls are not regressions Eric, they are tools that system builders, administrators, and users use to secure their systems. From my perspective, I believe that addresses your feedback around "fix the bugs" and "this is a regression", which is the only thing I've noted from your responses in this thread and others, but if I'm missing something more technical please let me/us know. > You just artificially constrained the problems, so that no other > solution is acceptable. There is a real need to be able to gain both additional visibility and access control over user namespace creation, please suggest the approach(es) you would find acceptable. > On that basis alone I am object to this whole > approach to steam roll over me and my code. I saw that choice of wording in your last email and thought it a bit curious, so I did a quick git log dump on kernel/user_namespace.c and I see approximately 31 contributors to that one file. I've always thought of the open source maintainer role as more of a "steward" and less of an "owner", but that's just my opinion.
On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > "Eric W. Biederman" <ebiederm@xmission.com> writes: > > Paul Moore <paul@paul-moore.com> writes: > > > >>> I did provide constructive feedback. My feedback to his problem > >>> was to address the real problem of bugs in the kernel. > >> > >> We've heard from several people who have use cases which require > >> adding LSM-level access controls and observability to user namespace > >> creation. This is the problem we are trying to solve here; if you do > >> not like the approach proposed in this patchset please suggest another > >> implementation that allows LSMs visibility into user namespace > >> creation. > > > > Please stop, ignoring my feedback, not detailing what problem or > > problems you are actually trying to be solved, and threatening to merge > > code into files that I maintain that has the express purpose of breaking > > my users. > > > > You just artificially constrained the problems, so that no other > > solution is acceptable. On that basis alone I am object to this whole > > approach to steam roll over me and my code. > > If you want an example of what kind of harm it can cause to introduce a > failure where no failure was before I invite you to look at what > happened with sendmail when setuid was modified to fail, when changing > the user of a process would cause RLIMIT_NPROC to be exceeded. I think we are all familiar with the sendmail capabilities bug and the others like it, but using that as an excuse to block additional access controls seems very weak. The Linux Kernel is very different from when the sendmail bug hit (what was that, ~20 years ago?), with advancements in capabilities and other discretionary controls, as well as mandatory access controls which have enabled Linux to be certified through a number of third party security evaluations. > I am not arguing that what you are proposing is that bad but unexpected > failures cause real problems, and at a minimum that needs a better > response than: "There is at least one user that wants a failure here". Let me fix that for you: "There are multiple users who want to have better visibility and access control for user namespace creation." -- paul-moore.com
Paul Moore <paul@paul-moore.com> writes: > On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> "Eric W. Biederman" <ebiederm@xmission.com> writes: >> > Paul Moore <paul@paul-moore.com> writes: >> > >> >>> I did provide constructive feedback. My feedback to his problem >> >>> was to address the real problem of bugs in the kernel. >> >> >> >> We've heard from several people who have use cases which require >> >> adding LSM-level access controls and observability to user namespace >> >> creation. This is the problem we are trying to solve here; if you do >> >> not like the approach proposed in this patchset please suggest another >> >> implementation that allows LSMs visibility into user namespace >> >> creation. >> > >> > Please stop, ignoring my feedback, not detailing what problem or >> > problems you are actually trying to be solved, and threatening to merge >> > code into files that I maintain that has the express purpose of breaking >> > my users. >> > >> > You just artificially constrained the problems, so that no other >> > solution is acceptable. On that basis alone I am object to this whole >> > approach to steam roll over me and my code. >> >> If you want an example of what kind of harm it can cause to introduce a >> failure where no failure was before I invite you to look at what >> happened with sendmail when setuid was modified to fail, when changing >> the user of a process would cause RLIMIT_NPROC to be exceeded. > > I think we are all familiar with the sendmail capabilities bug and the > others like it, but using that as an excuse to block additional access > controls seems very weak. The Linux Kernel is very different from > when the sendmail bug hit (what was that, ~20 years ago?), with > advancements in capabilities and other discretionary controls, as well > as mandatory access controls which have enabled Linux to be certified > through a number of third party security evaluations. If you are familiar with scenarios like that then why is there not being due diligence performed to ensure that userspace won't break? Certainly none of the paperwork you are talking about does that kind of checking and it most definitely is not happening before the code gets merged. I am saying that performing that due diligence should be a requirement before anyone even thinks about merging a patch that adds permission checks where no existed before. Sometimes changes to fix security bugs can get away with adding new restrictions because we know with a very very high degree of probability that the only thing that will break will be exploit code. In the rare case when real world applications are broken such changes need to be reverted or adapted. No one has even made the argument that only exploit code will be affected. So I am sorry I am the one who has to be the one to get in the way of a broken process with semantic review, but due diligence has not been done. So I am say no way this code should be merged. In addition to actually breaking existing userspace, I think there is a very real danger of breaking userspace, I think there is a very real danger of breaking network effects by making such a large change to the design of user namespaces. >> I am not arguing that what you are proposing is that bad but unexpected >> failures cause real problems, and at a minimum that needs a better >> response than: "There is at least one user that wants a failure here". > > Let me fix that for you: "There are multiple users who want to have > better visibility and access control for user namespace creation." Visibility sure. Design a proper hook for that. All the proposed hook can do is print an audit message. It can't allocate or manage any state as there is not the corresponding hook when a user namespace is freed. So the proposed hook is not appropriate for increasing visibility. Access control. Not a chance unless it is carefully designed and reviewed. There is a very large cost to adding access control where it has not previously existed. I talk about that cost as people breaking my users as that is how I see it. I don't see any discussion on why I am wrong. If we are going to add an access controls I want to see someone point out something that is actually semantically a problem. What motivates an access control? So far the only answer I have received is people want to reduce the attack surface of the kernel. I don't possibly see how reducing the attack surface by removing user namespaces makes the probability of having an exploitable kernel bug, anything approaching zero. So I look at the calculus. Chance of actually breaking userspace, or preventing people with a legitimate use from using user namespaces > 0%. Chance of actually preventing a determined attacker from exploiting the kernel < 1%. Amount of work to maintain, non-zero, and I really don't like it. Lots of work to achieve nothing but breaking some of my users. So please stop trying to redesign my subsystem and cause me headaches, unless you are going to do the due diligence necessary to do so responsibly. Eric
Paul Moore <paul@paul-moore.com> writes: > On Mon, Aug 8, 2022 at 3:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >> Paul Moore <paul@paul-moore.com> writes: >> >> I did provide constructive feedback. My feedback to his problem >> >> was to address the real problem of bugs in the kernel. >> > >> > We've heard from several people who have use cases which require >> > adding LSM-level access controls and observability to user namespace >> > creation. This is the problem we are trying to solve here; if you do >> > not like the approach proposed in this patchset please suggest another >> > implementation that allows LSMs visibility into user namespace >> > creation. >> >> Please stop, ignoring my feedback, not detailing what problem or >> problems you are actually trying to be solved, and threatening to merge >> code into files that I maintain that has the express purpose of breaking >> my users. > > I've heard you talk about bugs being the only reason why people would > want to ever block user namespaces, but I think we've all seen use > cases now where it goes beyond that. I really have not, and I don't appreciate being called a liar. > However, even if it didn't, the > need to build high confidence/assurance systems where big chunks of > functionality can be disabled based on a security policy is a very > real use case, and this patchset would help enable that. Details please. What does this look like. What is the overall plan for attack surface reduction in one of these systems. How does it differ from seccomp? How does this differ from setting /proc/sys/user/max_usernamespaces to 0? Why is it only the user namespace that needs to be modified to implement such a system? Why is there no discussion of that in the change description. > I've noticed > you like to talk about these hooks being a source of "regressions", > but access controls are not regressions Eric, they are tools that > system builders, administrators, and users use to secure their > systems. > > From my perspective, I believe that addresses your feedback around > "fix the bugs" and "this is a regression", which is the only thing > I've noted from your responses in this thread and others, but if I'm > missing something more technical please let me/us know. Which is a short way of saying that the using this hook for attack surface reduction without a larger plan will be ineffective. If the attack surface is not sufficiently reduced it will not achieve a prevention of exploits and the attacks will still happen and be successful. With a change that is designed to prevent exploits not actually doing so all that is left is breaking userspace and causing maintenance problems. Earlier I asked to confirm that was the only reason cloudfare was interested in this change. I have asked that we have an actual conversation about what is trying to be achieved. Instead the conversation has simply been about implementation issues and not about if the code will be worth having. So far in my book the code very much does not look worth having. That is my technical judgment and I don't see anyone taking about my arguments or even really engaging in them. Since I keep getting blown off, instead of having my concerns addressed I say this code should not go. >> You just artificially constrained the problems, so that no other >> solution is acceptable. > > There is a real need to be able to gain both additional visibility and > access control over user namespace creation, please suggest the > approach(es) you would find acceptable. The suggested hook is not at all appropriate for visibility. Either the user namespace needs to have some state that can be set, or there needs to be something that is notified when the user namespace goes away. At best the hook can print an audit message. So the proposed hook is really not appropriate to add visibility to the user namespace. For the record I don't object to adding visibility, I am just pointing out the proposed hook is not appropriate to that task. What is the need to have an access control? Why do you need to fundamentally change the design of user namespaces? Those are questions I have not seen any answers to. Without actual answers of what the actual problems are I can't have a reasonable technical conversation. >> On that basis alone I am object to this whole >> approach to steam roll over me and my code. > > I saw that choice of wording in your last email and thought it a bit > curious, so I did a quick git log dump on kernel/user_namespace.c and > I see approximately 31 contributors to that one file. I've always > thought of the open source maintainer role as more of a "steward" and > less of an "owner", but that's just my opinion. As such it is unfortunately my responsibility to say no to badly thought out proposals. Proposals that will negatively affect the people using the code I maintain. My apologies if I have not been more elegant right now when I have been constantly sick, and tired. People getting scared of user namespaces for no real reason has been an on-going trend for a decade or so. This isn't a new issue, and it irritates me that it is still going on. I have addressed real concerns and fixed code, for many many years. This round of the people being afraid of user namespaces, I have yet to find any real concerns. So when I express my concerns that this is a pointless exercise and people don't address my concern. I say no. Eric
On Tue, Aug 9, 2022 at 12:08 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > Paul Moore <paul@paul-moore.com> writes: > > On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > >> "Eric W. Biederman" <ebiederm@xmission.com> writes: > >> > Paul Moore <paul@paul-moore.com> writes: > >> > > >> >>> I did provide constructive feedback. My feedback to his problem > >> >>> was to address the real problem of bugs in the kernel. > >> >> > >> >> We've heard from several people who have use cases which require > >> >> adding LSM-level access controls and observability to user namespace > >> >> creation. This is the problem we are trying to solve here; if you do > >> >> not like the approach proposed in this patchset please suggest another > >> >> implementation that allows LSMs visibility into user namespace > >> >> creation. > >> > > >> > Please stop, ignoring my feedback, not detailing what problem or > >> > problems you are actually trying to be solved, and threatening to merge > >> > code into files that I maintain that has the express purpose of breaking > >> > my users. > >> > > >> > You just artificially constrained the problems, so that no other > >> > solution is acceptable. On that basis alone I am object to this whole > >> > approach to steam roll over me and my code. > >> > >> If you want an example of what kind of harm it can cause to introduce a > >> failure where no failure was before I invite you to look at what > >> happened with sendmail when setuid was modified to fail, when changing > >> the user of a process would cause RLIMIT_NPROC to be exceeded. > > > > I think we are all familiar with the sendmail capabilities bug and the > > others like it, but using that as an excuse to block additional access > > controls seems very weak. The Linux Kernel is very different from > > when the sendmail bug hit (what was that, ~20 years ago?), with > > advancements in capabilities and other discretionary controls, as well > > as mandatory access controls which have enabled Linux to be certified > > through a number of third party security evaluations. > > If you are familiar with scenarios like that then why is there not > being due diligence performed to ensure that userspace won't break? What level of due diligence would satisfy you Eric? This feels very much like an unbounded ask that can be used to permanently block any effort to add any form of additional access control around user namespace creation. If that isn't the case, and this request is being made in good faith, please elaborate on what you believe would be sufficient analysis of this patch? Personally, the fact that the fork()/clone() variants and the unshare() syscall all can fail and return error codes to userspace seems like a reasonable level of safety. If userspace is currently ignoring the return values of fork/clone/unshare that is a problem independent of this patchset. Even looking at the existing create_user_ns() function shows several cases where the user namespace code can forcibly error out due to access controls, memory pressure, etc. I also see that additional restrictions were put on user namespace creation after it was introduced, e.g. the chroot restriction; I'm assuming that didn't break "your" users? If you can describe the analysis you did on that change, perhaps we can do the same for this patchset and call it sufficient, yes? > >> I am not arguing that what you are proposing is that bad but unexpected > >> failures cause real problems, and at a minimum that needs a better > >> response than: "There is at least one user that wants a failure here". > > > > Let me fix that for you: "There are multiple users who want to have > > better visibility and access control for user namespace creation." > > Visibility sure. Design a proper hook for that. All the proposed hook > can do is print an audit message. It can't allocate or manage any state > as there is not the corresponding hook when a user namespace is freed. > So the proposed hook is not appropriate for increasing visibility. Auditing very much increases visibility. Look at what the BPF LSM can do, observability is one of its primary goals. > Access control. Not a chance unless it is carefully designed and > reviewed. See the above. The relevant syscalls already have the risk of failure, if userspace is assuming fork/clone/unshare/etc. will never fail then that application is broken independent of this discussion.
On 8/9/2022 9:07 AM, Eric W. Biederman wrote: > Paul Moore <paul@paul-moore.com> writes: > >> On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote: >>> "Eric W. Biederman" <ebiederm@xmission.com> writes: >>>> Paul Moore <paul@paul-moore.com> writes: >>>> >>>>>> I did provide constructive feedback. My feedback to his problem >>>>>> was to address the real problem of bugs in the kernel. >>>>> We've heard from several people who have use cases which require >>>>> adding LSM-level access controls and observability to user namespace >>>>> creation. This is the problem we are trying to solve here; if you do >>>>> not like the approach proposed in this patchset please suggest another >>>>> implementation that allows LSMs visibility into user namespace >>>>> creation. >>>> Please stop, ignoring my feedback, not detailing what problem or >>>> problems you are actually trying to be solved, and threatening to merge >>>> code into files that I maintain that has the express purpose of breaking >>>> my users. >>>> >>>> You just artificially constrained the problems, so that no other >>>> solution is acceptable. On that basis alone I am object to this whole >>>> approach to steam roll over me and my code. >>> If you want an example of what kind of harm it can cause to introduce a >>> failure where no failure was before I invite you to look at what >>> happened with sendmail when setuid was modified to fail, when changing >>> the user of a process would cause RLIMIT_NPROC to be exceeded. >> I think we are all familiar with the sendmail capabilities bug and the >> others like it, but using that as an excuse to block additional access >> controls seems very weak. The Linux Kernel is very different from >> when the sendmail bug hit (what was that, ~20 years ago?), with >> advancements in capabilities and other discretionary controls, as well >> as mandatory access controls which have enabled Linux to be certified >> through a number of third party security evaluations. > If you are familiar with scenarios like that then why is there not > being due diligence performed to ensure that userspace won't break? > > Certainly none of the paperwork you are talking about does that kind > of checking and it most definitely is not happening before the code > gets merged. > > I am saying that performing that due diligence should be a requirement > before anyone even thinks about merging a patch that adds permission > checks where no existed before. > > Sometimes changes to fix security bugs can get away with adding new > restrictions because we know with a very very high degree of probability > that the only thing that will break will be exploit code. In the rare > case when real world applications are broken such changes need to be > reverted or adapted. No one has even made the argument that only > exploit code will be affected. > > So I am sorry I am the one who has to be the one to get in the way of a > broken process with semantic review, but due diligence has not been > done. So I am say no way this code should be merged. > > > In addition to actually breaking existing userspace, I think there is a > very real danger of breaking userspace, I think there is a very real > danger of breaking network effects by making such a large change to the > design of user namespaces. > > >>> I am not arguing that what you are proposing is that bad but unexpected >>> failures cause real problems, and at a minimum that needs a better >>> response than: "There is at least one user that wants a failure here". >> Let me fix that for you: "There are multiple users who want to have >> better visibility and access control for user namespace creation." > Visibility sure. Design a proper hook for that. All the proposed hook > can do is print an audit message. It can't allocate or manage any state > as there is not the corresponding hook when a user namespace is freed. > So the proposed hook is not appropriate for increasing visibility. > > > Access control. Not a chance unless it is carefully designed and > reviewed. There is a very large cost to adding access control where > it has not previously existed. > > I talk about that cost as people breaking my users as that is how I see > it. I don't see any discussion on why I am wrong. > > If we are going to add an access controls I want to see someone point > out something that is actually semantically a problem. What motivates > an access control? Smack has no interest in using the proposed hook because user namespaces are neither subjects nor objects. They are collections of DAC and/or privilege configuration alternatives. Or something like that. From the viewpoint of a security module that only implements old fashioned MAC there is no value in constraining user namespaces. SELinux, on the other hand, seeks to be comprehensive well beyond controlling accesses between subjects and objects. Asking the question "should there be a control on this operation?" seems sufficient to justify adding the control to SELinux policy. This is characteristic of "Fine Grain" control. So I'm of two minds on this. I don't need the hook, but I also understand why SELinux and BPF want it. I don't necessarily agree with their logic, but it is consistent with existing behavior. Any system that uses either of those security modules needs to be ready for unexpected denials based on any potential security concern. Keeping namespaces completely orthogonal to LSM seems doomed to failure eventually. > > So far the only answer I have received is people want to reduce the > attack surface of the kernel. I don't possibly see how reducing the > attack surface by removing user namespaces makes the probability of > having an exploitable kernel bug, anything approaching zero. > > So I look at the calculus. Chance of actually breaking userspace, or > preventing people with a legitimate use from using user namespaces > 0%. > Chance of actually preventing a determined attacker from exploiting the > kernel < 1%. Amount of work to maintain, non-zero, and I really don't > like it. > > Lots of work to achieve nothing but breaking some of my users. > > So please stop trying to redesign my subsystem and cause me headaches, > unless you are going to do the due diligence necessary to do so > responsibly. > > Eric
Paul Moore <paul@paul-moore.com> writes: > > What level of due diligence would satisfy you Eric? Having a real conversation about what a change is doing and to talk about it's merits and it's pro's and cons. I can't promise I would be convinced but that is the kind of conversation it would take. I was not trying to place an insurmountable barrier I was simply looking to see if people had been being careful and doing what is generally accepted for submitting a kernel patch. From all I can see that has completely not happened here. > If that isn't the case, and this request is being made in good faith Again you are calling me a liar. I really don't appreciate that. As for something already returning an error. The setuid system call also has error returns, and enforcing RLIMIT_NPROC caused sendmail to misbehave. I bring up the past in this way only to illustrate that things can happen. That simply examining the kernel and not thinking about userspace really isn't enough. I am also concerned about the ecosystem effects of adding random access control checks to a system call that does not perform access control checks. As I said this patch is changing a rather fundamental design decision by adding an access control. A design decision that for the most part has worked out quite well, and has allowed applications to add sandboxing support to themselves without asking permission to anyone. Adding an access control all of a sudden means application developers are having to ask for permission to things that are perfectly safe, and it means many parts of the kernel gets less love both in use and in maintenance. It might be possible to convince me that design decision needs to change, or that what is being proposed is small enough it does not practically change that design decision. Calling me a liar is not the way to change my mind. Ignoring me and pushing this through without addressing my concerns is not the way to change my mind. I honestly I want what I asked for at the start. I want discussion of what problems are being solved so we can talk about the problem or problems and if this is even the appropriate solution to them. Eric
Casey Schaufler <casey@schaufler-ca.com> writes: > Smack has no interest in using the proposed hook because user namespaces > are neither subjects nor objects. They are collections of DAC and/or > privilege configuration alternatives. Or something like that. From the > viewpoint of a security module that only implements old fashioned MAC > there is no value in constraining user namespaces. The goal is to simply enable things that become safe when you don't have to worry about confusing setuid executables. > SELinux, on the other hand, seeks to be comprehensive well beyond > controlling accesses between subjects and objects. Asking the question > "should there be a control on this operation?" seems sufficient to justify > adding the control to SELinux policy. This is characteristic of > "Fine Grain" control. I see that from a theoretical standpoint. On the flip side I prefer arguments grounded in something more than what an abstract framework could appreciate. We are so far from any theoretical purity in the linux kernel that I can't see theoretical purity being enough to justify a decision like this. > So I'm of two minds on this. I don't need the hook, but I also understand > why SELinux and BPF want it. I don't necessarily agree with their logic, > but it is consistent with existing behavior. Any system that uses either > of those security modules needs to be ready for unexpected denials based > on any potential security concern. Keeping namespaces completely orthogonal > to LSM seems doomed to failure eventually. I can cross that bridge when there is a healthy conversation about such a change. Too often I get "ouch! Creating a user namespace was used as the easiest way to exploit a security bug. Let's solve the issue by denying user namespaces." Which leads to half thought out policies made out of fear. Which is where I think this conversation started. I haven't seen it make it's way to any healthy reasons to deny user namespaces yet. Eric
On Tue, Aug 9, 2022 at 5:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > Paul Moore <paul@paul-moore.com> writes: > > > > What level of due diligence would satisfy you Eric? > > Having a real conversation about what a change is doing and to talk > about it's merits and it's pro's and cons. I can't promise I would be > convinced but that is the kind of conversation it would take. Earlier today you talked about due diligence to ensure that userspace won't break and I provided my reasoning on why userspace would not break (at least not because of this change). Userspace might be blocked from creating a new user namespace due to a security policy, but that would be the expected and desired outcome, not breakage. As far as your most recent comment regarding merit and pros/cons, I believe we have had that discussion (quite a few times already); it just seems you are not satisfied with the majority's conclusion. Personally, I'm not sure there is anything more I can do to convince you that this patchset is reasonable; I'm going to leave it to others at this point, or we can all simply agree to disagree for the moment. Just as you haven't heard a compelling argument for this patchset, I haven't heard a compelling argument against it. Barring some significant new discussion point, or opinion, I still plan on merging this into the LSM next branch when the merge window closes next week so it has time to go through a full round of linux-next testing. Assuming no unresolvable problems are found during the additional testing I plan to send it to Linus during the v6.1 merge window and I'm guessing we will get to go through this all again. It's less than ideal, but I think this is where we are at right now.
On Tue, Aug 9, 2022 at 3:40 PM Paul Moore <paul@paul-moore.com> wrote: > > On Tue, Aug 9, 2022 at 5:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Paul Moore <paul@paul-moore.com> writes: > > > > > > What level of due diligence would satisfy you Eric? > > > > Having a real conversation about what a change is doing and to talk > > about it's merits and it's pro's and cons. I can't promise I would be > > convinced but that is the kind of conversation it would take. > > Earlier today you talked about due diligence to ensure that userspace > won't break and I provided my reasoning on why userspace would not > break (at least not because of this change). Userspace might be > blocked from creating a new user namespace due to a security policy, > but that would be the expected and desired outcome, not breakage. As > far as your most recent comment regarding merit and pros/cons, I > believe we have had that discussion (quite a few times already); it > just seems you are not satisfied with the majority's conclusion. > > Personally, I'm not sure there is anything more I can do to convince > you that this patchset is reasonable; I'm going to leave it to others > at this point, or we can all simply agree to disagree for the moment. > Just as you haven't heard a compelling argument for this patchset, I > haven't heard a compelling argument against it. Barring some > significant new discussion point, or opinion, I still plan on merging > this into the LSM next branch when the merge window closes next week > so it has time to go through a full round of linux-next testing. > Assuming no unresolvable problems are found during the additional > testing I plan to send it to Linus during the v6.1 merge window and > I'm guessing we will get to go through this all again. It's less than > ideal, but I think this is where we are at right now. +1
On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote: > On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Paul Moore <paul@paul-moore.com> writes: > > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > >> Frederick Lawler <fred@cloudflare.com> writes: > > >> > > >> > While creating a LSM BPF MAC policy to block user namespace creation, we > > >> > used the LSM cred_prepare hook because that is the closest hook to prevent > > >> > a call to create_user_ns(). > > >> > > >> Re-nack for all of the same reasons. > > >> AKA This can only break the users of the user namespace. > > >> > > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > >> > > >> You aren't fixing what your problem you are papering over it by denying > > >> access to the user namespace. > > >> > > >> Nack Nack Nack. > > >> > > >> Stop. > > >> > > >> Go back to the drawing board. > > >> > > >> Do not pass go. > > >> > > >> Do not collect $200. > > > > > > If you want us to take your comments seriously Eric, you need to > > > provide the list with some constructive feedback that would allow > > > Frederick to move forward with a solution to the use case that has > > > been proposed. You response above may be many things, but it is > > > certainly not that. > > > > I did provide constructive feedback. My feedback to his problem > > was to address the real problem of bugs in the kernel. > > We've heard from several people who have use cases which require > adding LSM-level access controls and observability to user namespace > creation. This is the problem we are trying to solve here; if you do > not like the approach proposed in this patchset please suggest another > implementation that allows LSMs visibility into user namespace > creation. Regarding the observability - can someone concisely lay out why just auditing userns creation would not suffice? Userspace could decide what to report based on whether the creating user_ns == /proc/1/ns/user... Regarding limiting the tweaking of otherwise-privileged code by unprivileged users, i wonder whether we could instead add smarts to ns_capable(). Point being, uid mapping would still work, but we'd break the "privileged against resources you own" part of user namespaces. I would want it to default to allow, but then when a 0-day is found which requires reaching ns_capable() code, admins could easily prevent exploitation until reboot from a fixed kernel.
On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn <serge@hallyn.com> wrote: > On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote: > > On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > Paul Moore <paul@paul-moore.com> writes: > > > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > >> Frederick Lawler <fred@cloudflare.com> writes: > > > >> > > > >> > While creating a LSM BPF MAC policy to block user namespace creation, we > > > >> > used the LSM cred_prepare hook because that is the closest hook to prevent > > > >> > a call to create_user_ns(). > > > >> > > > >> Re-nack for all of the same reasons. > > > >> AKA This can only break the users of the user namespace. > > > >> > > > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > > >> > > > >> You aren't fixing what your problem you are papering over it by denying > > > >> access to the user namespace. > > > >> > > > >> Nack Nack Nack. > > > >> > > > >> Stop. > > > >> > > > >> Go back to the drawing board. > > > >> > > > >> Do not pass go. > > > >> > > > >> Do not collect $200. > > > > > > > > If you want us to take your comments seriously Eric, you need to > > > > provide the list with some constructive feedback that would allow > > > > Frederick to move forward with a solution to the use case that has > > > > been proposed. You response above may be many things, but it is > > > > certainly not that. > > > > > > I did provide constructive feedback. My feedback to his problem > > > was to address the real problem of bugs in the kernel. > > > > We've heard from several people who have use cases which require > > adding LSM-level access controls and observability to user namespace > > creation. This is the problem we are trying to solve here; if you do > > not like the approach proposed in this patchset please suggest another > > implementation that allows LSMs visibility into user namespace > > creation. > > Regarding the observability - can someone concisely lay out why just > auditing userns creation would not suffice? Userspace could decide > what to report based on whether the creating user_ns == /proc/1/ns/user... One of the selling points of the BPF LSM is that it allows for various different ways of reporting and logging beyond audit. However, even if it was limited to just audit I believe that provides some useful justification as auditing fork()/clone() isn't quite the same and could be difficult to do at scale in some configurations. I haven't personally added a BPF LSM program to the kernel so I can't speak to the details on what is possible, but I'm sure others on the To/CC line could help provide more information if that is important to you. > Regarding limiting the tweaking of otherwise-privileged code by > unprivileged users, i wonder whether we could instead add smarts to > ns_capable(). The existing security_capable() hook is eventually called by ns_capable(): ns_capable() ns_capable_common() security_capable(const struct cred *cred, struct user_namespace *ns, int cap, unsigned int opts); ... I'm not sure what additional smarts would be useful here? [side note: SELinux does actually distinguish between capability checks in the initial user namespace vs child namespaces] > Point being, uid mapping would still work, but we'd > break the "privileged against resources you own" part of user > namespaces. I would want it to default to allow, but then when a > 0-day is found which requires reaching ns_capable() code, admins > could easily prevent exploitation until reboot from a fixed kernel. That assumes that everything you care about is behind a capability check, which is probably going to be correct in a lot of the cases, but I think it would be a mistake to assume that is always going to be true. -- paul-moore.com
On Sun, Aug 14, 2022 at 10:32:51PM -0400, Paul Moore wrote: > On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn <serge@hallyn.com> wrote: > > On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote: > > > On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > Paul Moore <paul@paul-moore.com> writes: > > > > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > >> Frederick Lawler <fred@cloudflare.com> writes: > > > > >> > > > > >> > While creating a LSM BPF MAC policy to block user namespace creation, we > > > > >> > used the LSM cred_prepare hook because that is the closest hook to prevent > > > > >> > a call to create_user_ns(). > > > > >> > > > > >> Re-nack for all of the same reasons. > > > > >> AKA This can only break the users of the user namespace. > > > > >> > > > > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > > > >> > > > > >> You aren't fixing what your problem you are papering over it by denying > > > > >> access to the user namespace. > > > > >> > > > > >> Nack Nack Nack. > > > > >> > > > > >> Stop. > > > > >> > > > > >> Go back to the drawing board. > > > > >> > > > > >> Do not pass go. > > > > >> > > > > >> Do not collect $200. > > > > > > > > > > If you want us to take your comments seriously Eric, you need to > > > > > provide the list with some constructive feedback that would allow > > > > > Frederick to move forward with a solution to the use case that has > > > > > been proposed. You response above may be many things, but it is > > > > > certainly not that. > > > > > > > > I did provide constructive feedback. My feedback to his problem > > > > was to address the real problem of bugs in the kernel. > > > > > > We've heard from several people who have use cases which require > > > adding LSM-level access controls and observability to user namespace > > > creation. This is the problem we are trying to solve here; if you do > > > not like the approach proposed in this patchset please suggest another > > > implementation that allows LSMs visibility into user namespace > > > creation. > > > > Regarding the observability - can someone concisely lay out why just > > auditing userns creation would not suffice? Userspace could decide > > what to report based on whether the creating user_ns == /proc/1/ns/user... > > One of the selling points of the BPF LSM is that it allows for various > different ways of reporting and logging beyond audit. However, even > if it was limited to just audit I believe that provides some useful > justification as auditing fork()/clone() isn't quite the same and > could be difficult to do at scale in some configurations. I haven't > personally added a BPF LSM program to the kernel so I can't speak to > the details on what is possible, but I'm sure others on the To/CC line > could help provide more information if that is important to you. > > > Regarding limiting the tweaking of otherwise-privileged code by > > unprivileged users, i wonder whether we could instead add smarts to > > ns_capable(). > > The existing security_capable() hook is eventually called by ns_capable(): > > ns_capable() > ns_capable_common() > security_capable(const struct cred *cred, > struct user_namespace *ns, > int cap, > unsigned int opts); > > ... I'm not sure what additional smarts would be useful here? Oh - i wasn't necessarily thinking of an LSM. I was picturing a sysctl next to unprivileged_userns_clone. But you're right, looks like an LSM could already do this. Of course, there's an issue early on in that the root user in the new namespace couldn't setuid, so the uid mapping is still limited. So this idea probably isn't worth the characters we've typed about it so far, sorry. > [side note: SELinux does actually distinguish between capability > checks in the initial user namespace vs child namespaces] > > > Point being, uid mapping would still work, but we'd > > break the "privileged against resources you own" part of user > > namespaces. I would want it to default to allow, but then when a > > 0-day is found which requires reaching ns_capable() code, admins > > could easily prevent exploitation until reboot from a fixed kernel. > > That assumes that everything you care about is behind a capability > check, which is probably going to be correct in a lot of the cases, > but I think it would be a mistake to assume that is always going to be > true. I might be thinking wrongly, but if it's not behind a capability check, then it seems to me it's not an exploit that can only be reached by becoming root in a user namespace, which means disabling user namespace creation by unprivileged users will not stop the attack.
On Mon, Aug 15, 2022 at 11:41 AM Serge E. Hallyn <serge@hallyn.com> wrote: > On Sun, Aug 14, 2022 at 10:32:51PM -0400, Paul Moore wrote: > > On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn <serge@hallyn.com> wrote: > > > On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote: > > > > On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > > Paul Moore <paul@paul-moore.com> writes: > > > > > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > > > >> Frederick Lawler <fred@cloudflare.com> writes: > > > > > >> > > > > > >> > While creating a LSM BPF MAC policy to block user namespace creation, we > > > > > >> > used the LSM cred_prepare hook because that is the closest hook to prevent > > > > > >> > a call to create_user_ns(). > > > > > >> > > > > > >> Re-nack for all of the same reasons. > > > > > >> AKA This can only break the users of the user namespace. > > > > > >> > > > > > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > > > > >> > > > > > >> You aren't fixing what your problem you are papering over it by denying > > > > > >> access to the user namespace. > > > > > >> > > > > > >> Nack Nack Nack. > > > > > >> > > > > > >> Stop. > > > > > >> > > > > > >> Go back to the drawing board. > > > > > >> > > > > > >> Do not pass go. > > > > > >> > > > > > >> Do not collect $200. > > > > > > > > > > > > If you want us to take your comments seriously Eric, you need to > > > > > > provide the list with some constructive feedback that would allow > > > > > > Frederick to move forward with a solution to the use case that has > > > > > > been proposed. You response above may be many things, but it is > > > > > > certainly not that. > > > > > > > > > > I did provide constructive feedback. My feedback to his problem > > > > > was to address the real problem of bugs in the kernel. > > > > > > > > We've heard from several people who have use cases which require > > > > adding LSM-level access controls and observability to user namespace > > > > creation. This is the problem we are trying to solve here; if you do > > > > not like the approach proposed in this patchset please suggest another > > > > implementation that allows LSMs visibility into user namespace > > > > creation. > > > > > > Regarding the observability - can someone concisely lay out why just > > > auditing userns creation would not suffice? Userspace could decide > > > what to report based on whether the creating user_ns == /proc/1/ns/user... > > > > One of the selling points of the BPF LSM is that it allows for various > > different ways of reporting and logging beyond audit. However, even > > if it was limited to just audit I believe that provides some useful > > justification as auditing fork()/clone() isn't quite the same and > > could be difficult to do at scale in some configurations. I haven't > > personally added a BPF LSM program to the kernel so I can't speak to > > the details on what is possible, but I'm sure others on the To/CC line > > could help provide more information if that is important to you. > > > > > Regarding limiting the tweaking of otherwise-privileged code by > > > unprivileged users, i wonder whether we could instead add smarts to > > > ns_capable(). > > > > The existing security_capable() hook is eventually called by ns_capable(): > > > > ns_capable() > > ns_capable_common() > > security_capable(const struct cred *cred, > > struct user_namespace *ns, > > int cap, > > unsigned int opts); > > > > ... I'm not sure what additional smarts would be useful here? > > Oh - i wasn't necessarily thinking of an LSM. I was picturing a > sysctl next to unprivileged_userns_clone. But you're right, looks > like an LSM could already do this. Of course, there's an issue early > on in that the root user in the new namespace couldn't setuid, so > the uid mapping is still limited. So this idea probably isn't worth > the characters we've typed about it so far, sorry. No harm, no foul. This thread has already reached record lows with respect to usefulness-vs-characters ratio, a few more isn't going to hurt anything ;) > > [side note: SELinux does actually distinguish between capability > > checks in the initial user namespace vs child namespaces] > > > > > Point being, uid mapping would still work, but we'd > > > break the "privileged against resources you own" part of user > > > namespaces. I would want it to default to allow, but then when a > > > 0-day is found which requires reaching ns_capable() code, admins > > > could easily prevent exploitation until reboot from a fixed kernel. > > > > That assumes that everything you care about is behind a capability > > check, which is probably going to be correct in a lot of the cases, > > but I think it would be a mistake to assume that is always going to be > > true. > > I might be thinking wrongly, but if it's not behind a capability check, > then it seems to me it's not an exploit that can only be reached by > becoming root in a user namespace, which means disabling user namespace > creation by unprivileged users will not stop the attack. I was primarily thinking about two things: subj/obj relationships which are really not addressed with capability checks, and unrelated problems which aren't the fault of the user namespace but could be somehow made easier through some of the unique situations offered by user namespaces. There are exploits that often require chaining together multiple "things" to trigger the necessary flaw, and sometimes the most immediate way to stop such an attack is to apply additional controls to one of these intermediate steps. Frekerick's work puts the necessary infrastructure in place so we can do that with user namespaces.