Message ID | 20230301014651.1370939-1-surenb@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/1] cgroup: limit cgroup psi file writes to processes with CAP_SYS_RESOURCE | expand |
On Tue 28-02-23 17:46:51, Suren Baghdasaryan wrote: > Currently /proc/pressure/* files can be written only by processes with > CAP_SYS_RESOURCE capability to prevent any unauthorized user from > creating psi triggers. However no such limitation is required for > per-cgroup pressure files. Fix this inconsistency by requiring the same > capability for writing per-cgroup psi files. > > Fixes: 6db12ee0456d ("psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files") Is this really a regression from this commit? 6db12ee0456d is changing permissions of those files to be world writeable with the CAP_SYS_RESOURCE requirement. Permissions of cgroup files is not changed and the default mode is 644 (with root as an owner) so only privileged processes are allowed without any delegation. I think you should instead construct this slightly differently. The ultimate goal is to allow a reasonable delegation after all, no? So keep your current patch and extend it by removing the min timeout constrain and justify the change by the necessity of the granularity tuning as reported by Sudarshan Rajagopala. If this causes any regression then a revert would also return the min timeout constrain back and we will have to think about a different approach. The consistency with the global case is a valid point only partially because different cgroups might have different owners which is not usually the case for the global psi interface, right? Makes sense?
On Wed, Mar 1, 2023 at 1:47 AM Michal Hocko <mhocko@suse.com> wrote: > > On Tue 28-02-23 17:46:51, Suren Baghdasaryan wrote: > > Currently /proc/pressure/* files can be written only by processes with > > CAP_SYS_RESOURCE capability to prevent any unauthorized user from > > creating psi triggers. However no such limitation is required for > > per-cgroup pressure files. Fix this inconsistency by requiring the same > > capability for writing per-cgroup psi files. > > > > Fixes: 6db12ee0456d ("psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files") > > Is this really a regression from this commit? 6db12ee0456d is changing > permissions of those files to be world writeable with the > CAP_SYS_RESOURCE requirement. Permissions of cgroup files is not changed > and the default mode is 644 (with root as an owner) so only privileged > processes are allowed without any delegation. Agree, the Fixes line here is not valid. I will remove it. > > I think you should instead construct this slightly differently. The > ultimate goal is to allow a reasonable delegation after all, no? Yes. > > So keep your current patch and extend it by removing the min timeout > constrain and justify the change by the necessity of the granularity > tuning as reported by Sudarshan Rajagopala. If this causes any > regression then a revert would also return the min timeout constrain > back and we will have to think about a different approach. I think adding CAP_SYS_RESOURCE check is needed even if we keep the min timeout capped like today. Without it one could create multiple cgroups and add a trigger into each one, therefore creating an unlimited number of "psimon" kernel threads. At some point I expect them to affect system performance because even at high polling intervals they still consume some cpu resources. So, this change I think is needed regardless of the change to min polling period and I would suggest keeping them separate. > > The consistency with the global case is a valid point only partially > because different cgroups might have different owners which is not > usually the case for the global psi interface, right? Correct. > > Makes sense? Yes but hopefully my argument about keeping this and min period patches separate is reasonable? Thanks, Suren. > -- > Michal Hocko > SUSE Labs
On Wed 01-03-23 10:05:36, Suren Baghdasaryan wrote: [...] > Yes but hopefully my argument about keeping this and min period > patches separate is reasonable? I am not questioning that. The practical advantage to squash the two changes is that in case of the CAP_SYS_RESOURCE you do not have to explicitly think about reverting the min constrain as well. I do not think reverting one without the other is good.
On Wed, Mar 1, 2023 at 10:35 AM Michal Hocko <mhocko@suse.com> wrote: > > On Wed 01-03-23 10:05:36, Suren Baghdasaryan wrote: > [...] > > Yes but hopefully my argument about keeping this and min period > > patches separate is reasonable? > > I am not questioning that. The practical advantage to squash the two > changes is that in case of the CAP_SYS_RESOURCE you do not have to > explicitly think about reverting the min constrain as well. I do not > think reverting one without the other is good. Ok, I'm fine with having both changes in the same patch. Will post v2 later today. Thanks! > > -- > Michal Hocko > SUSE Labs
On Wed, Mar 1, 2023 at 10:40 AM Suren Baghdasaryan <surenb@google.com> wrote: > > On Wed, Mar 1, 2023 at 10:35 AM Michal Hocko <mhocko@suse.com> wrote: > > > > On Wed 01-03-23 10:05:36, Suren Baghdasaryan wrote: > > [...] > > > Yes but hopefully my argument about keeping this and min period > > > patches separate is reasonable? > > > > I am not questioning that. The practical advantage to squash the two > > changes is that in case of the CAP_SYS_RESOURCE you do not have to > > explicitly think about reverting the min constrain as well. I do not > > think reverting one without the other is good. > > Ok, I'm fine with having both changes in the same patch. Will post v2 > later today. Thanks! Didn't call it v2 since the title had to be changed. The new patch is posted at: https://lore.kernel.org/all/20230301193403.1507484-1-surenb@google.com/ > > > > > -- > > Michal Hocko > > SUSE Labs
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 935e8121b21e..b600a6baaeca 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3867,6 +3867,12 @@ static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of, return psi_trigger_poll(&ctx->psi.trigger, of->file, pt); } +static int cgroup_pressure_open(struct kernfs_open_file *of) +{ + return (of->file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE)) ? + -EPERM : 0; +} + static void cgroup_pressure_release(struct kernfs_open_file *of) { struct cgroup_file_ctx *ctx = of->priv; @@ -5266,6 +5272,7 @@ static struct cftype cgroup_psi_files[] = { { .name = "io.pressure", .file_offset = offsetof(struct cgroup, psi_files[PSI_IO]), + .open = cgroup_pressure_open, .seq_show = cgroup_io_pressure_show, .write = cgroup_io_pressure_write, .poll = cgroup_pressure_poll, @@ -5274,6 +5281,7 @@ static struct cftype cgroup_psi_files[] = { { .name = "memory.pressure", .file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]), + .open = cgroup_pressure_open, .seq_show = cgroup_memory_pressure_show, .write = cgroup_memory_pressure_write, .poll = cgroup_pressure_poll, @@ -5282,6 +5290,7 @@ static struct cftype cgroup_psi_files[] = { { .name = "cpu.pressure", .file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]), + .open = cgroup_pressure_open, .seq_show = cgroup_cpu_pressure_show, .write = cgroup_cpu_pressure_write, .poll = cgroup_pressure_poll, @@ -5291,6 +5300,7 @@ static struct cftype cgroup_psi_files[] = { { .name = "irq.pressure", .file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]), + .open = cgroup_pressure_open, .seq_show = cgroup_irq_pressure_show, .write = cgroup_irq_pressure_write, .poll = cgroup_pressure_poll,
Currently /proc/pressure/* files can be written only by processes with CAP_SYS_RESOURCE capability to prevent any unauthorized user from creating psi triggers. However no such limitation is required for per-cgroup pressure files. Fix this inconsistency by requiring the same capability for writing per-cgroup psi files. Fixes: 6db12ee0456d ("psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files") Signed-off-by: Suren Baghdasaryan <surenb@google.com> --- kernel/cgroup/cgroup.c | 10 ++++++++++ 1 file changed, 10 insertions(+)