diff mbox series

[1/1] cgroup: limit cgroup psi file writes to processes with CAP_SYS_RESOURCE

Message ID 20230301014651.1370939-1-surenb@google.com (mailing list archive)
State New
Headers show
Series [1/1] cgroup: limit cgroup psi file writes to processes with CAP_SYS_RESOURCE | expand

Commit Message

Suren Baghdasaryan March 1, 2023, 1:46 a.m. UTC
Currently /proc/pressure/* files can be written only by processes with
CAP_SYS_RESOURCE capability to prevent any unauthorized user from
creating psi triggers. However no such limitation is required for
per-cgroup pressure files. Fix this inconsistency by requiring the same
capability for writing per-cgroup psi files.

Fixes: 6db12ee0456d ("psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 kernel/cgroup/cgroup.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Michal Hocko March 1, 2023, 9:47 a.m. UTC | #1
On Tue 28-02-23 17:46:51, Suren Baghdasaryan wrote:
> Currently /proc/pressure/* files can be written only by processes with
> CAP_SYS_RESOURCE capability to prevent any unauthorized user from
> creating psi triggers. However no such limitation is required for
> per-cgroup pressure files. Fix this inconsistency by requiring the same
> capability for writing per-cgroup psi files.
> 
> Fixes: 6db12ee0456d ("psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files")

Is this really a regression from this commit? 6db12ee0456d is changing
permissions of those files to be world writeable with the
CAP_SYS_RESOURCE requirement. Permissions of cgroup files is not changed
and the default mode is 644 (with root as an owner) so only privileged
processes are allowed without any delegation.

I think you should instead construct this slightly differently. The
ultimate goal is to allow a reasonable delegation after all, no?

So keep your current patch and extend it by removing the min timeout
constrain and justify the change by the necessity of the granularity
tuning as reported by Sudarshan Rajagopala. If this causes any
regression then a revert would also return the min timeout constrain
back and we will have to think about a different approach.

The consistency with the global case is a valid point only partially
because different cgroups might have different owners which is not
usually the case for the global psi interface, right?

Makes sense?
Suren Baghdasaryan March 1, 2023, 6:05 p.m. UTC | #2
On Wed, Mar 1, 2023 at 1:47 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 28-02-23 17:46:51, Suren Baghdasaryan wrote:
> > Currently /proc/pressure/* files can be written only by processes with
> > CAP_SYS_RESOURCE capability to prevent any unauthorized user from
> > creating psi triggers. However no such limitation is required for
> > per-cgroup pressure files. Fix this inconsistency by requiring the same
> > capability for writing per-cgroup psi files.
> >
> > Fixes: 6db12ee0456d ("psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files")
>
> Is this really a regression from this commit? 6db12ee0456d is changing
> permissions of those files to be world writeable with the
> CAP_SYS_RESOURCE requirement. Permissions of cgroup files is not changed
> and the default mode is 644 (with root as an owner) so only privileged
> processes are allowed without any delegation.

Agree, the Fixes line here is not valid. I will remove it.

>
> I think you should instead construct this slightly differently. The
> ultimate goal is to allow a reasonable delegation after all, no?

Yes.

>
> So keep your current patch and extend it by removing the min timeout
> constrain and justify the change by the necessity of the granularity
> tuning as reported by Sudarshan Rajagopala. If this causes any
> regression then a revert would also return the min timeout constrain
> back and we will have to think about a different approach.

I think adding CAP_SYS_RESOURCE check is needed even if we keep the
min timeout capped like today. Without it one could create multiple
cgroups and add a trigger into each one, therefore creating an
unlimited number of "psimon" kernel threads. At some point I expect
them to affect system performance because even at high polling
intervals they still consume some cpu resources. So, this change I
think is needed regardless of the change to min polling period and I
would suggest keeping them separate.

>
> The consistency with the global case is a valid point only partially
> because different cgroups might have different owners which is not
> usually the case for the global psi interface, right?

Correct.

>
> Makes sense?

Yes but hopefully my argument about keeping this and min period
patches separate is reasonable?
Thanks,
Suren.

> --
> Michal Hocko
> SUSE Labs
Michal Hocko March 1, 2023, 6:35 p.m. UTC | #3
On Wed 01-03-23 10:05:36, Suren Baghdasaryan wrote:
[...]
> Yes but hopefully my argument about keeping this and min period
> patches separate is reasonable?

I am not questioning that. The practical advantage to squash the two
changes is that in case of the CAP_SYS_RESOURCE you do not have to
explicitly think about reverting the min constrain as well. I do not
think reverting one without the other is good.
Suren Baghdasaryan March 1, 2023, 6:40 p.m. UTC | #4
On Wed, Mar 1, 2023 at 10:35 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 01-03-23 10:05:36, Suren Baghdasaryan wrote:
> [...]
> > Yes but hopefully my argument about keeping this and min period
> > patches separate is reasonable?
>
> I am not questioning that. The practical advantage to squash the two
> changes is that in case of the CAP_SYS_RESOURCE you do not have to
> explicitly think about reverting the min constrain as well. I do not
> think reverting one without the other is good.

Ok, I'm fine with having both changes in the same patch. Will post v2
later today. Thanks!

>
> --
> Michal Hocko
> SUSE Labs
Suren Baghdasaryan March 1, 2023, 7:36 p.m. UTC | #5
On Wed, Mar 1, 2023 at 10:40 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Mar 1, 2023 at 10:35 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 01-03-23 10:05:36, Suren Baghdasaryan wrote:
> > [...]
> > > Yes but hopefully my argument about keeping this and min period
> > > patches separate is reasonable?
> >
> > I am not questioning that. The practical advantage to squash the two
> > changes is that in case of the CAP_SYS_RESOURCE you do not have to
> > explicitly think about reverting the min constrain as well. I do not
> > think reverting one without the other is good.
>
> Ok, I'm fine with having both changes in the same patch. Will post v2
> later today. Thanks!

Didn't call it v2 since the title had to be changed. The new patch is
posted at: https://lore.kernel.org/all/20230301193403.1507484-1-surenb@google.com/

>
> >
> > --
> > Michal Hocko
> > SUSE Labs
diff mbox series

Patch

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 935e8121b21e..b600a6baaeca 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3867,6 +3867,12 @@  static __poll_t cgroup_pressure_poll(struct kernfs_open_file *of,
 	return psi_trigger_poll(&ctx->psi.trigger, of->file, pt);
 }
 
+static int cgroup_pressure_open(struct kernfs_open_file *of)
+{
+	return (of->file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE)) ?
+		-EPERM : 0;
+}
+
 static void cgroup_pressure_release(struct kernfs_open_file *of)
 {
 	struct cgroup_file_ctx *ctx = of->priv;
@@ -5266,6 +5272,7 @@  static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "io.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_IO]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_io_pressure_show,
 		.write = cgroup_io_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5274,6 +5281,7 @@  static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "memory.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_MEM]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_memory_pressure_show,
 		.write = cgroup_memory_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5282,6 +5290,7 @@  static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "cpu.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_CPU]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_cpu_pressure_show,
 		.write = cgroup_cpu_pressure_write,
 		.poll = cgroup_pressure_poll,
@@ -5291,6 +5300,7 @@  static struct cftype cgroup_psi_files[] = {
 	{
 		.name = "irq.pressure",
 		.file_offset = offsetof(struct cgroup, psi_files[PSI_IRQ]),
+		.open = cgroup_pressure_open,
 		.seq_show = cgroup_irq_pressure_show,
 		.write = cgroup_irq_pressure_write,
 		.poll = cgroup_pressure_poll,