Message ID | 20230922062558.1739642-1-max.kellermann@ionos.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] fs/ceph/debugfs: make all files world-readable |
On 9/22/23 14:25, Max Kellermann wrote:
> I'd like to be able to run metrics collector processes without special
> privileges.
>
> In the kernel, there is a mix of debugfs files being world-readable
> and not world-readable; with a naive "git grep", I found 723
> world-readable debugfs_create_file() calls and 582 calls which were
> only accessible to privileged processes.
>
> From the code, I cannot derive a consistent policy for that, but the
> ceph statistics seem harmless (and useful) enough.

I am not sure whether this will make sense, because the 'debug' directory
under '/sys/kernel/' is also only accessible by privileged processes.

Ilya, Jeff

Any idea?

Thanks

- Xiubo

> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
>  fs/ceph/debugfs.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> index 3904333fa6c3..2abee7e18144 100644
> --- a/fs/ceph/debugfs.c
> +++ b/fs/ceph/debugfs.c
> @@ -429,31 +429,31 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
>  						  name);
>
>  	fsc->debugfs_mdsmap = debugfs_create_file("mdsmap",
> -						  0400,
> +						  0444,
>  						  fsc->client->debugfs_dir,
>  						  fsc,
>  						  &mdsmap_fops);
>
>  	fsc->debugfs_mds_sessions = debugfs_create_file("mds_sessions",
> -							0400,
> +							0444,
>  							fsc->client->debugfs_dir,
>  							fsc,
>  							&mds_sessions_fops);
>
>  	fsc->debugfs_mdsc = debugfs_create_file("mdsc",
> -						0400,
> +						0444,
>  						fsc->client->debugfs_dir,
>  						fsc,
>  						&mdsc_fops);
>
>  	fsc->debugfs_caps = debugfs_create_file("caps",
> -						0400,
> +						0444,
>  						fsc->client->debugfs_dir,
>  						fsc,
>  						&caps_fops);
>
>  	fsc->debugfs_status = debugfs_create_file("status",
> -						  0400,
> +						  0444,
>  						  fsc->client->debugfs_dir,
>  						  fsc,
>  						  &status_fops);
> @@ -461,13 +461,13 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
>  	fsc->debugfs_metrics_dir = debugfs_create_dir("metrics",
>  						      fsc->client->debugfs_dir);
>
> -	debugfs_create_file("file", 0400, fsc->debugfs_metrics_dir, fsc,
> +	debugfs_create_file("file", 0444, fsc->debugfs_metrics_dir, fsc,
>  			    &metrics_file_fops);
> -	debugfs_create_file("latency", 0400, fsc->debugfs_metrics_dir, fsc,
> +	debugfs_create_file("latency", 0444, fsc->debugfs_metrics_dir, fsc,
>  			    &metrics_latency_fops);
> -	debugfs_create_file("size", 0400, fsc->debugfs_metrics_dir, fsc,
> +	debugfs_create_file("size", 0444, fsc->debugfs_metrics_dir, fsc,
>  			    &metrics_size_fops);
> -	debugfs_create_file("caps", 0400, fsc->debugfs_metrics_dir, fsc,
> +	debugfs_create_file("caps", 0444, fsc->debugfs_metrics_dir, fsc,
>  			    &metrics_caps_fops);
>  }
>
On Mon, 2023-09-25 at 13:18 +0800, Xiubo Li wrote:
> On 9/22/23 14:25, Max Kellermann wrote:
> > I'd like to be able to run metrics collector processes without special
> > privileges.
> >
> > In the kernel, there is a mix of debugfs files being world-readable
> > and not world-readable; with a naive "git grep", I found 723
> > world-readable debugfs_create_file() calls and 582 calls which were
> > only accessible to privileged processes.
> >
> > From the code, I cannot derive a consistent policy for that, but the
> > ceph statistics seem harmless (and useful) enough.
>
> I am not sure whether this will make sense. Because the 'debug' under
> '/sys/kernel/' is also only accessible by privileged processes.
>
> Ilya, Jeff
>
> Any idea?
>

Yeah, I don't think this makes much sense. At least on my machine:

# stat -c '%A' /sys/kernel/debug
drwx------

Without at least x permissions, an unprivileged user can't pathwalk
through there.

Max, how are you testing this?

> > Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> > ---
> >  fs/ceph/debugfs.c | 18 +++++++++---------
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
> > index 3904333fa6c3..2abee7e18144 100644
> > --- a/fs/ceph/debugfs.c
> > +++ b/fs/ceph/debugfs.c
> > @@ -429,31 +429,31 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
> >  						  name);
> >
> >  	fsc->debugfs_mdsmap = debugfs_create_file("mdsmap",
> > -						  0400,
> > +						  0444,
> >  						  fsc->client->debugfs_dir,
> >  						  fsc,
> >  						  &mdsmap_fops);
> >
> >  	fsc->debugfs_mds_sessions = debugfs_create_file("mds_sessions",
> > -							0400,
> > +							0444,
> >  							fsc->client->debugfs_dir,
> >  							fsc,
> >  							&mds_sessions_fops);
> >
> >  	fsc->debugfs_mdsc = debugfs_create_file("mdsc",
> > -						0400,
> > +						0444,
> >  						fsc->client->debugfs_dir,
> >  						fsc,
> >  						&mdsc_fops);
> >
> >  	fsc->debugfs_caps = debugfs_create_file("caps",
> > -						0400,
> > +						0444,
> >  						fsc->client->debugfs_dir,
> >  						fsc,
> >  						&caps_fops);
> >
> >  	fsc->debugfs_status = debugfs_create_file("status",
> > -						  0400,
> > +						  0444,
> >  						  fsc->client->debugfs_dir,
> >  						  fsc,
> >  						  &status_fops);
> > @@ -461,13 +461,13 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
> >  	fsc->debugfs_metrics_dir = debugfs_create_dir("metrics",
> >  						      fsc->client->debugfs_dir);
> >
> > -	debugfs_create_file("file", 0400, fsc->debugfs_metrics_dir, fsc,
> > +	debugfs_create_file("file", 0444, fsc->debugfs_metrics_dir, fsc,
> >  			    &metrics_file_fops);
> > -	debugfs_create_file("latency", 0400, fsc->debugfs_metrics_dir, fsc,
> > +	debugfs_create_file("latency", 0444, fsc->debugfs_metrics_dir, fsc,
> >  			    &metrics_latency_fops);
> > -	debugfs_create_file("size", 0400, fsc->debugfs_metrics_dir, fsc,
> > +	debugfs_create_file("size", 0444, fsc->debugfs_metrics_dir, fsc,
> >  			    &metrics_size_fops);
> > -	debugfs_create_file("caps", 0400, fsc->debugfs_metrics_dir, fsc,
> > +	debugfs_create_file("caps", 0444, fsc->debugfs_metrics_dir, fsc,
> >  			    &metrics_caps_fops);
> >  }
> >
On Fri, Sep 22, 2023 at 8:26 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> I'd like to be able to run metrics collector processes without special
> privileges.

Hi Max,

A word of caution about building metrics collectors based on debugfs
output: there are no stability guarantees. While the format won't be
changed just for the sake of change, of course, expect zero effort to
preserve backwards compatibility.

The latency metrics in particular are sent to the MDS in binary form
and are intended to be consumed through commands like "ceph fs top".
The debugfs files are there just for an occasional sneak peek (apart
from actual debugging).

Thanks,

                Ilya
On Mon, 2023-09-25 at 12:41 +0200, Ilya Dryomov wrote:
> On Fri, Sep 22, 2023 at 8:26 AM Max Kellermann <max.kellermann@ionos.com> wrote:
> >
> > I'd like to be able to run metrics collector processes without special
> > privileges
>
> Hi Max,
>
> A word of caution about building metrics collectors based on debugfs
> output: there are no stability guarantees. While the format won't be
> changed just for the sake of change of course, expect zero effort to
> preserve backwards compatibility.
>
> The latency metrics in particular are sent to the MDS in binary form
> and are intended to be consumed through commands like "ceph fs top".
> debugfs stuff is there just for an occasional sneak peek (apart from
> actual debugging).
>

FWIW, I wish we had gone with netlink for this functionality instead of
a seqfile. Lorenzo has been working with netlink for some similar
functionality with nfsd[1], and it's much nicer for this sort of thing.

[1]: https://lore.kernel.org/linux-nfs/ZQTM6l7NrsVHFoR5@lore-desk/T/#t
On Mon, Sep 25, 2023 at 7:18 AM Xiubo Li <xiubli@redhat.com> wrote:
> I am not sure whether this will make sense. Because the 'debug' under
> '/sys/kernel/' is also only accessible by privileged process.

Not exactly correct. It is by default accessible to processes that have
CAP_DAC_OVERRIDE, and additionally it is accessible to (unprivileged)
processes running as uid=0 (those two traits usually overlap).

But we don't want to run kernel-exporter as uid=0, and neither do we
want to give it CAP_DAC_OVERRIDE; both would be too much, since either
would affect much more than just (read) access to debugfs.

Instead, we mount debugfs with "gid=X,mode=0710". That way, we can give
(unprivileged) processes which are members of a certain group access to
debugfs, and we put our kernel-exporter process in that group.

We can use these mount options to change debugfs defaults, but if a
debugfs implementor (such as cephfs) decides to override these global
debugfs settings by passing stricter file permissions, we can't easily
override that. And that is what my patch is about: restore the ability
to override debugfs permissions with a mount option, as debugfs was
designed.

Max
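[Editor's note: the "gid=X,mode=0710" setup described above can be expressed as an fstab entry. The group id below is a hypothetical placeholder for illustration; debugfs accepts the uid, gid, and mode mount options mentioned in the message.]

```
# /etc/fstab sketch -- gid 2000 stands in for a hypothetical "metrics" group.
# mode=0710 gives that group execute-only (--x) on /sys/kernel/debug: its
# members can traverse into the tree without listing it, and the per-file
# modes (0400 vs 0444) then decide what they can actually read.
debugfs  /sys/kernel/debug  debugfs  gid=2000,mode=0710  0  0
```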
On Mon, Sep 25, 2023 at 12:42 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> A word of caution about building metrics collectors based on debugfs
> output: there are no stability guarantees. While the format won't be
> changed just for the sake of change of course, expect zero effort to
> preserve backwards compatibility.

Agreed, but there's nothing else. We have been using my patch for quite
some time, and it has been very useful.

Maybe we can discuss promoting these statistics to sysfs/proc? (The raw
numbers, not the existing aggregates, which are useless for any
practical purpose.)

> The latency metrics in particular are sent to the MDS in binary form
> and are intended to be consumed through commands like "ceph fs top".
> debugfs stuff is there just for an occasional sneak peek (apart from
> actual debugging).

I don't know the whole Ceph ecosystem that well, but "ceph" is a
command that is supposed to run on a Ceph server, not on a machine that
mounts a cephfs, right? If that's right, then this command is useless
for me.

Max
On Tue, Sep 26, 2023 at 8:16 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Mon, Sep 25, 2023 at 12:42 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > A word of caution about building metrics collectors based on debugfs
> > output: there are no stability guarantees. While the format won't be
> > changed just for the sake of change of course, expect zero effort to
> > preserve backwards compatibility.
>
> Agree, but there's nothing else. We have been using my patch for quite
> some time, and it has been very useful.
>
> Maybe we can discuss promoting these statistics to sysfs/proc? (the
> raw numbers, not the existing aggregates which are useless for any
> practical purpose)
>
> > The latency metrics in particular are sent to the MDS in binary form
> > and are intended to be consumed through commands like "ceph fs top".
> > debugfs stuff is there just for an occasional sneak peek (apart from
> > actual debugging).
>
> I don't know the whole Ceph ecosystem so well, but "ceph" is a command
> that is supposed to run on a Ceph server, and not on a machine that
> mounts a cephfs, right? If that's right, then this command is useless
> for me.

No, the "ceph" command (as well as "rbd", "rados", etc.) can be run
from anywhere -- it's just a matter of installing a package, which is
likely already installed unless you are mounting CephFS manually
without using the /sbin/mount.ceph mount helper.

Thanks,

                Ilya
On Tue, Sep 26, 2023 at 10:46 AM Ilya Dryomov <idryomov@gmail.com> wrote:
> No, "ceph" command (as well as "rbd", "rados", etc) can be run from
> anywhere -- it's just a matter of installing a package which is likely
> already installed unless you are mounting CephFS manually without using
> /sbin/mount.ceph mount helper.

I have never heard of that helper, so no, we're not using it - should we?

This "ceph" tool requires installing 90 MB of additional Debian
packages, which I just tried on a test cluster, and "ceph fs top" fails
with "Error initializing cluster client: ObjectNotFound('RADOS object
not found (error calling conf_read_file)')". Okay, so I have to
configure something... but I don't get why I would want to do that when
I can get the same information from the kernel without installing or
configuring anything. This sounds like overcomplicating the thing for
no reason.

Max
On Tue, Sep 26, 2023 at 11:09 AM Max Kellermann <max.kellermann@ionos.com> wrote:
>
> On Tue, Sep 26, 2023 at 10:46 AM Ilya Dryomov <idryomov@gmail.com> wrote:
> > No, "ceph" command (as well as "rbd", "rados", etc) can be run from
> > anywhere -- it's just a matter of installing a package which is likely
> > already installed unless you are mounting CephFS manually without using
> > /sbin/mount.ceph mount helper.
>
> I have never heard of that helper, so no, we're not using it - should we?

If you have figured out the right mount options, you might as well not.
The helper does things like determine whether v1 or v2 addresses should
be used, fetch the key and pass it via the kernel keyring (whereas you
are probably passing it verbatim on the command line), etc. It's the
same syscall in the end, so the helper is certainly not required.

> This "ceph" tool requires installing 90 MB of additional Debian
> packages, which I just tried on a test cluster, and "ceph fs top"
> fails with "Error initializing cluster client: ObjectNotFound('RADOS
> object not found (error calling conf_read_file)')". Okay, so I have to
> configure something.... but .... I don't get why I would want to do
> that, when I can get the same information from the kernel without
> installing or configuring anything. This sounds like overcomplexifying
> the thing for no reason.

I have relayed my understanding of this feature (or rather how it was
presented to me). I see where you are coming from, so adding more
CephFS folks to chime in.

Thanks,

                Ilya
On Wed, Sep 27, 2023 at 12:53 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > This "ceph" tool requires installing 90 MB of additional Debian
> > packages, which I just tried on a test cluster, and "ceph fs top"
> > fails with "Error initializing cluster client: ObjectNotFound('RADOS
> > object not found (error calling conf_read_file)')". Okay, so I have to
> > configure something.... but .... I don't get why I would want to do
> > that, when I can get the same information from the kernel without
> > installing or configuring anything. This sounds like overcomplexifying
> > the thing for no reason.
>
> I have relayed my understanding of this feature (or rather how it was
> presented to me). I see where you are coming from, so adding more
> CephFS folks to chime in.

Let me show these folks how badly "ceph fs perf stats" performs:

# time ceph fs perf stats
{"version": 2, "global_counters": ["cap_hit", "read_latency", "write_latency"[...]

real    0m0.502s
user    0m0.393s
sys     0m0.053s

Now my debugfs-based solution:

# time cat /sys/kernel/debug/ceph/*/metrics/latency
item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)
[...]

real    0m0.002s
user    0m0.002s
sys     0m0.001s

debugfs is more than 200 times faster. It is so fast that it can hardly
be measured by "time" - and most of these 2 ms is the overhead of
executing /bin/cat, not of actually reading the debugfs file.

Our kernel-exporter is a daemon process; it only needs a single pread()
system call in each iteration, so it has even less overhead.
Integrating the "ceph" tool instead would require forking the process
each time, starting a new Python VM, and so on. For obtaining real-time
latency statistics, the "ceph" script is the wrong tool for the job.

Max
diff --git a/fs/ceph/debugfs.c b/fs/ceph/debugfs.c
index 3904333fa6c3..2abee7e18144 100644
--- a/fs/ceph/debugfs.c
+++ b/fs/ceph/debugfs.c
@@ -429,31 +429,31 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 						  name);
 
 	fsc->debugfs_mdsmap = debugfs_create_file("mdsmap",
-						  0400,
+						  0444,
 						  fsc->client->debugfs_dir,
 						  fsc,
 						  &mdsmap_fops);
 
 	fsc->debugfs_mds_sessions = debugfs_create_file("mds_sessions",
-							0400,
+							0444,
 							fsc->client->debugfs_dir,
 							fsc,
 							&mds_sessions_fops);
 
 	fsc->debugfs_mdsc = debugfs_create_file("mdsc",
-						0400,
+						0444,
 						fsc->client->debugfs_dir,
 						fsc,
 						&mdsc_fops);
 
 	fsc->debugfs_caps = debugfs_create_file("caps",
-						0400,
+						0444,
 						fsc->client->debugfs_dir,
 						fsc,
 						&caps_fops);
 
 	fsc->debugfs_status = debugfs_create_file("status",
-						  0400,
+						  0444,
 						  fsc->client->debugfs_dir,
 						  fsc,
 						  &status_fops);
@@ -461,13 +461,13 @@ void ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
 	fsc->debugfs_metrics_dir = debugfs_create_dir("metrics",
 						      fsc->client->debugfs_dir);
 
-	debugfs_create_file("file", 0400, fsc->debugfs_metrics_dir, fsc,
+	debugfs_create_file("file", 0444, fsc->debugfs_metrics_dir, fsc,
 			    &metrics_file_fops);
-	debugfs_create_file("latency", 0400, fsc->debugfs_metrics_dir, fsc,
+	debugfs_create_file("latency", 0444, fsc->debugfs_metrics_dir, fsc,
 			    &metrics_latency_fops);
-	debugfs_create_file("size", 0400, fsc->debugfs_metrics_dir, fsc,
+	debugfs_create_file("size", 0444, fsc->debugfs_metrics_dir, fsc,
 			    &metrics_size_fops);
-	debugfs_create_file("caps", 0400, fsc->debugfs_metrics_dir, fsc,
+	debugfs_create_file("caps", 0444, fsc->debugfs_metrics_dir, fsc,
 			    &metrics_caps_fops);
 }
I'd like to be able to run metrics collector processes without special
privileges.

In the kernel, there is a mix of debugfs files being world-readable and
not world-readable; with a naive "git grep", I found 723 world-readable
debugfs_create_file() calls and 582 calls which were only accessible to
privileged processes.

From the code, I cannot derive a consistent policy for that, but the
ceph statistics seem harmless (and useful) enough.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/ceph/debugfs.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)