Message ID | 20210701015444.ZOZaFPX0b%akpm@linux-foundation.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [001/192] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c | expand |
On Wed, Jun 30, 2021 at 06:54:44PM -0700, Andrew Morton wrote: > From: Kalesh Singh <kaleshsingh@google.com> > Subject: procfs: allow reading fdinfo with PTRACE_MODE_READ > > Android captures per-process system memory state when certain low memory > events (e.g a foreground app kill) occur, to identify potential memory > hoggers. In order to measure how much memory a process actually consumes, > it is necessary to include the DMA buffer sizes for that process in the > memory accounting. Since the handle to DMA buffers are raw FDs, it is > important to be able to identify which processes have FD references to a > DMA buffer. > > Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and > /proc/<pid>/fdinfo -- both are only readable by the process owner, as > follows: > > 1. Do a readlink on each FD. > 2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD. > 3. stat the file to get the dmabuf inode number. > 4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size. > > Accessing other processes' fdinfo requires root privileges. This limits > the use of the interface to debugging environments and is not suitable for > production builds. Granting root privileges even to a system process > increases the attack surface and is highly undesirable. > > Since fdinfo doesn't permit reading process memory and manipulating > process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED. > > Link: https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com > Signed-off-by: Kalesh Singh <kaleshsingh@google.com> > Suggested-by: Jann Horn <jannh@google.com> > Acked-by: Christian König <christian.koenig@amd.com> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: Alexey Dobriyan <adobriyan@gmail.com> > Cc: Alexey Gladkov <gladkov.alexey@gmail.com> > Cc: Andrei Vagin <avagin@gmail.com> > Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> > Cc: Christian Brauner <christian.brauner@ubuntu.com> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: Helge Deller <deller@gmx.de> > Cc: Hridya Valsaraju <hridya@google.com> > Cc: James Morris <jamorris@linux.microsoft.com> > Cc: Jeff Vander Stoep <jeffv@google.com> > Cc: Jonathan Corbet <corbet@lwn.net> > Cc: Kees Cook <keescook@chromium.org> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Michel Lespinasse <walken@google.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: Randy Dunlap <rdunlap@infradead.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- Rather useful (also for CRIU and others). Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
On Wed, Jun 30, 2021 at 06:54:44PM -0700, Andrew Morton wrote: > From: Kalesh Singh <kaleshsingh@google.com> > Subject: procfs: allow reading fdinfo with PTRACE_MODE_READ > > Android captures per-process system memory state when certain low memory > events (e.g a foreground app kill) occur, to identify potential memory > hoggers. In order to measure how much memory a process actually consumes, > it is necessary to include the DMA buffer sizes for that process in the > memory accounting. Since the handle to DMA buffers are raw FDs, it is > important to be able to identify which processes have FD references to a > DMA buffer. > > Currently, DMA buffer FDs can be accounted using /proc/<pid>/fd/* and > /proc/<pid>/fdinfo -- both are only readable by the process owner, as > follows: > > 1. Do a readlink on each FD. > 2. If the target path begins with "/dmabuf", then the FD is a dmabuf FD. > 3. stat the file to get the dmabuf inode number. > 4. Read/ proc/<pid>/fdinfo/<fd>, to get the DMA buffer size. > > Accessing other processes' fdinfo requires root privileges. This limits > the use of the interface to debugging environments and is not suitable for > production builds. Granting root privileges even to a system process > increases the attack surface and is highly undesirable. > > Since fdinfo doesn't permit reading process memory and manipulating > process state, allow accessing fdinfo under PTRACE_MODE_READ_FSCRED. > > Link: https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com > Signed-off-by: Kalesh Singh <kaleshsingh@google.com> > Suggested-by: Jann Horn <jannh@google.com> > Acked-by: Christian König <christian.koenig@amd.com> > Cc: Alexander Viro <viro@zeniv.linux.org.uk> > Cc: Alexey Dobriyan <adobriyan@gmail.com> > Cc: Alexey Gladkov <gladkov.alexey@gmail.com> > Cc: Andrei Vagin <avagin@gmail.com> > Cc: Bernd Edlinger <bernd.edlinger@hotmail.de> > Cc: Christian Brauner <christian.brauner@ubuntu.com> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: Helge Deller <deller@gmx.de> > Cc: Hridya Valsaraju <hridya@google.com> > Cc: James Morris <jamorris@linux.microsoft.com> > Cc: Jeff Vander Stoep <jeffv@google.com> > Cc: Jonathan Corbet <corbet@lwn.net> > Cc: Kees Cook <keescook@chromium.org> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Michel Lespinasse <walken@google.com> > Cc: Minchan Kim <minchan@kernel.org> > Cc: Randy Dunlap <rdunlap@infradead.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > fs/proc/base.c | 4 ++-- > fs/proc/fd.c | 15 ++++++++++++++- > 2 files changed, 16 insertions(+), 3 deletions(-) > > --- a/fs/proc/base.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read > +++ a/fs/proc/base.c > @@ -3172,7 +3172,7 @@ static const struct pid_entry tgid_base_ > DIR("task", S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations), > DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations), > DIR("map_files", S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations), > - DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations), > + DIR("fdinfo", S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations), > DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations), > #ifdef CONFIG_NET > DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations), > @@ -3517,7 +3517,7 @@ static const struct inode_operations pro > */ > static const struct pid_entry tid_base_stuff[] = { > DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations), > - DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations), > + DIR("fdinfo", S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations), > DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations), > #ifdef CONFIG_NET > DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations), > --- a/fs/proc/fd.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read > +++ a/fs/proc/fd.c > @@ -6,6 +6,7 @@ > #include <linux/fdtable.h> > #include <linux/namei.h> > #include <linux/pid.h> > +#include <linux/ptrace.h> > #include <linux/security.h> > #include <linux/file.h> > #include <linux/seq_file.h> > @@ -72,6 +73,18 @@ out: > > static int seq_fdinfo_open(struct inode *inode, struct file *file) > { > + bool allowed = false; > + struct task_struct *task = get_proc_task(inode); > + > + if (!task) > + return -ESRCH; > + > + allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); > + put_task_struct(task); > + > + if (!allowed) > + return -EACCES; Uhm, this is only checked in open(), and never again? Is this safe in the face of exec or pid re-use? -Kees > + > return single_open(file, seq_show, inode); > } > > @@ -308,7 +321,7 @@ static struct dentry *proc_fdinfo_instan > struct proc_inode *ei; > struct inode *inode; > > - inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR); > + inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUGO); > if (!inode) > return ERR_PTR(-ENOENT); > > _
On Fri, Jul 2, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote: > > Uhm, this is only checked in open(), and never again? Is this safe in > the face of exec or pid re-use? Interesting question, but not really all that valid for this particular patch. Why? Because we already only check for owner permissions on open, and never again. So if we have fdinfo issues across a suid exec or pid re-use, they are pre-existing.. But yes, it would probably be a good idea to think about readdir() on that directory. If somebody reminds me after the merge window is over, I'll come back to this, but if somebody else wants to think about it before then, that would be great. Linus
Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, Jul 2, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote: >> >> Uhm, this is only checked in open(), and never again? Is this safe in >> the face of exec or pid re-use? Exec does not change the file descriptor table. The open holds a reference to the proc inode. The proc inode holds the struct pid of the task and the file descriptor number. References using struct pid do not suffer from userspace pid rollover issues. So the only issue I see is file descriptor reuse after an exec, that changes the processes struct cred. Assuming we care it would probably be worth a bug fix patch to check something. Eric
On Fri, Jul 02, 2021 at 03:40:49PM -0500, Eric W. Biederman wrote: > Linus Torvalds <torvalds@linux-foundation.org> writes: > > > On Fri, Jul 2, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote: > >> > >> Uhm, this is only checked in open(), and never again? Is this safe in > >> the face of exec or pid re-use? > > Exec does not change the file descriptor table. Ah yeah, good point. I've been thinking too much about vmas. > The open holds a reference to the proc inode. The proc inode holds the > struct pid of the task and the file descriptor number. References using > struct pid do not suffer from userspace pid rollover issues. Okay, cool. > So the only issue I see is file descriptor reuse after an exec, > that changes the processes struct cred. Right -- the info leak would be snooping on what a privileged process was doing with a given fd? Similar stuff has been used to do typing pattern analysis with login passwords, but that's a stretch here, I think. Hmm. > Assuming we care it would probably be worth a bug fix patch to check > something. Sounds good.
On Fri, Jul 2, 2021 at 4:31 PM Kees Cook <keescook@chromium.org> wrote: > > Right -- the info leak would be snooping on what a privileged process > was doing with a given fd? Similar stuff has been used to do typing > pattern analysis with login passwords, but that's a stretch here, I > think. Hmm. So I think you'd see the directory list, but generally that's just the file descriptor numbers. Which is information you shouldn't have access to, but it's probably not very *interesting* information. I think it would be worth fixing but possibly not a very high priority. Linus
Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, Jul 2, 2021 at 4:31 PM Kees Cook <keescook@chromium.org> wrote: >> >> Right -- the info leak would be snooping on what a privileged process >> was doing with a given fd? Similar stuff has been used to do typing >> pattern analysis with login passwords, but that's a stretch here, I >> think. Hmm. > > So I think you'd see the directory list, but generally that's just the > file descriptor numbers. > > Which is information you shouldn't have access to, but it's probably > not very *interesting* information. > > I think it would be worth fixing but possibly not a very high > priority. It is not just the directory whose permission changed but the individual files in that directory. You can also see the position, flags, mnt_id, and soon inode number of fdinfo files you open before a suid exec. Knowing what file someone is reading on a particular file descriptor number and how far they are in reading that file sounds like a side channel someone can do something with. Eric
--- a/fs/proc/base.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read +++ a/fs/proc/base.c @@ -3172,7 +3172,7 @@ static const struct pid_entry tgid_base_ DIR("task", S_IRUGO|S_IXUGO, proc_task_inode_operations, proc_task_operations), DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations), DIR("map_files", S_IRUSR|S_IXUSR, proc_map_files_inode_operations, proc_map_files_operations), - DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations), + DIR("fdinfo", S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations), DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations), #ifdef CONFIG_NET DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations), @@ -3517,7 +3517,7 @@ static const struct inode_operations pro */ static const struct pid_entry tid_base_stuff[] = { DIR("fd", S_IRUSR|S_IXUSR, proc_fd_inode_operations, proc_fd_operations), - DIR("fdinfo", S_IRUSR|S_IXUSR, proc_fdinfo_inode_operations, proc_fdinfo_operations), + DIR("fdinfo", S_IRUGO|S_IXUGO, proc_fdinfo_inode_operations, proc_fdinfo_operations), DIR("ns", S_IRUSR|S_IXUGO, proc_ns_dir_inode_operations, proc_ns_dir_operations), #ifdef CONFIG_NET DIR("net", S_IRUGO|S_IXUGO, proc_net_inode_operations, proc_net_operations), --- a/fs/proc/fd.c~procfs-allow-reading-fdinfo-with-ptrace_mode_read +++ a/fs/proc/fd.c @@ -6,6 +6,7 @@ #include <linux/fdtable.h> #include <linux/namei.h> #include <linux/pid.h> +#include <linux/ptrace.h> #include <linux/security.h> #include <linux/file.h> #include <linux/seq_file.h> @@ -72,6 +73,18 @@ out: static int seq_fdinfo_open(struct inode *inode, struct file *file) { + bool allowed = false; + struct task_struct *task = get_proc_task(inode); + + if (!task) + return -ESRCH; + + allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); + put_task_struct(task); + + if (!allowed) + return -EACCES; + return single_open(file, seq_show, inode); } @@ -308,7 +321,7 @@ static struct dentry *proc_fdinfo_instan struct proc_inode *ei; struct inode *inode; - inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUSR); + inode = proc_pid_make_inode(dentry->d_sb, task, S_IFREG | S_IRUGO); if (!inode) return ERR_PTR(-ENOENT);