Message ID | 20210111170622.2613577-1-surenb@google.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Paul Moore |
Headers | show |
Series | [v2,1/1] mm/madvise: replace ptrace attach requirement for process_madvise | expand |
On Mon, Jan 11, 2021 at 09:06:22AM -0800, Suren Baghdasaryan wrote: > process_madvise currently requires ptrace attach capability. > PTRACE_MODE_ATTACH gives one process complete control over another > process. It effectively removes the security boundary between the > two processes (in one direction). Granting ptrace attach capability > even to a system process is considered dangerous since it creates an > attack surface. This severely limits the usage of this API. > The operations process_madvise can perform do not affect the correctness > of the operation of the target process; they only affect where the data > is physically located (and therefore, how fast it can be accessed). > What we want is the ability for one process to influence another process > in order to optimize performance across the entire system while leaving > the security boundary intact. > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > and CAP_SYS_NICE for influencing process performance. > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org>
On Mon, 11 Jan 2021 09:06:22 -0800 Suren Baghdasaryan <surenb@google.com> wrote: > process_madvise currently requires ptrace attach capability. > PTRACE_MODE_ATTACH gives one process complete control over another > process. It effectively removes the security boundary between the > two processes (in one direction). Granting ptrace attach capability > even to a system process is considered dangerous since it creates an > attack surface. This severely limits the usage of this API. > The operations process_madvise can perform do not affect the correctness > of the operation of the target process; they only affect where the data > is physically located (and therefore, how fast it can be accessed). > What we want is the ability for one process to influence another process > in order to optimize performance across the entire system while leaving > the security boundary intact. > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > and CAP_SYS_NICE for influencing process performance. It would be useful to see the proposed manpage update. process_madvise() was released in 5.10, so this is a non-backward-compatible change to a released kernel. I think it would be OK at this stage to feed this into 5.10.x with a cc:stable and suitable words in the changelog explaining why we're doing this. Alternatively we could retain PTRACE_MODE_ATTACH's behaviour and add PTRACE_MODE_READ&CAP_SYS_NICE alongside that.
On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > process_madvise currently requires ptrace attach capability. > PTRACE_MODE_ATTACH gives one process complete control over another > process. It effectively removes the security boundary between the > two processes (in one direction). Granting ptrace attach capability > even to a system process is considered dangerous since it creates an > attack surface. This severely limits the usage of this API. > The operations process_madvise can perform do not affect the correctness > of the operation of the target process; they only affect where the data > is physically located (and therefore, how fast it can be accessed). Yes it doesn't influence the correctness but it is still a very sensitive operation because it can allow a targeted side channel timing attacks so we should be really careful. > What we want is the ability for one process to influence another process > in order to optimize performance across the entire system while leaving > the security boundary intact. > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > and CAP_SYS_NICE for influencing process performance. I have to say that ptrace modes are rather obscure to me. So I cannot really judge whether MODE_READ is sufficient. My understanding has always been that this is requred to RO access to the address space. But this operation clearly has a visible side effect. Do we have any actual documentation for the existing modes? I would be really curious to hear from Jann and Oleg (now Cced). Is CAP_SYS_NICE requirement really necessary? > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > Acked-by: Minchan Kim <minchan@kernel.org> > Acked-by: David Rientjes <rientjes@google.com> > --- > mm/madvise.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 6a660858784b..a9bcd16b5d95 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, > goto release_task; > } > > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS); > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */ > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS); > if (IS_ERR_OR_NULL(mm)) { > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH; > goto release_task; > } > > + /* > + * Require CAP_SYS_NICE for influencing process performance. Note that > + * only non-destructive hints are currently supported. > + */ > + if (!capable(CAP_SYS_NICE)) { > + ret = -EPERM; > + goto release_mm; > + } > + > total_len = iov_iter_count(&iter); > > while (iov_iter_count(&iter)) { > @@ -1217,6 +1227,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, > if (ret == 0) > ret = total_len - iov_iter_count(&iter); > > +release_mm: > mmput(mm); > release_task: > put_task_struct(task); > -- > 2.30.0.284.gd98b1dd5eaa7-goog >
On Mon, Jan 11, 2021 at 5:22 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Mon, 11 Jan 2021 09:06:22 -0800 Suren Baghdasaryan <surenb@google.com> wrote: > > > process_madvise currently requires ptrace attach capability. > > PTRACE_MODE_ATTACH gives one process complete control over another > > process. It effectively removes the security boundary between the > > two processes (in one direction). Granting ptrace attach capability > > even to a system process is considered dangerous since it creates an > > attack surface. This severely limits the usage of this API. > > The operations process_madvise can perform do not affect the correctness > > of the operation of the target process; they only affect where the data > > is physically located (and therefore, how fast it can be accessed). > > What we want is the ability for one process to influence another process > > in order to optimize performance across the entire system while leaving > > the security boundary intact. > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > and CAP_SYS_NICE for influencing process performance. > > It would be useful to see the proposed manpage update. > > process_madvise() was released in 5.10, so this is a > non-backward-compatible change to a released kernel. > > I think it would be OK at this stage to feed this into 5.10.x with a > cc:stable and suitable words in the changelog explaining why we're > doing this. Sure, I will post another patchset that will include manpage update and will CC:stable. That's of course after Michal's concerns are addressed. Thanks! > > Alternatively we could retain PTRACE_MODE_ATTACH's behaviour and add > PTRACE_MODE_READ&CAP_SYS_NICE alongside that.
On 01/12, Michal Hocko wrote: > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > What we want is the ability for one process to influence another process > > in order to optimize performance across the entire system while leaving > > the security boundary intact. > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > and CAP_SYS_NICE for influencing process performance. > > I have to say that ptrace modes are rather obscure to me. So I cannot > really judge whether MODE_READ is sufficient. My understanding has > always been that this is requred to RO access to the address space. But > this operation clearly has a visible side effect. Do we have any actual > documentation for the existing modes? > > I would be really curious to hear from Jann and Oleg (now Cced). Can't comment, sorry. I never understood these security checks and never tried. IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what is the difference. Oleg.
On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > On 01/12, Michal Hocko wrote: > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > What we want is the ability for one process to influence another process > > > in order to optimize performance across the entire system while leaving > > > the security boundary intact. > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > and CAP_SYS_NICE for influencing process performance. > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > really judge whether MODE_READ is sufficient. My understanding has > > always been that this is requred to RO access to the address space. But > > this operation clearly has a visible side effect. Do we have any actual > > documentation for the existing modes? > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > Can't comment, sorry. I never understood these security checks and never tried. > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > is the difference. I haven't seen a written explanation on ptrace modes but when I consulted Jann his explanation was: PTRACE_MODE_READ means you can inspect metadata about processes with the specified domain, across UID boundaries. PTRACE_MODE_ATTACH means you can fully impersonate processes with the specified domain, across UID boundaries. He did agree that in this case PTRACE_MODE_ATTACH seems too restrictive (we do not try to gain full control or impersonate a process) and PTRACE_MODE_READ is a better choice. > > Oleg. >
On Mon, Jan 11, 2021 at 11:46 PM Michal Hocko <mhocko@suse.com> wrote: > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > process_madvise currently requires ptrace attach capability. > > PTRACE_MODE_ATTACH gives one process complete control over another > > process. It effectively removes the security boundary between the > > two processes (in one direction). Granting ptrace attach capability > > even to a system process is considered dangerous since it creates an > > attack surface. This severely limits the usage of this API. > > The operations process_madvise can perform do not affect the correctness > > of the operation of the target process; they only affect where the data > > is physically located (and therefore, how fast it can be accessed). > > Yes it doesn't influence the correctness but it is still a very > sensitive operation because it can allow a targeted side channel timing > attacks so we should be really careful. Sorry, I missed this comment in my answer. Possibility of affecting the target's performance including side channel attack is why we require CAP_SYS_NICE. > > > What we want is the ability for one process to influence another process > > in order to optimize performance across the entire system while leaving > > the security boundary intact. > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > and CAP_SYS_NICE for influencing process performance. > > I have to say that ptrace modes are rather obscure to me. So I cannot > really judge whether MODE_READ is sufficient. My understanding has > always been that this is requred to RO access to the address space. But > this operation clearly has a visible side effect. Do we have any actual > documentation for the existing modes? > > I would be really curious to hear from Jann and Oleg (now Cced). > > Is CAP_SYS_NICE requirement really necessary? > > > Signed-off-by: Suren Baghdasaryan <surenb@google.com> > > Acked-by: Minchan Kim <minchan@kernel.org> > > Acked-by: David Rientjes <rientjes@google.com> > > --- > > mm/madvise.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 6a660858784b..a9bcd16b5d95 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, > > goto release_task; > > } > > > > - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS); > > + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */ > > + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS); > > if (IS_ERR_OR_NULL(mm)) { > > ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH; > > goto release_task; > > } > > > > + /* > > + * Require CAP_SYS_NICE for influencing process performance. Note that > > + * only non-destructive hints are currently supported. > > + */ > > + if (!capable(CAP_SYS_NICE)) { > > + ret = -EPERM; > > + goto release_mm; > > + } > > + > > total_len = iov_iter_count(&iter); > > > > while (iov_iter_count(&iter)) { > > @@ -1217,6 +1227,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, > > if (ret == 0) > > ret = total_len - iov_iter_count(&iter); > > > > +release_mm: > > mmput(mm); > > release_task: > > put_task_struct(task); > > -- > > 2.30.0.284.gd98b1dd5eaa7-goog > > > > -- > Michal Hocko > SUSE Labs
On Tue 12-01-21 10:12:03, Suren Baghdasaryan wrote: > On Mon, Jan 11, 2021 at 11:46 PM Michal Hocko <mhocko@suse.com> wrote: > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > process_madvise currently requires ptrace attach capability. > > > PTRACE_MODE_ATTACH gives one process complete control over another > > > process. It effectively removes the security boundary between the > > > two processes (in one direction). Granting ptrace attach capability > > > even to a system process is considered dangerous since it creates an > > > attack surface. This severely limits the usage of this API. > > > The operations process_madvise can perform do not affect the correctness > > > of the operation of the target process; they only affect where the data > > > is physically located (and therefore, how fast it can be accessed). > > > > Yes it doesn't influence the correctness but it is still a very > > sensitive operation because it can allow a targeted side channel timing > > attacks so we should be really careful. > > Sorry, I missed this comment in my answer. Possibility of affecting > the target's performance including side channel attack is why we > require CAP_SYS_NICE. OK. It would be really good to document that in the man page. From the current wording it seems we already rely on this cap for migration on a remote process which is not the same thing but it roughly falls into the similar category.
On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > On 01/12, Michal Hocko wrote: > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > What we want is the ability for one process to influence another process > > > > in order to optimize performance across the entire system while leaving > > > > the security boundary intact. > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > really judge whether MODE_READ is sufficient. My understanding has > > > always been that this is requred to RO access to the address space. But > > > this operation clearly has a visible side effect. Do we have any actual > > > documentation for the existing modes? > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > Can't comment, sorry. I never understood these security checks and never tried. > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > is the difference. > > I haven't seen a written explanation on ptrace modes but when I > consulted Jann his explanation was: > > PTRACE_MODE_READ means you can inspect metadata about processes with > the specified domain, across UID boundaries. > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > specified domain, across UID boundaries. Maybe this would be a good start to document expectations. Some more practical examples where the difference is visible would be great as well. > He did agree that in this case PTRACE_MODE_ATTACH seems too > restrictive (we do not try to gain full control or impersonate a > process) and PTRACE_MODE_READ is a better choice. All that being said, I am not against the changed behavior but I do not feel competent to give an ack.
On Wed, Jan 13, 2021 at 6:22 AM Michal Hocko <mhocko@suse.com> wrote: > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > What we want is the ability for one process to influence another process > > > > > in order to optimize performance across the entire system while leaving > > > > > the security boundary intact. > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > always been that this is requred to RO access to the address space. But > > > > this operation clearly has a visible side effect. Do we have any actual > > > > documentation for the existing modes? > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > is the difference. > > > > I haven't seen a written explanation on ptrace modes but when I > > consulted Jann his explanation was: > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > the specified domain, across UID boundaries. > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > specified domain, across UID boundaries. > > Maybe this would be a good start to document expectations. Some more > practical examples where the difference is visible would be great as > well. I'll do my best but I'm also not a security expert. Will post the next version with a draft for the man page (this syscall does not have a man page yet AFAIKT) and we can iterate on the wording there. > > He did agree that in this case PTRACE_MODE_ATTACH seems too > > restrictive (we do not try to gain full control or impersonate a > > process) and PTRACE_MODE_READ is a better choice. > > All that being said, I am not against the changed behavior but I do not > feel competent to give an ack. Great. SOunds like the only missing piece is the man page with more details. I'll work on it but since it's the first time I will be contributing to man pages it might take me a couple days. Thanks everyone for the reviews! > -- > Michal Hocko > SUSE Labs
On Mon, 11 Jan 2021, Suren Baghdasaryan wrote: > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > and CAP_SYS_NICE for influencing process performance. Almost missed these -- please cc the LSM mailing list when modifying capabilities or other LSM-related things.
On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > What we want is the ability for one process to influence another process > > > > > in order to optimize performance across the entire system while leaving > > > > > the security boundary intact. > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > always been that this is requred to RO access to the address space. But > > > > this operation clearly has a visible side effect. Do we have any actual > > > > documentation for the existing modes? > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > is the difference. Yama in particular only does its checks on ATTACH and ignores READ, that's the difference you're probably most likely to encounter on a normal desktop system, since some distros turn Yama on by default. Basically the idea there is that running "gdb -p $pid" or "strace -p $pid" as a normal user will usually fail, but reading /proc/$pid/maps still works; so you can see things like detailed memory usage information and such, but you're not supposed to be able to directly peek into a running SSH client and inject data into the existing SSH connection, or steal the cryptographic keys for the current connection, or something like that. > > I haven't seen a written explanation on ptrace modes but when I > > consulted Jann his explanation was: > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > the specified domain, across UID boundaries. > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > specified domain, across UID boundaries. > > Maybe this would be a good start to document expectations. Some more > practical examples where the difference is visible would be great as > well. Before documenting the behavior, it would be a good idea to figure out what to do with perf_event_open(). That one's weird in that it only requires PTRACE_MODE_READ, but actually allows you to sample stuff like userspace stack and register contents (if perf_event_paranoid is 1 or 2). Maybe for SELinux things (and maybe also for Yama), there should be a level in between that allows fully inspecting the process (for purposes like profiling) but without the ability to corrupt its memory or registers or things like that. Or maybe perf_event_open() should just use the ATTACH mode.
On Tue, Jan 19, 2021 at 9:02 PM James Morris <jmorris@namei.org> wrote: > > On Mon, 11 Jan 2021, Suren Baghdasaryan wrote: > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > and CAP_SYS_NICE for influencing process performance. > > > Almost missed these -- please cc the LSM mailing list when modifying > capabilities or other LSM-related things. Thanks for the note. Will definitely include it when sending the next version. > > -- > James Morris > <jmorris@namei.org> > > -- > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. >
On Wed, Jan 20, 2021 at 5:18 AM Jann Horn <jannh@google.com> wrote: > > On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > > > What we want is the ability for one process to influence another process > > > > > > in order to optimize performance across the entire system while leaving > > > > > > the security boundary intact. > > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > > always been that this is requred to RO access to the address space. But > > > > > this operation clearly has a visible side effect. Do we have any actual > > > > > documentation for the existing modes? > > > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > > is the difference. > > Yama in particular only does its checks on ATTACH and ignores READ, > that's the difference you're probably most likely to encounter on a > normal desktop system, since some distros turn Yama on by default. > Basically the idea there is that running "gdb -p $pid" or "strace -p > $pid" as a normal user will usually fail, but reading /proc/$pid/maps > still works; so you can see things like detailed memory usage > information and such, but you're not supposed to be able to directly > peek into a running SSH client and inject data into the existing SSH > connection, or steal the cryptographic keys for the current > connection, or something like that. > > > > I haven't seen a written explanation on ptrace modes but when I > > > consulted Jann his explanation was: > > > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > > the specified domain, across UID boundaries. > > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > > specified domain, across UID boundaries. > > > > Maybe this would be a good start to document expectations. Some more > > practical examples where the difference is visible would be great as > > well. > > Before documenting the behavior, it would be a good idea to figure out > what to do with perf_event_open(). That one's weird in that it only > requires PTRACE_MODE_READ, but actually allows you to sample stuff > like userspace stack and register contents (if perf_event_paranoid is > 1 or 2). Maybe for SELinux things (and maybe also for Yama), there > should be a level in between that allows fully inspecting the process > (for purposes like profiling) but without the ability to corrupt its > memory or registers or things like that. Or maybe perf_event_open() > should just use the ATTACH mode. Thanks for additional clarifications, Jann! Just to clarify, the documentation I'm preparing is a man page for process_madvise(2) which will list the required capabilities but won't dive into all the security details. I believe the above suggestions are for documenting different PTRACE modes and will not be included in that man page. Maybe a separate document could do that but I'm definitely not qualified to write it.
On Wed, Jan 20, 2021 at 8:57 AM Suren Baghdasaryan <surenb@google.com> wrote: > > On Wed, Jan 20, 2021 at 5:18 AM Jann Horn <jannh@google.com> wrote: > > > > On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > > > > > What we want is the ability for one process to influence another process > > > > > > > in order to optimize performance across the entire system while leaving > > > > > > > the security boundary intact. > > > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > > > always been that this is requred to RO access to the address space. But > > > > > > this operation clearly has a visible side effect. Do we have any actual > > > > > > documentation for the existing modes? > > > > > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > > > is the difference. > > > > Yama in particular only does its checks on ATTACH and ignores READ, > > that's the difference you're probably most likely to encounter on a > > normal desktop system, since some distros turn Yama on by default. > > Basically the idea there is that running "gdb -p $pid" or "strace -p > > $pid" as a normal user will usually fail, but reading /proc/$pid/maps > > still works; so you can see things like detailed memory usage > > information and such, but you're not supposed to be able to directly > > peek into a running SSH client and inject data into the existing SSH > > connection, or steal the cryptographic keys for the current > > connection, or something like that. > > > > > > I haven't seen a written explanation on ptrace modes but when I > > > > consulted Jann his explanation was: > > > > > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > > > the specified domain, across UID boundaries. > > > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > > > specified domain, across UID boundaries. > > > > > > Maybe this would be a good start to document expectations. Some more > > > practical examples where the difference is visible would be great as > > > well. > > > > Before documenting the behavior, it would be a good idea to figure out > > what to do with perf_event_open(). That one's weird in that it only > > requires PTRACE_MODE_READ, but actually allows you to sample stuff > > like userspace stack and register contents (if perf_event_paranoid is > > 1 or 2). Maybe for SELinux things (and maybe also for Yama), there > > should be a level in between that allows fully inspecting the process > > (for purposes like profiling) but without the ability to corrupt its > > memory or registers or things like that. Or maybe perf_event_open() > > should just use the ATTACH mode. > > Thanks for additional clarifications, Jann! > Just to clarify, the documentation I'm preparing is a man page for > process_madvise(2) which will list the required capabilities but won't > dive into all the security details. > I believe the above suggestions are for documenting different PTRACE > modes and will not be included in that man page. Maybe a separate > document could do that but I'm definitely not qualified to write it. Folks, I posted the man page here: https://lore.kernel.org/linux-mm/20210120202337.1481402-1-surenb@google.com/ Also I realized that this patch is not changing at all and if I send a new version, the only difference would be CC'ing it to stable and linux-security-module. I'm CC'ing stable (James already CC'ed LSM), but if I should re-post it please let me know. Cc: stable@vger.kernel.org # 5.10+
On Wed 20-01-21 14:17:39, Jann Horn wrote: > On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > > > What we want is the ability for one process to influence another process > > > > > > in order to optimize performance across the entire system while leaving > > > > > > the security boundary intact. > > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > > always been that this is requred to RO access to the address space. But > > > > > this operation clearly has a visible side effect. Do we have any actual > > > > > documentation for the existing modes? > > > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > > is the difference. > > Yama in particular only does its checks on ATTACH and ignores READ, > that's the difference you're probably most likely to encounter on a > normal desktop system, since some distros turn Yama on by default. > Basically the idea there is that running "gdb -p $pid" or "strace -p > $pid" as a normal user will usually fail, but reading /proc/$pid/maps > still works; so you can see things like detailed memory usage > information and such, but you're not supposed to be able to directly > peek into a running SSH client and inject data into the existing SSH > connection, or steal the cryptographic keys for the current > connection, or something like that. > > > > I haven't seen a written explanation on ptrace modes but when I > > > consulted Jann his explanation was: > > > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > > the specified domain, across UID boundaries. > > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > > specified domain, across UID boundaries. > > > > Maybe this would be a good start to document expectations. Some more > > practical examples where the difference is visible would be great as > > well. > > Before documenting the behavior, it would be a good idea to figure out > what to do with perf_event_open(). That one's weird in that it only > requires PTRACE_MODE_READ, but actually allows you to sample stuff > like userspace stack and register contents (if perf_event_paranoid is > 1 or 2). Maybe for SELinux things (and maybe also for Yama), there > should be a level in between that allows fully inspecting the process > (for purposes like profiling) but without the ability to corrupt its > memory or registers or things like that. Or maybe perf_event_open() > should just use the ATTACH mode. Thanks for the clarification. I still cannot say I would have a good mental picture. Having something in Documentation/core-api/ sounds really needed. Wrt to perf_event_open it sounds really odd it can do more than other places restrict indeed. Something for the respective maintainer but I strongly suspect people simply copy the pattern from other places because the expected semantic is not really clear.
On Tue, Jan 26, 2021 at 5:52 AM 'Michal Hocko' via kernel-team <kernel-team@android.com> wrote: > > On Wed 20-01-21 14:17:39, Jann Horn wrote: > > On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > > > > > What we want is the ability for one process to influence another process > > > > > > > in order to optimize performance across the entire system while leaving > > > > > > > the security boundary intact. > > > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > > > always been that this is requred to RO access to the address space. But > > > > > > this operation clearly has a visible side effect. Do we have any actual > > > > > > documentation for the existing modes? > > > > > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > > > is the difference. > > > > Yama in particular only does its checks on ATTACH and ignores READ, > > that's the difference you're probably most likely to encounter on a > > normal desktop system, since some distros turn Yama on by default. > > Basically the idea there is that running "gdb -p $pid" or "strace -p > > $pid" as a normal user will usually fail, but reading /proc/$pid/maps > > still works; so you can see things like detailed memory usage > > information and such, but you're not supposed to be able to directly > > peek into a running SSH client and inject data into the existing SSH > > connection, or steal the cryptographic keys for the current > > connection, or something like that. > > > > > > I haven't seen a written explanation on ptrace modes but when I > > > > consulted Jann his explanation was: > > > > > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > > > the specified domain, across UID boundaries. > > > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > > > specified domain, across UID boundaries. > > > > > > Maybe this would be a good start to document expectations. Some more > > > practical examples where the difference is visible would be great as > > > well. > > > > Before documenting the behavior, it would be a good idea to figure out > > what to do with perf_event_open(). That one's weird in that it only > > requires PTRACE_MODE_READ, but actually allows you to sample stuff > > like userspace stack and register contents (if perf_event_paranoid is > > 1 or 2). Maybe for SELinux things (and maybe also for Yama), there > > should be a level in between that allows fully inspecting the process > > (for purposes like profiling) but without the ability to corrupt its > > memory or registers or things like that. Or maybe perf_event_open() > > should just use the ATTACH mode. > > Thanks for the clarification. I still cannot say I would have a good > mental picture. Having something in Documentation/core-api/ sounds > really needed. Wrt to perf_event_open it sounds really odd it can do > more than other places restrict indeed. Something for the respective > maintainer but I strongly suspect people simply copy the pattern from > other places because the expected semantic is not really clear. > Sorry, back to the matters of this patch. Are there any actionable items for me to take care of before it can be accepted? The only request from Andrew to write a man page is being worked on at https://lore.kernel.org/linux-mm/20210120202337.1481402-1-surenb@google.com/ and I'll follow up with the next version. I also CC'ed stable@ for this to be included into 5.10 per Andrew's request. That CC was lost at some point, so CC'ing again. I do not see anything else on this patch to fix. Please chime in if there are any more concerns, otherwise I would ask Andrew to take it into mm-tree and stable@ to apply it to 5.10. Thanks! > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. >
On Thu, Jan 28, 2021 at 11:51 AM Suren Baghdasaryan <surenb@google.com> wrote: > > On Tue, Jan 26, 2021 at 5:52 AM 'Michal Hocko' via kernel-team > <kernel-team@android.com> wrote: > > > > On Wed 20-01-21 14:17:39, Jann Horn wrote: > > > On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > > > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > > > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > > > > > > > What we want is the ability for one process to influence another process > > > > > > > > in order to optimize performance across the entire system while leaving > > > > > > > > the security boundary intact. > > > > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > > > > always been that this is requred to RO access to the address space. But > > > > > > > this operation clearly has a visible side effect. Do we have any actual > > > > > > > documentation for the existing modes? > > > > > > > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > > > > is the difference. > > > > > > Yama in particular only does its checks on ATTACH and ignores READ, > > > that's the difference you're probably most likely to encounter on a > > > normal desktop system, since some distros turn Yama on by default. > > > Basically the idea there is that running "gdb -p $pid" or "strace -p > > > $pid" as a normal user will usually fail, but reading /proc/$pid/maps > > > still works; so you can see things like detailed memory usage > > > information and such, but you're not supposed to be able to directly > > > peek into a running SSH client and inject data into the existing SSH > > > connection, or steal the cryptographic keys for the current > > > connection, or something like that. > > > > > > > > I haven't seen a written explanation on ptrace modes but when I > > > > > consulted Jann his explanation was: > > > > > > > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > > > > the specified domain, across UID boundaries. > > > > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > > > > specified domain, across UID boundaries. > > > > > > > > Maybe this would be a good start to document expectations. Some more > > > > practical examples where the difference is visible would be great as > > > > well. > > > > > > Before documenting the behavior, it would be a good idea to figure out > > > what to do with perf_event_open(). That one's weird in that it only > > > requires PTRACE_MODE_READ, but actually allows you to sample stuff > > > like userspace stack and register contents (if perf_event_paranoid is > > > 1 or 2). Maybe for SELinux things (and maybe also for Yama), there > > > should be a level in between that allows fully inspecting the process > > > (for purposes like profiling) but without the ability to corrupt its > > > memory or registers or things like that. Or maybe perf_event_open() > > > should just use the ATTACH mode. > > > > Thanks for the clarification. I still cannot say I would have a good > > mental picture. Having something in Documentation/core-api/ sounds > > really needed. Wrt to perf_event_open it sounds really odd it can do > > more than other places restrict indeed. Something for the respective > > maintainer but I strongly suspect people simply copy the pattern from > > other places because the expected semantic is not really clear. > > > > Sorry, back to the matters of this patch. Are there any actionable > items for me to take care of before it can be accepted? The only > request from Andrew to write a man page is being worked on at > https://lore.kernel.org/linux-mm/20210120202337.1481402-1-surenb@google.com/ > and I'll follow up with the next version. I also CC'ed stable@ for > this to be included into 5.10 per Andrew's request. That CC was lost > at some point, so CC'ing again. > > I do not see anything else on this patch to fix. Please chime in if > there are any more concerns, otherwise I would ask Andrew to take it > into mm-tree and stable@ to apply it to 5.10. > Thanks! process_madvise man page V2 is posted at: https://lore.kernel.org/linux-mm/20210129070340.566340-1-surenb@google.com/ > > > > -- > > Michal Hocko > > SUSE Labs > > > > -- > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. > >
On Thu, Jan 28, 2021 at 11:08 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Thu, Jan 28, 2021 at 11:51 AM Suren Baghdasaryan <surenb@google.com> wrote: > > > > On Tue, Jan 26, 2021 at 5:52 AM 'Michal Hocko' via kernel-team > > <kernel-team@android.com> wrote: > > > > > > On Wed 20-01-21 14:17:39, Jann Horn wrote: > > > > On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > > > > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > > > > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > > > > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > > > > > > > > > What we want is the ability for one process to influence another process > > > > > > > > > in order to optimize performance across the entire system while leaving > > > > > > > > > the security boundary intact. > > > > > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > > > > > always been that this is requred to RO access to the address space. But > > > > > > > > this operation clearly has a visible side effect. Do we have any actual > > > > > > > > documentation for the existing modes? > > > > > > > > > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > > > > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > > > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > > > > > is the difference. > > > > > > > > Yama in particular only does its checks on ATTACH and ignores READ, > > > > that's the difference you're probably most likely to encounter on a > > > > normal desktop system, since some distros turn Yama on by default. > > > > Basically the idea there is that running "gdb -p $pid" or "strace -p > > > > $pid" as a normal user will usually fail, but reading /proc/$pid/maps > > > > still works; so you can see things like detailed memory usage > > > > information and such, but you're not supposed to be able to directly > > > > peek into a running SSH client and inject data into the existing SSH > > > > connection, or steal the cryptographic keys for the current > > > > connection, or something like that. > > > > > > > > > > I haven't seen a written explanation on ptrace modes but when I > > > > > > consulted Jann his explanation was: > > > > > > > > > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > > > > > the specified domain, across UID boundaries. > > > > > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > > > > > specified domain, across UID boundaries. > > > > > > > > > > Maybe this would be a good start to document expectations. Some more > > > > > practical examples where the difference is visible would be great as > > > > > well. > > > > > > > > Before documenting the behavior, it would be a good idea to figure out > > > > what to do with perf_event_open(). That one's weird in that it only > > > > requires PTRACE_MODE_READ, but actually allows you to sample stuff > > > > like userspace stack and register contents (if perf_event_paranoid is > > > > 1 or 2). Maybe for SELinux things (and maybe also for Yama), there > > > > should be a level in between that allows fully inspecting the process > > > > (for purposes like profiling) but without the ability to corrupt its > > > > memory or registers or things like that. Or maybe perf_event_open() > > > > should just use the ATTACH mode. > > > > > > Thanks for the clarification. I still cannot say I would have a good > > > mental picture. Having something in Documentation/core-api/ sounds > > > really needed. Wrt to perf_event_open it sounds really odd it can do > > > more than other places restrict indeed. Something for the respective > > > maintainer but I strongly suspect people simply copy the pattern from > > > other places because the expected semantic is not really clear. > > > > > > > Sorry, back to the matters of this patch. Are there any actionable > > items for me to take care of before it can be accepted? The only > > request from Andrew to write a man page is being worked on at > > https://lore.kernel.org/linux-mm/20210120202337.1481402-1-surenb@google.com/ > > and I'll follow up with the next version. I also CC'ed stable@ for > > this to be included into 5.10 per Andrew's request. That CC was lost > > at some point, so CC'ing again. > > > > I do not see anything else on this patch to fix. Please chime in if > > there are any more concerns, otherwise I would ask Andrew to take it > > into mm-tree and stable@ to apply it to 5.10. > > Thanks! > > process_madvise man page V2 is posted at: > https://lore.kernel.org/linux-mm/20210129070340.566340-1-surenb@google.com/ process_madvise man page V3 is posted at: https://lore.kernel.org/linux-mm/20210202053046.1653012-1-surenb@google.com/ > > > > > > > > -- > > > Michal Hocko > > > SUSE Labs > > > > > > -- > > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. > > >
On Mon, Feb 1, 2021 at 9:34 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Thu, Jan 28, 2021 at 11:08 PM Suren Baghdasaryan <surenb@google.com> wrote: > > > > On Thu, Jan 28, 2021 at 11:51 AM Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > On Tue, Jan 26, 2021 at 5:52 AM 'Michal Hocko' via kernel-team > > > <kernel-team@android.com> wrote: > > > > > > > > On Wed 20-01-21 14:17:39, Jann Horn wrote: > > > > > On Wed, Jan 13, 2021 at 3:22 PM Michal Hocko <mhocko@suse.com> wrote: > > > > > > On Tue 12-01-21 09:51:24, Suren Baghdasaryan wrote: > > > > > > > On Tue, Jan 12, 2021 at 9:45 AM Oleg Nesterov <oleg@redhat.com> wrote: > > > > > > > > > > > > > > > > On 01/12, Michal Hocko wrote: > > > > > > > > > > > > > > > > > > On Mon 11-01-21 09:06:22, Suren Baghdasaryan wrote: > > > > > > > > > > > > > > > > > > > What we want is the ability for one process to influence another process > > > > > > > > > > in order to optimize performance across the entire system while leaving > > > > > > > > > > the security boundary intact. > > > > > > > > > > Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ > > > > > > > > > > and CAP_SYS_NICE. PTRACE_MODE_READ to prevent leaking ASLR metadata > > > > > > > > > > and CAP_SYS_NICE for influencing process performance. > > > > > > > > > > > > > > > > > > I have to say that ptrace modes are rather obscure to me. So I cannot > > > > > > > > > really judge whether MODE_READ is sufficient. My understanding has > > > > > > > > > always been that this is requred to RO access to the address space. But > > > > > > > > > this operation clearly has a visible side effect. Do we have any actual > > > > > > > > > documentation for the existing modes? > > > > > > > > > > > > > > > > > > I would be really curious to hear from Jann and Oleg (now Cced). > > > > > > > > > > > > > > > > Can't comment, sorry. I never understood these security checks and never tried. > > > > > > > > IIUC only selinux/etc can treat ATTACH/READ differently and I have no idea what > > > > > > > > is the difference. > > > > > > > > > > Yama in particular only does its checks on ATTACH and ignores READ, > > > > > that's the difference you're probably most likely to encounter on a > > > > > normal desktop system, since some distros turn Yama on by default. > > > > > Basically the idea there is that running "gdb -p $pid" or "strace -p > > > > > $pid" as a normal user will usually fail, but reading /proc/$pid/maps > > > > > still works; so you can see things like detailed memory usage > > > > > information and such, but you're not supposed to be able to directly > > > > > peek into a running SSH client and inject data into the existing SSH > > > > > connection, or steal the cryptographic keys for the current > > > > > connection, or something like that. > > > > > > > > > > > > I haven't seen a written explanation on ptrace modes but when I > > > > > > > consulted Jann his explanation was: > > > > > > > > > > > > > > PTRACE_MODE_READ means you can inspect metadata about processes with > > > > > > > the specified domain, across UID boundaries. > > > > > > > PTRACE_MODE_ATTACH means you can fully impersonate processes with the > > > > > > > specified domain, across UID boundaries. > > > > > > > > > > > > Maybe this would be a good start to document expectations. Some more > > > > > > practical examples where the difference is visible would be great as > > > > > > well. > > > > > > > > > > Before documenting the behavior, it would be a good idea to figure out > > > > > what to do with perf_event_open(). That one's weird in that it only > > > > > requires PTRACE_MODE_READ, but actually allows you to sample stuff > > > > > like userspace stack and register contents (if perf_event_paranoid is > > > > > 1 or 2). Maybe for SELinux things (and maybe also for Yama), there > > > > > should be a level in between that allows fully inspecting the process > > > > > (for purposes like profiling) but without the ability to corrupt its > > > > > memory or registers or things like that. Or maybe perf_event_open() > > > > > should just use the ATTACH mode. > > > > > > > > Thanks for the clarification. I still cannot say I would have a good > > > > mental picture. Having something in Documentation/core-api/ sounds > > > > really needed. Wrt to perf_event_open it sounds really odd it can do > > > > more than other places restrict indeed. Something for the respective > > > > maintainer but I strongly suspect people simply copy the pattern from > > > > other places because the expected semantic is not really clear. > > > > > > > > > > Sorry, back to the matters of this patch. Are there any actionable > > > items for me to take care of before it can be accepted? The only > > > request from Andrew to write a man page is being worked on at > > > https://lore.kernel.org/linux-mm/20210120202337.1481402-1-surenb@google.com/ > > > and I'll follow up with the next version. I also CC'ed stable@ for > > > this to be included into 5.10 per Andrew's request. That CC was lost > > > at some point, so CC'ing again. > > > > > > I do not see anything else on this patch to fix. Please chime in if > > > there are any more concerns, otherwise I would ask Andrew to take it > > > into mm-tree and stable@ to apply it to 5.10. > > > Thanks! > > > > process_madvise man page V2 is posted at: > > https://lore.kernel.org/linux-mm/20210129070340.566340-1-surenb@google.com/ > > process_madvise man page V3 is posted at: > https://lore.kernel.org/linux-mm/20210202053046.1653012-1-surenb@google.com/ > Hi Andrew, A friendly reminder to please include this patch into mm tree. There seem to be no more questions or objections. The man page you requested is accepted here: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993 stable is CC'ed and this patch should go into 5.10 and later kernels The patch has been: Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> If you want me to resend it, please let me know. Thanks, Suren. > > > > > > > > > > > > -- > > > > Michal Hocko > > > > SUSE Labs > > > > > > > > -- > > > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. > > > >
On Tue, 2 Mar 2021 15:53:39 -0800 Suren Baghdasaryan <surenb@google.com> wrote: > Hi Andrew, > A friendly reminder to please include this patch into mm tree. > There seem to be no more questions or objections. > The man page you requested is accepted here: > https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993 > stable is CC'ed and this patch should go into 5.10 and later kernels > The patch has been: > Acked-by: Minchan Kim <minchan@kernel.org> > Acked-by: David Rientjes <rientjes@google.com> > Reviewed-by: Kees Cook <keescook@chromium.org> > > If you want me to resend it, please let me know. This patch was tough. I think it would be best to resend please, being sure to cc everyone who commented. To give everyone another chance to get their heads around it. If necessary, please update the changelog to address any confusion/questions which have arisen thus far. Thanks.
On Tue, Mar 2, 2021 at 4:17 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 2 Mar 2021 15:53:39 -0800 Suren Baghdasaryan <surenb@google.com> wrote: > > > Hi Andrew, > > A friendly reminder to please include this patch into mm tree. > > There seem to be no more questions or objections. > > The man page you requested is accepted here: > > https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993 > > stable is CC'ed and this patch should go into 5.10 and later kernels > > The patch has been: > > Acked-by: Minchan Kim <minchan@kernel.org> > > Acked-by: David Rientjes <rientjes@google.com> > > Reviewed-by: Kees Cook <keescook@chromium.org> > > > > If you want me to resend it, please let me know. > > This patch was tough. I think it would be best to resend please, being > sure to cc everyone who commented. To give everyone another chance to > get their heads around it. If necessary, please update the changelog > to address any confusion/questions which have arisen thus far. Sure, will do. Thanks! > > Thanks.
On Tue, Mar 2, 2021 at 4:19 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Tue, Mar 2, 2021 at 4:17 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Tue, 2 Mar 2021 15:53:39 -0800 Suren Baghdasaryan <surenb@google.com> wrote: > > > > > Hi Andrew, > > > A friendly reminder to please include this patch into mm tree. > > > There seem to be no more questions or objections. > > > The man page you requested is accepted here: > > > https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=a144f458bad476a3358e3a45023789cb7bb9f993 > > > stable is CC'ed and this patch should go into 5.10 and later kernels > > > The patch has been: > > > Acked-by: Minchan Kim <minchan@kernel.org> > > > Acked-by: David Rientjes <rientjes@google.com> > > > Reviewed-by: Kees Cook <keescook@chromium.org> > > > > > > If you want me to resend it, please let me know. > > > > This patch was tough. I think it would be best to resend please, being > > sure to cc everyone who commented. To give everyone another chance to > > get their heads around it. If necessary, please update the changelog > > to address any confusion/questions which have arisen thus far. > > Sure, will do. Thanks! Posted v3 at https://lore.kernel.org/linux-mm/20210303185807.2160264-1-surenb@google.com/ > > > > > Thanks.
diff --git a/mm/madvise.c b/mm/madvise.c index 6a660858784b..a9bcd16b5d95 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1197,12 +1197,22 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, goto release_task; } - mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS); + /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */ + mm = mm_access(task, PTRACE_MODE_READ_FSCREDS); if (IS_ERR_OR_NULL(mm)) { ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH; goto release_task; } + /* + * Require CAP_SYS_NICE for influencing process performance. Note that + * only non-destructive hints are currently supported. + */ + if (!capable(CAP_SYS_NICE)) { + ret = -EPERM; + goto release_mm; + } + total_len = iov_iter_count(&iter); while (iov_iter_count(&iter)) { @@ -1217,6 +1227,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, if (ret == 0) ret = total_len - iov_iter_count(&iter); +release_mm: mmput(mm); release_task: put_task_struct(task);