diff mbox series

[v14,7/8] signal: define the field siginfo.si_faultflags

Message ID 0eb601a5d1906fadd7099149eb605181911cfc04.1604523707.git.pcc@google.com (mailing list archive)
State New, archived
Headers show
Series arm64: expose FAR_EL1 tag bits in siginfo | expand

Commit Message

Peter Collingbourne Nov. 4, 2020, 9:18 p.m. UTC
This field will contain flags that may be used by signal handlers to
determine whether other fields in the _sigfault portion of siginfo are
valid. An example use case is the following patch, which introduces
the si_addr_tag_bits{,_mask} fields.

A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
a signal handler to require the kernel to set the field (but note
that the field will be set anyway if the kernel supports the flag,
regardless of its value). In combination with the previous patches,
this allows a userspace program to determine whether the kernel will
set the field.

It is possible for an si_faultflags-unaware program to cause a signal
handler in an si_faultflags-aware program to be called with a provided
siginfo data structure by using one of the following syscalls:

- ptrace(PTRACE_SETSIGINFO)
- pidfd_send_signal
- rt_sigqueueinfo
- rt_tgsigqueueinfo

So we need to prevent the si_faultflags-unaware program from causing an
uninitialized read of si_faultflags in the si_faultflags-aware program when
it uses one of these syscalls.

The last three cases can be handled by observing that each of these
syscalls fails if si_code >= 0. We also observe that kill(2) and
tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
so we define si_faultflags to only be valid if si_code > 0.

There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
detects that the signal would use the _sigfault layout, and introduce
a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
program may use to opt out of this behavior.

It is also possible for the kernel to inject a signal specified to
use _sigfault by calling force_sig (e.g. there are numerous calls to
force_sig(SIGSEGV)). In this case si_code is set to SI_KERNEL and the
_kill union member is used, so document that si_code must be < SI_KERNEL.

Ideally this field could have just been named si_flags, but that
name was already taken by ia64, so a different name was chosen.

I considered making ia64's si_flags a generic field and having it
appear at the end of _sigfault (in the same place as this patch has
si_faultflags) on non-ia64, keeping it in the same place on ia64. ia64's
si_flags is a 32-bit field with only one flag bit allocated, so we
would have 31 bits to use if we do this. However, it seems simplest
to avoid entangling these fields.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ide155ce29366c3eab2a944ae4c51205982e5b8b2
---
v14:
- make the padding explicit so we can easily use it later

v13:
- renamed si_xflags to si_faultflags
- use fallthrough macros in kernel/ptrace.c
- fixed a style warning pointed out by checkpatch.pl

v12:
- Change type of si_xflags to u32 to avoid increasing alignment
- Add si_xflags to signal_compat.c test cases

v11:
- update comment to say that si_code must > 0
- change ptrace(PTRACE_SETSIGINFO2) to take a flags argument

v10:
- make the new field compatible with the various ways
  that a siginfo can be injected from another process
- eliminate some duplication by adding a refactoring patch
  before this one

 arch/powerpc/platforms/powernv/vas-fault.c |  1 +
 arch/x86/kernel/signal_compat.c            |  7 +++--
 include/linux/compat.h                     |  2 ++
 include/linux/signal_types.h               |  2 +-
 include/uapi/asm-generic/siginfo.h         |  7 +++++
 include/uapi/asm-generic/signal-defs.h     |  4 +++
 include/uapi/linux/ptrace.h                | 12 ++++++++
 kernel/ptrace.c                            | 33 ++++++++++++++++++----
 kernel/signal.c                            |  3 ++
 9 files changed, 62 insertions(+), 9 deletions(-)

Comments

Eric W. Biederman Nov. 10, 2020, 1:54 a.m. UTC | #1
Peter you are patching buggy code for your siginfo extension can
you please ignore vas-fault.c.  The code in vas-fault.c should
be fixed separately.  Futher it uses clear_siginfo so you should
get well defined behavior even if your new field is not initialized.

I have copied the powerpc folks so hopefully this buggy code
can be fixed.

> diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
> index 3d21fce254b7..877e7d5fb4a2 100644
> --- a/arch/powerpc/platforms/powernv/vas-fault.c
> +++ b/arch/powerpc/platforms/powernv/vas-fault.c
> @@ -154,6 +154,7 @@ static void update_csb(struct vas_window *window,
>  	info.si_errno = EFAULT;
>  	info.si_code = SEGV_MAPERR;
>  	info.si_addr = csb_addr;
> +	info.si_faultflags = 0;


Powerpc folks.  This code was introduced in c96c4436aba4 ("powerpc/vas:
Update CSB and notify process for fault CRBs") and is badly buggy.

Let me count the bugs:

a) Using kill_pid_info.  That performs a permission check that
   does not make sense from a kernel thread.

b) Manually filling in struct siginfo.  Everyone gets it wrong
   and the powerpc code is no exception setting si_errno when
   that is something Linux as a rule does not do.

Technically we have send_sig_fault to handle sending
a fault from a non-sychrnous context but I am not convinced
it make sense in this case.

c) Sending an asynchronous SIGSEGV with the si_code set to SEGV_MAPERR.
   How can userspace detect it is an asynchronous signal?  What can
   userspace do if it detects an asynchronous signal?  If userspace is
   so buggered as to give your kernel thread a bogus address I suspect
   uncerimonious sending SIGKILL is probably the best you can do.

There are some additional questionable things in that code like taking a
task_struct reference simply to be able to test tsk->flags but no
locks are held to ensure that tsk->flags are meaningful.  Nor are
any tests performed to see if the task being tested still uses
the designated mm.  I suspect exec could have been called.

In which case the code needs to check the mm, or at least play with
exec_id to ensure you are not improperly signaling a process after exec.

None of this is to say that update_csb is fundmentally bad or hard to
correct just that it has some significant defects in it's implementation
right now that need to be corrected.  I am hoping a detailed accounting
and pointing out those defects will allow the bug to be fixed.

Thank you,
Eric
Eric W. Biederman Nov. 10, 2020, 1:57 a.m. UTC | #2
Peter Collingbourne <pcc@google.com> writes:

> This field will contain flags that may be used by signal handlers to
> determine whether other fields in the _sigfault portion of siginfo are
> valid. An example use case is the following patch, which introduces
> the si_addr_tag_bits{,_mask} fields.
>
> A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
> a signal handler to require the kernel to set the field (but note
> that the field will be set anyway if the kernel supports the flag,
> regardless of its value). In combination with the previous patches,
> this allows a userspace program to determine whether the kernel will
> set the field.
>
> It is possible for an si_faultflags-unaware program to cause a signal
> handler in an si_faultflags-aware program to be called with a provided
> siginfo data structure by using one of the following syscalls:
>
> - ptrace(PTRACE_SETSIGINFO)
> - pidfd_send_signal
> - rt_sigqueueinfo
> - rt_tgsigqueueinfo
>
> So we need to prevent the si_faultflags-unaware program from causing an
> uninitialized read of si_faultflags in the si_faultflags-aware program when
> it uses one of these syscalls.
>
> The last three cases can be handled by observing that each of these
> syscalls fails if si_code >= 0. We also observe that kill(2) and
> tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
> so we define si_faultflags to only be valid if si_code > 0.
>
> There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
> we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
> detects that the signal would use the _sigfault layout, and introduce
> a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
> program may use to opt out of this behavior.

So I think while well intentioned this is misguided.

gdb and the like may use this but I expect the primary user is CRIU
which simply reads the signal out of one process saves it on disk
and then restores the signal as read into the new process (possibly
on a different machine).

At least for the CRIU usage PTRACE_SETSIGINFO need to remain a raw
pass through kind of operation.

Eric
Haren Myneni Nov. 11, 2020, 11:10 a.m. UTC | #3
On Mon, 2020-11-09 at 19:54 -0600, Eric W. Biederman wrote:
> Peter you are patching buggy code for your siginfo extension can
> you please ignore vas-fault.c.  The code in vas-fault.c should
> be fixed separately.  Futher it uses clear_siginfo so you should
> get well defined behavior even if your new field is not initialized.
> 
> I have copied the powerpc folks so hopefully this buggy code
> can be fixed.
> 
> > diff --git a/arch/powerpc/platforms/powernv/vas-fault.c
> > b/arch/powerpc/platforms/powernv/vas-fault.c
> > index 3d21fce254b7..877e7d5fb4a2 100644
> > --- a/arch/powerpc/platforms/powernv/vas-fault.c
> > +++ b/arch/powerpc/platforms/powernv/vas-fault.c
> > @@ -154,6 +154,7 @@ static void update_csb(struct vas_window
> > *window,
> >  	info.si_errno = EFAULT;
> >  	info.si_code = SEGV_MAPERR;
> >  	info.si_addr = csb_addr;
> > +	info.si_faultflags = 0;
> 

Thanks Eric for your comments and pointing possible issues.

Here is the NX coprocessor interaction with user space and kernel:

- Process opens NX window / channel. The user space sends requests to
NX without kernel involvement. This request contains data buffer and
status block called coprocessor status block (CSB). 
- if NX sees fault on the request buffer or on CSB address, issue an
interrupt to kernel
- kernel updates the CSB with the fault information and then the
process can reissue request.
- If the fault is on CSB address and is not a valid address, sending
SEGV signal so that the process assign proper CSB and reissue new
request
- We are not seeing the invalid CSB address in the process context, but
during handling the fault later. So thought about sending SEGV signal
instead of killing the process since it is not a standard segfault. 
- All these windows will be closed upon process exit, but waits
until all pending requests are completed. So process will not exit with
pending requests, means after all faults handled if any. 
- In the case of multithread applications, NX windows will be closed
with the last thread. Means other threads can still issue requests with
these windows. So to support in these applications, take PID and MM
references during window open and release them later in close. 

> Powerpc folks.  This code was introduced in c96c4436aba4
> ("powerpc/vas:
> Update CSB and notify process for fault CRBs") and is badly buggy.
> 
> Let me count the bugs:
> 
> a) Using kill_pid_info.  That performs a permission check that
>    does not make sense from a kernel thread.
> 
> b) Manually filling in struct siginfo.  Everyone gets it wrong
>    and the powerpc code is no exception setting si_errno when
>    that is something Linux as a rule does not do.
> 
> Technically we have send_sig_fault to handle sending
> a fault from a non-sychrnous context but I am not convinced
> it make sense in this case.

Yes, kill_pid_info() -> group_send_sig_info() checks permissions which
is an extra step.  I think send_sig_fault may be used to replace the
above steps.

> 
> c) Sending an asynchronous SIGSEGV with the si_code set to
> SEGV_MAPERR.
>    How can userspace detect it is an asynchronous signal?  What can
>    userspace do if it detects an asynchronous signal?  If userspace
> is
>    so buggered as to give your kernel thread a bogus address I
> suspect
>    uncerimonious sending SIGKILL is probably the best you can do.

Application can assign new CSB and send new request when it catches the
signal. For example it can use csb_addr passed in si_addr amd decide
whether this SEGV is due to to CSB fault. Since it is an async signal,
was thinking for the application to recover instead of killing theprocess.  

> 
> There are some additional questionable things in that code like
> taking a
> task_struct reference simply to be able to test tsk->flags but no
> locks are held to ensure that tsk->flags are meaningful.  Nor are
> any tests performed to see if the task being tested still uses
> the designated mm.  I suspect exec could have been called.

tsk->flags is used to make sure not to send a signal if the task is in
exiting. We access this task under get/put_task_struct(). Also
kill_pid_info() sends signal if pid_task() is available. Since we are
taken mm reference, it can not be freed. 

So the task has to be present until all NX windows are closed.

> 
> In which case the code needs to check the mm, or at least play with
> exec_id to ensure you are not improperly signaling a process after
> exec.
> 
> None of this is to say that update_csb is fundmentally bad or hard to
> correct just that it has some significant defects in it's
> implementation
> right now that need to be corrected.  I am hoping a detailed
> accounting
> and pointing out those defects will allow the bug to be fixed.
> 
> Thank you,
> Eric
Dave Martin Nov. 11, 2020, 5:27 p.m. UTC | #4
On Mon, Nov 09, 2020 at 07:57:33PM -0600, Eric W. Biederman wrote:
> Peter Collingbourne <pcc@google.com> writes:
> 
> > This field will contain flags that may be used by signal handlers to
> > determine whether other fields in the _sigfault portion of siginfo are
> > valid. An example use case is the following patch, which introduces
> > the si_addr_tag_bits{,_mask} fields.
> >
> > A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
> > a signal handler to require the kernel to set the field (but note
> > that the field will be set anyway if the kernel supports the flag,
> > regardless of its value). In combination with the previous patches,
> > this allows a userspace program to determine whether the kernel will
> > set the field.
> >
> > It is possible for an si_faultflags-unaware program to cause a signal
> > handler in an si_faultflags-aware program to be called with a provided
> > siginfo data structure by using one of the following syscalls:
> >
> > - ptrace(PTRACE_SETSIGINFO)
> > - pidfd_send_signal
> > - rt_sigqueueinfo
> > - rt_tgsigqueueinfo
> >
> > So we need to prevent the si_faultflags-unaware program from causing an
> > uninitialized read of si_faultflags in the si_faultflags-aware program when
> > it uses one of these syscalls.
> >
> > The last three cases can be handled by observing that each of these
> > syscalls fails if si_code >= 0. We also observe that kill(2) and
> > tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
> > so we define si_faultflags to only be valid if si_code > 0.
> >
> > There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
> > we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
> > detects that the signal would use the _sigfault layout, and introduce
> > a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
> > program may use to opt out of this behavior.
> 
> So I think while well intentioned this is misguided.
> 
> gdb and the like may use this but I expect the primary user is CRIU
> which simply reads the signal out of one process saves it on disk
> and then restores the signal as read into the new process (possibly
> on a different machine).
> 
> At least for the CRIU usage PTRACE_SETSIGINFO need to remain a raw
> pass through kind of operation.

This is a problem, though.

How can we tell the difference between a siginfo that was generated by
the kernel and a siginfo that was generated (or altered) by a non-xflags
aware userspace?

Short of revving the whole API, I don't see a simple solution to this.

Although a bit of a hack, could we include some kind of checksum in the
siginfo?  If the checksum matches during PTRACE_SETSIGINFO, we could
accept the whole thing; xflags included.  Otherwise, we could silently
drop non-self-describing extensions.

If we only need to generate the checksum when PTRACE_GETSIGINFO is
called then it might be feasible to use a strong hash; otherwise, this
mechanism will be far from bulletproof.

A hash has the advantage that we don't need any other information
to validate it beyond a salt: if the hash matches, it's self-
validating.  We could also package other data with it to describe the
presence of extensions, but relying on this for regular sigaction()/
signal delivery use feels too high-overhead.

For debuggers, I suspect that PTRACE_SETSIGINFO2 is still useful:
userspace callers that want to write an extension field that they
knowingly generated themselves should have a way to express that.

Thoughts?

Cheers
---Dave
Eric W. Biederman Nov. 11, 2020, 8:15 p.m. UTC | #5
Dave Martin <Dave.Martin@arm.com> writes:

> On Mon, Nov 09, 2020 at 07:57:33PM -0600, Eric W. Biederman wrote:
>> Peter Collingbourne <pcc@google.com> writes:
>> 
>> > This field will contain flags that may be used by signal handlers to
>> > determine whether other fields in the _sigfault portion of siginfo are
>> > valid. An example use case is the following patch, which introduces
>> > the si_addr_tag_bits{,_mask} fields.
>> >
>> > A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
>> > a signal handler to require the kernel to set the field (but note
>> > that the field will be set anyway if the kernel supports the flag,
>> > regardless of its value). In combination with the previous patches,
>> > this allows a userspace program to determine whether the kernel will
>> > set the field.
>> >
>> > It is possible for an si_faultflags-unaware program to cause a signal
>> > handler in an si_faultflags-aware program to be called with a provided
>> > siginfo data structure by using one of the following syscalls:
>> >
>> > - ptrace(PTRACE_SETSIGINFO)
>> > - pidfd_send_signal
>> > - rt_sigqueueinfo
>> > - rt_tgsigqueueinfo
>> >
>> > So we need to prevent the si_faultflags-unaware program from causing an
>> > uninitialized read of si_faultflags in the si_faultflags-aware program when
>> > it uses one of these syscalls.
>> >
>> > The last three cases can be handled by observing that each of these
>> > syscalls fails if si_code >= 0. We also observe that kill(2) and
>> > tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
>> > so we define si_faultflags to only be valid if si_code > 0.
>> >
>> > There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
>> > we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
>> > detects that the signal would use the _sigfault layout, and introduce
>> > a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
>> > program may use to opt out of this behavior.
>> 
>> So I think while well intentioned this is misguided.
>> 
>> gdb and the like may use this but I expect the primary user is CRIU
>> which simply reads the signal out of one process saves it on disk
>> and then restores the signal as read into the new process (possibly
>> on a different machine).
>> 
>> At least for the CRIU usage PTRACE_SETSIGINFO need to remain a raw
>> pass through kind of operation.
>
> This is a problem, though.
>
> How can we tell the difference between a siginfo that was generated by
> the kernel and a siginfo that was generated (or altered) by a non-xflags
> aware userspace?
>
> Short of revving the whole API, I don't see a simple solution to this.

Unlike receiving a signal.  We do know that userspace old and new
always sends unused fields as zero into PTRACE_SETSIGINFO.

The split into kernel_siginfo verifies this and fails userspace if it
does something different.  No problems have been reported.

So in the case of xflags a non-xflags aware userspace would either pass
the siginfo from through from somewhere else (such as
PTRACE_GETSIGINFO), or it would simply generate a signal with all of
the xflags bits clear.  So everything should work regardless.

> Although a bit of a hack, could we include some kind of checksum in the
> siginfo?  If the checksum matches during PTRACE_SETSIGINFO, we could
> accept the whole thing; xflags included.  Otherwise, we could silently
> drop non-self-describing extensions.
>
> If we only need to generate the checksum when PTRACE_GETSIGINFO is
> called then it might be feasible to use a strong hash; otherwise, this
> mechanism will be far from bulletproof.
>
> A hash has the advantage that we don't need any other information
> to validate it beyond a salt: if the hash matches, it's self-
> validating.  We could also package other data with it to describe the
> presence of extensions, but relying on this for regular sigaction()/
> signal delivery use feels too high-overhead.
>
> For debuggers, I suspect that PTRACE_SETSIGINFO2 is still useful:
> userspace callers that want to write an extension field that they
> knowingly generated themselves should have a way to express that.
>
> Thoughts?

I think there are two cases:
1) CRIU  -- It is just a passthrough of PTRACE_GETSIGINFO
2) Creating a signal from nowhere -- Code that does not know about
   xflags would leave xflags at 0 so no problem.

Does anyone see any other cases I am missing?

Eric
Eric W. Biederman Nov. 11, 2020, 8:28 p.m. UTC | #6
ebiederm@xmission.com (Eric W. Biederman) writes:

> Dave Martin <Dave.Martin@arm.com> writes:
>
>> On Mon, Nov 09, 2020 at 07:57:33PM -0600, Eric W. Biederman wrote:
>>> Peter Collingbourne <pcc@google.com> writes:
>>> 
>>> > This field will contain flags that may be used by signal handlers to
>>> > determine whether other fields in the _sigfault portion of siginfo are
>>> > valid. An example use case is the following patch, which introduces
>>> > the si_addr_tag_bits{,_mask} fields.
>>> >
>>> > A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
>>> > a signal handler to require the kernel to set the field (but note
>>> > that the field will be set anyway if the kernel supports the flag,
>>> > regardless of its value). In combination with the previous patches,
>>> > this allows a userspace program to determine whether the kernel will
>>> > set the field.
>>> >
>>> > It is possible for an si_faultflags-unaware program to cause a signal
>>> > handler in an si_faultflags-aware program to be called with a provided
>>> > siginfo data structure by using one of the following syscalls:
>>> >
>>> > - ptrace(PTRACE_SETSIGINFO)
>>> > - pidfd_send_signal
>>> > - rt_sigqueueinfo
>>> > - rt_tgsigqueueinfo
>>> >
>>> > So we need to prevent the si_faultflags-unaware program from causing an
>>> > uninitialized read of si_faultflags in the si_faultflags-aware program when
>>> > it uses one of these syscalls.
>>> >
>>> > The last three cases can be handled by observing that each of these
>>> > syscalls fails if si_code >= 0. We also observe that kill(2) and
>>> > tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
>>> > so we define si_faultflags to only be valid if si_code > 0.
>>> >
>>> > There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
>>> > we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
>>> > detects that the signal would use the _sigfault layout, and introduce
>>> > a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
>>> > program may use to opt out of this behavior.
>>> 
>>> So I think while well intentioned this is misguided.
>>> 
>>> gdb and the like may use this but I expect the primary user is CRIU
>>> which simply reads the signal out of one process saves it on disk
>>> and then restores the signal as read into the new process (possibly
>>> on a different machine).
>>> 
>>> At least for the CRIU usage PTRACE_SETSIGINFO need to remain a raw
>>> pass through kind of operation.
>>
>> This is a problem, though.
>>
>> How can we tell the difference between a siginfo that was generated by
>> the kernel and a siginfo that was generated (or altered) by a non-xflags
>> aware userspace?
>>
>> Short of revving the whole API, I don't see a simple solution to this.
>
> Unlike receiving a signal.  We do know that userspace old and new
> always sends unused fields as zero into PTRACE_SETSIGINFO.
>
> The split into kernel_siginfo verifies this and fails userspace if it
> does something different.  No problems have been reported.
>
> So in the case of xflags a non-xflags aware userspace would either pass
> the siginfo from through from somewhere else (such as
> PTRACE_GETSIGINFO), or it would simply generate a signal with all of
> the xflags bits clear.  So everything should work regardless.
>
>> Although a bit of a hack, could we include some kind of checksum in the
>> siginfo?  If the checksum matches during PTRACE_SETSIGINFO, we could
>> accept the whole thing; xflags included.  Otherwise, we could silently
>> drop non-self-describing extensions.
>>
>> If we only need to generate the checksum when PTRACE_GETSIGINFO is
>> called then it might be feasible to use a strong hash; otherwise, this
>> mechanism will be far from bulletproof.
>>
>> A hash has the advantage that we don't need any other information
>> to validate it beyond a salt: if the hash matches, it's self-
>> validating.  We could also package other data with it to describe the
>> presence of extensions, but relying on this for regular sigaction()/
>> signal delivery use feels too high-overhead.
>>
>> For debuggers, I suspect that PTRACE_SETSIGINFO2 is still useful:
>> userspace callers that want to write an extension field that they
>> knowingly generated themselves should have a way to express that.
>>
>> Thoughts?
>
> I think there are two cases:
> 1) CRIU  -- It is just a passthrough of PTRACE_GETSIGINFO
> 2) Creating a signal from nowhere -- Code that does not know about
>    xflags would leave xflags at 0 so no problem.
>
> Does anyone see any other cases I am missing?
>

Zoinks.  I forgot to read and double check the code I wrote.

copy_siginfo_from_user only verifies against 0 when we don't know the
layout.  So I don't know if we can count on userspace providing the
extra data as 0 or not.

So if we do indeed continue to need xflags we might care.

It is currently an undefined non-sense case to provide non-zero fields
there.  So I think it is reasonable to expect even debuggers generating
signals to set those fields to know values such as 0.

Further I expect it is rare for debuggers to generate pretend faults.

So I would say perform whatever testing we can, so that there are no
obvious problem users of PTRACE_SETSIGINFO and then to simply not worry
about the PTRACE_SETSIGINFO unless someone reports a regression.

But fist let's see if we really need xflags at all.

Eric
Eric W. Biederman Nov. 11, 2020, 8:46 p.m. UTC | #7
Haren Myneni <haren@linux.ibm.com> writes:

> On Mon, 2020-11-09 at 19:54 -0600, Eric W. Biederman wrote:
>> Peter you are patching buggy code for your siginfo extension can
>> you please ignore vas-fault.c.  The code in vas-fault.c should
>> be fixed separately.  Futher it uses clear_siginfo so you should
>> get well defined behavior even if your new field is not initialized.
>> 
>> I have copied the powerpc folks so hopefully this buggy code
>> can be fixed.
>> 
>> > diff --git a/arch/powerpc/platforms/powernv/vas-fault.c
>> > b/arch/powerpc/platforms/powernv/vas-fault.c
>> > index 3d21fce254b7..877e7d5fb4a2 100644
>> > --- a/arch/powerpc/platforms/powernv/vas-fault.c
>> > +++ b/arch/powerpc/platforms/powernv/vas-fault.c
>> > @@ -154,6 +154,7 @@ static void update_csb(struct vas_window
>> > *window,
>> >  	info.si_errno = EFAULT;
>> >  	info.si_code = SEGV_MAPERR;
>> >  	info.si_addr = csb_addr;
>> > +	info.si_faultflags = 0;
>> 
>
> Thanks Eric for your comments and pointing possible issues.
>
> Here is the NX coprocessor interaction with user space and kernel:
>
> - Process opens NX window / channel. The user space sends requests to
> NX without kernel involvement. This request contains data buffer and
> status block called coprocessor status block (CSB). 
> - if NX sees fault on the request buffer or on CSB address, issue an
> interrupt to kernel
> - kernel updates the CSB with the fault information and then the
> process can reissue request.
> - If the fault is on CSB address and is not a valid address, sending
> SEGV signal so that the process assign proper CSB and reissue new
> request
> - We are not seeing the invalid CSB address in the process context, but
> during handling the fault later. So thought about sending SEGV signal
> instead of killing the process since it is not a standard segfault. 
> - All these windows will be closed upon process exit, but waits
> until all pending requests are completed. So process will not exit with
> pending requests, means after all faults handled if any.

I see a lot that doesn't seem to match that description.

Consider forking a child and having that child exit, but still have
the file descriptor open.  The release method will be called.

If you consider file descriptor passing there are much more interesting
cases involved.

> - In the case of multithread applications, NX windows will be closed
> with the last thread. Means other threads can still issue requests with
> these windows. So to support in these applications, take PID and MM
> references during window open and release them later in close. 
>
>> Powerpc folks.  This code was introduced in c96c4436aba4
>> ("powerpc/vas:
>> Update CSB and notify process for fault CRBs") and is badly buggy.
>> 
>> Let me count the bugs:
>> 
>> a) Using kill_pid_info.  That performs a permission check that
>>    does not make sense from a kernel thread.
>> 
>> b) Manually filling in struct siginfo.  Everyone gets it wrong
>>    and the powerpc code is no exception setting si_errno when
>>    that is something Linux as a rule does not do.
>> 
>> Technically we have send_sig_fault to handle sending
>> a fault from a non-sychrnous context but I am not convinced
>> it make sense in this case.
>
> Yes, kill_pid_info() -> group_send_sig_info() checks permissions which
> is an extra step.  I think send_sig_fault may be used to replace the
> above steps.

Make that change at a minimum, please.

>> c) Sending an asynchronous SIGSEGV with the si_code set to
>> SEGV_MAPERR.
>>    How can userspace detect it is an asynchronous signal?  What can
>>    userspace do if it detects an asynchronous signal?  If userspace
>> is
>>    so buggered as to give your kernel thread a bogus address I
>> suspect
>>    uncerimonious sending SIGKILL is probably the best you can do.
>
> Application can assign new CSB and send new request when it catches the
> signal. For example it can use csb_addr passed in si_addr amd decide
> whether this SEGV is due to to CSB fault. Since it is an async signal,
> was thinking for the application to recover instead of killing theprocess.  

An async signal the process can recover from is fine (caveats about
exec, exit, and file descripotr passing aside).

You want something to mark this as an asynchronous event so that
userspace can tell that signal from other signals.

If only a single coprocessor is possible it could be as simple
as using a special si_code for this case.

>> There are some additional questionable things in that code like
>> taking a
>> task_struct reference simply to be able to test tsk->flags but no
>> locks are held to ensure that tsk->flags are meaningful.  Nor are
>> any tests performed to see if the task being tested still uses
>> the designated mm.  I suspect exec could have been called.
>
> tsk->flags is used to make sure not to send a signal if the task is in
> exiting. We access this task under get/put_task_struct().

The thing ins get/put_task_struct is about a reference count.  It is not
a lock.  So the task can still set PF_EXITING after you have tested it
and before you call copy_to_user.   So testing PF_EXITING gets you
practically nothing.

Further the signal delivery code can already cop with attempting to send
a signal to a task that is in PF_EXITING so I don't see the point of
trying to avoid that situation.


> Also 
> kill_pid_info() sends signal if pid_task() is available. Since we are
> taken mm reference, it can not be freed.

But task->mm can be different if the task has called exec.

> So the task has to be present until all NX windows are closed.

I don't see how that follows from anything the code does or you have
said.  The lifetime of an mm and the lifetime of a pid are independent
of the lifetime of a task.  

>> In which case the code needs to check the mm, or at least play with
>> exec_id to ensure you are not improperly signaling a process after
>> exec.
>> 
>> None of this is to say that update_csb is fundmentally bad or hard to
>> correct just that it has some significant defects in it's
>> implementation
>> right now that need to be corrected.  I am hoping a detailed
>> accounting
>> and pointing out those defects will allow the bug to be fixed.
>> 
>> Thank you,
>> Eric

Eric
Dave Martin Nov. 12, 2020, 5:21 p.m. UTC | #8
On Wed, Nov 11, 2020 at 02:28:38PM -0600, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
> 
> > Dave Martin <Dave.Martin@arm.com> writes:
> >
> >> On Mon, Nov 09, 2020 at 07:57:33PM -0600, Eric W. Biederman wrote:
> >>> Peter Collingbourne <pcc@google.com> writes:
> >>> 
> >>> > This field will contain flags that may be used by signal handlers to
> >>> > determine whether other fields in the _sigfault portion of siginfo are
> >>> > valid. An example use case is the following patch, which introduces
> >>> > the si_addr_tag_bits{,_mask} fields.
> >>> >
> >>> > A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
> >>> > a signal handler to require the kernel to set the field (but note
> >>> > that the field will be set anyway if the kernel supports the flag,
> >>> > regardless of its value). In combination with the previous patches,
> >>> > this allows a userspace program to determine whether the kernel will
> >>> > set the field.
> >>> >
> >>> > It is possible for an si_faultflags-unaware program to cause a signal
> >>> > handler in an si_faultflags-aware program to be called with a provided
> >>> > siginfo data structure by using one of the following syscalls:
> >>> >
> >>> > - ptrace(PTRACE_SETSIGINFO)
> >>> > - pidfd_send_signal
> >>> > - rt_sigqueueinfo
> >>> > - rt_tgsigqueueinfo
> >>> >
> >>> > So we need to prevent the si_faultflags-unaware program from causing an
> >>> > uninitialized read of si_faultflags in the si_faultflags-aware program when
> >>> > it uses one of these syscalls.
> >>> >
> >>> > The last three cases can be handled by observing that each of these
> >>> > syscalls fails if si_code >= 0. We also observe that kill(2) and
> >>> > tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
> >>> > so we define si_faultflags to only be valid if si_code > 0.
> >>> >
> >>> > There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
> >>> > we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
> >>> > detects that the signal would use the _sigfault layout, and introduce
> >>> > a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
> >>> > program may use to opt out of this behavior.
> >>> 
> >>> So I think while well intentioned this is misguided.
> >>> 
> >>> gdb and the like may use this but I expect the primary user is CRIU
> >>> which simply reads the signal out of one process saves it on disk
> >>> and then restores the signal as read into the new process (possibly
> >>> on a different machine).
> >>> 
> >>> At least for the CRIU usage PTRACE_SETSIGINFO need to remain a raw
> >>> pass through kind of operation.
> >>
> >> This is a problem, though.
> >>
> >> How can we tell the difference between a siginfo that was generated by
> >> the kernel and a siginfo that was generated (or altered) by a non-xflags
> >> aware userspace?
> >>
> >> Short of revving the whole API, I don't see a simple solution to this.
> >
> > Unlike receiving a signal.  We do know that userspace old and new
> > always sends unused fields as zero into PTRACE_SETSIGINFO.
> >
> > The split into kernel_siginfo verifies this and fails userspace if it
> > does something different.  No problems have been reported.
> >
> > So in the case of xflags a non-xflags aware userspace would either pass
> > the siginfo from through from somewhere else (such as
> > PTRACE_GETSIGINFO), or it would simply generate a signal with all of
> > the xflags bits clear.  So everything should work regardless.
> >
> >> Although a bit of a hack, could we include some kind of checksum in the
> >> siginfo?  If the checksum matches during PTRACE_SETSIGINFO, we could
> >> accept the whole thing; xflags included.  Otherwise, we could silently
> >> drop non-self-describing extensions.
> >>
> >> If we only need to generate the checksum when PTRACE_GETSIGINFO is
> >> called then it might be feasible to use a strong hash; otherwise, this
> >> mechanism will be far from bulletproof.
> >>
> >> A hash has the advantage that we don't need any other information
> >> to validate it beyond a salt: if the hash matches, it's self-
> >> validating.  We could also package other data with it to describe the
> >> presence of extensions, but relying on this for regular sigaction()/
> >> signal delivery use feels too high-overhead.
> >>
> >> For debuggers, I suspect that PTRACE_SETSIGINFO2 is still useful:
> >> userspace callers that want to write an extension field that they
> >> knowingly generated themselves should have a way to express that.
> >>
> >> Thoughts?
> >
> > I think there are two cases:
> > 1) CRIU  -- It is just a passthrough of PTRACE_GETSIGINFO
> > 2) Creating a signal from nowhere -- Code that does not know about
> >    xflags would leave xflags at 0 so no problem.
> >
> > Does anyone see any other cases I am missing?
> >
> 
> Zoinks.  I forgot to read and double check the code I wrote.
> 
> copy_siginfo_from_user only verifies against 0 when we don't know the
> layout.  So I don't know if we can count on userspace providing the
> extra data as 0 or not.
> 
> So if we do indeed continue to need xflags we might care.
>
> It is currently an undefined non-sense case to provide non-zero fields
> there.  So I think it is reasonable to expect even debuggers generating
> signals to set those fields to know values such as 0.

You may well be right that the only time extra fields coming from
userspace should be deliberately nonzero is in the passthrough case.

I'm still concerned about padding or type-punning issues causing
unallocated fields to be initialised with junk, even in the presence of
a memset(0).  I haven't actually seen this happen, but the language
standards seem to allow it and compilers may get more aggressive in the
future.

This is why I think we still want an explicit way for userspace to
indicate what extension fields initialised.

Am I missing something that eliminates the danger today?

> Further I expect it is rare for debuggers to generate pretend faults.

That seems a reasonable assumption.  Hand-rolling a fault siginfo seems
a pretty fragile thing to do, if it works reliably at all.

> So I would say perform whatever testing we can, so that there are no
> obvious problem users of PTRACE_SETSIGINFO and then to simply not worry
> about the PTRACE_SETSIGINFO unless someone reports a regression.
> 
> But fist let's see if we really need xflags at all.
> 
> Eric

Ack

Cheers
---Dave
Dave Martin Nov. 12, 2020, 5:23 p.m. UTC | #9
On Wed, Nov 11, 2020 at 02:15:15PM -0600, Eric W. Biederman wrote:
> Dave Martin <Dave.Martin@arm.com> writes:
> 
> > On Mon, Nov 09, 2020 at 07:57:33PM -0600, Eric W. Biederman wrote:
> >> Peter Collingbourne <pcc@google.com> writes:
> >> 
> >> > This field will contain flags that may be used by signal handlers to
> >> > determine whether other fields in the _sigfault portion of siginfo are
> >> > valid. An example use case is the following patch, which introduces
> >> > the si_addr_tag_bits{,_mask} fields.
> >> >
> >> > A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
> >> > a signal handler to require the kernel to set the field (but note
> >> > that the field will be set anyway if the kernel supports the flag,
> >> > regardless of its value). In combination with the previous patches,
> >> > this allows a userspace program to determine whether the kernel will
> >> > set the field.
> >> >
> >> > It is possible for an si_faultflags-unaware program to cause a signal
> >> > handler in an si_faultflags-aware program to be called with a provided
> >> > siginfo data structure by using one of the following syscalls:
> >> >
> >> > - ptrace(PTRACE_SETSIGINFO)
> >> > - pidfd_send_signal
> >> > - rt_sigqueueinfo
> >> > - rt_tgsigqueueinfo
> >> >
> >> > So we need to prevent the si_faultflags-unaware program from causing an
> >> > uninitialized read of si_faultflags in the si_faultflags-aware program when
> >> > it uses one of these syscalls.
> >> >
> >> > The last three cases can be handled by observing that each of these
> >> > syscalls fails if si_code >= 0. We also observe that kill(2) and
> >> > tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
> >> > so we define si_faultflags to only be valid if si_code > 0.
> >> >
> >> > There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
> >> > we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
> >> > detects that the signal would use the _sigfault layout, and introduce
> >> > a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
> >> > program may use to opt out of this behavior.
> >> 
> >> So I think while well intentioned this is misguided.
> >> 
> >> gdb and the like may use this but I expect the primary user is CRIU
> >> which simply reads the signal out of one process saves it on disk
> >> and then restores the signal as read into the new process (possibly
> >> on a different machine).
> >> 
> >> At least for the CRIU usage PTRACE_SETSIGINFO need to remain a raw
> >> pass through kind of operation.
> >
> > This is a problem, though.
> >
> > How can we tell the difference between a siginfo that was generated by
> > the kernel and a siginfo that was generated (or altered) by a non-xflags
> > aware userspace?
> >
> > Short of revving the whole API, I don't see a simple solution to this.
> 
> Unlike receiving a signal.  We do know that userspace old and new
> always sends unused fields as zero into PTRACE_SETSIGINFO.
> 
> The split into kernel_siginfo verifies this and fails userspace if it
> does something different.  No problems have been reported.
> 
> So in the case of xflags a non-xflags aware userspace would either pass
> the siginfo from through from somewhere else (such as
> PTRACE_GETSIGINFO), or it would simply generate a signal with all of
> the xflags bits clear.  So everything should work regardless.
> 
> > Although a bit of a hack, could we include some kind of checksum in the
> > siginfo?  If the checksum matches during PTRACE_SETSIGINFO, we could
> > accept the whole thing; xflags included.  Otherwise, we could silently
> > drop non-self-describing extensions.
> >
> > If we only need to generate the checksum when PTRACE_GETSIGINFO is
> > called then it might be feasible to use a strong hash; otherwise, this
> > mechanism will be far from bulletproof.
> >
> > A hash has the advantage that we don't need any other information
> > to validate it beyond a salt: if the hash matches, it's self-
> > validating.  We could also package other data with it to describe the
> > presence of extensions, but relying on this for regular sigaction()/
> > signal delivery use feels too high-overhead.
> >
> > For debuggers, I suspect that PTRACE_SETSIGINFO2 is still useful:
> > userspace callers that want to write an extension field that they
> > knowingly generated themselves should have a way to express that.
> >
> > Thoughts?

Eric, did you have any view on the hash idea here?

Cheers
---Dave
Eric W. Biederman Nov. 12, 2020, 8:01 p.m. UTC | #10
Dave Martin <Dave.Martin@arm.com> writes:

> On Wed, Nov 11, 2020 at 02:15:15PM -0600, Eric W. Biederman wrote:
>> Dave Martin <Dave.Martin@arm.com> writes:
>> 
>> > On Mon, Nov 09, 2020 at 07:57:33PM -0600, Eric W. Biederman wrote:
>> >> Peter Collingbourne <pcc@google.com> writes:
>> >> 
>> >> > This field will contain flags that may be used by signal handlers to
>> >> > determine whether other fields in the _sigfault portion of siginfo are
>> >> > valid. An example use case is the following patch, which introduces
>> >> > the si_addr_tag_bits{,_mask} fields.
>> >> >
>> >> > A new sigcontext flag, SA_FAULTFLAGS, is introduced in order to allow
>> >> > a signal handler to require the kernel to set the field (but note
>> >> > that the field will be set anyway if the kernel supports the flag,
>> >> > regardless of its value). In combination with the previous patches,
>> >> > this allows a userspace program to determine whether the kernel will
>> >> > set the field.
>> >> >
>> >> > It is possible for an si_faultflags-unaware program to cause a signal
>> >> > handler in an si_faultflags-aware program to be called with a provided
>> >> > siginfo data structure by using one of the following syscalls:
>> >> >
>> >> > - ptrace(PTRACE_SETSIGINFO)
>> >> > - pidfd_send_signal
>> >> > - rt_sigqueueinfo
>> >> > - rt_tgsigqueueinfo
>> >> >
>> >> > So we need to prevent the si_faultflags-unaware program from causing an
>> >> > uninitialized read of si_faultflags in the si_faultflags-aware program when
>> >> > it uses one of these syscalls.
>> >> >
>> >> > The last three cases can be handled by observing that each of these
>> >> > syscalls fails if si_code >= 0. We also observe that kill(2) and
>> >> > tgkill(2) may be used to send a signal where si_code == 0 (SI_USER),
>> >> > so we define si_faultflags to only be valid if si_code > 0.
>> >> >
>> >> > There is no such check on si_code in ptrace(PTRACE_SETSIGINFO), so
>> >> > we make ptrace(PTRACE_SETSIGINFO) clear the si_faultflags field if it
>> >> > detects that the signal would use the _sigfault layout, and introduce
>> >> > a new ptrace request type, PTRACE_SETSIGINFO2, that a si_faultflags-aware
>> >> > program may use to opt out of this behavior.
>> >> 
>> >> So I think while well intentioned this is misguided.
>> >> 
>> >> gdb and the like may use this but I expect the primary user is CRIU
>> >> which simply reads the signal out of one process saves it on disk
>> >> and then restores the signal as read into the new process (possibly
>> >> on a different machine).
>> >> 
>> >> At least for the CRIU usage PTRACE_SETSIGINFO need to remain a raw
>> >> pass through kind of operation.
>> >
>> > This is a problem, though.
>> >
>> > How can we tell the difference between a siginfo that was generated by
>> > the kernel and a siginfo that was generated (or altered) by a non-xflags
>> > aware userspace?
>> >
>> > Short of revving the whole API, I don't see a simple solution to this.
>> 
>> Unlike receiving a signal.  We do know that userspace old and new
>> always sends unused fields as zero into PTRACE_SETSIGINFO.
>> 
>> The split into kernel_siginfo verifies this and fails userspace if it
>> does something different.  No problems have been reported.
>> 
>> So in the case of xflags a non-xflags aware userspace would either pass
>> the siginfo from through from somewhere else (such as
>> PTRACE_GETSIGINFO), or it would simply generate a signal with all of
>> the xflags bits clear.  So everything should work regardless.
>> 
>> > Although a bit of a hack, could we include some kind of checksum in the
>> > siginfo?  If the checksum matches during PTRACE_SETSIGINFO, we could
>> > accept the whole thing; xflags included.  Otherwise, we could silently
>> > drop non-self-describing extensions.
>> >
>> > If we only need to generate the checksum when PTRACE_GETSIGINFO is
>> > called then it might be feasible to use a strong hash; otherwise, this
>> > mechanism will be far from bulletproof.
>> >
>> > A hash has the advantage that we don't need any other information
>> > to validate it beyond a salt: if the hash matches, it's self-
>> > validating.  We could also package other data with it to describe the
>> > presence of extensions, but relying on this for regular sigaction()/
>> > signal delivery use feels too high-overhead.
>> >
>> > For debuggers, I suspect that PTRACE_SETSIGINFO2 is still useful:
>> > userspace callers that want to write an extension field that they
>> > knowingly generated themselves should have a way to express that.
>> >
>> > Thoughts?
>
> Eric, did you have any view on the hash idea here?

I am not quite certain what you meant by salt.  A per kernel instance
secret I suspect.

Such a secret would break creating siginfo by hand and checkpointing
and restoring on a different machine.

If you don't go with full crypto security it sounds like it would work.

If we really need to deploy xflags I think it bears looking at, but
right now it feels like one thing too many.



Eric
diff mbox series

Patch

diff --git a/arch/powerpc/platforms/powernv/vas-fault.c b/arch/powerpc/platforms/powernv/vas-fault.c
index 3d21fce254b7..877e7d5fb4a2 100644
--- a/arch/powerpc/platforms/powernv/vas-fault.c
+++ b/arch/powerpc/platforms/powernv/vas-fault.c
@@ -154,6 +154,7 @@  static void update_csb(struct vas_window *window,
 	info.si_errno = EFAULT;
 	info.si_code = SEGV_MAPERR;
 	info.si_addr = csb_addr;
+	info.si_faultflags = 0;
 
 	/*
 	 * process will be polling on csb.flags after request is sent to
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index ddfd919be46c..222ff6178571 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -121,8 +121,8 @@  static inline void signal_compat_build_tests(void)
 #endif
 
 	CHECK_CSI_OFFSET(_sigfault);
-	CHECK_CSI_SIZE  (_sigfault, 4*sizeof(int));
-	CHECK_SI_SIZE   (_sigfault, 8*sizeof(int));
+	CHECK_CSI_SIZE  (_sigfault, 8*sizeof(int));
+	CHECK_SI_SIZE   (_sigfault, 16*sizeof(int));
 
 	BUILD_BUG_ON(offsetof(siginfo_t, si_addr) != 0x10);
 	BUILD_BUG_ON(offsetof(compat_siginfo_t, si_addr) != 0x0C);
@@ -138,6 +138,9 @@  static inline void signal_compat_build_tests(void)
 	BUILD_BUG_ON(offsetof(siginfo_t, si_pkey) != 0x20);
 	BUILD_BUG_ON(offsetof(compat_siginfo_t, si_pkey) != 0x14);
 
+	BUILD_BUG_ON(offsetof(siginfo_t, si_faultflags) != 0x48);
+	BUILD_BUG_ON(offsetof(compat_siginfo_t, si_faultflags) != 0x28);
+
 	CHECK_CSI_OFFSET(_sigpoll);
 	CHECK_CSI_SIZE  (_sigpoll, 2*sizeof(int));
 	CHECK_SI_SIZE   (_sigpoll, 4*sizeof(int));
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 14d514233e1d..84d3b72be701 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -236,7 +236,9 @@  typedef struct compat_siginfo {
 					char _dummy_pkey[__COMPAT_ADDR_BND_PKEY_PAD];
 					u32 _pkey;
 				} _addr_pkey;
+				compat_uptr_t _pad[6];
 			};
+			u32 _faultflags;
 		} _sigfault;
 
 		/* SIGPOLL */
diff --git a/include/linux/signal_types.h b/include/linux/signal_types.h
index a7887ad84d36..7501209eae33 100644
--- a/include/linux/signal_types.h
+++ b/include/linux/signal_types.h
@@ -78,6 +78,6 @@  struct ksignal {
 
 #define UAPI_SA_FLAGS                                                          \
 	(SA_NOCLDSTOP | SA_NOCLDWAIT | SA_SIGINFO | SA_ONSTACK | SA_RESTART |  \
-	 SA_NODEFER | SA_RESETHAND | __ARCH_UAPI_SA_FLAGS)
+	 SA_NODEFER | SA_RESETHAND | SA_FAULTFLAGS | __ARCH_UAPI_SA_FLAGS)
 
 #endif /* _LINUX_SIGNAL_TYPES_H */
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index 7aacf9389010..10a55aed9ede 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,7 +91,12 @@  union __sifields {
 				char _dummy_pkey[__ADDR_BND_PKEY_PAD];
 				__u32 _pkey;
 			} _addr_pkey;
+			void *_pad[6];
 		};
+		__u32 _faultflags;
+#ifdef __LP64__
+		__u32 _pad2; /* to be used if we add another 32-bit field */
+#endif
 	} _sigfault;
 
 	/* SIGPOLL */
@@ -152,6 +157,8 @@  typedef struct siginfo {
 #define si_trapno	_sifields._sigfault._trapno
 #endif
 #define si_addr_lsb	_sifields._sigfault._addr_lsb
+/* si_faultflags is only valid if 0 < si_code < SI_KERNEL */
+#define si_faultflags	_sifields._sigfault._faultflags
 #define si_lower	_sifields._sigfault._addr_bnd._lower
 #define si_upper	_sifields._sigfault._addr_bnd._upper
 #define si_pkey		_sifields._sigfault._addr_pkey._pkey
diff --git a/include/uapi/asm-generic/signal-defs.h b/include/uapi/asm-generic/signal-defs.h
index 0126ebda4d31..e27bf959d4c4 100644
--- a/include/uapi/asm-generic/signal-defs.h
+++ b/include/uapi/asm-generic/signal-defs.h
@@ -20,6 +20,9 @@ 
  * so this bit allows flag bit support to be detected from userspace while
  * allowing an old kernel to be distinguished from a kernel that supports every
  * flag bit.
+ * SA_FAULTFLAGS indicates that the signal handler requires the siginfo.si_faultflags
+ * field to be valid. Note that if the kernel supports SA_FAULTFLAGS, the field will
+ * be valid regardless of the value of this flag.
  *
  * SA_ONESHOT and SA_NOMASK are the historical Linux names for the Single
  * Unix names RESETHAND and NODEFER respectively.
@@ -49,6 +52,7 @@ 
 #define SA_RESETHAND	0x80000000
 #endif
 #define SA_UNSUPPORTED	0x00000400
+#define SA_FAULTFLAGS	0x00000800
 
 #define SA_NOMASK	SA_NODEFER
 #define SA_ONESHOT	SA_RESETHAND
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index a71b6e3b03eb..722a2c8a4d3d 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -101,6 +101,18 @@  struct ptrace_syscall_info {
 	};
 };
 
+#define PTRACE_SETSIGINFO2	0x420f
+/*
+ * These flags are passed as the addr argument to ptrace.
+ */
+
+/*
+ * Asserts that the caller is aware of the field siginfo.si_faultflags. Prevents
+ * the kernel from automatically setting the field to 0 when the signal uses
+ * a sigfault layout.
+ */
+#define PTRACE_SIGINFO_FAULTFLAGS	0x1
+
 /*
  * These values are stored in task->ptrace_message
  * by tracehook_report_syscall_* to describe the current syscall-stop.
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..ab0618b4602c 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -687,18 +687,31 @@  static int ptrace_getsiginfo(struct task_struct *child, kernel_siginfo_t *info)
 	return error;
 }
 
-static int ptrace_setsiginfo(struct task_struct *child, const kernel_siginfo_t *info)
+static int ptrace_setsiginfo(struct task_struct *child, unsigned long flags,
+			     kernel_siginfo_t *info)
 {
-	unsigned long flags;
+	unsigned long lock_flags;
 	int error = -ESRCH;
 
-	if (lock_task_sighand(child, &flags)) {
+	if (flags & ~PTRACE_SIGINFO_FAULTFLAGS)
+		return -EINVAL;
+
+	/*
+	 * If the caller is unaware of si_faultflags and we're using a layout that
+	 * requires it, set it to 0 which means "no fields are available".
+	 */
+	if (!(flags & PTRACE_SIGINFO_FAULTFLAGS) &&
+	    siginfo_layout_is_fault(
+		    siginfo_layout(info->si_signo, info->si_code)))
+		info->si_faultflags = 0;
+
+	if (lock_task_sighand(child, &lock_flags)) {
 		error = -EINVAL;
 		if (likely(child->last_siginfo != NULL)) {
 			copy_siginfo(child->last_siginfo, info);
 			error = 0;
 		}
-		unlock_task_sighand(child, &flags);
+		unlock_task_sighand(child, &lock_flags);
 	}
 	return error;
 }
@@ -1038,9 +1051,13 @@  int ptrace_request(struct task_struct *child, long request,
 		break;
 
 	case PTRACE_SETSIGINFO:
+		addr = 0;
+		fallthrough;
+
+	case PTRACE_SETSIGINFO2:
 		ret = copy_siginfo_from_user(&siginfo, datavp);
 		if (!ret)
-			ret = ptrace_setsiginfo(child, &siginfo);
+			ret = ptrace_setsiginfo(child, addr, &siginfo);
 		break;
 
 	case PTRACE_GETSIGMASK: {
@@ -1347,10 +1364,14 @@  int compat_ptrace_request(struct task_struct *child, compat_long_t request,
 		break;
 
 	case PTRACE_SETSIGINFO:
+		addr = 0;
+		fallthrough;
+
+	case PTRACE_SETSIGINFO2:
 		ret = copy_siginfo_from_user32(
 			&siginfo, (struct compat_siginfo __user *) datap);
 		if (!ret)
-			ret = ptrace_setsiginfo(child, &siginfo);
+			ret = ptrace_setsiginfo(child, addr, &siginfo);
 		break;
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	case PTRACE_GETREGSET:
diff --git a/kernel/signal.c b/kernel/signal.c
index d18930aafbf4..1fd1f0d12174 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1657,6 +1657,7 @@  static void set_sigfault_common_fields(struct kernel_siginfo *info, int sig,
 	info->si_errno = 0;
 	info->si_code = code;
 	info->si_addr = addr;
+	info->si_faultflags = 0;
 }
 
 int force_sig_fault_to_task(int sig, int code, void __user *addr
@@ -3270,6 +3271,7 @@  void copy_siginfo_to_external32(struct compat_siginfo *to,
 #ifdef __ARCH_SI_TRAPNO
 		to->si_trapno = from->si_trapno;
 #endif
+		to->si_faultflags = from->si_faultflags;
 	}
 
 	switch (layout) {
@@ -3345,6 +3347,7 @@  static int post_copy_siginfo_from_user32(kernel_siginfo_t *to,
 #ifdef __ARCH_SI_TRAPNO
 		to->si_trapno = from->si_trapno;
 #endif
+		to->si_faultflags = from->si_faultflags;
 	}
 
 	switch (layout) {