diff mbox series

[v5,19/30] arm64: add POE signal support

Message ID 20240822151113.1479789-20-joey.gouly@arm.com (mailing list archive)
State New, archived
Headers show
Series Permission Overlay Extension | expand

Commit Message

Joey Gouly Aug. 22, 2024, 3:11 p.m. UTC
Add PKEY support to signals, by saving and restoring POR_EL0 from the stackframe.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/include/uapi/asm/sigcontext.h |  7 +++
 arch/arm64/kernel/signal.c               | 62 ++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

Comments

Kevin Brodsky Sept. 24, 2024, 11:27 a.m. UTC | #1
On 22/08/2024 17:11, Joey Gouly wrote:
> @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
>  		sme_smstop();
>  	}
>  
> +	if (system_supports_poe())
> +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);

At the point where setup_return() is called, the signal frame has
already been written to the user stack. In other words, we write to the
user stack first, and then reset POR_EL0. This may be problematic,
especially if we are using the alternate signal stack, which the
interrupted POR_EL0 may not grant access to. In that situation uaccess
will fail and we'll end up with a SIGSEGV.

This issue has already been discussed on the x86 side, and as it happens
patches to reset PKRU early [1] have just landed. I don't think this is
a blocker for getting this series landed, but we should try and align
with x86. If there's no objection, I'm planning to work on a counterpart
to the x86 series (resetting POR_EL0 early during signal delivery).

Kevin

[1]
https://lore.kernel.org/lkml/20240802061318.2140081-2-aruna.ramakrishna@oracle.com/

> +
>  	if (ka->sa.sa_flags & SA_RESTORER)
>  		sigtramp = ka->sa.sa_restorer;
>  	else
Dave Martin Sept. 24, 2024, 3:04 p.m. UTC | #2
On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
> On 22/08/2024 17:11, Joey Gouly wrote:
> > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> >  		sme_smstop();
> >  	}
> >  
> > +	if (system_supports_poe())
> > +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
> 
> At the point where setup_return() is called, the signal frame has
> already been written to the user stack. In other words, we write to the
> user stack first, and then reset POR_EL0. This may be problematic,
> especially if we are using the alternate signal stack, which the
> interrupted POR_EL0 may not grant access to. In that situation uaccess
> will fail and we'll end up with a SIGSEGV.
> 
> This issue has already been discussed on the x86 side, and as it happens
> patches to reset PKRU early [1] have just landed. I don't think this is
> a blocker for getting this series landed, but we should try and align
> with x86. If there's no objection, I'm planning to work on a counterpart
> to the x86 series (resetting POR_EL0 early during signal delivery).
> 
> Kevin
> 
> [1]
> https://lore.kernel.org/lkml/20240802061318.2140081-2-aruna.ramakrishna@oracle.com/

+1, all the uaccess in signal delivery is done by the kernel on behalf
of the signal handler context, so we should do it with (at least) the
same memory permissions that the signal handler is going to be entered
with.

(In an ideal world, userspace would save this information itself, using
its own handler permissions -- well, no, in an ideal world we wouldn't
have the signal delivery mechanism at all, but hopefully you get the
idea.)

Cheers
---Dave
>
Will Deacon Oct. 9, 2024, 2:43 p.m. UTC | #3
Hi Kevin,

On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
> On 22/08/2024 17:11, Joey Gouly wrote:
> > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> >  		sme_smstop();
> >  	}
> >  
> > +	if (system_supports_poe())
> > +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
> 
> At the point where setup_return() is called, the signal frame has
> already been written to the user stack. In other words, we write to the
> user stack first, and then reset POR_EL0. This may be problematic,
> especially if we are using the alternate signal stack, which the
> interrupted POR_EL0 may not grant access to. In that situation uaccess
> will fail and we'll end up with a SIGSEGV.
> 
> This issue has already been discussed on the x86 side, and as it happens
> patches to reset PKRU early [1] have just landed. I don't think this is
> a blocker for getting this series landed, but we should try and align
> with x86. If there's no objection, I'm planning to work on a counterpart
> to the x86 series (resetting POR_EL0 early during signal delivery).

Did you get a chance to work on that? It would be great to land the
fixes for 6.12, if possible, so that the first kernel release with POE
support doesn't land with known issues.

Cheers,

Will
Will Deacon Oct. 14, 2024, 5:10 p.m. UTC | #4
Kevin, Joey,

On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote:
> On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
> > On 22/08/2024 17:11, Joey Gouly wrote:
> > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> > >  		sme_smstop();
> > >  	}
> > >  
> > > +	if (system_supports_poe())
> > > +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
> > 
> > At the point where setup_return() is called, the signal frame has
> > already been written to the user stack. In other words, we write to the
> > user stack first, and then reset POR_EL0. This may be problematic,
> > especially if we are using the alternate signal stack, which the
> > interrupted POR_EL0 may not grant access to. In that situation uaccess
> > will fail and we'll end up with a SIGSEGV.
> > 
> > This issue has already been discussed on the x86 side, and as it happens
> > patches to reset PKRU early [1] have just landed. I don't think this is
> > a blocker for getting this series landed, but we should try and align
> > with x86. If there's no objection, I'm planning to work on a counterpart
> > to the x86 series (resetting POR_EL0 early during signal delivery).
> 
> Did you get a chance to work on that? It would be great to land the
> fixes for 6.12, if possible, so that the first kernel release with POE
> support doesn't land with known issues.

Looking a little more at this, I think we have quite a weird behaviour
on arm64 as it stands. It looks like we rely on the signal frame to hold
the original POR_EL0 so, if for some reason we fail to allocate space
for the POR context, I think we'll return back from the signal with
POR_EL0_INIT. That seems bad?

Will
Joey Gouly Oct. 15, 2024, 9:59 a.m. UTC | #5
On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote:
> Kevin, Joey,
> 
> On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote:
> > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
> > > On 22/08/2024 17:11, Joey Gouly wrote:
> > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> > > >  		sme_smstop();
> > > >  	}
> > > >  
> > > > +	if (system_supports_poe())
> > > > +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
> > > 
> > > At the point where setup_return() is called, the signal frame has
> > > already been written to the user stack. In other words, we write to the
> > > user stack first, and then reset POR_EL0. This may be problematic,
> > > especially if we are using the alternate signal stack, which the
> > > interrupted POR_EL0 may not grant access to. In that situation uaccess
> > > will fail and we'll end up with a SIGSEGV.
> > > 
> > > This issue has already been discussed on the x86 side, and as it happens
> > > patches to reset PKRU early [1] have just landed. I don't think this is
> > > a blocker for getting this series landed, but we should try and align
> > > with x86. If there's no objection, I'm planning to work on a counterpart
> > > to the x86 series (resetting POR_EL0 early during signal delivery).
> > 
> > Did you get a chance to work on that? It would be great to land the
> > fixes for 6.12, if possible, so that the first kernel release with POE
> > support doesn't land with known issues.
> 
> Looking a little more at this, I think we have quite a weird behaviour
> on arm64 as it stands. It looks like we rely on the signal frame to hold
> the original POR_EL0 so, if for some reason we fail to allocate space
> for the POR context, I think we'll return back from the signal with
> POR_EL0_INIT. That seems bad?

If we don't allocate space for POR_EL0, I think the program recieves SIGSGEV?

setup_sigframe_layout()
        if (system_supports_poe()) {
                err = sigframe_alloc(user, &user->poe_offset,
                                     sizeof(struct poe_context));
                if (err)
                        return err;
        }

Through get_sigframe() and setup_rt_frame(), that eventually hets here:

handle_signal()
	ret = setup_rt_frame(usig, ksig, oldset, regs);

	[..]

        signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP));

void signal_setup_done(int failed, struct ksignal *ksig, int stepping)                                                                                                                         
{                                                                                                                                                                                              
        if (failed)                                                                                                                                                                            
                force_sigsegv(ksig->sig);                                                                                                                                                      
        else                                                                                                                                                                                   
                signal_delivered(ksig, stepping);                                                                                                                                              
}  

So I think it's "fine"?

Thanks,
Joey
Mark Brown Oct. 15, 2024, 11:37 a.m. UTC | #6
On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote:
> On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote:

> > Looking a little more at this, I think we have quite a weird behaviour
> > on arm64 as it stands. It looks like we rely on the signal frame to hold
> > the original POR_EL0 so, if for some reason we fail to allocate space
> > for the POR context, I think we'll return back from the signal with
> > POR_EL0_INIT. That seems bad?

> If we don't allocate space for POR_EL0, I think the program recieves SIGSGEV?

...

> So I think it's "fine"?

Yeah, there's a bunch of other stuff would go badly if we tried to carry
on after failing to allocate a signal frame.
Will Deacon Oct. 15, 2024, 11:41 a.m. UTC | #7
On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote:
> On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote:
> > Kevin, Joey,
> > 
> > On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote:
> > > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
> > > > On 22/08/2024 17:11, Joey Gouly wrote:
> > > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> > > > >  		sme_smstop();
> > > > >  	}
> > > > >  
> > > > > +	if (system_supports_poe())
> > > > > +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
> > > > 
> > > > At the point where setup_return() is called, the signal frame has
> > > > already been written to the user stack. In other words, we write to the
> > > > user stack first, and then reset POR_EL0. This may be problematic,
> > > > especially if we are using the alternate signal stack, which the
> > > > interrupted POR_EL0 may not grant access to. In that situation uaccess
> > > > will fail and we'll end up with a SIGSEGV.
> > > > 
> > > > This issue has already been discussed on the x86 side, and as it happens
> > > > patches to reset PKRU early [1] have just landed. I don't think this is
> > > > a blocker for getting this series landed, but we should try and align
> > > > with x86. If there's no objection, I'm planning to work on a counterpart
> > > > to the x86 series (resetting POR_EL0 early during signal delivery).
> > > 
> > > Did you get a chance to work on that? It would be great to land the
> > > fixes for 6.12, if possible, so that the first kernel release with POE
> > > support doesn't land with known issues.
> > 
> > Looking a little more at this, I think we have quite a weird behaviour
> > on arm64 as it stands. It looks like we rely on the signal frame to hold
> > the original POR_EL0 so, if for some reason we fail to allocate space
> > for the POR context, I think we'll return back from the signal with
> > POR_EL0_INIT. That seems bad?
> 
> If we don't allocate space for POR_EL0, I think the program recieves SIGSGEV?
> 
> setup_sigframe_layout()
>         if (system_supports_poe()) {
>                 err = sigframe_alloc(user, &user->poe_offset,
>                                      sizeof(struct poe_context));
>                 if (err)
>                         return err;
>         }
> 
> Through get_sigframe() and setup_rt_frame(), that eventually hets here:
> 
> handle_signal()
> 	ret = setup_rt_frame(usig, ksig, oldset, regs);
> 
> 	[..]
> 
>         signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP));
> 
> void signal_setup_done(int failed, struct ksignal *ksig, int stepping)                                                                                                                         
> {                                                                                                                                                                                              
>         if (failed)                                                                                                                                                                            
>                 force_sigsegv(ksig->sig);                                                                                                                                                      
>         else                                                                                                                                                                                   
>                 signal_delivered(ksig, stepping);                                                                                                                                              
> }  
> 
> So I think it's "fine"?

Ah, yes, sorry about that. I got confused by the conditional push in
setup_sigframe():

	if (system_supports_poe() && err == 0 && user->poe_offset) {
		...

which gives the wrong impression that the POR is somehow optional, even
if the CPU supports POE. So we should drop that check of
'user->poe_offset' as it cannot be NULL here.

We also still need to resolve Kevin's concern, which probably means
keeping the thread's original POR around someplace.

Will
Joey Gouly Oct. 15, 2024, 12:25 p.m. UTC | #8
On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote:
> On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote:
> > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote:
> > > Kevin, Joey,
> > > 
> > > On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote:
> > > > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
> > > > > On 22/08/2024 17:11, Joey Gouly wrote:
> > > > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> > > > > >  		sme_smstop();
> > > > > >  	}
> > > > > >  
> > > > > > +	if (system_supports_poe())
> > > > > > +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
> > > > > 
> > > > > At the point where setup_return() is called, the signal frame has
> > > > > already been written to the user stack. In other words, we write to the
> > > > > user stack first, and then reset POR_EL0. This may be problematic,
> > > > > especially if we are using the alternate signal stack, which the
> > > > > interrupted POR_EL0 may not grant access to. In that situation uaccess
> > > > > will fail and we'll end up with a SIGSEGV.
> > > > > 
> > > > > This issue has already been discussed on the x86 side, and as it happens
> > > > > patches to reset PKRU early [1] have just landed. I don't think this is
> > > > > a blocker for getting this series landed, but we should try and align
> > > > > with x86. If there's no objection, I'm planning to work on a counterpart
> > > > > to the x86 series (resetting POR_EL0 early during signal delivery).
> > > > 
> > > > Did you get a chance to work on that? It would be great to land the
> > > > fixes for 6.12, if possible, so that the first kernel release with POE
> > > > support doesn't land with known issues.
> > > 
> > > Looking a little more at this, I think we have quite a weird behaviour
> > > on arm64 as it stands. It looks like we rely on the signal frame to hold
> > > the original POR_EL0 so, if for some reason we fail to allocate space
> > > for the POR context, I think we'll return back from the signal with
> > > POR_EL0_INIT. That seems bad?
> > 
> > If we don't allocate space for POR_EL0, I think the program recieves SIGSGEV?
> > 
> > setup_sigframe_layout()
> >         if (system_supports_poe()) {
> >                 err = sigframe_alloc(user, &user->poe_offset,
> >                                      sizeof(struct poe_context));
> >                 if (err)
> >                         return err;
> >         }
> > 
> > Through get_sigframe() and setup_rt_frame(), that eventually hets here:
> > 
> > handle_signal()
> > 	ret = setup_rt_frame(usig, ksig, oldset, regs);
> > 
> > 	[..]
> > 
> >         signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP));
> > 
> > void signal_setup_done(int failed, struct ksignal *ksig, int stepping)                                                                                                                         
> > {                                                                                                                                                                                              
> >         if (failed)                                                                                                                                                                            
> >                 force_sigsegv(ksig->sig);                                                                                                                                                      
> >         else                                                                                                                                                                                   
> >                 signal_delivered(ksig, stepping);                                                                                                                                              
> > }  
> > 
> > So I think it's "fine"?
> 
> Ah, yes, sorry about that. I got confused by the conditional push in
> setup_sigframe():
> 
> 	if (system_supports_poe() && err == 0 && user->poe_offset) {
> 		...
> 
> which gives the wrong impression that the POR is somehow optional, even
> if the CPU supports POE. So we should drop that check of
> 'user->poe_offset' as it cannot be NULL here.
> 
> We also still need to resolve Kevin's concern, which probably means
> keeping the thread's original POR around someplace.

That was cargo culted (by me) from the rest of the function (apart from TPIDR2
and FPMR). I think Kevin is planning on sending his signal changes still, but
is on holiday, maybe he can remove the last part of the condition as part of
his series.

Thanks,
Joey
Mark Brown Oct. 15, 2024, 1:26 p.m. UTC | #9
On Tue, Oct 15, 2024 at 01:25:29PM +0100, Joey Gouly wrote:
> On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote:

> > 	if (system_supports_poe() && err == 0 && user->poe_offset) {
> > 		...

> > which gives the wrong impression that the POR is somehow optional, even
> > if the CPU supports POE. So we should drop that check of
> > 'user->poe_offset' as it cannot be NULL here.

> That was cargo culted (by me) from the rest of the function (apart from TPIDR2
> and FPMR). I think Kevin is planning on sending his signal changes still, but
> is on holiday, maybe he can remove the last part of the condition as part of
> his series.

That's there because the decisions about "should we save this thing" are
taken in setup_sigframe_layout() and for a bunch of the extensions we
suppress the saving if they're in some sort of default state (eg, when
we don't have TIF_SVE set we don't output the SVE sigframe).
Dave Martin Oct. 15, 2024, 1:39 p.m. UTC | #10
On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote:
> On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote:
> > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote:
> > > Kevin, Joey,
> > > 
> > > On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote:
> > > > On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
> > > > > On 22/08/2024 17:11, Joey Gouly wrote:
> > > > > > @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
> > > > > >  		sme_smstop();
> > > > > >  	}
> > > > > >  
> > > > > > +	if (system_supports_poe())
> > > > > > +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
> > > > > 
> > > > > At the point where setup_return() is called, the signal frame has
> > > > > already been written to the user stack. In other words, we write to the
> > > > > user stack first, and then reset POR_EL0. This may be problematic,
> > > > > especially if we are using the alternate signal stack, which the
> > > > > interrupted POR_EL0 may not grant access to. In that situation uaccess
> > > > > will fail and we'll end up with a SIGSEGV.
> > > > > 
> > > > > This issue has already been discussed on the x86 side, and as it happens
> > > > > patches to reset PKRU early [1] have just landed. I don't think this is
> > > > > a blocker for getting this series landed, but we should try and align
> > > > > with x86. If there's no objection, I'm planning to work on a counterpart
> > > > > to the x86 series (resetting POR_EL0 early during signal delivery).
> > > > 
> > > > Did you get a chance to work on that? It would be great to land the
> > > > fixes for 6.12, if possible, so that the first kernel release with POE
> > > > support doesn't land with known issues.
> > > 
> > > Looking a little more at this, I think we have quite a weird behaviour
> > > on arm64 as it stands. It looks like we rely on the signal frame to hold
> > > the original POR_EL0 so, if for some reason we fail to allocate space
> > > for the POR context, I think we'll return back from the signal with
> > > POR_EL0_INIT. That seems bad?
> > 
> > If we don't allocate space for POR_EL0, I think the program recieves SIGSGEV?
> > 
> > setup_sigframe_layout()
> >         if (system_supports_poe()) {
> >                 err = sigframe_alloc(user, &user->poe_offset,
> >                                      sizeof(struct poe_context));
> >                 if (err)
> >                         return err;
> >         }
> > 
> > Through get_sigframe() and setup_rt_frame(), that eventually hets here:
> > 
> > handle_signal()
> > 	ret = setup_rt_frame(usig, ksig, oldset, regs);
> > 
> > 	[..]
> > 
> >         signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP));
> > 
> > void signal_setup_done(int failed, struct ksignal *ksig, int stepping)                                                                                                                         
> > {                                                                                                                                                                                              
> >         if (failed)                                                                                                                                                                            
> >                 force_sigsegv(ksig->sig);                                                                                                                                                      
> >         else                                                                                                                                                                                   
> >                 signal_delivered(ksig, stepping);                                                                                                                                              
> > }  
> > 
> > So I think it's "fine"?
> 
> Ah, yes, sorry about that. I got confused by the conditional push in
> setup_sigframe():
> 
> 	if (system_supports_poe() && err == 0 && user->poe_offset) {
> 		...
> 
> which gives the wrong impression that the POR is somehow optional, even
> if the CPU supports POE. So we should drop that check of
> 'user->poe_offset' as it cannot be NULL here.

From memory and a quick glance at the code:

For other "conditionally unconditional" things, we don't have a
corresponding check on user->foo.

For conditional stuff, non-NULLness of user->foo is used to track
whether we decided to dump the corresponding record; for consistency
here, if we have system_supports_poe() && err == 0, then that's
sufficient (though in prior versions of this code, POR_EL0 dumping was
conditional and so the extra check did do something...)


In any case, if some allocation fails then we splat out with a SIGSEGV
before modifying the user task state to deliver the signal (in
setup_return() etc.)

If The user's POR_EL0 value is being clobbered before we get here, we
would save the wrong value -- so the code would be broken anyway.


So, as Joey says, this is probably fine, but the user->poe_offset check
looks superfluous.  The kernel will splat on us here and kill the thread
if it's NULL anyway.

[...]

Cheers
---Dave
Catalin Marinas Oct. 15, 2024, 3:01 p.m. UTC | #11
On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote:
> On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote:
> > On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote:
> > > Looking a little more at this, I think we have quite a weird behaviour
> > > on arm64 as it stands. It looks like we rely on the signal frame to hold
> > > the original POR_EL0 so, if for some reason we fail to allocate space
> > > for the POR context, I think we'll return back from the signal with
> > > POR_EL0_INIT. That seems bad?
> > 
> > If we don't allocate space for POR_EL0, I think the program recieves SIGSGEV?
> > 
> > setup_sigframe_layout()
> >         if (system_supports_poe()) {
> >                 err = sigframe_alloc(user, &user->poe_offset,
> >                                      sizeof(struct poe_context));
> >                 if (err)
> >                         return err;
> >         }
> > 
> > Through get_sigframe() and setup_rt_frame(), that eventually hets here:
> > 
> > handle_signal()
> > 	ret = setup_rt_frame(usig, ksig, oldset, regs);
> > 
> > 	[..]
> > 
> >         signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP));
> > 
> > void signal_setup_done(int failed, struct ksignal *ksig, int stepping)
> > {
> >         if (failed)
> >                 force_sigsegv(ksig->sig);
> >         else
> >                 signal_delivered(ksig, stepping);
> > }  
> > 
> > So I think it's "fine"?
> 
> Ah, yes, sorry about that. I got confused by the conditional push in
> setup_sigframe():
> 
> 	if (system_supports_poe() && err == 0 && user->poe_offset) {
> 		...
> 
> which gives the wrong impression that the POR is somehow optional, even
> if the CPU supports POE. So we should drop that check of
> 'user->poe_offset' as it cannot be NULL here.

I agree, we should remove this check as it's confusing.

> We also still need to resolve Kevin's concern, which probably means
> keeping the thread's original POR around someplace.

If we fail to allocate context for POR_EL0 (or anything else), we'll
deliver a SIGSEGV. I think it's quite likely that the SIGSEGV will also
fail to allocate context we end up with a fatal SIGSEGV. Not sure the
user can affect the allocation/layout, though it can change stack
attributes where the frame is written.

Assuming that the user tricks the kernel into failing to write the
context but allows it to succeed on the resulting SIGSEGV, POR_EL0
wouldn't have been reset and the SIGSEGV context will still have the
original value. I don't think we need to do anything here for 6.12.

However, in for-next/core, we have gcs_signal_entry() called after
resetting POR_EL0. If this fails, we can end up with a new POR_EL0 on
sigreturn (subject to the above user toggling permissions). I think this
needs to be fixed, POR_EL0 only reset when we know we are going to
deliver the signal.
Kevin Brodsky Oct. 17, 2024, 7:44 a.m. UTC | #12
On 15/10/2024 14:25, Joey Gouly wrote:
> On Tue, Oct 15, 2024 at 12:41:16PM +0100, Will Deacon wrote:
>> On Tue, Oct 15, 2024 at 10:59:11AM +0100, Joey Gouly wrote:
>>> On Mon, Oct 14, 2024 at 06:10:23PM +0100, Will Deacon wrote:
>>>> Kevin, Joey,
>>>>
>>>> On Wed, Oct 09, 2024 at 03:43:01PM +0100, Will Deacon wrote:
>>>>> On Tue, Sep 24, 2024 at 01:27:58PM +0200, Kevin Brodsky wrote:
>>>>>> On 22/08/2024 17:11, Joey Gouly wrote:
>>>>>>> @@ -1178,6 +1237,9 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
>>>>>>>  		sme_smstop();
>>>>>>>  	}
>>>>>>>  
>>>>>>> +	if (system_supports_poe())
>>>>>>> +		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
>>>>>> At the point where setup_return() is called, the signal frame has
>>>>>> already been written to the user stack. In other words, we write to the
>>>>>> user stack first, and then reset POR_EL0. This may be problematic,
>>>>>> especially if we are using the alternate signal stack, which the
>>>>>> interrupted POR_EL0 may not grant access to. In that situation uaccess
>>>>>> will fail and we'll end up with a SIGSEGV.
>>>>>>
>>>>>> This issue has already been discussed on the x86 side, and as it happens
>>>>>> patches to reset PKRU early [1] have just landed. I don't think this is
>>>>>> a blocker for getting this series landed, but we should try and align
>>>>>> with x86. If there's no objection, I'm planning to work on a counterpart
>>>>>> to the x86 series (resetting POR_EL0 early during signal delivery).
>>>>> Did you get a chance to work on that? It would be great to land the
>>>>> fixes for 6.12, if possible, so that the first kernel release with POE
>>>>> support doesn't land with known issues.
>>>> Looking a little more at this, I think we have quite a weird behaviour
>>>> on arm64 as it stands. It looks like we rely on the signal frame to hold
>>>> the original POR_EL0 so, if for some reason we fail to allocate space
>>>> for the POR context, I think we'll return back from the signal with
>>>> POR_EL0_INIT. That seems bad?
>>> If we don't allocate space for POR_EL0, I think the program recieves SIGSGEV?
>>>
>>> setup_sigframe_layout()
>>>         if (system_supports_poe()) {
>>>                 err = sigframe_alloc(user, &user->poe_offset,
>>>                                      sizeof(struct poe_context));
>>>                 if (err)
>>>                         return err;
>>>         }
>>>
>>> Through get_sigframe() and setup_rt_frame(), that eventually hets here:
>>>
>>> handle_signal()
>>> 	ret = setup_rt_frame(usig, ksig, oldset, regs);
>>>
>>> 	[..]
>>>
>>>         signal_setup_done(ret, ksig, test_thread_flag(TIF_SINGLESTEP));
>>>
>>> void signal_setup_done(int failed, struct ksignal *ksig, int stepping)                                                                                                                         
>>> {                                                                                                                                                                                              
>>>         if (failed)                                                                                                                                                                            
>>>                 force_sigsegv(ksig->sig);                                                                                                                                                      
>>>         else                                                                                                                                                                                   
>>>                 signal_delivered(ksig, stepping);                                                                                                                                              
>>> }  
>>>
>>> So I think it's "fine"?
>> Ah, yes, sorry about that. I got confused by the conditional push in
>> setup_sigframe():
>>
>> 	if (system_supports_poe() && err == 0 && user->poe_offset) {
>> 		...
>>
>> which gives the wrong impression that the POR is somehow optional, even
>> if the CPU supports POE. So we should drop that check of
>> 'user->poe_offset' as it cannot be NULL here.
>>
>> We also still need to resolve Kevin's concern, which probably means
>> keeping the thread's original POR around someplace.
> That was cargo culted (by me) from the rest of the function (apart from TPIDR2
> and FPMR). I think Kevin is planning on sending his signal changes still, but
> is on holiday, maybe he can remove the last part of the condition as part of
> his series.

Indeed just got back from holiday. I've got the series ready, about to
send it. I will add a clean-up patch removing this check on poe_offset.

Kevin
Kevin Brodsky Oct. 17, 2024, 2 p.m. UTC | #13
On 15/10/2024 17:01, Catalin Marinas wrote:
>> We also still need to resolve Kevin's concern, which probably means
>> keeping the thread's original POR around someplace.
> If we fail to allocate context for POR_EL0 (or anything else), we'll
> deliver a SIGSEGV. I think it's quite likely that the SIGSEGV will also
> fail to allocate context we end up with a fatal SIGSEGV. Not sure the
> user can affect the allocation/layout, though it can change stack
> attributes where the frame is written.
>
> Assuming that the user tricks the kernel into failing to write the
> context but allows it to succeed on the resulting SIGSEGV, POR_EL0
> wouldn't have been reset and the SIGSEGV context will still have the
> original value. I don't think we need to do anything here for 6.12.
>
> However, in for-next/core, we have gcs_signal_entry() called after
> resetting POR_EL0. If this fails, we can end up with a new POR_EL0 on
> sigreturn (subject to the above user toggling permissions). I think this
> needs to be fixed, POR_EL0 only reset when we know we are going to
> deliver the signal.

In the series I've just posted [1], POR_EL0 is reset to "allow all"
before we do anything, so it sounds like we may have a problem there.
However, it does keep track of that state, so I think the fix may be
simple. If any error occurs in setup_rt_frame(), we could call
restore_unpriv_access_state() to restore the original value of POR_EL0,
like in sigreturn(). Otherwise we call set_handler_unpriv_access_state()
to set POR_EL0 to POR_EL0_INIT as we do today. I can make that change in
v2 if that sounds helpful.

Kevin

[1]
https://lore.kernel.org/linux-arm-kernel/20241017133909.3837547-4-kevin.brodsky@arm.com/T/#u
diff mbox series

Patch

diff --git arch/arm64/include/uapi/asm/sigcontext.h arch/arm64/include/uapi/asm/sigcontext.h
index 8a45b7a411e0..e4cba8a6c9a2 100644
--- arch/arm64/include/uapi/asm/sigcontext.h
+++ arch/arm64/include/uapi/asm/sigcontext.h
@@ -98,6 +98,13 @@  struct esr_context {
 	__u64 esr;
 };
 
+#define POE_MAGIC	0x504f4530
+
+struct poe_context {
+	struct _aarch64_ctx head;
+	__u64 por_el0;
+};
+
 /*
  * extra_context: describes extra space in the signal frame for
  * additional structures that don't fit in sigcontext.__reserved[].
diff --git arch/arm64/kernel/signal.c arch/arm64/kernel/signal.c
index 4a77f4976e11..561986947530 100644
--- arch/arm64/kernel/signal.c
+++ arch/arm64/kernel/signal.c
@@ -61,6 +61,7 @@  struct rt_sigframe_user_layout {
 	unsigned long za_offset;
 	unsigned long zt_offset;
 	unsigned long fpmr_offset;
+	unsigned long poe_offset;
 	unsigned long extra_offset;
 	unsigned long end_offset;
 };
@@ -185,6 +186,8 @@  struct user_ctxs {
 	u32 zt_size;
 	struct fpmr_context __user *fpmr;
 	u32 fpmr_size;
+	struct poe_context __user *poe;
+	u32 poe_size;
 };
 
 static int preserve_fpsimd_context(struct fpsimd_context __user *ctx)
@@ -258,6 +261,32 @@  static int restore_fpmr_context(struct user_ctxs *user)
 	return err;
 }
 
+static int preserve_poe_context(struct poe_context __user *ctx)
+{
+	int err = 0;
+
+	__put_user_error(POE_MAGIC, &ctx->head.magic, err);
+	__put_user_error(sizeof(*ctx), &ctx->head.size, err);
+	__put_user_error(read_sysreg_s(SYS_POR_EL0), &ctx->por_el0, err);
+
+	return err;
+}
+
+static int restore_poe_context(struct user_ctxs *user)
+{
+	u64 por_el0;
+	int err = 0;
+
+	if (user->poe_size != sizeof(*user->poe))
+		return -EINVAL;
+
+	__get_user_error(por_el0, &(user->poe->por_el0), err);
+	if (!err)
+		write_sysreg_s(por_el0, SYS_POR_EL0);
+
+	return err;
+}
+
 #ifdef CONFIG_ARM64_SVE
 
 static int preserve_sve_context(struct sve_context __user *ctx)
@@ -621,6 +650,7 @@  static int parse_user_sigframe(struct user_ctxs *user,
 	user->za = NULL;
 	user->zt = NULL;
 	user->fpmr = NULL;
+	user->poe = NULL;
 
 	if (!IS_ALIGNED((unsigned long)base, 16))
 		goto invalid;
@@ -671,6 +701,17 @@  static int parse_user_sigframe(struct user_ctxs *user,
 			/* ignore */
 			break;
 
+		case POE_MAGIC:
+			if (!system_supports_poe())
+				goto invalid;
+
+			if (user->poe)
+				goto invalid;
+
+			user->poe = (struct poe_context __user *)head;
+			user->poe_size = size;
+			break;
+
 		case SVE_MAGIC:
 			if (!system_supports_sve() && !system_supports_sme())
 				goto invalid;
@@ -857,6 +898,9 @@  static int restore_sigframe(struct pt_regs *regs,
 	if (err == 0 && system_supports_sme2() && user.zt)
 		err = restore_zt_context(&user);
 
+	if (err == 0 && system_supports_poe() && user.poe)
+		err = restore_poe_context(&user);
+
 	return err;
 }
 
@@ -980,6 +1024,13 @@  static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
 			return err;
 	}
 
+	if (system_supports_poe()) {
+		err = sigframe_alloc(user, &user->poe_offset,
+				     sizeof(struct poe_context));
+		if (err)
+			return err;
+	}
+
 	return sigframe_alloc_end(user);
 }
 
@@ -1042,6 +1093,14 @@  static int setup_sigframe(struct rt_sigframe_user_layout *user,
 		err |= preserve_fpmr_context(fpmr_ctx);
 	}
 
+	if (system_supports_poe() && err == 0 && user->poe_offset) {
+		struct poe_context __user *poe_ctx =
+			apply_user_offset(user, user->poe_offset);
+
+		err |= preserve_poe_context(poe_ctx);
+	}
+
+
 	/* ZA state if present */
 	if (system_supports_sme() && err == 0 && user->za_offset) {
 		struct za_context __user *za_ctx =
@@ -1178,6 +1237,9 @@  static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
 		sme_smstop();
 	}
 
+	if (system_supports_poe())
+		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+
 	if (ka->sa.sa_flags & SA_RESTORER)
 		sigtramp = ka->sa.sa_restorer;
 	else