x86/vdso: Remove retpoline from SGX vDSO call

Message ID	20200930140108.48075-1-jarkko.sakkinen@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=ssgJ=DH=vger.kernel.org=linux-sgx-owner@kernel.org> From: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> To: linux-sgx@vger.kernel.org Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>, x86@kernel.org, Haitao Huang <haitao.huang@linux.intel.com>, Sean Christopherson <sean.j.christopherson@intel.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@amacapital.net>, Cedric Xing <cedric.xing@intel.com> Subject: [PATCH] x86/vdso: Remove retpoline from SGX vDSO call Date: Wed, 30 Sep 2020 17:01:08 +0300 Message-Id: <20200930140108.48075-1-jarkko.sakkinen@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	x86/vdso: Remove retpoline from SGX vDSO call \| expand x86/vdso: Remove retpoline from SGX vDSO call

Jarkko Sakkinen Sept. 30, 2020, 2:01 p.m. UTC

The user handler, which can be optionally used to handle enclave
exceptions, is always the same global handler provided by the SGX
runtime, who wants to use such a handler instead returning on exception.

Thus, there is no any non-deterministic branch prediction happening.
The code path is always the same and never change. Obviously, you could
change it all the time purposely but for any sane real-world use that
would not make any sense.

Thus, remove retpoline wrapping.

Cc: x86@kernel.org
Cc: Haitao Huang <haitao.huang@linux.intel.com>
CC: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Cedric Xing <cedric.xing@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
 arch/x86/entry/vdso/vsgx.S | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Dave Hansen Sept. 30, 2020, 2:08 p.m. UTC | #1

On 9/30/20 7:01 AM, Jarkko Sakkinen wrote:
> The user handler, which can be optionally used to handle enclave
> exceptions, is always the same global handler provided by the SGX
> runtime, who wants to use such a handler instead returning on exception.
> 
> Thus, there is no any non-deterministic branch prediction happening.
> The code path is always the same and never change. Obviously, you could
> change it all the time purposely but for any sane real-world use that
> would not make any sense.

The fundamental problem mitigated by retpolines is that indirect branch
 instructions themselves are non-deterministic (speculatively).

This:

> +	call	*%rax

is an indirect branch instruction.  That leaves me a bit confused since
the changelog doesn't really match the code.

Do we care about mitigating Spectre-v2-style attacks for the VDSO's
indirect calls?

Jarkko Sakkinen Sept. 30, 2020, 2:20 p.m. UTC | #2

On Wed, Sep 30, 2020 at 07:08:58AM -0700, Dave Hansen wrote:
> On 9/30/20 7:01 AM, Jarkko Sakkinen wrote:
> > The user handler, which can be optionally used to handle enclave
> > exceptions, is always the same global handler provided by the SGX
> > runtime, who wants to use such a handler instead returning on exception.
> > 
> > Thus, there is no any non-deterministic branch prediction happening.
> > The code path is always the same and never change. Obviously, you could
> > change it all the time purposely but for any sane real-world use that
> > would not make any sense.
> 
> The fundamental problem mitigated by retpolines is that indirect branch
>  instructions themselves are non-deterministic (speculatively).
> 
> This:
> 
> > +	call	*%rax
> 
> is an indirect branch instruction.  That leaves me a bit confused since
> the changelog doesn't really match the code.
> 
> Do we care about mitigating Spectre-v2-style attacks for the VDSO's
> indirect calls?

It is yes, my wording was just extremely bad. What I meant to say is
that there is branch prediction happening but it is, given how runtime
will use the handler, leading always unconditionally to the same
destination.

I asked does this have any bad mitigations yesterday:

https://lkml.org/lkml/2020/9/29/2505

I'm not expert on Spectre, or any sort of security researcher, but I've
read a few papers about and understand the general concept. With the
constraints how the callback is used in practice, I'd *guess* it is
fine to drop retpoline but I really need some feedback on this from
people who understand these attacks better.

I'll submit a patch with boot time patching (aka using ALTERNATE) if
this does not get the buy-in. Just have to evaluate the both options
before making any decisions.

/Jarkko

Dave Hansen Sept. 30, 2020, 2:33 p.m. UTC | #3

On 9/30/20 7:20 AM, Jarkko Sakkinen wrote:
> I'm not expert on Spectre, or any sort of security researcher, but I've
> read a few papers about and understand the general concept. With the
> constraints how the callback is used in practice, I'd *guess* it is
> fine to drop retpoline but I really need some feedback on this from
> people who understand these attacks better.

Do you recall why you added it in the first place?  What was the
motivation for it?  Were you responding to a review comment?

Jarkko Sakkinen Sept. 30, 2020, 3:28 p.m. UTC | #4

On Wed, Sep 30, 2020 at 07:33:38AM -0700, Dave Hansen wrote:
> On 9/30/20 7:20 AM, Jarkko Sakkinen wrote:
> > I'm not expert on Spectre, or any sort of security researcher, but I've
> > read a few papers about and understand the general concept. With the
> > constraints how the callback is used in practice, I'd *guess* it is
> > fine to drop retpoline but I really need some feedback on this from
> > people who understand these attacks better.
> 
> Do you recall why you added it in the first place?  What was the
> motivation for it?  Were you responding to a review comment?

Absolutely cannot recall it :-) I even cannot recall the exact time when
we landed the vDSO in the first place. Too much stuff has happend during
the long three year upstreaming cycle. I will try to backtrack this
info.

/Jarkko

Sean Christopherson Sept. 30, 2020, 3:43 p.m. UTC | #5

On Wed, Sep 30, 2020 at 06:28:06PM +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 07:33:38AM -0700, Dave Hansen wrote:
> > On 9/30/20 7:20 AM, Jarkko Sakkinen wrote:
> > > I'm not expert on Spectre, or any sort of security researcher, but I've
> > > read a few papers about and understand the general concept. With the
> > > constraints how the callback is used in practice, I'd *guess* it is
> > > fine to drop retpoline but I really need some feedback on this from
> > > people who understand these attacks better.
> > 
> > Do you recall why you added it in the first place?  What was the
> > motivation for it?  Were you responding to a review comment?
> 
> Absolutely cannot recall it :-) I even cannot recall the exact time when
> we landed the vDSO in the first place. Too much stuff has happend during
> the long three year upstreaming cycle. I will try to backtrack this
> info.

It originated in a comment from Andy when we were discussing the legitimacy
of the callback.  From that point on it got taken as gospel that the indirect
call would be implemented as a retpoline.

https://lkml.kernel.org/r/CALCETrVBR+2HjTqX=W4r9GOq69Xg36v4gmCKqK0wUjzAqBJnrw@mail.gmail.com

Dave Hansen Sept. 30, 2020, 4:28 p.m. UTC | #6

On 9/30/20 8:43 AM, Sean Christopherson wrote:
>>> Do you recall why you added it in the first place?  What was the
>>> motivation for it?  Were you responding to a review comment?
>> Absolutely cannot recall it :-) I even cannot recall the exact time when
>> we landed the vDSO in the first place. Too much stuff has happend during
>> the long three year upstreaming cycle. I will try to backtrack this
>> info.
> It originated in a comment from Andy when we were discussing the legitimacy
> of the callback.  From that point on it got taken as gospel that the indirect
> call would be implemented as a retpoline.
> 
> https://lkml.kernel.org/r/CALCETrVBR+2HjTqX=W4r9GOq69Xg36v4gmCKqK0wUjzAqBJnrw@mail.gmail.com

OK, so that was Andy L. saying:

> But I have a real argument for dropping exit_handler: in this new age
> of Spectre, the indirect call is a retpoline, and it's therefore quite
> slow.  So I'm not saying NAK, but I do think it's unnecessary.

It sounds like we were never able to jettison the indirect call.  So,
we've got a kernel-provided indirect call that's only ever executed by
userspace.  A quick grep didn't show any other indirect branches in the
VDSO, so there might not be good precedent for what to do.

The problem with the VDSO is that even if userspace is compiled to the
gills with mitigations, this VDSO branch won't be mitigated since it
comes from the kernel.

So, here's the big question for me:  How does a security-sensitive
userspace *binary* make mitigated indirect calls these days?

Seems like the kind of thing for which Intel should have a good writeup.  :)

Jarkko Sakkinen Sept. 30, 2020, 4:38 p.m. UTC | #7

On Wed, Sep 30, 2020 at 08:43:49AM -0700, Sean Christopherson wrote:
> On Wed, Sep 30, 2020 at 06:28:06PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Sep 30, 2020 at 07:33:38AM -0700, Dave Hansen wrote:
> > > On 9/30/20 7:20 AM, Jarkko Sakkinen wrote:
> > > > I'm not expert on Spectre, or any sort of security researcher, but I've
> > > > read a few papers about and understand the general concept. With the
> > > > constraints how the callback is used in practice, I'd *guess* it is
> > > > fine to drop retpoline but I really need some feedback on this from
> > > > people who understand these attacks better.
> > > 
> > > Do you recall why you added it in the first place?  What was the
> > > motivation for it?  Were you responding to a review comment?
> > 
> > Absolutely cannot recall it :-) I even cannot recall the exact time when
> > we landed the vDSO in the first place. Too much stuff has happend during
> > the long three year upstreaming cycle. I will try to backtrack this
> > info.
> 
> It originated in a comment from Andy when we were discussing the legitimacy
> of the callback.  From that point on it got taken as gospel that the indirect
> call would be implemented as a retpoline.
> 
> https://lkml.kernel.org/r/CALCETrVBR+2HjTqX=W4r9GOq69Xg36v4gmCKqK0wUjzAqBJnrw@mail.gmail.com

This patch from v20-v21 era is also relevant:

https://lore.kernel.org/linux-sgx/ba2a51568f3adaf74994d33ea3cbee570e20c6f6.1555965327.git.cedric.xing@intel.com/

It introduced the retpoline wrapping to the patch set but unfortunately
does not have any explanation for that particular detail.

Neither does Andy's comment except correctly stating that retpoline is
the modern standard for indirect calls but that does not get us too far.

My argument, or maybe just actually a question, is essentially that
given the usage pattern for this particular indirect call, do we need to
actually retpoline it?

Boot time patching turned out to be simple to implement but it is also
yet another whistle for the already complex vDSO.

/Jarkko

Jethro Beekman Sept. 30, 2020, 5:01 p.m. UTC | #8

On 2020-09-30 18:28, Dave Hansen wrote:
> On 9/30/20 8:43 AM, Sean Christopherson wrote:
>>>> Do you recall why you added it in the first place?  What was the
>>>> motivation for it?  Were you responding to a review comment?
>>> Absolutely cannot recall it :-) I even cannot recall the exact time when
>>> we landed the vDSO in the first place. Too much stuff has happend during
>>> the long three year upstreaming cycle. I will try to backtrack this
>>> info.
>> It originated in a comment from Andy when we were discussing the legitimacy
>> of the callback.  From that point on it got taken as gospel that the indirect
>> call would be implemented as a retpoline.
>>
>> https://lkml.kernel.org/r/CALCETrVBR+2HjTqX=W4r9GOq69Xg36v4gmCKqK0wUjzAqBJnrw@mail.gmail.com
> 
> OK, so that was Andy L. saying:
> 
>> But I have a real argument for dropping exit_handler: in this new age
>> of Spectre, the indirect call is a retpoline, and it's therefore quite
>> slow.  So I'm not saying NAK, but I do think it's unnecessary.
> 
> It sounds like we were never able to jettison the indirect call.  So,
> we've got a kernel-provided indirect call that's only ever executed by
> userspace.  A quick grep didn't show any other indirect branches in the
> VDSO, so there might not be good precedent for what to do.
> 
> The problem with the VDSO is that even if userspace is compiled to the
> gills with mitigations, this VDSO branch won't be mitigated since it
> comes from the kernel.
> 
> So, here's the big question for me:  How does a security-sensitive
> userspace *binary* make mitigated indirect calls these days?
> 
> Seems like the kind of thing for which Intel should have a good writeup.  :)
> 

If by "these days" you mean also protecting against LVI, then it'd be like so, I think:

lfence 
callq  *%rax

(at least this is what the LLVM LVI hardening pass does)

--
Jethro Beekman | Fortanix

Andrew Cooper Sept. 30, 2020, 6:09 p.m. UTC | #9

On 30/09/2020 18:01, Jethro Beekman wrote:
> On 2020-09-30 18:28, Dave Hansen wrote:
>> On 9/30/20 8:43 AM, Sean Christopherson wrote:
>>>>> Do you recall why you added it in the first place?  What was the
>>>>> motivation for it?  Were you responding to a review comment?
>>>> Absolutely cannot recall it :-) I even cannot recall the exact time when
>>>> we landed the vDSO in the first place. Too much stuff has happend during
>>>> the long three year upstreaming cycle. I will try to backtrack this
>>>> info.
>>> It originated in a comment from Andy when we were discussing the legitimacy
>>> of the callback.  From that point on it got taken as gospel that the indirect
>>> call would be implemented as a retpoline.
>>>
>>> https://lkml.kernel.org/r/CALCETrVBR+2HjTqX=W4r9GOq69Xg36v4gmCKqK0wUjzAqBJnrw@mail.gmail.com
>> OK, so that was Andy L. saying:
>>
>>> But I have a real argument for dropping exit_handler: in this new age
>>> of Spectre, the indirect call is a retpoline, and it's therefore quite
>>> slow.  So I'm not saying NAK, but I do think it's unnecessary.
>> It sounds like we were never able to jettison the indirect call.  So,
>> we've got a kernel-provided indirect call that's only ever executed by
>> userspace.  A quick grep didn't show any other indirect branches in the
>> VDSO, so there might not be good precedent for what to do.
>>
>> The problem with the VDSO is that even if userspace is compiled to the
>> gills with mitigations, this VDSO branch won't be mitigated since it
>> comes from the kernel.
>>
>> So, here's the big question for me:  How does a security-sensitive
>> userspace *binary* make mitigated indirect calls these days?
>>
>> Seems like the kind of thing for which Intel should have a good writeup.  :)
>>
> If by "these days" you mean also protecting against LVI, then it'd be like so, I think:
>
> lfence 
> callq  *%rax
>
> (at least this is what the LLVM LVI hardening pass does)

The problem is that "what to do about speculation" is exceedingly
complicated.

The 2017/8 code samples to beat Spectre v2 are one of:

1) Retpoline (not totally safe on Skylake uarch, but pretty good)
2) (legacy) IBRS and CALL *reg/mem (Intel)
3) LFENCE; CALL *reg/mem (AMD)

Later (i.e. this year), there is finally public documentation from all
vendors concerning straight-line speculation, so the code fragments now
need to be:

1) Retpoline (not totally safe on Skylake uarch, but pretty good)
2) (legacy) IBRS and CALL *reg/mem; LFENCE (Intel)
3) LFENCE; CALL *reg/mem; LFENCE (AMD)

to avoid potential speculative type confusion in the remainder of the
basic block.

Same goes for indirect JMP (Intel and AMD) and RET (AMD only), although
as there is no architectural execution, you can use INT3/INT1 to halt
speculation with lower code volume than LFENCE.

As noted, to avoid LVI poisoning of the branch target, a leading LFENCE
is needed (or one of the even more complicated variations), including
ahead of the RET of a retpoline.  However in practice, its only code
inside the enclave which is possibly worth protecting in this way.

Intel hardware mitigations, eIBRS, comes recommended with switching back
to a plain CALL *reg/mem, until a recent paper where it was demonstrated
that even with eIBRS active, you can still get speculative type
confusion due to same-mode training.  If same-mode training is a
problem, the recommendation is to use eIBRS and Retpoline.

AMD hardware mitigations, simply called IBRS, is programmed in the same
way as eIBRS (i.e. turn it on at the start of day and leave it on), does
guarantee to to protect against same-mode training.

As for the VDSO, it is userspace, and there's no way that it will have
WRSS generally enabled.  Therefore, it cannot use a retpoline gadget
even with the "fix up the shadow stack" variation when CET is in use. 
Any of the other variations ought to be CET-safe.

What, if anything, userspace does to protect against Spectre attacks is
a matter of every piece of compiled code in every library loaded, and
the risk profile of the application itself.  A browser with a javascript
sandbox is liable to try and take more care than something like cowsay.

Honestly, my advice would be to leave it unprotected for now.  Anyone
who managed to figure out the rest of the practical userspace issues
will probably have a much better idea of what can/should be done in this
case.

If that doesn't sit well with people, then the next best would probably
be LFENCE; CALL *reg/mem; LFENCE to cover as many of the corner cases as
possible without being incompatible with CET.  Its not as if this
callback is the slow aspect of entering/exiting SGX mode.

~Andrew

Jarkko Sakkinen Sept. 30, 2020, 7:25 p.m. UTC | #10

On Wed, Sep 30, 2020 at 07:09:33PM +0100, Andrew Cooper wrote:
> Honestly, my advice would be to leave it unprotected for now.  Anyone
> who managed to figure out the rest of the practical userspace issues
> will probably have a much better idea of what can/should be done in this
> case.
> 
> If that doesn't sit well with people, then the next best would probably
> be LFENCE; CALL *reg/mem; LFENCE to cover as many of the corner cases as
> possible without being incompatible with CET.  Its not as if this
> callback is the slow aspect of entering/exiting SGX mode.
> 
> ~Andrew

I tend to agree. We cannot drive changes based on unknown unknowns.

And I don't see why we could not add boot time patching of retpoline
even after the code is in the mainline kernel, if something ever
pushes to that direction.

/Jarkko

Xing, Cedric Sept. 30, 2020, 8:45 p.m. UTC | #11

On 9/30/2020 12:25 PM, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 07:09:33PM +0100, Andrew Cooper wrote:
>> Honestly, my advice would be to leave it unprotected for now.  Anyone
>> who managed to figure out the rest of the practical userspace issues
>> will probably have a much better idea of what can/should be done in this
>> case.
>>
>> If that doesn't sit well with people, then the next best would probably
>> be LFENCE; CALL *reg/mem; LFENCE to cover as many of the corner cases as
>> possible without being incompatible with CET.  Its not as if this
>> callback is the slow aspect of entering/exiting SGX mode.
>>
>> ~Andrew
> 
> I tend to agree. We cannot drive changes based on unknown unknowns.
> 
> And I don't see why we could not add boot time patching of retpoline
> even after the code is in the mainline kernel, if something ever
> pushes to that direction.
> 
> /Jarkko
> 
I agree. It'll be compatible with CET. The overhead of LFENCE is 
negligible comparing to entering/exiting SGX mode.

Jarkko Sakkinen Sept. 30, 2020, 9:22 p.m. UTC | #12

On Wed, Sep 30, 2020 at 01:45:52PM -0700, Xing, Cedric wrote:
> On 9/30/2020 12:25 PM, Jarkko Sakkinen wrote:
> > On Wed, Sep 30, 2020 at 07:09:33PM +0100, Andrew Cooper wrote:
> > > Honestly, my advice would be to leave it unprotected for now.  Anyone
> > > who managed to figure out the rest of the practical userspace issues
> > > will probably have a much better idea of what can/should be done in this
> > > case.
> > > 
> > > If that doesn't sit well with people, then the next best would probably
> > > be LFENCE; CALL *reg/mem; LFENCE to cover as many of the corner cases as
> > > possible without being incompatible with CET.  Its not as if this
> > > callback is the slow aspect of entering/exiting SGX mode.
> > > 
> > > ~Andrew
> > 
> > I tend to agree. We cannot drive changes based on unknown unknowns.
> > 
> > And I don't see why we could not add boot time patching of retpoline
> > even after the code is in the mainline kernel, if something ever
> > pushes to that direction.
> > 
> > /Jarkko
> > 
> I agree. It'll be compatible with CET. The overhead of LFENCE is negligible
> comparing to entering/exiting SGX mode.

Andrew's advice was to do "just call" as for now.

If we add also lfence, what is the real-world threat scenario that we
are protecting against that exposes a real visible risk that could harm
the users of these patches?

Please remember that:

1. We can assume that the usage model and implementation is for the
   callback is sane. It is something that is contained to the run-time
   and there is just one instance of the callback.
2. We can always harden this more later on.

I do not want to add any extra bytes to the vDSO without any practical
purpose and I also need to document this choice.

/Jarkko

Jarkko Sakkinen Sept. 30, 2020, 9:36 p.m. UTC | #13

On Thu, Oct 01, 2020 at 12:22:20AM +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 01:45:52PM -0700, Xing, Cedric wrote:
> > On 9/30/2020 12:25 PM, Jarkko Sakkinen wrote:
> > > On Wed, Sep 30, 2020 at 07:09:33PM +0100, Andrew Cooper wrote:
> > > > Honestly, my advice would be to leave it unprotected for now.  Anyone
> > > > who managed to figure out the rest of the practical userspace issues
> > > > will probably have a much better idea of what can/should be done in this
> > > > case.
> > > > 
> > > > If that doesn't sit well with people, then the next best would probably
> > > > be LFENCE; CALL *reg/mem; LFENCE to cover as many of the corner cases as
> > > > possible without being incompatible with CET.  Its not as if this
> > > > callback is the slow aspect of entering/exiting SGX mode.
> > > > 
> > > > ~Andrew
> > > 
> > > I tend to agree. We cannot drive changes based on unknown unknowns.
> > > 
> > > And I don't see why we could not add boot time patching of retpoline
> > > even after the code is in the mainline kernel, if something ever
> > > pushes to that direction.
> > > 
> > > /Jarkko
> > > 
> > I agree. It'll be compatible with CET. The overhead of LFENCE is negligible
> > comparing to entering/exiting SGX mode.
> 
> Andrew's advice was to do "just call" as for now.
> 
> If we add also lfence, what is the real-world threat scenario that we
> are protecting against that exposes a real visible risk that could harm
> the users of these patches?
> 
> Please remember that:
> 
> 1. We can assume that the usage model and implementation is for the
>    callback is sane. It is something that is contained to the run-time
>    and there is just one instance of the callback.
> 2. We can always harden this more later on.
> 
> I do not want to add any extra bytes to the vDSO without any practical
> purpose and I also need to document this choice.

What if we  just keep everything as it is? Why boot time patching
cannot be part of CET-SS patch set?

Why?

1. Full reptoline is the safest alternative and we have it done already.
2. Before CET-SS there is no *functional* need to do boot time patching.
   The usual guideline is: do not add unused cruft to the kernel.

There is too much guesswork in other alternatives. If CET-SS actually
lands before SGX patches, then I'm happy to add in boot time patching.

AFAIK we actually could not add boot time patching without any
applications for it. That's incompatible with the common practices.

I'd leaving the code as it is and remark to the changelog that
CET-SS will require refining the reptoline to be optional.

/Jarkko

Dave Hansen Sept. 30, 2020, 9:46 p.m. UTC | #14

On 9/30/20 2:36 PM, Jarkko Sakkinen wrote:
> 1. Full reptoline is the safest alternative and we have it done already.

I wouldn't feel _quite_ comfortable saying this.

LFENCEs have architecturally defined behavior.  Retpolines have zero
future guarantees in the architecture.  I'll take an LFENCE that (versus
a retpoline) is:

1. Less code
2. Never has to be patched
3. Never causes functional problems (like with CET)
4. Has architectural semantics

The only advantage retpolines offer is that they have a well-defined
mitigations on existing microarchitectures.

To me, an LFENCE is waaaaaaay "safer".

Jarkko Sakkinen Sept. 30, 2020, 11:41 p.m. UTC | #15

On Wed, Sep 30, 2020 at 02:46:05PM -0700, Dave Hansen wrote:
> On 9/30/20 2:36 PM, Jarkko Sakkinen wrote:
> > 1. Full reptoline is the safest alternative and we have it done already.
> 
> I wouldn't feel _quite_ comfortable saying this.
> 
> LFENCEs have architecturally defined behavior.  Retpolines have zero
> future guarantees in the architecture.  I'll take an LFENCE that (versus
> a retpoline) is:
> 
> 1. Less code
> 2. Never has to be patched
> 3. Never causes functional problems (like with CET)
> 4. Has architectural semantics
> 
> The only advantage retpolines offer is that they have a well-defined
> mitigations on existing microarchitectures.
> 
> To me, an LFENCE is waaaaaaay "safer".

This is a buy-in argument for me.

We know that CET-SS breaks RETPOLINE, which can be solved by applying
boot time patching. However, there could be however many other features
that could conflict with it in yet non-existing microarchitectures,
which would make the patching process tedious to manage over time.

Essentially, we will end up maintaining the backward and forward
compatibility forever in software. That does not sound too motivating,
agreed.

"Plain" LFENCE does not possess this issue as it is fully contained to
the microarchitecture. Thus, software does not have to do anything to
maintain backward and forward compatibility, which means that the SGX
vDSO continues to functionally work even in the old kernel images to
forseeable future.

To summarize, we will use LFENCE as it has overally the best
characteristics for the vDSO when balancing between security and keeping
things functionally working.

Do I have the correct understanding of your argument? Just sanity
checking before I change any part of the code or documentation.

/Jarkko

x86/vdso: Remove retpoline from SGX vDSO call

Commit Message

Comments

Patch