
[RFC,00/11] Support microcode updates affecting SGX

Message ID 20220309104050.18207-1-cathy.zhang@intel.com (mailing list archive)

Message

Zhang, Cathy March 9, 2022, 10:40 a.m. UTC
Users hate reboots. This lets SGX enclaves attest to updated microcode
without a reboot.

== General Microcode Background ==

Historically, microcode updates are applied by the BIOS or early in
boot. In recent years, several trends have made these old approaches
less palatable.

First, the cadence of microcode updates has increased to deliver
security mitigations. Second, the value of those updates has increased,
meaning that any delay in applying them is unacceptable. Third, users
have become accustomed to approaches like hot patching their kernels
and have a growing aversion to reboots in general.

Users want microcode updates to behave more like hot patching a
kernel and less like a BIOS update.

Today, many microcode updates _can_ be applied without a reboot.
But users have strongly and specifically expressed a desire to
perform *any* microcode update on a running system without a reboot.
This work is a direct result of those user requests and lets SGX
enclaves take full advantage of microcode updates without a reboot.

== SGX Attestation Background ==

SGX enclaves have an attestation mechanism. An enclave might, for
instance, need to attest to its state before it is given a special
decryption key. Since SGX must trust the CPU microcode, attestation
incorporates the microcode versions of all processors on the system
and is affected by microcode updates. This allows the entity to which
the enclave is attesting to make deployment decisions based on the
microcode version. For example, an enclave might be denied a decryption
key if it runs on a system that has old microcode without a specific
mitigation.

Unfortunately, this attestation metric (called CPUSVN) is only a
snapshot. When the kernel first uses SGX (successfully executes any
ENCLS instruction), SGX inspects all CPUs in the system and incorporates
a record of their microcode versions into CPUSVN. Today, that value is
locked and is not updated until a reboot.
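
The snapshot behaviour described above can be sketched as a toy model.
This is only an illustration under stated assumptions: the struct and
function names (svn_model, model_first_encls, model_late_ucode_load) are
hypothetical, and the real CPUSVN is an opaque value maintained by the
hardware, not a single revision number held by the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; names and fields are hypothetical, not kernel code. */
struct svn_model {
	uint32_t loaded_ucode_rev; /* revision currently running on the CPUs */
	uint32_t attested_rev;     /* revision captured in the snapshot */
	bool locked;               /* set by the first successful ENCLS */
};

/* Models the first successful ENCLS: capture the revision, then lock it. */
static void model_first_encls(struct svn_model *m)
{
	assert(m);
	if (!m->locked) {
		m->attested_rev = m->loaded_ucode_rev;
		m->locked = true;
	}
}

/* Models a late microcode load: the locked snapshot is left untouched. */
static void model_late_ucode_load(struct svn_model *m, uint32_t new_rev)
{
	assert(m);
	m->loaded_ucode_rev = new_rev;
}
```

In this model, calling model_first_encls(), then model_late_ucode_load(),
then model_first_encls() again leaves attested_rev at the old revision,
which is the situation the next section describes.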

== Problems ==

This means that, although the microcode may be updated, enclaves can
never attest to this fact. Enclaves are stuck attesting to the old
version until a reboot.

Old enclaves created before the microcode update are presumed to be
compromised and must not be allowed to attest with the new microcode
version.

== Solution ==

EUPDATESVN is a new SGX instruction which allows enclave attestation
to include information about updated microcode without a reboot.

Whenever a microcode update affects SGX, the SGX attestation
architecture assumes that all running enclaves and cryptographic
assets (like internal SGX encryption keys) have been compromised.
To mitigate the impact of this presumed compromise, EUPDATESVN succeeds
only if all SGX memory is marked as "unused" and its contents
destroyed. This requirement ensures that no compromised enclave can
survive the EUPDATESVN procedure and provides an opportunity to
generate new cryptographic assets.
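
As a rough sketch of that precondition, assuming a simplified model in
which EPC usage is a single page counter: the names below (epc_model,
model_zap_all_epc, model_eupdatesvn, MODEL_EPC_IN_USE) are hypothetical
and are not the error codes or helpers defined by this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for the model, not the real ENCLS codes. */
enum model_ret { MODEL_OK = 0, MODEL_EPC_IN_USE = 1 };

/* Toy model only: EPC usage reduced to one counter. */
struct epc_model {
	size_t in_use_pages;   /* EPC pages still owned by enclaves */
	uint32_t loaded_rev;   /* microcode revision currently loaded */
	uint32_t attested_rev; /* revision that enclaves can attest to */
};

/* Models tearing down every enclave so that all EPC pages are unused. */
static void model_zap_all_epc(struct epc_model *m)
{
	assert(m);
	m->in_use_pages = 0;
}

/* Models EUPDATESVN: refuse while any EPC page is still in use. */
static enum model_ret model_eupdatesvn(struct epc_model *m)
{
	assert(m);
	if (m->in_use_pages != 0)
		return MODEL_EPC_IN_USE;
	m->attested_rev = m->loaded_rev;
	return MODEL_OK;
}
```

The point of the model is the ordering: model_eupdatesvn() fails until
model_zap_all_epc() has run, so no enclave created before the update can
attest to the new revision.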

This series implements the infrastructure needed to track and tear
down bare-metal enclaves and then run EUPDATESVN. This is expected
to be triggered by administrators via sysfs at some convenient time
after a microcode update, probably by the microcode update tooling
itself.

This is a very slow operation. It is, of course, exceedingly disruptive
to enclaves but should be infrequent as microcode updates are released
on the order of every few months. Also, this is not the first piece of
the SGX architecture which will destroy all enclave contents. Enclaves
are expected to be designed as volatile and to handle termination at
any time gracefully.

A follow-on series will add Virtual EPC (KVM guest) support.

SGX Seamless should handle most SGX flows while doing an SVN update, so
this RFC series is based on SGX EDMM v2, which introduces SGX2 flows:
https://lore.kernel.org/lkml/cover.1644274683.git.reinette.chatre@intel.com/T/

Here is the spec for your reference:
https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true

Cathy Zhang (11):
  x86/sgx: Introduce mechanism to prevent new initializations of EPC
    pages
  x86/sgx: Provide VA page non-NULL owner
  x86/sgx: Save enclave pointer for VA page
  x86/sgx: Keep record for SGX VA and Guest page type
  x86/sgx: Save the size of each EPC section
  x86/sgx: Forced EPC page zapping for EUPDATESVN
  x86/sgx: Define error codes for ENCLS[EUPDATESVN]
  x86/sgx: Implement ENCLS[EUPDATESVN]
  x86/microcode: Expose EUPDATESVN procedure via sysfs
  x86/sgx: Call ENCLS[EUPDATESVN] during SGX initialization
  Documentation/x86/sgx: Document EUPDATESVN sysfs file

 arch/x86/include/asm/microcode.h              |   5 +
 arch/x86/include/asm/sgx.h                    |  46 +-
 arch/x86/kernel/cpu/sgx/encl.h                |   3 +-
 arch/x86/kernel/cpu/sgx/encls.h               |  16 +
 arch/x86/kernel/cpu/sgx/sgx.h                 |  23 +-
 arch/x86/kernel/cpu/microcode/core.c          |  44 ++
 arch/x86/kernel/cpu/sgx/encl.c                |  46 +-
 arch/x86/kernel/cpu/sgx/ioctl.c               |  53 +-
 arch/x86/kernel/cpu/sgx/main.c                | 469 +++++++++++++++++-
 arch/x86/kernel/cpu/sgx/virt.c                |  22 +
 .../ABI/testing/sysfs-devices-system-cpu      |  14 +
 Documentation/x86/sgx.rst                     |  43 ++
 12 files changed, 759 insertions(+), 25 deletions(-)

Comments

Thomas Gleixner March 9, 2022, 7:01 p.m. UTC | #1
Cathy,

On Wed, Mar 09 2022 at 18:40, Cathy Zhang wrote:
> Users hate reboots. This lets SGX enclaves attest to updated microcode
> without a reboot.

Users hate guesswork much more. And microcode updates without reboot are
guesswork because Intel fails to include information in the microcode
header which tells the kernel whether the update is safe to do on a
running system... Not your fault, but 

> Today, many microcode updates _can_ be applied without a reboot.
> But users have strongly and specifically expressed a desire to
> perform *any* microcode update on a running system without a reboot.

That's wishful thinking. Any microcode update which changes features or
behaviour can result in inconsistent state of the kernel/system. That's
a fact and proliferating the fairy tale that *any* microcode update can
be done late is just a marketing terminological inexactitude.

Can we please stick to facts?

> This series implements the infrastructure needed to track and tear
> down bare-metal enclaves and then run EUPDATESVN. This is expected
> to be triggered by administrators via sysfs at some convenient time
> after a microcode update, probably by the microcode update tooling
> itself.

Tear down after a microcode update? This does not make any sense at all,
really. If the enclaves become inconsistent due to the microcode update
then you want to tear them down _before_ the microcode update, then
update the microcode, run EUPDATESVN and then bring them up again.
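
The ordering proposed here can be written down as a small sketch. It only
records the sequence of steps; the step names and helpers (step_log,
record_step, update_with_teardown_first) are hypothetical and do not
correspond to functions in the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step labels for the ordering suggested in this reply. */
enum step {
	STEP_TEARDOWN_ENCLAVES,
	STEP_LOAD_MICROCODE,
	STEP_EUPDATESVN,
	STEP_RESTART_ENCLAVES,
};

#define MAX_STEPS 4

struct step_log {
	enum step steps[MAX_STEPS];
	size_t n;
};

/* Append one step to the log; the log holds at most MAX_STEPS entries. */
static void record_step(struct step_log *log, enum step s)
{
	assert(log && log->n < MAX_STEPS);
	log->steps[log->n++] = s;
}

/* Tear down first, then update, then EUPDATESVN, then bring enclaves up. */
static void update_with_teardown_first(struct step_log *log)
{
	record_step(log, STEP_TEARDOWN_ENCLAVES);
	record_step(log, STEP_LOAD_MICROCODE);
	record_step(log, STEP_EUPDATESVN);
	record_step(log, STEP_RESTART_ENCLAVES);
}
```

With this ordering no enclave that attested against the old microcode is
ever running on the new one.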

Just because it somehow works does not mean it's correct.

Thanks,

        tglx

Dave Hansen March 9, 2022, 7:14 p.m. UTC | #2
On 3/9/22 11:01, Thomas Gleixner wrote:
>> This series implements the infrastructure needed to track and tear
>> down bare-metal enclaves and then run EUPDATESVN. This is expected
>> to be triggered by administrators via sysfs at some convenient time
>> after a microcode update, probably by the microcode update tooling
>> itself.
> Tear down after a microcode update? This does not make any sense at all,
> really. If the enclaves become inconsistent due to the microcode update

I don't think there's anything that makes the enclaves inconsistent from
the microcode update itself.

Let's imagine an extreme (thankfully imaginary) case: SGX has been
totally broken by some attack.  All running enclaves might have been
compromised.  A magical microcode update comes and saves the day and
mitigates the attack.

From the hardware perspective, at the time of the microcode update, the
(presumably compromised) enclaves *can* still run.  Nothing changes for
them.  The only thing they can't do is attest to the shiny new microcode.

Are you saying that the kernel should consider the enclaves inconsistent
at the time of the microcode update?  Or, were you thinking that the
microcode update process itself would make them inconsistent?

Borislav Petkov March 9, 2022, 7:36 p.m. UTC | #3
On Wed, Mar 09, 2022 at 11:14:22AM -0800, Dave Hansen wrote:
> Let's imagine an extreme (thankfully imaginary) case: SGX has been
> totally broken by some attack.  All running enclaves might have been
> compromised.  A magical microcode update comes and saves the day and
> mitigates the attack.
> 
> From the hardware perspective, at the time of the microcode update, the
> (presumably compromised) enclaves *can* still run.

Here's where you lost me: the enclaves are presumably compromised and
yet you wanna leave them running?! Isn't the strategy to kill them to
limit the spread of whatever has compromised them?

Dave Hansen March 9, 2022, 7:52 p.m. UTC | #4
On 3/9/22 11:36, Borislav Petkov wrote:
> On Wed, Mar 09, 2022 at 11:14:22AM -0800, Dave Hansen wrote:
>> Let's imagine an extreme (thankfully imaginary) case: SGX has been
>> totally broken by some attack.  All running enclaves might have been
>> compromised.  A magical microcode update comes and saves the day and
>> mitigates the attack.
>>
>> From the hardware perspective, at the time of the microcode update, the
>> (presumably compromised) enclaves *can* still run.
> Here's where you lost me: the enclaves are presumably compromised and
> yet you wanna leave them running?! Isn't the strategy to kill them to
> limit the spread of whatever has compromised them?

Killing them immediately is a totally valid policy.  But, I think it's
also a valid policy to continue to let them run.  Maybe you know they
were not vulnerable to whatever got mitigated.  Or, maybe they're
sufficiently sandboxed that they are not of any concern.  You want new
enclaves to be able to attest to the new microcode, but you're just not
that worried about the old ones.

This mechanism allows userspace to separate the "update the microcode"
and "destroy the enclaves" and implement a policy which separates them
(or doesn't).

In either case, the specific demand from end users for this flexibility
is clearly lacking.  I'm sure Cathy and Ashok will get working to flesh
that out.

Thomas Gleixner March 9, 2022, 8:15 p.m. UTC | #5
Dave,

On Wed, Mar 09 2022 at 11:14, Dave Hansen wrote:
> On 3/9/22 11:01, Thomas Gleixner wrote:
>>> This series implements the infrastructure needed to track and tear
>>> down bare-metal enclaves and then run EUPDATESVN. This is expected
>>> to be triggered by administrators via sysfs at some convenient time
>>> after a microcode update, probably by the microcode update tooling
>>> itself.
>> Tear down after a microcode update? This does not make any sense at all,
>> really. If the enclaves become inconsistent due to the microcode update
>
> I don't think there's anything that makes the enclaves inconsistent from
> the microcode update itself.
>
> Let's imagine an extreme (thankfully imaginary) case: SGX has been
> totally broken by some attack.  All running enclaves might have been
> compromised.  A magical microcode update comes and saves the day and
> mitigates the attack.
>
> From the hardware perspective, at the time of the microcode update, the
> (presumably compromised) enclaves *can* still run.  Nothing changes for
> them.  The only thing they can't do is attest to the shiny new
> microcode.

So let's spin that further in a timeline of events:

  6AM Info about CPU erratum which makes SGX vulnerable arrives with
      a fix in form of a microcode update

 12PM Microcode is updated

  6PM Enclaves are torn down

It technically works, but it does not make any sense at all.

I fundamentally detest procedures which are violating common sense
especially when those violations are not backed up by any technical
arguments.

> Are you saying that the kernel should consider the enclaves inconsistent
> at the time of the microcode update?  Or, were you thinking that the
> microcode update process itself would make them inconsistent?

Inconsistent in the meaning that the attestation is moot at the point
of the microcode update because that attestation was done against the
previously loaded microcode.

That means if anything fundamentally changed by the microcode then the
enclave might become vulnerable by the microcode update itself because
the deployment decision based on the previous microcode is no longer
correct.

Unlikely, but not impossible and we had cases where certain mitigations
had to be redone in microcode which means that code deployed based on
the previous microcode revision is no longer protected.

Again, this all might be a non issue, but with a cover letter based on
marketing ballyhoo, I'm not seeing how this can be correct under all
circumstances.

We write code and create procedures based on the worst case assumptions
and by applying common sense and not based on what $customer has on his
wishlist. I'm all for serving customers, but ponies are not part of that.

Thanks,

        tglx

Dave Hansen March 9, 2022, 8:32 p.m. UTC | #6
On 3/9/22 02:40, Cathy Zhang wrote:
> This series implements the infrastructure needed to track and tear
> down bare-metal enclaves and then run EUPDATESVN. This is expected
> to be triggered by administrators via sysfs at some convenient time
> after a microcode update, probably by the microcode update tooling
> itself.

Cathy, if it isn't abundantly clear by now, everyone seems to hate this
part of the implementation.

Let's just make this do EUPDATESVN as a part of the microcode
update process.  No new ABI.  No trying to preserve enclaves.  Kill them
early, kill them all, and be done with it.

If we merge that and we have end users chasing us with torches and
pitchforks because their precious enclaves were torn down, we'll think
about doing something different.

Ashok Raj March 9, 2022, 8:48 p.m. UTC | #7
On Wed, Mar 09, 2022 at 12:32:40PM -0800, Dave Hansen wrote:
> On 3/9/22 02:40, Cathy Zhang wrote:
> > This series implements the infrastructure needed to track and tear
> > down bare-metal enclaves and then run EUPDATESVN. This is expected
> > to be triggered by administrators via sysfs at some convenient time
> > after a microcode update, probably by the microcode update tooling
> > itself.
> 
> Cathy, if it isn't abundantly clear by now, everyone seems to hate this
> part of the implementation.

Certainly if there is good information that this ucode brings in SGX fixes
it absolutely makes sense to do that. Right now this information is only
communicated via release notes and some other agent like the orchestrator
decides the kill SGX part.

If we had a programmatic way to determine EUPDATESVN is required this
automatic kill is the way to go.
> 
> Let's just make this just do EUPDATESVN as a part of the microcode
> update process.  No new ABI.  No trying to preserve enclaves.  Kill them
> early, kill them all, and be done with it.

Maybe use some meta-data that can communicate this directly.

> 
> If we merge that and we have end users chasing us with torches and
> pitchforks because their precious enclaves were torn down, we'll think
> about doing something different.

Thomas Gleixner March 9, 2022, 11:09 p.m. UTC | #8
Ashok,

On Wed, Mar 09 2022 at 12:48, Ashok Raj wrote:
> On Wed, Mar 09, 2022 at 12:32:40PM -0800, Dave Hansen wrote:
>> On 3/9/22 02:40, Cathy Zhang wrote:
>> > This series implements the infrastructure needed to track and tear
>> > down bare-metal enclaves and then run EUPDATESVN. This is expected
>> > to be triggered by administrators via sysfs at some convenient time
>> > after a microcode update, probably by the microcode update tooling
>> > itself.
>> 
>> Cathy, if it isn't abundantly clear by now, everyone seems to hate this
>> part of the implementation.
>
> Certainly if there is good information that this ucode brings in SGX fixes
> it absolutely makes sense to do that. Right now this information is only
> communicated via release notes and some other agent like the orchestrator
> decides the kill SGX part.

The point is that the attestation is invalid when you load new
microcode. IOW, the microcode update creates inconsistent state.

Inconsistent state is not subject to discussion. It's wrong independent
of how much handwaving and wishful thinking you apply to it.

If $customer wants to have that then he has the freedom to do so, but we
are not merging any patches which are proliferating the "I want a pony"
mentality.

Thanks,

        tglx

Zhang, Cathy March 10, 2022, 5:24 a.m. UTC | #9
Hi Dave, Thomas,

Thanks for helping review!

I will remove the new ABI and let the microcode update process call the interface directly to clean up the EPC and do EUPDATESVN. Please let me know if you have other suggestions.

> -----Original Message-----
> From: Hansen, Dave <dave.hansen@intel.com>
> Sent: Thursday, March 10, 2022 4:33 AM
> To: Zhang, Cathy <cathy.zhang@intel.com>; linux-sgx@vger.kernel.org;
> x86@kernel.org; Raj, Ashok <ashok.raj@intel.com>
> Subject: Re: [RFC PATCH 00/11] Support microcode updates affecting SGX
> 
> On 3/9/22 02:40, Cathy Zhang wrote:
> > This series implements the infrastructure needed to track and tear
> > down bare-metal enclaves and then run EUPDATESVN. This is expected to
> > be triggered by administrators via sysfs at some convenient time after
> > a microcode update, probably by the microcode update tooling itself.
> 
> Cathy, if it isn't abundantly clear by now, everyone seems to hate this part of
> the implementation.
> 
> Let's just make this just do EUPDATESVN as a part of the microcode update
> process.  No new ABI.  No trying to preserve enclaves.  Kill them early, kill
> them all, and be done with it.
> 
> If we merge that and we have end users chasing us with torches and
> pitchforks because their precious enclaves were torn down, we'll think about
> doing something different.