Message ID | 20220309104050.18207-1-cathy.zhang@intel.com (mailing list archive) |
---|---|
Series | Support microcode updates affecting SGX |
Cathy,

On Wed, Mar 09 2022 at 18:40, Cathy Zhang wrote:
> Users hate reboots. This lets SGX enclaves attest to updated microcode
> without a reboot.

Users hate guesswork much more. And microcode updates without reboot are
guesswork because Intel fails to include information into the microcode
header which tells the kernel whether the update is safe to do on a
running system...

Not your fault, but

> Today, many microcode updates _can_ be applied without a reboot.
> But users have strongly and specifically expressed a desire to
> perform *any* microcode update on a running system without a reboot.

That's wishful thinking. Any microcode update which changes features or
behaviour can result in inconsistent state of the kernel/system. That's
a fact and proliferating the fairy tale that *any* microcode update can
be done late is just a marketing terminological inexactitude. Can we
please stick to facts?

> This series implements the infrastructure needed to track and tear
> down bare-metal enclaves and then run EUPDATESVN. This is expected
> to be triggered by administrators via sysfs at some convenient time
> after a microcode update, probably by the microcode update tooling
> itself.

Tear down after a microcode update? This does not make any sense at all,
really. If the enclaves become inconsistent due to the microcode update
then you want to tear them down _before_ the microcode update, then
update the microcode, run EUPDATESVN and then bring them up again.

Just because it somehow works does not mean it's correct.

Thanks,

        tglx
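The ordering asked for in the reply above (tear down, update microcode, run EUPDATESVN, bring the enclaves up again) can be sketched as a toy model. This is not kernel code. `Platform`, its methods and `update_in_order` are hypothetical names invented for illustration, and setting `cpusvn` equal to the microcode revision is a deliberate simplification. The model also assumes that EUPDATESVN only succeeds once no enclaves remain, which is how the cover letter's "tear down ... and then run EUPDATESVN" reads.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Toy model of one machine. Every name here is hypothetical."""
    microcode_rev: int                       # revision currently loaded
    cpusvn: int                              # value new attestations would report
    enclaves: list = field(default_factory=list)
    log: list = field(default_factory=list)  # records the order of operations

    def tear_down_enclaves(self):
        self.enclaves.clear()
        self.log.append("teardown")

    def load_microcode(self, new_rev):
        self.microcode_rev = new_rev
        self.log.append("load_microcode")

    def eupdatesvn(self):
        # Modeled precondition (assumption): no enclave may still exist.
        if self.enclaves:
            raise RuntimeError("enclaves still present")
        # Simplification: the SVN simply follows the loaded revision.
        self.cpusvn = self.microcode_rev
        self.log.append("eupdatesvn")

    def start_enclave(self, name):
        self.enclaves.append(name)
        self.log.append("start")


def update_in_order(platform, new_rev, to_restart):
    """Tear down first, then update, then EUPDATESVN, then restart."""
    platform.tear_down_enclaves()
    platform.load_microcode(new_rev)
    platform.eupdatesvn()
    for name in to_restart:
        platform.start_enclave(name)
```

In this model no enclave ever runs on microcode it was not started under, which is the property the "tear down _before_" ordering is meant to provide.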
On 3/9/22 11:01, Thomas Gleixner wrote:
>> This series implements the infrastructure needed to track and tear
>> down bare-metal enclaves and then run EUPDATESVN. This is expected
>> to be triggered by administrators via sysfs at some convenient time
>> after a microcode update, probably by the microcode update tooling
>> itself.
> Tear down after a microcode update? This does not make any sense at all,
> really. If the enclaves become inconsistent due to the microcode update

I don't think there's anything that makes the enclaves inconsistent from
the microcode update itself.

Let's imagine an extreme (thankfully imaginary) case: SGX has been
totally broken by some attack. All running enclaves might have been
compromised. A magical microcode update comes and saves the day and
mitigates the attack.

From the hardware perspective, at the time of the microcode update, the
(presumably compromised) enclaves *can* still run. Nothing changes for
them. The only thing they can't do is attest to the shiny new
microcode.

Are you saying that the kernel should consider the enclaves inconsistent
at the time of the microcode update? Or, were you thinking that the
microcode update process itself would make them inconsistent?
On Wed, Mar 09, 2022 at 11:14:22AM -0800, Dave Hansen wrote:
> Let's imagine an extreme (thankfully imaginary) case: SGX has been
> totally broken by some attack. All running enclaves might have been
> compromised. A magical microcode update comes and saves the day and
> mitigates the attack.
>
> From the hardware perspective, at the time of the microcode update, the
> (presumably compromised) enclaves *can* still run.

Here's where you lost me: the enclaves are presumably compromised and
yet you wanna leave them running?! Isn't the strategy to kill them to
limit the spread of whatever has compromised them?
On 3/9/22 11:36, Borislav Petkov wrote:
> On Wed, Mar 09, 2022 at 11:14:22AM -0800, Dave Hansen wrote:
>> Let's imagine an extreme (thankfully imaginary) case: SGX has been
>> totally broken by some attack. All running enclaves might have been
>> compromised. A magical microcode update comes and saves the day and
>> mitigates the attack.
>>
>> From the hardware perspective, at the time of the microcode update, the
>> (presumably compromised) enclaves *can* still run.
> Here's where you lost me: the enclaves are presumably compromised and
> yet you wanna leave them running?! Isn't the strategy to kill them to
> limit the spread of whatever has compromised them?

Killing them immediately is a totally valid policy.

But, I think it's also a valid policy to continue to let them run.
Maybe you know they were not vulnerable to whatever got mitigated. Or,
maybe they're sufficiently sandboxed that they are not of any concern.
You want new enclaves to be able to attest to the new microcode, but
you're just not that worried about the old ones.

This mechanism allows userspace to separate the "update the microcode"
and "destroy the enclaves" and implement a policy which separates them
(or doesn't).

In either case, the specific demand from end users for this flexibility
is clearly lacking. I'm sure Cathy and Ashok will get working to flesh
that out.
Dave,

On Wed, Mar 09 2022 at 11:14, Dave Hansen wrote:
> On 3/9/22 11:01, Thomas Gleixner wrote:
>>> This series implements the infrastructure needed to track and tear
>>> down bare-metal enclaves and then run EUPDATESVN. This is expected
>>> to be triggered by administrators via sysfs at some convenient time
>>> after a microcode update, probably by the microcode update tooling
>>> itself.
>> Tear down after a microcode update? This does not make any sense at all,
>> really. If the enclaves become inconsistent due to the microcode update
>
> I don't think there's anything that makes the enclaves inconsistent from
> the microcode update itself.
>
> Let's imagine an extreme (thankfully imaginary) case: SGX has been
> totally broken by some attack. All running enclaves might have been
> compromised. A magical microcode update comes and saves the day and
> mitigates the attack.
>
> From the hardware perspective, at the time of the microcode update, the
> (presumably compromised) enclaves *can* still run. Nothing changes for
> them. The only thing they can't do is attest to the shiny new
> microcode.

So let's spin that further in a timeline of events:

  6AM   Info about CPU erratum which makes SGX vulnerable arrives with
        a fix in form of a microcode update

  12PM  Microcode is updated

  6PM   Enclaves are torn down

It technically works, but it does not make any sense at all.

I fundamentally detest procedures which are violating common sense
especially when those violations are not backed up by any technical
arguments.

> Are you saying that the kernel should consider the enclaves inconsistent
> at the time of the microcode update? Or, were you thinking that the
> microcode update process itself would make them inconsistent?

Inconsistent in the meaning that the attestation is moot at the point of
the microcode update because that attestation was done against the
previously loaded microcode.

That means if anything fundamentally changed by the microcode then the
enclave might become vulnerable by the microcode update itself because
the deployment decision based on the previous microcode is no longer
correct. Unlikely, but not impossible and we had cases where certain
mitigations had to be redone in microcode which means that code deployed
based on the previous microcode revision is no longer protected.

Again, this all might be a non-issue, but with a cover letter based on
marketing ballyhoo, I'm not seeing how this can be correct under all
circumstances.

We write code and create procedures based on the worst case assumptions
and by applying common sense and not based on what $customer has on his
wishlist. I'm all for serving customers, but ponies are not part of
that.

Thanks,

        tglx
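The "attestation is moot" argument can be restated as a small check. This is a sketch under stated assumptions, not a description of how SGX attestation is implemented: `Attestation` and `stale_attestations` are hypothetical names, and a single integer revision stands in for whatever the real attestation covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """Hypothetical record of what an enclave attested against."""
    enclave: str
    microcode_rev: int  # revision loaded when the attestation was produced


def stale_attestations(attestations, loaded_rev):
    """Return attestations made against a revision other than the loaded one."""
    return [a for a in attestations if a.microcode_rev != loaded_rev]
```

In the 6AM/12PM/6PM timeline, every attestation taken before the 12PM update would be returned by `stale_attestations` from 12PM until the enclaves are torn down at 6PM. That window is the inconsistent state being objected to.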
On 3/9/22 02:40, Cathy Zhang wrote:
> This series implements the infrastructure needed to track and tear
> down bare-metal enclaves and then run EUPDATESVN. This is expected
> to be triggered by administrators via sysfs at some convenient time
> after a microcode update, probably by the microcode update tooling
> itself.

Cathy, if it isn't abundantly clear by now, everyone seems to hate this
part of the implementation.

Let's just make this do EUPDATESVN as a part of the microcode update
process. No new ABI. No trying to preserve enclaves. Kill them early,
kill them all, and be done with it.

If we merge that and we have end users chasing us with torches and
pitchforks because their precious enclaves were torn down, we'll think
about doing something different.
On Wed, Mar 09, 2022 at 12:32:40PM -0800, Dave Hansen wrote:
> On 3/9/22 02:40, Cathy Zhang wrote:
> > This series implements the infrastructure needed to track and tear
> > down bare-metal enclaves and then run EUPDATESVN. This is expected
> > to be triggered by administrators via sysfs at some convenient time
> > after a microcode update, probably by the microcode update tooling
> > itself.
>
> Cathy, if it isn't abundantly clear by now, everyone seems to hate this
> part of the implementation.

Certainly if there is good information that this ucode brings in SGX fixes
it absolutely makes sense to do that. Right now this information is only
communicated via release notes, and some other agent like the orchestrator
decides whether to kill SGX enclaves.

If we had a programmatic way to determine that EUPDATESVN is required,
this automatic kill is the way to go.

>
> Let's just make this just do EUPDATESVN as a part of the microcode
> update process. No new ABI. No trying to preserve enclaves. Kill them
> early, kill them all, and be done with it.

Maybe use some meta-data that can communicate this directly.

>
> If we merge that and we have end users chasing us with torches and
> pitchforks because their precious enclaves were torn down, we'll think
> about doing something different.
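A rough sketch of what such a metadata-driven decision could look like follows. Everything in it is hypothetical: per the thread, no such field exists today and the information only travels via release notes, so `UpdateMetadata`, `needs_eupdatesvn` and `plan_update` are invented names. Treating missing metadata as the worst case is an assumption of this sketch, chosen to match the "kill them all" default discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpdateMetadata:
    """Hypothetical metadata shipped with a microcode update."""
    revision: int
    # None models today's situation: the information is simply not there.
    needs_eupdatesvn: Optional[bool] = None


def plan_update(meta):
    """Return the ordered steps for applying one update."""
    if meta.needs_eupdatesvn is False:
        # Metadata explicitly says SGX is unaffected: enclaves keep running.
        return ["load_microcode"]
    # True or unknown: assume the worst case and tear everything down first.
    return ["tear_down_enclaves", "load_microcode", "eupdatesvn"]
```

With this shape, the teardown only becomes optional when the update explicitly says so. Absent metadata never leaves enclaves running across an update.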
Ashok,

On Wed, Mar 09 2022 at 12:48, Ashok Raj wrote:
> On Wed, Mar 09, 2022 at 12:32:40PM -0800, Dave Hansen wrote:
>> On 3/9/22 02:40, Cathy Zhang wrote:
>> > This series implements the infrastructure needed to track and tear
>> > down bare-metal enclaves and then run EUPDATESVN. This is expected
>> > to be triggered by administrators via sysfs at some convenient time
>> > after a microcode update, probably by the microcode update tooling
>> > itself.
>>
>> Cathy, if it isn't abundantly clear by now, everyone seems to hate this
>> part of the implementation.
>
> Certainly if there is good information that this ucode brings in SGX fixes
> it absolutely makes sense to do that. Right now this information is only
> communicated via release notes and some other agent like the orchestrator
> decides the kill SGX part.

the point is that the attestation is invalid when you load new
microcode. IOW, the microcode update creates inconsistent state.

Inconsistent state is not subject to discussion. It's wrong independent
of how much handwaving and wishful thinking you apply to it.

If $customer wants to have that then he has the freedom to do so, but we
are not merging any patches which are proliferating the "I want a pony"
mentality.

Thanks,

        tglx
Hi Dave, Thomas,

Thanks for helping with the review! I will remove the new ABI and let the
microcode update process call the interface directly to clean up EPC and
do EUPDATESVN. Please let me know if you have any other suggestions.

> -----Original Message-----
> From: Hansen, Dave <dave.hansen@intel.com>
> Sent: Thursday, March 10, 2022 4:33 AM
> To: Zhang, Cathy <cathy.zhang@intel.com>; linux-sgx@vger.kernel.org;
> x86@kernel.org; Raj, Ashok <ashok.raj@intel.com>
> Subject: Re: [RFC PATCH 00/11] Support microcode updates affecting SGX
>
> On 3/9/22 02:40, Cathy Zhang wrote:
> > This series implements the infrastructure needed to track and tear
> > down bare-metal enclaves and then run EUPDATESVN. This is expected to
> > be triggered by administrators via sysfs at some convenient time after
> > a microcode update, probably by the microcode update tooling itself.
>
> Cathy, if it isn't abundantly clear by now, everyone seems to hate this part of
> the implementation.
>
> Let's just make this just do EUPDATESVN as a part of the microcode update
> process. No new ABI. No trying to preserve enclaves. Kill them early, kill
> them all, and be done with it.
>
> If we merge that and we have end users chasing us with torches and
> pitchforks because their precious enclaves were torn down, we'll think about
> doing something different.