diff mbox series

[v1] x86/kvm: Fix the decrypted pages free in kvmclock

Message ID 20240612111201.18012-1-lirongqing@baidu.com (mailing list archive)
State New, archived
Headers show
Series [v1] x86/kvm: Fix the decrypted pages free in kvmclock | expand

Commit Message

Li RongQing June 12, 2024, 11:12 a.m. UTC
When set_memory_decrypted() fails, pages may be left fully or partially
decrypted. before free the pages to return pool, it should be encrypted
via set_memory_encrypted(), or else this could lead to functional or
security issues, if encrypting the pages fails, leak the pages

Fixes: 6a1cac56f41f ("x86/kvm: Use __bss_decrypted attribute in shared variables")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 arch/x86/kernel/kvmclock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Sean Christopherson Aug. 13, 2024, 5 a.m. UTC | #1
+more x86 folks

On Wed, Jun 12, 2024, Li RongQing wrote:
> When set_memory_decrypted() fails, pages may be left fully or partially
> decrypted. before free the pages to return pool, it should be encrypted
> via set_memory_encrypted(), or else this could lead to functional or
> security issues, if encrypting the pages fails, leak the pages

That seems like a major flaw in the API, i.e. not something that should be "fixed"
in kvmclock, especially since the vmm_fail paths only WARN.

Commit 82ace185017f ("x86/mm/cpa: Warn for set_memory_XXcrypted() VMM fails")
says the reason for only warning is to be able to play nice with both security
and uptime:

    Such conversion errors may herald future system instability, but are
    temporarily survivable with proper handling in the caller. The kernel
    traditionally makes every effort to keep running, but it is expected that
    some coco guests may prefer to play it safe security-wise, and panic in
    this case.

But punting the issue to the caller doesn't help with that, it just makes it all
too easy to introduce security bugs.  Wouldn't it be better to do something along
the lines of CONFIG_BUG_ON_DATA_CORRUPTION (though maybe runtime configurable?)
and let the end user explicitly decide what to do?

> Fixes: 6a1cac56f41f ("x86/kvm: Use __bss_decrypted attribute in shared variables")
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  arch/x86/kernel/kvmclock.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index 5b2c152..5e9f9d2 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -228,7 +228,8 @@ static void __init kvmclock_init_mem(void)
>  		r = set_memory_decrypted((unsigned long) hvclock_mem,
>  					 1UL << order);
>  		if (r) {
> -			__free_pages(p, order);
> +			if (!set_memory_encrypted((unsigned long)hvclock_mem, 1UL << order))
> +				__free_pages(p, order);
>  			hvclock_mem = NULL;
>  			pr_warn("kvmclock: set_memory_decrypted() failed. Disabling\n");
>  			return;
> -- 
> 2.9.4
>
Rick Edgecombe Aug. 13, 2024, 2:58 p.m. UTC | #2
+Dan, Elena

On Mon, 2024-08-12 at 22:00 -0700, Sean Christopherson wrote:
> +more x86 folks
> 
> On Wed, Jun 12, 2024, Li RongQing wrote:
> > When set_memory_decrypted() fails, pages may be left fully or partially
> > decrypted. before free the pages to return pool, it should be encrypted
> > via set_memory_encrypted(), or else this could lead to functional or
> > security issues, if encrypting the pages fails, leak the pages
> 
> That seems like a major flaw in the API, i.e. not something that should be
> "fixed"
> in kvmclock, especially since the vmm_fail paths only WARN.
> 
> Commit 82ace185017f ("x86/mm/cpa: Warn for set_memory_XXcrypted() VMM fails")
> says the reason for only warning is to be able to play nice with both security
> and uptime:
> 
>     Such conversion errors may herald future system instability, but are
>     temporarily survivable with proper handling in the caller. The kernel
>     traditionally makes every effort to keep running, but it is expected that
>     some coco guests may prefer to play it safe security-wise, and panic in
>     this case.
> 
> But punting the issue to the caller doesn't help with that, it just makes it
> all
> too easy to introduce security bugs.  Wouldn't it be better to do something
> along
> the lines of CONFIG_BUG_ON_DATA_CORRUPTION (though maybe runtime
> configurable?)
> and let the end user explicitly decide what to do?

Yes, this is like what I tried to do first, but there was concern over crashing
over laziness when the kernel *could* handle the error. I agree that it is error
prone to rely on the callers to do it. But there is a strong resistance to
crashing the kernel among some folks, even for hardening reasons. panic_on_warn
seems to be the only generally agreed way. There is also the problem that
set_memory() errors are hypothetically possible for entirely guest reasons. The
function returns an error code and set_memory() calls don't roll back partially
completed operations. So on balance I think fixing the callers to handle errors
is a good option.


BTW, when I was originally fixing this pattern in all the callsites, I started
trying to fix this one too. But it was pointed out that kvm clock should not be
used with confidential guests for other reasons already:
https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#kvm-cpuid-features-and-hypercalls

So I didn't continue for this callsite. There has been length, heated, mostly
internal discussion on how much the kernel should help the user set up their
confidential guest (i.e. whether to put some guard rails for cases like kvm
clock).

One of the points of support for guard rails had been, regardless of what we
expect from users, we are going to create work for maintainers if we don't help
*developers* understand which areas of the kernel are exposed to real CC guests.
I think that is happening here, so maybe we should re-open that discussion?

> 
> > Fixes: 6a1cac56f41f ("x86/kvm: Use __bss_decrypted attribute in shared
> > variables")
> > Signed-off-by: Li RongQing <lirongqing@baidu.com>
> > ---
> >   arch/x86/kernel/kvmclock.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> > index 5b2c152..5e9f9d2 100644
> > --- a/arch/x86/kernel/kvmclock.c
> > +++ b/arch/x86/kernel/kvmclock.c
> > @@ -228,7 +228,8 @@ static void __init kvmclock_init_mem(void)
> >                 r = set_memory_decrypted((unsigned long) hvclock_mem,
> >                                          1UL << order);
> >                 if (r) {
> > -                       __free_pages(p, order);
> > +                       if (!set_memory_encrypted((unsigned
> > long)hvclock_mem, 1UL << order))
> > +                               __free_pages(p, order);

This is a rare error that can only happen once per boot. I'm not sure we should
have extra code to try to re-encrypt the pages. We could just leak them.

> >                         hvclock_mem = NULL;
> >                         pr_warn("kvmclock: set_memory_decrypted() failed.
> > Disabling\n");
> >                         return;
> > -- 
> > 2.9.4
diff mbox series

Patch

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 5b2c152..5e9f9d2 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -228,7 +228,8 @@  static void __init kvmclock_init_mem(void)
 		r = set_memory_decrypted((unsigned long) hvclock_mem,
 					 1UL << order);
 		if (r) {
-			__free_pages(p, order);
+			if (!set_memory_encrypted((unsigned long)hvclock_mem, 1UL << order))
+				__free_pages(p, order);
 			hvclock_mem = NULL;
 			pr_warn("kvmclock: set_memory_decrypted() failed. Disabling\n");
 			return;