
[v3,09/11] KVM: arm64: Introduce KVM_VM_TYPE_ARM_SW_PROTECTED machine type

Message ID 20250211121128.703390-10-tabba@google.com
State New
Series KVM: Mapping guest_memfd backed memory at the host for software protected VMs

Commit Message

Fuad Tabba Feb. 11, 2025, 12:11 p.m. UTC
Introduce a new virtual machine type,
KVM_VM_TYPE_ARM_SW_PROTECTED, to serve as a development and
testing vehicle for Confidential (CoCo) VMs, similar to the x86
KVM_X86_SW_PROTECTED_VM type.

Initially, this is used to test guest_memfd without needing any
underlying protection.

Similar to the x86 type, this is currently only for development
and testing.  Do not use KVM_VM_TYPE_ARM_SW_PROTECTED for "real"
VMs, and especially not in production. The behavior and effective
ABI for software-protected VMs is unstable.

Signed-off-by: Fuad Tabba <tabba@google.com>
---
 Documentation/virt/kvm/api.rst    |  5 +++++
 arch/arm64/include/asm/kvm_host.h | 10 ++++++++++
 arch/arm64/kvm/arm.c              |  5 +++++
 arch/arm64/kvm/mmu.c              |  3 ---
 include/uapi/linux/kvm.h          |  6 ++++++
 5 files changed, 26 insertions(+), 3 deletions(-)
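
For readers following along, a minimal userspace sketch (not part of the patch) of how a VMM would select the new machine type. It assumes only the uapi macros added by this patch plus the standard KVM_CREATE_VM ioctl; the IPA size occupies bits [7:0] of the type, so it combines with the new flag:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Illustrative only: create a software-protected VM with a
 * 40-bit IPA space.
 */
int create_sw_protected_vm(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);

	if (kvm_fd < 0)
		return -1;

	/* Returns a VM fd on success. */
	return ioctl(kvm_fd, KVM_CREATE_VM,
		     KVM_VM_TYPE_ARM_SW_PROTECTED |
		     KVM_VM_TYPE_ARM_IPA_SIZE(40));
}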

Comments

Quentin Perret Feb. 11, 2025, 4:12 p.m. UTC | #1
Hi Fuad,

On Tuesday 11 Feb 2025 at 12:11:25 (+0000), Fuad Tabba wrote:
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 117937a895da..f155d3781e08 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -652,6 +652,12 @@ struct kvm_enable_cap {
>  #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
>  #define KVM_VM_TYPE_ARM_IPA_SIZE(x)		\
>  	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> +
> +#define KVM_VM_TYPE_ARM_SW_PROTECTED	(1UL << 9)

FWIW, the downstream Android code has used bit 31 since forever
for that.

Although I very much believe that upstream should not care about the
downstream mess in general, in this particular instance bit 9 really
isn't superior in any way, and there's a bunch of existing userspace
code that uses bit 31 today as we speak. It is very much Android's
problem to update these userspace programs if we do go with bit 9
upstream, but I don't really see how that would benefit upstream
either.

So, given that there is no maintenance cost for upstream to use bit 31
instead of 9, I'd vote for using bit 31 and ease the landing with
existing userspace code, unless folks are really opinionated with this
stuff :)

Thanks,
Quentin

> +#define KVM_VM_TYPE_MASK	(KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
> +				 KVM_VM_TYPE_ARM_SW_PROTECTED)
> +
>  /*
>   * ioctls for /dev/kvm fds:
>   */
> -- 
> 2.48.1.502.g6dc24dfdaf-goog
>
Fuad Tabba Feb. 11, 2025, 4:17 p.m. UTC | #2
Hi Quentin,

On Tue, 11 Feb 2025 at 16:12, Quentin Perret <qperret@google.com> wrote:
>
> Hi Fuad,
>
> On Tuesday 11 Feb 2025 at 12:11:25 (+0000), Fuad Tabba wrote:
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 117937a895da..f155d3781e08 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -652,6 +652,12 @@ struct kvm_enable_cap {
> >  #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK        0xffULL
> >  #define KVM_VM_TYPE_ARM_IPA_SIZE(x)          \
> >       ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> > +
> > +#define KVM_VM_TYPE_ARM_SW_PROTECTED (1UL << 9)
>
> FWIW, the downstream Android code has used bit 31 since forever
> for that.
>
> Although I very much believe that upstream should not care about the
> downstream mess in general, in this particular instance bit 9 really
> isn't superior in any way, and there's a bunch of existing userspace
> code that uses bit 31 today as we speak. It is very much Android's
> problem to update these userspace programs if we do go with bit 9
> upstream, but I don't really see how that would benefit upstream
> either.
>
> So, given that there is no maintenance cost for upstream to use bit 31
> instead of 9, I'd vote for using bit 31 and ease the landing with
> existing userspace code, unless folks are really opinionated with this
> stuff :)

My thinking is that this bit does _not_ mean pKVM. It means an
experimental software VM that is similar to the x86
KVM_X86_SW_PROTECTED_VM. Hence why I didn't choose bit 31.

From Documentation/virt/kvm/api.rst (for x86):

'''
Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
production.  The behavior and effective ABI for software-protected VMs is
unstable.
'''

which is similar to the documentation I added here.

Cheers,
/fuad



> Thanks,
> Quentin
>
> > +#define KVM_VM_TYPE_MASK     (KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
> > +                              KVM_VM_TYPE_ARM_SW_PROTECTED)
> > +
> >  /*
> >   * ioctls for /dev/kvm fds:
> >   */
> > --
> > 2.48.1.502.g6dc24dfdaf-goog
> >
Quentin Perret Feb. 11, 2025, 4:29 p.m. UTC | #3
On Tuesday 11 Feb 2025 at 16:17:25 (+0000), Fuad Tabba wrote:
> Hi Quentin,
> 
> On Tue, 11 Feb 2025 at 16:12, Quentin Perret <qperret@google.com> wrote:
> >
> > Hi Fuad,
> >
> > On Tuesday 11 Feb 2025 at 12:11:25 (+0000), Fuad Tabba wrote:
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 117937a895da..f155d3781e08 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -652,6 +652,12 @@ struct kvm_enable_cap {
> > >  #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK        0xffULL
> > >  #define KVM_VM_TYPE_ARM_IPA_SIZE(x)          \
> > >       ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
> > > +
> > > +#define KVM_VM_TYPE_ARM_SW_PROTECTED (1UL << 9)
> >
> > FWIW, the downstream Android code has used bit 31 since forever
> > for that.
> >
> > Although I very much believe that upstream should not care about the
> > downstream mess in general, in this particular instance bit 9 really
> > isn't superior in any way, and there's a bunch of existing userspace
> > code that uses bit 31 today as we speak. It is very much Android's
> > problem to update these userspace programs if we do go with bit 9
> > upstream, but I don't really see how that would benefit upstream
> > either.
> >
> > So, given that there is no maintenance cost for upstream to use bit 31
> > instead of 9, I'd vote for using bit 31 and ease the landing with
> > existing userspace code, unless folks are really opinionated with this
> > stuff :)
> 
> My thinking is that this bit does _not_ mean pKVM. It means an
> experimental software VM that is similar to the x86
> KVM_X86_SW_PROTECTED_VM. Hence why I didn't choose bit 31.
> 
> From Documentation/virt/kvm/api.rst (for x86):
> 
> '''
> Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
> Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
> production.  The behavior and effective ABI for software-protected VMs is
> unstable.
> '''
> 
> which is similar to the documentation I added here.

Aha, I see, but are we going to allocate _another_ bit for protected VMs
proper once they're supported? Or just update the doc for the existing
bit? If the latter, then I guess this discussion can still happen :)

Thanks,
Quentin
Patrick Roy Feb. 11, 2025, 4:32 p.m. UTC | #4
Hi Quentin,

On Tue, 2025-02-11 at 16:29 +0000, Quentin Perret wrote:
> On Tuesday 11 Feb 2025 at 16:17:25 (+0000), Fuad Tabba wrote:
>> Hi Quentin,
>>
>> On Tue, 11 Feb 2025 at 16:12, Quentin Perret <qperret@google.com> wrote:
>>>
>>> Hi Fuad,
>>>
>>> On Tuesday 11 Feb 2025 at 12:11:25 (+0000), Fuad Tabba wrote:
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index 117937a895da..f155d3781e08 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -652,6 +652,12 @@ struct kvm_enable_cap {
>>>>  #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK        0xffULL
>>>>  #define KVM_VM_TYPE_ARM_IPA_SIZE(x)          \
>>>>       ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
>>>> +
>>>> +#define KVM_VM_TYPE_ARM_SW_PROTECTED (1UL << 9)
>>>
>>> FWIW, the downstream Android code has used bit 31 since forever
>>> for that.
>>>
>>> Although I very much believe that upstream should not care about the
>>> downstream mess in general, in this particular instance bit 9 really
>>> isn't superior in any way, and there's a bunch of existing userspace
>>> code that uses bit 31 today as we speak. It is very much Android's
>>> problem to update these userspace programs if we do go with bit 9
>>> upstream, but I don't really see how that would benefit upstream
>>> either.
>>>
>>> So, given that there is no maintenance cost for upstream to use bit 31
>>> instead of 9, I'd vote for using bit 31 and ease the landing with
>>> existing userspace code, unless folks are really opinionated with this
>>> stuff :)
>>
>> My thinking is that this bit does _not_ mean pKVM. It means an
>> experimental software VM that is similar to the x86
>> KVM_X86_SW_PROTECTED_VM. Hence why I didn't choose bit 31.
>>
>> From Documentation/virt/kvm/api.rst (for x86):
>>
>> '''
>> Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
>> Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
>> production.  The behavior and effective ABI for software-protected VMs is
>> unstable.
>> '''
>>
>> which is similar to the documentation I added here.
> 
> Aha, I see, but are we going to allocate _another_ bit for protected VMs
> proper once they're supported? Or just update the doc for the existing
> bit? If the latter, then I guess this discussion can still happen :)

I was hoping that SW_PROTECTED_VM will be the VM type that something
like Firecracker could use, e.g. an interface to guest_memfd specifically
_without_ pKVM, as Fuad was saying.

> Thanks,
> Quentin

Best, 
Patrick
Quentin Perret Feb. 11, 2025, 5:09 p.m. UTC | #5
Hi Patrick,

On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> I was hoping that SW_PROTECTED_VM will be the VM type that something
> like Firecracker could use, e.g. an interface to guest_memfd specifically
> _without_ pKVM, as Fuad was saying.

I had, probably incorrectly, assumed that we'd eventually want to allow
gmem for all VMs, including traditional KVM VMs that don't have anything
special. Perhaps the gmem support could be exposed via a KVM_CAP in this
case?

Anyway, no objection to the proposed approach in this patch assuming we
will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
bit 31 :).

Thanks,
Quentin
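
As a sketch of what exposing gmem via a capability could look like from userspace: KVM_CAP_GUEST_MEMFD and KVM_CREATE_GUEST_MEMFD are the existing upstream uapi for creating a guest_memfd, so the speculative part here is only gating support on the capability rather than on a VM type:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch only: probe for guest_memfd support, then create one. */
int create_gmem(int vm_fd, __u64 size)
{
	struct kvm_create_guest_memfd gmem = {
		.size = size,	/* must be a multiple of the page size */
	};

	/* KVM_CHECK_EXTENSION returns 0 if the cap is unsupported. */
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_GUEST_MEMFD) <= 0)
		return -1;	/* no guest_memfd on this kernel */

	return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
}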
Quentin Perret Feb. 14, 2025, 11:13 a.m. UTC | #6
On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> Hi Patrick,
> 
> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> > I was hoping that SW_PROTECTED_VM will be the VM type that something
> > like Firecracker could use, e.g. an interface to guest_memfd specifically
> > _without_ pKVM, as Fuad was saying.
> 
> I had, probably incorrectly, assumed that we'd eventually want to allow
> gmem for all VMs, including traditional KVM VMs that don't have anything
> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> case?
> 
> Anyway, no objection to the proposed approach in this patch assuming we
> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> bit 31 :).

Thinking about this a bit deeper, I am still wondering what this new
SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
both guest-memfd backed memslots and traditional HVA-backed memslots, we
could just make normal KVM guests accept guest-memfd memslots and get
the same thing? Is there any reason not to do that instead? Even though
SW_PROTECTED VMs are documented as 'unstable', the reality is this is
UAPI and you can bet it will end up being relied upon, so I would prefer
to have a solid reason for introducing this new VM type.

Cheers,
Quentin
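
Concretely, "normal KVM guests accept guest-memfd memslots" would boil down to something like the sketch below, using the existing KVM_SET_USER_MEMORY_REGION2 uapi; the flag and struct fields are the ones x86 already uses for gmem-backed slots, and the slot number and guest-physical address are arbitrary for illustration:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: bind a guest_memfd-backed slot at 1GiB guest-physical.
 * userspace_addr supplies the shared (host-mappable) view; the
 * guest_memfd backs the private view.
 */
static int add_gmem_slot(int vm_fd, int gmem_fd, void *host_mapping,
			 __u64 size)
{
	struct kvm_userspace_memory_region2 region = {
		.slot		    = 0,
		.flags		    = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr    = 1UL << 30,
		.memory_size	    = size,
		.userspace_addr	    = (__u64)(unsigned long)host_mapping,
		.guest_memfd	    = gmem_fd,
		.guest_memfd_offset = 0,
	};

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}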
Fuad Tabba Feb. 14, 2025, 11:33 a.m. UTC | #7
Hi Quentin,

On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
>
> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> > Hi Patrick,
> >
> > On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> > > I was hoping that SW_PROTECTED_VM will be the VM type that something
> > > like Firecracker could use, e.g. an interface to guest_memfd specifically
> > > _without_ pKVM, as Fuad was saying.
> >
> > I had, probably incorrectly, assumed that we'd eventually want to allow
> > gmem for all VMs, including traditional KVM VMs that don't have anything
> > special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> > case?
> >
> > Anyway, no objection to the proposed approach in this patch assuming we
> > will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> > bit 31 :).
>
> Thinking about this a bit deeper, I am still wondering what this new
> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
> both guest-memfd backed memslots and traditional HVA-backed memslots, we
> could just make normal KVM guests accept guest-memfd memslots and get
> the same thing? Is there any reason not to do that instead? Even though
> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
> UAPI and you can bet it will end up being relied upon, so I would prefer
> to have a solid reason for introducing this new VM type.

The more I think about it, I agree with you. I think that reasonable
behavior (for kvm/arm64) would be to allow using guest_memfd with all
VM types. If the VM type is a non-protected type, then its memory is
considered shared by default and is mappable --- as long as the
kconfig option is enabled. If VM is protected then the memory is not
shared by default.

What do you think Patrick? Do you need an explicit VM type?

Cheers,
/fuad

> Cheers,
> Quentin
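
A minimal sketch of the alternative Fuad describes, reusing the kvm_arch_gmem_supports_shared_mem() hook from this series but keying it on the VM not being protected instead of on a dedicated type; kvm_vm_is_protected() is the existing arm64 helper, and the kconfig gate comes from this series:

/*
 * Sketch only: with no dedicated VM type, memory of a non-protected
 * VM is shared by default and host-mappable when the kconfig allows
 * it; protected VMs keep their memory private by default.
 */
#define kvm_arch_gmem_supports_shared_mem(kvm)		\
	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&	\
	 !kvm_vm_is_protected(kvm))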
Patrick Roy Feb. 14, 2025, 12:37 p.m. UTC | #8
On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
> Hi Quentin,
> 
> On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
>>
>> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
>>> Hi Patrick,
>>>
>>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
>>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
>>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
>>>> _without_ pKVM, as Fuad was saying.
>>>
>>> I had, probably incorrectly, assumed that we'd eventually want to allow
>>> gmem for all VMs, including traditional KVM VMs that don't have anything
>>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
>>> case?
>>>
>>> Anyway, no objection to the proposed approach in this patch assuming we
>>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
>>> bit 31 :).
>>
>> Thinking about this a bit deeper, I am still wondering what this new
>> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
>> both guest-memfd backed memslots and traditional HVA-backed memslots, we
>> could just make normal KVM guests accept guest-memfd memslots and get
>> the same thing? Is there any reason not to do that instead? Even though
>> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
>> UAPI and you can bet it will end up being relied upon, so I would prefer
>> to have a solid reason for introducing this new VM type.
> 
> The more I think about it, I agree with you. I think that reasonable
> behavior (for kvm/arm64) would be to allow using guest_memfd with all
> VM types. If the VM type is a non-protected type, then its memory is
> considered shared by default and is mappable --- as long as the
> kconfig option is enabled. If VM is protected then the memory is not
> shared by default.
> 
> What do you think Patrick? Do you need an explicit VM type?

Mhh, no, if "normal" VMs support guest_memfd, then that works too. I
suggested the VM type because that's how x86 works
(KVM_X86_SW_PROTECTED_VM), but never actually stopped to think about
whether it makes sense for ARM. Maybe Sean knows something we're missing?

I wonder whether having the "default sharedness" depend on the vm type
works out though - whether a range of gmem is shared or private is a
property of the guest_memfd instance, not the VM it's attached to, so I
guess the default behavior needs to be based solely on the guest_memfd
as well (and then if someone tries to attach a gmem to a VM whose desire
of protection doesn't match the guest_memfd's configuration, that
operation would fail)?

Tangentially related, does KVM_GMEM_SHARED to you mean "guest_memfd also
supports shared sections", or "guest_memfd does not support private
memory anymore"? (the difference being that in the former, then
KVM_GMEM_SHARED would later get the ability to convert ranges private,
and the EOPNOTSUPP is just a transient state until conversion support is
merged) - doesn't matter for my use case, but I got curious as some other
threads implied the second option to me and I ended up wondering why.

Best,
Patrick

> Cheers,
> /fuad
> 
>> Cheers,
>> Quentin
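
A hypothetical sketch of the attach-time mismatch check Patrick raises; every name here is invented for illustration (neither KVM_GMEM_SHARED as a bind-time flag nor this helper exists), and as Fuad notes in the next reply, guest_memfd creation is already a VM ioctl, so in practice the mismatch cannot arise across VMs:

#include <linux/kvm_host.h>

/*
 * Hypothetical: reject binding a guest_memfd whose configuration
 * conflicts with the VM's confidentiality requirements.
 */
static int kvm_gmem_check_bind(struct kvm *kvm, u64 gmem_flags)
{
	/* A protected VM must not expose its memory to the host. */
	if (kvm_vm_is_protected(kvm) && (gmem_flags & KVM_GMEM_SHARED))
		return -EINVAL;

	return 0;
}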
Fuad Tabba Feb. 14, 2025, 1:11 p.m. UTC | #9
Hi Patrick,

On Fri, 14 Feb 2025 at 12:37, Patrick Roy <roypat@amazon.co.uk> wrote:
>
>
>
> On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
> > Hi Quentin,
> >
> > On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
> >>
> >> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> >>> Hi Patrick,
> >>>
> >>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> >>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
> >>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
> >>>> _without_ pKVM, as Fuad was saying.
> >>>
> >>> I had, probably incorrectly, assumed that we'd eventually want to allow
> >>> gmem for all VMs, including traditional KVM VMs that don't have anything
> >>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> >>> case?
> >>>
> >>> Anyway, no objection to the proposed approach in this patch assuming we
> >>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> >>> bit 31 :).
> >>
> >> Thinking about this a bit deeper, I am still wondering what this new
> >> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
> >> both guest-memfd backed memslots and traditional HVA-backed memslots, we
> >> could just make normal KVM guests accept guest-memfd memslots and get
> >> the same thing? Is there any reason not to do that instead? Even though
> >> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
> >> UAPI and you can bet it will end up being relied upon, so I would prefer
> >> to have a solid reason for introducing this new VM type.
> >
> > The more I think about it, I agree with you. I think that reasonable
> > behavior (for kvm/arm64) would be to allow using guest_memfd with all
> > VM types. If the VM type is a non-protected type, then its memory is
> > considered shared by default and is mappable --- as long as the
> > kconfig option is enabled. If VM is protected then the memory is not
> > shared by default.
> >
> > What do you think Patrick? Do you need an explicit VM type?
>
> Mhh, no, if "normal" VMs support guest_memfd, then that works too. I
> suggested the VM type because that's how x86 works
> (KVM_X86_SW_PROTECTED_VM), but never actually stopped to think about
> whether it makes sense for ARM. Maybe Sean knows something we're missing?
>
> I wonder whether having the "default sharedness" depend on the vm type
> works out though - whether a range of gmem is shared or private is a
> property of the guest_memfd instance, not the VM it's attached to, so I
> guess the default behavior needs to be based solely on the guest_memfd
> as well (and then if someone tries to attach a gmem to a VM whose desire
> of protection doesn't match the guest_memfd's configuration, that
> operation would fail)?

Each guest_memfd is associated with a KVM instance. Although it could
migrate, it would be weird for a guest_memfd instance to migrate
between different types of VM, or at least, migrate between VMs that
have different confidentiality requirements.


> Tangentially related, does KVM_GMEM_SHARED to you mean "guest_memfd also
> supports shared sections", or "guest_memfd does not support private
> memory anymore"? (the difference being that in the former, then
> KVM_GMEM_SHARED would later get the ability to convert ranges private,
> and the EOPNOTSUPP is just a transient state until conversion support is
> merged) - doesn't matter for my use case, but I got curious as some other
> threads implied the second option to me and I ended up wondering why.

My thinking (and implementation in the other patch series) is that
KVM_GMEM_SHARED (back then called KVM_GMEM_MAPPABLE) allows sharing in
place/mapping, without adding restrictions.

Cheers,
/fuad

> Best,
> Patrick
>
> > Cheers,
> > /fuad
> >
> >> Cheers,
> >> Quentin
Patrick Roy Feb. 14, 2025, 1:18 p.m. UTC | #10
On Fri, 2025-02-14 at 13:11 +0000, Fuad Tabba wrote:
> Hi Patrick,
> 
> On Fri, 14 Feb 2025 at 12:37, Patrick Roy <roypat@amazon.co.uk> wrote:
>>
>>
>>
>> On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
>>> Hi Quentin,
>>>
>>> On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
>>>>
>>>> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
>>>>> Hi Patrick,
>>>>>
>>>>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
>>>>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
>>>>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
>>>>>> _without_ pKVM, as Fuad was saying.
>>>>>
>>>>> I had, probably incorrectly, assumed that we'd eventually want to allow
>>>>> gmem for all VMs, including traditional KVM VMs that don't have anything
>>>>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
>>>>> case?
>>>>>
>>>>> Anyway, no objection to the proposed approach in this patch assuming we
>>>>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
>>>>> bit 31 :).
>>>>
>>>> Thinking about this a bit deeper, I am still wondering what this new
>>>> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
>>>> both guest-memfd backed memslots and traditional HVA-backed memslots, we
>>>> could just make normal KVM guests accept guest-memfd memslots and get
>>>> the same thing? Is there any reason not to do that instead? Even though
>>>> SW_PROTECTED VMs are documented as 'unstable', the reality is this is
>>>> UAPI and you can bet it will end up being relied upon, so I would prefer
>>>> to have a solid reason for introducing this new VM type.
>>>
>>> The more I think about it, I agree with you. I think that reasonable
>>> behavior (for kvm/arm64) would be to allow using guest_memfd with all
>>> VM types. If the VM type is a non-protected type, then its memory is
>>> considered shared by default and is mappable --- as long as the
>>> kconfig option is enabled. If VM is protected then the memory is not
>>> shared by default.
>>>
>>> What do you think Patrick? Do you need an explicit VM type?
>>
>> Mhh, no, if "normal" VMs support guest_memfd, then that works too. I
>> suggested the VM type because that's how x86 works
>> (KVM_X86_SW_PROTECTED_VM), but never actually stopped to think about
>> whether it makes sense for ARM. Maybe Sean knows something we're missing?
>>
>> I wonder whether having the "default sharedness" depend on the vm type
>> works out though - whether a range of gmem is shared or private is a
>> property of the guest_memfd instance, not the VM it's attached to, so I
>> guess the default behavior needs to be based solely on the guest_memfd
>> as well (and then if someone tries to attach a gmem to a VM whose desire
>> of protection doesn't match the guest_memfd's configuration, that
>> operation would fail)?
> 
> Each guest_memfd is associated with a KVM instance. Although it could
> migrate, it would be weird for a guest_memfd instance to migrate
> between different types of VM, or at least, migrate between VMs that
> have different confidentiality requirements.

Ahh, right, I keep forgetting that CREATE_GUEST_MEMFD() is a vm ioctl. My
bad, sorry!

>> Tangentially related, does KVM_GMEM_SHARED to you mean "guest_memfd also
>> supports shared sections", or "guest_memfd does not support private
>> memory anymore"? (the difference being that in the former, then
>> KVM_GMEM_SHARED would later get the ability to convert ranges private,
>> and the EOPNOTSUPP is just a transient state until conversion support is
>> merged) - doesn't matter for my use case, but I got curious as some other
>> threads implied the second option to me and I ended up wondering why.
> 
> My thinking (and implementation in the other patch series) is that
> KVM_GMEM_SHARED (back then called KVM_GMEM_MAPPABLE) allows sharing in
> place/mapping, without adding restrictions.

That makes sense to me, thanks for the explanation!

> Cheers,
> /fuad
> 
>> Best,
>> Patrick
>>
>>> Cheers,
>>> /fuad
>>>
>>>> Cheers,
>>>> Quentin
Sean Christopherson Feb. 14, 2025, 3:12 p.m. UTC | #11
On Fri, Feb 14, 2025, Patrick Roy wrote:
> On Fri, 2025-02-14 at 13:11 +0000, Fuad Tabba wrote:
> > On Fri, 14 Feb 2025 at 12:37, Patrick Roy <roypat@amazon.co.uk> wrote:
> >> On Fri, 2025-02-14 at 11:33 +0000, Fuad Tabba wrote:
> >>> Hi Quentin,
> >>>
> >>> On Fri, 14 Feb 2025 at 11:13, Quentin Perret <qperret@google.com> wrote:
> >>>>
> >>>> On Tuesday 11 Feb 2025 at 17:09:20 (+0000), Quentin Perret wrote:
> >>>>> Hi Patrick,
> >>>>>
> >>>>> On Tuesday 11 Feb 2025 at 16:32:31 (+0000), Patrick Roy wrote:
> >>>>>> I was hoping that SW_PROTECTED_VM will be the VM type that something
> >>>>>> like Firecracker could use, e.g. an interface to guest_memfd specifically
> >>>>>> _without_ pKVM, as Fuad was saying.
> >>>>>
> >>>>> I had, probably incorrectly, assumed that we'd eventually want to allow
> >>>>> gmem for all VMs, including traditional KVM VMs that don't have anything
> >>>>> special. Perhaps the gmem support could be exposed via a KVM_CAP in this
> >>>>> case?
> >>>>>
> >>>>> Anyway, no objection to the proposed approach in this patch assuming we
> >>>>> will eventually have HW_PROTECTED_VM for pKVM VMs, and that _that_ can be
> >>>>> bit 31 :).
> >>>>
> >>>> Thinking about this a bit deeper, I am still wondering what this new
> >>>> SW_PROTECTED VM type is buying us? Given that SW_PROTECTED VMs accept
> >>>> both guest-memfd backed memslots and traditional HVA-backed memslots, we
> >>>> could just make normal KVM guests accept guest-memfd memslots and get
> >>>> the same thing? Is there any reason not to do that instead?

Once guest_memfd can be mmap()'d, no.  KVM_X86_SW_PROTECTED_VM was added for
testing and development of guest_memfd largely because KVM can't support a "real"
VM if KVM can't read/write guest memory through its normal mechanisms.  The gap
is most apparent on x86, but it holds true for arm64 as well.

> >>>> Even though SW_PROTECTED VMs are documented as 'unstable', the reality
> >>>> is this is UAPI and you can bet it will end up being relied upon, so I
> >>>> would prefer to have a solid reason for introducing this new VM type.
> >>>
> >>> The more I think about it, I agree with you. I think that reasonable
> >>> behavior (for kvm/arm64) would be to allow using guest_memfd with all
> >>> VM types. If the VM type is a non-protected type, then its memory is
> >>> considered shared by default and is mappable --- as long as the
> >>> kconfig option is enabled. If VM is protected then the memory is not
> >>> shared by default.

This aligns with what I see happening for x86, except that for non-protected VMs
there will be no shared vs. private, because such VMs won't have a concept of
private memory.

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2b52eb77e29c..0fccee4feee7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -214,6 +214,11 @@  exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects
 size of the address translated by the stage2 level (guest physical to
 host physical address translations).
 
+KVM_VM_TYPE_ARM_SW_PROTECTED is currently only for development and testing of
+confidential VMs without having underlying support. Do not use
+KVM_VM_TYPE_ARM_SW_PROTECTED for "real" VMs, and especially not in production.
+The behavior and effective ABI for software-protected VMs is unstable.
+
 
 4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST
 ----------------------------------------------------------
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cfa024de4e3..a4276d56f54d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -383,6 +383,8 @@  struct kvm_arch {
 	 * the associated pKVM instance in the hypervisor.
 	 */
 	struct kvm_protected_vm pkvm;
+
+	unsigned long vm_type;
 };
 
 struct kvm_vcpu_fault_info {
@@ -1555,4 +1557,12 @@  void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val);
 #define kvm_has_s1poe(k)				\
 	(kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
 
+#define kvm_arch_has_private_mem(kvm)			\
+	(IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&		\
+	 ((kvm)->arch.vm_type & KVM_VM_TYPE_ARM_SW_PROTECTED))
+
+#define kvm_arch_gmem_supports_shared_mem(kvm)		\
+	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&	\
+	 ((kvm)->arch.vm_type & KVM_VM_TYPE_ARM_SW_PROTECTED))
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 071a7d75be68..a2066db52ada 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -146,6 +146,9 @@  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
 
+	if (type & ~KVM_VM_TYPE_MASK)
+		return -EINVAL;
+
 	mutex_init(&kvm->arch.config_lock);
 
 #ifdef CONFIG_LOCKDEP
@@ -187,6 +190,8 @@  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	bitmap_zero(kvm->arch.vcpu_features, KVM_VCPU_MAX_FEATURES);
 
+	kvm->arch.vm_type = type;
+
 	return 0;
 
 err_free_cpumask:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 305060518766..b89649d31127 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -882,9 +882,6 @@  static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
 	u64 mmfr0, mmfr1;
 	u32 phys_shift;
 
-	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
-		return -EINVAL;
-
 	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
 	if (is_protected_kvm_enabled()) {
 		phys_shift = kvm_ipa_limit;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 117937a895da..f155d3781e08 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -652,6 +652,12 @@  struct kvm_enable_cap {
 #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
 #define KVM_VM_TYPE_ARM_IPA_SIZE(x)		\
 	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_SW_PROTECTED	(1UL << 9)
+
+#define KVM_VM_TYPE_MASK	(KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+				 KVM_VM_TYPE_ARM_SW_PROTECTED)
+
 /*
  * ioctls for /dev/kvm fds:
  */