Message ID: 20200922004024.3699923-2-vipinsh@google.com
State:      New, archived
Series:     KVM: SVM: Cgroup support for SVM SEV ASIDs
Hi,

On 9/21/20 5:40 PM, Vipin Sharma wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index d6a0b31b13dc..1a57c362b803 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1101,6 +1101,20 @@ config CGROUP_BPF
>  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
>  	  inet sockets.
>  
> +config CGROUP_SEV
> +	bool "SEV ASID controller"
> +	depends on KVM_AMD_SEV
> +	default n
> +	help
> +	  Provides a controller for AMD SEV ASIDs. This controller limits and
> +	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
> +	  processors. Whenever a new encrypted VM is created using SEV on an
> +	  AMD processor, this controller will check the current limit in the
> +	  cgroup to which the task belongs and will deny the SEV ASID if the
> +	  cgroup has already reached its limit.
> +
> +	  Say N if unsure.

Something here (either in the bool prompt string or the help text) should
let a reader know w.t.h. SEV means.

Without having to look in other places...

thanks.
On Mon, Sep 21, 2020 at 06:04:04PM -0700, Randy Dunlap wrote:
> Hi,
> 
> On 9/21/20 5:40 PM, Vipin Sharma wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index d6a0b31b13dc..1a57c362b803 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1101,6 +1101,20 @@ config CGROUP_BPF
> >  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
> >  	  inet sockets.
> >  
> > +config CGROUP_SEV
> > +	bool "SEV ASID controller"
> > +	depends on KVM_AMD_SEV
> > +	default n
> > +	help
> > +	  Provides a controller for AMD SEV ASIDs. This controller limits and
> > +	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
> > +	  processors. Whenever a new encrypted VM is created using SEV on an
> > +	  AMD processor, this controller will check the current limit in the
> > +	  cgroup to which the task belongs and will deny the SEV ASID if the
> > +	  cgroup has already reached its limit.
> > +
> > +	  Say N if unsure.
> 
> Something here (either in the bool prompt string or the help text) should
> let a reader know w.t.h. SEV means.
> 
> Without having to look in other places...

ASIDs too.  I'd also love to see more info in the docs and/or cover letter
to explain why ASID management on SEV requires a cgroup.  I know what an
ASID is, and have a decent idea of how KVM manages ASIDs for legacy VMs, but
I know nothing about why ASIDs are limited for SEV and not legacy VMs.
On Mon, Sep 21, 2020 at 06:22:28PM -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 06:04:04PM -0700, Randy Dunlap wrote:
> > Hi,
> > 
> > On 9/21/20 5:40 PM, Vipin Sharma wrote:
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d6a0b31b13dc..1a57c362b803 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1101,6 +1101,20 @@ config CGROUP_BPF
> > >  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
> > >  	  inet sockets.
> > >  
> > > +config CGROUP_SEV
> > > +	bool "SEV ASID controller"
> > > +	depends on KVM_AMD_SEV
> > > +	default n
> > > +	help
> > > +	  Provides a controller for AMD SEV ASIDs. This controller limits and
> > > +	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
> > > +	  processors. Whenever a new encrypted VM is created using SEV on an
> > > +	  AMD processor, this controller will check the current limit in the
> > > +	  cgroup to which the task belongs and will deny the SEV ASID if the
> > > +	  cgroup has already reached its limit.
> > > +
> > > +	  Say N if unsure.
> > 
> > Something here (either in the bool prompt string or the help text) should
> > let a reader know w.t.h. SEV means.
> > 
> > Without having to look in other places...
> 
> ASIDs too.  I'd also love to see more info in the docs and/or cover letter
> to explain why ASID management on SEV requires a cgroup.  I know what an
> ASID is, and have a decent idea of how KVM manages ASIDs for legacy VMs, but
> I know nothing about why ASIDs are limited for SEV and not legacy VMs.

Thanks for the feedback, I will add more details in the Kconfig and the
documentation about SEV and ASID.
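(As an illustration of what Randy and Sean are asking for, a help text
that spells out the acronyms might read roughly as follows. This is only
a sketch, not the wording that was eventually posted; the SEV/ASID
summary is taken from the explanations later in this thread.)

	config CGROUP_SEV
		bool "SEV ASID controller"
		depends on KVM_AMD_SEV
		default n
		help
		  Provides a controller for Address Space Identifiers (ASIDs)
		  used by AMD Secure Encrypted Virtualization (SEV). SEV
		  encrypts each guest VM's memory with a VM-specific key, and
		  the hardware identifies an encrypted VM by an ASID drawn
		  from a small, fixed pool. This controller limits and shows
		  the total usage of SEV ASIDs, and denies creation of a new
		  SEV VM when the task's cgroup has reached its limit.

		  Say N if unsure.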
On Mon, 2020-09-21 at 18:22 -0700, Sean Christopherson wrote:
> On Mon, Sep 21, 2020 at 06:04:04PM -0700, Randy Dunlap wrote:
> > Hi,
> > 
> > On 9/21/20 5:40 PM, Vipin Sharma wrote:
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d6a0b31b13dc..1a57c362b803 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1101,6 +1101,20 @@ config CGROUP_BPF
> > >  	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
> > >  	  inet sockets.
> > >  
> > > +config CGROUP_SEV
> > > +	bool "SEV ASID controller"
> > > +	depends on KVM_AMD_SEV
> > > +	default n
> > > +	help
> > > +	  Provides a controller for AMD SEV ASIDs. This controller limits and
> > > +	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
> > > +	  processors. Whenever a new encrypted VM is created using SEV on an
> > > +	  AMD processor, this controller will check the current limit in the
> > > +	  cgroup to which the task belongs and will deny the SEV ASID if the
> > > +	  cgroup has already reached its limit.
> > > +
> > > +	  Say N if unsure.
> > 
> > Something here (either in the bool prompt string or the help text)
> > should let a reader know w.t.h. SEV means.
> > 
> > Without having to look in other places...
> 
> ASIDs too.  I'd also love to see more info in the docs and/or cover
> letter to explain why ASID management on SEV requires a cgroup.  I
> know what an ASID is, and have a decent idea of how KVM manages ASIDs
> for legacy VMs, but I know nothing about why ASIDs are limited for
> SEV and not legacy VMs.

Well, also, why would we only have a cgroup for ASIDs but not MSIDs?

For the reader at home a Space ID (SID) is simply a tag that can be
placed on a cache line to control things like flushing.  Intel and AMD
use MSIDs which are allocated per process to allow fast context
switching by flushing all the process pages using a flush by SID.
ASIDs are also used by both Intel and AMD to control nested/extended
paging of virtual machines, so ASIDs are allocated per VM.  So far
it's universal.

AMD invented a mechanism for tying their memory encryption technology
to the ASID asserted on the memory bus, so now they can do encrypted
virtual machines since each VM is tagged by ASID which the memory
encryptor sees.  It is suspected that the forthcoming Intel TDX
technology to encrypt VMs will operate in the same way as well.  This
isn't everything you have to do to get an encrypted VM, but it's a
core part of it.

The problem with SIDs (both A and M) is that they get crammed into
spare bits in the CPU (like the upper bits of %CR3 for MSID) so we
don't have enough of them to do a 1:1 mapping of MSID to process or
ASID to VM.  Thus we have to ration them somewhat, which is what I
assume this patch is about?

James
On Tue, Nov 03, 2020 at 08:39:12AM -0800, James Bottomley wrote:
> On Mon, 2020-09-21 at 18:22 -0700, Sean Christopherson wrote:
> > ASIDs too.  I'd also love to see more info in the docs and/or cover
> > letter to explain why ASID management on SEV requires a cgroup.  I
> > know what an ASID is, and have a decent idea of how KVM manages
> > ASIDs for legacy VMs, but I know nothing about why ASIDs are limited
> > for SEV and not legacy VMs.
> 
> Well, also, why would we only have a cgroup for ASIDs but not MSIDs?

Assuming MSID==PCID in Intel terminology, which may be a bad assumption,
the answer is that rationing PCIDs is a fool's errand, at least on Intel
CPUs.

> For the reader at home a Space ID (SID) is simply a tag that can be
> placed on a cache line to control things like flushing.  Intel and AMD
> use MSIDs which are allocated per process to allow fast context
> switching by flushing all the process pages using a flush by SID.
> ASIDs are also used by both Intel and AMD to control nested/extended
> paging of virtual machines, so ASIDs are allocated per VM.  So far
> it's universal.

On Intel CPUs, multiple things factor into the actual ASID that is used
to tag TLB entries.  And underneath the hood, there are a _very_ limited
number of ASIDs that are globally shared, i.e. a process in the host has
an ASID, same as a process in the guest, and the CPU only supports
tagging translations for N ASIDs at any given time.

E.g. with TDX, the hardware/real ASID is derived from:

  VPID + PCID + SEAM + EPTP

where VPID=0 for host, PCID=0 if PCID is disabled, SEAM=1 for the
TDX-Module and TDX VMs, and obviously EPTP is invalid/ignored when EPT
is disabled.

> AMD invented a mechanism for tying their memory encryption technology
> to the ASID asserted on the memory bus, so now they can do encrypted
> virtual machines since each VM is tagged by ASID which the memory
> encryptor sees.  It is suspected that the forthcoming Intel TDX
> technology to encrypt VMs will operate in the same way as well.  This

TDX uses MKTME keys, which are not tied to the ASID.  The KeyID is part
of the physical address, at least in the initial hardware
implementations, which means that from a memory perspective, each KeyID
is a unique physical address.  This is completely orthogonal to ASIDs,
e.g. a given KeyID+PA combo can have multiple TLB entries if it's
accessed by multiple ASIDs.

> isn't everything you have to do to get an encrypted VM, but it's a
> core part of it.
> 
> The problem with SIDs (both A and M) is that they get crammed into
> spare bits in the CPU (like the upper bits of %CR3 for MSID) so we

This CR3 reference is why I assume MSID==PCID, but the PCID is carved
out of the lower bits (11:0) of CR3, which is why I'm unsure I
interpreted this correctly.

> don't have enough of them to do a 1:1 mapping of MSID to process or
> ASID to VM.  Thus we have to ration them somewhat, which is what I
> assume this patch is about?

This cgroup is more about a hard limitation than about performance.
With PCIDs, VPIDs, and AMD's ASIDs, there is always the option of
recycling an existing ID (used for PCIDs and ASIDs), or simply disabling
the feature (used for VPIDs).  In both cases, exhausting the resource
affects performance due to incurring TLB flushes at transition points,
but doesn't prevent creating new processes/VMs.

And due to the way PCID=>ASID derivation works on Intel CPUs, the kernel
doesn't even bother trying to use a large number of PCIDs.  IIRC, the
current number of PCIDs used by the kernel is 5, i.e. the kernel
intentionally recycles PCIDs long before it's forced to do so by the
architectural limitation of 4k PCIDs, because using more than 5 PCIDs
actually hurts performance (forced PCID recycling allows the kernel to
keep *its* ASID live by flushing userspace PCIDs, whereas CPU recycling
of ASIDs is indiscriminate).

MKTME KeyIDs and SEV ASIDs are different.  There is a hard, relatively
low limit on the number of IDs that are available, and exhausting that
pool effectively prevents creating a new encrypted VM[*].  E.g. with
TDX, on first gen hardware there is a hard limit of 127 KeyIDs that can
be used to create TDX VMs.  IIRC, SEV-ES is capped at 512 or so ASIDs.
Hitting that cap means no more protected VMs can be created.

[*] KeyID exhaustion for TDX is a hard restriction, the old VM _must_ be
torn down to reuse the KeyID.  ASID exhaustion for SEV is not
technically a hard limit, e.g. KVM could theoretically park a VM to
reuse its ASID, but for all intents and purposes that VM is no longer
live.
On Tue, 2020-11-03 at 10:10 -0800, Sean Christopherson wrote:
> On Tue, Nov 03, 2020 at 08:39:12AM -0800, James Bottomley wrote:
> > On Mon, 2020-09-21 at 18:22 -0700, Sean Christopherson wrote:
> > > ASIDs too.  I'd also love to see more info in the docs and/or
> > > cover letter to explain why ASID management on SEV requires a
> > > cgroup.  I know what an ASID is, and have a decent idea of how
> > > KVM manages ASIDs for legacy VMs, but I know nothing about why
> > > ASIDs are limited for SEV and not legacy VMs.
> > 
> > Well, also, why would we only have a cgroup for ASIDs but not
> > MSIDs?
> 
> Assuming MSID==PCID in Intel terminology, which may be a bad
> assumption, the answer is that rationing PCIDs is a fool's errand, at
> least on Intel CPUs.

Yes, sorry, I should probably have confessed that I'm most used to
parisc SIDs, which are additional 32 bit qualifiers the CPU explicitly
adds to every virtual address.  They perform exactly the same function,
though they're a bit more explicit (and we have more bits).  On PA every
virtual address is actually a GVA consisting of 32 bits of SID and 64
bits of VA, and we use this 96 bit address for virtual indexing and
things.  And parisc doesn't have virtualization acceleration so we only
have one type of SID.

Thanks for the rest of the elaboration.

James
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 4a3081e9f4b5..bbbf10fc1b50 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -16,6 +16,7 @@ kvm-$(CONFIG_KVM_ASYNC_PF)	+= $(KVM)/async_pf.o
 kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
 			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
+kvm-$(CONFIG_CGROUP_SEV)	+= svm/sev_cgroup.o
 
 kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o vmx/evmcs.o vmx/nested.o
 kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7bf7bf734979..2cc0bea21a76 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -17,6 +17,7 @@
 
 #include "x86.h"
 #include "svm.h"
+#include "sev_cgroup.h"
 
 static int sev_flush_asids(void);
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -80,7 +81,7 @@ static bool __sev_recycle_asids(void)
 static int sev_asid_new(void)
 {
 	bool retry = true;
-	int pos;
+	int pos, ret;
 
 	mutex_lock(&sev_bitmap_lock);
 
@@ -98,6 +99,12 @@ static int sev_asid_new(void)
 		return -EBUSY;
 	}
 
+	ret = sev_asid_try_charge(pos);
+	if (ret) {
+		mutex_unlock(&sev_bitmap_lock);
+		return ret;
+	}
+
 	__set_bit(pos, sev_asid_bitmap);
 
 	mutex_unlock(&sev_bitmap_lock);
@@ -127,6 +134,8 @@ static void sev_asid_free(int asid)
 		sd->sev_vmcbs[pos] = NULL;
 	}
 
+	sev_asid_uncharge(pos);
+
 	mutex_unlock(&sev_bitmap_lock);
 }
 
@@ -1143,6 +1152,9 @@ int __init sev_hardware_setup(void)
 	if (!status)
 		return 1;
 
+	if (sev_cgroup_setup(max_sev_asid))
+		return 1;
+
 	/*
 	 * Check SEV platform status.
 	 *
@@ -1157,6 +1169,7 @@ int __init sev_hardware_setup(void)
 	pr_info("SEV supported\n");
 
 err:
+	sev_cgroup_teardown();
 	kfree(status);
 	return rc;
 }
@@ -1170,6 +1183,7 @@ void sev_hardware_teardown(void)
 	bitmap_free(sev_reclaim_asid_bitmap);
 
 	sev_flush_asids();
+	sev_cgroup_teardown();
 }
 
 void pre_sev_run(struct vcpu_svm *svm, int cpu)
diff --git a/arch/x86/kvm/svm/sev_cgroup.c b/arch/x86/kvm/svm/sev_cgroup.c
new file mode 100644
index 000000000000..f76a934b8cf2
--- /dev/null
+++ b/arch/x86/kvm/svm/sev_cgroup.c
@@ -0,0 +1,414 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SEV cgroup controller
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+
+#include <linux/cgroup.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/lockdep.h>
+
+#define MAX_SEV_ASIDS_STR "max"
+
+/**
+ * struct sev_cgroup - Stores SEV ASID related cgroup data.
+ * @css: cgroup subsys state object.
+ * @max: Max limit of the count of the SEV ASIDs in the cgroup.
+ * @usage: Current count of the SEV ASIDs in the cgroup.
+ * @allocation_failure_event: Number of times SEV ASID allocation was denied.
+ * @events_file: File handle for sev.events file.
+ */
+struct sev_cgroup {
+	struct cgroup_subsys_state css;
+	unsigned int max;
+	unsigned int usage;
+	unsigned long allocation_failure_event;
+	struct cgroup_file events_file;
+};
+
+/* Maximum number of SEV ASIDs supported by the platform */
+static unsigned int sev_max_asids;
+
+/* Global array to store which ASID is charged to which cgroup */
+static struct sev_cgroup **sev_asids_cgroup_array;
+
+/*
+ * To synchronize sev_asids_cgroup_array changes from charging/uncharging,
+ * css_offline, max, and printing used ASIDs.
+ */
+static DEFINE_MUTEX(sev_cgroup_lock);
+
+/**
+ * css_sev() - Get sev_cgroup from the css.
+ * @css: cgroup subsys state object.
+ *
+ * Context: Any context.
+ * Return:
+ * * %NULL - If @css is null.
+ * * struct sev_cgroup * - SEV cgroup of the specified css.
+ */
+static struct sev_cgroup *css_sev(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct sev_cgroup, css) : NULL;
+}
+
+/**
+ * parent_sev_cgroup() - Get the parent sev cgroup in the cgroup hierarchy
+ * @sevcg: sev cgroup node whose parent is needed.
+ *
+ * Context: Any context.
+ * Return:
+ * * struct sev_cgroup * - Parent sev cgroup in the hierarchy.
+ * * %NULL - If @sevcg is null or it is the root in the hierarchy.
+ */
+static struct sev_cgroup *parent_sev_cgroup(struct sev_cgroup *sevcg)
+{
+	return sevcg ? css_sev(sevcg->css.parent) : NULL;
+}
+
+/*
+ * sev_asid_cgroup_dec() - Decrement the SEV ASID usage in the cgroup.
+ * @sevcg: SEV cgroup.
+ *
+ * Context: Any context. Expects sev_cgroup_lock mutex to be held by the
+ *	    caller.
+ */
+static void sev_asid_cgroup_dec(struct sev_cgroup *sevcg)
+{
+	lockdep_assert_held(&sev_cgroup_lock);
+	sevcg->usage--;
+	/*
+	 * If this ever becomes max then there is a bug in the SEV cgroup code.
+	 */
+	WARN_ON_ONCE(sevcg->usage == UINT_MAX);
+}
+
+/**
+ * sev_asid_try_charge() - Try charging an SEV ASID to the cgroup.
+ * @pos: Index of SEV ASID in the SEV ASIDs bitmap.
+ *
+ * Try charging an SEV ASID to the current task's cgroup and all its ancestors
+ * up to the root. If charging is not possible due to the limit constraint,
+ * then notify the event file and return -errno.
+ *
+ * Context: Process context. Takes and releases sev_cgroup_lock mutex.
+ * Return:
+ * * 0 - If successfully charged the cgroup.
+ * * -EINVAL - If pos is not valid.
+ * * -EBUSY - If usage has already reached the limit.
+ */
+int sev_asid_try_charge(int pos)
+{
+	struct sev_cgroup *start, *i, *j;
+	int ret = 0;
+
+	mutex_lock(&sev_cgroup_lock);
+
+	start = css_sev(task_css(current, sev_cgrp_id));
+
+	for (i = start; i; i = parent_sev_cgroup(i)) {
+		if (i->usage == i->max)
+			goto e_limit;
+
+		i->usage++;
+	}
+
+	sev_asids_cgroup_array[pos] = start;
+exit:
+	mutex_unlock(&sev_cgroup_lock);
+	return ret;
+
+e_limit:
+	for (j = start; j != i; j = parent_sev_cgroup(j))
+		sev_asid_cgroup_dec(j);
+
+	start->allocation_failure_event++;
+	cgroup_file_notify(&start->events_file);
+
+	ret = -EBUSY;
+	goto exit;
+}
+EXPORT_SYMBOL(sev_asid_try_charge);
+
+/**
+ * sev_asid_uncharge() - Uncharge an SEV ASID from the cgroup.
+ * @pos: Index of SEV ASID in the SEV ASIDs bitmap.
+ *
+ * Uncharge an SEV ASID from the cgroup to which it was charged in
+ * sev_asid_try_charge().
+ *
+ * Context: Process context. Takes and releases sev_cgroup_lock mutex.
+ */
+void sev_asid_uncharge(int pos)
+{
+	struct sev_cgroup *i;
+
+	mutex_lock(&sev_cgroup_lock);
+
+	for (i = sev_asids_cgroup_array[pos]; i; i = parent_sev_cgroup(i))
+		sev_asid_cgroup_dec(i);
+
+	sev_asids_cgroup_array[pos] = NULL;
+
+	mutex_unlock(&sev_cgroup_lock);
+}
+EXPORT_SYMBOL(sev_asid_uncharge);
+
+/**
+ * sev_cgroup_setup() - Setup the sev cgroup before charging.
+ * @max: Maximum number of SEV ASIDs supported by the platform.
+ *
+ * Initialize the sev_asids_cgroup_array which stores ASID to cgroup mapping.
+ *
+ * Context: Process context. Takes and releases sev_cgroup_lock mutex.
+ * Return:
+ * * 0 - If setup was successful.
+ * * -ENOMEM - If memory is not available to allocate the array.
+ */
+int sev_cgroup_setup(unsigned int max)
+{
+	int ret = 0;
+
+	mutex_lock(&sev_cgroup_lock);
+
+	sev_max_asids = max;
+	sev_asids_cgroup_array = kcalloc(sev_max_asids,
+					 sizeof(struct sev_cgroup *),
+					 GFP_KERNEL);
+	if (!sev_asids_cgroup_array) {
+		sev_max_asids = 0;
+		ret = -ENOMEM;
+	}
+
+	mutex_unlock(&sev_cgroup_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL(sev_cgroup_setup);
+
+/**
+ * sev_cgroup_teardown() - Release resources, no more charging/uncharging will
+ *			   happen.
+ *
+ * Context: Process context. Takes and releases sev_cgroup_lock mutex.
+ */
+void sev_cgroup_teardown(void)
+{
+	mutex_lock(&sev_cgroup_lock);
+
+	kfree(sev_asids_cgroup_array);
+	sev_asids_cgroup_array = NULL;
+	sev_max_asids = 0;
+
+	mutex_unlock(&sev_cgroup_lock);
+}
+EXPORT_SYMBOL(sev_cgroup_teardown);
+
+/**
+ * sev_max_write() - Take user supplied max value limit for the cgroup.
+ * @of: Handler for the file.
+ * @buf: Data from the user.
+ * @nbytes: Number of bytes of the data.
+ * @off: Offset in the file.
+ *
+ * Context: Process context. Takes and releases sev_cgroup_lock mutex.
+ * Return:
+ * * >= 0 - Number of bytes read from the buffer.
+ * * -EINVAL - If the value in @buf is lower than the current usage, negative,
+ *	       exceeds the max value of u32, or is not a number.
+ */
+static ssize_t sev_max_write(struct kernfs_open_file *of, char *buf,
+			     size_t nbytes, loff_t off)
+{
+	struct sev_cgroup *sevcg;
+	unsigned int max;
+	int err;
+
+	buf = strstrip(buf);
+	if (!strcmp(buf, MAX_SEV_ASIDS_STR)) {
+		max = UINT_MAX;
+	} else {
+		err = kstrtouint(buf, 0, &max);
+		if (err)
+			return err;
+	}
+
+	sevcg = css_sev(of_css(of));
+
+	mutex_lock(&sev_cgroup_lock);
+
+	if (max < sevcg->usage) {
+		mutex_unlock(&sev_cgroup_lock);
+		return -EINVAL;
+	}
+
+	sevcg->max = max;
+
+	mutex_unlock(&sev_cgroup_lock);
+	return nbytes;
+}
+
+/**
+ * sev_max_show() - Print the current max limit in the cgroup.
+ * @sf: Interface file
+ * @v: Arguments passed
+ *
+ * Context: Any context.
+ * Return: 0 to denote successful print.
+ */
+static int sev_max_show(struct seq_file *sf, void *v)
+{
+	unsigned int max = css_sev(seq_css(sf))->max;
+
+	if (max == UINT_MAX)
+		seq_printf(sf, "%s\n", MAX_SEV_ASIDS_STR);
+	else
+		seq_printf(sf, "%u\n", max);
+
+	return 0;
+}
+
+/**
+ * sev_current() - Get the current usage of SEV ASIDs in the cgroup.
+ * @css: cgroup subsys state object
+ * @cft: Handler for cgroup interface file
+ *
+ * Context: Any context.
+ * Return: Current count of SEV ASIDs used in the cgroup.
+ */
+static u64 sev_current(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+	return css_sev(css)->usage;
+}
+
+/**
+ * sev_events() - Show the tally of events that occurred in the SEV cgroup.
+ * @sf: Interface file.
+ * @v: Arguments passed.
+ *
+ * Context: Any context.
+ * Return: 0 to denote the successful print.
+ */
+static int sev_events(struct seq_file *sf, void *v)
+{
+	struct cgroup_subsys_state *css = seq_css(sf);
+
+	seq_printf(sf, "max %lu\n", css_sev(css)->allocation_failure_event);
+	return 0;
+}
+
+/* sev cgroup interface files */
+static struct cftype sev_files[] = {
+	{
+		/* Maximum count of SEV ASIDs allowed */
+		.name = "max",
+		.write = sev_max_write,
+		.seq_show = sev_max_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		/* Current usage of SEV ASIDs */
+		.name = "current",
+		.read_u64 = sev_current,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{
+		/*
+		 * Flat keyed event file.
+		 *
+		 * max %allocation_failure_event
+		 *	Number of times SEV ASIDs were not allocated because
+		 *	current usage reached the max limit
+		 */
+		.name = "events",
+		.file_offset = offsetof(struct sev_cgroup, events_file),
+		.seq_show = sev_events,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+	{}
+};
+
+/**
+ * sev_css_alloc() - Allocate a sev cgroup node in the cgroup hierarchy.
+ * @parent_css: cgroup subsys state of the parent cgroup node.
+ *
+ * Context: Process context.
+ * Return:
+ * * struct cgroup_subsys_state * - Pointer to css field of struct sev_cgroup.
+ * * ERR_PTR(-ENOMEM) - No memory available to create sev_cgroup node.
+ */
+static struct cgroup_subsys_state *
+sev_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct sev_cgroup *sevcg;
+
+	sevcg = kzalloc(sizeof(*sevcg), GFP_KERNEL);
+	if (!sevcg)
+		return ERR_PTR(-ENOMEM);
+
+	sevcg->max = UINT_MAX;
+	sevcg->usage = 0;
+	sevcg->allocation_failure_event = 0;
+
+	return &sevcg->css;
+}
+
+/**
+ * sev_css_free() - Free the sev_cgroup that @css belongs to.
+ * @css: cgroup subsys state object
+ *
+ * Context: Any context.
+ */
+static void sev_css_free(struct cgroup_subsys_state *css)
+{
+	kfree(css_sev(css));
+}
+
+/**
+ * sev_css_offline() - cgroup is killed, move charges to parent.
+ * @css: css of the killed cgroup.
+ *
+ * Since charges do not migrate when the task moves, a killed css might have
+ * charges. Update the sev_asids_cgroup_array to point to the @css->parent.
+ * Parent is already charged in sev_asid_try_charge(), so its usage need not
+ * change.
+ *
+ * Context: Process context. Takes and releases sev_cgroup_lock mutex.
+ */
+static void sev_css_offline(struct cgroup_subsys_state *css)
+{
+	struct sev_cgroup *sevcg, *parentcg;
+	int i;
+
+	if (!css->parent)
+		return;
+
+	sevcg = css_sev(css);
+
+	mutex_lock(&sev_cgroup_lock);
+
+	if (!sevcg->usage) {
+		mutex_unlock(&sev_cgroup_lock);
+		return;
+	}
+
+	parentcg = parent_sev_cgroup(sevcg);
+
+	for (i = 0; i < sev_max_asids; i++) {
+		if (sev_asids_cgroup_array[i] == sevcg)
+			sev_asids_cgroup_array[i] = parentcg;
+	}
+
+	mutex_unlock(&sev_cgroup_lock);
+}
+
+struct cgroup_subsys sev_cgrp_subsys = {
+	.css_alloc = sev_css_alloc,
+	.css_free = sev_css_free,
+	.css_offline = sev_css_offline,
+	.legacy_cftypes = sev_files,
+	.dfl_cftypes = sev_files
+};
diff --git a/arch/x86/kvm/svm/sev_cgroup.h b/arch/x86/kvm/svm/sev_cgroup.h
new file mode 100644
index 000000000000..d2d69870a005
--- /dev/null
+++ b/arch/x86/kvm/svm/sev_cgroup.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * SEV cgroup interface for charging and uncharging the cgroup.
+ *
+ * Copyright 2020 Google LLC
+ * Author: Vipin Sharma <vipinsh@google.com>
+ */
+
+#ifndef _SEV_CGROUP_H_
+#define _SEV_CGROUP_H_
+
+#ifdef CONFIG_CGROUP_SEV
+
+int sev_asid_try_charge(int pos);
+void sev_asid_uncharge(int pos);
+int sev_cgroup_setup(unsigned int max);
+void sev_cgroup_teardown(void);
+
+#else /* CONFIG_CGROUP_SEV */
+
+static inline int sev_asid_try_charge(int pos)
+{
+	return 0;
+}
+
+static inline void sev_asid_uncharge(int pos)
+{
+}
+
+static inline int sev_cgroup_setup(unsigned int max)
+{
+	return 0;
+}
+
+static inline void sev_cgroup_teardown(void)
+{
+}
+
+#endif /* CONFIG_CGROUP_SEV */
+
+#endif /* _SEV_CGROUP_H_ */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcff3b4..d21a5b4a2037 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -61,6 +61,9 @@ SUBSYS(pids)
 SUBSYS(rdma)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_SEV)
+SUBSYS(sev)
+#endif
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
diff --git a/init/Kconfig b/init/Kconfig
index d6a0b31b13dc..1a57c362b803 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1101,6 +1101,20 @@ config CGROUP_BPF
 	  BPF_CGROUP_INET_INGRESS will be executed on the ingress path of
 	  inet sockets.
 
+config CGROUP_SEV
+	bool "SEV ASID controller"
+	depends on KVM_AMD_SEV
+	default n
+	help
+	  Provides a controller for AMD SEV ASIDs. This controller limits and
+	  shows the total usage of SEV ASIDs used in encrypted VMs on AMD
+	  processors. Whenever a new encrypted VM is created using SEV on an
+	  AMD processor, this controller will check the current limit in the
+	  cgroup to which the task belongs and will deny the SEV ASID if the
+	  cgroup has already reached its limit.
+
+	  Say N if unsure.
+
 config CGROUP_DEBUG
 	bool "Debug controller"
 	default n
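For readers unfamiliar with the cgroup plumbing, the interface files this
patch registers (sev.max, sev.current and sev.events, from the cftype
table and SUBSYS(sev) above) would be exercised roughly as follows on the
cgroup v2 hierarchy. The mount point and cgroup name here are
illustrative assumptions, not part of the patch:

	# Enable the controller for children of the root cgroup
	# (assumes cgroup2 is mounted at /sys/fs/cgroup).
	echo "+sev" > /sys/fs/cgroup/cgroup.subtree_control

	# Create a cgroup and cap it at 10 SEV ASIDs.
	mkdir /sys/fs/cgroup/sev-vms
	echo 10 > /sys/fs/cgroup/sev-vms/sev.max

	# Launch SEV guests from tasks in this cgroup, then inspect usage.
	cat /sys/fs/cgroup/sev-vms/sev.current	# current ASID count, e.g. 10
	cat /sys/fs/cgroup/sev-vms/sev.events	# "max <n>", denied allocations

	# Writing "max" lifts the limit again (maps to UINT_MAX internally).
	echo max > /sys/fs/cgroup/sev-vms/sev.max

Once usage equals max, sev_asid_try_charge() fails with -EBUSY, so the
next SEV guest creation in that cgroup is rejected and the "max" counter
in sev.events is bumped.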