From patchwork Thu May 14 15:38:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549141 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 231DE59D for ; Thu, 14 May 2020 15:38:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 07E1C2065D for ; Thu, 14 May 2020 15:38:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727772AbgENPim (ORCPT ); Thu, 14 May 2020 11:38:42 -0400 Received: from foss.arm.com ([217.140.110.172]:39144 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726056AbgENPil (ORCPT ); Thu, 14 May 2020 11:38:41 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DFC971045; Thu, 14 May 2020 08:38:40 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A84523F71E; Thu, 14 May 2020 08:38:39 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking Date: Thu, 14 May 2020 16:38:18 +0100 Message-Id: <1589470709-4104-2-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org kvmtool uses brlock for protecting accesses to the ioport and mmio red-black trees. brlock allows concurrent reads, but only one writer, which is assumed not to be a VCPU thread (for more information see commit 0b907ed2eaec ("kvm tools: Add a brlock)). This is done by issuing a compiler barrier on read and pausing the entire virtual machine on writes. When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread read/write lock. When we will implement reassignable BARs, the mmio or ioport mapping will be done as a result of a VCPU mmio access. When brlock is a pthread read/write lock, it means that we will try to acquire a write lock with the read lock already held by the same VCPU and we will deadlock. When it's not, a VCPU will have to call kvm__pause, which means the virtual machine will stay paused forever. Let's avoid all this by using a mutex and reference counting the red-black tree entries. This way we can guarantee that we won't unregister a node that another thread is currently using for emulation. 
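Condensed, the scheme introduced below looks like this (a sketch based on the mmio.c hunks in this patch; the same pattern is applied to ioports, and the rb-tree insert/erase and coalesced-MMIO details are elided):

    static DEFINE_MUTEX(mmio_lock);

    /* Emulation path: pin the node for the duration of the access. */
    static struct mmio_mapping *mmio_get(struct rb_root *root, u64 phys_addr, u32 len)
    {
            struct mmio_mapping *mmio;

            mutex_lock(&mmio_lock);
            mmio = mmio_search(root, phys_addr, len);
            if (mmio)
                    mmio->refcount++;
            mutex_unlock(&mmio_lock);

            return mmio;
    }

    static void mmio_put(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
    {
            mutex_lock(&mmio_lock);
            mmio->refcount--;
            /* A deregister request arrived while the node was in use. */
            if (mmio->remove && mmio->refcount == 0)
                    mmio_deregister(kvm, root, mmio);
            mutex_unlock(&mmio_lock);
    }

    /* Unregister path (with mmio_lock held): free now only if nobody holds a reference. */
    if (mmio->refcount == 0)
            mmio_deregister(kvm, &mmio_tree, mmio);
    else
            mmio->remove = true;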
Signed-off-by: Alexandru Elisei Reviewed-by: Andre Przywara --- include/kvm/ioport.h | 2 + include/kvm/rbtree-interval.h | 4 +- ioport.c | 85 ++++++++++++++++++++++----------- mmio.c | 107 ++++++++++++++++++++++++++++++++---------- 4 files changed, 143 insertions(+), 55 deletions(-) diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h index 62a719327e3f..039633f76bdd 100644 --- a/include/kvm/ioport.h +++ b/include/kvm/ioport.h @@ -22,6 +22,8 @@ struct ioport { struct ioport_operations *ops; void *priv; struct device_header dev_hdr; + u32 refcount; + bool remove; }; struct ioport_operations { diff --git a/include/kvm/rbtree-interval.h b/include/kvm/rbtree-interval.h index 730eb5e8551d..17cd3b5f3199 100644 --- a/include/kvm/rbtree-interval.h +++ b/include/kvm/rbtree-interval.h @@ -6,7 +6,9 @@ #define RB_INT_INIT(l, h) \ (struct rb_int_node){.low = l, .high = h} -#define rb_int(n) rb_entry(n, struct rb_int_node, node) +#define rb_int(n) rb_entry(n, struct rb_int_node, node) +#define rb_int_start(n) ((n)->low) +#define rb_int_end(n) ((n)->low + (n)->high - 1) struct rb_int_node { struct rb_node node; diff --git a/ioport.c b/ioport.c index d9f2e8ea3c3b..844a832d25a4 100644 --- a/ioport.c +++ b/ioport.c @@ -2,7 +2,6 @@ #include "kvm/kvm.h" #include "kvm/util.h" -#include "kvm/brlock.h" #include "kvm/rbtree-interval.h" #include "kvm/mutex.h" @@ -16,6 +15,8 @@ #define ioport_node(n) rb_entry(n, struct ioport, node) +static DEFINE_MUTEX(ioport_lock); + static struct rb_root ioport_tree = RB_ROOT; static struct ioport *ioport_search(struct rb_root *root, u64 addr) @@ -39,6 +40,36 @@ static void ioport_remove(struct rb_root *root, struct ioport *data) rb_int_erase(root, &data->node); } +static struct ioport *ioport_get(struct rb_root *root, u64 addr) +{ + struct ioport *ioport; + + mutex_lock(&ioport_lock); + ioport = ioport_search(root, addr); + if (ioport) + ioport->refcount++; + mutex_unlock(&ioport_lock); + + return ioport; +} + +/* Called with ioport_lock held. */ +static void ioport_unregister(struct rb_root *root, struct ioport *data) +{ + device__unregister(&data->dev_hdr); + ioport_remove(root, data); + free(data); +} + +static void ioport_put(struct rb_root *root, struct ioport *data) +{ + mutex_lock(&ioport_lock); + data->refcount--; + if (data->remove && data->refcount == 0) + ioport_unregister(root, data); + mutex_unlock(&ioport_lock); +} + #ifdef CONFIG_HAS_LIBFDT static void generate_ioport_fdt_node(void *fdt, struct device_header *dev_hdr, @@ -80,16 +111,22 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i .bus_type = DEVICE_BUS_IOPORT, .data = generate_ioport_fdt_node, }, + /* + * Start from 0 because ioport__unregister() doesn't decrement + * the reference count. 
+ */ + .refcount = 0, + .remove = false, }; - br_write_lock(kvm); + mutex_lock(&ioport_lock); r = ioport_insert(&ioport_tree, entry); if (r < 0) goto out_free; r = device__register(&entry->dev_hdr); if (r < 0) goto out_remove; - br_write_unlock(kvm); + mutex_unlock(&ioport_lock); return port; @@ -97,33 +134,28 @@ out_remove: ioport_remove(&ioport_tree, entry); out_free: free(entry); - br_write_unlock(kvm); + mutex_unlock(&ioport_lock); return r; } int ioport__unregister(struct kvm *kvm, u16 port) { struct ioport *entry; - int r; - - br_write_lock(kvm); - r = -ENOENT; + mutex_lock(&ioport_lock); entry = ioport_search(&ioport_tree, port); - if (!entry) - goto done; - - device__unregister(&entry->dev_hdr); - ioport_remove(&ioport_tree, entry); - - free(entry); - - r = 0; - -done: - br_write_unlock(kvm); + if (!entry) { + mutex_unlock(&ioport_lock); + return -ENOENT; + } + /* The same reasoning from kvm__deregister_mmio() applies. */ + if (entry->refcount == 0) + ioport_unregister(&ioport_tree, entry); + else + entry->remove = true; + mutex_unlock(&ioport_lock); - return r; + return 0; } static void ioport__unregister_all(void) @@ -136,9 +168,7 @@ static void ioport__unregister_all(void) while (rb) { rb_node = rb_int(rb); entry = ioport_node(rb_node); - device__unregister(&entry->dev_hdr); - ioport_remove(&ioport_tree, entry); - free(entry); + ioport_unregister(&ioport_tree, entry); rb = rb_first(&ioport_tree); } } @@ -164,8 +194,7 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, void *ptr = data; struct kvm *kvm = vcpu->kvm; - br_read_lock(kvm); - entry = ioport_search(&ioport_tree, port); + entry = ioport_get(&ioport_tree, port); if (!entry) goto out; @@ -180,9 +209,9 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, ptr += size; } -out: - br_read_unlock(kvm); + ioport_put(&ioport_tree, entry); +out: if (ret) return true; diff --git a/mmio.c b/mmio.c index 61e1d47a587d..cd141cd30750 100644 --- a/mmio.c +++ b/mmio.c @@ -1,7 +1,7 @@ #include "kvm/kvm.h" #include "kvm/kvm-cpu.h" #include "kvm/rbtree-interval.h" -#include "kvm/brlock.h" +#include "kvm/mutex.h" #include #include @@ -15,10 +15,14 @@ #define mmio_node(n) rb_entry(n, struct mmio_mapping, node) +static DEFINE_MUTEX(mmio_lock); + struct mmio_mapping { struct rb_int_node node; void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr); void *ptr; + u32 refcount; + bool remove; }; static struct rb_root mmio_tree = RB_ROOT; @@ -51,6 +55,11 @@ static int mmio_insert(struct rb_root *root, struct mmio_mapping *data) return rb_int_insert(root, &data->node); } +static void mmio_remove(struct rb_root *root, struct mmio_mapping *data) +{ + rb_int_erase(root, &data->node); +} + static const char *to_direction(u8 is_write) { if (is_write) @@ -59,6 +68,41 @@ static const char *to_direction(u8 is_write) return "read"; } +static struct mmio_mapping *mmio_get(struct rb_root *root, u64 phys_addr, u32 len) +{ + struct mmio_mapping *mmio; + + mutex_lock(&mmio_lock); + mmio = mmio_search(root, phys_addr, len); + if (mmio) + mmio->refcount++; + mutex_unlock(&mmio_lock); + + return mmio; +} + +/* Called with mmio_lock held. 
*/ +static void mmio_deregister(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio) +{ + struct kvm_coalesced_mmio_zone zone = (struct kvm_coalesced_mmio_zone) { + .addr = rb_int_start(&mmio->node), + .size = 1, + }; + ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone); + + mmio_remove(root, mmio); + free(mmio); +} + +static void mmio_put(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio) +{ + mutex_lock(&mmio_lock); + mmio->refcount--; + if (mmio->remove && mmio->refcount == 0) + mmio_deregister(kvm, root, mmio); + mutex_unlock(&mmio_lock); +} + int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce, void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr), void *ptr) @@ -72,9 +116,15 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c return -ENOMEM; *mmio = (struct mmio_mapping) { - .node = RB_INT_INIT(phys_addr, phys_addr + phys_addr_len), - .mmio_fn = mmio_fn, - .ptr = ptr, + .node = RB_INT_INIT(phys_addr, phys_addr + phys_addr_len), + .mmio_fn = mmio_fn, + .ptr = ptr, + /* + * Start from 0 because kvm__deregister_mmio() doesn't decrement + * the reference count. + */ + .refcount = 0, + .remove = false, }; if (coalesce) { @@ -88,9 +138,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c return -errno; } } - br_write_lock(kvm); + mutex_lock(&mmio_lock); ret = mmio_insert(&mmio_tree, mmio); - br_write_unlock(kvm); + mutex_unlock(&mmio_lock); return ret; } @@ -98,25 +148,30 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr) { struct mmio_mapping *mmio; - struct kvm_coalesced_mmio_zone zone; - br_write_lock(kvm); + mutex_lock(&mmio_lock); mmio = mmio_search_single(&mmio_tree, phys_addr); if (mmio == NULL) { - br_write_unlock(kvm); + mutex_unlock(&mmio_lock); return false; } + /* + * The PCI emulation code calls this function when memory access is + * disabled for a device, or when a BAR has a new address assigned. PCI + * emulation doesn't use any locks and as a result we can end up in a + * situation where we have called mmio_get() to do emulation on one VCPU + * thread (let's call it VCPU0), and several other VCPU threads have + * called kvm__deregister_mmio(). In this case, if we decrement refcount + * kvm__deregister_mmio() (either directly, or by calling mmio_put()), + * refcount will reach 0 and we will free the mmio node before VCPU0 has + * called mmio_put(). This will trigger use-after-free errors on VCPU0. 
+ */ + if (mmio->refcount == 0) + mmio_deregister(kvm, &mmio_tree, mmio); + else + mmio->remove = true; + mutex_unlock(&mmio_lock); - zone = (struct kvm_coalesced_mmio_zone) { - .addr = phys_addr, - .size = 1, - }; - ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone); - - rb_int_erase(&mmio_tree, &mmio->node); - br_write_unlock(kvm); - - free(mmio); return true; } @@ -124,18 +179,18 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u { struct mmio_mapping *mmio; - br_read_lock(vcpu->kvm); - mmio = mmio_search(&mmio_tree, phys_addr, len); - - if (mmio) - mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr); - else { + mmio = mmio_get(&mmio_tree, phys_addr, len); + if (!mmio) { if (vcpu->kvm->cfg.mmio_debug) fprintf(stderr, "Warning: Ignoring MMIO %s at %016llx (length %u)\n", to_direction(is_write), (unsigned long long)phys_addr, len); + goto out; } - br_read_unlock(vcpu->kvm); + mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr); + mmio_put(vcpu->kvm, &mmio_tree, mmio); + +out: return true; } From patchwork Thu May 14 15:38:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549143 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 92A9A59D for ; Thu, 14 May 2020 15:38:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8524C20767 for ; Thu, 14 May 2020 15:38:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727904AbgENPin (ORCPT ); Thu, 14 May 2020 11:38:43 -0400 Received: from foss.arm.com ([217.140.110.172]:39150 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726056AbgENPin (ORCPT ); Thu, 14 May 2020 11:38:43 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3DEBB11B3; Thu, 14 May 2020 08:38:42 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 23C733F71E; Thu, 14 May 2020 08:38:41 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 02/12] pci: Add helpers for BAR values and memory/IO space access Date: Thu, 14 May 2020 16:38:19 +0100 Message-Id: <1589470709-4104-3-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org We're going to be checking the BAR type, the address written to it and if access to memory or I/O space is enabled quite often when we add support for reasignable BARs; make our life easier by adding helpers for it. 
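To illustrate how the new accessors relate to the raw header fields, a caller can now be written like this (a sketch; pci_hdr stands for any emulated device's struct pci_device_header and bar for a BAR index below 6):

    bool is_io   = pci__bar_is_io(pci_hdr, bar);      /* tests PCI_BASE_ADDRESS_SPACE_IO (bit 0) */
    bool is_mem  = pci__bar_is_memory(pci_hdr, bar);  /* simply !pci__bar_is_io()                */
    u32  addr    = pci__bar_address(pci_hdr, bar);    /* BAR value with the flag bits masked off */
    u32  size    = pci__bar_size(pci_hdr, bar);       /* kvmtool's private bar_size[bar] cache   */
    bool decoded = is_io ? pci__io_space_enabled(pci_hdr)      /* PCI_COMMAND_IO set?     */
                         : pci__memory_space_enabled(pci_hdr); /* PCI_COMMAND_MEMORY set? */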
Reviewed-by: Andre Przywara Signed-off-by: Alexandru Elisei --- include/kvm/pci.h | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ pci.c | 4 ++-- powerpc/spapr_pci.c | 2 +- 3 files changed, 56 insertions(+), 3 deletions(-) diff --git a/include/kvm/pci.h b/include/kvm/pci.h index 2c29c094d4cb..0f0815b36157 100644 --- a/include/kvm/pci.h +++ b/include/kvm/pci.h @@ -5,6 +5,7 @@ #include #include #include +#include #include "kvm/devices.h" #include "kvm/msi.h" @@ -161,4 +162,56 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type); +static inline bool __pci__memory_space_enabled(u16 command) +{ + return command & PCI_COMMAND_MEMORY; +} + +static inline bool pci__memory_space_enabled(struct pci_device_header *pci_hdr) +{ + return __pci__memory_space_enabled(pci_hdr->command); +} + +static inline bool __pci__io_space_enabled(u16 command) +{ + return command & PCI_COMMAND_IO; +} + +static inline bool pci__io_space_enabled(struct pci_device_header *pci_hdr) +{ + return __pci__io_space_enabled(pci_hdr->command); +} + +static inline bool __pci__bar_is_io(u32 bar) +{ + return bar & PCI_BASE_ADDRESS_SPACE_IO; +} + +static inline bool pci__bar_is_io(struct pci_device_header *pci_hdr, int bar_num) +{ + return __pci__bar_is_io(pci_hdr->bar[bar_num]); +} + +static inline bool pci__bar_is_memory(struct pci_device_header *pci_hdr, int bar_num) +{ + return !pci__bar_is_io(pci_hdr, bar_num); +} + +static inline u32 __pci__bar_address(u32 bar) +{ + if (__pci__bar_is_io(bar)) + return bar & PCI_BASE_ADDRESS_IO_MASK; + return bar & PCI_BASE_ADDRESS_MEM_MASK; +} + +static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_num) +{ + return __pci__bar_address(pci_hdr->bar[bar_num]); +} + +static inline u32 pci__bar_size(struct pci_device_header *pci_hdr, int bar_num) +{ + return pci_hdr->bar_size[bar_num]; +} + #endif /* KVM__PCI_H */ diff --git a/pci.c b/pci.c index 3ecdd0f9c75c..81e9cec918fb 100644 --- a/pci.c +++ b/pci.c @@ -185,7 +185,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, * size, it will write the address back. */ if (bar < 6) { - if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO) + if (pci__bar_is_io(pci_hdr, bar)) mask = (u32)PCI_BASE_ADDRESS_IO_MASK; else mask = (u32)PCI_BASE_ADDRESS_MEM_MASK; @@ -211,7 +211,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, */ memcpy(&value, data, size); if (value == 0xffffffff) - value = ~(pci_hdr->bar_size[bar] - 1); + value = ~(pci__bar_size(pci_hdr, bar) - 1); /* Preserve the special bits. 
*/ value = (value & mask) | (pci_hdr->bar[bar] & ~mask); memcpy(base + offset, &value, size); diff --git a/powerpc/spapr_pci.c b/powerpc/spapr_pci.c index a15f7d895a46..7be44d950acb 100644 --- a/powerpc/spapr_pci.c +++ b/powerpc/spapr_pci.c @@ -369,7 +369,7 @@ int spapr_populate_pci_devices(struct kvm *kvm, of_pci_b_ddddd(devid) | of_pci_b_fff(fn) | of_pci_b_rrrrrrrr(bars[i])); - reg[n+1].size = cpu_to_be64(hdr->bar_size[i]); + reg[n+1].size = cpu_to_be64(pci__bar_size(hdr, i)); reg[n+1].addr = 0; assigned_addresses[n].phys_hi = cpu_to_be32( From patchwork Thu May 14 15:38:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549145 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 09C0E59D for ; Thu, 14 May 2020 15:38:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EA54F20758 for ; Thu, 14 May 2020 15:38:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727951AbgENPio (ORCPT ); Thu, 14 May 2020 11:38:44 -0400 Received: from foss.arm.com ([217.140.110.172]:39162 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727937AbgENPio (ORCPT ); Thu, 14 May 2020 11:38:44 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8CDAB1FB; Thu, 14 May 2020 08:38:43 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 75AB83F71E; Thu, 14 May 2020 08:38:42 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 03/12] virtio/pci: Get emulated region address from BARs Date: Thu, 14 May 2020 16:38:20 +0100 Message-Id: <1589470709-4104-4-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The struct virtio_pci fields port_addr, mmio_addr and msix_io_block represent the same addresses that are written in the corresponding BARs. Remove this duplication of information and always use the address from the BAR. This will make our life a lot easier when we add support for reassignable BARs, because we won't have to update the fields on each BAR change. No functional changes. 
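The conversion is mechanical: each cached field becomes a one-line accessor that reads the corresponding BAR, and call sites fetch the address on demand (fragment taken from the virtio/pci.c hunks below):

    static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
    {
            return pci__bar_address(&vpci->pci_hdr, 0);   /* BAR0 holds the I/O port base */
    }

    /* before */
    unsigned long offset = port - vpci->port_addr;
    /* after */
    unsigned long offset = port - virtio_pci__port_addr(vpci);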
Reviewed-by: Andre Przywara Signed-off-by: Alexandru Elisei --- include/kvm/virtio-pci.h | 3 -- virtio/pci.c | 82 ++++++++++++++++++++++++++++++------------------ 2 files changed, 52 insertions(+), 33 deletions(-) diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h index 278a25950d8b..959b4b81c871 100644 --- a/include/kvm/virtio-pci.h +++ b/include/kvm/virtio-pci.h @@ -24,8 +24,6 @@ struct virtio_pci { void *dev; struct kvm *kvm; - u16 port_addr; - u32 mmio_addr; u8 status; u8 isr; u32 features; @@ -43,7 +41,6 @@ struct virtio_pci { u32 config_gsi; u32 vq_vector[VIRTIO_PCI_MAX_VQ]; u32 gsis[VIRTIO_PCI_MAX_VQ]; - u32 msix_io_block; u64 msix_pba; struct msix_table msix_table[VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG]; diff --git a/virtio/pci.c b/virtio/pci.c index c6529493f06f..eded8685e1b3 100644 --- a/virtio/pci.c +++ b/virtio/pci.c @@ -13,6 +13,21 @@ #include #include +static u16 virtio_pci__port_addr(struct virtio_pci *vpci) +{ + return pci__bar_address(&vpci->pci_hdr, 0); +} + +static u32 virtio_pci__mmio_addr(struct virtio_pci *vpci) +{ + return pci__bar_address(&vpci->pci_hdr, 1); +} + +static u32 virtio_pci__msix_io_addr(struct virtio_pci *vpci) +{ + return pci__bar_address(&vpci->pci_hdr, 2); +} + static void virtio_pci__ioevent_callback(struct kvm *kvm, void *param) { struct virtio_pci_ioevent_param *ioeventfd = param; @@ -25,6 +40,8 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde { struct ioevent ioevent; struct virtio_pci *vpci = vdev->virtio; + u32 mmio_addr = virtio_pci__mmio_addr(vpci); + u16 port_addr = virtio_pci__port_addr(vpci); int r, flags = 0; int fd; @@ -48,7 +65,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde flags |= IOEVENTFD_FLAG_USER_POLL; /* ioport */ - ioevent.io_addr = vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY; + ioevent.io_addr = port_addr + VIRTIO_PCI_QUEUE_NOTIFY; ioevent.io_len = sizeof(u16); ioevent.fd = fd = eventfd(0, 0); r = ioeventfd__add_event(&ioevent, flags | IOEVENTFD_FLAG_PIO); @@ -56,7 +73,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde return r; /* mmio */ - ioevent.io_addr = vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY; + ioevent.io_addr = mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY; ioevent.io_len = sizeof(u16); ioevent.fd = eventfd(0, 0); r = ioeventfd__add_event(&ioevent, flags); @@ -68,7 +85,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde return 0; free_ioport_evt: - ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq); + ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq); return r; } @@ -76,9 +93,11 @@ static void virtio_pci_exit_vq(struct kvm *kvm, struct virtio_device *vdev, int vq) { struct virtio_pci *vpci = vdev->virtio; + u32 mmio_addr = virtio_pci__mmio_addr(vpci); + u16 port_addr = virtio_pci__port_addr(vpci); - ioeventfd__del_event(vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq); - ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq); + ioeventfd__del_event(mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq); + ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq); virtio_exit_vq(kvm, vdev, vpci->dev, vq); } @@ -162,7 +181,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p { struct virtio_device *vdev = ioport->priv; struct virtio_pci *vpci = vdev->virtio; - unsigned long offset = port - vpci->port_addr; + unsigned long offset = port - virtio_pci__port_addr(vpci); return virtio_pci__data_in(vcpu, vdev, offset, 
data, size); } @@ -318,7 +337,7 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 { struct virtio_device *vdev = ioport->priv; struct virtio_pci *vpci = vdev->virtio; - unsigned long offset = port - vpci->port_addr; + unsigned long offset = port - virtio_pci__port_addr(vpci); return virtio_pci__data_out(vcpu, vdev, offset, data, size); } @@ -335,17 +354,18 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu, struct virtio_device *vdev = ptr; struct virtio_pci *vpci = vdev->virtio; struct msix_table *table; + u32 msix_io_addr = virtio_pci__msix_io_addr(vpci); int vecnum; size_t offset; - if (addr > vpci->msix_io_block + PCI_IO_SIZE) { + if (addr > msix_io_addr + PCI_IO_SIZE) { if (is_write) return; table = (struct msix_table *)&vpci->msix_pba; - offset = addr - (vpci->msix_io_block + PCI_IO_SIZE); + offset = addr - (msix_io_addr + PCI_IO_SIZE); } else { table = vpci->msix_table; - offset = addr - vpci->msix_io_block; + offset = addr - msix_io_addr; } vecnum = offset / sizeof(struct msix_table); offset = offset % sizeof(struct msix_table); @@ -434,19 +454,20 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu, { struct virtio_device *vdev = ptr; struct virtio_pci *vpci = vdev->virtio; + u32 mmio_addr = virtio_pci__mmio_addr(vpci); if (!is_write) - virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr, - data, len); + virtio_pci__data_in(vcpu, vdev, addr - mmio_addr, data, len); else - virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr, - data, len); + virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len); } int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, int device_id, int subsys_id, int class) { struct virtio_pci *vpci = vdev->virtio; + u32 mmio_addr, msix_io_block; + u16 port_addr; int r; vpci->kvm = kvm; @@ -454,20 +475,21 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE)); - r = pci_get_io_port_block(PCI_IO_SIZE); - r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev); + port_addr = pci_get_io_port_block(PCI_IO_SIZE); + r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE, + vdev); if (r < 0) return r; - vpci->port_addr = (u16)r; + port_addr = (u16)r; - vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE); - r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false, + mmio_addr = pci_get_mmio_block(PCI_IO_SIZE); + r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false, virtio_pci__io_mmio_callback, vdev); if (r < 0) goto free_ioport; - vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2); - r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false, + msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2); + r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false, virtio_pci__msix_mmio_callback, vdev); if (r < 0) goto free_mmio; @@ -483,11 +505,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, .class[2] = (class >> 16) & 0xff, .subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET), .subsys_id = cpu_to_le16(subsys_id), - .bar[0] = cpu_to_le32(vpci->port_addr + .bar[0] = cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO), - .bar[1] = cpu_to_le32(vpci->mmio_addr + .bar[1] = cpu_to_le32(mmio_addr | PCI_BASE_ADDRESS_SPACE_MEMORY), - .bar[2] = cpu_to_le32(vpci->msix_io_block + .bar[2] = cpu_to_le32(msix_io_block | PCI_BASE_ADDRESS_SPACE_MEMORY), .status = cpu_to_le16(PCI_STATUS_CAP_LIST), .capabilities = 
(void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr, @@ -533,11 +555,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, return 0; free_msix_mmio: - kvm__deregister_mmio(kvm, vpci->msix_io_block); + kvm__deregister_mmio(kvm, msix_io_block); free_mmio: - kvm__deregister_mmio(kvm, vpci->mmio_addr); + kvm__deregister_mmio(kvm, mmio_addr); free_ioport: - ioport__unregister(kvm, vpci->port_addr); + ioport__unregister(kvm, port_addr); return r; } @@ -557,9 +579,9 @@ int virtio_pci__exit(struct kvm *kvm, struct virtio_device *vdev) struct virtio_pci *vpci = vdev->virtio; virtio_pci__reset(kvm, vdev); - kvm__deregister_mmio(kvm, vpci->mmio_addr); - kvm__deregister_mmio(kvm, vpci->msix_io_block); - ioport__unregister(kvm, vpci->port_addr); + kvm__deregister_mmio(kvm, virtio_pci__mmio_addr(vpci)); + kvm__deregister_mmio(kvm, virtio_pci__msix_io_addr(vpci)); + ioport__unregister(kvm, virtio_pci__port_addr(vpci)); return 0; } From patchwork Thu May 14 15:38:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549147 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83913913 for ; Thu, 14 May 2020 15:38:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6BF5A20758 for ; Thu, 14 May 2020 15:38:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727970AbgENPip (ORCPT ); Thu, 14 May 2020 11:38:45 -0400 Received: from foss.arm.com ([217.140.110.172]:39174 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727956AbgENPip (ORCPT ); Thu, 14 May 2020 11:38:45 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DEAA611B3; Thu, 14 May 2020 08:38:44 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C58D73F71E; Thu, 14 May 2020 08:38:43 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 04/12] vfio: Reserve ioports when configuring the BAR Date: Thu, 14 May 2020 16:38:21 +0100 Message-Id: <1589470709-4104-5-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Let's be consistent and reserve ioports when we are configuring the BAR, not when we map it, just like we do with mmio regions. 
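The net effect (condensed from the hunks below) is that vfio_pci_configure_bar() now reserves the guest resources for both BAR types, while vfio_setup_trap_region() only registers the trap handlers for the already reserved range:

    /* vfio_pci_configure_bar() */
    if (region->is_ioport)
            region->port_base = pci_get_io_port_block(region->info.size);
    else
            region->guest_phys_addr = pci_get_mmio_block(map_size);

    /* vfio_setup_trap_region() */
    if (region->is_ioport) {
            int port = ioport__register(kvm, region->port_base, &vfio_ioport_ops,
                                        region->info.size, region);
            if (port < 0)
                    return port;
            return 0;
    }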
Reviewed-by: Andre Przywara Signed-off-by: Alexandru Elisei --- vfio/core.c | 9 +++------ vfio/pci.c | 4 +++- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/vfio/core.c b/vfio/core.c index b80ebf3bb913..bad3c7c8cd26 100644 --- a/vfio/core.c +++ b/vfio/core.c @@ -202,14 +202,11 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev, struct vfio_region *region) { if (region->is_ioport) { - int port = pci_get_io_port_block(region->info.size); - - port = ioport__register(kvm, port, &vfio_ioport_ops, - region->info.size, region); + int port = ioport__register(kvm, region->port_base, + &vfio_ioport_ops, region->info.size, + region); if (port < 0) return port; - - region->port_base = port; return 0; } diff --git a/vfio/pci.c b/vfio/pci.c index 89d29b89db43..0b548e4bf9e2 100644 --- a/vfio/pci.c +++ b/vfio/pci.c @@ -894,7 +894,9 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev, } } - if (!region->is_ioport) { + if (region->is_ioport) { + region->port_base = pci_get_io_port_block(region->info.size); + } else { /* Grab some MMIO space in the guest */ map_size = ALIGN(region->info.size, PAGE_SIZE); region->guest_phys_addr = pci_get_mmio_block(map_size); From patchwork Thu May 14 15:38:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549163 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 724A159D for ; Thu, 14 May 2020 15:39:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5A21120756 for ; Thu, 14 May 2020 15:39:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728004AbgENPir (ORCPT ); Thu, 14 May 2020 11:38:47 -0400 Received: from foss.arm.com ([217.140.110.172]:39182 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727999AbgENPiq (ORCPT ); Thu, 14 May 2020 11:38:46 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3B4DB1FB; Thu, 14 May 2020 08:38:46 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 23AB73F71E; Thu, 14 May 2020 08:38:45 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 05/12] pci: Limit configuration transaction size to 32 bits Date: Thu, 14 May 2020 16:38:22 +0100 Message-Id: <1589470709-4104-6-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From PCI Local Bus Specification Revision 3.0. section 3.8 "64-Bit Bus Extension": "The bandwidth requirements for I/O and configuration transactions cannot justify the added complexity, and, therefore, only memory transactions support 64-bit data transfers". Further down, the spec also describes the possible responses of a target which has been requested to do a 64-bit transaction. 
Limit the transaction to the lower 32 bits, to match the second accepted behaviour. Reviewed-by: Andre Przywara Signed-off-by: Alexandru Elisei --- pci.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/pci.c b/pci.c index 81e9cec918fb..eb0bb366a291 100644 --- a/pci.c +++ b/pci.c @@ -119,6 +119,9 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 { union pci_config_address pci_config_address; + if (size > 4) + size = 4; + pci_config_address.w = ioport__read32(&pci_config_address_bits); /* * If someone accesses PCI configuration space offsets that are not @@ -135,6 +138,9 @@ static bool pci_config_data_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 { union pci_config_address pci_config_address; + if (size > 4) + size = 4; + pci_config_address.w = ioport__read32(&pci_config_address_bits); /* * If someone accesses PCI configuration space offsets that are not @@ -248,6 +254,9 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data, cfg_addr.w = (u32)addr; cfg_addr.enable_bit = 1; + if (len > 4) + len = 4; + if (is_write) pci__config_wr(kvm, cfg_addr, data, len); else From patchwork Thu May 14 15:38:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549151 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4100059D for ; Thu, 14 May 2020 15:38:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 27E7320748 for ; Thu, 14 May 2020 15:38:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728043AbgENPiu (ORCPT ); Thu, 14 May 2020 11:38:50 -0400 Received: from foss.arm.com ([217.140.110.172]:39188 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728017AbgENPis (ORCPT ); Thu, 14 May 2020 11:38:48 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8AB211045; Thu, 14 May 2020 08:38:47 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7328B3F71E; Thu, 14 May 2020 08:38:46 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 06/12] vfio/pci: Don't write configuration value twice Date: Thu, 14 May 2020 16:38:23 +0100 Message-Id: <1589470709-4104-7-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org After writing to the device fd as part of the PCI configuration space emulation, we read back from the device to make sure that the write finished. The value is read back into the PCI configuration space and afterwards, the same value is copied by the PCI emulation code. Let's read from the device fd into a temporary variable, to prevent this double write. The double write is harmless in itself. 
But when we implement reassignable BARs, we need to keep track of the old BAR value, and the VFIO code is overwritting it. Signed-off-by: Alexandru Elisei Reviewed-by: Andre Przywara --- vfio/pci.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/vfio/pci.c b/vfio/pci.c index 0b548e4bf9e2..2de893407574 100644 --- a/vfio/pci.c +++ b/vfio/pci.c @@ -3,6 +3,8 @@ #include "kvm/kvm-cpu.h" #include "kvm/vfio.h" +#include + #include #include #include @@ -478,7 +480,10 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd struct vfio_region_info *info; struct vfio_pci_device *pdev; struct vfio_device *vdev; - void *base = pci_hdr; + u32 tmp; + + /* Make sure a larger size will not overrun tmp on the stack. */ + assert(sz <= 4); if (offset == PCI_ROM_ADDRESS) return; @@ -498,7 +503,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI) vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz); - if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz) + if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz) vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x", sz, offset); } From patchwork Thu May 14 15:38:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549149 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E679618 for ; Thu, 14 May 2020 15:38:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2FCFC20767 for ; Thu, 14 May 2020 15:38:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728032AbgENPiu (ORCPT ); Thu, 14 May 2020 11:38:50 -0400 Received: from foss.arm.com ([217.140.110.172]:39194 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728039AbgENPit (ORCPT ); Thu, 14 May 2020 11:38:49 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DF43F1FB; Thu, 14 May 2020 08:38:48 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C51AE3F71E; Thu, 14 May 2020 08:38:47 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 07/12] Don't allow more than one framebuffers Date: Thu, 14 May 2020 16:38:24 +0100 Message-Id: <1589470709-4104-8-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org A vesa device is used by the SDL, GTK or VNC framebuffers. 
Don't allow the user to specify more than one of these options because kvmtool will create identical vesa devices and bad things will happen: $ ./lkvm run -c2 -m2048 -k bzImage --sdl --gtk # lkvm run -k bzImage -m 2048 -c 2 --name guest-10159 Error: device region [d0000000-d012bfff] would overlap device region [d0000000-d012bfff] *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 *** *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 *** *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 *** ======= Backtrace: ========= ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5] ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a] (+0x777e5)[0x7fae0ed447e5] /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5] /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fae0ed5153c] *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 *** /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab] /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab] /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c] /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c] ======= Backtrace: ========= Aborted (core dumped) The vesa device is explicitly created during the initialization phase of the above framebuffers. Also remove the superfluous check for their existence. Signed-off-by: Alexandru Elisei Reviewed-by: Andre Przywara --- builtin-run.c | 5 +++++ hw/vesa.c | 2 -- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/builtin-run.c b/builtin-run.c index 03b119e29995..c23e7a2e5ecd 100644 --- a/builtin-run.c +++ b/builtin-run.c @@ -543,6 +543,11 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv) kvm->cfg.console = DEFAULT_CONSOLE; video = kvm->cfg.vnc || kvm->cfg.sdl || kvm->cfg.gtk; + if (video) { + if ((kvm->cfg.vnc && (kvm->cfg.sdl || kvm->cfg.gtk)) || + (kvm->cfg.sdl && kvm->cfg.gtk)) + die("Only one of --vnc, --sdl or --gtk can be specified"); + } if (!strncmp(kvm->cfg.console, "virtio", 6)) kvm->cfg.active_console = CONSOLE_VIRTIO; diff --git a/hw/vesa.c b/hw/vesa.c index dd59a112330b..a4ec7e16e1e6 100644 --- a/hw/vesa.c +++ b/hw/vesa.c @@ -61,8 +61,6 @@ struct framebuffer *vesa__init(struct kvm *kvm) BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE)); BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT); - if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk) - return NULL; vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE); r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL); if (r < 0) From patchwork Thu May 14 15:38:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549153 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0AFA4618 for ; Thu, 14 May 2020 15:38:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E8AB820748 for ; Thu, 14 May 2020 15:38:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728041AbgENPiw (ORCPT ); Thu, 14 May 2020 11:38:52 -0400 Received: from foss.arm.com ([217.140.110.172]:39210 "EHLO foss.arm.com" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728040AbgENPiv (ORCPT ); Thu, 14 May 2020 11:38:51 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 59D781045; Thu, 14 May 2020 08:38:50 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 237ED3F71E; Thu, 14 May 2020 08:38:49 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 08/12] pci: Implement callbacks for toggling BAR emulation Date: Thu, 14 May 2020 16:38:25 +0100 Message-Id: <1589470709-4104-9-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Implement callbacks for activating and deactivating emulation for a BAR region. This is in preparation for allowing a guest operating system to enable and disable access to I/O or memory space, or to reassign the BARs. The emulated vesa device framebuffer isn't designed to allow stopping and restarting at arbitrary points in the guest execution. Furthermore, on x86, the kernel will not change the BAR addresses, which on bare metal are programmed by the firmware, so take the easy way out and refuse to activate/deactivate emulation for the BAR regions. We also take this opportunity to make the vesa emulation code more consistent by moving all static variable definitions in one place, at the top of the file. 
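The interface added here boils down to a pair of per-device callbacks plus one registration helper (condensed from the include/kvm/pci.h and pci.c hunks below); pci__register_bar_regions() stores the callbacks and immediately activates every implemented BAR whose address space is currently enabled in the command register:

    typedef int (*bar_activate_fn_t)(struct kvm *kvm,
                                     struct pci_device_header *pci_hdr,
                                     int bar_num, void *data);
    typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
                                       struct pci_device_header *pci_hdr,
                                       int bar_num, void *data);

    int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
                                  bar_activate_fn_t bar_activate_fn,
                                  bar_deactivate_fn_t bar_deactivate_fn, void *data);

    /* Typical caller, here the vesa device: */
    r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate,
                                  vesa__bar_deactivate, NULL);
    if (r < 0)
            goto unregister_ioport;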
Signed-off-by: Alexandru Elisei Reviewed-by: Andre Przywara --- hw/vesa.c | 70 ++++++++++++++++++++++------------- include/kvm/pci.h | 18 ++++++++- pci.c | 39 ++++++++++++++++++++ vfio/pci.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++-------- virtio/pci.c | 92 ++++++++++++++++++++++++++++++++++------------ 5 files changed, 258 insertions(+), 68 deletions(-) diff --git a/hw/vesa.c b/hw/vesa.c index a4ec7e16e1e6..8659a0028d31 100644 --- a/hw/vesa.c +++ b/hw/vesa.c @@ -18,6 +18,31 @@ #include #include +static struct pci_device_header vesa_pci_device = { + .vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET), + .device_id = cpu_to_le16(PCI_DEVICE_ID_VESA), + .header_type = PCI_HEADER_TYPE_NORMAL, + .revision_id = 0, + .class[2] = 0x03, + .subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET), + .subsys_id = cpu_to_le16(PCI_SUBSYSTEM_ID_VESA), + .bar[1] = cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY), + .bar_size[1] = VESA_MEM_SIZE, +}; + +static struct device_header vesa_device = { + .bus_type = DEVICE_BUS_PCI, + .data = &vesa_pci_device, +}; + +static struct framebuffer vesafb = { + .width = VESA_WIDTH, + .height = VESA_HEIGHT, + .depth = VESA_BPP, + .mem_addr = VESA_MEM_ADDR, + .mem_size = VESA_MEM_SIZE, +}; + static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size) { return true; @@ -33,24 +58,19 @@ static struct ioport_operations vesa_io_ops = { .io_out = vesa_pci_io_out, }; -static struct pci_device_header vesa_pci_device = { - .vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET), - .device_id = cpu_to_le16(PCI_DEVICE_ID_VESA), - .header_type = PCI_HEADER_TYPE_NORMAL, - .revision_id = 0, - .class[2] = 0x03, - .subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET), - .subsys_id = cpu_to_le16(PCI_SUBSYSTEM_ID_VESA), - .bar[1] = cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY), - .bar_size[1] = VESA_MEM_SIZE, -}; - -static struct device_header vesa_device = { - .bus_type = DEVICE_BUS_PCI, - .data = &vesa_pci_device, -}; +static int vesa__bar_activate(struct kvm *kvm, struct pci_device_header *pci_hdr, + int bar_num, void *data) +{ + /* We don't support remapping of the framebuffer. */ + return 0; +} -static struct framebuffer vesafb; +static int vesa__bar_deactivate(struct kvm *kvm, struct pci_device_header *pci_hdr, + int bar_num, void *data) +{ + /* We don't support remapping of the framebuffer. 
*/ + return -EINVAL; +} struct framebuffer *vesa__init(struct kvm *kvm) { @@ -68,6 +88,11 @@ struct framebuffer *vesa__init(struct kvm *kvm) vesa_pci_device.bar[0] = cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO); vesa_pci_device.bar_size[0] = PCI_IO_SIZE; + r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate, + vesa__bar_deactivate, NULL); + if (r < 0) + goto unregister_ioport; + r = device__register(&vesa_device); if (r < 0) goto unregister_ioport; @@ -82,15 +107,8 @@ struct framebuffer *vesa__init(struct kvm *kvm) if (r < 0) goto unmap_dev; - vesafb = (struct framebuffer) { - .width = VESA_WIDTH, - .height = VESA_HEIGHT, - .depth = VESA_BPP, - .mem = mem, - .mem_addr = VESA_MEM_ADDR, - .mem_size = VESA_MEM_SIZE, - .kvm = kvm, - }; + vesafb.mem = mem; + vesafb.kvm = kvm; return fb__register(&vesafb); unmap_dev: diff --git a/include/kvm/pci.h b/include/kvm/pci.h index 0f0815b36157..73e06d76d244 100644 --- a/include/kvm/pci.h +++ b/include/kvm/pci.h @@ -89,12 +89,19 @@ struct pci_cap_hdr { u8 next; }; +struct pci_device_header; + +typedef int (*bar_activate_fn_t)(struct kvm *kvm, + struct pci_device_header *pci_hdr, + int bar_num, void *data); +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm, + struct pci_device_header *pci_hdr, + int bar_num, void *data); + #define PCI_BAR_OFFSET(b) (offsetof(struct pci_device_header, bar[b])) #define PCI_DEV_CFG_SIZE 256 #define PCI_DEV_CFG_MASK (PCI_DEV_CFG_SIZE - 1) -struct pci_device_header; - struct pci_config_operations { void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr, u8 offset, void *data, int sz); @@ -136,6 +143,9 @@ struct pci_device_header { /* Private to lkvm */ u32 bar_size[6]; + bar_activate_fn_t bar_activate_fn; + bar_deactivate_fn_t bar_deactivate_fn; + void *data; struct pci_config_operations cfg_ops; /* * PCI INTx# are level-triggered, but virtual device often feature @@ -162,6 +172,10 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type); +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr, + bar_activate_fn_t bar_activate_fn, + bar_deactivate_fn_t bar_deactivate_fn, void *data); + static inline bool __pci__memory_space_enabled(u16 command) { return command & PCI_COMMAND_MEMORY; diff --git a/pci.c b/pci.c index eb0bb366a291..b8e71b5f8d6c 100644 --- a/pci.c +++ b/pci.c @@ -66,6 +66,11 @@ int pci__assign_irq(struct pci_device_header *pci_hdr) return pci_hdr->irq_line; } +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num) +{ + return pci__bar_size(pci_hdr, bar_num); +} + static void *pci_config_address_ptr(u16 port) { unsigned long offset; @@ -273,6 +278,40 @@ struct pci_device_header *pci__find_dev(u8 dev_num) return hdr->data; } +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr, + bar_activate_fn_t bar_activate_fn, + bar_deactivate_fn_t bar_deactivate_fn, void *data) +{ + int i, r; + + assert(bar_activate_fn && bar_deactivate_fn); + + pci_hdr->bar_activate_fn = bar_activate_fn; + pci_hdr->bar_deactivate_fn = bar_deactivate_fn; + pci_hdr->data = data; + + for (i = 0; i < 6; i++) { + if (!pci_bar_is_implemented(pci_hdr, i)) + continue; + + if (pci__bar_is_io(pci_hdr, i) && + pci__io_space_enabled(pci_hdr)) { + r = bar_activate_fn(kvm, pci_hdr, i, data); + if (r < 0) + return r; + } + + if (pci__bar_is_memory(pci_hdr, i) && + pci__memory_space_enabled(pci_hdr)) { + r = bar_activate_fn(kvm, pci_hdr, i, data); 
+ if (r < 0) + return r; + } + } + + return 0; +} + int pci__init(struct kvm *kvm) { int r; diff --git a/vfio/pci.c b/vfio/pci.c index 2de893407574..34f19792765e 100644 --- a/vfio/pci.c +++ b/vfio/pci.c @@ -10,6 +10,8 @@ #include #include +#include + /* Wrapper around UAPI vfio_irq_set */ union vfio_irq_eventfd { struct vfio_irq_set irq; @@ -456,6 +458,88 @@ out_unlock: mutex_unlock(&pdev->msi.mutex); } +static int vfio_pci_bar_activate(struct kvm *kvm, + struct pci_device_header *pci_hdr, + int bar_num, void *data) +{ + struct vfio_device *vdev = data; + struct vfio_pci_device *pdev = &vdev->pci; + struct vfio_pci_msix_pba *pba = &pdev->msix_pba; + struct vfio_pci_msix_table *table = &pdev->msix_table; + struct vfio_region *region; + bool has_msix; + int ret; + + assert((u32)bar_num < vdev->info.num_regions); + + region = &vdev->regions[bar_num]; + has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX; + + if (has_msix && (u32)bar_num == table->bar) { + ret = kvm__register_mmio(kvm, table->guest_phys_addr, + table->size, false, + vfio_pci_msix_table_access, pdev); + /* + * The MSIX table and the PBA structure can share the same BAR, + * but for convenience we register different regions for mmio + * emulation. We want to we update both if they share the same + * BAR. + */ + if (ret < 0 || table->bar != pba->bar) + goto out; + } + + if (has_msix && (u32)bar_num == pba->bar) { + ret = kvm__register_mmio(kvm, pba->guest_phys_addr, + pba->size, false, + vfio_pci_msix_pba_access, pdev); + goto out; + } + + ret = vfio_map_region(kvm, vdev, region); +out: + return ret; +} + +static int vfio_pci_bar_deactivate(struct kvm *kvm, + struct pci_device_header *pci_hdr, + int bar_num, void *data) +{ + struct vfio_device *vdev = data; + struct vfio_pci_device *pdev = &vdev->pci; + struct vfio_pci_msix_pba *pba = &pdev->msix_pba; + struct vfio_pci_msix_table *table = &pdev->msix_table; + struct vfio_region *region; + bool has_msix, success; + int ret; + + assert((u32)bar_num < vdev->info.num_regions); + + region = &vdev->regions[bar_num]; + has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX; + + if (has_msix && (u32)bar_num == table->bar) { + success = kvm__deregister_mmio(kvm, table->guest_phys_addr); + /* kvm__deregister_mmio fails when the region is not found. */ + ret = (success ? 0 : -ENOENT); + /* See vfio_pci_bar_activate(). */ + if (ret < 0 || table->bar!= pba->bar) + goto out; + } + + if (has_msix && (u32)bar_num == pba->bar) { + success = kvm__deregister_mmio(kvm, pba->guest_phys_addr); + ret = (success ? 0 : -ENOENT); + goto out; + } + + vfio_unmap_region(kvm, region); + ret = 0; + +out: + return ret; +} + static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr, u8 offset, void *data, int sz) { @@ -818,12 +902,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev) ret = -ENOMEM; goto out_free; } - pba->guest_phys_addr = table->guest_phys_addr + table->size; - - ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size, - false, vfio_pci_msix_table_access, pdev); - if (ret < 0) - goto out_free; /* * We could map the physical PBA directly into the guest, but it's @@ -833,10 +911,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev) * between MSI-X table and PBA. For the sake of isolation, create a * virtual PBA. 
*/ - ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false, - vfio_pci_msix_pba_access, pdev); - if (ret < 0) - goto out_free; + pba->guest_phys_addr = table->guest_phys_addr + table->size; pdev->msix.entries = entries; pdev->msix.nr_entries = nr_entries; @@ -907,11 +982,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev, region->guest_phys_addr = pci_get_mmio_block(map_size); } - /* Map the BARs into the guest or setup a trap region. */ - ret = vfio_map_region(kvm, vdev, region); - if (ret) - return ret; - return 0; } @@ -958,7 +1028,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm, } /* We've configured the BARs, fake up a Configuration Space */ - return vfio_pci_fixup_cfg_space(vdev); + ret = vfio_pci_fixup_cfg_space(vdev); + if (ret) + return ret; + + return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate, + vfio_pci_bar_deactivate, vdev); } /* diff --git a/virtio/pci.c b/virtio/pci.c index eded8685e1b3..6eea6c686695 100644 --- a/virtio/pci.c +++ b/virtio/pci.c @@ -11,6 +11,7 @@ #include #include #include +#include #include static u16 virtio_pci__port_addr(struct virtio_pci *vpci) @@ -462,6 +463,66 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu, virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len); } +static int virtio_pci__bar_activate(struct kvm *kvm, + struct pci_device_header *pci_hdr, + int bar_num, void *data) +{ + struct virtio_device *vdev = data; + u32 bar_addr, bar_size; + int r = -EINVAL; + + assert(bar_num <= 2); + + bar_addr = pci__bar_address(pci_hdr, bar_num); + bar_size = pci__bar_size(pci_hdr, bar_num); + + switch (bar_num) { + case 0: + r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops, + bar_size, vdev); + if (r > 0) + r = 0; + break; + case 1: + r = kvm__register_mmio(kvm, bar_addr, bar_size, false, + virtio_pci__io_mmio_callback, vdev); + break; + case 2: + r = kvm__register_mmio(kvm, bar_addr, bar_size, false, + virtio_pci__msix_mmio_callback, vdev); + break; + } + + return r; +} + +static int virtio_pci__bar_deactivate(struct kvm *kvm, + struct pci_device_header *pci_hdr, + int bar_num, void *data) +{ + u32 bar_addr; + bool success; + int r = -EINVAL; + + assert(bar_num <= 2); + + bar_addr = pci__bar_address(pci_hdr, bar_num); + + switch (bar_num) { + case 0: + r = ioport__unregister(kvm, bar_addr); + break; + case 1: + case 2: + success = kvm__deregister_mmio(kvm, bar_addr); + /* kvm__deregister_mmio fails when the region is not found. */ + r = (success ? 
0 : -ENOENT); + break; + } + + return r; +} + int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, int device_id, int subsys_id, int class) { @@ -476,23 +537,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE)); port_addr = pci_get_io_port_block(PCI_IO_SIZE); - r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE, - vdev); - if (r < 0) - return r; - port_addr = (u16)r; - mmio_addr = pci_get_mmio_block(PCI_IO_SIZE); - r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false, - virtio_pci__io_mmio_callback, vdev); - if (r < 0) - goto free_ioport; - msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2); - r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false, - virtio_pci__msix_mmio_callback, vdev); - if (r < 0) - goto free_mmio; vpci->pci_hdr = (struct pci_device_header) { .vendor_id = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET), @@ -518,6 +564,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, .bar_size[2] = cpu_to_le32(PCI_IO_SIZE*2), }; + r = pci__register_bar_regions(kvm, &vpci->pci_hdr, + virtio_pci__bar_activate, + virtio_pci__bar_deactivate, vdev); + if (r < 0) + return r; + vpci->dev_hdr = (struct device_header) { .bus_type = DEVICE_BUS_PCI, .data = &vpci->pci_hdr, @@ -550,17 +602,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev, r = device__register(&vpci->dev_hdr); if (r < 0) - goto free_msix_mmio; + return r; return 0; - -free_msix_mmio: - kvm__deregister_mmio(kvm, msix_io_block); -free_mmio: - kvm__deregister_mmio(kvm, mmio_addr); -free_ioport: - ioport__unregister(kvm, port_addr); - return r; } int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev) From patchwork Thu May 14 15:38:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549155 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EAD3F59D for ; Thu, 14 May 2020 15:38:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DCE2620758 for ; Thu, 14 May 2020 15:38:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728052AbgENPix (ORCPT ); Thu, 14 May 2020 11:38:53 -0400 Received: from foss.arm.com ([217.140.110.172]:39220 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728026AbgENPiw (ORCPT ); Thu, 14 May 2020 11:38:52 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AACC51FB; Thu, 14 May 2020 08:38:51 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 92FB73F71E; Thu, 14 May 2020 08:38:50 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 09/12] pci: Toggle BAR I/O and memory space emulation Date: Thu, 14 May 2020 16:38:26 +0100 Message-Id: <1589470709-4104-10-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: 
<1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org During configuration of the BAR addresses, a Linux guest disables and enables access to I/O and memory space. When access is disabled, we don't stop emulating the memory regions described by the BARs. Now that we have callbacks for activating and deactivating emulation for a BAR region, let's use that to stop emulation when access is disabled, and re-activate it when access is re-enabled. Reviewed-by: Andre Przywara Signed-off-by: Alexandru Elisei --- pci.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/pci.c b/pci.c index b8e71b5f8d6c..96239160110c 100644 --- a/pci.c +++ b/pci.c @@ -163,6 +163,42 @@ static struct ioport_operations pci_config_data_ops = { .io_out = pci_config_data_out, }; +static void pci_config_command_wr(struct kvm *kvm, + struct pci_device_header *pci_hdr, + u16 new_command) +{ + int i; + bool toggle_io, toggle_mem; + + toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO; + toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY; + + for (i = 0; i < 6; i++) { + if (!pci_bar_is_implemented(pci_hdr, i)) + continue; + + if (toggle_io && pci__bar_is_io(pci_hdr, i)) { + if (__pci__io_space_enabled(new_command)) + pci_hdr->bar_activate_fn(kvm, pci_hdr, i, + pci_hdr->data); + else + pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i, + pci_hdr->data); + } + + if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) { + if (__pci__memory_space_enabled(new_command)) + pci_hdr->bar_activate_fn(kvm, pci_hdr, i, + pci_hdr->data); + else + pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i, + pci_hdr->data); + } + } + + pci_hdr->command = new_command; +} + void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size) { void *base; @@ -188,6 +224,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, if (*(u32 *)(base + offset) == 0) return; + if (offset == PCI_COMMAND) { + memcpy(&value, data, size); + pci_config_command_wr(kvm, pci_hdr, (u16)value); + return; + } + bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32); /* From patchwork Thu May 14 15:38:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549157 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D538559D for ; Thu, 14 May 2020 15:38:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C0CA120760 for ; Thu, 14 May 2020 15:38:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727939AbgENPiz (ORCPT ); Thu, 14 May 2020 11:38:55 -0400 Received: from foss.arm.com ([217.140.110.172]:39228 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728054AbgENPiy (ORCPT ); Thu, 14 May 2020 11:38:54 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C93391FB; Thu, 14 May 2020 08:38:53 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E30043F71E; Thu, 14 May 2020 08:38:51 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, 
julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs Date: Thu, 14 May 2020 16:38:27 +0100 Message-Id: <1589470709-4104-11-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org BARs are used by the guest to configure the access to the PCI device by writing the address to which the device will respond. The basic idea for adding support for reassignable BARs is straightforward: deactivate emulation for the memory region described by the old BAR value, and activate emulation for the new region. BAR reassignment can be done while device access is enabled and memory regions for different devices can overlap as long as no access is made to the overlapping memory regions. This means that it is legal for the BARs of two distinct devices to point to an overlapping memory region, and indeed, this is how Linux does resource assignment at boot. To account for this situation, the simple algorithm described above is enhanced to scan for all devices and: - Deactivate emulation for any BARs that might overlap with the new BAR value. - Enable emulation for any BARs that were overlapping with the old value after the BAR has been updated. Activating/deactivating emulation of a memory region has side effects. In order to prevent the execution of the same callback twice we now keep track of the state of the region emulation. For example, this can happen if we program a BAR with an address that overlaps a second BAR, thus deactivating emulation for the second BAR, and then we disable all region accesses to the second BAR by writing to the command register. Signed-off-by: Alexandru Elisei Reviewed-by: Andre Przywara --- include/kvm/pci.h | 14 ++- pci.c | 250 +++++++++++++++++++++++++++++++++++++++++++----------- vfio/pci.c | 12 +++ 3 files changed, 227 insertions(+), 49 deletions(-) diff --git a/include/kvm/pci.h b/include/kvm/pci.h index 73e06d76d244..bf81323d83b7 100644 --- a/include/kvm/pci.h +++ b/include/kvm/pci.h @@ -11,6 +11,17 @@ #include "kvm/msi.h" #include "kvm/fdt.h" +#define pci_dev_err(pci_hdr, fmt, ...) \ + pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__) +#define pci_dev_warn(pci_hdr, fmt, ...) \ + pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__) +#define pci_dev_info(pci_hdr, fmt, ...) \ + pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__) +#define pci_dev_dbg(pci_hdr, fmt, ...) \ + pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__) +#define pci_dev_die(pci_hdr, fmt, ...) \ + die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__) + /* * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1. 
* ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for @@ -142,7 +153,8 @@ struct pci_device_header { }; /* Private to lkvm */ - u32 bar_size[6]; + u32 bar_size[6]; + bool bar_active[6]; bar_activate_fn_t bar_activate_fn; bar_deactivate_fn_t bar_deactivate_fn; void *data; diff --git a/pci.c b/pci.c index 96239160110c..2e2c0270a166 100644 --- a/pci.c +++ b/pci.c @@ -71,6 +71,11 @@ static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_nu return pci__bar_size(pci_hdr, bar_num); } +static bool pci_bar_is_active(struct pci_device_header *pci_hdr, int bar_num) +{ + return pci_hdr->bar_active[bar_num]; +} + static void *pci_config_address_ptr(u16 port) { unsigned long offset; @@ -163,6 +168,46 @@ static struct ioport_operations pci_config_data_ops = { .io_out = pci_config_data_out, }; +static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr, + int bar_num) +{ + int r = 0; + + if (pci_bar_is_active(pci_hdr, bar_num)) + goto out; + + r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data); + if (r < 0) { + pci_dev_warn(pci_hdr, "Error activating emulation for BAR %d", + bar_num); + goto out; + } + pci_hdr->bar_active[bar_num] = true; + +out: + return r; +} + +static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr, + int bar_num) +{ + int r = 0; + + if (!pci_bar_is_active(pci_hdr, bar_num)) + goto out; + + r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data); + if (r < 0) { + pci_dev_warn(pci_hdr, "Error deactivating emulation for BAR %d", + bar_num); + goto out; + } + pci_hdr->bar_active[bar_num] = false; + +out: + return r; +} + static void pci_config_command_wr(struct kvm *kvm, struct pci_device_header *pci_hdr, u16 new_command) @@ -179,26 +224,167 @@ static void pci_config_command_wr(struct kvm *kvm, if (toggle_io && pci__bar_is_io(pci_hdr, i)) { if (__pci__io_space_enabled(new_command)) - pci_hdr->bar_activate_fn(kvm, pci_hdr, i, - pci_hdr->data); + pci_activate_bar(kvm, pci_hdr, i); else - pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i, - pci_hdr->data); + pci_deactivate_bar(kvm, pci_hdr, i); } if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) { if (__pci__memory_space_enabled(new_command)) - pci_hdr->bar_activate_fn(kvm, pci_hdr, i, - pci_hdr->data); + pci_activate_bar(kvm, pci_hdr, i); else - pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i, - pci_hdr->data); + pci_deactivate_bar(kvm, pci_hdr, i); } } pci_hdr->command = new_command; } +static int pci_toggle_bar_regions(bool activate, struct kvm *kvm, u32 start, u32 size) +{ + struct device_header *dev_hdr; + struct pci_device_header *tmp_hdr; + u32 tmp_start, tmp_size; + int i, r; + + dev_hdr = device__first_dev(DEVICE_BUS_PCI); + while (dev_hdr) { + tmp_hdr = dev_hdr->data; + for (i = 0; i < 6; i++) { + if (!pci_bar_is_implemented(tmp_hdr, i)) + continue; + + tmp_start = pci__bar_address(tmp_hdr, i); + tmp_size = pci__bar_size(tmp_hdr, i); + if (tmp_start + tmp_size <= start || + tmp_start >= start + size) + continue; + + if (activate) + r = pci_activate_bar(kvm, tmp_hdr, i); + else + r = pci_deactivate_bar(kvm, tmp_hdr, i); + if (r < 0) + return r; + } + dev_hdr = device__next_dev(dev_hdr); + } + + return 0; +} + +static inline int pci_activate_bar_regions(struct kvm *kvm, u32 start, u32 size) +{ + return pci_toggle_bar_regions(true, kvm, start, size); +} + +static inline int pci_deactivate_bar_regions(struct kvm *kvm, u32 start, u32 size) +{ + return pci_toggle_bar_regions(false, kvm, start, size); +} + +static void 
pci_config_bar_wr(struct kvm *kvm, + struct pci_device_header *pci_hdr, int bar_num, + u32 value) +{ + u32 old_addr, new_addr, bar_size; + u32 mask; + int r; + + if (pci__bar_is_io(pci_hdr, bar_num)) + mask = (u32)PCI_BASE_ADDRESS_IO_MASK; + else + mask = (u32)PCI_BASE_ADDRESS_MEM_MASK; + + /* + * If the kernel masks the BAR, it will expect to find the size of the + * BAR there next time it reads from it. After the kernel reads the + * size, it will write the address back. + * + * According to the PCI local bus specification REV 3.0: The number of + * upper bits that a device actually implements depends on how much of + * the address space the device will respond to. A device that wants a 1 + * MB memory address space (using a 32-bit base address register) would + * build the top 12 bits of the address register, hardwiring the other + * bits to 0. + * + * Furthermore, software can determine how much address space the device + * requires by writing a value of all 1's to the register and then + * reading the value back. The device will return 0's in all don't-care + * address bits, effectively specifying the address space required. + * + * Software computes the size of the address space with the formula + * S = ~B + 1, where S is the memory size and B is the value read from + * the BAR. This means that the BAR value that kvmtool should return is + * B = ~(S - 1). + */ + if (value == 0xffffffff) { + value = ~(pci__bar_size(pci_hdr, bar_num) - 1); + /* Preserve the special bits. */ + value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask); + pci_hdr->bar[bar_num] = value; + return; + } + + value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask); + + /* Don't toggle emulation when region type access is disabled. */ + if (pci__bar_is_io(pci_hdr, bar_num) && + !pci__io_space_enabled(pci_hdr)) { + pci_hdr->bar[bar_num] = value; + return; + } + + if (pci__bar_is_memory(pci_hdr, bar_num) && + !pci__memory_space_enabled(pci_hdr)) { + pci_hdr->bar[bar_num] = value; + return; + } + + /* + * BAR reassignment can be done while device access is enabled and + * memory regions for different devices can overlap as long as no access + * is made to the overlapping memory regions. To implement BAR + * reassignment, we deactivate emulation for the region described by the + * BAR value that the guest is changing, we disable emulation for the + * regions that overlap with the new one (by scanning through all PCI + * devices), we enable emulation for the new BAR value and finally we + * enable emulation for all device regions that were overlapping with + * the old value. + */ + old_addr = pci__bar_address(pci_hdr, bar_num); + new_addr = __pci__bar_address(value); + bar_size = pci__bar_size(pci_hdr, bar_num); + + r = pci_deactivate_bar(kvm, pci_hdr, bar_num); + if (r < 0) + return; + + r = pci_deactivate_bar_regions(kvm, new_addr, bar_size); + if (r < 0) { + /* + * We cannot update the BAR because of an overlapping region + * that failed to deactivate emulation, so keep the old BAR + * value and re-activate emulation for it. + */ + pci_activate_bar(kvm, pci_hdr, bar_num); + return; + } + + pci_hdr->bar[bar_num] = value; + r = pci_activate_bar(kvm, pci_hdr, bar_num); + if (r < 0) { + /* + * New region cannot be emulated, re-enable the regions that + * were overlapping.
+ */ + pci_activate_bar_regions(kvm, new_addr, bar_size); + return; + } + + pci_activate_bar_regions(kvm, old_addr, bar_size); +} + void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size) { void *base; @@ -206,7 +392,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, struct pci_device_header *pci_hdr; u8 dev_num = addr.device_number; u32 value = 0; - u32 mask; if (!pci_device_exists(addr.bus_number, dev_num, 0)) return; @@ -231,46 +416,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, } bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32); - - /* - * If the kernel masks the BAR, it will expect to find the size of the - * BAR there next time it reads from it. After the kernel reads the - * size, it will write the address back. - */ if (bar < 6) { - if (pci__bar_is_io(pci_hdr, bar)) - mask = (u32)PCI_BASE_ADDRESS_IO_MASK; - else - mask = (u32)PCI_BASE_ADDRESS_MEM_MASK; - /* - * According to the PCI local bus specification REV 3.0: - * The number of upper bits that a device actually implements - * depends on how much of the address space the device will - * respond to. A device that wants a 1 MB memory address space - * (using a 32-bit base address register) would build the top - * 12 bits of the address register, hardwiring the other bits - * to 0. - * - * Furthermore, software can determine how much address space - * the device requires by writing a value of all 1's to the - * register and then reading the value back. The device will - * return 0's in all don't-care address bits, effectively - * specifying the address space required. - * - * Software computes the size of the address space with the - * formula S = ~B + 1, where S is the memory size and B is the - * value read from the BAR. This means that the BAR value that - * kvmtool should return is B = ~(S - 1). - */ memcpy(&value, data, size); - if (value == 0xffffffff) - value = ~(pci__bar_size(pci_hdr, bar) - 1); - /* Preserve the special bits. 
*/ - value = (value & mask) | (pci_hdr->bar[bar] & ~mask); - memcpy(base + offset, &value, size); - } else { - memcpy(base + offset, data, size); + pci_config_bar_wr(kvm, pci_hdr, bar, value); + return; } + + memcpy(base + offset, data, size); } void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size) @@ -336,16 +488,18 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr if (!pci_bar_is_implemented(pci_hdr, i)) continue; + assert(!pci_bar_is_active(pci_hdr, i)); + if (pci__bar_is_io(pci_hdr, i) && pci__io_space_enabled(pci_hdr)) { - r = bar_activate_fn(kvm, pci_hdr, i, data); + r = pci_activate_bar(kvm, pci_hdr, i); if (r < 0) return r; } if (pci__bar_is_memory(pci_hdr, i) && pci__memory_space_enabled(pci_hdr)) { - r = bar_activate_fn(kvm, pci_hdr, i, data); + r = pci_activate_bar(kvm, pci_hdr, i); if (r < 0) return r; } diff --git a/vfio/pci.c b/vfio/pci.c index 34f19792765e..49ecd12a38cd 100644 --- a/vfio/pci.c +++ b/vfio/pci.c @@ -467,6 +467,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm, struct vfio_pci_msix_pba *pba = &pdev->msix_pba; struct vfio_pci_msix_table *table = &pdev->msix_table; struct vfio_region *region; + u32 bar_addr; bool has_msix; int ret; @@ -475,7 +476,14 @@ static int vfio_pci_bar_activate(struct kvm *kvm, region = &vdev->regions[bar_num]; has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX; + bar_addr = pci__bar_address(pci_hdr, bar_num); + if (pci__bar_is_io(pci_hdr, bar_num)) + region->port_base = bar_addr; + else + region->guest_phys_addr = bar_addr; + if (has_msix && (u32)bar_num == table->bar) { + table->guest_phys_addr = region->guest_phys_addr; ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size, false, vfio_pci_msix_table_access, pdev); @@ -490,6 +498,10 @@ static int vfio_pci_bar_activate(struct kvm *kvm, } if (has_msix && (u32)bar_num == pba->bar) { + if (pba->bar == table->bar) + pba->guest_phys_addr = table->guest_phys_addr + table->size; + else + pba->guest_phys_addr = region->guest_phys_addr; ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false, vfio_pci_msix_pba_access, pdev); From patchwork Thu May 14 15:38:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549159 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1087059D for ; Thu, 14 May 2020 15:38:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F33872065D for ; Thu, 14 May 2020 15:38:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727937AbgENPi6 (ORCPT ); Thu, 14 May 2020 11:38:58 -0400 Received: from foss.arm.com ([217.140.110.172]:39248 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727983AbgENPi4 (ORCPT ); Thu, 14 May 2020 11:38:56 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5DC621FB; Thu, 14 May 2020 08:38:56 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 35EA73F71E; Thu, 14 May 2020 08:38:54 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, 
lorenzo.pieralisi@arm.com, maz@kernel.org, Julien Thierry Subject: [PATCH v4 kvmtool 11/12] arm/fdt: Remove 'linux,pci-probe-only' property Date: Thu, 14 May 2020 16:38:28 +0100 Message-Id: <1589470709-4104-12-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Julien Thierry PCI now supports configurable BARs. Get rid of the no longer needed, Linux-only, fdt property. Reviewed-by: Andre Przywara Signed-off-by: Julien Thierry Signed-off-by: Alexandru Elisei --- arm/fdt.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arm/fdt.c b/arm/fdt.c index c80e6da323b6..02091e9e0bee 100644 --- a/arm/fdt.c +++ b/arm/fdt.c @@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm) /* /chosen */ _FDT(fdt_begin_node(fdt, "chosen")); - _FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1)); /* Pass on our amended command line to a Linux kernel only. */ if (kvm->cfg.firmware_filename) { From patchwork Thu May 14 15:38:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandru Elisei X-Patchwork-Id: 11549161 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2DB0C618 for ; Thu, 14 May 2020 15:39:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1BFE42065D for ; Thu, 14 May 2020 15:39:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728078AbgENPjA (ORCPT ); Thu, 14 May 2020 11:39:00 -0400 Received: from foss.arm.com ([217.140.110.172]:39258 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728045AbgENPi7 (ORCPT ); Thu, 14 May 2020 11:38:59 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BA5851FB; Thu, 14 May 2020 08:38:58 -0700 (PDT) Received: from e121566-lin.arm.com (unknown [10.57.31.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id F00043F71E; Thu, 14 May 2020 08:38:56 -0700 (PDT) From: Alexandru Elisei To: kvm@vger.kernel.org Cc: will@kernel.org, julien.thierry.kdev@gmail.com, andre.przywara@arm.com, sami.mujawar@arm.com, lorenzo.pieralisi@arm.com, maz@kernel.org Subject: [PATCH v4 kvmtool 12/12] vfio: Trap MMIO access to BAR addresses which aren't page aligned Date: Thu, 14 May 2020 16:38:29 +0100 Message-Id: <1589470709-4104-13-git-send-email-alexandru.elisei@arm.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> References: <1589470709-4104-1-git-send-email-alexandru.elisei@arm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org KVM_SET_USER_MEMORY_REGION will fail if the guest physical address is not aligned to the page size. However, it is legal for a guest to program an address which isn't aligned to the page size. Trap and emulate MMIO accesses to the region when that happens. 
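As an illustration only (not part of the patch), here is a minimal, self-contained sketch of the alignment check that decides between a direct KVM memslot mapping and trap-and-emulate. The PAGE_SIZE value, the helper name bar_needs_mmio_trap() and the sample addresses are assumptions made for the example, with 0x41004600 taken from the log below:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL /* assumed page size for the example */

/*
 * A BAR can only be backed by a KVM memslot when its guest physical
 * address is page aligned; otherwise the region has to be trapped and
 * emulated in userspace.
 */
static bool bar_needs_mmio_trap(uint64_t guest_phys_addr)
{
	return (guest_phys_addr & (PAGE_SIZE - 1)) != 0;
}

int main(void)
{
	/* One aligned and one unaligned sample address. */
	uint64_t addrs[] = { 0x41000000, 0x41004600 };
	unsigned int i;

	for (i = 0; i < 2; i++)
		printf("0x%llx: %s\n", (unsigned long long)addrs[i],
		       bar_needs_mmio_trap(addrs[i]) ?
		       "trap and emulate MMIO" :
		       "map with KVM_SET_USER_MEMORY_REGION");
	return 0;
}

The same test is what the hunk below applies to region->guest_phys_addr before falling back to kvm__register_mmio().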
Without this patch, when assigning a Seagate Barracuda hard drive to a VM I was seeing these errors: [ 0.286029] pci 0000:00:00.0: BAR 0: assigned [mem 0x41004600-0x4100467f] Error: 0000:01:00.0: failed to register region with KVM Error: [1095:3132] Error activating emulation for BAR 0 [..] [ 10.561794] irq 13: nobody cared (try booting with the "irqpoll" option) [ 10.563122] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-seattle-00009-g909b20467ed1 #133 [ 10.563124] Hardware name: linux,dummy-virt (DT) [ 10.563126] Call trace: [ 10.563134] dump_backtrace+0x0/0x140 [ 10.563137] show_stack+0x14/0x20 [ 10.563141] dump_stack+0xbc/0x100 [ 10.563146] __report_bad_irq+0x48/0xd4 [ 10.563148] note_interrupt+0x288/0x378 [ 10.563151] handle_irq_event_percpu+0x80/0x88 [ 10.563153] handle_irq_event+0x44/0xc8 [ 10.563155] handle_fasteoi_irq+0xb4/0x160 [ 10.563157] generic_handle_irq+0x24/0x38 [ 10.563159] __handle_domain_irq+0x60/0xb8 [ 10.563162] gic_handle_irq+0x50/0xa0 [ 10.563164] el1_irq+0xb8/0x180 [ 10.563166] arch_cpu_idle+0x10/0x18 [ 10.563170] do_idle+0x204/0x290 [ 10.563172] cpu_startup_entry+0x20/0x40 [ 10.563175] rest_init+0xd4/0xe0 [ 10.563180] arch_call_rest_init+0xc/0x14 [ 10.563182] start_kernel+0x420/0x44c [ 10.563183] handlers: [ 10.563650] [<000000001e474803>] sil24_interrupt [ 10.564559] Disabling IRQ #13 [..] [ 11.832916] ata1: spurious interrupt (slot_stat 0x0 active_tag -84148995 sactive 0x0) [ 12.045444] ata_ratelimit: 1 callbacks suppressed With this patch, I don't see the errors and the device works as expected. Reviewed-by: Andre Przywara Signed-off-by: Alexandru Elisei --- vfio/core.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/vfio/core.c b/vfio/core.c index bad3c7c8cd26..0b45e78b40b4 100644 --- a/vfio/core.c +++ b/vfio/core.c @@ -226,6 +226,15 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev, if (!(region->info.flags & VFIO_REGION_INFO_FLAG_MMAP)) return vfio_setup_trap_region(kvm, vdev, region); + /* + * KVM_SET_USER_MEMORY_REGION will fail because the guest physical + * address isn't page aligned, let's emulate the region ourselves. + */ + if (region->guest_phys_addr & (PAGE_SIZE - 1)) + return kvm__register_mmio(kvm, region->guest_phys_addr, + region->info.size, false, + vfio_mmio_access, region); + if (region->info.flags & VFIO_REGION_INFO_FLAG_READ) prot |= PROT_READ; if (region->info.flags & VFIO_REGION_INFO_FLAG_WRITE