From patchwork Sun Oct 30 19:59:31 2016
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 9404629
Date: Sun, 30 Oct 2016 13:59:31 -0600
From: Alex Williamson
To: Yongji Xie
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, pbonzini@redhat.com,
    aik@ozlabs.ru, zhong@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH v3] vfio: Add support for mmapping sub-page MMIO BARs
Message-ID: <20161030135931.1a2adc95@t450s.home>
In-Reply-To: <1477472173-5124-1-git-send-email-xyjxie@linux.vnet.ibm.com>
References: <1477472173-5124-1-git-send-email-xyjxie@linux.vnet.ibm.com>
List-ID: kvm@vger.kernel.org

On Wed, 26 Oct 2016 16:56:13 +0800
Yongji Xie wrote:

> Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
> sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
> to mmap sub-page BARs.  This is the corresponding QEMU patch.
> With those patches applied, we can pass through sub-page BARs
> to the guest, which can help to improve IO performance for some
> devices.
>
> In this patch, we expand the MemoryRegions of these sub-page
> MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
> the BARs can be passed to the KVM ioctl KVM_SET_USER_MEMORY_REGION
> with a valid size.  The expanded size is recovered when the base
> address of the sub-page BAR is changed in the guest and is no
> longer page aligned.  We also set the priority of these BARs'
> memory regions to zero in case they overlap with BARs which share
> the same page with sub-page BARs in the guest.
>
> Signed-off-by: Yongji Xie
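An aside on the mechanics, since both the patch and the comments below
lean on it: the test (bar_addr & ~qemu_real_host_page_mask) works
because qemu_real_host_page_mask is ~(host page size - 1), so the
expression extracts the BAR's offset within its page, and a nonzero
result means the BAR is not page aligned.  A standalone sketch of the
arithmetic (not QEMU code; the BAR addresses are made up):

/* Standalone sketch (not QEMU code): the sub-page alignment test.
 * page_mask plays the role of qemu_real_host_page_mask, i.e.
 * ~(page_size - 1), so addr & ~page_mask is the offset of addr
 * within its page. */
#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t page_mask = ~(page_size - 1);
    uint64_t bars[] = { 0xfe000000, 0xfe000800 }; /* hypothetical BARs */
    int i;

    for (i = 0; i < 2; i++) {
        uint64_t off = bars[i] & ~page_mask;
        printf("BAR at 0x%08" PRIx64 ": %s\n", bars[i],
               off ? "not page aligned" : "page aligned");
    }
    return 0;
}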
> ---
>  hw/vfio/common.c |  3 +--
>  hw/vfio/pci.c    | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 78 insertions(+), 2 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9505fb3..6e48f98 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -662,8 +662,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
>                            region, name, region->size);
>
>      if (!vbasedev->no_mmap &&
> -        region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
> -        !(region->size & ~qemu_real_host_page_mask)) {
> +        region->flags & VFIO_REGION_INFO_FLAG_MMAP) {
>
>          vfio_setup_region_sparse_mmaps(region, info);
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 65d30fd..0516e62 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1071,6 +1071,65 @@ static const MemoryRegionOps vfio_vga_ops = {
>  };
>
>  /*
> + * Expand memory region of sub-page(size < PAGE_SIZE) MMIO BAR to page
> + * size if the BAR is in an exclusive page in host so that we could map
> + * this BAR to guest. But this sub-page BAR may not occupy an exclusive
> + * page in guest. So we should set the priority of the expanded memory
> + * region to zero in case of overlap with BARs which share the same page
> + * with the sub-page BAR in guest. Besides, we should also recover the
> + * size of this sub-page BAR when its base address is changed in guest
> + * and not page aligned any more.
> + */
> +static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
> +{
> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> +    VFIORegion *region = &vdev->bars[bar].region;
> +    MemoryRegion *mmap_mr, *mr;
> +    PCIIORegion *r;
> +    pcibus_t bar_addr;
> +
> +    /* Make sure that the whole region is allowed to be mmapped */
> +    if (!(region->nr_mmaps == 1 &&
> +          region->mmaps[0].size == region->size)) {

Isn't region->size equal to the BAR size, which is less than PAGE_SIZE?
Doesn't that mean that the mmap(2) was performed with a length value
less than a page size?  Isn't that going to fail?

We could also test mmaps[0].mmap here instead of burying it below.

> +        return;
> +    }
> +
> +    r = &pdev->io_regions[bar];
> +    bar_addr = r->addr;
> +    if (bar_addr == PCI_BAR_UNMAPPED) {
> +        return;
> +    }

See below, why not reset the sizes in this case?
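One more piece of background for the priority handling below: QEMU
renders overlapping sibling subregions in priority order, and
pci_update_mappings() registers ordinary BAR MemoryRegions with
priority 1, so adding the artificially page-sized region at priority 0
lets a neighboring BAR that really owns part of the same guest page win
the overlap.  A sketch, assuming QEMU's memory API; the function and
variable names are made up:

/* Sketch only, not part of the patch; assumes the QEMU source tree. */
#include "qemu/osdep.h"
#include "exec/memory.h"

static void map_two_bars_sharing_a_page(MemoryRegion *as,
                                        MemoryRegion *expanded_bar,
                                        MemoryRegion *other_bar)
{
    /* A real BAR, mapped the way pci_update_mappings() does it,
     * at priority 1 */
    memory_region_add_subregion_overlap(as, 0xfe000800, other_bar, 1);

    /* The sub-page BAR, expanded to a full page and added at
     * priority 0: for 0xfe000800..0xfe000fff the region above wins,
     * so the expansion never shadows its neighbor */
    memory_region_add_subregion_overlap(as, 0xfe000000, expanded_bar, 0);
}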
> +
> +    mr = region->mem;
> +    mmap_mr = &region->mmaps[0].mem;
> +    memory_region_transaction_begin();
> +    if (memory_region_size(mr) < qemu_real_host_page_size) {
> +        if (!(bar_addr & ~qemu_real_host_page_mask) &&
> +            memory_region_is_mapped(mr) && region->mmaps[0].mmap) {
> +            /* Expand memory region to page size and set priority */
> +            memory_region_del_subregion(r->address_space, mr);
> +            memory_region_set_size(mr, qemu_real_host_page_size);
> +            memory_region_set_size(mmap_mr, qemu_real_host_page_size);
> +            memory_region_add_subregion_overlap(r->address_space,
> +                                                bar_addr, mr, 0);
> +        }
> +    } else {
> +        /* This case would happen when guest rescan one PCI device */
> +        if (bar_addr & ~qemu_real_host_page_mask) {
> +            /* Recover the size of memory region */
> +            memory_region_set_size(mr, r->size);
> +            memory_region_set_size(mmap_mr, r->size);
> +        } else if (memory_region_is_mapped(mr)) {
> +            /* Set the priority of memory region to zero */
> +            memory_region_del_subregion(r->address_space, mr);
> +            memory_region_add_subregion_overlap(r->address_space,
> +                                                bar_addr, mr, 0);
> +        }
> +    }
> +    memory_region_transaction_commit();

The logic flow here is confusing to me, too many cases that presume
the previous state of the device.  Can't we *always* set the memory
region sizes based on the value written to the BAR?  And can't we
*always* re-prioritize regions when we have a mapped MemoryRegion for
which we've modified the size?  Something like below (untested) makes
a lot more sense to me:

Please note that QEMU 2.8 goes into freeze on Nov-1, we need to rev
quickly to get this in.  Thanks,

Alex

> +}
> +
> +/*
>   * PCI config space
>   */
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
> @@ -1153,6 +1212,24 @@ void vfio_pci_write_config(PCIDevice *pdev,
>          } else if (was_enabled && !is_enabled) {
>              vfio_msix_disable(vdev);
>          }
> +    } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
> +               range_covers_byte(addr, len, PCI_COMMAND)) {
> +        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
> +        int bar;
> +
> +        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> +            old_addr[bar] = pdev->io_regions[bar].addr;
> +        }
> +
> +        pci_default_write_config(pdev, addr, val, len);
> +
> +        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> +            if (old_addr[bar] != pdev->io_regions[bar].addr &&
> +                pdev->io_regions[bar].size > 0 &&
> +                pdev->io_regions[bar].size < qemu_real_host_page_size) {
> +                vfio_sub_page_bar_update_mapping(pdev, bar);
> +            }
> +        }
>      } else {
>          /* Write everything to QEMU to keep emulated bits correct */
>          pci_default_write_config(pdev, addr, val, len);

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 03e3c94..d7dbe0e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1087,45 +1087,35 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
     MemoryRegion *mmap_mr, *mr;
     PCIIORegion *r;
     pcibus_t bar_addr;
+    uint64_t size = region->size;
 
     /* Make sure that the whole region is allowed to be mmapped */
-    if (!(region->nr_mmaps == 1 &&
-          region->mmaps[0].size == region->size)) {
+    if (region->nr_mmaps != 1 || !region->mmaps[0].mmap ||
+        region->mmaps[0].size != region->size) {
         return;
     }
 
     r = &pdev->io_regions[bar];
     bar_addr = r->addr;
-    if (bar_addr == PCI_BAR_UNMAPPED) {
-        return;
-    }
-
     mr = region->mem;
     mmap_mr = &region->mmaps[0].mem;
+
+    /* If BAR is mapped and page aligned, update to fill PAGE_SIZE */
+    if (bar_addr != PCI_BAR_UNMAPPED &&
+        !(bar_addr & ~qemu_real_host_page_mask)) {
+        size = qemu_real_host_page_size;
+    }
+
     memory_region_transaction_begin();
-    if (memory_region_size(mr) < qemu_real_host_page_size) {
-        if (!(bar_addr & ~qemu_real_host_page_mask) &&
-            memory_region_is_mapped(mr) && region->mmaps[0].mmap) {
-            /* Expand memory region to page size and set priority */
-            memory_region_del_subregion(r->address_space, mr);
-            memory_region_set_size(mr, qemu_real_host_page_size);
-            memory_region_set_size(mmap_mr, qemu_real_host_page_size);
-            memory_region_add_subregion_overlap(r->address_space,
-                                                bar_addr, mr, 0);
-        }
-    } else {
-        /* This case would happen when guest rescan one PCI device */
-        if (bar_addr & ~qemu_real_host_page_mask) {
-            /* Recover the size of memory region */
-            memory_region_set_size(mr, r->size);
-            memory_region_set_size(mmap_mr, r->size);
-        } else if (memory_region_is_mapped(mr)) {
-            /* Set the priority of memory region to zero */
-            memory_region_del_subregion(r->address_space, mr);
-            memory_region_add_subregion_overlap(r->address_space,
-                                                bar_addr, mr, 0);
-        }
+
+    memory_region_set_size(mr, size);
+    memory_region_set_size(mmap_mr, size);
+    if (size != region->size && memory_region_is_mapped(mr)) {
+        memory_region_del_subregion(r->address_space, mr);
+        memory_region_add_subregion_overlap(r->address_space,
+                                            bar_addr, mr, 0);
     }
+
     memory_region_transaction_commit();
 }
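To spell out the intent of the rewrite: the size programmed into both
MemoryRegions becomes a pure function of the BAR address, and the
re-prioritization happens exactly when a mapped region's size was
changed.  The size rule alone, restated as a sketch (the helper name is
made up, not part of the patch):

/* Sketch, not part of the patch: the size-selection rule above as a
 * pure function.  Assumes the QEMU tree for pcibus_t,
 * PCI_BAR_UNMAPPED and the qemu_real_host_page_* globals. */
static uint64_t sub_page_bar_size(pcibus_t bar_addr, uint64_t bar_size)
{
    /* Fill the whole host page only while the BAR sits at a
     * page-aligned guest address; otherwise keep the real size. */
    if (bar_addr != PCI_BAR_UNMAPPED &&
        !(bar_addr & ~qemu_real_host_page_mask)) {
        return qemu_real_host_page_size;
    }
    return bar_size;
}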