Message ID: 20220815062958.100366-1-gshan@redhat.com (mailing list archive)
Series: hw/arm/virt: Improve address assignment for high memory regions
Hi Marc,

On 8/15/22 4:29 PM, Gavin Shan wrote:
> There are three high memory regions: VIRT_HIGH_REDIST2,
> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
> float above the highest RAM address. However, they can be disabled
> in several cases:
>
> (1) A specific high memory region is disabled by the developer by
>     toggling vms->highmem_{redists, ecam, mmio}.
>
> (2) The VIRT_HIGH_PCIE_ECAM region is disabled on machine types
>     'virt-2.12' and earlier.
>
> (3) The VIRT_HIGH_PCIE_ECAM region is disabled when firmware is
>     loaded on a 32-bit system.
>
> (4) A specific high memory region is disabled when it would break
>     the PA space limit.
>
> The current implementation of virt_set_memmap() isn't comprehensive,
> because the space for a specific high memory region is still
> reserved from the PA space in cases (1), (2) and (3). In the code,
> 'base' and 'vms->highest_gpa' are always increased for those three
> cases. This is unnecessary, since the space assigned to a disabled
> high memory region won't be used afterwards.
>
> This series improves the address assignment for these high memory
> regions:
>
> PATCH[1] and PATCH[2] are cleanup and preparatory work.
> PATCH[3] improves address assignment for these high memory regions.
> PATCH[4] moves the address assignment logic into a standalone helper.
>
> History
> =======
> v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html
>
> Changelog
> =========
> v2:
>   * Split the patches for easier review (Gavin)
>   * Improved changelog (Marc)
>   * Use 'bool fits' in virt_set_high_memmap() (Eric)

Could you help to review when you have free cycles?
It's just a kind ping :)

Thanks,
Gavin

> Gavin Shan (4):
>   hw/arm/virt: Rename variable size to region_size in virt_set_memmap()
>   hw/arm/virt: Introduce variable region_base in virt_set_memmap()
>   hw/arm/virt: Improve address assignment for high memory regions
>   virt/hw/virt: Add virt_set_high_memmap() helper
>
>  hw/arm/virt.c | 84 ++++++++++++++++++++++++++++++---------------------
>  1 file changed, 50 insertions(+), 34 deletions(-)
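[Editor's note: a minimal sketch of the assignment policy described in the cover letter, assuming the series' semantics as summarized above. This is not the actual QEMU code; the type and names (MemMap, set_high_region, pa_bits) are illustrative stand-ins for 'base', 'vms->highest_gpa' and the 'bool fits' check in virt_set_high_memmap(). The idea: only advance the base and the highest GPA when a region is both enabled and fits inside the PA space.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for 'base' and 'vms->highest_gpa'. */
typedef struct {
    uint64_t base;        /* next free address above the RAM */
    uint64_t highest_gpa; /* highest assigned guest physical address */
} MemMap;

/*
 * Place one high memory region. Returns true when the region was
 * actually placed. Disabled regions, or regions that would break the
 * PA space limit, reserve no space: 'base' and 'highest_gpa' stay
 * untouched, which is the point of PATCH[3].
 */
static bool set_high_region(MemMap *mm, uint64_t region_size,
                            bool region_enabled, unsigned pa_bits)
{
    uint64_t limit = 1ULL << pa_bits;
    bool fits = (mm->base + region_size) <= limit;

    if (!region_enabled || !fits) {
        return false;
    }
    mm->base += region_size;
    mm->highest_gpa = mm->base - 1;
    return true;
}
```

Under the pre-series behavior, 'base' would have been advanced for cases (1), (2) and (3) as well, wasting PA space below the IPA limit.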
Hi Gavin,

On 8/24/22 05:29, Gavin Shan wrote:
> Hi Marc,
>
> On 8/15/22 4:29 PM, Gavin Shan wrote:
>> There are three high memory regions, which are VIRT_HIGH_REDIST2,
>> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. [...]

You did not really convince me about migration compat wrt the high MMIO
region. Aren't the PCI BARs saved/restored, meaning the device driver is
expecting to find data at the same GPA?
But what if your high MMIO region was relocated in the dest QEMU with a
possibly smaller VM IPA? Don't you have MMIO regions now allocated
outside of the dest MMIO region? How does the PCI host bridge route
accesses to those regions? What do I miss?

Thanks

Eric

> Could you help to review when you have free cycles? It's just a kind
> ping :)
>
> Thanks,
> Gavin
>
>> Gavin Shan (4): [...]
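[Editor's note: a toy illustration of Eric's concern above, not QEMU code; all names and numbers are made up. A 64-bit BAR programmed inside the source's high MMIO window keeps its saved GPA across migration; if the destination's window was relocated or shrunk due to a smaller VM IPA, the restored BAR can land outside it.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A [start, start + size) guest physical address window. */
typedef struct {
    uint64_t start;
    uint64_t size;
} Window;

/* True when the whole BAR range lies inside the window. */
static bool bar_in_window(uint64_t bar_base, uint64_t bar_size,
                          const Window *w)
{
    return bar_base >= w->start &&
           bar_base + bar_size <= w->start + w->size;
}
```

Whether this can actually happen depends on how the destination's memory map is constructed, which is exactly what the question above is probing.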
Hi Eric,

On 8/24/22 6:06 PM, Eric Auger wrote:
> On 8/24/22 05:29, Gavin Shan wrote:
>> On 8/15/22 4:29 PM, Gavin Shan wrote:
>>> There are three high memory regions, which are VIRT_HIGH_REDIST2,
>>> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. [...]
>
> You did not really convince me about migration compat wrt the high MMIO
> region. Aren't the PCI BARs saved/restored, meaning the device driver is
> expecting to find data at the same GPA?
> But what if your high MMIO region was relocated in the dest QEMU with a
> possibly smaller VM IPA? Don't you have MMIO regions now allocated
> outside of the dest MMIO region? How does the PCI host bridge route
> accesses to those regions? What do I miss?

I'm currently looking into virtio-pci-net migration, but I need time to
investigate how the device is migrated. I will get back to you once I
have something. Thanks for your comments :)

Thanks,
Gavin
Hi Eric,

On 8/24/22 6:06 PM, Eric Auger wrote:
> On 8/24/22 05:29, Gavin Shan wrote:
>> On 8/15/22 4:29 PM, Gavin Shan wrote:
>>> There are three high memory regions, which are VIRT_HIGH_REDIST2,
>>> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. [...]
>
> You did not really convince me about migration compat wrt the high MMIO
> region. Aren't the PCI BARs saved/restored, meaning the device driver is
> expecting to find data at the same GPA?
> But what if your high MMIO region was relocated in the dest QEMU with a
> possibly smaller VM IPA? Don't you have MMIO regions now allocated
> outside of the dest MMIO region? How does the PCI host bridge route
> accesses to those regions? What do I miss?
> [...]

Sorry for the delay; I was offline last week. I intended to explain the
migration using the virtio-net device and spent some time going through
the code, but I found it too complicated as an example, because both
the PCI and virtio device models are involved. So let's use e1000e.c as
the example here.

There are lots of registers residing in the MMIO region, including the
MSIx table. The MSIx table is backed by PCIDevice::msix_table, which is
a plain buffer. Accesses to the MSIx table read from or write to that
buffer; the corresponding handler is hw/msix.c::msix_table_mmio_ops.
msix_init() is called by e1000e.c to set up the MSIx table, which is
associated with memory BAR#3. As the registers in the MSIx table are
fully emulated by QEMU, the BAR's base address isn't a concern.

  struct PCIDevice {
      :
      uint8_t *msix_table;
      :
      MemoryRegion msix_table_mmio;
      :
  };

  /* @table_bar is registered as memory BAR#3 in e1000e_pci_realize() */
  int msix_init(struct PCIDevice *dev, unsigned short nentries,
                MemoryRegion *table_bar, uint8_t table_bar_nr,
                unsigned table_offset, MemoryRegion *pba_bar,
                uint8_t pba_bar_nr, unsigned pba_offset,
                uint8_t cap_pos, Error **errp)
  {
      :
      memory_region_init_io(&dev->msix_table_mmio, OBJECT(dev),
                            &msix_table_mmio_ops, dev,
                            "msix-table", table_size);
      memory_region_add_subregion(table_bar, table_offset,
                                  &dev->msix_table_mmio);
      :
  }

As for our concern, the BAR's base addresses for the MSIx tables may
differ on the source and destination VMs. This still isn't a problem,
because the registers in the MSIx table are migrated (saved on the
source VM and restored on the destination VM) one by one. That is to
say, the whole buffer (PCIDevice::msix_table) is not saved and restored
at once.
Furthermore, the unique ID string, instead of the corresponding BAR's
base address, is used to identify the MSIx table. For this particular
case, the ID string is something like
"e1000e_dev_id/pci-0000:05:00.0/msix state". When this ID string is
received on the destination VM, the object and PCI device are located,
and the forthcoming data is saved to PCIDevice::msix_table.

  static const VMStateDescription e1000e_vmstate = {
      .name = "e1000e",
      .version_id = 1,
      .minimum_version_id = 1,
      .pre_save = e1000e_pre_save,
      .post_load = e1000e_post_load,
      .fields = (VMStateField[]) {
          VMSTATE_PCI_DEVICE(parent_obj, E1000EState),
          VMSTATE_MSIX(parent_obj, E1000EState),
          :
      }
  };

  #define VMSTATE_MSIX_TEST(_field, _state, _test) {                    \
      .name         = (stringify(_field)),                              \
      .size         = sizeof(PCIDevice),                                \
      .vmsd         = &vmstate_msix,                                    \
      .flags        = VMS_STRUCT,                                       \
      .offset       = vmstate_offset_value(_state, _field, PCIDevice),  \
      .field_exists = (_test)                                           \
  }

  #define VMSTATE_MSIX(_f, _s)                                          \
      VMSTATE_MSIX_TEST(_f, _s, NULL)

  /* On the source VM, PCIDevice::msix_table is transferred to the
   * destination VM */
  void msix_save(PCIDevice *dev, QEMUFile *f)
  {
      :
      qemu_put_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
      qemu_put_buffer(f, dev->msix_pba, DIV_ROUND_UP(n, 8));
  }

  /* On the destination VM, the received data is written to
   * PCIDevice::msix_table */
  void msix_load(PCIDevice *dev, QEMUFile *f)
  {
      :
      qemu_get_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
      qemu_get_buffer(f, dev->msix_pba, DIV_ROUND_UP(n, 8));
      :
  }

  static int put_msix_state(QEMUFile *f, void *pv, size_t size,
                            const VMStateField *field, JSONWriter *vmdesc)
  {
      msix_save(pv, f);
      return 0;
  }

  static int get_msix_state(QEMUFile *f, void *pv, size_t size,
                            const VMStateField *field)
  {
      msix_load(pv, f);
      return 0;
  }

  static VMStateInfo vmstate_info_msix = {
      .name = "msix state",
      .get  = get_msix_state,
      .put  = put_msix_state,
  };

  const VMStateDescription vmstate_msix = {
      .name = "msix",
      .fields = (VMStateField[]) {
          {
              .name = "msix",
              .version_id = 0,
              .field_exists = NULL,
              .size = 0,   /* ouch */
              .info = &vmstate_info_msix,
              .flags = VMS_SINGLE,
              .offset = 0,
          },
      }
  };

Thanks,
Gavin
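[Editor's note: a self-contained toy model of the argument above, not the QEMU implementation. The types and names (ToyDev, toy_save, toy_load, TABLE_SIZE) are made up. It shows why the BAR's guest address doesn't matter for the MSIx table: what migration transfers is the host-side buffer contents, looked up by ID string on the destination, so the table survives even when the destination programs the BAR at a different GPA.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 64

/* Stand-in for a device with an MSIx table backed by a host buffer. */
typedef struct {
    uint64_t bar_gpa;            /* guest address of the table BAR */
    uint8_t  table[TABLE_SIZE];  /* stand-in for PCIDevice::msix_table */
} ToyDev;

/* "save": serialize the buffer contents, never any guest address */
static void toy_save(const ToyDev *dev, uint8_t *stream)
{
    memcpy(stream, dev->table, TABLE_SIZE);
}

/* "load": restore into whatever device the ID string resolved to */
static void toy_load(ToyDev *dev, const uint8_t *stream)
{
    memcpy(dev->table, stream, TABLE_SIZE);
}
```

Because the stream carries no GPA, relocating the BAR on the destination leaves the restored table contents intact.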