mbox series

[v2,0/4] hw/arm/virt: Improve address assignment for high memory regions

Message ID 20220815062958.100366-1-gshan@redhat.com (mailing list archive)
Headers show
Series hw/arm/virt: Improve address assignment for high memory regions | expand

Message

Gavin Shan Aug. 15, 2022, 6:29 a.m. UTC
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.
    
(1) One specific high memory region is disabled by developer by
    toggling vms->highmem_{redists, ecam, mmio}.
    
(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
    'virt-2.12' or ealier than it.
    
(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
    on 32-bits system.
    
(4) One specific high memory region is disabled when it breaks the
    PA space limit.
    
The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions:

PATCH[1] and PATCH[2] are cleanup and preparatory works.
PATCH[3] improves address assignment for these high memory regions
PATCH[4] moves the address assignment logic into standalone helper

History
=======
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
=========
v2:
  * Split the patches for easier review                        (Gavin)
  * Improved changelog                                         (Marc)
  * Use 'bool fits' in virt_set_high_memmap()                  (Eric)


Gavin Shan (4):
  hw/arm/virt: Rename variable size to region_size in virt_set_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_memmap()
  hw/arm/virt: Improve address assignment for high memory regions
  virt/hw/virt: Add virt_set_high_memmap() helper

 hw/arm/virt.c | 84 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 34 deletions(-)

Comments

Gavin Shan Aug. 24, 2022, 3:29 a.m. UTC | #1
Hi Marc,

On 8/15/22 4:29 PM, Gavin Shan wrote:
> There are three high memory regions, which are VIRT_HIGH_REDIST2,
> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
> are floating on highest RAM address. However, they can be disabled
> in several cases.
>      
> (1) One specific high memory region is disabled by developer by
>      toggling vms->highmem_{redists, ecam, mmio}.
>      
> (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
>      'virt-2.12' or ealier than it.
>      
> (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
>      on 32-bits system.
>      
> (4) One specific high memory region is disabled when it breaks the
>      PA space limit.
>      
> The current implementation of virt_set_memmap() isn't comprehensive
> because the space for one specific high memory region is always
> reserved from the PA space for case (1), (2) and (3). In the code,
> 'base' and 'vms->highest_gpa' are always increased for those three
> cases. It's unnecessary since the assigned space of the disabled
> high memory region won't be used afterwards.
> 
> The series intends to improve the address assignment for these
> high memory regions:
> 
> PATCH[1] and PATCH[2] are cleanup and preparatory works.
> PATCH[3] improves address assignment for these high memory regions
> PATCH[4] moves the address assignment logic into standalone helper
> 
> History
> =======
> v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html
> 
> Changelog
> =========
> v2:
>    * Split the patches for easier review                        (Gavin)
>    * Improved changelog                                         (Marc)
>    * Use 'bool fits' in virt_set_high_memmap()                  (Eric)
> 

Could you help to review when you have free cycles? It's just a kindly
ping :)

Thanks,
Gavin

> 
> Gavin Shan (4):
>    hw/arm/virt: Rename variable size to region_size in virt_set_memmap()
>    hw/arm/virt: Introduce variable region_base in virt_set_memmap()
>    hw/arm/virt: Improve address assignment for high memory regions
>    virt/hw/virt: Add virt_set_high_memmap() helper
> 
>   hw/arm/virt.c | 84 ++++++++++++++++++++++++++++++---------------------
>   1 file changed, 50 insertions(+), 34 deletions(-)
>
Eric Auger Aug. 24, 2022, 8:06 a.m. UTC | #2
Hi Gavin,

On 8/24/22 05:29, Gavin Shan wrote:
> Hi Marc,
>
> On 8/15/22 4:29 PM, Gavin Shan wrote:
>> There are three high memory regions, which are VIRT_HIGH_REDIST2,
>> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
>> are floating on highest RAM address. However, they can be disabled
>> in several cases.
>>      (1) One specific high memory region is disabled by developer by
>>      toggling vms->highmem_{redists, ecam, mmio}.
>>      (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
>>      'virt-2.12' or ealier than it.
>>      (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
>>      on 32-bits system.
>>      (4) One specific high memory region is disabled when it breaks the
>>      PA space limit.
>>      The current implementation of virt_set_memmap() isn't comprehensive
>> because the space for one specific high memory region is always
>> reserved from the PA space for case (1), (2) and (3). In the code,
>> 'base' and 'vms->highest_gpa' are always increased for those three
>> cases. It's unnecessary since the assigned space of the disabled
>> high memory region won't be used afterwards.
>>
>> The series intends to improve the address assignment for these
>> high memory regions:
>>
>> PATCH[1] and PATCH[2] are cleanup and preparatory works.
>> PATCH[3] improves address assignment for these high memory regions
>> PATCH[4] moves the address assignment logic into standalone helper
>>
>> History
>> =======
>> v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html
>>
>> Changelog
>> =========
>> v2:
>>    * Split the patches for easier review                        (Gavin)
>>    * Improved changelog                                         (Marc)
>>    * Use 'bool fits' in virt_set_high_memmap()                  (Eric)
>>
You did not really convince me about migration compat wrt the high MMIO
region. Aren't the PCI BARs saved/restored meaning the device driver is
expecting to find data at the same GPA. But what if your high MMIO
region was relocated in the dest QEMU with a possibly smaller VM IPA?
Don't you have MMIO regions now allocated outside of the dest MMIO
region? How does the PCI host bridge route accesses to those regions?
What do I miss?

Thanks

Eric
>
> Could you help to review when you have free cycles? It's just a kindly
> ping :)
>
> Thanks,
> Gavin
>
>>
>> Gavin Shan (4):
>>    hw/arm/virt: Rename variable size to region_size in virt_set_memmap()
>>    hw/arm/virt: Introduce variable region_base in virt_set_memmap()
>>    hw/arm/virt: Improve address assignment for high memory regions
>>    virt/hw/virt: Add virt_set_high_memmap() helper
>>
>>   hw/arm/virt.c | 84 ++++++++++++++++++++++++++++++---------------------
>>   1 file changed, 50 insertions(+), 34 deletions(-)
>>
>
Gavin Shan Aug. 26, 2022, 6:02 a.m. UTC | #3
Hi Eric,

On 8/24/22 6:06 PM, Eric Auger wrote:
> On 8/24/22 05:29, Gavin Shan wrote:
>> On 8/15/22 4:29 PM, Gavin Shan wrote:
>>> There are three high memory regions, which are VIRT_HIGH_REDIST2,
>>> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
>>> are floating on highest RAM address. However, they can be disabled
>>> in several cases.
>>>       (1) One specific high memory region is disabled by developer by
>>>       toggling vms->highmem_{redists, ecam, mmio}.
>>>       (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
>>>       'virt-2.12' or ealier than it.
>>>       (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
>>>       on 32-bits system.
>>>       (4) One specific high memory region is disabled when it breaks the
>>>       PA space limit.
>>>       The current implementation of virt_set_memmap() isn't comprehensive
>>> because the space for one specific high memory region is always
>>> reserved from the PA space for case (1), (2) and (3). In the code,
>>> 'base' and 'vms->highest_gpa' are always increased for those three
>>> cases. It's unnecessary since the assigned space of the disabled
>>> high memory region won't be used afterwards.
>>>
>>> The series intends to improve the address assignment for these
>>> high memory regions:
>>>
>>> PATCH[1] and PATCH[2] are cleanup and preparatory works.
>>> PATCH[3] improves address assignment for these high memory regions
>>> PATCH[4] moves the address assignment logic into standalone helper
>>>
>>> History
>>> =======
>>> v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html
>>>
>>> Changelog
>>> =========
>>> v2:
>>>     * Split the patches for easier review                        (Gavin)
>>>     * Improved changelog                                         (Marc)
>>>     * Use 'bool fits' in virt_set_high_memmap()                  (Eric)
>>>
> You did not really convince me about migration compat wrt the high MMIO
> region. Aren't the PCI BARs saved/restored meaning the device driver is
> expecting to find data at the same GPA. But what if your high MMIO
> region was relocated in the dest QEMU with a possibly smaller VM IPA?
> Don't you have MMIO regions now allocated outside of the dest MMIO
> region? How does the PCI host bridge route accesses to those regions?
> What do I miss?
> 

I'm currently looking into virtio-pci-net migration, but need time to
investigate how the device is migrated. I will get back to you once
I have something. Thanks for your comments :)

Thanks,
Gavin
Gavin Shan Sept. 6, 2022, 2:39 a.m. UTC | #4
Hi Eric,

On 8/24/22 6:06 PM, Eric Auger wrote:
> On 8/24/22 05:29, Gavin Shan wrote:
>> On 8/15/22 4:29 PM, Gavin Shan wrote:
>>> There are three high memory regions, which are VIRT_HIGH_REDIST2,
>>> VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
>>> are floating on highest RAM address. However, they can be disabled
>>> in several cases.
>>>       (1) One specific high memory region is disabled by developer by
>>>       toggling vms->highmem_{redists, ecam, mmio}.
>>>       (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
>>>       'virt-2.12' or ealier than it.
>>>       (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
>>>       on 32-bits system.
>>>       (4) One specific high memory region is disabled when it breaks the
>>>       PA space limit.
>>>       The current implementation of virt_set_memmap() isn't comprehensive
>>> because the space for one specific high memory region is always
>>> reserved from the PA space for case (1), (2) and (3). In the code,
>>> 'base' and 'vms->highest_gpa' are always increased for those three
>>> cases. It's unnecessary since the assigned space of the disabled
>>> high memory region won't be used afterwards.
>>>
>>> The series intends to improve the address assignment for these
>>> high memory regions:
>>>
>>> PATCH[1] and PATCH[2] are cleanup and preparatory works.
>>> PATCH[3] improves address assignment for these high memory regions
>>> PATCH[4] moves the address assignment logic into standalone helper
>>>
>>> History
>>> =======
>>> v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html
>>>
>>> Changelog
>>> =========
>>> v2:
>>>     * Split the patches for easier review                        (Gavin)
>>>     * Improved changelog                                         (Marc)
>>>     * Use 'bool fits' in virt_set_high_memmap()                  (Eric)
>>>
> You did not really convince me about migration compat wrt the high MMIO
> region. Aren't the PCI BARs saved/restored meaning the device driver is
> expecting to find data at the same GPA. But what if your high MMIO
> region was relocated in the dest QEMU with a possibly smaller VM IPA?
> Don't you have MMIO regions now allocated outside of the dest MMIO
> region? How does the PCI host bridge route accesses to those regions?
> What do I miss?
> 

[...]

Sorry for the delay because I was offline last week. I was intending
to explain the migration on virtio-net device and spent some time to
go through the code. I found it's still complicated for an example
because PCI and virtio device models are involved. So lets still use
e1000e.c as an example here.

There are lots of registers residing in MMIO region, including MSIx
table. The MSIx table is backed by PCIDevice::msix_table, which is
a buffer. The access to MSIx table is read from or write to the buffer.
The corresponding handler is hw/msix.c::msix_table_mmio_ops. msix_init()
is called by e1000e.c to setup the MSIx table, which is associated with
memory BAR#3. As the registers in MSIx table is totally emulated by
QEMU, the BAR's base address isn't a concern.

   struct PCIDevice {
      :
      uint8_t *msix_table;
      :
      MemoryRegion msix_table_mmio;
      :
   };

   /* @table_bar is registered as memory BAR#3 in e1000e_pci_realize() */
   int msix_init(struct PCIDevice *dev, unsigned short nentries,
                 MemoryRegion *table_bar, uint8_t table_bar_nr,
                 unsigned table_offset, MemoryRegion *pba_bar,
                 uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
                 Error **errp)
   {
        :
     memory_region_init_io(&dev->msix_table_mmio, OBJECT(dev), &msix_table_mmio_ops, dev,
                           "msix-table", table_size);
     memory_region_add_subregion(table_bar, table_offset, &dev->msix_table_mmio);
        :
   }

As we concerned, the BAR's base addresses for MSIx tables are different on source
and destination VMs. It's still not a problem because the registers in MSIx table
are migrated, saved on source VM and restored on destination VM one by one. It's
to say, not the whole buffer (PCIDevice::msix_table) is saved and restored at once.
Further more, the unique ID string, instead the corresponding BAR's base address,
is used to identify the MSIx table. For this particular case, the ID string is
something like "e1000e_dev_id/pci-0000:05:00.0/msix state". With this ID string
is received on the destination VM, the object and PCI device is located and the
forth-coming data is saved to PCIDevice::msix_table.

   static const VMStateDescription e1000e_vmstate = {
     .name = "e1000e",
     .version_id = 1,
     .minimum_version_id = 1,
     .pre_save = e1000e_pre_save,
     .post_load = e1000e_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_PCI_DEVICE(parent_obj, E1000EState),
         VMSTATE_MSIX(parent_obj, E1000EState),
         :
     }
   };

   #define VMSTATE_MSIX_TEST(_field, _state, _test) {                 \
     .name         = (stringify(_field)),                             \
     .size         = sizeof(PCIDevice),                               \
     .vmsd         = &vmstate_msix,                                   \
     .flags        = VMS_STRUCT,                                      \
     .offset       = vmstate_offset_value(_state, _field, PCIDevice), \
     .field_exists = (_test)                                          \
   }

   #define VMSTATE_MSIX(_f, _s)                                       \
       VMSTATE_MSIX_TEST(_f, _s, NULL)


   /* On source VM, PCIDevice::msix_table is transferred to destination VM */
   void msix_save(PCIDevice *dev, QEMUFile *f)
   {
     :
     qemu_put_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
     qemu_put_buffer(f, dev->msix_pba, DIV_ROUND_UP(n, 8));
   }

   /* On destination VM, the received data is write to PCIDevice::msix_table */
   void msix_load(PCIDevice *dev, QEMUFile *f)
   {
     :
     qemu_get_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
     qemu_get_buffer(f, dev->msix_pba, DIV_ROUND_UP(n, 8));
     :
   }

   static int put_msix_state(QEMUFile *f, void *pv, size_t size,
                             const VMStateField *field, JSONWriter *vmdesc)
   {
     msix_save(pv, f);

     return 0;
   }

   static int get_msix_state(QEMUFile *f, void *pv, size_t size,
                             const VMStateField *field)
   {
     msix_load(pv, f);
     return 0;
   }

   static VMStateInfo vmstate_info_msix = {
     .name = "msix state",
     .get  = get_msix_state,
     .put  = put_msix_state,
   };

   const VMStateDescription vmstate_msix = {
     .name = "msix",
     .fields = (VMStateField[]) {
         {
             .name         = "msix",
             .version_id   = 0,
             .field_exists = NULL,
             .size         = 0,   /* ouch */
             .info         = &vmstate_info_msix,
             .flags        = VMS_SINGLE,
             .offset       = 0,
         },
     }
  };

Thanks,
Gavin