Message ID | 20250128151544.26fc827d.olaf@aepfle.de (mailing list archive) |
---|---|
State | New |
Series | slow start of Pod HVM domU with qemu 9.1 |
On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
> Hello,
>
> starting with qemu 9.1 a PoD HVM domU takes a long time to start.
> Depending on the domU kernel, it may trigger a warning, which prompted me
> to notice this change in behavior:
>
> [ 0.000000] Linux version 4.12.14-120-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Nov 7 16:39:09 UTC 2019 (fd9dc36)
> ...
> [ 1.096432] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
> [ 1.101636] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> [ 1.104051] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
> [ 16.136086] random: crng init done
> [ 31.712052] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 30s!
> [ 31.716029] Showing busy workqueues and worker pools:
> [ 31.721164] workqueue events: flags=0x0
> [ 31.724054] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> [ 31.728000] in-flight: 17:balloon_process
> [ 31.728000] pending: hpet_work
> [ 31.728023] workqueue mm_percpu_wq: flags=0x8
> [ 31.732987] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [ 31.736000] pending: vmstat_update
> [ 31.736024] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=30s workers=2 idle: 34
> [ 50.400102] clocksource: Switched to clocksource xen
> [ 50.441153] VFS: Disk quotas dquot_6.6.0
> ...
>
> With qemu 9.0 and older, this domU will start the /init task after 8 seconds.
>
> The change which caused this commit is qemu.git commit 9ecdd4bf08dfe4a37e16b8a8b228575aff641468
> Author: Edgar E. Iglesias <edgar.iglesias@amd.com>
> AuthorDate: Tue Apr 30 10:26:45 2024 +0200
> Commit: Edgar E. Iglesias <edgar.iglesias@amd.com>
> CommitDate: Sun Jun 9 20:16:14 2024 +0200
>
> xen: mapcache: Add support for grant mappings
>
> As you can see, v4 instead of v5 was apparently applied.
> This was probably unintentional, but would probably not change the result.

Hi Olaf,

It looks like v8 was applied, or am I missing something?

>
> With this change the domU starts fast again:
>
> --- a/hw/xen/xen-mapcache.c
> +++ b/hw/xen/xen-mapcache.c
> @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
>      ram_addr_t addr;
>
>      addr = xen_ram_addr_from_mapcache_single(mapcache, ptr);
> +    if (1)
>      if (addr == RAM_ADDR_INVALID) {
>          addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr);
>      }
> @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer)
>  static void xen_invalidate_map_cache_entry_all(uint8_t *buffer)
>  {
>      xen_invalidate_map_cache_entry_single(mapcache, buffer);
> +    if (1)
>      xen_invalidate_map_cache_entry_single(mapcache_grants, buffer);
>  }
>
> @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void)
>      bdrv_drain_all();
>
>      xen_invalidate_map_cache_single(mapcache);
> +    if (0)
>      xen_invalidate_map_cache_single(mapcache_grants);
>  }
>
> I did the testing with libvirt, the domU.cfg equivalent looks like this:
> maxmem = 4096
> memory = 2048
> maxvcpus = 4
> vcpus = 2
> pae = 1
> acpi = 1
> apic = 1
> viridian = 0
> rtc_timeoffset = 0
> localtime = 0
> on_poweroff = "destroy"
> on_reboot = "destroy"
> on_crash = "destroy"
> device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386"
> sdl = 0
> vnc = 1
> vncunused = 1
> vnclisten = "127.0.0.1"
> vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ]
> parallel = "none"
> serial = "pty"
> builder = "hvm"
> kernel = "/bug1236329/linux"
> ramdisk = "/bug1236329/initrd"
> cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel""
> boot = "c"
> disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ]
> usb = 1
> usbdevice = "tablet"
>
> Any idea what can be done to restore boot times?

A guess is that it's taking a long time to walk the grants mapcache
when invalidating (in QEMU). Despite it being unused and empty. We
could try to find a way to keep track of usage and do nothing when
invalidating an empty/unused cache.

Best regards,
Edgar
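The bookkeeping Edgar hints at here could be as small as a per-cache entry counter that lets the invalidation path bail out before touching any bucket. Below is a minimal, compilable sketch of that idea; the MapCache layout and the num_entries field are assumptions for illustration only, not QEMU's actual structures.

#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for QEMU's MapCache: the num_entries counter is an assumed
 * field used to illustrate "do nothing when the cache was never used". */
typedef struct MapCacheEntry {
    struct MapCacheEntry *next;
    void *vaddr;
} MapCacheEntry;

typedef struct MapCache {
    MapCacheEntry **buckets;
    size_t nr_buckets;
    size_t num_entries;   /* bumped on map, dropped on unmap */
} MapCache;

static void invalidate_map_cache_single(MapCache *mc)
{
    if (mc->num_entries == 0) {
        return;           /* never used: skip the whole bucket walk */
    }
    for (size_t i = 0; i < mc->nr_buckets; i++) {
        /* unmap and free the unlocked entries in bucket i ... */
    }
}

int main(void)
{
    MapCache grants = { .buckets = NULL, .nr_buckets = 65536, .num_entries = 0 };
    invalidate_map_cache_single(&grants);   /* returns immediately */
    printf("empty cache: skipped walking %zu buckets\n", grants.nr_buckets);
    return 0;
}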
On Tue, 28 Jan 2025, Edgar E. Iglesias wrote: > On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote: > > Hello, > > > > starting with qemu 9.1 a PoD HVM domU takes a long time to start. > > Depending on the domU kernel, it may trigger a warning, which prompted me > > to notice this change in behavior: > > > > [ 0.000000] Linux version 4.12.14-120-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Nov 7 16:39:09 UTC 2019 (fd9dc36) > > ... > > [ 1.096432] HPET: 3 timers in total, 0 timers will be used for per-cpu timer > > [ 1.101636] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 > > [ 1.104051] hpet0: 3 comparators, 64-bit 62.500000 MHz counter > > [ 16.136086] random: crng init done > > [ 31.712052] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 30s! > > [ 31.716029] Showing busy workqueues and worker pools: > > [ 31.721164] workqueue events: flags=0x0 > > [ 31.724054] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 > > [ 31.728000] in-flight: 17:balloon_process > > [ 31.728000] pending: hpet_work > > [ 31.728023] workqueue mm_percpu_wq: flags=0x8 > > [ 31.732987] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 > > [ 31.736000] pending: vmstat_update > > [ 31.736024] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=30s workers=2 idle: 34 > > [ 50.400102] clocksource: Switched to clocksource xen > > [ 50.441153] VFS: Disk quotas dquot_6.6.0 > > ... > > > > With qemu 9.0 and older, this domU will start the /init task after 8 seconds. > > > > The change which caused this commit is qemu.git commit 9ecdd4bf08dfe4a37e16b8a8b228575aff641468 > > Author: Edgar E. Iglesias <edgar.iglesias@amd.com> > > AuthorDate: Tue Apr 30 10:26:45 2024 +0200 > > Commit: Edgar E. Iglesias <edgar.iglesias@amd.com> > > CommitDate: Sun Jun 9 20:16:14 2024 +0200 > > > > xen: mapcache: Add support for grant mappings > > > > As you can see, v4 instead of v5 was apparently applied. > > This was probably unintentional, but would probably not change the result. > > Hi Olaf, > > It looks like v8 was applied, or am I missing something? 
> >
> >
> > With this change the domU starts fast again:
> >
> > --- a/hw/xen/xen-mapcache.c
> > +++ b/hw/xen/xen-mapcache.c
> > @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
> >      ram_addr_t addr;
> >
> >      addr = xen_ram_addr_from_mapcache_single(mapcache, ptr);
> > +    if (1)
> >      if (addr == RAM_ADDR_INVALID) {
> >          addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr);
> >      }
> > @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer)
> >  static void xen_invalidate_map_cache_entry_all(uint8_t *buffer)
> >  {
> >      xen_invalidate_map_cache_entry_single(mapcache, buffer);
> > +    if (1)
> >      xen_invalidate_map_cache_entry_single(mapcache_grants, buffer);
> >  }
> >
> > @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void)
> >      bdrv_drain_all();
> >
> >      xen_invalidate_map_cache_single(mapcache);
> > +    if (0)
> >      xen_invalidate_map_cache_single(mapcache_grants);
> >  }
> >
> > I did the testing with libvirt, the domU.cfg equivalent looks like this:
> > maxmem = 4096
> > memory = 2048
> > maxvcpus = 4
> > vcpus = 2
> > pae = 1
> > acpi = 1
> > apic = 1
> > viridian = 0
> > rtc_timeoffset = 0
> > localtime = 0
> > on_poweroff = "destroy"
> > on_reboot = "destroy"
> > on_crash = "destroy"
> > device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386"
> > sdl = 0
> > vnc = 1
> > vncunused = 1
> > vnclisten = "127.0.0.1"
> > vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ]
> > parallel = "none"
> > serial = "pty"
> > builder = "hvm"
> > kernel = "/bug1236329/linux"
> > ramdisk = "/bug1236329/initrd"
> > cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel""
> > boot = "c"
> > disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ]
> > usb = 1
> > usbdevice = "tablet"
> >
> > Any idea what can be done to restore boot times?
>
>
> A guess is that it's taking a long time to walk the grants mapcache
> when invalidating (in QEMU). Despite it being unused and empty. We
> could try to find a way to keep track of usage and do nothing when
> invalidating an empty/unused cache.

If mapcache_grants is unused and empty, the call to
xen_invalidate_map_cache_single(mapcache_grants) should be really fast?

I think probably it might be the opposite: mapcache_grants is utilized,
so going through all the mappings in xen_invalidate_map_cache_single
takes time.

However, I wonder if it is really needed. At least in the PoD case, the
reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
memory has changed. But that doesn't affect the grant mappings, because
those are mappings of other domains' memory.

So I am thinking whether we should remove the call to
xen_invalidate_map_cache_single(mapcache_grants) ?

Adding x86 maintainers: do we need to flush grant table mappings for the
PV backends running in QEMU when Xen issues a IOREQ_TYPE_INVALIDATE
request to QEMU?
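If the eventual answer is that grant mappings need not be torn down on IOREQ_TYPE_INVALIDATE, the change Stefano is asking about would presumably be as small as dropping the second call. A sketch against hw/xen/xen-mapcache.c follows, with hunk line numbers omitted since this is pending the maintainers' answer rather than a proposed patch:

--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ ... @@ void xen_invalidate_map_cache(void)
     bdrv_drain_all();
 
     xen_invalidate_map_cache_single(mapcache);
-    xen_invalidate_map_cache_single(mapcache_grants);
 }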
On 29.01.2025 00:58, Stefano Stabellini wrote: > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote: >> On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote: >>> With this change the domU starts fast again: >>> >>> --- a/hw/xen/xen-mapcache.c >>> +++ b/hw/xen/xen-mapcache.c >>> @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr) >>> ram_addr_t addr; >>> >>> addr = xen_ram_addr_from_mapcache_single(mapcache, ptr); >>> + if (1) >>> if (addr == RAM_ADDR_INVALID) { >>> addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr); >>> } >>> @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer) >>> static void xen_invalidate_map_cache_entry_all(uint8_t *buffer) >>> { >>> xen_invalidate_map_cache_entry_single(mapcache, buffer); >>> + if (1) >>> xen_invalidate_map_cache_entry_single(mapcache_grants, buffer); >>> } >>> >>> @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void) >>> bdrv_drain_all(); >>> >>> xen_invalidate_map_cache_single(mapcache); >>> + if (0) >>> xen_invalidate_map_cache_single(mapcache_grants); >>> } >>> >>> I did the testing with libvirt, the domU.cfg equivalent looks like this: >>> maxmem = 4096 >>> memory = 2048 >>> maxvcpus = 4 >>> vcpus = 2 >>> pae = 1 >>> acpi = 1 >>> apic = 1 >>> viridian = 0 >>> rtc_timeoffset = 0 >>> localtime = 0 >>> on_poweroff = "destroy" >>> on_reboot = "destroy" >>> on_crash = "destroy" >>> device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386" >>> sdl = 0 >>> vnc = 1 >>> vncunused = 1 >>> vnclisten = "127.0.0.1" >>> vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ] >>> parallel = "none" >>> serial = "pty" >>> builder = "hvm" >>> kernel = "/bug1236329/linux" >>> ramdisk = "/bug1236329/initrd" >>> cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel"" >>> boot = "c" >>> disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ] >>> usb = 1 >>> usbdevice = "tablet" >>> >>> Any idea what can be done to restore boot times? >> >> >> A guess is that it's taking a long time to walk the grants mapcache >> when invalidating (in QEMU). Despite it being unused and empty. We >> could try to find a way to keep track of usage and do nothing when >> invalidating an empty/unused cache. > > If mapcache_grants is unused and empty, the call to > xen_invalidate_map_cache_single(mapcache_grants) should be really fast? > > I think probably it might be the opposite: mapcache_grants is utilized, > so going through all the mappings in xen_invalidate_map_cache_single > takes time. > > However, I wonder if it is really needed. At least in the PoD case, the > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU > memory has changed. But that doesn't affect the grant mappings, because > those are mappings of other domains' memory. > > So I am thinking whether we should remove the call to > xen_invalidate_map_cache_single(mapcache_grants) ? > > Adding x86 maintainers: do we need to flush grant table mappings for the > PV backends running in QEMU when Xen issues a IOREQ_TYPE_INVALIDATE > request to QEMU? Judging from two of the three uses of ioreq_request_mapcache_invalidate() in x86'es p2m.c, I'd say no. The 3rd use there is unconditional, but maybe wrongly so. However, the answer also depends on what qemu does when encountering a granted page. Would it enter it into its mapcache? Can it even access it? (If it can't, how does emulated I/O work to such pages? 
If it can, isn't this a violation of the grant's permissions, as it's
targeted at solely the actual HVM domain named in the grant?)

Jan
On Tue, Jan 28, 2025 at 03:58:14PM -0800, Stefano Stabellini wrote: > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote: > > On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote: > > > Hello, > > > > > > starting with qemu 9.1 a PoD HVM domU takes a long time to start. > > > Depending on the domU kernel, it may trigger a warning, which prompted me > > > to notice this change in behavior: > > > > > > [ 0.000000] Linux version 4.12.14-120-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Nov 7 16:39:09 UTC 2019 (fd9dc36) > > > ... > > > [ 1.096432] HPET: 3 timers in total, 0 timers will be used for per-cpu timer > > > [ 1.101636] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 > > > [ 1.104051] hpet0: 3 comparators, 64-bit 62.500000 MHz counter > > > [ 16.136086] random: crng init done > > > [ 31.712052] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 30s! > > > [ 31.716029] Showing busy workqueues and worker pools: > > > [ 31.721164] workqueue events: flags=0x0 > > > [ 31.724054] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 > > > [ 31.728000] in-flight: 17:balloon_process > > > [ 31.728000] pending: hpet_work > > > [ 31.728023] workqueue mm_percpu_wq: flags=0x8 > > > [ 31.732987] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 > > > [ 31.736000] pending: vmstat_update > > > [ 31.736024] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=30s workers=2 idle: 34 > > > [ 50.400102] clocksource: Switched to clocksource xen > > > [ 50.441153] VFS: Disk quotas dquot_6.6.0 > > > ... > > > > > > With qemu 9.0 and older, this domU will start the /init task after 8 seconds. > > > > > > The change which caused this commit is qemu.git commit 9ecdd4bf08dfe4a37e16b8a8b228575aff641468 > > > Author: Edgar E. Iglesias <edgar.iglesias@amd.com> > > > AuthorDate: Tue Apr 30 10:26:45 2024 +0200 > > > Commit: Edgar E. Iglesias <edgar.iglesias@amd.com> > > > CommitDate: Sun Jun 9 20:16:14 2024 +0200 > > > > > > xen: mapcache: Add support for grant mappings > > > > > > As you can see, v4 instead of v5 was apparently applied. > > > This was probably unintentional, but would probably not change the result. > > > > Hi Olaf, > > > > It looks like v8 was applied, or am I missing something? 
> > > > > > > > > > With this change the domU starts fast again: > > > > > > --- a/hw/xen/xen-mapcache.c > > > +++ b/hw/xen/xen-mapcache.c > > > @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr) > > > ram_addr_t addr; > > > > > > addr = xen_ram_addr_from_mapcache_single(mapcache, ptr); > > > + if (1) > > > if (addr == RAM_ADDR_INVALID) { > > > addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr); > > > } > > > @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer) > > > static void xen_invalidate_map_cache_entry_all(uint8_t *buffer) > > > { > > > xen_invalidate_map_cache_entry_single(mapcache, buffer); > > > + if (1) > > > xen_invalidate_map_cache_entry_single(mapcache_grants, buffer); > > > } > > > > > > @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void) > > > bdrv_drain_all(); > > > > > > xen_invalidate_map_cache_single(mapcache); > > > + if (0) > > > xen_invalidate_map_cache_single(mapcache_grants); > > > } > > > > > > I did the testing with libvirt, the domU.cfg equivalent looks like this: > > > maxmem = 4096 > > > memory = 2048 > > > maxvcpus = 4 > > > vcpus = 2 > > > pae = 1 > > > acpi = 1 > > > apic = 1 > > > viridian = 0 > > > rtc_timeoffset = 0 > > > localtime = 0 > > > on_poweroff = "destroy" > > > on_reboot = "destroy" > > > on_crash = "destroy" > > > device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386" > > > sdl = 0 > > > vnc = 1 > > > vncunused = 1 > > > vnclisten = "127.0.0.1" > > > vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ] > > > parallel = "none" > > > serial = "pty" > > > builder = "hvm" > > > kernel = "/bug1236329/linux" > > > ramdisk = "/bug1236329/initrd" > > > cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel"" > > > boot = "c" > > > disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ] > > > usb = 1 > > > usbdevice = "tablet" > > > > > > Any idea what can be done to restore boot times? > > > > > > A guess is that it's taking a long time to walk the grants mapcache > > when invalidating (in QEMU). Despite it being unused and empty. We > > could try to find a way to keep track of usage and do nothing when > > invalidating an empty/unused cache. > > If mapcache_grants is unused and empty, the call to > xen_invalidate_map_cache_single(mapcache_grants) should be really fast? Yes, I agree but looking at the invalidation code it looks like if we're unconditionally walking all the buckets in the hash-table... > > I think probably it might be the opposite: mapcache_grants is utilized, > so going through all the mappings in xen_invalidate_map_cache_single > takes time. The reason I don't think it's being used is because we've only enabled grants for PVH machines and Olaf runs HVM machines, so QEMU would never end up mapping grants for DMA. > > However, I wonder if it is really needed. At least in the PoD case, the > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU > memory has changed. But that doesn't affect the grant mappings, because > those are mappings of other domains' memory. > > So I am thinking whether we should remove the call to > xen_invalidate_map_cache_single(mapcache_grants) ? Good point! > > Adding x86 maintainers: do we need to flush grant table mappings for the > PV backends running in QEMU when Xen issues a IOREQ_TYPE_INVALIDATE > request to QEMU? Cheers, Edgar
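The walk Edgar refers to has roughly the shape below. This is a simplified illustration of a bucketed map cache, not QEMU's actual code, but it shows why the invalidation pass costs a visit to every bucket even when nothing was ever mapped; a fast path such as the ones discussed further down in the thread would short-circuit exactly this loop.

#include <stddef.h>

struct entry { struct entry *next; int locked; void *vaddr; };
struct bucket_cache { struct entry *bucket[65536]; };

static void invalidate_all(struct bucket_cache *c)
{
    /* Every bucket is visited unconditionally, so the cost is
     * O(nr_buckets) whether the cache holds thousands of mappings
     * or none at all. */
    for (size_t i = 0; i < sizeof(c->bucket) / sizeof(c->bucket[0]); i++) {
        for (struct entry *e = c->bucket[i]; e != NULL; e = e->next) {
            if (!e->locked) {
                /* unmap e->vaddr and free e ... */
            }
        }
    }
}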
On Wed, Jan 29, 2025 at 09:52:19AM +0100, Jan Beulich wrote: > On 29.01.2025 00:58, Stefano Stabellini wrote: > > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote: > >> On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote: > >>> With this change the domU starts fast again: > >>> > >>> --- a/hw/xen/xen-mapcache.c > >>> +++ b/hw/xen/xen-mapcache.c > >>> @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr) > >>> ram_addr_t addr; > >>> > >>> addr = xen_ram_addr_from_mapcache_single(mapcache, ptr); > >>> + if (1) > >>> if (addr == RAM_ADDR_INVALID) { > >>> addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr); > >>> } > >>> @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer) > >>> static void xen_invalidate_map_cache_entry_all(uint8_t *buffer) > >>> { > >>> xen_invalidate_map_cache_entry_single(mapcache, buffer); > >>> + if (1) > >>> xen_invalidate_map_cache_entry_single(mapcache_grants, buffer); > >>> } > >>> > >>> @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void) > >>> bdrv_drain_all(); > >>> > >>> xen_invalidate_map_cache_single(mapcache); > >>> + if (0) > >>> xen_invalidate_map_cache_single(mapcache_grants); > >>> } > >>> > >>> I did the testing with libvirt, the domU.cfg equivalent looks like this: > >>> maxmem = 4096 > >>> memory = 2048 > >>> maxvcpus = 4 > >>> vcpus = 2 > >>> pae = 1 > >>> acpi = 1 > >>> apic = 1 > >>> viridian = 0 > >>> rtc_timeoffset = 0 > >>> localtime = 0 > >>> on_poweroff = "destroy" > >>> on_reboot = "destroy" > >>> on_crash = "destroy" > >>> device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386" > >>> sdl = 0 > >>> vnc = 1 > >>> vncunused = 1 > >>> vnclisten = "127.0.0.1" > >>> vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ] > >>> parallel = "none" > >>> serial = "pty" > >>> builder = "hvm" > >>> kernel = "/bug1236329/linux" > >>> ramdisk = "/bug1236329/initrd" > >>> cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel"" > >>> boot = "c" > >>> disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ] > >>> usb = 1 > >>> usbdevice = "tablet" > >>> > >>> Any idea what can be done to restore boot times? > >> > >> > >> A guess is that it's taking a long time to walk the grants mapcache > >> when invalidating (in QEMU). Despite it being unused and empty. We > >> could try to find a way to keep track of usage and do nothing when > >> invalidating an empty/unused cache. > > > > If mapcache_grants is unused and empty, the call to > > xen_invalidate_map_cache_single(mapcache_grants) should be really fast? > > > > I think probably it might be the opposite: mapcache_grants is utilized, > > so going through all the mappings in xen_invalidate_map_cache_single > > takes time. > > > > However, I wonder if it is really needed. At least in the PoD case, the > > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU > > memory has changed. But that doesn't affect the grant mappings, because > > those are mappings of other domains' memory. > > > > So I am thinking whether we should remove the call to > > xen_invalidate_map_cache_single(mapcache_grants) ? > > > > Adding x86 maintainers: do we need to flush grant table mappings for the > > PV backends running in QEMU when Xen issues a IOREQ_TYPE_INVALIDATE > > request to QEMU? > > Judging from two of the three uses of ioreq_request_mapcache_invalidate() > in x86'es p2m.c, I'd say no. The 3rd use there is unconditional, but > maybe wrongly so. 
>
> However, the answer also depends on what qemu does when encountering a
> granted page. Would it enter it into its mapcache? Can it even access it?
> (If it can't, how does emulated I/O work to such pages? If it can, isn't
> this a violation of the grant's permissions, as it's targeted at solely
> the actual HVM domain named in the grant?)
>

QEMU will only map granted pages if the guest explicitly asks QEMU to DMA
into granted pages. Guests first need to grant pages to the domain running
QEMU, then pass a cookie/address to QEMU with the grant id. QEMU will then
map the granted memory, enter it into a dedicated mapcache (mapcache_grants)
and then emulate device DMA to/from the grant.

So QEMU will only map grants intended for QEMU DMA devices, not any grant
to any domain.

Details:
https://github.com/torvalds/linux/blob/master/drivers/xen/grant-dma-ops.c

Cheers,
Edgar
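For readers following along: the guest-side grant-dma-ops machinery Edgar links to marks such DMA addresses with a flag bit and carries the grant reference in the remaining bits, which is how the device model can tell a grant-backed address from ordinary guest RAM. A rough sketch of that encoding is below; the constant and helper names are illustrative assumptions, not the exact identifiers used in Linux or QEMU.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative encoding of a grant-backed DMA address: the top bit says
 * "this is a grant reference, not a guest physical address", the rest
 * holds the grant reference and the offset into the granted page.
 * Names and layout are assumptions for the sketch. */
#define GRANT_DMA_ADDR_FLAG  (1ULL << 63)
#define XEN_PAGE_SHIFT       12
#define XEN_PAGE_MASK        ((1ULL << XEN_PAGE_SHIFT) - 1)

static inline uint64_t grant_to_dma(uint32_t gref, uint64_t offset)
{
    return GRANT_DMA_ADDR_FLAG | ((uint64_t)gref << XEN_PAGE_SHIFT) |
           (offset & XEN_PAGE_MASK);
}

static inline bool dma_is_grant(uint64_t dma_addr)
{
    return (dma_addr & GRANT_DMA_ADDR_FLAG) != 0;
}

static inline uint32_t dma_to_gref(uint64_t dma_addr)
{
    return (uint32_t)((dma_addr & ~GRANT_DMA_ADDR_FLAG) >> XEN_PAGE_SHIFT);
}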
On Wed, 29 Jan 2025, Edgar E. Iglesias wrote: > On Tue, Jan 28, 2025 at 03:58:14PM -0800, Stefano Stabellini wrote: > > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote: > > > On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote: > > > > Hello, > > > > > > > > starting with qemu 9.1 a PoD HVM domU takes a long time to start. > > > > Depending on the domU kernel, it may trigger a warning, which prompted me > > > > to notice this change in behavior: > > > > > > > > [ 0.000000] Linux version 4.12.14-120-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Nov 7 16:39:09 UTC 2019 (fd9dc36) > > > > ... > > > > [ 1.096432] HPET: 3 timers in total, 0 timers will be used for per-cpu timer > > > > [ 1.101636] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 > > > > [ 1.104051] hpet0: 3 comparators, 64-bit 62.500000 MHz counter > > > > [ 16.136086] random: crng init done > > > > [ 31.712052] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 30s! > > > > [ 31.716029] Showing busy workqueues and worker pools: > > > > [ 31.721164] workqueue events: flags=0x0 > > > > [ 31.724054] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 > > > > [ 31.728000] in-flight: 17:balloon_process > > > > [ 31.728000] pending: hpet_work > > > > [ 31.728023] workqueue mm_percpu_wq: flags=0x8 > > > > [ 31.732987] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 > > > > [ 31.736000] pending: vmstat_update > > > > [ 31.736024] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=30s workers=2 idle: 34 > > > > [ 50.400102] clocksource: Switched to clocksource xen > > > > [ 50.441153] VFS: Disk quotas dquot_6.6.0 > > > > ... > > > > > > > > With qemu 9.0 and older, this domU will start the /init task after 8 seconds. > > > > > > > > The change which caused this commit is qemu.git commit 9ecdd4bf08dfe4a37e16b8a8b228575aff641468 > > > > Author: Edgar E. Iglesias <edgar.iglesias@amd.com> > > > > AuthorDate: Tue Apr 30 10:26:45 2024 +0200 > > > > Commit: Edgar E. Iglesias <edgar.iglesias@amd.com> > > > > CommitDate: Sun Jun 9 20:16:14 2024 +0200 > > > > > > > > xen: mapcache: Add support for grant mappings > > > > > > > > As you can see, v4 instead of v5 was apparently applied. > > > > This was probably unintentional, but would probably not change the result. > > > > > > Hi Olaf, > > > > > > It looks like v8 was applied, or am I missing something? 
> > > > > > > > > > > > > > With this change the domU starts fast again: > > > > > > > > --- a/hw/xen/xen-mapcache.c > > > > +++ b/hw/xen/xen-mapcache.c > > > > @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr) > > > > ram_addr_t addr; > > > > > > > > addr = xen_ram_addr_from_mapcache_single(mapcache, ptr); > > > > + if (1) > > > > if (addr == RAM_ADDR_INVALID) { > > > > addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr); > > > > } > > > > @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer) > > > > static void xen_invalidate_map_cache_entry_all(uint8_t *buffer) > > > > { > > > > xen_invalidate_map_cache_entry_single(mapcache, buffer); > > > > + if (1) > > > > xen_invalidate_map_cache_entry_single(mapcache_grants, buffer); > > > > } > > > > > > > > @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void) > > > > bdrv_drain_all(); > > > > > > > > xen_invalidate_map_cache_single(mapcache); > > > > + if (0) > > > > xen_invalidate_map_cache_single(mapcache_grants); > > > > } > > > > > > > > I did the testing with libvirt, the domU.cfg equivalent looks like this: > > > > maxmem = 4096 > > > > memory = 2048 > > > > maxvcpus = 4 > > > > vcpus = 2 > > > > pae = 1 > > > > acpi = 1 > > > > apic = 1 > > > > viridian = 0 > > > > rtc_timeoffset = 0 > > > > localtime = 0 > > > > on_poweroff = "destroy" > > > > on_reboot = "destroy" > > > > on_crash = "destroy" > > > > device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386" > > > > sdl = 0 > > > > vnc = 1 > > > > vncunused = 1 > > > > vnclisten = "127.0.0.1" > > > > vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ] > > > > parallel = "none" > > > > serial = "pty" > > > > builder = "hvm" > > > > kernel = "/bug1236329/linux" > > > > ramdisk = "/bug1236329/initrd" > > > > cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel"" > > > > boot = "c" > > > > disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ] > > > > usb = 1 > > > > usbdevice = "tablet" > > > > > > > > Any idea what can be done to restore boot times? > > > > > > > > > A guess is that it's taking a long time to walk the grants mapcache > > > when invalidating (in QEMU). Despite it being unused and empty. We > > > could try to find a way to keep track of usage and do nothing when > > > invalidating an empty/unused cache. > > > > If mapcache_grants is unused and empty, the call to > > xen_invalidate_map_cache_single(mapcache_grants) should be really fast? > > Yes, I agree but looking at the invalidation code it looks like if we're > unconditionally walking all the buckets in the hash-table... > > > > > > I think probably it might be the opposite: mapcache_grants is utilized, > > so going through all the mappings in xen_invalidate_map_cache_single > > takes time. > > The reason I don't think it's being used is because we've only enabled > grants for PVH machines and Olaf runs HVM machines, so QEMU would never > end up mapping grants for DMA. Oh, I see! In that case we could have a trivial check on mc->last_entry == NULL as fast path, something like: if ( mc->last_entry == NULL ) return; at the beginning of xen_invalidate_map_cache_single? > > However, I wonder if it is really needed. At least in the PoD case, the > > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU > > memory has changed. But that doesn't affect the grant mappings, because > > those are mappings of other domains' memory. 
> >
> > So I am thinking whether we should remove the call to
> > xen_invalidate_map_cache_single(mapcache_grants) ?
>
> Good point!

Let's see how the discussion evolves on that point

> > Adding x86 maintainers: do we need to flush grant table mappings for the
> > PV backends running in QEMU when Xen issues a IOREQ_TYPE_INVALIDATE
> > request to QEMU?
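Applied to the function in question, the fast path Stefano sketches above would presumably look something like the following. Whether last_entry == NULL is a reliable "never used" indicator, and whether the grants flush is needed here at all, are exactly the open questions in this thread.

static void xen_invalidate_map_cache_single(MapCache *mc)
{
    /* Possible fast path (sketch): nothing was ever looked up through
     * this cache, so there is nothing to tear down. */
    if (mc->last_entry == NULL) {
        return;
    }

    /* ... existing behaviour: walk every bucket, unmap the unlocked
     * entries, reset last_entry ... */
}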
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
     ram_addr_t addr;
 
     addr = xen_ram_addr_from_mapcache_single(mapcache, ptr);
+    if (1)
     if (addr == RAM_ADDR_INVALID) {
         addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr);
     }
@@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer)
 static void xen_invalidate_map_cache_entry_all(uint8_t *buffer)
 {
     xen_invalidate_map_cache_entry_single(mapcache, buffer);
+    if (1)
     xen_invalidate_map_cache_entry_single(mapcache_grants, buffer);
 }
 
@@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void)
     bdrv_drain_all();
 
     xen_invalidate_map_cache_single(mapcache);
+    if (0)
     xen_invalidate_map_cache_single(mapcache_grants);
 }