Message ID | 20250128151544.26fc827d.olaf@aepfle.de (mailing list archive)
State | New
Series | slow start of PoD HVM domU with qemu 9.1
On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
> Hello,
>
> starting with qemu 9.1 a PoD HVM domU takes a long time to start.
> Depending on the domU kernel, it may trigger a warning, which prompted me
> to notice this change in behavior:
>
> [    0.000000] Linux version 4.12.14-120-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Nov 7 16:39:09 UTC 2019 (fd9dc36)
> ...
> [    1.096432] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
> [    1.101636] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
> [    1.104051] hpet0: 3 comparators, 64-bit 62.500000 MHz counter
> [   16.136086] random: crng init done
> [   31.712052] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 30s!
> [   31.716029] Showing busy workqueues and worker pools:
> [   31.721164] workqueue events: flags=0x0
> [   31.724054]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
> [   31.728000]     in-flight: 17:balloon_process
> [   31.728000]     pending: hpet_work
> [   31.728023] workqueue mm_percpu_wq: flags=0x8
> [   31.732987]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
> [   31.736000]     pending: vmstat_update
> [   31.736024] pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=30s workers=2 idle: 34
> [   50.400102] clocksource: Switched to clocksource xen
> [   50.441153] VFS: Disk quotas dquot_6.6.0
> ...
>
> With qemu 9.0 and older, this domU will start the /init task after 8 seconds.
>
> The change which caused this is qemu.git commit 9ecdd4bf08dfe4a37e16b8a8b228575aff641468
> Author:     Edgar E. Iglesias <edgar.iglesias@amd.com>
> AuthorDate: Tue Apr 30 10:26:45 2024 +0200
> Commit:     Edgar E. Iglesias <edgar.iglesias@amd.com>
> CommitDate: Sun Jun 9 20:16:14 2024 +0200
>
>     xen: mapcache: Add support for grant mappings
>
> As you can see, v4 instead of v5 was apparently applied.
> This was probably unintentional, but would probably not change the result.

Hi Olaf,

It looks like v8 was applied, or am I missing something?

>
> With this change the domU starts fast again:
>
> --- a/hw/xen/xen-mapcache.c
> +++ b/hw/xen/xen-mapcache.c
> @@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
>      ram_addr_t addr;
>  
>      addr = xen_ram_addr_from_mapcache_single(mapcache, ptr);
> +    if (1)
>      if (addr == RAM_ADDR_INVALID) {
>          addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr);
>      }
> @@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer)
>  static void xen_invalidate_map_cache_entry_all(uint8_t *buffer)
>  {
>      xen_invalidate_map_cache_entry_single(mapcache, buffer);
> +    if (1)
>      xen_invalidate_map_cache_entry_single(mapcache_grants, buffer);
>  }
>  
> @@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void)
>      bdrv_drain_all();
>  
>      xen_invalidate_map_cache_single(mapcache);
> +    if (0)
>      xen_invalidate_map_cache_single(mapcache_grants);
>  }
>
> I did the testing with libvirt, the domU.cfg equivalent looks like this:
> maxmem = 4096
> memory = 2048
> maxvcpus = 4
> vcpus = 2
> pae = 1
> acpi = 1
> apic = 1
> viridian = 0
> rtc_timeoffset = 0
> localtime = 0
> on_poweroff = "destroy"
> on_reboot = "destroy"
> on_crash = "destroy"
> device_model_override = "/usr/lib64/qemu-9.1/bin/qemu-system-i386"
> sdl = 0
> vnc = 1
> vncunused = 1
> vnclisten = "127.0.0.1"
> vif = [ "mac=52:54:01:23:63:29,bridge=br0,script=vif-bridge" ]
> parallel = "none"
> serial = "pty"
> builder = "hvm"
> kernel = "/bug1236329/linux"
> ramdisk = "/bug1236329/initrd"
> cmdline = "console=ttyS0,115200n8 quiet ignore_loglevel"
> boot = "c"
> disk = [ "format=qcow2,vdev=hda,access=rw,backendtype=qdisk,target=/bug1236329/sles12sp5.qcow2" ]
> usb = 1
> usbdevice = "tablet"
>
> Any idea what can be done to restore boot times?

A guess is that it's taking a long time to walk the grants mapcache
when invalidating (in QEMU). Despite it being unused and empty. We
could try to find a way to keep track of usage and do nothing when
invalidating an empty/unused cache.

Best regards,
Edgar
On Tue, 28 Jan 2025, Edgar E. Iglesias wrote:
> On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
> > Hello,
> >
> > starting with qemu 9.1 a PoD HVM domU takes a long time to start.
> > Depending on the domU kernel, it may trigger a warning, which prompted me
> > to notice this change in behavior:
> [...]
> > Any idea what can be done to restore boot times?
>
> A guess is that it's taking a long time to walk the grants mapcache
> when invalidating (in QEMU). Despite it being unused and empty. We
> could try to find a way to keep track of usage and do nothing when
> invalidating an empty/unused cache.

If mapcache_grants is unused and empty, the call to
xen_invalidate_map_cache_single(mapcache_grants) should be really fast?

I think probably it might be the opposite: mapcache_grants is utilized,
so going through all the mappings in xen_invalidate_map_cache_single
takes time.

However, I wonder if it is really needed. At least in the PoD case, the
reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
memory has changed. But that doesn't affect the grant mappings, because
those are mappings of other domains' memory.

So I am thinking whether we should remove the call to
xen_invalidate_map_cache_single(mapcache_grants)?

Adding x86 maintainers: do we need to flush grant table mappings for the
PV backends running in QEMU when Xen issues an IOREQ_TYPE_INVALIDATE
request to QEMU?
On 29.01.2025 00:58, Stefano Stabellini wrote:
> On Tue, 28 Jan 2025, Edgar E. Iglesias wrote:
>> On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
>>> With this change the domU starts fast again:
>>> [...]
>>> Any idea what can be done to restore boot times?
>>
>> A guess is that it's taking a long time to walk the grants mapcache
>> when invalidating (in QEMU). Despite it being unused and empty. We
>> could try to find a way to keep track of usage and do nothing when
>> invalidating an empty/unused cache.
>
> If mapcache_grants is unused and empty, the call to
> xen_invalidate_map_cache_single(mapcache_grants) should be really fast?
>
> I think probably it might be the opposite: mapcache_grants is utilized,
> so going through all the mappings in xen_invalidate_map_cache_single
> takes time.
>
> However, I wonder if it is really needed. At least in the PoD case, the
> reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
> memory has changed. But that doesn't affect the grant mappings, because
> those are mappings of other domains' memory.
>
> So I am thinking whether we should remove the call to
> xen_invalidate_map_cache_single(mapcache_grants)?
>
> Adding x86 maintainers: do we need to flush grant table mappings for the
> PV backends running in QEMU when Xen issues an IOREQ_TYPE_INVALIDATE
> request to QEMU?

Judging from two of the three uses of ioreq_request_mapcache_invalidate()
in x86'es p2m.c, I'd say no. The 3rd use there is unconditional, but
maybe wrongly so.

However, the answer also depends on what qemu does when encountering a
granted page. Would it enter it into its mapcache? Can it even access it?
(If it can't, how does emulated I/O work to such pages? If it can, isn't
this a violation of the grant's permissions, as it's targeted at solely
the actual HVM domain named in the grant?)

Jan
On Tue, Jan 28, 2025 at 03:58:14PM -0800, Stefano Stabellini wrote:
> On Tue, 28 Jan 2025, Edgar E. Iglesias wrote:
> > On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
> > > Hello,
> > >
> > > starting with qemu 9.1 a PoD HVM domU takes a long time to start.
> > [...]
> > > Any idea what can be done to restore boot times?
> >
> > A guess is that it's taking a long time to walk the grants mapcache
> > when invalidating (in QEMU). Despite it being unused and empty. We
> > could try to find a way to keep track of usage and do nothing when
> > invalidating an empty/unused cache.
>
> If mapcache_grants is unused and empty, the call to
> xen_invalidate_map_cache_single(mapcache_grants) should be really fast?

Yes, I agree but looking at the invalidation code it looks like we're
unconditionally walking all the buckets in the hash-table...

> I think probably it might be the opposite: mapcache_grants is utilized,
> so going through all the mappings in xen_invalidate_map_cache_single
> takes time.

The reason I don't think it's being used is because we've only enabled
grants for PVH machines and Olaf runs HVM machines, so QEMU would never
end up mapping grants for DMA.

> However, I wonder if it is really needed. At least in the PoD case, the
> reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
> memory has changed. But that doesn't affect the grant mappings, because
> those are mappings of other domains' memory.
>
> So I am thinking whether we should remove the call to
> xen_invalidate_map_cache_single(mapcache_grants)?

Good point!

> Adding x86 maintainers: do we need to flush grant table mappings for the
> PV backends running in QEMU when Xen issues an IOREQ_TYPE_INVALIDATE
> request to QEMU?

Cheers,
Edgar
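The invalidation walk Edgar refers to above has roughly this shape. This is a simplified, self-contained sketch rather than the actual hw/xen/xen-mapcache.c code; the stripped-down MapCache and MapCacheEntry types below stand in for QEMU's real structures, and locking is omitted.

#include <stddef.h>
#include <sys/mman.h>

/* Stripped-down stand-ins for QEMU's MapCache/MapCacheEntry. */
typedef struct MapCacheEntry {
    void *vaddr_base;          /* NULL when the bucket was never mapped */
    size_t size;
} MapCacheEntry;

typedef struct MapCache {
    MapCacheEntry *entry;      /* hash table, nr_buckets slots */
    unsigned long nr_buckets;
    MapCacheEntry *last_entry; /* one-entry lookup cache */
} MapCache;

/* The walk is unconditional: every bucket is visited on each
 * IOREQ_TYPE_INVALIDATE, even if nothing was ever mapped into this cache,
 * so the cost scales with nr_buckets rather than with actual usage. */
static void invalidate_map_cache_sketch(MapCache *mc)
{
    for (unsigned long i = 0; i < mc->nr_buckets; i++) {
        MapCacheEntry *entry = &mc->entry[i];

        if (entry->vaddr_base == NULL) {
            continue;          /* empty bucket, nothing to unmap */
        }
        munmap(entry->vaddr_base, entry->size);
        entry->vaddr_base = NULL;
        entry->size = 0;
    }
    mc->last_entry = NULL;
}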
On Wed, Jan 29, 2025 at 09:52:19AM +0100, Jan Beulich wrote:
> On 29.01.2025 00:58, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote:
> > [...]
> >
> > However, I wonder if it is really needed. At least in the PoD case, the
> > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
> > memory has changed. But that doesn't affect the grant mappings, because
> > those are mappings of other domains' memory.
> >
> > So I am thinking whether we should remove the call to
> > xen_invalidate_map_cache_single(mapcache_grants)?
> >
> > Adding x86 maintainers: do we need to flush grant table mappings for the
> > PV backends running in QEMU when Xen issues an IOREQ_TYPE_INVALIDATE
> > request to QEMU?
>
> Judging from two of the three uses of ioreq_request_mapcache_invalidate()
> in x86'es p2m.c, I'd say no. The 3rd use there is unconditional, but
> maybe wrongly so.
>
> However, the answer also depends on what qemu does when encountering a
> granted page. Would it enter it into its mapcache? Can it even access it?
> (If it can't, how does emulated I/O work to such pages? If it can, isn't
> this a violation of the grant's permissions, as it's targeted at solely
> the actual HVM domain named in the grant?)

QEMU will only map granted pages if the guest explicitly asks QEMU to DMA
into granted pages. Guests first need to grant pages to the domain running
QEMU, then pass a cookie/address to QEMU with the grant id. QEMU will then
map the granted memory, enter it into a dedicated mapcache (mapcache_grants)
and then emulate device DMA to/from the grant.

So QEMU will only map grants intended for QEMU DMA devices, not any grant
to any domain.

Details:
https://github.com/torvalds/linux/blob/master/drivers/xen/grant-dma-ops.c

Cheers,
Edgar
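To make the guest/QEMU handshake Edgar outlines a bit more concrete: the guest marks a DMA address as grant-based by setting a flag bit and encoding the grant reference in the address, roughly as sketched below. The constant and helper names here are illustrative only, not the exact identifiers used by Linux's drivers/xen/grant-dma-ops.c or QEMU's grant mapcache code.

#include <stdint.h>
#include <stdbool.h>

/* Illustrative names; the real convention lives in the kernel and QEMU. */
#define GRANT_DMA_ADDR_FLAG  (1ULL << 63)  /* "this address is a grant ref" */
#define XEN_PAGE_SHIFT       12

/* Guest side: turn a grant reference into the cookie/address handed to the
 * emulated device, so the device model can recognize it as a grant. */
static inline uint64_t grant_to_dma_addr(uint32_t gref, uint64_t offset)
{
    return GRANT_DMA_ADDR_FLAG | ((uint64_t)gref << XEN_PAGE_SHIFT) | offset;
}

/* Device-model side: only addresses carrying the flag are mapped through the
 * grant table and tracked in the dedicated grants mapcache; everything else
 * goes through the regular foreign-mapping mapcache. */
static inline bool dma_addr_is_grant(uint64_t dma_addr)
{
    return (dma_addr & GRANT_DMA_ADDR_FLAG) != 0;
}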
On Wed, 29 Jan 2025, Edgar E. Iglesias wrote:
> On Tue, Jan 28, 2025 at 03:58:14PM -0800, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2025, Edgar E. Iglesias wrote:
> > > On Tue, Jan 28, 2025 at 03:15:44PM +0100, Olaf Hering wrote:
> > > > Hello,
> > > >
> > > > starting with qemu 9.1 a PoD HVM domU takes a long time to start.
> > > [...]
> > > > Any idea what can be done to restore boot times?
> > >
> > > A guess is that it's taking a long time to walk the grants mapcache
> > > when invalidating (in QEMU). Despite it being unused and empty. We
> > > could try to find a way to keep track of usage and do nothing when
> > > invalidating an empty/unused cache.
> >
> > If mapcache_grants is unused and empty, the call to
> > xen_invalidate_map_cache_single(mapcache_grants) should be really fast?
>
> Yes, I agree but looking at the invalidation code it looks like we're
> unconditionally walking all the buckets in the hash-table...
>
> > I think probably it might be the opposite: mapcache_grants is utilized,
> > so going through all the mappings in xen_invalidate_map_cache_single
> > takes time.
>
> The reason I don't think it's being used is because we've only enabled
> grants for PVH machines and Olaf runs HVM machines, so QEMU would never
> end up mapping grants for DMA.

Oh, I see! In that case we could have a trivial check on mc->last_entry
== NULL as fast path, something like:

    if ( mc->last_entry == NULL )
        return;

at the beginning of xen_invalidate_map_cache_single?

> > However, I wonder if it is really needed. At least in the PoD case, the
> > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
> > memory has changed. But that doesn't affect the grant mappings, because
> > those are mappings of other domains' memory.
> >
> > So I am thinking whether we should remove the call to
> > xen_invalidate_map_cache_single(mapcache_grants)?
>
> Good point!

Let's see how the discussion evolves on that point.

> > Adding x86 maintainers: do we need to flush grant table mappings for the
> > PV backends running in QEMU when Xen issues an IOREQ_TYPE_INVALIDATE
> > request to QEMU?
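A minimal sketch of where Stefano's proposed check would sit, reusing the stripped-down MapCache stand-in from the earlier sketch. It assumes that mc->last_entry being NULL is a reliable sign that the cache has never been used, which is exactly the assumption the thread still needs to confirm against the real locking and reset behaviour.

/* Sketch only, not the actual QEMU function body. */
static void xen_invalidate_map_cache_single(MapCache *mc)
{
    /* Proposed fast path: nothing has ever been looked up in this cache,
     * so there are no mappings to tear down. This would make the call on
     * the (unused) mapcache_grants essentially free for HVM guests. */
    if (mc->last_entry == NULL) {
        return;
    }

    /* ... existing behaviour: take the cache lock, walk all buckets,
     * munmap the mapped ones, reset them, and clear mc->last_entry ... */
}

In the end the thread settles on simply dropping the call for the grants cache on IOREQ_TYPE_INVALIDATE (see Stefano's patch below), which makes this fast path unnecessary for that particular path.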
On Wed, 29 Jan 2025, Stefano Stabellini wrote:
> On Wed, 29 Jan 2025, Edgar E. Iglesias wrote:
> > On Tue, Jan 28, 2025 at 03:58:14PM -0800, Stefano Stabellini wrote:
> [...]
> > > However, I wonder if it is really needed. At least in the PoD case, the
> > > reason for the IOREQ_TYPE_INVALIDATE request is that the underlying DomU
> > > memory has changed. But that doesn't affect the grant mappings, because
> > > those are mappings of other domains' memory.
> > >
> > > So I am thinking whether we should remove the call to
> > > xen_invalidate_map_cache_single(mapcache_grants)?
> >
> > Good point!
>
> Let's see how the discussion evolves on that point.

Jan and Juergen clarified that there is no need to call
xen_invalidate_map_cache_single for grants on IOREQ_TYPE_INVALIDATE
requests.

---

xen: no need to flush the mapcache for grants

On IOREQ_TYPE_INVALIDATE we need to invalidate the mapcache for regular
mappings. Recently we started reusing the mapcache to also keep track of
grant mappings. However, there is no need to remove grant mappings on
IOREQ_TYPE_INVALIDATE requests, and we shouldn't do that. So remove the
function call.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index 00bfbcc6fb..698b5c53ed 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -700,7 +700,6 @@ void xen_invalidate_map_cache(void)
     bdrv_drain_all();
 
     xen_invalidate_map_cache_single(mapcache);
-    xen_invalidate_map_cache_single(mapcache_grants);
 }
 
 static uint8_t *xen_replace_cache_entry_unlocked(MapCache *mc,
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -522,6 +522,7 @@ ram_addr_t xen_ram_addr_from_mapcache(void *ptr)
     ram_addr_t addr;
 
     addr = xen_ram_addr_from_mapcache_single(mapcache, ptr);
+    if (1)
     if (addr == RAM_ADDR_INVALID) {
         addr = xen_ram_addr_from_mapcache_single(mapcache_grants, ptr);
     }
@@ -626,6 +627,7 @@ static void xen_invalidate_map_cache_entry_single(MapCache *mc, uint8_t *buffer)
 static void xen_invalidate_map_cache_entry_all(uint8_t *buffer)
 {
     xen_invalidate_map_cache_entry_single(mapcache, buffer);
+    if (1)
     xen_invalidate_map_cache_entry_single(mapcache_grants, buffer);
 }
 
@@ -700,6 +702,7 @@ void xen_invalidate_map_cache(void)
     bdrv_drain_all();
 
     xen_invalidate_map_cache_single(mapcache);
+    if (0)
     xen_invalidate_map_cache_single(mapcache_grants);
 }