[v3,00/18] VM forking

Message ID: cover.1577721845.git.tamas.lengyel@intel.com

Message

Tamas K Lengyel Dec. 30, 2019, 4:11 p.m. UTC
The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the associated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The main design goal with this series has been to reduce the time of creating
the VM fork as much as possible. To achieve this the VM forking process is
split into two steps:
    1) forking the VM on the hypervisor side;
    2) starting QEMU to handle the backend for emulated devices.

Step 1) involves creating a VM using the new "xl fork-vm" command. The
parent VM is expected to remain paused after forks are created from it (which
is different than what process forking normally entails). During this forking
operation the HVM context and VM settings are copied over to the new forked VM.
This operation is fast and it allows the forked VM to be unpaused and to be
monitored and accessed via VMI. Note however that without its device model
running (depending on what is executing in the VM) it is bound to
misbehave/crash when it tries to access devices that would be emulated by
QEMU. We anticipate that for certain use-cases this would be an acceptable
situation, for example when fuzzing code segments that don't access such
devices.

Step 2) involves launching QEMU to support the forked VM, which requires the
QEMU Xen savefile to be generated manually from the parent VM. This can be
accomplished simply by connecting to its QMP socket and issuing the
"xen-save-devices-state" command as documented by QEMU:
https://github.com/qemu/qemu/blob/master/docs/xen-save-devices-state.txt
Once the QEMU Xen savefile is generated the new "xl fork-launch-dm" command is
used to launch QEMU and load the specified savefile for it.
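
As an illustration of the QMP exchange described above, here is a minimal
Python sketch. It is not part of the series. The helper names are made up for
this example, and the socket path passed to save_device_state() is something
you have to look up for your own setup (libxl typically creates per-domain QMP
sockets under /var/run/xen, but treat that as an assumption). The sketch
assumes QEMU sends each QMP message on a single line, which is its default.

```python
import json
import socket


def qmp_messages(savefile):
    """Return the two QMP requests needed to dump the device state."""
    return [
        # QMP requires capability negotiation before any other command.
        {"execute": "qmp_capabilities"},
        {"execute": "xen-save-devices-state",
         "arguments": {"filename": savefile}},
    ]


def save_device_state(qmp_socket_path, savefile):
    """Connect to a QMP UNIX socket and ask QEMU to write its Xen savefile."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(qmp_socket_path)
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())  # discard the QMP greeting banner
        for msg in qmp_messages(savefile):
            stream.write(json.dumps(msg) + "\n")
            stream.flush()
            # Skip asynchronous events until this request is answered.
            while True:
                reply = json.loads(stream.readline())
                if "error" in reply:
                    raise RuntimeError(reply["error"])
                if "return" in reply:
                    break
```

The parent is expected to be paused at this point, so the saved device state
matches the HVM context that the forks are created from.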

At runtime the forked VM starts running with an empty p2m which gets lazily
populated when the VM generates EPT faults, similar to how altp2m views are
populated. If the memory access is a read, the p2m entry is populated with
a memory-shared entry pointing at the parent's page. For write memory accesses
or in case memory sharing wasn't possible (for example in case a reference is
held by a third party), a new page is allocated and the page contents are
copied over from the parent VM. Forks can be further forked if needed, thus
allowing for further memory savings.
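
The lazy population rule above can be sketched as a toy model in Python. This
is only meant to make the read/write distinction concrete; it is not how the
hypervisor code is structured. ForkedVM, the "unshareable" set and the
page-as-bytes representation are all assumptions made for the example, and the
copy-on-write step for a page that was first read and later written follows the
usual memory sharing behaviour rather than anything stated in this letter.
Forks of forks are not modelled.

```python
SHARED, PRIVATE = "shared", "private"


class ForkedVM:
    """Toy model: the fork starts with an empty p2m, filled in on faults."""

    def __init__(self, parent_pages, unshareable=frozenset()):
        self.parent_pages = parent_pages  # gfn -> bytes; parent stays paused
        self.unshareable = unshareable    # gfns where sharing is not possible
        self.p2m = {}                     # gfn -> (kind, page)

    def _populate(self, gfn, write):
        page = self.parent_pages[gfn]
        if write or gfn in self.unshareable:
            # Allocate a new page and copy the parent's contents into it.
            self.p2m[gfn] = (PRIVATE, bytearray(page))
        else:
            # Read fault: reference the parent's page instead of copying.
            self.p2m[gfn] = (SHARED, page)

    def read(self, gfn):
        if gfn not in self.p2m:
            self._populate(gfn, write=False)
        return bytes(self.p2m[gfn][1])

    def write(self, gfn, offset, value):
        kind, _ = self.p2m.get(gfn, (None, None))
        if kind != PRIVATE:
            # Missing or shared entry: the fork needs its own copy first.
            self._populate(gfn, write=True)
        self.p2m[gfn][1][offset] = value

    def private_pages(self):
        return sum(1 for kind, _ in self.p2m.values() if kind == PRIVATE)
```

In this model a fork that only reads memory never allocates a page of its own,
which is where the memory savings come from.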

A VM fork reset hypercall is also added that allows the fork to be reset to the
state it was in just after the fork. This is an optimization for cases where the
forks are very short-lived and run without a device model, so resetting saves
some time compared to creating a brand new fork, provided the fork has not
acquired a lot of memory. If the fork has a lot of memory deduplicated it is
likely going to be faster to create a new fork from scratch and destroy the
old one.
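
To show why the cost of a reset grows with the memory the fork has acquired,
here is a small Python sketch of the memory side of a reset. It is a toy model
under stated assumptions, not the hypercall's implementation: reset_fork() and
the gfn -> (kind, page) layout are hypothetical, keeping shared entries in
place is an assumption, and restoring the vCPU/HVM context is not modelled.

```python
def reset_fork(p2m):
    """Drop the fork's private copies; return how many pages were freed.

    p2m maps gfn -> (kind, page), where kind is "shared" or "private".
    Shared entries still match the parent, so they are left in place.
    """
    private = [gfn for gfn, (kind, _) in p2m.items() if kind == "private"]
    for gfn in private:
        # The next access faults and repopulates the entry from the parent.
        del p2m[gfn]
    return len(private)
```

The work done is proportional to the number of private pages, which matches
the observation above that a reset only pays off while the fork has not
acquired much memory.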

The series has been tested with both Linux and Windows VMs and functions as
expected. VM forking time has been measured to be 0.018s, device model launch
to be around 1s depending largely on the number of devices being emulated. Fork
resets have been measured to be 0.011s under the optimal circumstances.
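
As a back-of-the-envelope reading of these numbers, the following Python
snippet turns the measured times into an upper bound on operations per second.
It assumes nothing else takes any time (no workload execution, no teardown of
old forks), so real rates will be lower; the function name is made up for the
example.

```python
FORK_TIME_S = 0.018   # measured VM fork time quoted above
RESET_TIME_S = 0.011  # measured fork reset time, optimal circumstances


def max_rate_per_second(op_time_s):
    """Upper bound on operations per second if nothing else took any time."""
    return 1.0 / op_time_s


if __name__ == "__main__":
    print(f"fork:  {max_rate_per_second(FORK_TIME_S):.1f}/s")
    print(f"reset: {max_rate_per_second(RESET_TIME_S):.1f}/s")
```

That works out to roughly 55 forks or 90 resets per second at best.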

Patches 1-2 implement changes to existing internal Xen APIs to make VM forking
possible.

Patches 3-14 are code cleanups and adjustments to the Xen memory sharing
subsystem with no functional changes.

Patch 15 adds the hypervisor-side code implementing VM forking.

Patch 16 is integration of mem_access with forked VMs.

Patch 17 implements the hypervisor-side bits of the VM fork reset operation.

Patch 18 adds the toolstack-side code implementing VM forking and reset.

Tamas K Lengyel (18):
  x86/hvm: introduce hvm_copy_context_and_params
  xen/x86: Make hap_get_allocation accessible
  x86/mem_sharing: make get_two_gfns take locks conditionally
  x86/mem_sharing: drop flags from mem_sharing_unshare_page
  x86/mem_sharing: don't try to unshare twice during page fault
  x86/mem_sharing: define mem_sharing_domain to hold some scattered
    variables
  x86/mem_sharing: Use INVALID_MFN and p2m_is_shared in
    relinquish_shared_pages
  x86/mem_sharing: Make add_to_physmap static and shorten name
  x86/mem_sharing: Convert MEM_SHARING_DESTROY_GFN to a bool
  x86/mem_sharing: Replace MEM_SHARING_DEBUG with gdprintk
  x86/mem_sharing: ASSERT that p2m_set_entry succeeds
  x86/mem_sharing: Enable mem_sharing on first memop
  x86/mem_sharing: Skip xen heap pages in memshr nominate
  x86/mem_sharing: check page type count earlier
  xen/mem_sharing: VM forking
  xen/mem_access: Use __get_gfn_type_access in set_mem_access
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 tools/libxc/include/xenctrl.h     |  13 +
 tools/libxc/xc_memshr.c           |  22 ++
 tools/libxl/libxl.h               |   7 +
 tools/libxl/libxl_create.c        | 237 +++++++++-----
 tools/libxl/libxl_dm.c            |   2 +-
 tools/libxl/libxl_dom.c           |  83 +++--
 tools/libxl/libxl_internal.h      |   1 +
 tools/libxl/libxl_types.idl       |   1 +
 tools/xl/xl.h                     |   5 +
 tools/xl/xl_cmdtable.c            |  22 ++
 tools/xl/xl_saverestore.c         |  96 ++++++
 tools/xl/xl_vmcontrol.c           |   8 +
 xen/arch/x86/hvm/hvm.c            | 271 ++++++++++------
 xen/arch/x86/mm/hap/hap.c         |   3 +-
 xen/arch/x86/mm/mem_access.c      |   5 +-
 xen/arch/x86/mm/mem_sharing.c     | 502 +++++++++++++++++++++++-------
 xen/arch/x86/mm/p2m.c             |  16 +-
 xen/common/memory.c               |   2 +-
 xen/drivers/passthrough/pci.c     |   3 +-
 xen/include/asm-x86/hap.h         |   1 +
 xen/include/asm-x86/hvm/domain.h  |   6 +-
 xen/include/asm-x86/hvm/hvm.h     |   2 +
 xen/include/asm-x86/mem_sharing.h |  43 ++-
 xen/include/asm-x86/p2m.h         |  14 +-
 xen/include/public/memory.h       |   6 +
 xen/include/xen/sched.h           |   1 +
 26 files changed, 1033 insertions(+), 339 deletions(-)

Comments

George Dunlap Jan. 8, 2020, 2:58 p.m. UTC | #1
On 12/30/19 4:11 PM, Tamas K Lengyel wrote:
> The following series implements VM forking for Intel HVM guests to allow for
> the fast creation of identical VMs without the assosciated high startup costs
> of booting or restoring the VM from a savefile.

Tamas,

This doesn't seem to apply to staging.  Could you give me a commit hash
to which it should apply?

Thanks,
 -George

Tamas K Lengyel Jan. 8, 2020, 3:19 p.m. UTC | #2
On Wed, Jan 8, 2020 at 7:58 AM George Dunlap <george.dunlap@citrix.com> wrote:
>
> On 12/30/19 4:11 PM, Tamas K Lengyel wrote:
> > The following series implements VM forking for Intel HVM guests to allow for
> > the fast creation of identical VMs without the assosciated high startup costs
> > of booting or restoring the VM from a savefile.
>
> Tamas,
>
> This doesn't seem to apply to staging.  Could you give me a commit hash
> to which it should apply?

Hi George,
huh, that's weird. I rebased it on staging before I sent it, but somehow the
latest commit in staging seems to have been a commit from April 2019, which
doesn't seem right. If this was just on one system I would say I just
had a stale branch on that system and forgot to pull it before the
rebase, but it actually happened on both my test systems
independently. Also, my v2 branch of this series was branched
from a more recent version of staging. I really don't know what
happened... I'll be sending v4 shortly.

Tamas