From patchwork Wed Dec 9 10:08:07 2020
X-Patchwork-Submitter: Andrey Gruzdev
X-Patchwork-Id: 11961043
To: qemu-devel@nongnu.org
Cc: Den Lunev, Eric Blake, Paolo Bonzini, Juan Quintela,
    "Dr. David Alan Gilbert", Markus Armbruster, Peter Xu, Andrey Gruzdev
Subject: [PATCH v6 0/4] migration: UFFD write-tracking migration/snapshots
Date: Wed, 9 Dec 2020 13:08:07 +0300
Message-Id: <20201209100811.190316-1-andrey.gruzdev@virtuozzo.com>
From: Andrey Gruzdev

This patch series is a kind of 'rethinking' of Denis Plotnikov's ideas
implemented in his series '[PATCH v0 0/4] migration: add background
snapshot'.

Currently the only way to make an (external) live VM snapshot is to use the
existing dirty-page-logging migration mechanism. The main problem is that it
tends to produce many page duplicates while the running VM keeps updating
pages that have already been saved. As a result, the vmstate image size is
commonly several times bigger than the non-zero part of the virtual
machine's RSS. The time required for RAM migration to converge and the size
of the snapshot image depend heavily on the guest memory write rate,
sometimes resulting in unacceptably long snapshot creation times and huge
image sizes.

This series proposes a way to solve the aforementioned problems.
This is done by using a different RAM migration mechanism based on the UFFD
write-protection management introduced in the v5.7 kernel. The migration
strategy is to 'freeze' guest RAM content using write protection and to
iteratively release the protection for memory ranges that have already been
saved to the migration stream. At the same time we read in pending UFFD
write-fault events and save those pages out-of-order with higher priority.

How to use:
1. Enable the write-tracking migration capability:
   virsh qemu-monitor-command --hmp migrate_set_capability track-writes-ram on
2. Start the external migration to a file:
   virsh qemu-monitor-command --hmp migrate exec:'cat > ./vm_state'
3. Wait for the migration to finish and check that it has reached the
   'completed' state.

Changes v4->v5:
* 1. Refactored util/userfaultfd.c code to support features required by
*    postcopy.
* 2. Introduced checks for host kernel and guest memory backend
*    compatibility to the 'background-snapshot' branch in
*    migrate_caps_check().
* 3. Switched to using trace_xxx instead of info_report()/error_report()
*    for cases when the error message must be hidden (probing UFFD-IO) or
*    when the info would litter the output if it went to stderr.
* 4. Added RCU_READ_LOCK_GUARDs to the code dealing with the RAM block list.
* 5. Added memory_region_ref() for each RAM block being wr-protected.
* 6. Reused qemu_ram_block_from_host() instead of a custom RAM block lookup
*    routine.
* 7. Stopped using the specific hwaddr/ram_addr_t types in favour of
*    void */uint64_t.
* 8. Dropped the 'linear-scan-rate-limiting' patch for now. The reason is
*    that the chosen criterion for high-latency fault detection (i.e. the
*    timestamp of the UFFD event fetch) is not representative enough for
*    this task. At the moment it looks somewhat like a premature
*    optimization effort.
* 9. Dropped some unnecessary/unused code.

Changes v5->v6:
* 1. Consider possible hot plugging/unplugging of memory devices - don't
*    use a static for the write-tracking support level in
*    migrate_query_write_tracking(); check each time one tries to enable
*    the 'background-snapshot' capability.

Andrey Gruzdev (4):
  migration: introduce 'background-snapshot' migration capability
  migration: introduce UFFD-WP low-level interface helpers
  migration: support UFFD write fault processing in ram_save_iterate()
  migration: implementation of background snapshot thread

 include/exec/memory.h      |   8 +
 include/qemu/userfaultfd.h |  35 ++++
 migration/migration.c      | 357 ++++++++++++++++++++++++++++++++++++-
 migration/migration.h      |   4 +
 migration/ram.c            | 270 ++++++++++++++++++++++++++++
 migration/ram.h            |   6 +
 migration/savevm.c         |   1 -
 migration/savevm.h         |   2 +
 migration/trace-events     |   2 +
 qapi/migration.json        |   7 +-
 util/meson.build           |   1 +
 util/trace-events          |   9 +
 util/userfaultfd.c         | 345 +++++++++++++++++++++++++++++++++++
 13 files changed, 1043 insertions(+), 4 deletions(-)
 create mode 100644 include/qemu/userfaultfd.h
 create mode 100644 util/userfaultfd.c