From patchwork Wed May 20 20:42:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukas Straub X-Patchwork-Id: 11561559 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AAFA912 for ; Wed, 20 May 2020 20:45:16 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 42B6E20823 for ; Wed, 20 May 2020 20:45:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=web.de header.i=@web.de header.b="hY5T4AC0" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 42B6E20823 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=web.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:49436 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jbVaV-000870-9v for patchwork-qemu-devel@patchwork.kernel.org; Wed, 20 May 2020 16:45:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44398) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXn-0005kD-LM for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:27 -0400 Received: from mout.web.de ([212.227.17.11]:54509) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXm-00074z-Ik for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1590007334; bh=AjMQkHOf6lMrzLOXJDOiR2XIYx1RZkeSv1cRQw5pyuA=; 
h=X-UI-Sender-Class:Date:From:To:Cc:Subject:In-Reply-To:References; b=hY5T4AC0KLL2wO40rGwXvNj0+egoPPHBKR0CumQkCsSQGduih+7bZFmMEA7H/5nr1 3bspMl4bVCrBpPggEQopdz6fVLnSPqBSCMtCdgaGCpGRZ/lltTbFN7sbPNC3I1hJjK pR9v/LC0dOm0xSKF5wLq+FE6igEHPLxv/Towbeoo= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from luklap ([88.130.61.105]) by smtp.web.de (mrweb106 [213.165.67.124]) with ESMTPSA (Nemesis) id 1MRF3W-1jNFwo139Y-00NMzx; Wed, 20 May 2020 22:42:14 +0200 Date: Wed, 20 May 2020 22:42:13 +0200 From: Lukas Straub To: qemu-devel Subject: [PATCH v2 1/6] migration/colo.c: Use event instead of semaphore Message-ID: <1a13487e3f1ea3e378497cdefb5c9bb6a29f3bfc.1590007004.git.lukasstraub2@web.de> In-Reply-To: References: MIME-Version: 1.0 X-Provags-ID: V03:K1:OhSQ4c25roRjnkblBc3E55EWt8WdTAbhzil+XSV9VcNgKwW6JOX Te198ldL+Xncxr+TUoVOUzUBXVMzd1yUC7wyJ4M/u8E9rLKHi4b83vFp+bzB+GrdDxWvQnQ xMPYDJuLW53vl//jrABpD5j0k+ZFgk9rAffbknbznczKasEfXesGpNNMYnACW7EMNT++YL6 K0wT/n1kIBn0rBaPw/hsw== X-UI-Out-Filterresults: notjunk:1;V03:K0:FBI/0iZnJ7I=:j47IYbGm8PrUKbtHmLS/r6 VDjvIHCK+v3bbolt4VTOIxVNBndA3l0Qy1HC/E0wGcfo0qo9/RN5tijq+aBWlUwIyQ0Zq+mn4 V9GWpJKLS1l3jW02CzTKVxXMRVTs34AVX9oIOeyrtLMssVOid34yHay+S9XtuyuQaqU1yS6GC QxOv2cedRBc6mCitNGySuzniG0yxHxr/PRbsxzLtBLsD1Q0NTD5gkh0XGN77k6tjWH6+hT2Ov Yyr6ctdtUZnXo90aJTVhiDxWZoW+k5KHByhbXYadSPQKXcEimoRnnWHGN7fhZ7c3X5R2o/l1p sXUdMTzbnxkOa5Uq3QUinH4Rl216jqsNDCmmooK9aq7fgfsasYO0181xYHM3oKBQzucFaZ+At qzOQNtWK/jqzdHwJ7lgncQqlTgef5o6VfwdgLktAMP9Mo55GDCyQJGdQ7A/vsoNT8UAVYQdzn 5WyRFlub3CIxv01H3ZVK40y2XvvubR3bdJ29x80UW3SAx+OJwrAXICt/vPvgQi/Riuwb8sF4f h1Fk9BY64dgoqBBuMJST03SgdhMhCUTrKi3IXEiIqvaJykycW/xs2YXrPpSUBhEuGipOBoGEk 8Ujhqu5aCletXLtu+BCCQA6s6U5mPg22yqDCdzERP0V7KVIgnnrNp5fWOTa3x8u4aXLNpLds5 nRzZI5tpMPIXGlEgH7n/zrVNUDySzE7IbETfWiCmlHS/i5RdnKzsFlaCXoaJjoum4Y5MLTuBy pvLFVDZtUBCasm+ubl38Wy2WASzwrWJiTQDqaPnJqZDDGMXWTjj6UJy01JdeYb0mH84qIJ6eJ ttHJ5cB4vM4tv7Hoci0qJk2HBfVCuGM7d3lUae5KkyfDmEFkQd9Szlkbj4+IWvz6vLYdnip/+ 
 7ioW93mqJf/W1ipmbFqoSJH/IPX6NS18SOvuYdZvPY54+kzQkYWX2lhFou/y2OR7e9+f786Ca
 q9HQQFnSiQ7qyr+jiQv75K/QmWaCS4m16N6HI2vRy2pQXDeaF+BOSb+CDKqRurqkpp+2WnZTz
 7eg+WZVNRnzCuDnxIiy6AvUpEEr3UWSzrM8ZgNeegP5/7xe99t5srwiKhTQxA+HZqHNeLYbKi
 jJE2WGnglh9ceG4wwqin9ntmTb8Ya+h/AM+FpMlIc4ctExXsUTZ591fiY7QdsO4BAtPLijyyR
 Rsezzoe29tKBajqGeZ4QMij6oj1MJ5Pg9uVN2oCGhfN0NECGKbUZzwSWVlXMpj+tdoyKd6uIN
 8FYf7uI21juzFwh+I
Received-SPF: pass client-ip=212.227.17.11; envelope-from=lukasstraub2@web.de; helo=mout.web.de
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/20 16:42:25
X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic]
X-Spam_score_int: -24
X-Spam_score: -2.5
X-Spam_bar: --
X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Hailiang Zhang , "Dr. David Alan Gilbert" , Juan Quintela
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

If multiple packets miscompare in a short timeframe, the semaphore
value will be increased multiple times. This causes multiple
checkpoints even if one would be sufficient.

Fix this by using an event instead of a semaphore for triggering
checkpoints. Now, checkpoint requests will be ignored until the
checkpoint event is sent to colo-compare (which releases the
miscompared packets).

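The difference between the two primitives can be sketched in isolation.
What follows is a minimal single-threaded model, not QEMU code: ToySem
and ToyEvent are hypothetical stand-ins for QemuSemaphore and QemuEvent,
and the draining loop stands in for the COLO checkpoint thread. It only
illustrates why several requests that arrive before the thread wakes up
lead to several checkpoints with a counting semaphore, but to a single
one with an event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; they only model the counting/flag behaviour. */
typedef struct { unsigned count; } ToySem;
typedef struct { bool set; } ToyEvent;

static void toy_sem_post(ToySem *s) { s->count++; }

/* Non-blocking wait: consumes one pending post if there is one. */
static bool toy_sem_trywait(ToySem *s)
{
    if (s->count == 0) {
        return false;
    }
    s->count--;
    return true;
}

static void toy_event_set(ToyEvent *e) { e->set = true; }
static void toy_event_reset(ToyEvent *e) { e->set = false; }
static bool toy_event_is_set(const ToyEvent *e) { return e->set; }

/* Checkpoints performed when `requests` miscompares arrive back to back. */
unsigned checkpoints_with_semaphore(unsigned requests)
{
    ToySem sem = { 0 };
    unsigned checkpoints = 0;

    for (unsigned i = 0; i < requests; i++) {
        toy_sem_post(&sem);          /* every request bumps the counter */
    }
    while (toy_sem_trywait(&sem)) {  /* one checkpoint per pending post */
        checkpoints++;
    }
    assert(sem.count == 0);
    return checkpoints;
}

unsigned checkpoints_with_event(unsigned requests)
{
    ToyEvent ev = { false };
    unsigned checkpoints = 0;

    for (unsigned i = 0; i < requests; i++) {
        toy_event_set(&ev);          /* repeated sets collapse into one */
    }
    while (toy_event_is_set(&ev)) {
        toy_event_reset(&ev);        /* reset as part of the checkpoint */
        checkpoints++;
    }
    assert(!toy_event_is_set(&ev));
    return checkpoints;
}
```

In this model, three back-to-back requests give three checkpoints with
the semaphore and one with the event.
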
Benchmark results (iperf3):
Client-to-server tcp:
without patch: ~66 Mbit/s
with patch: ~61 Mbit/s
Server-to-client tcp:
without patch: ~702 Kbit/s
with patch: ~16 Mbit/s

Signed-off-by: Lukas Straub
Reviewed-by: zhanghailiang
---
 migration/colo.c      | 9 +++++----
 migration/migration.h | 4 ++--
 2 files changed, 7 insertions(+), 6 deletions(-)

--
2.20.1

diff --git a/migration/colo.c b/migration/colo.c
index a54ac84f41..09168627bc 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -430,6 +430,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    qemu_event_reset(&s->colo_checkpoint_event);
     colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, &local_err);
     if (local_err) {
         goto out;
@@ -580,7 +581,7 @@ static void colo_process_checkpoint(MigrationState *s)
             goto out;
         }
 
-        qemu_sem_wait(&s->colo_checkpoint_sem);
+        qemu_event_wait(&s->colo_checkpoint_event);
 
         if (s->state != MIGRATION_STATUS_COLO) {
             goto out;
@@ -628,7 +629,7 @@ out:
     colo_compare_unregister_notifier(&packets_compare_notifier);
     timer_del(s->colo_delay_timer);
     timer_free(s->colo_delay_timer);
-    qemu_sem_destroy(&s->colo_checkpoint_sem);
+    qemu_event_destroy(&s->colo_checkpoint_event);
 
     /*
      * Must be called after failover BH is completed,
@@ -645,7 +646,7 @@ void colo_checkpoint_notify(void *opaque)
     MigrationState *s = opaque;
     int64_t next_notify_time;
 
-    qemu_sem_post(&s->colo_checkpoint_sem);
+    qemu_event_set(&s->colo_checkpoint_event);
     s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     next_notify_time = s->colo_checkpoint_time +
                     s->parameters.x_checkpoint_delay;
@@ -655,7 +656,7 @@ void colo_checkpoint_notify(void *opaque)
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
-    qemu_sem_init(&s->colo_checkpoint_sem, 0);
+    qemu_event_init(&s->colo_checkpoint_event, false);
     s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST,
                                 colo_checkpoint_notify, s);
 
diff --git a/migration/migration.h b/migration/migration.h
index 507284e563..f617960522 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -215,8 +215,8 @@ struct MigrationState
     /* The semaphore is used to notify COLO thread that failover is finished */
     QemuSemaphore colo_exit_sem;
 
-    /* The semaphore is used to notify COLO thread to do checkpoint */
-    QemuSemaphore colo_checkpoint_sem;
+    /* The event is used to notify COLO thread to do checkpoint */
+    QemuEvent colo_checkpoint_event;
     int64_t colo_checkpoint_time;
     QEMUTimer *colo_delay_timer;
 

From patchwork Wed May 20 20:42:16 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukas Straub
X-Patchwork-Id: 11561557
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 87AE5912 for ; Wed, 20 May 2020 20:45:13 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5E7C92084C for ; Wed, 20 May 2020 20:45:13 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=web.de header.i=@web.de header.b="APDrhYgA"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5E7C92084C
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=web.de
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Received: from localhost ([::1]:49456 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jbVaS-00087S-Cg for patchwork-qemu-devel@patchwork.kernel.org; Wed, 20 May 2020 16:45:12 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:44396) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXn-0005jr-GC for
qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:27 -0400 Received: from mout.web.de ([212.227.15.14]:37307) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXm-00074N-Ls for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1590007337; bh=LivmC0T4MlEYh1b0AKhwKgJifO6Iy7vR+p/8U/uGwlM=; h=X-UI-Sender-Class:Date:From:To:Cc:Subject:In-Reply-To:References; b=APDrhYgAcyOvuq8G6Q8EMPVtD5yl5wa6fe4wogYdScJ3h2ZKrnoE6Ptv8gjbNfaA+ 2pYCyCpSCKG7KbDXeksgM3eSm9JkiJBYo+P1ChJVXxDf99afeOr/zNzjmaQ0suMPEA SpPHqBUt/choLSMGz8MzfbvRnQtlf74hEsrpwhjA= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from luklap ([88.130.61.105]) by smtp.web.de (mrweb003 [213.165.67.108]) with ESMTPSA (Nemesis) id 0MSJH1-1jVIHa371f-00TTDl; Wed, 20 May 2020 22:42:17 +0200 Date: Wed, 20 May 2020 22:42:16 +0200 From: Lukas Straub To: qemu-devel Subject: [PATCH v2 2/6] migration/colo.c: Use cpu_synchronize_all_states() Message-ID: <9cacbf2006c8687a983d67ed1565317b58dc55c9.1590007004.git.lukasstraub2@web.de> In-Reply-To: References: MIME-Version: 1.0 X-Provags-ID: V03:K1:VspTdd77bpHWFOBfwaUo4LojGXS5krH0gtRO61e3ItG/YdpHPDg AHOesQ8Makmbiy3y6EHPFEbmjJYwvNEkzQBppc92BTDDA9PQM5IZfH80Y/ehwvLxja8AANR A1j3yeBvNe/T5nYmev6gUfcjsleERRKR+EiHStHRcIwUEu8nMNMUr4pLcZ5j4y6YqXdBdQD Z04y4yzGO+UEFTiNvz1Fw== X-UI-Out-Filterresults: notjunk:1;V03:K0:xTmUBPZDSc4=:cSqxp6veOHyfQ7jEWmy+qs FHkotW2YBylVKXXM868kzIouA5FjZIxbzdC3g3gg1hV24T0HY1ORysARLS71DepdXyduDI6xE plevC+UFwP7gkLypjqreQD1HmIYagL8xHCLxOuL6eIizjW0+ea0RynnJHWy9D4Mi6jEp3sVGc EsCmaD70k3aeEdZ8CHqKiRBvRWMavrvt9T7/51+PFWh+wrwDLqGvSHfcJjRqJfrOgr3B83qYq sxSL/cyAI3sNj9YPeeNSJxsGS1Zvskje6XcppgUTuqdlqOvlbogHRDNdr9b0Ve1mNvN0DkJG7 2Z0Ir0pkO7lz2oAn7MxhfgULr34xfBfNHbObghLiEtYwtHKRKYSFchi+D0WOIK18eBRtZ0e3y 6RwUwVDg1Hs3sZrI4lyTYFXaxDdGyAMfvjwpz26ef4GBFgGu+fmCW10Xo5nc09irPCWFiuvR9 
esE7ybH6GlJ8a4KJUN3KzyYK/BC+o29bUoyf+CcbHuHGUaw+Ba7JXr7OaX9WTWRJG8OlZjUYV 2qd/bPndrOqEWpORWIp5jdy2MmaKjnJ7omeneepbwdMvE/N702SVQKrLDwZQRm2hxRztE6hGg LjceDfRa7UMZDp9nfAS/0mOGuqOWv4NnSYy1b+BDXRXKlEF+QM0JiCXtTxFPfBY64Gj0ivc9v Vrwq20lEYGk751Ftrf/kmZP6wLE2hTm+y3JDeJrO6+tiYv5ghpfWERiMjCcrfxIKiKIowo9Dd 4OveEj01zmaM3jVE4k2qDd6HR1foXsXS5qXfiop2rPRCmhnwKntm5NcZpcXBB9OrcG58k54H3 xhlC34kyXFFP5FQI5oAqZNMUAgP8O4/tfVn3NqjoyGHk4QvL1in5d3YZ7AqIllsqFSQR5Y9dD pOK84OhFe10tR5X9u03T11mJ8QhQFiEsdibzdNZIC9F+jEOjQRtEqb2Zt+VsIt+YYG1Uq3NFt 6j5QnwmCxaQ+vIZrQLrSp96aTKaH14EOiy74eNYTjFc73xsGETR1yngNnJA/y3g4XwG2Atkfk AnIo4idhHeOjhtQ8Chg6kdvTxm5LJZ5QhHx80GBJmRx/WkuvKoCCvUenr3SJZNmTLNJcOjWNW FAaSQup0FxZpqVS15jFz67ZO+7EBFohAvC4wAPXVGWbftuIXS55kuEseqyS0Y5DYUGGwP/rOd ATmfrIgjWtzkulq344r1b4ZBbHnzZUHkX7qyXv+xbAB4IWWz5E0Y6SHdrrfktk8rDuiUJ8rPY O4k7cg4W6L08QPElQ Received-SPF: pass client-ip=212.227.15.14; envelope-from=lukasstraub2@web.de; helo=mout.web.de X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/20 16:42:25 X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic] X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hailiang Zhang , "Dr. David Alan Gilbert" , Juan Quintela Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" cpu_synchronize_all_pre_loadvm() marks all vcpus as dirty, so the registers are loaded from CPUState before we continue running the vm. However if we failover during checkpoint, CPUState is not initialized and the registers are loaded with garbage. 
This causes guest hangs and crashes.

Fix this by using cpu_synchronize_all_states(), which initializes
CPUState from the current cpu registers in addition to marking the
vcpus as dirty.

Signed-off-by: Lukas Straub
Reviewed-by: Dr. David Alan Gilbert
---
 migration/colo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.20.1

diff --git a/migration/colo.c b/migration/colo.c
index 09168627bc..6b2ad35aa4 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -696,7 +696,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
     }
 
     qemu_mutex_lock_iothread();
-    cpu_synchronize_all_pre_loadvm();
+    cpu_synchronize_all_states();
     ret = qemu_loadvm_state_main(mis->from_src_file, mis);
     qemu_mutex_unlock_iothread();
 

From patchwork Wed May 20 20:42:20 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukas Straub
X-Patchwork-Id: 11561563
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D260B1391 for ; Wed, 20 May 2020 20:51:10 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A8F9320758 for ; Wed, 20 May 2020 20:51:10 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=web.de header.i=@web.de header.b="Icnfj42E"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A8F9320758
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=web.de
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Received: from localhost ([::1]:56794 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id
1jbVgD-0003Tp-Gj for patchwork-qemu-devel@patchwork.kernel.org; Wed, 20 May 2020 16:51:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44410) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXp-0005mO-QN for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:29 -0400 Received: from mout.web.de ([212.227.17.11]:46605) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXo-00076d-SV for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1590007341; bh=nBz36uKLAVET+svpvOg5XAEq13IOV9YVmFR9/mOLz4w=; h=X-UI-Sender-Class:Date:From:To:Cc:Subject:In-Reply-To:References; b=Icnfj42EK4XLxK3Hmpl0O/cZPwrSq8FNGXWw3P38gY8rC28tKiAtAbGvAR+BkTQ1o GOAqQtsPM/EgZY+m5iyOn7BOz3P9nnj0myxEZTWQ7+qtIRvHCUFZAplFGSW6yBzaCk iL8Pzzjz3HWDvJlzh+Zeqg+oaxeHiQgbwq4zhaqo= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from luklap ([88.130.61.105]) by smtp.web.de (mrweb106 [213.165.67.124]) with ESMTPSA (Nemesis) id 1N7gbY-1iyPJX19Fq-014duS; Wed, 20 May 2020 22:42:21 +0200 Date: Wed, 20 May 2020 22:42:20 +0200 From: Lukas Straub To: qemu-devel Subject: [PATCH v2 3/6] migration/colo.c: Flush ram cache only after receiving device state Message-ID: <38382499f40a1288d7e7f0fd59173b23660b69d9.1590007004.git.lukasstraub2@web.de> In-Reply-To: References: MIME-Version: 1.0 X-Provags-ID: V03:K1:70VK6yf9zPvIdnGLOJ8M2cKJqRj4iOoKXJcVIcnOFJZzPSoVYNj /W4B8o6JwdveFklqYkTqR5b8DX2Jmd/q6TZ6zpnb49g/KyBLNyZ/RdZvES+XubtBOqcOBYA xU+U/iYWVUM+vGBZyVvH0/63RdogoaPExXThRppJodpg46vA8LJPv7WvLMNHB1RB/GAdNcp JCPWaHeq/us8AMnwM9X6g== X-UI-Out-Filterresults: notjunk:1;V03:K0:hJ/Iah1VeIk=:If04k/IPK1QczH/dN7wyaY 4/g133qSADLa6JuXzXuVIYO9syoAQ3Hp67u8JwDZ6BPbGi+nqICGebFUZorBlVOCpIhxVwoRV tVysWrKXE9zpNgPqFLPFvCkbLMdSB7i9e/JYXyd8gy87OO8Qkc/KLK0KegU9NMYFHEVk7uIOC 
IsjBRk+qVTZlxScoQvKhbEjT2ggCambHqtAn30yK5ARH+KSIkCZsT3fPHyFOyw/8lC9z+j3jw 7X2UBEZJ0L0w9BUCSlkBwoD4BC/Et32lYfcHwf+tYH0ablUAAC9Z0bk91Ust7N1dLf/fgLRBe opOhXUsGlb7bRBlzzV6y6tz1n9j4UugIvWxtr0f3ksdXjcjA0xHDulM9335394ji+DRsSYDBu hTL+P2ONiTWENM6hCRgg5zaWC1ita8nG2YzOFLHvhLMqEWEeOl2XsCiA9AbnPzFHerub43btI EwV7m6H4ir5c17bW7DwQouT+AMobuIu+1zuiVXTdUBb9F4MLkdsGnxCEJvtp3QfX34w6BczDE 9vxeKDLUIwr2EGkKg03G2iPHZCixo6GNEPSH05m3EoPqFPDVzmRN1gyyZvIFDV57yw8Wn2wvq HrbmNw2rUhuemR5l+PK1t7IbNLvI228IaighnD6Anwil+wpWUey0eWLumP7RepNxw72YauecW aJJGwwQwSMoCEi9dhqcIh578o8wK9qN8e+Kif6N4aHUKRtXX/f7hoVwvY93b/OZIJdue0HrqP LKyHVljomOZ9Ltcnuy0153aLgksKX5jdTSO+iiiSUxUoDPlc5D5Gcuhq3SUssoevvEfTHeACM JvQr6N5ZGlNS+NPzY5St+0pq6gUOYRgfcQJDwfiEZf7LndD8K41KSOBYE28stwhB1geHgvpJq gqmA+knl+SkhBGxXGTq3rBrc3U6+9yiRRZQ3zW+ba/sK1s8PO/7vh/Wumd3xKQVIfLyGKO3P4 R1AyMuX7prKHP6VvNv12/2kFGOIl30HUj2TsSSVMhUl2Q/Hcqm1K0CgLArkTC1vc0LGRqpzdH HeUo2yfZKixNMmpQ1iHJIJNaVco7nmK35nm74AVhAiao/PARmvdF49U/YtF/dnK//srgMlbi0 X3eKdXmEIYNQnpi1XiDyR6cYCSn+fApSpvF1UpFxxzknk9O8C+6w2HzIIeArf8Ikg0+NRhPbI BHKnpDn7JaIwqkwW2ku9A0L1RaJUT2JcZYveXGADiizGHVyGEoBs67+9vv/c6GAowp+FcVnox NZQtkTyB1TbiJWEaU Received-SPF: pass client-ip=212.227.17.11; envelope-from=lukasstraub2@web.de; helo=mout.web.de X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/20 16:42:25 X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic] X-Spam_score_int: -24 X-Spam_score: -2.5 X-Spam_bar: -- X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hailiang Zhang , "Dr. 
 David Alan Gilbert" , Juan Quintela
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

If we succeed in receiving ram state, but fail receiving the device
state, there will be a mismatch between the two.

Fix this by flushing the ram cache only after the vmstate has been
received.

Signed-off-by: Lukas Straub
Reviewed-by: zhanghailiang
---
 migration/colo.c | 1 +
 migration/ram.c  | 5 +----
 migration/ram.h  | 1 +
 3 files changed, 3 insertions(+), 4 deletions(-)

--
2.20.1

diff --git a/migration/colo.c b/migration/colo.c
index 6b2ad35aa4..2947363ae5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -739,6 +739,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
 
     qemu_mutex_lock_iothread();
     vmstate_loading = true;
+    colo_flush_ram_cache();
     ret = qemu_load_device_state(fb);
     if (ret < 0) {
         error_setg(errp, "COLO: load device state failed");
diff --git a/migration/ram.c b/migration/ram.c
index 04f13feb2e..5baec5fce9 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3313,7 +3313,7 @@ static bool postcopy_is_running(void)
  * Flush content of RAM cache into SVM's memory.
  * Only flush the pages that be dirtied by PVM or SVM or both.
  */
-static void colo_flush_ram_cache(void)
+void colo_flush_ram_cache(void)
 {
     RAMBlock *block = NULL;
     void *dst_host;
@@ -3585,9 +3585,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     }
     trace_ram_load_complete(ret, seq_iter);
 
-    if (!ret && migration_incoming_in_colo_state()) {
-        colo_flush_ram_cache();
-    }
     return ret;
 }
 
diff --git a/migration/ram.h b/migration/ram.h
index 5ceaff7cb4..2eeaacfa13 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -65,6 +65,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 
 /* ram cache */
 int colo_init_ram_cache(void);
+void colo_flush_ram_cache(void);
 void colo_release_ram_cache(void);
 void colo_incoming_start_dirty_log(void);
 

From patchwork Wed May 20 20:42:23 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukas Straub
X-Patchwork-Id: 11561567
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DDA41138A for ; Wed, 20 May 2020 20:56:05 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5BB4620758 for ; Wed, 20 May 2020 20:56:05 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=web.de header.i=@web.de header.b="fp/IYFg1"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5BB4620758
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=web.de
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Received: from localhost ([::1]:59434 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jbVkx-0005Y2-VJ for
patchwork-qemu-devel@patchwork.kernel.org; Wed, 20 May 2020 16:56:04 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44416) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXt-0005ou-ID for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:33 -0400 Received: from mout.web.de ([212.227.15.3]:34685) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXs-00079F-JK for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1590007345; bh=rI0siZjB1fnXQBtvS0qkV9XKqMdAQZHTmQcm9q2ms80=; h=X-UI-Sender-Class:Date:From:To:Cc:Subject:In-Reply-To:References; b=fp/IYFg1mY9KYSufwivykFYkZ3B2aZ/fxAJNGFe69WDcp2D7YR1uaALHs6HSU48rI ezq3UHtb0TVep4SIAkwcN3n822UhpgKfS8r806lo+BF110I8pSWp5r1jR05n4jrMy2 mrja6sRfVapra7W85h/0nkczFoJRoSy8/ogk3blo= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from luklap ([88.130.61.105]) by smtp.web.de (mrweb001 [213.165.67.108]) with ESMTPSA (Nemesis) id 0LshGz-1ivPZY1Ygz-012Foq; Wed, 20 May 2020 22:42:25 +0200 Date: Wed, 20 May 2020 22:42:23 +0200 From: Lukas Straub To: qemu-devel Subject: [PATCH v2 4/6] migration/colo.c: Relaunch failover even if there was an error Message-ID: <85df331bfe69661072d9f29b616f065ca261f471.1590007004.git.lukasstraub2@web.de> In-Reply-To: References: MIME-Version: 1.0 X-Provags-ID: V03:K1:L3cYkJcfcO04JSDwvaPz2lFaFK2ur+1NvpQsnqJLp6zlV6L0/mq JL2MeDqtaOUEPGh389v5NTKzCUr/A9421cqcAETznYwfIZrpRIhdiMZAjlKnTZSDuWD8hAt 4no/Y1BGx5dwFHW+3ZhduK5WDX0lY3B3/TZ5B5hrG9ORdp/V3yyD6qqgDJuXEI6G1CMc8SN OmLxzUDnOhXZWjMARVkAg== X-UI-Out-Filterresults: notjunk:1;V03:K0:H45V3UIycU0=:MhMKrNBUt9EbNQipKzwj2k qpmONqDPhBGkvRM7ts+m6UUPWKeB+0SW+Y5tPSaVDtZDsf7Gr14KqdCt8QP5NhHynvt9cqdVm 7kGRROzIUvj6DbgT1VHUdBR214kWvHge01FMZhCoM4GF7AHrDKtFRWNUpmOlJa03tAq6uMuen ujSsN8lNH/qiQsx9yWtfqkEhds37z67MdjRKp9fJ0KFO26SvUJd7MxEMj1pn3AbqAVDNAYQNj 
fFvM9Vl9vIZJSg8PtJkzHK/stPZEPHc252NN9lLcO8lWYggUlLilR60IjmxEH3I+gYSpl9BaS DpS+m/VLbJrHnutoXWMoV5BXRQuZCHcyPOlpXQXDTcvak1wrXyST0D96Mr7/N+N6lSpyBLlgl sZ+V4Aw+H0ZGw6uCFXi7vlIiaJf97oPg2UmROVMYZruOBC6q3Zhaz6vQurvlekDzYAXJhj9Uz ByHXtilUJ69hABaTs4BwKaqf9Grd+DnnRDxpC+fY4A1p6xTVUJGS9eKbpbAhboItoLRCEnobE KWP6elnGdpl/1mvPuyJN37GZ4vllG3N6nbVfuOGA7Xl84nSRR8CVGhuFBESTUTgODjMY9tI5D n9MAOifQF+WncHG30lujDSVteNUnimrJsvXCBC96Bwvr168c9m14XBThXp5/Tq94NsHJn3GSF kY7nRHCKrSDCrfKpd0/5rMikQ3yy8Z85KItVCblrUGJPGT3ZGrw305YHYEwl3TbBN/gjyoXnb 15YaxD6/0vNuIsc+km2hn+0ccAy/sUfc6H6NlwcXktSQUEwfJ3s1o54YXXYYnsSHNnY+VtBZR KFVH9f3qJ6YOMdwU53q6ybaU29LiQmwq8T6z4TXlbJ0I03DAvyWgzM+MYioEOmGjxvLIn/QTt 21vPJSIP+C/NZqDwAsSOxCbNc7gxfqoC2m4oh1V/okT7CtLi8H4ApAZtNEtqRvzOXUpy1QIAu Qtlmbf/2ThtPX5shsVCWEV+D18Dx+GZ5arYCrWxT5DjdkQ/FyRojkJzqoyuD2tU+VxQcTYJWh TMVVa42KFHAx/eQ9ZqmGqpZ/83H1bS5LzCFZQLyOzHgkKq8jiJ4HWcVEcRbmfcj8ssfR0yYmz mwXPUWDiHuBiyWtyeV+koYiF+Wvfs0KWuCfB4iv5nis5jFZfBupVH8ADjp39oHLIg11scuv3a btXAZC0zke785hcP0jHucM3z6OjLDKbNAVAFuPiVARoGAn0EjLpLdiQPsXfYt/HTwsfNNNNA9 5C3pQSYe9sorHV0rq Received-SPF: pass client-ip=212.227.15.3; envelope-from=lukasstraub2@web.de; helo=mout.web.de X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/20 16:42:20 X-ACL-Warn: Detected OS = Linux 3.11 and newer X-Spam_score_int: -24 X-Spam_score: -2.5 X-Spam_bar: -- X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hailiang Zhang , "Dr. 
 David Alan Gilbert" , Juan Quintela
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

If vmstate_loading is true, secondary_vm_do_failover will set failover
status to FAILOVER_STATUS_RELAUNCH and return success without
initiating failover. However, if there is an error during the
vmstate_loading section, failover isn't relaunched. Instead we then
wait for failover on colo_incoming_sem.

Fix this by relaunching failover even if there was an error. Also, to
make this work properly, set vmstate_loading to false when returning
during the vmstate_loading section.

Signed-off-by: Lukas Straub
Reviewed-by: zhanghailiang
---
 migration/colo.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

--
2.20.1

diff --git a/migration/colo.c b/migration/colo.c
index 2947363ae5..a69782efc5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -743,6 +743,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
     ret = qemu_load_device_state(fb);
     if (ret < 0) {
         error_setg(errp, "COLO: load device state failed");
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
         return;
     }
@@ -751,6 +752,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
     replication_get_error_all(&local_err);
     if (local_err) {
         error_propagate(errp, local_err);
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
         return;
     }
@@ -759,6 +761,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
     replication_do_checkpoint_all(&local_err);
     if (local_err) {
         error_propagate(errp, local_err);
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
         return;
     }
@@ -770,6 +773,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
 
     if (local_err) {
         error_propagate(errp, local_err);
+        vmstate_loading = false;
         qemu_mutex_unlock_iothread();
         return;
     }
@@ -780,9 +784,6 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
     qemu_mutex_unlock_iothread();
 
     if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
-        failover_set_state(FAILOVER_STATUS_RELAUNCH,
-                        FAILOVER_STATUS_NONE);
-        failover_request_active(NULL);
         return;
     }
 
@@ -881,6 +882,14 @@ void *colo_process_incoming_thread(void *opaque)
             error_report_err(local_err);
             break;
         }
+
+        if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
+            failover_set_state(FAILOVER_STATUS_RELAUNCH,
+                            FAILOVER_STATUS_NONE);
+            failover_request_active(NULL);
+            break;
+        }
+
         if (failover_get_state() != FAILOVER_STATUS_NONE) {
             error_report("failover request");
             break;
@@ -888,8 +897,6 @@ void *colo_process_incoming_thread(void *opaque)
     }
 
 out:
-    vmstate_loading = false;
-
     /*
      * There are only two reasons we can get here, some error happened
      * or the user triggered failover.

From patchwork Wed May 20 20:42:27 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukas Straub
X-Patchwork-Id: 11561573
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 70D0D913 for ; Wed, 20 May 2020 21:01:43 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 06A342075F for ; Wed, 20 May 2020 21:01:42 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=web.de header.i=@web.de header.b="DiFgKPm4"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 06A342075F
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=web.de
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Received: from localhost ([::1]:33616 helo=lists1p.gnu.org) by lists.gnu.org with
esmtp (Exim 4.90_1) (envelope-from ) id 1jbVqP-0000Tt-PO for patchwork-qemu-devel@patchwork.kernel.org; Wed, 20 May 2020 17:01:42 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44428) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXz-0005sn-GY for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:39 -0400 Received: from mout.web.de ([217.72.192.78]:53935) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVXy-00079c-KR for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1590007349; bh=uxF+8MNykSBGMimYpB4teFNIItz1JuyXzJm8wPg7yRU=; h=X-UI-Sender-Class:Date:From:To:Cc:Subject:In-Reply-To:References; b=DiFgKPm4OQWQNrHbt9Z2CBtrRkdiYixfP7azjeD3swF48Zvb5n2AmYA/KY4IJXc80 4WYI6BRefwfU2VGvlWyrIuGNN2PMePaNiXUJ27E5dZZLo/LBc84FgZR25GbR8jyveI XkBTuXMolgP+9APGpiRbVqyPBhEqAY/QP0AH6hGo= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from luklap ([88.130.61.105]) by smtp.web.de (mrweb103 [213.165.67.124]) with ESMTPSA (Nemesis) id 0LqDYi-1j6YVR2bw0-00dmrV; Wed, 20 May 2020 22:42:29 +0200 Date: Wed, 20 May 2020 22:42:27 +0200 From: Lukas Straub To: qemu-devel Subject: [PATCH v2 5/6] migration/colo.c: Move colo_notify_compares_event to the right place Message-ID: <8c8faa57854695ff230cd89b35449472eafa7772.1590007004.git.lukasstraub2@web.de> In-Reply-To: References: MIME-Version: 1.0 X-Provags-ID: V03:K1:tFSBr2bnyk9Zxo3laF4cPnhjWSIUpUqya9EkRpeseO6NAFhtS6J LJGuTPZpeACno+GwMS1rSmN0adqZsBsxA2K+lolwIWrL32pSB9O9iu7bKQwMwUwnEVChh65 4/qp5brkj0W92wPEOuMWaOvsB23xJMmKM562wWDsosIgBZMTYpzmAomtqQDsaHNmwBH3MrP mJj6cJaaut3ovFpBg9PtA== X-UI-Out-Filterresults: notjunk:1;V03:K0:APqpuKj75j4=:64uSZoQD/8s681u4VuoQ5b wMAK32NXUXVyE/V+ZprMjtUXgVAD63rf8dsnRLco+gbWKoH+IriOlRi+AEAH/4lS93AVOHzHi ASAqxO7MrZYUuAWdkuiBWXDLSwH86/ILJYOMa5aZLZl8z2L7fvdp8zhmH4mgMpCylyJne8OhY 
P0ndqJd+scujs6hnqHVKr6sAwSX1MBSHlY65DuuLLq55cnN+TZ5fiJ1JYEKNz/mlZlxWkYd7g NLqQRnlI7f0jjKlDFj6hmfh121+6PYEKc4LIlLxfAg1bmBXsqNNFHJ0Rs2W+ICaLBtUZzQQL2 fOycjm2EKPFXYUDo7k/I6/2ToSfTyRb9A5yFmGNMFCeZgXEEittJVU8xm6LcXYqN6y6Q40xqL nuwTPAWzNPNITE0+8rBSZaVsYLtk5RNP+OT1n/xHFLekOQhhgW2e4MgYYJS3wgxcuR9xYm2t6 LAXgYleKzp6hGYjSkT7I52lpzk2CezWWZYO9Ql626yD1egrLpQLKiNZUGpQ4dUpyTWz7PhY3M PU1J9pYj7hYewi8Wy4fhE53RUeJrTdlaTz9jyx2rnDoyz2IeFV9cKzS0qxdpxqi3sK1gJBvXF fgt3I1ZSKlSKuje+rsJodV/zSGb3QVETQ79gfYrN9O9j3DQe5RgcwtLBbP2IPePHCLozNaPb0 YQiqcBdUoI66dcqkIE0n4Sju3GssIAB82fLj2RecCySNbgAonYrVibiKZ4kb0UZkdOUPJ3qYR O6eoHhgIMHb7atmx8t8Frm+Pnttfj1mE6JHpMg5QqmaKR5JZG0ptH7TUMcqpQBEavXGKTVvz2 xnGvM/FKkbpXqdf1M1X/3bcO7GyJTys4mkNZqfbZMEzmcmfBd9V9kezFDQni40MdW18lTqxrS 5FG1FS4dQCUv6lHl+RQK2KiJtXxhb2wsm2EgZKCFk1jFLPJlOpHbu84cgFZmxKcZ3f2VjSiyb DSS4FD/q8cEEXXIqH1TRWyzO5bVrGGq+iOfe/tQLwRyGmEwOeurS6HivE7UopXkdiw9gS9Tmi zsnbQA9tPIvgOQQc365f67vH3yR4Wt+r5a/mH3EdWmU6Rsmf56nrDeQAgQ4d67UfLVCYDCtc1 hlG9cgDvMVxdAeJlq4eWGTIxkwGO/qk6kxtcYq7P052vlOhvJKrVVM2AqO5RfdZPcqNR8+otU 0btRrZVOhqa0x2jbTrBNTO0HppgDO+DD89Q/zWnS4N67YTe95LMbUfdeT2EI7WvwKwBIdtT+U 0Q1sJowOkksN8YJgq Received-SPF: pass client-ip=217.72.192.78; envelope-from=lukasstraub2@web.de; helo=mout.web.de X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/20 16:42:37 X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic] X-Spam_score_int: -24 X-Spam_score: -2.5 X-Spam_bar: -- X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hailiang Zhang , "Dr. 
David Alan Gilbert" , Juan Quintela Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" If the secondary has to failover during checkpointing, it still is in the old state (i.e. different state than primary). Thus we can't expose the primary state until after the checkpoint is sent. This fixes sporadic connection reset of client connections during failover. Signed-off-by: Lukas Straub Reviewed-by: zhanghailiang --- migration/colo.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) -- 2.20.1 diff --git a/migration/colo.c b/migration/colo.c index a69782efc5..a3fc21e86e 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -430,12 +430,6 @@ static int colo_do_checkpoint_transaction(MigrationState *s, goto out; } - qemu_event_reset(&s->colo_checkpoint_event); - colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, &local_err); - if (local_err) { - goto out; - } - /* Disable block migration */ migrate_set_block_enabled(false, &local_err); qemu_mutex_lock_iothread(); @@ -494,6 +488,12 @@ static int colo_do_checkpoint_transaction(MigrationState *s, goto out; } + qemu_event_reset(&s->colo_checkpoint_event); + colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, &local_err); + if (local_err) { + goto out; + } + colo_receive_check_message(s->rp_state.from_dst_file, COLO_MESSAGE_VMSTATE_LOADED, &local_err); if (local_err) { From patchwork Wed May 20 20:42:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukas Straub X-Patchwork-Id: 11561575 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1398590 for ; Wed, 20 May 2020 21:07:03 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.kernel.org (Postfix) with ESMTPS id 83E61207E8 for ; Wed, 20 May 2020 21:07:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=web.de header.i=@web.de header.b="d0FVuWdx" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 83E61207E8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=web.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:38334 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jbVvZ-0003Qv-9D for patchwork-qemu-devel@patchwork.kernel.org; Wed, 20 May 2020 17:07:01 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44434) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVY1-0005uR-Lf for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:41 -0400 Received: from mout.web.de ([212.227.15.14]:38821) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jbVY0-00079y-Rb for qemu-devel@nongnu.org; Wed, 20 May 2020 16:42:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1590007353; bh=c2L3X4H7w/EoaLKNHwRFS99beuVs7O0gWwWmQO5Q7pQ=; h=X-UI-Sender-Class:Date:From:To:Cc:Subject:In-Reply-To:References; b=d0FVuWdx+Rru7Spb+JtjDo8GM10kgJjuZfqUCzW8BCQY/S2m3qeuBCl7OHHrIx26N rECdYCvQ/7CCo/AsHbODlRHs7w9ZD2NaTjynq9zSe+cRPTwCqmo14pLTWab2Cxsu6p aXWKConxVwiFrssqsdnnb4M+2U4GfdK5NVVp3Ta8= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Received: from luklap ([88.130.61.105]) by smtp.web.de (mrweb003 [213.165.67.108]) with ESMTPSA (Nemesis) id 0Lj2Cs-1j3pDj0jio-00dGNo; Wed, 20 May 2020 22:42:33 +0200 Date: Wed, 20 May 2020 22:42:32 +0200 From: Lukas Straub To: qemu-devel Subject: [PATCH v2 6/6] migration/migration.c: Fix hang in ram_save_host_page 
Message-ID: In-Reply-To: References: MIME-Version: 1.0 X-Provags-ID: V03:K1:FEH/aqH+YoSjHjWD+k0jCpanSaUTVOXNqQhtejSVbx26nrKA/WI top37b28OrYTQvNghaFlVP9Q2tyY/fK4tqB3ckW7vytHjBLCMbLkNvzWhc00w1gaqhrFL0s E9xn5sB7pLr4EUbDWOqXFudXv00hdStgbai80agWqW6eC8fMxEGNF9rV6vWxwqNPs9VhriJ OEmb7fSXHIA6oKx2P31RQ== X-UI-Out-Filterresults: notjunk:1;V03:K0:qFJ1REFqSlk=:NLRU0EmffZPPsFFGSJPQYW zfJ3bro42FkzSrpDTXvn9l/gOD7Dgasvfz+AqS6gn/dpguvivTpk4RydQcnwrqmEux9HePXjc oz1Zu/1hJZjXZwpBATzMd/LZbF3FHZ1efACcKDDIRNSDx4w9r6uvv9a46wj8wMt9+s6Ab+Vnp sg2CONFEFefZ7HR0CWR9hAmiuUqxtxsihZTElzPSZ3j0TfBgyKvAnn+qu+xiKz/jhEdc5jXPn dLRwTmuLtCkYZteO6gd5KMcNDvWSzltimy3GNcYIRhJ591Yy7AaLJR/SF5jYujAdN5FKW3GkB r7JkfuRyRjm9Nss5tCjbRigjEruDLzjHnmHSwy+QwO1aPB9Hb6rPYIpfpZaC4EE8W5PKI6BMA Txe7px/bkkk1agRXaKjU4eHY0/m3iGYiBpZgv1ouVfv0v7Z98MeLvrIMBi/fXXkKfmbz25pXh uX08USFjw/xf7YfR8X9rWfsr1Q+GsUbuD+eoxCDAjGIchTw4smNSjBoKoF2fWOG681RnLe16X zl5DapvM6m8Ydemgz2usx5K3yyss6P6y6+aFdURwqTa8Hc72w4m9ESqSXmjo13df5EZJb3UBI sKnKvyDpe/mzz9C58JaMJdZooKtPK1N77dEOX49YCTgIxC5vGxSL9oDx/HKON3F0hPdfiDhGD MP+CemicD81CvrDJ3ZxwrgGDAnmGKfowybt5u8ksAFjYMtdgiyCuNMghzQe/giCRh3dgBQzEt kICL4o9RNv8L5vP12ZpD6i48KIqksMZSI+NjTA+3O1d9+J9u6RkICiHZqmOWhHc+gS3SZ7K8o QBxLKd+10Mgm8JbgOEd9kv/CuT6kXugO06DJQqhTJpaD63blVJkxwvpnuYYGW1tZRfyAwQrZ3 TGoe+yNO3I80BefDn7gIX089dBXfFU5dxY+t1TdifK269BXgeujS3qo+EnQmGLHqmFYkBZ0Kj CBohr6SMrIxqJ2WrN/tvZvc/983w89HcnuUFACpmHJEHgDXYsPHkrtoD/A43vHz+HmPKHkdKy Fm8psHrpX1Om/Da56BRVgHmpXwwIRBdZGYpPLsybfBxzsjD6qFRDmOPwqXNFaYHJ0gK9K3Dgq 3rWLN6Rya2Xq4os+zdzczluTiyb24828/F0xwG08970jDUiwW/LApQHey9LIOjQI+ySs44jp1 71hXRxXJHkJxtvRbvBTyGhLXmMtDF1neA2wnhIkEA0gicWcPQ3n6sgEdTOQwAVBpx6cLc2VvI VAOae4qd5oLvPBIcp Received-SPF: pass client-ip=212.227.15.14; envelope-from=lukasstraub2@web.de; helo=mout.web.de X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/20 16:42:25 X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic] X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, 
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_PASS=-0.001 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hailiang Zhang , "Dr. David Alan Gilbert" , Juan Quintela Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" migration_rate_limit will erroneously ratelimit a shutdown socket, which causes the migration thread to hang in ram_save_host_page if the socket is shutdown. Fix this by explicitly testing if the socket has errors or was shutdown in migration_rate_limit. Signed-off-by: Lukas Straub Reviewed-by: Dr. David Alan Gilbert --- migration/migration.c | 4 ++++ 1 file changed, 4 insertions(+) -- 2.20.1 diff --git a/migration/migration.c b/migration/migration.c index 187ac0410c..e8bd32d48c 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3347,6 +3347,10 @@ bool migration_rate_limit(void) bool urgent = false; migration_update_counters(s, now); if (qemu_file_rate_limit(s->to_dst_file)) { + + if (qemu_file_get_error(s->to_dst_file)) { + return false; + } /* * Wait for a delay to do rate limiting OR * something urgent to post the semaphore.