From patchwork Fri Jul 30 07:40:45 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leonardo Bras
X-Patchwork-Id: 12410583
From: Leonardo Bras
To: Juan Quintela, "Dr. David Alan Gilbert", Lukas Straub
Cc: Li Xiaohui, Leonardo Bras, qemu-devel@nongnu.org
Subject: [PATCH 1/1] migration: Terminate multifd threads on yank
Date: Fri, 30 Jul 2021 04:40:45 -0300
Message-Id: <20210730074043.54260-1-leobras@redhat.com>
X-Mailer: git-send-email 2.32.0

From the source host's viewpoint, losing a connection during migration
causes the sockets to get stuck in the sendmsg() syscall, waiting for
the receiving side to reply.

In migration, yank works by shutting down the migration QIOChannel fd.
This causes a failure in the next sendmsg() for that fd, and the whole
migration gets cancelled.

In multifd, there are multiple sockets in multiple threads, so on a
connection loss there will be extra sockets stuck in sendmsg(). Because
each of those threads holds its own mutex while it is stuck, there is a
good chance the main migration thread gets stuck in multifd_send_pages()
waiting for one of those mutexes.

While it is waiting, the main migration thread can't run sendmsg() on
its fd, and therefore can't cause the migration to be cancelled, so
yank does not work.

Fix this by shutting down all migration fds (including the multifd
ones), so that no thread gets stuck in sendmsg() while holding a lock,
which allows the main migration thread to properly cancel the migration
when yank is used.

There is no need to do the same for yank to work on the receiving host,
since ops->recv_pages() is kept outside the mutex-protected code in
multifd_recv_thread().

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1970337
Reported-by: Li Xiaohui
Signed-off-by: Leonardo Bras
---
 migration/multifd.c        | 11 +++++++++++
 migration/multifd.h        |  1 +
 migration/yank_functions.c |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 377da78f5b..744a180dfe 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1040,6 +1040,17 @@ void multifd_recv_sync_main(void)
     trace_multifd_recv_sync_main(multifd_recv_state->packet_num);
 }
 
+void multifd_shutdown(void)
+{
+    if (!migrate_use_multifd()) {
+        return;
+    }
+
+    if (multifd_send_state) {
+        multifd_send_terminate_threads(NULL);
+    }
+}
+
 static void *multifd_recv_thread(void *opaque)
 {
     MultiFDRecvParams *p = opaque;
diff --git a/migration/multifd.h b/migration/multifd.h
index 8d6751f5ed..0517213bdf 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -22,6 +22,7 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
 void multifd_send_sync_main(QEMUFile *f);
 int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
+void multifd_shutdown(void);
 
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
diff --git a/migration/yank_functions.c b/migration/yank_functions.c
index 8c08aef14a..9335a64f00 100644
--- a/migration/yank_functions.c
+++ b/migration/yank_functions.c
@@ -15,12 +15,14 @@
 #include "io/channel-socket.h"
 #include "io/channel-tls.h"
 #include "qemu-file.h"
+#include "multifd.h"
 
 void migration_yank_iochannel(void *opaque)
 {
     QIOChannel *ioc = QIO_CHANNEL(opaque);
 
     qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+    multifd_shutdown();
 }
 
 /* Return whether yank is supported on this ioc */
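
Illustration (not part of the patch): the unblocking mechanism that the
commit message relies on can be sketched in a standalone program outside
QEMU. This is a minimal sketch under stated assumptions, not the patch's
code. struct channel, channel_worker() and yank_demo() are hypothetical
names, an AF_UNIX socketpair stands in for a migration socket, and the
worker blocks in recv() rather than sendmsg() because that is simpler to
provoke. It also assumes the Linux behaviour that shutdown() wakes a
thread blocked on the same socket.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for one multifd channel: an fd plus its own mutex. */
struct channel {
    int fd;
    pthread_mutex_t lock;
    ssize_t result;             /* return value of the blocked recv() */
};

/* Worker: takes the channel lock, then blocks in a socket call. */
static void *channel_worker(void *opaque)
{
    struct channel *c = opaque;
    char buf[16];

    pthread_mutex_lock(&c->lock);
    /* The peer never sends, so this blocks until the fd is shut down. */
    c->result = recv(c->fd, buf, sizeof(buf), 0);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Returns what the worker's recv() returned after shutdown() released it. */
static ssize_t yank_demo(void)
{
    int sv[2];
    struct channel c = { .fd = -1, .result = -1 };
    pthread_t tid;
    int rc;

    rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    c.fd = sv[0];
    pthread_mutex_init(&c.lock, NULL);

    rc = pthread_create(&tid, NULL, channel_worker, &c);
    assert(rc == 0);
    (void)rc;

    /* Give the worker about 100 ms to take the lock and block in recv(). */
    poll(NULL, 0, 100);

    /* The "yank": shutting down the fd makes the blocked call return. */
    shutdown(c.fd, SHUT_RDWR);

    /* Without the shutdown() above, this join would never return. */
    pthread_join(tid, NULL);

    /* The worker has dropped its lock, so it can be taken here. */
    pthread_mutex_lock(&c.lock);
    pthread_mutex_unlock(&c.lock);

    pthread_mutex_destroy(&c.lock);
    close(sv[0]);
    close(sv[1]);
    return c.result;
}
```

Calling yank_demo() from main() (built with -pthread) should return 0 on
Linux, because recv() on a shut-down stream socket reports end of file.
The pthread_join() plays the role of the main migration thread waiting
on a multifd mutex: it can only make progress once the worker's fd has
been shut down.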