From patchwork Tue Feb 1 06:28:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12731436 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 40770C433EF for ; Tue, 1 Feb 2022 07:13:24 +0000 (UTC) Received: from localhost ([::1]:44062 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nEnLu-0004dd-SQ for qemu-devel@archiver.kernel.org; Tue, 01 Feb 2022 02:13:22 -0500 Received: from eggs.gnu.org ([209.51.188.92]:36438) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmg0-0003D4-2Z for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:10 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:34165) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmfs-0006ZE-0V for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:29:59 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643696994; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HbttibxrhEYFMdwUzOJM8hTkFPF7dWKiGF+1KRZ3R/w=; b=Kn3Wsp8s+8d44RaOREWeLUCwcckVH45u6k4EiypaJmjohDnFmvFHI9R1U11A4mv77OKpap NimTBjicYbG5H/Kh+RR5aeNj5wsZ12XzbIOnOCPhxs+vKJavMTlbibbHKkrb7Vh3p9lBhI s88m0Nph2iHUAeks1myu8Zpg9g3KQaQ= Received: from mail-oi1-f200.google.com (mail-oi1-f200.google.com [209.85.167.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-590-0WALIf_wNL6gH1TcOvUcRQ-1; Tue, 01 Feb 2022 01:29:53 -0500 X-MC-Unique: 0WALIf_wNL6gH1TcOvUcRQ-1 Received: by mail-oi1-f200.google.com with SMTP id ay31-20020a056808301f00b002d06e828c00so1430487oib.2 for ; Mon, 31 Jan 2022 22:29:53 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HbttibxrhEYFMdwUzOJM8hTkFPF7dWKiGF+1KRZ3R/w=; b=Pjo1spe/unSmgtizhEjMacW0jFDT8X32Zd3jk35EmIS07G/S99CwIpPi+/+mXndUen TG+cU9syekMBxnHBaK7HprW215GWlzBf1XPljVoKi32lTwCtnv9kD3PLpKzAb77CEhZz zPgl45jajVT4guuVg8hjWB1G9GyKDsOVngRRygYB19WOaZ/indswe0XNzfRLc/mi7F2R Z0OA/xl9JJymfemnWIaH+1CFrVJKP34smgwvxB9yBqnKHNKHDjHxh7XltJ8ba8GfwEW1 WdaNy/D+2e7UkAcLf412INc7Q5YrTx0LnWJYVJqtnfORMhQFkWle65qqxG3NvRkljSet lTRg== X-Gm-Message-State: AOAM5326/9HfHKq51BqUfYdXdwJMfKZkoFuyleEcUJHoXNmSsUz95nLA 2pvy1ROo8roqVkgxFX6uF1XTeSa6tllkf1IvtyOBxU3lvnGviLYb+5epTo0+KMLB3kwEjsRXMqQ CRuR173b6AMeCEwE= X-Received: by 2002:a9d:5185:: with SMTP id y5mr13409541otg.243.1643696992327; Mon, 31 Jan 2022 22:29:52 -0800 (PST) X-Google-Smtp-Source: ABdhPJwPmaapYHOCZLgLH/0QTiVP1OeK7jzyjbbTvHux9wpFgs77/mumqy6HLYuzKzsYAjVx0aI96Q== X-Received: by 2002:a9d:5185:: with SMTP id y5mr13409528otg.243.1643696992044; Mon, 31 Jan 2022 22:29:52 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f1:95e9:6da1:67bd:fdc3:e12e]) by smtp.gmail.com with ESMTPSA id l14sm14424720ooq.12.2022.01.31.22.29.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 31 Jan 2022 22:29:51 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Marc-Andr=C3=A9_Lureau?= , Paolo Bonzini , Elena Ufimtseva , Jagannathan Raman , John G Johnson , =?utf-8?q?Daniel_P=2E_Berrang?= =?utf-8?q?=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster , Fam Zheng , Peter Xu Subject: [PATCH v8 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback Date: Tue, 1 Feb 2022 03:28:59 -0300 Message-Id: <20220201062901.428838-2-leobras@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220201062901.428838-1-leobras@redhat.com> References: <20220201062901.428838-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.081, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Add flags to io_writev and introduce io_flush as optional callback to QIOChannelClass, allowing the implementation of zero copy writes by subclasses. How to use them: - Write data using qio_channel_writev*(...,QIO_CHANNEL_WRITE_FLAG_ZERO_COPY), - Wait write completion with qio_channel_flush(). Notes: As some zero copy write implementations work asynchronously, it's recommended to keep the write buffer untouched until the return of qio_channel_flush(), to avoid the risk of sending an updated buffer instead of the buffer state during write. As io_flush callback is optional, if a subclass does not implement it, then: - io_flush will return 0 without changing anything. Also, some functions like qio_channel_writev_full_all() were adapted to receive a flag parameter. That allows shared code between zero copy and non-zero copy writev, and also an easier implementation on new flags. Signed-off-by: Leonardo Bras Reviewed-by: Peter Xu Reviewed-by: Juan Quintela --- include/io/channel.h | 38 ++++++++++++++++++++- chardev/char-io.c | 2 +- hw/remote/mpqemu-link.c | 2 +- io/channel-buffer.c | 1 + io/channel-command.c | 1 + io/channel-file.c | 1 + io/channel-socket.c | 2 ++ io/channel-tls.c | 1 + io/channel-websock.c | 1 + io/channel.c | 53 +++++++++++++++++++++++------ migration/rdma.c | 1 + scsi/pr-manager-helper.c | 2 +- tests/unit/test-io-channel-socket.c | 1 + 13 files changed, 92 insertions(+), 14 deletions(-) diff --git a/include/io/channel.h b/include/io/channel.h index 88988979f8..c680ee7480 100644 --- a/include/io/channel.h +++ b/include/io/channel.h @@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass, #define QIO_CHANNEL_ERR_BLOCK -2 +#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1 + typedef enum QIOChannelFeature QIOChannelFeature; enum QIOChannelFeature { QIO_CHANNEL_FEATURE_FD_PASS, QIO_CHANNEL_FEATURE_SHUTDOWN, QIO_CHANNEL_FEATURE_LISTEN, + QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY, }; @@ -104,6 +107,7 @@ struct QIOChannelClass { size_t niov, int *fds, size_t nfds, + int flags, Error **errp); ssize_t (*io_readv)(QIOChannel *ioc, const struct iovec *iov, @@ -136,6 +140,8 @@ struct QIOChannelClass { IOHandler *io_read, IOHandler *io_write, void *opaque); + int (*io_flush)(QIOChannel *ioc, + Error **errp); }; /* General I/O handling functions */ @@ -228,6 +234,7 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc, * @niov: the length of the @iov array * @fds: an array of file handles to send * @nfds: number of file handles in @fds + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*) * @errp: pointer to a NULL-initialized error object * * Write data to the IO channel, reading it from the @@ -260,6 +267,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp); /** @@ -837,6 +845,7 @@ int qio_channel_readv_full_all(QIOChannel *ioc, * @niov: the length of the @iov array * @fds: an array of file handles to send * @nfds: number of file handles in @fds + * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*) * @errp: pointer to a NULL-initialized error object * * @@ -846,6 +855,14 @@ int qio_channel_readv_full_all(QIOChannel *ioc, * to be written, yielding from the current coroutine * if required. * + * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags, + * instead of waiting for all requested data to be written, + * this function will wait until it's all queued for writing. + * In this case, if the buffer gets changed between queueing and + * sending, the updated buffer will be sent. If this is not a + * desired behavior, it's suggested to call qio_channel_flush() + * before reusing the buffer. + * * Returns: 0 if all bytes were written, or -1 on error */ @@ -853,6 +870,25 @@ int qio_channel_writev_full_all(QIOChannel *ioc, const struct iovec *iov, size_t niov, int *fds, size_t nfds, - Error **errp); + int flags, Error **errp); + +/** + * qio_channel_flush: + * @ioc: the channel object + * @errp: pointer to a NULL-initialized error object + * + * Will block until every packet queued with + * qio_channel_writev_full() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY + * is sent, or return in case of any error. + * + * If not implemented, acts as a no-op, and returns 0. + * + * Returns -1 if any error is found, + * 1 if every send failed to use zero copy. + * 0 otherwise. + */ + +int qio_channel_flush(QIOChannel *ioc, + Error **errp); #endif /* QIO_CHANNEL_H */ diff --git a/chardev/char-io.c b/chardev/char-io.c index 8ced184160..4451128cba 100644 --- a/chardev/char-io.c +++ b/chardev/char-io.c @@ -122,7 +122,7 @@ int io_channel_send_full(QIOChannel *ioc, ret = qio_channel_writev_full( ioc, &iov, 1, - fds, nfds, NULL); + fds, nfds, 0, NULL); if (ret == QIO_CHANNEL_ERR_BLOCK) { if (offset) { return offset; diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c index 7e841820e5..e8f556bd27 100644 --- a/hw/remote/mpqemu-link.c +++ b/hw/remote/mpqemu-link.c @@ -69,7 +69,7 @@ bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp) } if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send), - fds, nfds, errp)) { + fds, nfds, 0, errp)) { ret = true; } else { trace_mpqemu_send_io_error(msg->cmd, msg->size, nfds); diff --git a/io/channel-buffer.c b/io/channel-buffer.c index baa4e2b089..bf52011be2 100644 --- a/io/channel-buffer.c +++ b/io/channel-buffer.c @@ -81,6 +81,7 @@ static ssize_t qio_channel_buffer_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc); diff --git a/io/channel-command.c b/io/channel-command.c index 338da73ade..54560464ae 100644 --- a/io/channel-command.c +++ b/io/channel-command.c @@ -258,6 +258,7 @@ static ssize_t qio_channel_command_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc); diff --git a/io/channel-file.c b/io/channel-file.c index d7cf6d278f..ef6807a6be 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -114,6 +114,7 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); diff --git a/io/channel-socket.c b/io/channel-socket.c index 459922c874..b2d254ef8d 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -525,6 +525,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); @@ -620,6 +621,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); diff --git a/io/channel-tls.c b/io/channel-tls.c index 2ae1b92fc0..4ce890a538 100644 --- a/io/channel-tls.c +++ b/io/channel-tls.c @@ -301,6 +301,7 @@ static ssize_t qio_channel_tls_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc); diff --git a/io/channel-websock.c b/io/channel-websock.c index 70889bb54d..035dd6075b 100644 --- a/io/channel-websock.c +++ b/io/channel-websock.c @@ -1127,6 +1127,7 @@ static ssize_t qio_channel_websock_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc); diff --git a/io/channel.c b/io/channel.c index e8b019dc36..b8b99fdc4c 100644 --- a/io/channel.c +++ b/io/channel.c @@ -72,18 +72,32 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc); - if ((fds || nfds) && - !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) { + if (fds || nfds) { + if (!qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS)) { + error_setg_errno(errp, EINVAL, + "Channel does not support file descriptor passing"); + return -1; + } + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) { + error_setg_errno(errp, EINVAL, + "Zero Copy does not support file descriptor passing"); + return -1; + } + } + + if ((flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) && + !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) { error_setg_errno(errp, EINVAL, - "Channel does not support file descriptor passing"); + "Requested Zero Copy feature is not available"); return -1; } - return klass->io_writev(ioc, iov, niov, fds, nfds, errp); + return klass->io_writev(ioc, iov, niov, fds, nfds, flags, errp); } @@ -217,14 +231,14 @@ int qio_channel_writev_all(QIOChannel *ioc, size_t niov, Error **errp) { - return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp); + return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, 0, errp); } int qio_channel_writev_full_all(QIOChannel *ioc, const struct iovec *iov, size_t niov, int *fds, size_t nfds, - Error **errp) + int flags, Error **errp) { int ret = -1; struct iovec *local_iov = g_new(struct iovec, niov); @@ -235,10 +249,16 @@ int qio_channel_writev_full_all(QIOChannel *ioc, iov, niov, 0, iov_size(iov, niov)); + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) { + assert(fds == NULL && nfds == 0); + } + while (nlocal_iov > 0) { ssize_t len; - len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds, - errp); + + len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, + nfds, flags, errp); + if (len == QIO_CHANNEL_ERR_BLOCK) { if (qemu_in_coroutine()) { qio_channel_yield(ioc, G_IO_OUT); @@ -277,7 +297,7 @@ ssize_t qio_channel_writev(QIOChannel *ioc, size_t niov, Error **errp) { - return qio_channel_writev_full(ioc, iov, niov, NULL, 0, errp); + return qio_channel_writev_full(ioc, iov, niov, NULL, 0, 0, errp); } @@ -297,7 +317,7 @@ ssize_t qio_channel_write(QIOChannel *ioc, Error **errp) { struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen }; - return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, errp); + return qio_channel_writev_full(ioc, &iov, 1, NULL, 0, 0, errp); } @@ -473,6 +493,19 @@ off_t qio_channel_io_seek(QIOChannel *ioc, return klass->io_seek(ioc, offset, whence, errp); } +int qio_channel_flush(QIOChannel *ioc, + Error **errp) +{ + QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc); + + if (!klass->io_flush || + !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) { + return 0; + } + + return klass->io_flush(ioc, errp); +} + static void qio_channel_restart_read(void *opaque) { diff --git a/migration/rdma.c b/migration/rdma.c index c7c7a38487..ed2a2a013b 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2833,6 +2833,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc, size_t niov, int *fds, size_t nfds, + int flags, Error **errp) { QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc); diff --git a/scsi/pr-manager-helper.c b/scsi/pr-manager-helper.c index 451c7631b7..3be52a98d5 100644 --- a/scsi/pr-manager-helper.c +++ b/scsi/pr-manager-helper.c @@ -77,7 +77,7 @@ static int pr_manager_helper_write(PRManagerHelper *pr_mgr, iov.iov_base = (void *)buf; iov.iov_len = sz; n_written = qio_channel_writev_full(QIO_CHANNEL(pr_mgr->ioc), &iov, 1, - nfds ? &fd : NULL, nfds, errp); + nfds ? &fd : NULL, nfds, 0, errp); if (n_written <= 0) { assert(n_written != QIO_CHANNEL_ERR_BLOCK); diff --git a/tests/unit/test-io-channel-socket.c b/tests/unit/test-io-channel-socket.c index c49eec1f03..6713886d02 100644 --- a/tests/unit/test-io-channel-socket.c +++ b/tests/unit/test-io-channel-socket.c @@ -444,6 +444,7 @@ static void test_io_channel_unix_fd_pass(void) G_N_ELEMENTS(iosend), fdsend, G_N_ELEMENTS(fdsend), + 0, &error_abort); qio_channel_readv_full(dst, From patchwork Tue Feb 1 06:29:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12731416 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0FE2AC433F5 for ; Tue, 1 Feb 2022 06:50:44 +0000 (UTC) Received: from localhost ([::1]:36468 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nEmzz-0006cI-QS for qemu-devel@archiver.kernel.org; Tue, 01 Feb 2022 01:50:43 -0500 Received: from eggs.gnu.org ([209.51.188.92]:36448) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmg1-0003D6-Bi for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:10 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:51502) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmfu-0006ZZ-Rt for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:01 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643696998; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dn99Mnb5I9sYceQ+hc1e1JkyjO/jqGS1k9CFTPCCIMk=; b=iHdayhv2outnTFWo37i8cgpW/jQ6jfAkrMZNkn6ZlTOfkXrS3Q7eqASrH4t9TLaBD+G261 Kz77KNEqoBcEwjP2oVCxbX+KbhgFWfvkch5qUX76XpkjQGtqDGwIw2D9Sr/BEWf54mC4ah WXvGTvRTSatejacJSfRWFYSG4e+hkLs= Received: from mail-oo1-f70.google.com (mail-oo1-f70.google.com [209.85.161.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-281-HkE_hmFXP8iDzn5JjNTV3A-1; Tue, 01 Feb 2022 01:29:57 -0500 X-MC-Unique: HkE_hmFXP8iDzn5JjNTV3A-1 Received: by mail-oo1-f70.google.com with SMTP id e6-20020a4ab146000000b002df82c0ed9aso6158212ooo.21 for ; Mon, 31 Jan 2022 22:29:56 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=dn99Mnb5I9sYceQ+hc1e1JkyjO/jqGS1k9CFTPCCIMk=; b=YGJLqKczGPDHbd0ZIk227t3EdMDPZDa+zhroXNUhP+5eCnwDNMnUFoUE2fGG6h1vAp DuGv2Pxm/bwBh9g0ZT9SBE3HLoYDpivPp8HVqqPut4l1XRvequxTsqRjCB+xQ1i+OKeC McKP7pn4SAyQ4GPYIq7r7QbYHGDNBKqVudSw8/UljSo+CtVh+RtFhpGf0F2QCq356ORu VFk18Tnyc9cKazVLvKv+00smpU/fob2cd4i1W8gaq2BYfbdoKDxJCSB1O/Nf30sZTx5w DSFSptkurw18QwQzPjJxofr+nmuzm59+Au2v2aMc5h5VJHsTm0wYq1lXlpHsq8EJZ9aQ 3M0A== X-Gm-Message-State: AOAM532OBmlL8x0+uLXz2U9zbCSRlDf+9jkcCnjrBGcJkC57lCSDWH+T 53kS5KISW3Q5fbnY4vqX4pXKq9vkXBbEO39/RfRPe7dVq3j82PxABqb5knXujNiajbeIcUj0EUf XuUawbfGGDMlBIiA= X-Received: by 2002:a54:4803:: with SMTP id j3mr318018oij.279.1643696996231; Mon, 31 Jan 2022 22:29:56 -0800 (PST) X-Google-Smtp-Source: ABdhPJyIMCt5ngDc6bg9+mUG5nscj4lcgWiiG1m3jXORfjyicn8neZIkM+KnpLsHOPUYZrXv33p6Lw== X-Received: by 2002:a54:4803:: with SMTP id j3mr318007oij.279.1643696996050; Mon, 31 Jan 2022 22:29:56 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f1:95e9:6da1:67bd:fdc3:e12e]) by smtp.gmail.com with ESMTPSA id l14sm14424720ooq.12.2022.01.31.22.29.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 31 Jan 2022 22:29:55 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Marc-Andr=C3=A9_Lureau?= , Paolo Bonzini , Elena Ufimtseva , Jagannathan Raman , John G Johnson , =?utf-8?q?Daniel_P=2E_Berrang?= =?utf-8?q?=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster , Fam Zheng , Peter Xu Subject: [PATCH v8 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX Date: Tue, 1 Feb 2022 03:29:00 -0300 Message-Id: <20220201062901.428838-3-leobras@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220201062901.428838-1-leobras@redhat.com> References: <20220201062901.428838-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.081, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" For CONFIG_LINUX, implement the new zero copy flag and the optional callback io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY feature is available in the host kernel, which is checked on qio_channel_socket_connect_sync() qio_channel_socket_flush() was implemented by counting how many times sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the socket's error queue, in order to find how many of them finished sending. Flush will loop until those counters are the same, or until some error occurs. Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY: 1: Buffer - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid copying, some caution is necessary to avoid overwriting any buffer before it's sent. If something like this happen, a newer version of the buffer may be sent instead. - If this is a problem, it's recommended to call qio_channel_flush() before freeing or re-using the buffer. 2: Locked memory - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, and unlocked after it's sent. - Depending on the size of each buffer, and how often it's sent, it may require a larger amount of locked memory than usually available to non-root user. - If the required amount of locked memory is not available, writev_zero_copy will return an error, which can abort an operation like migration, - Because of this, when an user code wants to add zero copy as a feature, it requires a mechanism to disable it, so it can still be accessible to less privileged users. Signed-off-by: Leonardo Bras Reviewed-by: Peter Xu Reviewed-by: Daniel P. Berrangé Reviewed-by: Juan Quintela --- include/io/channel-socket.h | 2 + io/channel-socket.c | 108 ++++++++++++++++++++++++++++++++++-- 2 files changed, 106 insertions(+), 4 deletions(-) diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h index e747e63514..513c428fe4 100644 --- a/include/io/channel-socket.h +++ b/include/io/channel-socket.h @@ -47,6 +47,8 @@ struct QIOChannelSocket { socklen_t localAddrLen; struct sockaddr_storage remoteAddr; socklen_t remoteAddrLen; + ssize_t zero_copy_queued; + ssize_t zero_copy_sent; }; diff --git a/io/channel-socket.c b/io/channel-socket.c index b2d254ef8d..155a0a2ada 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -26,6 +26,10 @@ #include "io/channel-watch.h" #include "trace.h" #include "qapi/clone-visitor.h" +#ifdef CONFIG_LINUX +#include +#include +#endif #define SOCKET_MAX_FDS 16 @@ -55,6 +59,8 @@ qio_channel_socket_new(void) sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); sioc->fd = -1; + sioc->zero_copy_queued = 0; + sioc->zero_copy_sent = 0; ioc = QIO_CHANNEL(sioc); qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); @@ -154,6 +160,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc, return -1; } +#ifdef CONFIG_LINUX + int ret, v = 1; + ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v)); + if (ret == 0) { + /* Zero copy available on host */ + qio_channel_set_feature(QIO_CHANNEL(ioc), + QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY); + } +#endif + return 0; } @@ -534,6 +550,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)]; size_t fdsize = sizeof(int) * nfds; struct cmsghdr *cmsg; + int sflags = 0; memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)); @@ -558,15 +575,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, memcpy(CMSG_DATA(cmsg), fds, fdsize); } + if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) { + sflags = MSG_ZEROCOPY; + } + retry: - ret = sendmsg(sioc->fd, &msg, 0); + ret = sendmsg(sioc->fd, &msg, sflags); if (ret <= 0) { - if (errno == EAGAIN) { + switch (errno) { + case EAGAIN: return QIO_CHANNEL_ERR_BLOCK; - } - if (errno == EINTR) { + case EINTR: goto retry; + case ENOBUFS: + if (sflags & MSG_ZEROCOPY) { + error_setg_errno(errp, errno, + "Process can't lock enough memory for using MSG_ZEROCOPY"); + return -1; + } + break; } + error_setg_errno(errp, errno, "Unable to write to socket"); return -1; @@ -660,6 +689,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc, } #endif /* WIN32 */ + +#ifdef CONFIG_LINUX +static int qio_channel_socket_flush(QIOChannel *ioc, + Error **errp) +{ + QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); + struct msghdr msg = {}; + struct sock_extended_err *serr; + struct cmsghdr *cm; + char control[CMSG_SPACE(sizeof(*serr))]; + int received; + int ret = 1; + + msg.msg_control = control; + msg.msg_controllen = sizeof(control); + memset(control, 0, sizeof(control)); + + while (sioc->zero_copy_sent < sioc->zero_copy_queued) { + received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE); + if (received < 0) { + switch (errno) { + case EAGAIN: + /* Nothing on errqueue, wait until something is available */ + qio_channel_wait(ioc, G_IO_ERR); + continue; + case EINTR: + continue; + default: + error_setg_errno(errp, errno, + "Unable to read errqueue"); + return -1; + } + } + + cm = CMSG_FIRSTHDR(&msg); + if (cm->cmsg_level != SOL_IP && + cm->cmsg_type != IP_RECVERR) { + error_setg_errno(errp, EPROTOTYPE, + "Wrong cmsg in errqueue"); + return -1; + } + + serr = (void *) CMSG_DATA(cm); + if (serr->ee_errno != SO_EE_ORIGIN_NONE) { + error_setg_errno(errp, serr->ee_errno, + "Error on socket"); + return -1; + } + if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) { + error_setg_errno(errp, serr->ee_origin, + "Error not from zero copy"); + return -1; + } + + /* No errors, count successfully finished sendmsg()*/ + sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1; + + /* If any sendmsg() succeeded using zero copy, return 0 at the end */ + if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) { + ret = 0; + } + } + + return ret; +} + +#endif /* CONFIG_LINUX */ + static int qio_channel_socket_set_blocking(QIOChannel *ioc, bool enabled, @@ -790,6 +887,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass, ioc_klass->io_set_delay = qio_channel_socket_set_delay; ioc_klass->io_create_watch = qio_channel_socket_create_watch; ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler; +#ifdef CONFIG_LINUX + ioc_klass->io_flush = qio_channel_socket_flush; +#endif } static const TypeInfo qio_channel_socket_info = { From patchwork Tue Feb 1 06:29:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12731437 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7D692C433F5 for ; Tue, 1 Feb 2022 07:13:28 +0000 (UTC) Received: from localhost ([::1]:44106 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nEnLz-0004fd-JC for qemu-devel@archiver.kernel.org; Tue, 01 Feb 2022 02:13:27 -0500 Received: from eggs.gnu.org ([209.51.188.92]:36494) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmg4-0003DD-Ef for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:15 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:24545) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmfz-0006aC-Ee for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643697002; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0G9Jevgvj0WBxMB1WHHXftcWhExC84ddWV5f4LyQR/U=; b=K//mA8kpzYV89oTf6CbYWJ+PkmK5YmEgucRBVeeYukvhAwaD/PNNwdkNxJq6g77qDhcP6z LreJTf7KJgFS76Y3Mdfj2qsqRvYQx39PyW8TmoBIKrnNUHge2eyTWfxxuGOHcfg5Nr3pbB STCM+k4OKpNhtOnk4SuQr+j+nmGnD98= Received: from mail-oo1-f70.google.com (mail-oo1-f70.google.com [209.85.161.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-208-l-92H-VCOrSNJqYcOxQ07g-1; Tue, 01 Feb 2022 01:30:01 -0500 X-MC-Unique: l-92H-VCOrSNJqYcOxQ07g-1 Received: by mail-oo1-f70.google.com with SMTP id c124-20020a4a4f82000000b002fb4087a29fso2640351oob.20 for ; Mon, 31 Jan 2022 22:30:01 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=0G9Jevgvj0WBxMB1WHHXftcWhExC84ddWV5f4LyQR/U=; b=Ho828RJIZOCg53q3w4UU2Mz0sy8uHAIEEIgAcxh9zmXevnXZ/261xRZZ1ofgVSFDj6 X1MMHFX4e5h7QIbWKyI7KCPgMDiJL6EcCCOgVJbHuzjIi2FblGrQ17RCqyq2Os98rHIq VWzk3BlwlVz9+/nBJxK+rjsvmJ6ZOYHgGo4AnjGV8hR+T3Ct1pNYKb1V0xTkoLbzAy/a AsYcBr59INyJPKhvAves/z/Ua9HpXbs5tHpv9k/NokMhdb2lICtGDfYWgwoS/DmXWHgw KnP80qnWVSHNPupoHiS02IesdzGX5U+UPGEMmJqSVKDgYTxnXTpAmhByEb5gW5Horjv9 QeHg== X-Gm-Message-State: AOAM533waBlkAEoSf/97HJrKYRDu6s2DnXY/QfU7yubgUwWFOX4Diz8Y 0IM9IQh1Own8+DyZSY1h5+Td+MqH24ZZLXUNfPAJDPbG5MbNCKkmg4LALJODWrDY8qW5E62s7d3 gzfx4aO21SAl1ztA= X-Received: by 2002:a05:6808:1583:: with SMTP id t3mr321164oiw.248.1643697000244; Mon, 31 Jan 2022 22:30:00 -0800 (PST) X-Google-Smtp-Source: ABdhPJxpMWe4YJbWwiMzoj2z5Sap9AtH7rw+7u3i+hN+o7uX4eiD5wn68X2R1ShYJ6Cu2MTzDivVWA== X-Received: by 2002:a05:6808:1583:: with SMTP id t3mr321157oiw.248.1643697000002; Mon, 31 Jan 2022 22:30:00 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f1:95e9:6da1:67bd:fdc3:e12e]) by smtp.gmail.com with ESMTPSA id l14sm14424720ooq.12.2022.01.31.22.29.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 31 Jan 2022 22:29:59 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Marc-Andr=C3=A9_Lureau?= , Paolo Bonzini , Elena Ufimtseva , Jagannathan Raman , John G Johnson , =?utf-8?q?Daniel_P=2E_Berrang?= =?utf-8?q?=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster , Fam Zheng , Peter Xu Subject: [PATCH v8 3/5] migration: Add zero-copy-send parameter for QMP/HMP for Linux Date: Tue, 1 Feb 2022 03:29:01 -0300 Message-Id: <20220201062901.428838-4-leobras@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220201062901.428838-1-leobras@redhat.com> References: <20220201062901.428838-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.081, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Add property that allows zero-copy migration of memory pages on the sending side, and also includes a helper function migrate_use_zero_copy_send() to check if it's enabled. No code is introduced to actually do the migration, but it allow future implementations to enable/disable this feature. On non-Linux builds this parameter is compiled-out. Signed-off-by: Leonardo Bras Reviewed-by: Peter Xu Reviewed-by: Daniel P. Berrangé Reviewed-by: Juan Quintela --- qapi/migration.json | 24 ++++++++++++++++++++++++ migration/migration.h | 5 +++++ migration/migration.c | 32 ++++++++++++++++++++++++++++++++ migration/socket.c | 5 +++++ monitor/hmp-cmds.c | 6 ++++++ 5 files changed, 72 insertions(+) diff --git a/qapi/migration.json b/qapi/migration.json index 5975a0e104..5b4753b5de 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -741,6 +741,13 @@ # will consume more CPU. # Defaults to 1. (Since 5.0) # +# @zero-copy-send: Controls behavior on sending memory pages on migration. +# When true, enables a zero-copy mechanism for sending memory +# pages, if host supports it. +# Requires that QEMU be permitted to use locked memory for guest +# RAM pages. +# Defaults to false. (Since 7.1) +# # @block-bitmap-mapping: Maps block nodes and bitmaps on them to # aliases for the purpose of dirty bitmap migration. Such # aliases may for example be the corresponding names on the @@ -780,6 +787,7 @@ 'xbzrle-cache-size', 'max-postcopy-bandwidth', 'max-cpu-throttle', 'multifd-compression', 'multifd-zlib-level' ,'multifd-zstd-level', + { 'name': 'zero-copy-send', 'if' : 'CONFIG_LINUX'}, 'block-bitmap-mapping' ] } ## @@ -906,6 +914,13 @@ # will consume more CPU. # Defaults to 1. (Since 5.0) # +# @zero-copy-send: Controls behavior on sending memory pages on migration. +# When true, enables a zero-copy mechanism for sending memory +# pages, if host supports it. +# Requires that QEMU be permitted to use locked memory for guest +# RAM pages. +# Defaults to false. (Since 7.1) +# # @block-bitmap-mapping: Maps block nodes and bitmaps on them to # aliases for the purpose of dirty bitmap migration. Such # aliases may for example be the corresponding names on the @@ -960,6 +975,7 @@ '*multifd-compression': 'MultiFDCompression', '*multifd-zlib-level': 'uint8', '*multifd-zstd-level': 'uint8', + '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' }, '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } ## @@ -1106,6 +1122,13 @@ # will consume more CPU. # Defaults to 1. (Since 5.0) # +# @zero-copy-send: Controls behavior on sending memory pages on migration. +# When true, enables a zero-copy mechanism for sending memory +# pages, if host supports it. +# Requires that QEMU be permitted to use locked memory for guest +# RAM pages. +# Defaults to false. (Since 7.1) +# # @block-bitmap-mapping: Maps block nodes and bitmaps on them to # aliases for the purpose of dirty bitmap migration. Such # aliases may for example be the corresponding names on the @@ -1158,6 +1181,7 @@ '*multifd-compression': 'MultiFDCompression', '*multifd-zlib-level': 'uint8', '*multifd-zstd-level': 'uint8', + '*zero-copy-send': { 'type': 'bool', 'if': 'CONFIG_LINUX' }, '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } ## diff --git a/migration/migration.h b/migration/migration.h index 8130b703eb..4cbc901ea0 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -339,6 +339,11 @@ MultiFDCompression migrate_multifd_compression(void); int migrate_multifd_zlib_level(void); int migrate_multifd_zstd_level(void); +#ifdef CONFIG_LINUX +bool migrate_use_zero_copy_send(void); +#else +#define migrate_use_zero_copy_send() (false) +#endif int migrate_use_xbzrle(void); uint64_t migrate_xbzrle_cache_size(void); bool migrate_colo_enabled(void); diff --git a/migration/migration.c b/migration/migration.c index bcc385b94b..1b3230a97b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -893,6 +893,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp) params->multifd_zlib_level = s->parameters.multifd_zlib_level; params->has_multifd_zstd_level = true; params->multifd_zstd_level = s->parameters.multifd_zstd_level; +#ifdef CONFIG_LINUX + params->has_zero_copy_send = true; + params->zero_copy_send = s->parameters.zero_copy_send; +#endif params->has_xbzrle_cache_size = true; params->xbzrle_cache_size = s->parameters.xbzrle_cache_size; params->has_max_postcopy_bandwidth = true; @@ -1549,6 +1553,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params, if (params->has_multifd_compression) { dest->multifd_compression = params->multifd_compression; } +#ifdef CONFIG_LINUX + if (params->has_zero_copy_send) { + dest->zero_copy_send = params->zero_copy_send; + } +#endif if (params->has_xbzrle_cache_size) { dest->xbzrle_cache_size = params->xbzrle_cache_size; } @@ -1661,6 +1670,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) if (params->has_multifd_compression) { s->parameters.multifd_compression = params->multifd_compression; } +#ifdef CONFIG_LINUX + if (params->has_zero_copy_send) { + s->parameters.zero_copy_send = params->zero_copy_send; + } +#endif if (params->has_xbzrle_cache_size) { s->parameters.xbzrle_cache_size = params->xbzrle_cache_size; xbzrle_cache_resize(params->xbzrle_cache_size, errp); @@ -2551,6 +2565,17 @@ int migrate_multifd_zstd_level(void) return s->parameters.multifd_zstd_level; } +#ifdef CONFIG_LINUX +bool migrate_use_zero_copy_send(void) +{ + MigrationState *s; + + s = migrate_get_current(); + + return s->parameters.zero_copy_send; +} +#endif + int migrate_use_xbzrle(void) { MigrationState *s; @@ -4194,6 +4219,10 @@ static Property migration_properties[] = { DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState, parameters.multifd_zstd_level, DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL), +#ifdef CONFIG_LINUX + DEFINE_PROP_BOOL("zero_copy_send", MigrationState, + parameters.zero_copy_send, false), +#endif DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState, parameters.xbzrle_cache_size, DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE), @@ -4291,6 +4320,9 @@ static void migration_instance_init(Object *obj) params->has_multifd_compression = true; params->has_multifd_zlib_level = true; params->has_multifd_zstd_level = true; +#ifdef CONFIG_LINUX + params->has_zero_copy_send = true; +#endif params->has_xbzrle_cache_size = true; params->has_max_postcopy_bandwidth = true; params->has_max_cpu_throttle = true; diff --git a/migration/socket.c b/migration/socket.c index 05705a32d8..f586f983b7 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task, } else { trace_migration_socket_outgoing_connected(data->hostname); } + + if (migrate_use_zero_copy_send()) { + error_setg(&err, "Zero copy send not available in migration"); + } + migration_channel_connect(data->s, sioc, data->hostname, err); object_unref(OBJECT(sioc)); } diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index 8c384dc1b2..abd566d8cb 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -1309,6 +1309,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) p->has_multifd_zstd_level = true; visit_type_uint8(v, param, &p->multifd_zstd_level, &err); break; +#ifdef CONFIG_LINUX + case MIGRATION_PARAMETER_ZERO_COPY_SEND: + p->has_zero_copy_send = true; + visit_type_bool(v, param, &p->zero_copy_send, &err); + break; +#endif case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE: p->has_xbzrle_cache_size = true; if (!visit_type_size(v, param, &cache_size, &err)) { From patchwork Tue Feb 1 06:29:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12731415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 50BCFC433EF for ; Tue, 1 Feb 2022 06:50:17 +0000 (UTC) Received: from localhost ([::1]:36430 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nEmzW-0006as-50 for qemu-devel@archiver.kernel.org; Tue, 01 Feb 2022 01:50:14 -0500 Received: from eggs.gnu.org ([209.51.188.92]:36568) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmg7-0003DK-9G for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:17 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:44220) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmg2-0006cL-Dm for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:10 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643697005; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=obEhyfdKeFZPz947EhpcCq9pBreTXhzcDGdVnISAPPE=; b=B2cs0xTE+aauvG46GYYUF9qG1N8QMpamlhuby+2ONwEsQrCA8ydYsUNuEtiBUUnoaeHZqC LeqH+5j228FpaUFg/KcMjO5ZXCYm0PG59Jw1ok+glIKoPVAzjb7EYX4P9TYC4to97SsTy3 Nxr9NCiKj6EpTiMqJtfmCT8j2fyW+DA= Received: from mail-oo1-f70.google.com (mail-oo1-f70.google.com [209.85.161.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-641-IwNXN9xLNcCJl7boAQBr7A-1; Tue, 01 Feb 2022 01:30:04 -0500 X-MC-Unique: IwNXN9xLNcCJl7boAQBr7A-1 Received: by mail-oo1-f70.google.com with SMTP id e6-20020a4ab146000000b002df82c0ed9aso6158295ooo.21 for ; Mon, 31 Jan 2022 22:30:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=obEhyfdKeFZPz947EhpcCq9pBreTXhzcDGdVnISAPPE=; b=uVsO6FiO7x6be5v6kb8R2IoQ8sIv6myv/JOxAUK18zW6xj9zZpu4+v7oHXr2xdzGu4 j5ZddKWN6E0Kq5077lcunDQVN4EKNHALlS2FgThgVnNr2cCj4/U5SNRV1+VfaHpU7aNt 3yaUYhMt7c5sGbHsYQGoOQmMHxnjmU1OZwDdpXONuBkgHy/W8MWl5tsKojzT46ghr0aB lDStQzZfu456bvK24Ar4uCMo2Ipcerkwa2eqb443YOynNQbS3s2jcW80wEAdvbxiqwZ4 cXGMyZUXDx0s/N3ueUTicavMbZ05kMPUIriD2lbRQKEfqSJ0gUEisJxkwiqKcms8liWf k/+w== X-Gm-Message-State: AOAM532BCvvThDRo3kKxpEgSI0c4806ixAbwHR3PvyfNITmr0sL9WXTH IRg7ruvG8RwoewIhEWvdjuh3QQTe4YDWxFgiswG1btwI/WLvsZxcZEqaWllbHJBrUxac4WYsZTM 1XfRMWcRHbAkuL8I= X-Received: by 2002:a54:4812:: with SMTP id j18mr335945oij.186.1643697004194; Mon, 31 Jan 2022 22:30:04 -0800 (PST) X-Google-Smtp-Source: ABdhPJxOW3CiZJ6QsfFHylmw5lGiRr53tBmMqACfnq28L39SIB17EjLYGXemaHSgt7U79UcSNNvZJQ== X-Received: by 2002:a54:4812:: with SMTP id j18mr335929oij.186.1643697003998; Mon, 31 Jan 2022 22:30:03 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f1:95e9:6da1:67bd:fdc3:e12e]) by smtp.gmail.com with ESMTPSA id l14sm14424720ooq.12.2022.01.31.22.30.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 31 Jan 2022 22:30:03 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Marc-Andr=C3=A9_Lureau?= , Paolo Bonzini , Elena Ufimtseva , Jagannathan Raman , John G Johnson , =?utf-8?q?Daniel_P=2E_Berrang?= =?utf-8?q?=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster , Fam Zheng , Peter Xu Subject: [PATCH v8 4/5] migration: Add migrate_use_tls() helper Date: Tue, 1 Feb 2022 03:29:02 -0300 Message-Id: <20220201062901.428838-5-leobras@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220201062901.428838-1-leobras@redhat.com> References: <20220201062901.428838-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.081, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" A lot of places check parameters.tls_creds in order to evaluate if TLS is in use, and sometimes call migrate_get_current() just for that test. Add new helper function migrate_use_tls() in order to simplify testing for TLS usage. Signed-off-by: Leonardo Bras Reviewed-by: Juan Quintela Reviewed-by: Peter Xu Reviewed-by: Daniel P. Berrangé --- migration/migration.h | 1 + migration/channel.c | 3 +-- migration/migration.c | 9 +++++++++ migration/multifd.c | 5 +---- 4 files changed, 12 insertions(+), 6 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index 4cbc901ea0..debacb2251 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -344,6 +344,7 @@ bool migrate_use_zero_copy_send(void); #else #define migrate_use_zero_copy_send() (false) #endif +int migrate_use_tls(void); int migrate_use_xbzrle(void); uint64_t migrate_xbzrle_cache_size(void); bool migrate_colo_enabled(void); diff --git a/migration/channel.c b/migration/channel.c index c4fc000a1a..086b5c0d8b 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -38,8 +38,7 @@ void migration_channel_process_incoming(QIOChannel *ioc) trace_migration_set_incoming_channel( ioc, object_get_typename(OBJECT(ioc))); - if (s->parameters.tls_creds && - *s->parameters.tls_creds && + if (migrate_use_tls() && !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)) { migration_tls_channel_process_incoming(s, ioc, &local_err); diff --git a/migration/migration.c b/migration/migration.c index 1b3230a97b..3e0a25bb5b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2576,6 +2576,15 @@ bool migrate_use_zero_copy_send(void) } #endif +int migrate_use_tls(void) +{ + MigrationState *s; + + s = migrate_get_current(); + + return s->parameters.tls_creds && *s->parameters.tls_creds; +} + int migrate_use_xbzrle(void) { MigrationState *s; diff --git a/migration/multifd.c b/migration/multifd.c index 76b57a7177..43998ad117 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -784,14 +784,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc, Error *error) { - MigrationState *s = migrate_get_current(); - trace_multifd_set_outgoing_channel( ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error); if (!error) { - if (s->parameters.tls_creds && - *s->parameters.tls_creds && + if (migrate_use_tls() && !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)) { multifd_tls_channel_connect(p, ioc, &error); From patchwork Tue Feb 1 06:29:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12731463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 77716C433F5 for ; Tue, 1 Feb 2022 07:26:44 +0000 (UTC) Received: from localhost ([::1]:50964 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nEnYp-0001iF-7a for qemu-devel@archiver.kernel.org; Tue, 01 Feb 2022 02:26:43 -0500 Received: from eggs.gnu.org ([209.51.188.92]:36592) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmgA-0003DX-BD for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:18 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:25640) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nEmg6-0006dH-MZ for qemu-devel@nongnu.org; Tue, 01 Feb 2022 01:30:12 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1643697009; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9AmGAz4qTlgfGRFAixOe2SM28q3/GkCDL2jrBc1WW8c=; b=YUyfVux6W5CeSP5yTuVRWpW89PulUV5edz2l6V3WcSM11rZ32RM1z4zPf+DW1qlRbrSWCc 9CK4CrL4A3eZCQFe3gCXLrmgFo15pb3+SXYasC+L0kB/Ms7kdUgCFH2PI92mhvdg9102du nAzFpAcmwNt2QZ4Db3RTVJRg7gAwdS0= Received: from mail-oo1-f72.google.com (mail-oo1-f72.google.com [209.85.161.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-376-CmCledQVNOe-RtyAEFef1Q-1; Tue, 01 Feb 2022 01:30:09 -0500 X-MC-Unique: CmCledQVNOe-RtyAEFef1Q-1 Received: by mail-oo1-f72.google.com with SMTP id e14-20020a056820060e00b002edda9f237bso5654471oow.22 for ; Mon, 31 Jan 2022 22:30:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9AmGAz4qTlgfGRFAixOe2SM28q3/GkCDL2jrBc1WW8c=; b=HvM+GAYJw/NXLrFhmosEdAduaixc09eClDtPQiirDCnU+pezBZzgXq67Oj7wpEd6vW uhmPmC38qFj3R745mg7VworE17L81Mi8AqjXkZ4ahudLEtkc5PEcL+HQkfjX9/adJyBc BjWDwZT77zJ31zAExhe3X31X8sbw8hXDCYgaVVxg1ekmF9UyAj4oGGzY6Bm5W2O+tnGI zEb8BHB3WNZZX2cUeuQOeYr1YCXG3/WSZjL/v0zeZqy8UeoaHkhxDWxETc/OQrG3l0KD UIGVXBkU8E1rOGc5nbxwt2yAh2cYC45E/LJu+BCrwdewHatURWKTgXzr3wUbOd1GYB3v oEGg== X-Gm-Message-State: AOAM532oWazdO72dis7ETF0K3QbeVYYQS3Zdw11cCNMCuGOeUrNfKYEm 3ZQRUWRZ3LgRYzaFVq3bV4EjahgxW3dxS8oZwYOLiCM7PFS1+wNzU5ePE4O2OnwXAe3aXYwi38/ ubgD0AyDqnotOlMI= X-Received: by 2002:a9d:7341:: with SMTP id l1mr9328093otk.36.1643697008190; Mon, 31 Jan 2022 22:30:08 -0800 (PST) X-Google-Smtp-Source: ABdhPJwPXx64vx07YOTrMOjj7oaY/VtW4YwuLvzw3N5kUQlXqdw+HU6TPGIRCLb0+5vwnR9+VXA2lA== X-Received: by 2002:a9d:7341:: with SMTP id l1mr9328080otk.36.1643697007988; Mon, 31 Jan 2022 22:30:07 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f1:95e9:6da1:67bd:fdc3:e12e]) by smtp.gmail.com with ESMTPSA id l14sm14424720ooq.12.2022.01.31.22.30.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 31 Jan 2022 22:30:07 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Marc-Andr=C3=A9_Lureau?= , Paolo Bonzini , Elena Ufimtseva , Jagannathan Raman , John G Johnson , =?utf-8?q?Daniel_P=2E_Berrang?= =?utf-8?q?=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster , Fam Zheng , Peter Xu Subject: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Date: Tue, 1 Feb 2022 03:29:03 -0300 Message-Id: <20220201062901.428838-6-leobras@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220201062901.428838-1-leobras@redhat.com> References: <20220201062901.428838-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -28 X-Spam_score: -2.9 X-Spam_bar: -- X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.081, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Implement zero copy send on nocomp_send_write(), by making use of QIOChannel writev + flags & flush interface. Change multifd_send_sync_main() so flush_zero_copy() can be called after each iteration in order to make sure all dirty pages are sent before a new iteration is started. It will also flush at the beginning and at the end of migration. Also make it return -1 if flush_zero_copy() fails, in order to cancel the migration process, and avoid resuming the guest in the target host without receiving all current RAM. This will work fine on RAM migration because the RAM pages are not usually freed, and there is no problem on changing the pages content between writev_zero_copy() and the actual sending of the buffer, because this change will dirty the page and cause it to be re-sent on a next iteration anyway. A lot of locked memory may be needed in order to use multid migration with zero-copy enabled, so disabling the feature should be necessary for low-privileged users trying to perform multifd migrations. Signed-off-by: Leonardo Bras --- migration/multifd.h | 4 +++- migration/migration.c | 11 ++++++++++- migration/multifd.c | 41 +++++++++++++++++++++++++++++++++++------ migration/ram.c | 29 ++++++++++++++++++++++------- migration/socket.c | 5 +++-- 5 files changed, 73 insertions(+), 17 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index 4dda900a0b..7ec688fb4f 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -22,7 +22,7 @@ int multifd_load_cleanup(Error **errp); bool multifd_recv_all_channels_created(void); bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp); void multifd_recv_sync_main(void); -void multifd_send_sync_main(QEMUFile *f); +int multifd_send_sync_main(QEMUFile *f); int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset); /* Multifd Compression flags */ @@ -96,6 +96,8 @@ typedef struct { uint32_t packet_len; /* pointer to the packet */ MultiFDPacket_t *packet; + /* multifd flags for sending ram */ + int write_flags; /* multifd flags for each packet */ uint32_t flags; /* size of the next packet that contains pages */ diff --git a/migration/migration.c b/migration/migration.c index 3e0a25bb5b..1450fd0370 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp) error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: "); return false; } - +#ifdef CONFIG_LINUX + if (params->zero_copy_send && + (!migrate_use_multifd() || + params->multifd_compression != MULTIFD_COMPRESSION_NONE || + (params->tls_creds && *params->tls_creds))) { + error_setg(errp, + "Zero copy only available for non-compressed non-TLS multifd migration"); + return false; + } +#endif return true; } diff --git a/migration/multifd.c b/migration/multifd.c index 43998ad117..2d68b9cf4f 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -568,19 +568,28 @@ void multifd_save_cleanup(void) multifd_send_state = NULL; } -void multifd_send_sync_main(QEMUFile *f) +int multifd_send_sync_main(QEMUFile *f) { int i; + bool flush_zero_copy; if (!migrate_use_multifd()) { - return; + return 0; } if (multifd_send_state->pages->num) { if (multifd_send_pages(f) < 0) { error_report("%s: multifd_send_pages fail", __func__); - return; + return 0; } } + + /* + * When using zero-copy, it's necessary to flush after each iteration to + * make sure pages from earlier iterations don't end up replacing newer + * pages. + */ + flush_zero_copy = migrate_use_zero_copy_send(); + for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f) if (p->quit) { error_report("%s: channel %d has already quit", __func__, i); qemu_mutex_unlock(&p->mutex); - return; + return 0; } p->packet_num = multifd_send_state->packet_num++; @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f) ram_counters.transferred += p->packet_len; qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem); + + if (flush_zero_copy) { + int ret; + Error *err = NULL; + + ret = qio_channel_flush(p->c, &err); + if (ret < 0) { + error_report_err(err); + return -1; + } + } } for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f) qemu_sem_wait(&p->sem_sync); } trace_multifd_send_sync_main(multifd_send_state->packet_num); + + return 0; } static void *multifd_send_thread(void *opaque) @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque) p->iov[0].iov_len = p->packet_len; p->iov[0].iov_base = p->packet; - ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num, - &local_err); + ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL, + 0, p->write_flags, &local_err); if (ret != 0) { break; } @@ -910,6 +932,13 @@ int multifd_save_setup(Error **errp) /* We need one extra place for the packet header */ p->iov = g_new0(struct iovec, page_count + 1); p->normal = g_new0(ram_addr_t, page_count); + + if (migrate_use_zero_copy_send()) { + p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY; + } else { + p->write_flags = 0; + } + socket_send_channel_create(multifd_new_send_channel_async, p); } diff --git a/migration/ram.c b/migration/ram.c index 91ca743ac8..d8c215e43a 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2902,6 +2902,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) { RAMState **rsp = opaque; RAMBlock *block; + int ret; if (compress_threads_save_setup()) { return -1; @@ -2936,7 +2937,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) ram_control_before_iterate(f, RAM_CONTROL_SETUP); ram_control_after_iterate(f, RAM_CONTROL_SETUP); - multifd_send_sync_main(f); + ret = multifd_send_sync_main(f); + if (ret < 0) { + return ret; + } + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); qemu_fflush(f); @@ -3045,7 +3050,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) out: if (ret >= 0 && migration_is_setup_or_active(migrate_get_current()->state)) { - multifd_send_sync_main(rs->f); + ret = multifd_send_sync_main(rs->f); + if (ret < 0) { + return ret; + } + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); qemu_fflush(f); ram_transferred_add(8); @@ -3105,13 +3114,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque) ram_control_after_iterate(f, RAM_CONTROL_FINISH); } - if (ret >= 0) { - multifd_send_sync_main(rs->f); - qemu_put_be64(f, RAM_SAVE_FLAG_EOS); - qemu_fflush(f); + if (ret < 0) { + return ret; } - return ret; + ret = multifd_send_sync_main(rs->f); + if (ret < 0) { + return ret; + } + + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); + qemu_fflush(f); + + return 0; } static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, diff --git a/migration/socket.c b/migration/socket.c index f586f983b7..c49431425e 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -78,8 +78,9 @@ static void socket_outgoing_migration(QIOTask *task, trace_migration_socket_outgoing_connected(data->hostname); } - if (migrate_use_zero_copy_send()) { - error_setg(&err, "Zero copy send not available in migration"); + if (migrate_use_zero_copy_send() && + !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) { + error_setg(&err, "Zero copy send feature not detected in host kernel"); } migration_channel_connect(data->s, sioc, data->hostname, err);