From patchwork Thu Jan 6 22:13:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leonardo Bras
X-Patchwork-Id: 12705866
From: Leonardo Bras
To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr.
David Alan Gilbert" , Eric Blake , Markus Armbruster , Peter Xu
Subject: [PATCH v7 1/5] QIOChannel: Add flags on io_writev and introduce io_flush callback
Date: Thu, 6 Jan 2022 19:13:38 -0300
Message-Id: <20220106221341.8779-2-leobras@redhat.com>
In-Reply-To: <20220106221341.8779-1-leobras@redhat.com>
References: <20220106221341.8779-1-leobras@redhat.com>
Cc: Leonardo Bras , qemu-devel@nongnu.org

Add flags to io_writev and introduce io_flush as an optional callback to
QIOChannelClass, allowing the implementation of zero copy writes by
subclasses.

How to use them:
- Write data using qio_channel_writev_full_flags(...,
  QIO_CHANNEL_WRITE_FLAG_ZERO_COPY, errp),
- Wait for write completion with qio_channel_flush().

Notes:
As some zero copy implementations work asynchronously, it's
recommended to keep the write buffer untouched until the return of
qio_channel_flush(), to avoid the risk of sending an updated buffer
instead of the buffer state during write.
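
To illustrate the note above, here is a minimal self-contained sketch. It is
not QEMU code: the mock_* names, the fixed-size arrays and the toy "wire" are
all assumptions made for the example. A zero copy write only records a pointer
to the caller's buffer, so whatever the buffer holds when the flush happens is
what gets sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QIO_CHANNEL_WRITE_FLAG_ZERO_COPY. */
#define MOCK_WRITE_FLAG_ZERO_COPY 0x1
#define MOCK_MAX_PENDING 8

/* Toy channel: zero copy writes only remember a pointer to the buffer. */
struct mock_channel {
    const char *pending[MOCK_MAX_PENDING];
    size_t pending_len[MOCK_MAX_PENDING];
    size_t npending;
    char wire[64];      /* bytes the peer would receive */
    size_t wire_len;
};

/* Copy bytes onto the toy wire; returns 0, or -1 if the wire is full. */
static int mock_emit(struct mock_channel *ch, const char *buf, size_t len)
{
    if (len > sizeof(ch->wire) - ch->wire_len) {
        return -1;
    }
    memcpy(ch->wire + ch->wire_len, buf, len);
    ch->wire_len += len;
    return 0;
}

/* Send everything queued so far, reading each buffer as it is right now. */
static int mock_flush(struct mock_channel *ch)
{
    for (size_t i = 0; i < ch->npending; i++) {
        if (mock_emit(ch, ch->pending[i], ch->pending_len[i]) < 0) {
            return -1;
        }
    }
    ch->npending = 0;
    return 0;
}

/* With the flag: queue a reference and return. Without: send a copy now. */
static int mock_write(struct mock_channel *ch, const char *buf, size_t len,
                      int flags)
{
    assert(ch != NULL && buf != NULL);
    if (flags & MOCK_WRITE_FLAG_ZERO_COPY) {
        if (ch->npending == MOCK_MAX_PENDING) {
            return -1;
        }
        ch->pending[ch->npending] = buf;
        ch->pending_len[ch->npending] = len;
        ch->npending++;
        return 0;
    }
    /* Keep ordering: drain queued writes before a copying write. */
    if (mock_flush(ch) < 0) {
        return -1;
    }
    return mock_emit(ch, buf, len);
}
```

In this toy model, overwriting a buffer between mock_write() and mock_flush()
changes what reaches the wire, while overwriting it after mock_flush() does
not. That is the ordering the note recommends for the real API.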

As the io_flush callback is optional, if a subclass does not implement it, then:
- qio_channel_flush() will return 0 without changing anything.

Also, some functions like qio_channel_writev_full_all() were adapted to
receive a flags parameter. That allows code to be shared between zero copy
and non-zero copy writev, and makes new flags easier to implement.

Signed-off-by: Leonardo Bras
---
 include/io/channel.h | 67 +++++++++++++++++++++++++++++++++++---------
 io/channel-buffer.c  |  1 +
 io/channel-command.c |  1 +
 io/channel-file.c    |  1 +
 io/channel-socket.c  |  2 ++
 io/channel-tls.c     |  1 +
 io/channel-websock.c |  1 +
 io/channel.c         | 51 +++++++++++++++++++++++----------
 migration/rdma.c     |  1 +
 9 files changed, 98 insertions(+), 28 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..343766ce5b 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -32,12 +32,15 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,

 #define QIO_CHANNEL_ERR_BLOCK -2

+#define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
+
 typedef enum QIOChannelFeature QIOChannelFeature;

 enum QIOChannelFeature {
     QIO_CHANNEL_FEATURE_FD_PASS,
     QIO_CHANNEL_FEATURE_SHUTDOWN,
     QIO_CHANNEL_FEATURE_LISTEN,
+    QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
 };


@@ -104,6 +107,7 @@ struct QIOChannelClass {
                          size_t niov,
                          int *fds,
                          size_t nfds,
+                         int flags,
                          Error **errp);
     ssize_t (*io_readv)(QIOChannel *ioc,
                         const struct iovec *iov,
@@ -136,6 +140,8 @@ struct QIOChannelClass {
                                   IOHandler *io_read,
                                   IOHandler *io_write,
                                   void *opaque);
+    int (*io_flush)(QIOChannel *ioc,
+                    Error **errp);
 };

 /* General I/O handling functions */
@@ -222,12 +228,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,


 /**
- * qio_channel_writev_full:
+ * qio_channel_writev_full_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  * Write data to the IO channel, reading it from the
@@ -255,12 +262,16 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
  * or QIO_CHANNEL_ERR_BLOCK if no data is can be sent
  * and the channel is non-blocking
  */
-ssize_t qio_channel_writev_full(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds,
-                                size_t nfds,
-                                Error **errp);
+ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds,
+                                      size_t nfds,
+                                      int flags,
+                                      Error **errp);
+
+#define qio_channel_writev_full(ioc, iov, niov, fds, nfds, errp) \
+    qio_channel_writev_full_flags(ioc, iov, niov, fds, nfds, 0, errp)

 /**
  * qio_channel_readv_all_eof:
@@ -831,12 +842,13 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
                                Error **errp);

 /**
- * qio_channel_writev_full_all:
+ * qio_channel_writev_full_all_flags:
  * @ioc: the channel object
  * @iov: the array of memory regions to write data from
  * @niov: the length of the @iov array
  * @fds: an array of file handles to send
  * @nfds: number of file handles in @fds
+ * @flags: write flags (QIO_CHANNEL_WRITE_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -846,13 +858,42 @@ int qio_channel_readv_full_all(QIOChannel *ioc,
  * to be written, yielding from the current coroutine
  * if required.
  *
+ * If QIO_CHANNEL_WRITE_FLAG_ZERO_COPY is passed in flags,
+ * instead of waiting for all requested data to be written,
+ * this function will wait until it's all queued for writing.
+ * In this case, if the buffer gets changed between queueing and
+ * sending, the updated buffer will be sent. If this is not a
+ * desired behavior, it's suggested to call qio_channel_flush()
+ * before reusing the buffer.
+ *
  * Returns: 0 if all bytes were written, or -1 on error
  */
-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp);
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp);
+#define qio_channel_writev_full_all(ioc, iov, niov, fds, nfds, errp) \
+    qio_channel_writev_full_all_flags(ioc, iov, niov, fds, nfds, 0, errp)
+
+/**
+ * qio_channel_flush:
+ * @ioc: the channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Will block until every packet queued with
+ * qio_channel_writev_full_flags() + QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
+ * is sent, or return in case of any error.
+ *
+ * If not implemented, acts as a no-op, and returns 0.
+ *
+ * Returns -1 if any error is found,
+ *          1 if every send failed to use zero copy.
+ *          0 otherwise.
+ */
+
+int qio_channel_flush(QIOChannel *ioc,
+                      Error **errp);

 #endif /* QIO_CHANNEL_H */
diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index baa4e2b089..bf52011be2 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -81,6 +81,7 @@ static ssize_t qio_channel_buffer_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
diff --git a/io/channel-command.c b/io/channel-command.c
index b2a9e27138..5ff1691bad 100644
--- a/io/channel-command.c
+++ b/io/channel-command.c
@@ -258,6 +258,7 @@ static ssize_t qio_channel_command_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc);
diff --git a/io/channel-file.c b/io/channel-file.c
index c4bf799a80..348a48545e 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -114,6 +114,7 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
diff --git a/io/channel-socket.c b/io/channel-socket.c
index 606ec97cf7..bfbd64787e 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -525,6 +525,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
@@ -620,6 +621,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
                                          size_t niov,
                                          int *fds,
                                          size_t nfds,
+                                         int flags,
                                          Error **errp)
 {
     QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
diff --git a/io/channel-tls.c b/io/channel-tls.c
index 2ae1b92fc0..4ce890a538 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -301,6 +301,7 @@ static ssize_t qio_channel_tls_writev(QIOChannel *ioc,
                                       size_t niov,
                                       int *fds,
                                       size_t nfds,
+                                      int flags,
                                       Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
diff --git a/io/channel-websock.c b/io/channel-websock.c
index 70889bb54d..035dd6075b 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -1127,6 +1127,7 @@ static ssize_t qio_channel_websock_writev(QIOChannel *ioc,
                                           size_t niov,
                                           int *fds,
                                           size_t nfds,
+                                          int flags,
                                           Error **errp)
 {
     QIOChannelWebsock *wioc = QIO_CHANNEL_WEBSOCK(ioc);
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36..904855e16e 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -67,12 +67,13 @@ ssize_t qio_channel_readv_full(QIOChannel *ioc,
 }


-ssize_t qio_channel_writev_full(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds,
-                                size_t nfds,
-                                Error **errp)
+ssize_t qio_channel_writev_full_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds,
+                                      size_t nfds,
+                                      int flags,
+                                      Error **errp)
 {
     QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);

@@ -83,7 +84,7 @@ ssize_t qio_channel_writev_full(QIOChannel *ioc,
         return -1;
     }

-    return klass->io_writev(ioc, iov, niov, fds, nfds, errp);
+    return klass->io_writev(ioc, iov, niov, fds, nfds, flags, errp);
 }


@@ -217,14 +218,15 @@ int qio_channel_writev_all(QIOChannel *ioc,
                            size_t niov,
                            Error **errp)
 {
-    return qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
+    return qio_channel_writev_full_all_flags(ioc, iov, niov, NULL, 0, 0,
+                                             errp);
 }

-int qio_channel_writev_full_all(QIOChannel *ioc,
-                                const struct iovec *iov,
-                                size_t niov,
-                                int *fds, size_t nfds,
-                                Error **errp)
+int qio_channel_writev_full_all_flags(QIOChannel *ioc,
+                                      const struct iovec *iov,
+                                      size_t niov,
+                                      int *fds, size_t nfds,
+                                      int flags, Error **errp)
 {
     int ret = -1;
     struct iovec *local_iov = g_new(struct iovec, niov);
@@ -235,10 +237,16 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
                           iov, niov,
                           0, iov_size(iov, niov));

+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        assert(fds == NULL && nfds == 0);
+    }
+
     while (nlocal_iov > 0) {
         ssize_t len;
-        len = qio_channel_writev_full(ioc, local_iov, nlocal_iov, fds, nfds,
-                                      errp);
+
+        len = qio_channel_writev_full_flags(ioc, local_iov, nlocal_iov, fds,
+                                            nfds, flags, errp);
+
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_OUT);
@@ -473,6 +481,19 @@ off_t qio_channel_io_seek(QIOChannel *ioc,
     return klass->io_seek(ioc, offset, whence, errp);
 }

+int qio_channel_flush(QIOChannel *ioc,
+                      Error **errp)
+{
+    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);
+
+    if (!klass->io_flush ||
+        !qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) {
+        return 0;
+    }
+
+    return klass->io_flush(ioc, errp);
+}
+

 static void qio_channel_restart_read(void *opaque)
 {
diff --git a/migration/rdma.c b/migration/rdma.c
index f5d3bbe7e9..54acd2000e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2833,6 +2833,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
                                        size_t niov,
                                        int *fds,
                                        size_t nfds,
+                                       int flags,
                                        Error **errp)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);

From patchwork Thu Jan 6 22:13:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter:
Leonardo Bras
X-Patchwork-Id: 12705865
From: Leonardo Bras
To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr.
David Alan Gilbert" , Eric Blake , Markus Armbruster , Peter Xu
Subject: [PATCH v7 2/5] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
Date: Thu, 6 Jan 2022 19:13:39 -0300
Message-Id: <20220106221341.8779-3-leobras@redhat.com>
In-Reply-To: <20220106221341.8779-1-leobras@redhat.com>
References: <20220106221341.8779-1-leobras@redhat.com>
Cc: Leonardo Bras , qemu-devel@nongnu.org

For CONFIG_LINUX, implement the new zero copy flag and the optional callback
io_flush on QIOChannelSocket, but enable it only when the MSG_ZEROCOPY
feature is available in the host kernel, which is checked in
qio_channel_socket_connect_sync().

qio_channel_socket_flush() was implemented by counting how many times
sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
socket's error queue, in order to find how many of them finished sending.
Flush will loop until those counters are the same, or until some error occurs.
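
The counting scheme described above can be sketched in isolation. This is a
simplified model, not the code from this patch: struct zc_note is a
hypothetical stand-in for the struct sock_extended_err fields that the flush
reads (ee_info, ee_data, ee_code), and an array of notes replaces the recvmsg()
loop on the error queue. Each note acknowledges an inclusive range of
sendmsg() calls, which is why the increment is last - first + 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the sock_extended_err fields used by flush. */
struct zc_note {
    uint32_t first;   /* plays the role of ee_info */
    uint32_t last;    /* plays the role of ee_data (inclusive) */
    int copied;       /* nonzero when ee_code would report a copy fallback */
};

struct zc_counters {
    uint64_t queued;  /* successful zero copy sendmsg() calls */
    uint64_t sent;    /* calls acknowledged through the error queue */
};

/*
 * Consume notes until sent catches up with queued.
 * Returns  0 if at least one acknowledged send really used zero copy,
 *          1 if every acknowledged send fell back to copying
 *            (also when nothing was queued),
 *         -1 if the notes run out first (real code would wait instead).
 */
static int zc_drain(struct zc_counters *c, const struct zc_note *notes,
                    size_t n)
{
    int ret = 1;
    size_t i = 0;

    assert(c != NULL);
    while (c->sent < c->queued) {
        if (i == n) {
            return -1;
        }
        /* Unsigned 32-bit subtraction keeps the range valid across wraparound. */
        c->sent += (uint64_t)(uint32_t)(notes[i].last - notes[i].first) + 1;
        if (!notes[i].copied) {
            ret = 0;
        }
        i++;
    }
    return ret;
}
```

For example, with five queued sends and two notes covering 0..2 (copied) and
3..4 (zero copy), zc_drain() ends with sent equal to 5 and returns 0.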

Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
1: Buffer
- As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid
  copying, some caution is necessary to avoid overwriting any buffer before
  it's sent. If something like this happens, a newer version of the buffer
  may be sent instead.
- If this is a problem, it's recommended to call qio_channel_flush() before
  freeing or re-using the buffer.

2: Locked memory
- When using MSG_ZEROCOPY, the buffer memory will be locked after being
  queued, and unlocked after it's sent.
- Depending on the size of each buffer, and how often it's sent, it may
  require a larger amount of locked memory than is usually available to a
  non-root user.
- If the required amount of locked memory is not available, a zero copy
  write will return an error, which can abort an operation like migration.
- Because of this, when user code wants to add zero copy as a feature, it
  requires a mechanism to disable it, so it can still be accessible to less
  privileged users.

Signed-off-by: Leonardo Bras
Reviewed-by: Peter Xu
Reviewed-by: Daniel P. Berrangé
---
 include/io/channel-socket.h |   2 +
 io/channel-socket.c         | 107 ++++++++++++++++++++++++++++++++++--
 2 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
index e747e63514..513c428fe4 100644
--- a/include/io/channel-socket.h
+++ b/include/io/channel-socket.h
@@ -47,6 +47,8 @@ struct QIOChannelSocket {
     socklen_t localAddrLen;
     struct sockaddr_storage remoteAddr;
     socklen_t remoteAddrLen;
+    ssize_t zero_copy_queued;
+    ssize_t zero_copy_sent;
 };


diff --git a/io/channel-socket.c b/io/channel-socket.c
index bfbd64787e..fb1e210ec5 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -26,6 +26,10 @@
 #include "io/channel-watch.h"
 #include "trace.h"
 #include "qapi/clone-visitor.h"
+#ifdef CONFIG_LINUX
+#include
+#include
+#endif

 #define SOCKET_MAX_FDS 16

@@ -55,6 +59,8 @@ qio_channel_socket_new(void)

     sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
     sioc->fd = -1;
+    sioc->zero_copy_queued = 0;
+    sioc->zero_copy_sent = 0;

     ioc = QIO_CHANNEL(sioc);
     qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
@@ -154,6 +160,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket *ioc,
         return -1;
     }

+#ifdef CONFIG_LINUX
+    int ret, v = 1;
+    ret = qemu_setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret == 0) {
+        /* Zero copy available on host */
+        qio_channel_set_feature(QIO_CHANNEL(ioc),
+                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
+    }
+#endif
+
     return 0;
 }

@@ -534,6 +550,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
     char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
     size_t fdsize = sizeof(int) * nfds;
     struct cmsghdr *cmsg;
+    int sflags = 0;

     memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));

@@ -558,15 +575,26 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
         memcpy(CMSG_DATA(cmsg), fds, fdsize);
     }

+    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
+        sflags = MSG_ZEROCOPY;
+    }
+
  retry:
-    ret = sendmsg(sioc->fd, &msg, 0);
+    ret = sendmsg(sioc->fd, &msg, sflags);
     if (ret <= 0) {
-        if (errno == EAGAIN) {
+        switch (errno) {
+        case EAGAIN:
             return QIO_CHANNEL_ERR_BLOCK;
-        }
-        if (errno == EINTR) {
+        case EINTR:
             goto retry;
+        case ENOBUFS:
+            if (sflags & MSG_ZEROCOPY) {
+                error_setg_errno(errp, errno,
+                                 "Process can't lock enough memory for using MSG_ZEROCOPY");
+                return -1;
+            }
         }
+
         error_setg_errno(errp, errno,
                          "Unable to write to socket");
         return -1;
@@ -660,6 +688,74 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
 }
 #endif /* WIN32 */

+
+#ifdef CONFIG_LINUX
+static int qio_channel_socket_flush(QIOChannel *ioc,
+                                    Error **errp)
+{
+    QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+    struct msghdr msg = {};
+    struct sock_extended_err *serr;
+    struct cmsghdr *cm;
+    char control[CMSG_SPACE(sizeof(*serr))];
+    int received;
+    int ret = 1;
+
+    msg.msg_control = control;
+    msg.msg_controllen = sizeof(control);
+    memset(control, 0, sizeof(control));
+
+    while (sioc->zero_copy_sent < sioc->zero_copy_queued) {
+        received = recvmsg(sioc->fd, &msg, MSG_ERRQUEUE);
+        if (received < 0) {
+            switch (errno) {
+            case EAGAIN:
+                /* Nothing on errqueue, wait until something is available */
+                qio_channel_wait(ioc, G_IO_ERR);
+                continue;
+            case EINTR:
+                continue;
+            default:
+                error_setg_errno(errp, errno,
+                                 "Unable to read errqueue");
+                return -1;
+            }
+        }
+
+        cm = CMSG_FIRSTHDR(&msg);
+        if (cm->cmsg_level != SOL_IP &&
+            cm->cmsg_type != IP_RECVERR) {
+            error_setg_errno(errp, EPROTOTYPE,
+                             "Wrong cmsg in errqueue");
+            return -1;
+        }
+
+        serr = (void *) CMSG_DATA(cm);
+        if (serr->ee_errno != SO_EE_ORIGIN_NONE) {
+            error_setg_errno(errp, serr->ee_errno,
+                             "Error on socket");
+            return -1;
+        }
+        if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+            error_setg_errno(errp, serr->ee_origin,
+                             "Error not from zero copy");
+            return -1;
+        }
+
+        /* No errors, count successfully finished sendmsg()*/
+        sioc->zero_copy_sent += serr->ee_data - serr->ee_info + 1;
+
+        /* If any sendmsg() succeeded using zero copy, return 0 at the end */
+        if (serr->ee_code != SO_EE_CODE_ZEROCOPY_COPIED) {
+            ret = 0;
+        }
+    }
+
+    return ret;
+}
+
+#endif /* CONFIG_LINUX */
+
 static int
 qio_channel_socket_set_blocking(QIOChannel *ioc,
                                 bool enabled,
@@ -789,6 +885,9 @@ static void qio_channel_socket_class_init(ObjectClass *klass,
     ioc_klass->io_set_delay = qio_channel_socket_set_delay;
     ioc_klass->io_create_watch = qio_channel_socket_create_watch;
     ioc_klass->io_set_aio_fd_handler = qio_channel_socket_set_aio_fd_handler;
+#ifdef CONFIG_LINUX
+    ioc_klass->io_flush = qio_channel_socket_flush;
+#endif
 }

 static const TypeInfo qio_channel_socket_info = {

From patchwork Thu Jan 6 22:13:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Leonardo Bras
X-Patchwork-Id: 12705869
From: Leonardo Bras
To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster , Peter Xu
Subject: [PATCH v7 3/5] migration: Add zero-copy parameter for QMP/HMP for Linux
Date: Thu, 6 Jan 2022 19:13:40 -0300
Message-Id: <20220106221341.8779-4-leobras@redhat.com>
In-Reply-To: <20220106221341.8779-1-leobras@redhat.com>
References: <20220106221341.8779-1-leobras@redhat.com>
Cc: Leonardo Bras , qemu-devel@nongnu.org

Add a property that allows zero-copy migration of memory pages,
and also include a helper function migrate_use_zero_copy() to check
if it's enabled.

No code is introduced to actually do the migration, but it allows
future implementations to enable/disable this feature.
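
As a rough illustration of how sending code might consult such a helper, here
is a small standalone sketch. Everything prefixed demo_ or DEMO_ is
hypothetical and not part of this series. The function/macro pair follows the
same shape as the migrate_use_zero_copy() declaration in the migration.h hunk
of this patch, and the extra channel_supports check is an assumption about how
a caller could combine the parameter with the channel feature bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_CONFIG_LINUX 1            /* remove to model a non-Linux build */
#define DEMO_WRITE_FLAG_ZERO_COPY 0x1  /* stand-in for the real write flag */

/* Hypothetical stand-in for the migration parameter storage. */
struct demo_params {
    bool zero_copy;
};

#ifdef DEMO_CONFIG_LINUX
/* Real helper: report whether the user enabled the parameter. */
static bool demo_use_zero_copy(const struct demo_params *p)
{
    assert(p != NULL);
    return p->zero_copy;
}
#else
/* Fallback: callers compile unchanged, and the feature is always off. */
#define demo_use_zero_copy(p) ((void)(p), false)
#endif

/* Pick write flags without any #ifdef at the call site. */
static int demo_write_flags(const struct demo_params *p, bool channel_supports)
{
    if (demo_use_zero_copy(p) && channel_supports) {
        return DEMO_WRITE_FLAG_ZERO_COPY;
    }
    return 0;
}
```

The point of the macro fallback is that callers such as demo_write_flags()
need no conditional compilation of their own.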

On non-Linux builds this parameter is compiled-out.

Signed-off-by: Leonardo Bras
Reviewed-by: Peter Xu
Reviewed-by: Daniel P. Berrangé
---
 qapi/migration.json   | 24 ++++++++++++++++++++++++
 migration/migration.h |  5 +++++
 migration/migration.c | 32 ++++++++++++++++++++++++++++++++
 migration/socket.c    |  5 +++++
 monitor/hmp-cmds.c    |  6 ++++++
 5 files changed, 72 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48cf0b..2e62ea6ebd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -730,6 +730,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy: Controls behavior on sending memory pages on migration.
+#             When true, enables a zero-copy mechanism for sending memory
+#             pages, if host supports it.
+#             Requires that QEMU be permitted to use locked memory for guest
+#             RAM pages.
+#             Defaults to false. (Since 7.0)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration. Such
 #                        aliases may for example be the corresponding names on the
@@ -769,6 +776,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
+           { 'name': 'zero-copy', 'if' : 'CONFIG_LINUX'},
            'block-bitmap-mapping' ] }

 ##
@@ -895,6 +903,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy: Controls behavior on sending memory pages on migration.
+#             When true, enables a zero-copy mechanism for sending memory
+#             pages, if host supports it.
+#             Requires that QEMU be permitted to use locked memory for guest
+#             RAM pages.
+#             Defaults to false. (Since 7.0)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration. Such
 #                        aliases may for example be the corresponding names on the
@@ -949,6 +964,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }

 ##
@@ -1095,6 +1111,13 @@
 #                      will consume more CPU.
 #                      Defaults to 1. (Since 5.0)
 #
+# @zero-copy: Controls behavior on sending memory pages on migration.
+#             When true, enables a zero-copy mechanism for sending memory
+#             pages, if host supports it.
+#             Requires that QEMU be permitted to use locked memory for guest
+#             RAM pages.
+#             Defaults to false. (Since 7.0)
+#
 # @block-bitmap-mapping: Maps block nodes and bitmaps on them to
 #                        aliases for the purpose of dirty bitmap migration. Such
 #                        aliases may for example be the corresponding names on the
@@ -1147,6 +1170,7 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
+            '*zero-copy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
             '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }

 ##
diff --git a/migration/migration.h b/migration/migration.h
index 8130b703eb..1489eeb165 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -339,6 +339,11 @@ MultiFDCompression migrate_multifd_compression(void);
 int migrate_multifd_zlib_level(void);
 int migrate_multifd_zstd_level(void);

+#ifdef CONFIG_LINUX
+bool migrate_use_zero_copy(void);
+#else
+#define migrate_use_zero_copy() (false)
+#endif
 int migrate_use_xbzrle(void);
 uint64_t migrate_xbzrle_cache_size(void);
 bool migrate_colo_enabled(void);
diff --git a/migration/migration.c b/migration/migration.c
index 0652165610..aa8f1dc835 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -893,6 +893,10 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->multifd_zlib_level = s->parameters.multifd_zlib_level;
     params->has_multifd_zstd_level = true;
     params->multifd_zstd_level =
s->parameters.multifd_zstd_level; +#ifdef CONFIG_LINUX + params->has_zero_copy = true; + params->zero_copy = s->parameters.zero_copy; +#endif params->has_xbzrle_cache_size = true; params->xbzrle_cache_size = s->parameters.xbzrle_cache_size; params->has_max_postcopy_bandwidth = true; @@ -1546,6 +1550,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params, if (params->has_multifd_compression) { dest->multifd_compression = params->multifd_compression; } +#ifdef CONFIG_LINUX + if (params->has_zero_copy) { + dest->zero_copy = params->zero_copy; + } +#endif if (params->has_xbzrle_cache_size) { dest->xbzrle_cache_size = params->xbzrle_cache_size; } @@ -1658,6 +1667,11 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) if (params->has_multifd_compression) { s->parameters.multifd_compression = params->multifd_compression; } +#ifdef CONFIG_LINUX + if (params->has_zero_copy) { + s->parameters.zero_copy = params->zero_copy; + } +#endif if (params->has_xbzrle_cache_size) { s->parameters.xbzrle_cache_size = params->xbzrle_cache_size; xbzrle_cache_resize(params->xbzrle_cache_size, errp); @@ -2548,6 +2562,17 @@ int migrate_multifd_zstd_level(void) return s->parameters.multifd_zstd_level; } +#ifdef CONFIG_LINUX +bool migrate_use_zero_copy(void) +{ + MigrationState *s; + + s = migrate_get_current(); + + return s->parameters.zero_copy; +} +#endif + int migrate_use_xbzrle(void) { MigrationState *s; @@ -4200,6 +4225,10 @@ static Property migration_properties[] = { DEFINE_PROP_UINT8("multifd-zstd-level", MigrationState, parameters.multifd_zstd_level, DEFAULT_MIGRATE_MULTIFD_ZSTD_LEVEL), +#ifdef CONFIG_LINUX + DEFINE_PROP_BOOL("zero_copy", MigrationState, + parameters.zero_copy, false), +#endif DEFINE_PROP_SIZE("xbzrle-cache-size", MigrationState, parameters.xbzrle_cache_size, DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE), @@ -4297,6 +4326,9 @@ static void migration_instance_init(Object *obj) params->has_multifd_compression = true; 
params->has_multifd_zlib_level = true; params->has_multifd_zstd_level = true; +#ifdef CONFIG_LINUX + params->has_zero_copy = true; +#endif params->has_xbzrle_cache_size = true; params->has_max_postcopy_bandwidth = true; params->has_max_cpu_throttle = true; diff --git a/migration/socket.c b/migration/socket.c index 05705a32d8..f7a77aafd3 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -77,6 +77,11 @@ static void socket_outgoing_migration(QIOTask *task, } else { trace_migration_socket_outgoing_connected(data->hostname); } + + if (migrate_use_zero_copy()) { + error_setg(&err, "Zero copy not available in migration"); + } + migration_channel_connect(data->s, sioc, data->hostname, err); object_unref(OBJECT(sioc)); } diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index 2669156b28..35d47e250d 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -1297,6 +1297,12 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) p->has_multifd_zstd_level = true; visit_type_uint8(v, param, &p->multifd_zstd_level, &err); break; +#ifdef CONFIG_LINUX + case MIGRATION_PARAMETER_ZERO_COPY: + p->has_zero_copy = true; + visit_type_bool(v, param, &p->zero_copy, &err); + break; +#endif case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE: p->has_xbzrle_cache_size = true; if (!visit_type_size(v, param, &cache_size, &err)) { From patchwork Thu Jan 6 22:13:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12705868 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5EC82C433F5 for ; Thu, 6 Jan 2022 22:19:16 +0000 (UTC) Received: from localhost ([::1]:49230 helo=lists1p.gnu.org) by lists.gnu.org 
with esmtp (Exim 4.90_1) (envelope-from ) id 1n5b6J-0006AZ-EP for qemu-devel@archiver.kernel.org; Thu, 06 Jan 2022 17:19:15 -0500 Received: from eggs.gnu.org ([209.51.188.92]:51120) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n5b1X-0005J9-6Z for qemu-devel@nongnu.org; Thu, 06 Jan 2022 17:14:19 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:22894) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n5b1V-0006Hg-Gk for qemu-devel@nongnu.org; Thu, 06 Jan 2022 17:14:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1641507256; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=coX2FGYNMRllSAimppdKkvpvVQ/fdZp6MWppRXlcVBo=; b=BQiJLRNlkUresv5e5kHuOg+QLXqAjPmk8Rrsk6qDUsh2nQ3USmOZNZKEj7EZ5iuS7x0eML y32vfry9YmulqtJpMe9ldgzDsNKA5gJ+ssBdBAXUktYLOfd3ao5Fej14FY3+34v0q6S0Xk oHLkvqazfABCsI854s7rzo+6R62UbfM= Received: from mail-vk1-f198.google.com (mail-vk1-f198.google.com [209.85.221.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-623-ReXnwYyjNu6gNrtlDfG-mw-1; Thu, 06 Jan 2022 17:14:15 -0500 X-MC-Unique: ReXnwYyjNu6gNrtlDfG-mw-1 Received: by mail-vk1-f198.google.com with SMTP id az31-20020a0561220d1f00b003147de426a4so1081344vkb.4 for ; Thu, 06 Jan 2022 14:14:15 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=coX2FGYNMRllSAimppdKkvpvVQ/fdZp6MWppRXlcVBo=; b=lMoEY4wFLVZT+UrpdNO4k/a571yl5fEkJD9EOO4+u5LN9u6Juz5NyemlCQZje/uCwb 
XEr4K+0n0zUpS9eYRG0Ue6wSwN0+G/4HBPCJsSj0JAqrbxXFL7p6G3Wlrb3muHcOtI15 6gUldqoYm55cqg+TMP3cz6q23kv+sS2Ht6spAGFrmmhgHuJpKPaPuL7BUdsmGC/ukylK ErenEKy9qj5NfqhLliM/8Z126g/sPQVMQeuLBCDwhaPWQ2aHekCepWkHoDWl3meaA6vQ P7OR8xNV3qiAgKEJxQ7THgdL/tyL9u99xyl48Cy2jW8lf88cn7bDJo3wasBLB8hgR3kd rLcQ== X-Gm-Message-State: AOAM533GF/yrcWnUB1qrSnIED0zvfGtFB04LWcv5Y3QWbQLyWj3dYWPr OW77JuCPoYKvLxD0CGVPeIbJL8gPpDI2fKEwQRrcTaeu0knqCDF7+1M0L/PLAr6ApgJovBiBSv8 jn8VQgTTMW1Qm+M4= X-Received: by 2002:ac5:ce8c:: with SMTP id 12mr22364138vke.0.1641507254820; Thu, 06 Jan 2022 14:14:14 -0800 (PST) X-Google-Smtp-Source: ABdhPJyY6ZpephSrqrRYmWKhNH0LcixnKo2eUVDbZGjcTryGE5XwmXaBNBj+uBO73tQoA8K+cejWMA== X-Received: by 2002:ac5:ce8c:: with SMTP id 12mr22364131vke.0.1641507254666; Thu, 06 Jan 2022 14:14:14 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f1:cc01:fae1:7982:b010:d91]) by smtp.gmail.com with ESMTPSA id c15sm2098831uaj.13.2022.01.06.14.14.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Jan 2022 14:14:14 -0800 (PST) From: Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. 
David Alan Gilbert" , Eric Blake , Markus Armbruster , Peter Xu Subject: [PATCH v7 4/5] migration: Add migrate_use_tls() helper Date: Thu, 6 Jan 2022 19:13:41 -0300 Message-Id: <20220106221341.8779-5-leobras@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220106221341.8779-1-leobras@redhat.com> References: <20220106221341.8779-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -31 X-Spam_score: -3.2 X-Spam_bar: --- X-Spam_report: (-3.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.372, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" A lot of places check parameters.tls_creds in order to evaluate if TLS is in use, and sometimes call migrate_get_current() just for that test. Add new helper function migrate_use_tls() in order to simplify testing for TLS usage. Signed-off-by: Leonardo Bras Reviewed-by: Juan Quintela Reviewed-by: Peter Xu Reviewed-by: Daniel P. 
Berrangé --- migration/migration.h | 1 + migration/channel.c | 6 +++--- migration/migration.c | 9 +++++++++ migration/multifd.c | 5 +---- 4 files changed, 14 insertions(+), 7 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index 1489eeb165..445d95bbf2 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -344,6 +344,7 @@ bool migrate_use_zero_copy(void); #else #define migrate_use_zero_copy() (false) #endif +int migrate_use_tls(void); int migrate_use_xbzrle(void); uint64_t migrate_xbzrle_cache_size(void); bool migrate_colo_enabled(void); diff --git a/migration/channel.c b/migration/channel.c index c4fc000a1a..1a45b75d29 100644 --- a/migration/channel.c +++ b/migration/channel.c @@ -32,16 +32,16 @@ */ void migration_channel_process_incoming(QIOChannel *ioc) { - MigrationState *s = migrate_get_current(); Error *local_err = NULL; trace_migration_set_incoming_channel( ioc, object_get_typename(OBJECT(ioc))); - if (s->parameters.tls_creds && - *s->parameters.tls_creds && + if (migrate_use_tls() && !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)) { + MigrationState *s = migrate_get_current(); + migration_tls_channel_process_incoming(s, ioc, &local_err); } else { migration_ioc_register_yank(ioc); diff --git a/migration/migration.c b/migration/migration.c index aa8f1dc835..7bcb800890 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2573,6 +2573,15 @@ bool migrate_use_zero_copy(void) } #endif +int migrate_use_tls(void) +{ + MigrationState *s; + + s = migrate_get_current(); + + return s->parameters.tls_creds && *s->parameters.tls_creds; +} + int migrate_use_xbzrle(void) { MigrationState *s; diff --git a/migration/multifd.c b/migration/multifd.c index 3242f688e5..677e942747 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -796,14 +796,11 @@ static bool multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc, Error *error) { - MigrationState *s = migrate_get_current(); - 
trace_multifd_set_outgoing_channel( ioc, object_get_typename(OBJECT(ioc)), p->tls_hostname, error); if (!error) { - if (s->parameters.tls_creds && - *s->parameters.tls_creds && + if (migrate_use_tls() && !object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)) { multifd_tls_channel_connect(p, ioc, &error); From patchwork Thu Jan 6 22:13:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 12705867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 98B65C433F5 for ; Thu, 6 Jan 2022 22:16:57 +0000 (UTC) Received: from localhost ([::1]:41118 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1n5b44-0000I1-GP for qemu-devel@archiver.kernel.org; Thu, 06 Jan 2022 17:16:56 -0500 Received: from eggs.gnu.org ([209.51.188.92]:51136) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n5b1c-0005Md-43 for qemu-devel@nongnu.org; Thu, 06 Jan 2022 17:14:24 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:20891) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1n5b1X-0006Qm-Ke for qemu-devel@nongnu.org; Thu, 06 Jan 2022 17:14:21 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1641507259; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=M5Ipwjc3dvGMTRkHj1aIocB4X1iuh8YN9du2k/gWd/g=; 
b=IVdNX+pPEiwx38Rs2EGtO8gMVzM5Vm4MhMx9/Vj+1pLbZMcEhAwOakxGTiVQmNcc2M5RWt nPE7AQZKCjoncaPL2QsfMemugPXe7rtH1DNc7PdFvGjL7z7mtiNULRu0Z3eJZF8PcJFdYg RmX1NUCg+H+28oSViL5UP/RgWhPHc7s= Received: from mail-ua1-f70.google.com (mail-ua1-f70.google.com [209.85.222.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-141-kLZR5AnxN9OhNF-npEvMGw-1; Thu, 06 Jan 2022 17:14:18 -0500 X-MC-Unique: kLZR5AnxN9OhNF-npEvMGw-1 Received: by mail-ua1-f70.google.com with SMTP id w14-20020ab055ce000000b002fedc60272fso2141536uaa.21 for ; Thu, 06 Jan 2022 14:14:17 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=M5Ipwjc3dvGMTRkHj1aIocB4X1iuh8YN9du2k/gWd/g=; b=73uilEVAHV21x7zIP3uIN+pk1tSVAXwN8qFXuREK5oWuAisJSTN68LdpDBfq/ukNB+ qnlBI1QQduyMl2tYLDBVZoztMTdputLvIzQsGYSCvd96auknBujy4OqtFbiAljhYuivT /NODVwn+D1Y88RyhXjiqOMgz+XIYDcNRSZJkV7RO6FXDZzrNddbYU1hIDIpCOAEEi1Y8 gpFXgWhXVEplfiWzgRAGv7OEAAH6blIIn1l+PUx5D7CbGiMBhAaeCi/rvqffuo6bBiqe hUdtYDN7AG2eGN2fQkW8/Jgua2gXgbxqaa9a04rea53LJaQv7+PVLwcR7UIEvIbaY2XX ZpAg== X-Gm-Message-State: AOAM532kFZtnEqYNHMpuDncGDjShZglSjtXP7LXSviIcv/xaEcPeowrN MtDtsMi1arXIa/BQLt+ZL8iqHFftFpgXs27CGGZDQX0yXafo229OW6kGT8DWkIKVk156WK9P4ri iP5NAUuYhMED9Lvg= X-Received: by 2002:a05:6102:4a5:: with SMTP id r5mr18067362vsa.22.1641507257459; Thu, 06 Jan 2022 14:14:17 -0800 (PST) X-Google-Smtp-Source: ABdhPJzkzLbRsgWhcFR9UpmyTtDTj4JC6VAil9eVj9YzYjn5qmSu7xrXduxEVRPIS4SDOfT/QKvttg== X-Received: by 2002:a05:6102:4a5:: with SMTP id r5mr18067351vsa.22.1641507257168; Thu, 06 Jan 2022 14:14:17 -0800 (PST) Received: from LeoBras.redhat.com ([2804:431:c7f1:cc01:fae1:7982:b010:d91]) by smtp.gmail.com with ESMTPSA id c15sm2098831uaj.13.2022.01.06.14.14.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Jan 2022 14:14:16 -0800 (PST) 
From: Leonardo Bras To: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Juan Quintela , "Dr. David Alan Gilbert" , Eric Blake , Markus Armbruster , Peter Xu Subject: [PATCH v7 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Date: Thu, 6 Jan 2022 19:13:42 -0300 Message-Id: <20220106221341.8779-6-leobras@redhat.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220106221341.8779-1-leobras@redhat.com> References: <20220106221341.8779-1-leobras@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=leobras@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.129.124; envelope-from=leobras@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -31 X-Spam_score: -3.2 X-Spam_bar: --- X-Spam_report: (-3.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.372, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Leonardo Bras , qemu-devel@nongnu.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Implement zero copy on nocomp_send_write(), by making use of QIOChannel writev + flags & flush interface. Change multifd_send_sync_main() so it can distinguish each iteration sync from the setup and the completion, so a flush_zero_copy() can be called after each iteration in order to make sure all dirty pages are sent before a new iteration is started. Also make it return -1 if flush_zero_copy() fails, in order to cancel the migration process, and avoid resuming the guest in the target host without receiving all current RAM. 
This works for RAM migration because RAM pages are not usually freed, and there is no problem with the page contents changing between writev_zero_copy() and the actual sending of the buffer, because such a change dirties the page and causes it to be re-sent in a later iteration anyway. A lot of locked memory may be needed in order to use multifd migration with zero-copy enabled, so disabling the feature may be necessary for low-privileged users trying to perform multifd migrations. Signed-off-by: Leonardo Bras --- migration/multifd.h | 4 +++- migration/migration.c | 11 ++++++++++- migration/multifd.c | 40 +++++++++++++++++++++++++++++++++++----- migration/ram.c | 29 ++++++++++++++++++++++------- migration/socket.c | 5 +++-- 5 files changed, 73 insertions(+), 16 deletions(-) diff --git a/migration/multifd.h b/migration/multifd.h index e57adc783b..d9fbccdbe2 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -22,7 +22,7 @@ int multifd_load_cleanup(Error **errp); bool multifd_recv_all_channels_created(void); bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp); void multifd_recv_sync_main(void); -void multifd_send_sync_main(QEMUFile *f); +int multifd_send_sync_main(QEMUFile *f, bool sync); int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset); /* Multifd Compression flags */ @@ -97,6 +97,8 @@ typedef struct { uint32_t packet_len; /* pointer to the packet */ MultiFDPacket_t *packet; + /* multifd flags for sending ram */ + int write_flags; /* multifd flags for each packet */ uint32_t flags; /* size of the next packet that contains pages */ diff --git a/migration/migration.c b/migration/migration.c index 7bcb800890..76a3313e66 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1476,7 +1476,16 @@ static bool migrate_params_check(MigrationParameters *params, Error **errp) error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: "); return false; } - +#ifdef CONFIG_LINUX + if (params->zero_copy && +
(!migrate_use_multifd() || + params->multifd_compression != MULTIFD_COMPRESSION_NONE || + (params->tls_creds && *params->tls_creds))) { + error_setg(errp, + "Zero copy only available for non-compressed non-TLS multifd migration"); + return false; + } +#endif return true; } diff --git a/migration/multifd.c b/migration/multifd.c index 677e942747..1b6b7cc1a1 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -104,7 +104,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) */ static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp) { - return qio_channel_writev_all(p->c, p->pages->iov, used, errp); + return qio_channel_writev_full_all_flags(p->c, p->pages->iov, used, + NULL, 0, p->write_flags, errp); } /** @@ -582,19 +583,28 @@ void multifd_save_cleanup(void) multifd_send_state = NULL; } -void multifd_send_sync_main(QEMUFile *f) +int multifd_send_sync_main(QEMUFile *f, bool sync) { int i; + bool flush_zero_copy; if (!migrate_use_multifd()) { - return; + return 0; } if (multifd_send_state->pages->num) { if (multifd_send_pages(f) < 0) { error_report("%s: multifd_send_pages fail", __func__); - return; + return 0; } } + + /* + * When using zero-copy, it's necessary to flush after each iteration to + * make sure pages from earlier iterations don't end up replacing newer + * pages. 
+ */ + flush_zero_copy = sync && migrate_use_zero_copy(); + for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -605,7 +615,7 @@ void multifd_send_sync_main(QEMUFile *f) if (p->quit) { error_report("%s: channel %d has already quit", __func__, i); qemu_mutex_unlock(&p->mutex); - return; + return 0; } p->packet_num = multifd_send_state->packet_num++; @@ -616,6 +626,17 @@ void multifd_send_sync_main(QEMUFile *f) ram_counters.transferred += p->packet_len; qemu_mutex_unlock(&p->mutex); qemu_sem_post(&p->sem); + + if (flush_zero_copy) { + int ret; + Error *err = NULL; + + ret = qio_channel_flush(p->c, &err); + if (ret < 0) { + error_report_err(err); + return -1; + } + } } for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; @@ -624,6 +645,8 @@ void multifd_send_sync_main(QEMUFile *f) qemu_sem_wait(&p->sem_sync); } trace_multifd_send_sync_main(multifd_send_state->packet_num); + + return 0; } static void *multifd_send_thread(void *opaque) @@ -919,6 +942,13 @@ int multifd_save_setup(Error **errp) p->packet->version = cpu_to_be32(MULTIFD_VERSION); p->name = g_strdup_printf("multifdsend_%d", i); p->tls_hostname = g_strdup(s->hostname); + + if (migrate_use_zero_copy()) { + p->write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY; + } else { + p->write_flags = 0; + } + socket_send_channel_create(multifd_new_send_channel_async, p); } diff --git a/migration/ram.c b/migration/ram.c index 57efa67f20..a1ae66c50c 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2987,6 +2987,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque) { RAMState **rsp = opaque; RAMBlock *block; + int ret; if (compress_threads_save_setup()) { return -1; @@ -3021,7 +3022,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque) ram_control_before_iterate(f, RAM_CONTROL_SETUP); ram_control_after_iterate(f, RAM_CONTROL_SETUP); - multifd_send_sync_main(f); + ret = multifd_send_sync_main(f, false); + 
if (ret < 0) { + return ret; + } + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); qemu_fflush(f); @@ -3130,7 +3135,11 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) out: if (ret >= 0 && migration_is_setup_or_active(migrate_get_current()->state)) { - multifd_send_sync_main(rs->f); + ret = multifd_send_sync_main(rs->f, true); + if (ret < 0) { + return ret; + } + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); qemu_fflush(f); ram_counters.transferred += 8; @@ -3188,13 +3197,19 @@ static int ram_save_complete(QEMUFile *f, void *opaque) ram_control_after_iterate(f, RAM_CONTROL_FINISH); } - if (ret >= 0) { - multifd_send_sync_main(rs->f); - qemu_put_be64(f, RAM_SAVE_FLAG_EOS); - qemu_fflush(f); + if (ret < 0) { + return ret; } - return ret; + ret = multifd_send_sync_main(rs->f, false); + if (ret < 0) { + return ret; + } + + qemu_put_be64(f, RAM_SAVE_FLAG_EOS); + qemu_fflush(f); + + return 0; } static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, diff --git a/migration/socket.c b/migration/socket.c index f7a77aafd3..23b03e6190 100644 --- a/migration/socket.c +++ b/migration/socket.c @@ -78,8 +78,9 @@ static void socket_outgoing_migration(QIOTask *task, trace_migration_socket_outgoing_connected(data->hostname); } - if (migrate_use_zero_copy()) { - error_setg(&err, "Zero copy not available in migration"); + if (migrate_use_zero_copy() && + !qio_channel_has_feature(sioc, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY)) { + error_setg(&err, "Zero copy feature not detected in host kernel"); } migration_channel_connect(data->s, sioc, data->hostname, err);
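For readers following the series, the condition that patch 5/5 adds to migrate_params_check() can be restated as a small standalone sketch: zero copy is accepted only together with multifd, no multifd compression, and no TLS credentials. Everything named sketch_* or SKETCH_* below is hypothetical and exists only for illustration; it is not QEMU's MigrationParameters, and the real check reads migrate_use_multifd() rather than a struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inputs the check looks at. */
enum sketch_compression {
    SKETCH_COMPRESSION_NONE,
    SKETCH_COMPRESSION_ZLIB
};

struct sketch_params {
    bool zero_copy;                      /* the new parameter */
    bool multifd;                        /* stands in for migrate_use_multifd() */
    enum sketch_compression compression; /* multifd compression method */
    const char *tls_creds;               /* NULL or "" means TLS is not in use */
};

/* Same test as migrate_use_tls() in patch 4/5: a non-empty credential ID. */
static bool sketch_use_tls(const struct sketch_params *p)
{
    assert(p != NULL);
    return p->tls_creds != NULL && *p->tls_creds != '\0';
}

/*
 * Returns true when the parameter combination is acceptable.
 * With zero_copy disabled nothing is restricted; with it enabled,
 * multifd must be on, compression must be off, and TLS must be off.
 */
static bool sketch_zero_copy_params_ok(const struct sketch_params *p)
{
    assert(p != NULL);
    if (!p->zero_copy) {
        return true;
    }
    return p->multifd &&
           p->compression == SKETCH_COMPRESSION_NONE &&
           !sketch_use_tls(p);
}
```

With this sketch, a configuration that sets zero_copy and multifd with no compression and no credentials passes, while adding a credential ID such as "tls0" or switching on zlib compression makes it fail, which mirrors the error path "Zero copy only available for non-compressed non-TLS multifd migration".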