From patchwork Fri May 19 09:34:39 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anton Nefedov
X-Patchwork-Id: 9736801
From: Anton Nefedov
Date: Fri, 19 May 2017 12:34:39 +0300
Message-ID: <1495186480-114192-13-git-send-email-anton.nefedov@virtuozzo.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1495186480-114192-1-git-send-email-anton.nefedov@virtuozzo.com>
References: <1495186480-114192-1-git-send-email-anton.nefedov@virtuozzo.com>
Subject: [Qemu-devel] [PATCH v1 12/13] qcow2: allow concurrent unaligned writes to the same clusters
Cc: kwolf@redhat.com, "Denis V . Lunev", Anton Nefedov, den@virtuozzo.com, mreitz@redhat.com

If the COW area of a write request to an unallocated cluster is empty,
concurrent write requests can be allowed with a little extra
synchronization, so they don't have to wait until L2 is filled.

Let qcow2-cluster.c::handle_dependencies() do most of the job:
  if there is an in-flight request to the same cluster,
  and the current request wants to write in its COW area,
  and its COW area is marked empty,
  - steal the allocated offset and write concurrently. Let the original
    request update L2 later when it likes.
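
The decision described above can be sketched in isolation. This is a minimal illustration in plain C, not the code from the patch: InFlightAlloc, Decision, ranges_overlap() and classify_write() are hypothetical names, and the byte ranges are a simplified model of one in-flight allocation (head COW area, data area, tail COW area). The exact conditions used by handle_dependencies() are the ones in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of an in-flight allocating write.
 * All ranges are guest byte offsets, end-exclusive. These are not the
 * QEMU structures, only an illustration of the idea. */
typedef struct InFlightAlloc {
    uint64_t cow_head_start; /* start of the head COW area */
    uint64_t data_start;     /* start of the data the request writes */
    uint64_t data_end;       /* end of that data */
    uint64_t cow_tail_end;   /* end of the tail COW area */
    bool head_empty;         /* head COW area needs no copy */
    bool tail_empty;         /* tail COW area needs no copy */
} InFlightAlloc;

typedef enum { NO_DEPENDENCY, WRITE_CONCURRENTLY, MUST_WAIT } Decision;

/* True if two non-empty half-open ranges share at least one byte. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_end)
{
    return a_start < a_end && b_start < b_end &&
           a_start < b_end && b_start < a_end;
}

/* Classify a new write [start, end) against one in-flight allocation. */
static Decision classify_write(const InFlightAlloc *old,
                               uint64_t start, uint64_t end)
{
    assert(start < end);
    assert(old->cow_head_start <= old->data_start &&
           old->data_start <= old->data_end &&
           old->data_end <= old->cow_tail_end);

    if (!ranges_overlap(start, end, old->cow_head_start, old->cow_tail_end)) {
        return NO_DEPENDENCY;
    }
    /* Touching the other request's own data always means waiting. */
    if (ranges_overlap(start, end, old->data_start, old->data_end)) {
        return MUST_WAIT;
    }
    /* A COW area may be shared only if it was marked empty. */
    if (ranges_overlap(start, end, old->cow_head_start, old->data_start) &&
        !old->head_empty) {
        return MUST_WAIT;
    }
    if (ranges_overlap(start, end, old->data_end, old->cow_tail_end) &&
        !old->tail_empty) {
        return MUST_WAIT;
    }
    return WRITE_CONCURRENTLY;
}
```

For example, with an in-flight request whose data occupies bytes 4096..8191 of a 64k cluster and whose COW areas are both marked empty, a new write to 8192..12287 is classified as WRITE_CONCURRENTLY, while a write touching 4096..8191 has to wait.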

This gives an improvement for parallel misaligned writes to
unallocated clusters with no backing data:

HDD fio over xfs iodepth=4:
  seqwrite 4k:   18400 -> 22800 IOPS ( x1.24 )
  seqwrite 68k:   1600 ->  2300 IOPS ( x1.44 )

Signed-off-by: Anton Nefedov
Signed-off-by: Denis V. Lunev
---
 block/qcow2-cluster.c | 169 +++++++++++++++++++++++++++++++++++++++++++-------
 block/qcow2.c         |  28 ++++++++-
 block/qcow2.h         |  12 +++-
 3 files changed, 181 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c0974e8..7cffdd4 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -898,20 +898,32 @@ out:
 
 /*
  * Check if there already is an AIO write request in flight which allocates
- * the same cluster. In this case we need to wait until the previous
- * request has completed and updated the L2 table accordingly.
+ * the same cluster.
+ * In this case, check if that request has explicitly allowed to write
+ * in its COW area(s).
+ *   If yes - fill the meta to point to the same cluster.
+ *   If no  - we need to wait until the previous request has completed and
+ *            updated the L2 table accordingly or
+ *            has allowed writing in its COW area(s).
  * Returns:
  *   0       if there was no dependency. *cur_bytes indicates the number of
  *           bytes from guest_offset that can be read before the next
  *           dependency must be processed (or the request is complete).
- *           *m is not modified
+ *           *m, *host_offset are not modified
+ *
+ *   1       if there is a dependency but it is possible to write concurrently
+ *           *m is filled accordingly,
+ *           *cur_bytes may have decreased and describes
+ *             the length of the area that can be written to,
+ *           *host_offset contains the starting host image offset to write to
  *
  *   -EAGAIN if we had to wait for another request. The caller
- *           must start over, so consider *cur_bytes undefined.
+ *           must start over, so consider *cur_bytes and *host_offset undefined.
  *           *m is not modified
  */
 static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
-    uint64_t *cur_bytes, QCowL2Meta **m)
+                               uint64_t *host_offset, uint64_t *cur_bytes,
+                               QCowL2Meta **m)
 {
     BDRVQcow2State *s = bs->opaque;
     QCowL2Meta *old_alloc;
@@ -924,7 +936,7 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
         const uint64_t old_start = l2meta_cow_start(old_alloc);
         const uint64_t old_end = l2meta_cow_end(old_alloc);
 
-        if (end <= old_start || start >= old_end) {
+        if (end <= old_start || start >= old_end || old_alloc->piggybacked) {
             /* No intersection */
             continue;
         }
@@ -936,21 +948,95 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
             continue;
         }
 
-        /* Stop if already an l2meta exists. After yielding, it wouldn't
-         * be valid any more, so we'd have to clean up the old L2Metas
-         * and deal with requests depending on them before starting to
-         * gather new ones. Not worth the trouble. */
-        if (*m) {
+        /* offsets of the cluster we're intersecting in */
+        const uint64_t cluster_start = start_of_cluster(s, start);
+        const uint64_t cluster_end = cluster_start + s->cluster_size;
+
+        const uint64_t old_data_start = old_start +
+            old_alloc->cow_start.nb_bytes;
+        const uint64_t old_data_end = old_alloc->offset +
+            old_alloc->cow_end.offset;
+
+        const bool conflict_in_data_area =
+            end > old_data_start && start < old_data_end;
+        const bool conflict_in_old_cow_start =
+            /* 1). new write request area is before the old */
+            start < old_data_start
+            && /* 2). old request did not allow writing in its cow area */
+            !old_alloc->cow_start.reduced;
+        const bool conflict_in_old_cow_end =
+            /* 1). new write request area is after the old */
+            start > old_data_start
+            && /* 2). old request did not allow writing in its cow area */
+            !old_alloc->cow_end.reduced;
+
+        if (conflict_in_data_area ||
+            conflict_in_old_cow_start || conflict_in_old_cow_end) {
+
+            /* Stop if already an l2meta exists. After yielding, it wouldn't
+             * be valid any more, so we'd have to clean up the old L2Metas
+             * and deal with requests depending on them before starting to
+             * gather new ones. Not worth the trouble. */
+            if (*m) {
+                /* start must be cluster aligned at this point */
+                assert(start == cluster_start);
+                *cur_bytes = 0;
+                return 0;
+            }
+
+            /* Wait for the dependency to complete. We need to recheck
+             * the free/allocated clusters when we continue. */
+            qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
+            return -EAGAIN;
+        }
+
+        /* allocations do conflict, but the competitor kindly allowed us
+         * to write concurrently (our data area only, not the whole cluster!)
+         * Inter alia, this means we must not touch the COW areas */
+
+        if (*host_offset) {
             /* start must be cluster aligned at this point */
-            assert(start == start_of_cluster(s, start));
-            *cur_bytes = 0;
-            return 0;
+            assert(start == cluster_start);
+            if ((old_alloc->alloc_offset + (start - old_start))
+                != *host_offset) {
+                /* can't extend contiguous allocation */
+                *cur_bytes = 0;
+                return 0;
+            }
         }
 
-        /* Wait for the dependency to complete. We need to recheck
-         * the free/allocated clusters when we continue. */
-        qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
-        return -EAGAIN;
+        QCowL2Meta *old_m = *m;
+        *m = g_malloc0(sizeof(**m));
+
+        **m = (QCowL2Meta) {
+            .next = old_m,
+
+            .alloc_offset = old_alloc->alloc_offset +
+                            (cluster_start - old_start),
+            .offset = old_alloc->offset +
+                      (cluster_start - old_start),
+            .nb_clusters = 1,
+            .piggybacked = true,
+            .clusters_are_trailing = false,
+
+            /* reduced COW areas. see above */
+            .cow_start = {
+                .offset = 0,
+                .nb_bytes = start - cluster_start,
+                .reduced = true,
+            },
+            .cow_end = {
+                .offset = MIN(end - cluster_start, s->cluster_size),
+                .nb_bytes = end < cluster_end ? cluster_end - end : 0,
+                .reduced = true,
+            },
+        };
+        qemu_co_queue_init(&(*m)->dependent_requests);
+        QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+        *host_offset = old_alloc->alloc_offset + (start - old_start);
+        *cur_bytes = MIN(*cur_bytes, cluster_end - start);
+        return 1;
     }
 
     /* Make sure that existing clusters and new allocations are only used up to
@@ -1264,6 +1350,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         .alloc_offset   = alloc_cluster_offset,
         .offset         = start_of_cluster(s, guest_offset),
         .nb_clusters    = nb_clusters,
+        .piggybacked    = false,
         .clusters_are_trailing = alloc_cluster_offset >= old_data_end,
 
         .keep_old_clusters  = keep_old_clusters,
@@ -1364,13 +1451,12 @@ again:
          *         for contiguous clusters (the situation could have changed
          *         while we were sleeping)
          *
-         *      c) TODO: Request starts in the same cluster as the in-flight
-         *         allocation ends. Shorten the COW of the in-fight allocation,
-         *         set cluster_offset to write to the same cluster and set up
-         *         the right synchronisation between the in-flight request and
-         *         the new one.
+         *      c) Overlap with another request's writeable COW area. Use
+         *         the stolen offset (and let the original request update L2
+         *         when it pleases)
+         *
          */
-        ret = handle_dependencies(bs, start, &cur_bytes, m);
+        ret = handle_dependencies(bs, start, &cluster_offset, &cur_bytes, m);
         if (ret == -EAGAIN) {
             /* Currently handle_dependencies() doesn't yield if we already had
              * an allocation. If it did, we would have to clean up the L2Meta
@@ -1379,6 +1465,8 @@ again:
             goto again;
         } else if (ret < 0) {
             return ret;
+        } else if (ret) {
+            continue;
         } else if (cur_bytes == 0) {
             break;
         } else {
@@ -1967,3 +2055,36 @@ void qcow2_update_data_end(BlockDriverState *bs, uint64_t off)
         s->data_end = off;
     }
 }
+
+/*
+ * For each @m, wait for its dependency request to finish and check for its
+ * success, i.e. that L2 table is updated as expected.
+ */
+int qcow2_wait_l2table_update(BlockDriverState *bs, const QCowL2Meta *m)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCowL2Meta *old_alloc;
+    uint64_t alloc_offset;
+    unsigned int bytes;
+    int ret;
+
+    for (; m != NULL; m = m->next) {
+        assert(m->piggybacked);
+        QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
+            uint64_t a_off;
+            a_off = old_alloc->alloc_offset + (m->offset - old_alloc->offset);
+            if (!old_alloc->piggybacked && m->offset >= old_alloc->offset &&
+                a_off == m->alloc_offset) {
+
+                qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
+                break;
+            }
+        }
+        ret = qcow2_get_cluster_offset(bs, m->offset, &bytes, &alloc_offset);
+        if (ret != QCOW2_CLUSTER_NORMAL ||
+            alloc_offset != m->alloc_offset) {
+            return -1;
+        }
+    }
+    return 0;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 97a66a0..0f28a4b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1625,6 +1625,8 @@ fail:
 
 static void handle_cow_reduce(BlockDriverState *bs, QCowL2Meta *m)
 {
+    bool trimmed = false;
+
     if (bs->encrypted) {
         return;
     }
@@ -1633,12 +1635,19 @@ static void handle_cow_reduce(BlockDriverState *bs, QCowL2Meta *m)
                         (m->offset + m->cow_start.offset) >> BDRV_SECTOR_BITS,
                         m->cow_start.nb_bytes >> BDRV_SECTOR_BITS)) {
         m->cow_start.reduced = true;
+        trimmed = true;
     }
     if (!m->cow_end.reduced && m->cow_end.nb_bytes != 0 &&
         is_zero_sectors(bs,
                         (m->offset + m->cow_end.offset) >> BDRV_SECTOR_BITS,
                         m->cow_end.nb_bytes >> BDRV_SECTOR_BITS)) {
         m->cow_end.reduced = true;
+        trimmed = true;
+    }
+    /* The request is trimmed. Let's try to start dependent
+       ones, may be we will be lucky */
+    if (trimmed) {
+        qemu_co_queue_restart_all(&m->dependent_requests);
     }
 }
 
@@ -1787,6 +1796,10 @@ static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
     for (m = l2meta; m != NULL; m = m->next) {
         uint64_t bytes = m->nb_clusters << s->cluster_bits;
 
+        if (m->piggybacked) {
+            continue;
+        }
+
         if (s->prealloc_size != 0 && handle_prealloc(bs, m)) {
             handle_cow_reduce(bs, m);
             continue;
@@ -1910,9 +1923,18 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
         while (l2meta != NULL) {
             QCowL2Meta *next;
 
-            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
-            if (ret < 0) {
-                goto fail;
+            if (!l2meta->piggybacked) {
+                ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+                if (ret < 0) {
+                    goto fail;
+                }
+            } else {
+                ret = qcow2_wait_l2table_update(bs, l2meta);
+                if (ret < 0) {
+                    /* dependency request failed, return general EIO */
+                    ret = -EIO;
+                    goto fail;
+                }
             }
 
             /* Take the request off the list of running requests */
diff --git a/block/qcow2.h b/block/qcow2.h
index 2fd8510..5947045 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -310,7 +310,8 @@ typedef struct Qcow2COWRegion {
     /** Number of bytes to copy */
     int     nb_bytes;
 
-    /** The region is filled with zeroes and does not require COW
+    /** The region does not require COW
+     *  (either filled with zeroes or busy with other request)
      */
     bool    reduced;
 } Qcow2COWRegion;
@@ -338,6 +339,13 @@ typedef struct QCowL2Meta
     bool clusters_are_trailing;
 
     /**
+     * True if the described clusters are being allocated by
+     * the other concurrent request; so this one must not actually update L2
+     * or COW but only write its data
+     */
+    bool piggybacked;
+
+    /**
      * Requests that overlap with this allocation and wait to be restarted
      * when the allocating request has completed.
      */
@@ -575,6 +583,8 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
                                BlockDriverAmendStatusCB *status_cb,
                                void *cb_opaque);
 
+int qcow2_wait_l2table_update(BlockDriverState *bs, const QCowL2Meta *m);
+
 /* qcow2-snapshot.c functions */
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
 int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id);
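
As a worked illustration of the arithmetic handle_dependencies() performs for a piggybacked request, the sketch below recomputes the reduced COW regions, the host offset and the clamped length for a single cluster. It is a standalone approximation, assuming a power-of-two cluster size and an in-flight allocation whose cluster-aligned guest offset old_guest maps to host offset old_host. CowRegion, PiggybackLayout, min_u64() and piggyback_layout() are hypothetical names that do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the two Qcow2COWRegion fields used here.
 * Offsets are relative to the start of the cluster. */
typedef struct CowRegion {
    uint64_t offset;
    uint64_t nb_bytes;
} CowRegion;

/* Hypothetical result type, not a QEMU structure. */
typedef struct PiggybackLayout {
    CowRegion cow_start;  /* head of the cluster, left untouched */
    CowRegion cow_end;    /* tail of the cluster, left untouched */
    uint64_t host_offset; /* where the data goes in the image file */
    uint64_t cur_bytes;   /* bytes that fit into this cluster */
} PiggybackLayout;

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

static PiggybackLayout piggyback_layout(uint64_t cluster_size,
                                        uint64_t old_guest, uint64_t old_host,
                                        uint64_t start, uint64_t bytes)
{
    /* Assumptions of this sketch, stated as preconditions. */
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    assert(old_guest % cluster_size == 0);
    assert(start >= old_guest && bytes > 0);

    const uint64_t cluster_start = start & ~(cluster_size - 1);
    const uint64_t cluster_end = cluster_start + cluster_size;
    const uint64_t end = start + bytes;

    PiggybackLayout l = {
        .cow_start = {
            .offset = 0,
            .nb_bytes = start - cluster_start,
        },
        .cow_end = {
            .offset = min_u64(end - cluster_start, cluster_size),
            .nb_bytes = end < cluster_end ? cluster_end - end : 0,
        },
        .host_offset = old_host + (start - old_guest),
        .cur_bytes = min_u64(bytes, cluster_end - start),
    };
    return l;
}
```

With 64 KiB clusters, a 4 KiB write at guest offset 8192 yields a head region of 8192 bytes and a tail region of 53248 bytes. In the patch both regions are marked reduced, so the piggybacked request writes only its own data and leaves them alone.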