From patchwork Thu Jun 1 15:14:32 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anton Nefedov
X-Patchwork-Id: 9759875
From: Anton Nefedov
Date: Thu, 1 Jun 2017 18:14:32 +0300
Message-ID: <1496330073-51338-15-git-send-email-anton.nefedov@virtuozzo.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1496330073-51338-1-git-send-email-anton.nefedov@virtuozzo.com>
References: <1496330073-51338-1-git-send-email-anton.nefedov@virtuozzo.com>
Subject: [Qemu-devel] [PATCH v2 14/15] qcow2: allow concurrent unaligned writes to the same clusters
X-BeenThere: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Anton Nefedov, den@virtuozzo.com, mreitz@redhat.com, "Denis V . Lunev"

If the COW area of a write request to an unallocated cluster is empty,
concurrent write requests can be allowed with a little extra
synchronization, so they don't have to wait until L2 is filled.

Let qcow2-cluster.c::handle_dependencies() do most of the job:
if there is an in-flight request to the same cluster, the current
request wants to write in its COW area, and that COW area is marked
empty, steal the allocated offset and write concurrently. Let the
original request update L2 later, whenever it gets to it.

This gives an improvement for parallel misaligned writes to
unallocated clusters with no backing data:

HDD fio over xfs iodepth=4:
  seqwrite 4k:   18400 -> 22800 IOPS ( x1.24 )
  seqwrite 68k:   1600 ->  2300 IOPS ( x1.44 )

Signed-off-by: Anton Nefedov
Signed-off-by: Denis V. Lunev
---
 block/qcow2-cluster.c | 169 +++++++++++++++++++++++++++++++++++++++++++-------
 block/qcow2.c         |  28 ++++++++-
 block/qcow2.h         |  12 +++-
 3 files changed, 181 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 2fa549d..03b2e6c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -898,20 +898,32 @@ out:
 
 /*
  * Check if there already is an AIO write request in flight which allocates
- * the same cluster. In this case we need to wait until the previous
- * request has completed and updated the L2 table accordingly.
+ * the same cluster.
+ * In this case, check if that request has explicitly allowed to write
+ * in its COW area(s).
+ *   If yes - fill the meta to point to the same cluster.
+ *   If no  - we need to wait until the previous request has completed and
+ *            updated the L2 table accordingly or
+ *            has allowed writing in its COW area(s).
  * Returns:
  *   0       if there was no dependency. *cur_bytes indicates the number of
  *           bytes from guest_offset that can be read before the next
  *           dependency must be processed (or the request is complete).
- *           *m is not modified
+ *           *m, *host_offset are not modified
+ *
+ *   1       if there is a dependency but it is possible to write concurrently
+ *           *m is filled accordingly,
+ *           *cur_bytes may have decreased and describes
+ *             the length of the area that can be written to,
+ *           *host_offset contains the starting host image offset to write to
  *
  *   -EAGAIN if we had to wait for another request. The caller
- *           must start over, so consider *cur_bytes undefined.
+ *           must start over, so consider *cur_bytes and *host_offset undefined.
  *           *m is not modified
  */
 static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
-    uint64_t *cur_bytes, QCowL2Meta **m)
+                               uint64_t *host_offset, uint64_t *cur_bytes,
+                               QCowL2Meta **m)
 {
     BDRVQcow2State *s = bs->opaque;
     QCowL2Meta *old_alloc;
@@ -924,7 +936,7 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
         const uint64_t old_start = l2meta_cow_start(old_alloc);
         const uint64_t old_end = l2meta_cow_end(old_alloc);
 
-        if (end <= old_start || start >= old_end) {
+        if (end <= old_start || start >= old_end || old_alloc->piggybacked) {
             /* No intersection */
             continue;
         }
@@ -936,21 +948,95 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
             continue;
         }
 
-        /* Stop if an l2meta already exists. After yielding, it wouldn't
-         * be valid any more, so we'd have to clean up the old L2Metas
-         * and deal with requests depending on them before starting to
-         * gather new ones. Not worth the trouble. */
-        if (*m) {
+        /* offsets of the cluster we're intersecting in */
+        const uint64_t cluster_start = start_of_cluster(s, start);
+        const uint64_t cluster_end = cluster_start + s->cluster_size;
+
+        const uint64_t old_data_start = old_start
+            + old_alloc->cow_start.nb_bytes;
+        const uint64_t old_data_end = old_alloc->offset
+            + old_alloc->cow_end.offset;
+
+        const bool conflict_in_data_area =
+            end > old_data_start && start < old_data_end;
+        const bool conflict_in_old_cow_start =
+            /* 1). new write request area is before the old */
+            start < old_data_start
+            && /* 2). old request did not allow writing in its cow area */
+            !old_alloc->cow_start.reduced;
+        const bool conflict_in_old_cow_end =
+            /* 1). new write request area is after the old */
+            start > old_data_start
+            && /* 2). old request did not allow writing in its cow area */
+            !old_alloc->cow_end.reduced;
+
+        if (conflict_in_data_area ||
+            conflict_in_old_cow_start || conflict_in_old_cow_end) {
+
+            /* Stop if an l2meta already exists. After yielding, it wouldn't
+             * be valid any more, so we'd have to clean up the old L2Metas
+             * and deal with requests depending on them before starting to
+             * gather new ones. Not worth the trouble. */
+            if (*m) {
+                /* start must be cluster aligned at this point */
+                assert(start == cluster_start);
+                *cur_bytes = 0;
+                return 0;
+            }
+
+            /* Wait for the dependency to complete. We need to recheck
+             * the free/allocated clusters when we continue. */
+            qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
+            return -EAGAIN;
+        }
+
+        /* allocations do conflict, but the competitor kindly allowed us
+         * to write concurrently (our data area only, not the whole cluster!)
+         * Inter alia, this means we must not touch the COW areas */
+
+        if (*host_offset) {
             /* start must be cluster aligned at this point */
-            assert(start == start_of_cluster(s, start));
-            *cur_bytes = 0;
-            return 0;
+            assert(start == cluster_start);
+            if ((old_alloc->alloc_offset + (start - old_start))
+                != *host_offset) {
+                /* can't extend contiguous allocation */
+                *cur_bytes = 0;
+                return 0;
+            }
         }
 
-        /* Wait for the dependency to complete. We need to recheck
-         * the free/allocated clusters when we continue. */
-        qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
-        return -EAGAIN;
+        QCowL2Meta *old_m = *m;
+        *m = g_malloc0(sizeof(**m));
+
+        **m = (QCowL2Meta) {
+            .next = old_m,
+
+            .alloc_offset = old_alloc->alloc_offset
+                + (cluster_start - old_start),
+            .offset = old_alloc->offset
+                + (cluster_start - old_start),
+            .nb_clusters = 1,
+            .piggybacked = true,
+            .clusters_are_trailing = false,
+
+            /* reduced COW areas. see above */
+            .cow_start = {
+                .offset = 0,
+                .nb_bytes = start - cluster_start,
+                .reduced = true,
+            },
+            .cow_end = {
+                .offset = MIN(end - cluster_start, s->cluster_size),
+                .nb_bytes = end < cluster_end ? cluster_end - end : 0,
+                .reduced = true,
+            },
+        };
+        qemu_co_queue_init(&(*m)->dependent_requests);
+        QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+        *host_offset = old_alloc->alloc_offset + (start - old_start);
+        *cur_bytes = MIN(*cur_bytes, cluster_end - start);
+        return 1;
     }
 
     /* Make sure that existing clusters and new allocations are only used up to
@@ -1264,6 +1350,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         .alloc_offset = alloc_cluster_offset,
         .offset = start_of_cluster(s, guest_offset),
         .nb_clusters = nb_clusters,
+        .piggybacked = false,
         .clusters_are_trailing = alloc_cluster_offset >= old_data_end,
 
         .keep_old_clusters = keep_old_clusters,
@@ -1364,13 +1451,12 @@ again:
          *         for contiguous clusters (the situation could have changed
          *         while we were sleeping)
          *
-         *      c) TODO: Request starts in the same cluster as the in-flight
-         *         allocation ends. Shorten the COW of the in-fight allocation,
-         *         set cluster_offset to write to the same cluster and set up
-         *         the right synchronisation between the in-flight request and
-         *         the new one.
+         *      c) Overlap with another request's writeable COW area. Use
+         *         the stolen offset (and let the original request update L2
+         *         when it pleases)
+         *
          */
-        ret = handle_dependencies(bs, start, &cur_bytes, m);
+        ret = handle_dependencies(bs, start, &cluster_offset, &cur_bytes, m);
         if (ret == -EAGAIN) {
             /* Currently handle_dependencies() doesn't yield if we already had
              * an allocation. If it did, we would have to clean up the L2Meta
@@ -1379,6 +1465,8 @@ again:
             goto again;
         } else if (ret < 0) {
             return ret;
+        } else if (ret) {
+            continue;
         } else if (cur_bytes == 0) {
             break;
         } else {
@@ -1968,3 +2056,36 @@ void qcow2_update_data_end(BlockDriverState *bs, uint64_t off)
         s->data_end = off;
     }
 }
+
+/*
+ * For each @m, wait for its dependency request to finish and check for its
+ * success, i.e. that L2 table is updated as expected.
+ */
+int qcow2_wait_l2table_update(BlockDriverState *bs, const QCowL2Meta *m)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCowL2Meta *old_alloc;
+    uint64_t alloc_offset;
+    unsigned int bytes;
+    int ret;
+
+    for (; m != NULL; m = m->next) {
+        assert(m->piggybacked);
+        QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
+            uint64_t a_off;
+            a_off = old_alloc->alloc_offset + (m->offset - old_alloc->offset);
+            if (!old_alloc->piggybacked && m->offset >= old_alloc->offset &&
+                a_off == m->alloc_offset) {
+
+                qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
+                break;
+            }
+        }
+        ret = qcow2_get_cluster_offset(bs, m->offset, &bytes, &alloc_offset);
+        if (ret != QCOW2_CLUSTER_NORMAL ||
+            alloc_offset != m->alloc_offset) {
+            return -1;
+        }
+    }
+    return 0;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 92d0af6..f02c5e6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1627,6 +1627,8 @@ fail:
 
 static void handle_cow_reduce(BlockDriverState *bs, QCowL2Meta *m)
 {
+    bool trimmed = false;
+
     if (bs->encrypted) {
         return;
     }
@@ -1635,12 +1637,19 @@ static void handle_cow_reduce(BlockDriverState *bs, QCowL2Meta *m)
                         (m->offset + m->cow_start.offset) >> BDRV_SECTOR_BITS,
                         m->cow_start.nb_bytes >> BDRV_SECTOR_BITS)) {
         m->cow_start.reduced = true;
+        trimmed = true;
     }
     if (!m->cow_end.reduced && m->cow_end.nb_bytes != 0 &&
         is_zero_sectors(bs,
                         (m->offset + m->cow_end.offset) >> BDRV_SECTOR_BITS,
                         m->cow_end.nb_bytes >> BDRV_SECTOR_BITS)) {
         m->cow_end.reduced = true;
+        trimmed = true;
+    }
+    /* The request is trimmed. Let's try to start dependent
+       ones, may be we will be lucky */
+    if (trimmed) {
+        qemu_co_queue_restart_all(&m->dependent_requests);
     }
 }
 
@@ -1782,6 +1791,10 @@ static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
     for (m = l2meta; m != NULL; m = m->next) {
         uint64_t bytes = m->nb_clusters << s->cluster_bits;
 
+        if (m->piggybacked) {
+            continue;
+        }
+
         if (s->prealloc_size != 0 && handle_prealloc(bs, m)) {
             handle_cow_reduce(bs, m);
             continue;
@@ -1903,9 +1916,18 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
         while (l2meta != NULL) {
             QCowL2Meta *next;
 
-            ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
-            if (ret < 0) {
-                goto fail;
+            if (!l2meta->piggybacked) {
+                ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
+                if (ret < 0) {
+                    goto fail;
+                }
+            } else {
+                ret = qcow2_wait_l2table_update(bs, l2meta);
+                if (ret < 0) {
+                    /* dependency request failed, return general EIO */
+                    ret = -EIO;
+                    goto fail;
+                }
             }
 
             /* Take the request off the list of running requests */
diff --git a/block/qcow2.h b/block/qcow2.h
index 2fd8510..5947045 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -310,7 +310,8 @@ typedef struct Qcow2COWRegion {
     /** Number of bytes to copy */
     int nb_bytes;
 
-    /** The region is filled with zeroes and does not require COW
+    /** The region does not require COW
+     *  (either filled with zeroes or busy with other request)
      */
     bool reduced;
 } Qcow2COWRegion;
@@ -338,6 +339,13 @@ typedef struct QCowL2Meta
     bool clusters_are_trailing;
 
     /**
+     * True if the described clusters are being allocated by
+     * the other concurrent request; so this one must not actually update L2
+     * or COW but only write its data
+     */
+    bool piggybacked;
+
+    /**
      * Requests that overlap with this allocation and wait to be restarted
      * when the allocating request has completed.
      */
@@ -575,6 +583,8 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
                                BlockDriverAmendStatusCB *status_cb,
                                void *cb_opaque);
 
+int qcow2_wait_l2table_update(BlockDriverState *bs, const QCowL2Meta *m);
+
 /* qcow2-snapshot.c functions */
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
 int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id);
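
Note for reviewers: the decision made by the new branch in handle_dependencies() can be condensed into a small standalone sketch. Nothing below is QEMU code. InFlightAlloc, DepResult, CowRegion, classify_write() and piggyback_cow() are hypothetical names introduced only for illustration, the struct keeps guest offsets only, and the earlier "start < old_start" shortening step of the real loop is assumed to have been handled already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an in-flight allocation (not QCowL2Meta). */
typedef struct {
    uint64_t cow_start;   /* guest offset where the head COW area begins */
    uint64_t data_start;  /* first guest byte of the request's own data */
    uint64_t data_end;    /* one past the last guest byte of its own data */
    uint64_t cow_end;     /* guest offset where the tail COW area ends */
    bool head_reduced;    /* head COW area needs no copy */
    bool tail_reduced;    /* tail COW area needs no copy */
    bool piggybacked;     /* this entry itself borrows someone's cluster */
} InFlightAlloc;

typedef enum { DEP_NONE, DEP_WAIT, DEP_SHARE } DepResult;

typedef struct {
    uint64_t offset;      /* offset from the start of the cluster */
    uint64_t nb_bytes;
} CowRegion;

/* Classify a new write [start, end) against one in-flight allocation,
 * using the same three conflict tests as the patch. */
static DepResult classify_write(const InFlightAlloc *old,
                                uint64_t start, uint64_t end)
{
    assert(end > start);

    if (end <= old->cow_start || start >= old->cow_end || old->piggybacked) {
        return DEP_NONE;                      /* no intersection */
    }

    bool in_data = end > old->data_start && start < old->data_end;
    bool in_head = start < old->data_start && !old->head_reduced;
    bool in_tail = start > old->data_start && !old->tail_reduced;

    if (in_data || in_head || in_tail) {
        return DEP_WAIT;                      /* caller would get -EAGAIN */
    }
    return DEP_SHARE;                         /* write into the same cluster */
}

/* COW regions recorded for the piggybacking request: everything in the
 * cluster outside [start, end). Both are flagged as reduced in the patch. */
static void piggyback_cow(uint64_t cluster_size, uint64_t start, uint64_t end,
                          CowRegion *head, CowRegion *tail)
{
    assert(cluster_size != 0 && end > start);

    uint64_t cluster_start = start - (start % cluster_size);
    uint64_t cluster_end = cluster_start + cluster_size;
    uint64_t rel_end = end - cluster_start;

    head->offset = 0;
    head->nb_bytes = start - cluster_start;
    tail->offset = rel_end < cluster_size ? rel_end : cluster_size;
    tail->nb_bytes = end < cluster_end ? cluster_end - end : 0;
}
```

With a 64 KiB cluster and an in-flight write covering [4096, 8192) whose COW areas are both marked reduced, a second write to [8192, 12288) classifies as DEP_SHARE and gets a head region of 8192 bytes plus a tail region of 53248 bytes starting at cluster offset 12288. If the tail area were not marked reduced, the same write would classify as DEP_WAIT.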