From patchwork Fri Jul 12 09:46:16 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Drobyshev
X-Patchwork-Id: 13731542
From: Andrey Drobyshev
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
 vsementsov@yandex-team.ru, pbonzini@redhat.com, eesposit@redhat.com,
 andrey.drobyshev@virtuozzo.com, den@virtuozzo.com
Subject: [PATCH v2 1/2] block: zero data corruption using prealloc-filter
Date: Fri, 12 Jul 2024 12:46:16 +0300
Message-Id: <20240712094617.565237-2-andrey.drobyshev@virtuozzo.com>
X-Mailer: git-send-email 2.39.3
In-Reply-To: <20240712094617.565237-1-andrey.drobyshev@virtuozzo.com>
References: <20240712094617.565237-1-andrey.drobyshev@virtuozzo.com>

From: "Denis V. Lunev"

We have observed that some clusters in QCOW2 files are zeroed
while the preallocation filter is used.

We were able to trace down the following sequence when prealloc-filter
is used:
    co=0x55e7cbed7680 qcow2_co_pwritev_task()
    co=0x55e7cbed7680 preallocate_co_pwritev_part()
    co=0x55e7cbed7680 handle_write()
    co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes()
    co=0x55e7cbed7680 raw_do_pwrite_zeroes()
    co=0x7f9edb7fe500 do_fallocate()

Here coroutine 0x55e7cbed7680 is blocked, waiting for coroutine
0x7f9edb7fe500 to finish the fallocate of the file area. OK.

Now it is time to handle the next coroutine:
    co=0x55e7cbee91b0 qcow2_co_pwritev_task()
    co=0x55e7cbee91b0 preallocate_co_pwritev_part()
    co=0x55e7cbee91b0 handle_write()
    co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes()
    co=0x55e7cbee91b0 raw_do_pwrite_zeroes()
    co=0x7f9edb7deb00 do_fallocate()

The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced
file_end yet, so coroutine 0x55e7cbee91b0 will start fallocate() for
the same area. This means that once fallocate has started inside
0x7f9edb7deb00, the original fallocate can finish and the real write
can be executed. In that case the write() request is handled at the
same time as fallocate().

The patch moves the s->file_end assignment before fallocate, and that is
crucial. The idea is that all subsequent requests into the area
being preallocated will be issued as plain writes, without fallocate
for this area, and they will not proceed thanks to the overlapping
requests mechanics. If preallocation fails, we will just switch
to the normal expand-by-write behavior, and that is not a problem
except for performance.

Signed-off-by: Denis V. Lunev
Tested-by: Andrey Drobyshev
---
 block/preallocate.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/preallocate.c b/block/preallocate.c
index d215bc5d6d..ecf0aa4baa 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -383,6 +383,13 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
 
     want_merge_zero = want_merge_zero && (prealloc_start <= offset);
 
+    /*
+     * Assign file_end before making actual preallocation. This will ensure
+     * that next request performed while preallocation is in progress will
+     * be passed without preallocation.
+     */
+    s->file_end = prealloc_end;
+
     ret = bdrv_co_pwrite_zeroes(
             bs->file, prealloc_start, prealloc_end - prealloc_start,
             BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
@@ -391,7 +398,6 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
         return false;
     }
 
-    s->file_end = prealloc_end;
     return want_merge_zero;
 }
 

From patchwork Fri Jul 12 09:46:17 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Drobyshev
X-Patchwork-Id: 13731541
From: Andrey Drobyshev
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
 vsementsov@yandex-team.ru, pbonzini@redhat.com, eesposit@redhat.com,
 andrey.drobyshev@virtuozzo.com, den@virtuozzo.com
Subject: [PATCH v2 2/2] iotests/298: add testcase for async writes with
 preallocation filter
Date: Fri, 12 Jul 2024 12:46:17 +0300
Message-Id: <20240712094617.565237-3-andrey.drobyshev@virtuozzo.com>
X-Mailer: git-send-email 2.39.3
In-Reply-To: <20240712094617.565237-1-andrey.drobyshev@virtuozzo.com>
References: <20240712094617.565237-1-andrey.drobyshev@virtuozzo.com>

The testcase simply creates a 64G image with 1M clusters, generates a list
of 1M aligned offsets and feeds aio_write commands with those offsets to
qemu-io run with '--aio native --nocache'. Then we check the data
written at each of the offsets.
Before the previous commit this could result in a race
within the preallocation filter which would zero out some clusters after
data had actually been written to them.

Note: the test doesn't fail in 100% of cases as there's a race involved,
but the failures are pretty consistent so it should be good enough for
detecting the problem.

Signed-off-by: Andrey Drobyshev
---
 tests/qemu-iotests/298     | 34 ++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/298.out |  4 ++--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
index 09c9290711..d1bf5ee0df 100755
--- a/tests/qemu-iotests/298
+++ b/tests/qemu-iotests/298
@@ -20,8 +20,10 @@
 
 import os
 import iotests
+import random
 
 MiB = 1024 * 1024
+GiB = MiB * 1024
 disk = os.path.join(iotests.test_dir, 'disk')
 overlay = os.path.join(iotests.test_dir, 'overlay')
 refdisk = os.path.join(iotests.test_dir, 'refdisk')
@@ -176,5 +178,37 @@ class TestTruncate(iotests.QMPTestCase):
         self.do_test('off', '150M')
 
 
+class TestPreallocAsyncWrites(iotests.QMPTestCase):
+    def setUp(self):
+        # Make sure we get reproducible write patterns on each run
+        random.seed(42)
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, '-o',
+                                f'cluster_size={MiB},lazy_refcounts=on',
+                                str(64 * GiB))
+
+    def tearDown(self):
+        os.remove(disk)
+
+    def test_prealloc_async_writes(self):
+        requests = 1024 # Number of write/read requests to feed to qemu-io
+        total_clusters = 64 * 1024 # 64G / 1M
+
+        offsets = random.sample(range(0, total_clusters), requests)
+        aio_write_cmds = [f'aio_write -P 0xaa {off}M 1M' for off in offsets]
+        read_cmds = [f'read -P 0xaa {off}M 1M' for off in offsets]
+
+        proc = iotests.QemuIoInteractive('--aio', 'native', '--nocache',
+                                         '--image-opts', drive_opts)
+        for cmd in aio_write_cmds:
+            proc.cmd(cmd)
+        proc.close()
+
+        proc = iotests.QemuIoInteractive('-f', iotests.imgfmt, disk)
+        for cmd in read_cmds:
+            out = proc.cmd(cmd)
+            self.assertFalse('Pattern verification failed' in str(out))
+        proc.close()
+
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'], required_fmts=['preallocate'])
diff --git a/tests/qemu-iotests/298.out b/tests/qemu-iotests/298.out
index fa16b5ccef..6323079e08 100644
--- a/tests/qemu-iotests/298.out
+++ b/tests/qemu-iotests/298.out
@@ -1,5 +1,5 @@
-.............
+..............
 ----------------------------------------------------------------------
-Ran 13 tests
+Ran 14 tests
 
 OK

From patchwork Mon Jul 15 12:36:36 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Drobyshev
X-Patchwork-Id: 13733426
From: Andrey Drobyshev
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
 vsementsov@yandex-team.ru, pbonzini@redhat.com, eesposit@redhat.com,
 andrey.drobyshev@virtuozzo.com, den@virtuozzo.com
Subject: [PATCH v2 3/2] scripts: add filev2p.py script for mapping virtual
 file offsets
Date: Mon, 15 Jul 2024 15:36:36 +0300
Message-Id: <20240715123636.619714-1-andrey.drobyshev@virtuozzo.com>
X-Mailer: git-send-email 2.39.3
In-Reply-To: <20240712094617.565237-1-andrey.drobyshev@virtuozzo.com>
References: <20240712094617.565237-1-andrey.drobyshev@virtuozzo.com>

The script is basically a wrapper around the "filefrag" utility. It can
be used to map virtual offsets within the file to the underlying block
device offsets.
In addition, a chunk size might be specified, in which
case a list of such mappings will be obtained:

$ scripts/filev2p.py -s 100M /sparsefile 1768M
1853882368..1895825407 (file) -> 16332619776..16374562815 (/dev/sda4) -> 84492156928..84534099967 (/dev/sda)
1895825408..1958739967 (file) -> 17213591552..17276506111 (/dev/sda4) -> 85373128704..85436043263 (/dev/sda)

This could come in handy when we need to map a certain piece of data
within a file inside VM to the same data within the image on the host
(e.g. physical offset on VM's /dev/sda would be the virtual offset within
QCOW2 image).

Note: as of now the script only works with the files located on plain
partitions, i.e. it doesn't work with partitions built on top of LVM.
Partitions on LVM would require another level of mapping.

Signed-off-by: Andrey Drobyshev
---
 scripts/filev2p.py | 311 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 311 insertions(+)
 create mode 100755 scripts/filev2p.py

diff --git a/scripts/filev2p.py b/scripts/filev2p.py
new file mode 100755
index 0000000000..3bd7d18b5e
--- /dev/null
+++ b/scripts/filev2p.py
@@ -0,0 +1,311 @@
+#!/usr/bin/env python3
+#
+# Map file virtual offset to the offset on the underlying block device.
+# Works by parsing 'filefrag' output.
+#
+# Copyright (c) 2024 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see .
+#
+
+import argparse
+import os
+import subprocess
+import re
+import sys
+
+from bisect import bisect_right
+from collections import namedtuple
+from dataclasses import dataclass
+from shutil import which
+from stat import S_ISBLK
+
+
+Partition = namedtuple('Partition', ['partpath', 'diskpath', 'part_offt'])
+
+
+@dataclass
+class Extent:
+    '''Class representing an individual file extent.
+
+    This is basically a piece of data within the file which is located
+    consecutively (i.e. not sparsely) on the underlying block device.
+    '''
+
+    log_start: int
+    log_end: int
+    phys_start: int
+    phys_end: int
+    length: int
+    partition: Partition
+
+    @property
+    def disk_start(self):
+        'Number of the first byte of this extent on the whole disk (/dev/sda)'
+        return self.partition.part_offt + self.phys_start
+
+    @property
+    def disk_end(self):
+        'Number of the last byte of this extent on the whole disk (/dev/sda)'
+        return self.partition.part_offt + self.phys_end
+
+    def __str__(self):
+        ischunk = self.log_end > self.log_start
+        maybe_end = lambda s: f'..{s}' if ischunk else ''
+        return '%s%s (file) -> %s%s (%s) -> %s%s (%s)' % (
+            self.log_start, maybe_end(self.log_end),
+            self.phys_start, maybe_end(self.phys_end), self.partition.partpath,
+            self.disk_start, maybe_end(self.disk_end), self.partition.diskpath
+        )
+
+    @classmethod
+    def ext_slice(cls, bigger_ext, start, end):
+        '''Constructor for the Extent class from a bigger extent.
+
+        Return Extent instance which is a slice of @bigger_ext contained
+        within the range [start, end].
+        '''
+
+        assert start >= bigger_ext.log_start
+        assert end <= bigger_ext.log_end
+
+        if start == bigger_ext.log_start and end == bigger_ext.log_end:
+            return bigger_ext
+
+        phys_start = bigger_ext.phys_start + (start - bigger_ext.log_start)
+        phys_end = bigger_ext.phys_end - (bigger_ext.log_end - end)
+        length = end - start + 1
+
+        return cls(start, end, phys_start, phys_end, length,
+                   bigger_ext.partition)
+
+
+def run_cmd(cmd: str) -> str:
+    '''Wrapper around subprocess.run.
+
+    Returns stdout in case of success, emits an error and exits in case
+    of failure.
+    '''
+
+    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
+                          check=False, shell=True)
+    if proc.stderr is not None:
+        stderr = f'\n{proc.stderr.decode().strip()}'
+    else:
+        stderr = ''
+
+    if proc.returncode:
+        sys.exit(f'Error: Command "{cmd}" returned {proc.returncode}:{stderr}')
+
+    return proc.stdout.decode().strip()
+
+
+def parse_size(offset: str) -> int:
+    'Convert human readable size to bytes'
+
+    suffixes = {
+        **dict.fromkeys(['k', 'K', 'Kb', 'KB', 'KiB'], 2 ** 10),
+        **dict.fromkeys(['m', 'M', 'Mb', 'MB', 'MiB'], 2 ** 20),
+        **dict.fromkeys(['g', 'G', 'Gb', 'GB', 'GiB'], 2 ** 30),
+        **dict.fromkeys(     ['T', 'Tb', 'TB', 'TiB'], 2 ** 40),
+        **dict.fromkeys([''], 1)
+    }
+
+    sizematch = re.match(r'^([0-9]+)\s*([a-zA-Z]*)$', offset)
+    if not bool(sizematch):
+        sys.exit(f'Error: Couldn\'t parse size "{offset}". Pass offset '
+                 'either in bytes or in format 1K, 2M, 3G')
+
+    num, suff = sizematch.groups()
+    num = int(num)
+
+    mult = suffixes.get(suff)
+    if mult is None:
+        sys.exit(f'Error: Couldn\'t parse size "{offset}": '
+                 f'unknown suffix {suff}')
+
+    return num * mult
+
+
+def fpath2part(filename: str) -> str:
+    'Get partition on which @filename is located (e.g. /dev/sda1).'
+
+    partpath = run_cmd(f'df --output=source {filename} | tail -n+2')
+    if not os.path.exists(partpath) or not S_ISBLK(os.stat(partpath).st_mode):
+        sys.exit(f'Error: file {filename} is located on {partpath} which '
+                 'isn\'t a block device')
+    return partpath
+
+
+def part2dev(partpath: str, filename: str) -> str:
+    'Get block device on which @partpath is located (e.g. /dev/sda).'
+    dev = run_cmd(f'lsblk -no PKNAME {partpath}')
+    diskpath = f'/dev/{dev}'
+    if not os.path.exists(diskpath) or not S_ISBLK(os.stat(diskpath).st_mode):
+        sys.exit(f'Error: file {filename} is located on {diskpath} which '
+                 'isn\'t a block device')
+    return diskpath
+
+
+def part2disktype(partpath: str) -> str:
+    'Parse /proc/devices and get block device type for @partpath'
+
+    major = os.major(os.stat(partpath).st_rdev)
+    assert major
+    with open('/proc/devices', encoding='utf-8') as devf:
+        for line in reversed(list(devf)):
+            # Our major cannot be absent among block devs
+            if line.startswith('Block'):
+                break
+            devmajor, devtype = line.strip().split()
+            if int(devmajor) == major:
+                return devtype
+
+    sys.exit(f'Error: We haven\'t found major {major} in /proc/devices, '
+             'and that can\'t be')
+
+
+def get_part_offset(part: str, disk: str) -> int:
+    'Get offset in bytes of the partition @part on the block device @disk.'
+
+    lines = run_cmd(f'fdisk -l {disk} | egrep "^(Units|{part})"').splitlines()
+
+    unitmatch = re.match('^.* = ([0-9]+) bytes$', lines[0])
+    if not bool(unitmatch):
+        sys.exit(f'Error: Couldn\'t parse "fdisk -l" output:\n{lines[0]}')
+    secsize = int(unitmatch.group(1))
+
+    part_offt = int(lines[1].split()[1])
+    return part_offt * secsize
+
+
+def parse_frag_line(line: str, partition: Partition) -> Extent:
+    'Construct Extent instance from a "filefrag" output line.'
+
+    nums = [int(n) for n in re.findall(r'[0-9]+', line)]
+
+    log_start = nums[1]
+    log_end = nums[2]
+    phys_start = nums[3]
+    phys_end = nums[4]
+    length = nums[5]
+
+    assert log_start < log_end
+    assert phys_start < phys_end
+    assert (log_end - log_start + 1) == (phys_end - phys_start + 1) == length
+
+    return Extent(log_start, log_end, phys_start, phys_end, length, partition)
+
+
+def preliminary_checks(args: argparse.Namespace) -> None:
+    'A bunch of checks to emit an error and exit at the earlier stage.'
+
+    if which('filefrag') is None:
+        sys.exit('Error: Program "filefrag" doesn\'t exist')
+
+    if not os.path.exists(args.filename):
+        sys.exit(f'Error: File {args.filename} doesn\'t exist')
+
+    args.filesize = os.path.getsize(args.filename)
+    if args.offset >= args.filesize:
+        sys.exit(f'Error: Specified offset {args.offset} exceeds '
+                 f'file size {args.filesize}')
+    if args.size and (args.offset + args.size > args.filesize):
+        sys.exit(f'Error: Chunk of size {args.size} at offset '
+                 f'{args.offset} exceeds file size {args.filesize}')
+
+    args.partpath = fpath2part(args.filename)
+    args.disktype = part2disktype(args.partpath)
+    if args.disktype not in ('sd', 'virtblk'):
+        sys.exit(f'Error: Cannot analyze files on {args.disktype} disks')
+    args.diskpath = part2dev(args.partpath, args.filename)
+    args.part_offt = get_part_offset(args.partpath, args.diskpath)
+
+
+def get_extent_maps(args: argparse.Namespace) -> list[Extent]:
+    'Run "filefrag", parse its output and return a list of Extent instances.'
+
+    lines = run_cmd(f'filefrag -b1 -v {args.filename}').splitlines()
+
+    ffinfo_re = re.compile('.* is ([0-9]+) .*of ([0-9]+) bytes')
+    ff_size, ff_block = re.match(ffinfo_re, lines[1]).groups()
+
+    # Paranoia checks
+    if int(ff_size) != args.filesize:
+        sys.exit('Error: filefrag and os.path.getsize() report different '
+                 f'sizes: {ff_size} and {args.filesize}')
+    if int(ff_block) != 1:
+        sys.exit(f'Error: "filefrag -b1" invoked, but block size is {ff_block}')
+
+    partition = Partition(args.partpath, args.diskpath, args.part_offt)
+
+    # Fill extents list from the output
+    extents = []
+    for line in lines:
+        if not re.match(r'^\s*[0-9]+:', line):
+            continue
+        extents += [parse_frag_line(line, partition)]
+
+    chunk_start = args.offset
+    chunk_end = args.offset + args.size - 1
+    ext_offsets = [ext.log_start for ext in extents]
+    start_ind = bisect_right(ext_offsets, chunk_start) - 1
+    end_ind = bisect_right(ext_offsets, chunk_end) - 1
+
+    res_extents = extents[start_ind : end_ind + 1]
+    for i, ext in enumerate(res_extents):
+        start = max(chunk_start, ext.log_start)
+        end = min(chunk_end, ext.log_end)
+        res_extents[i] = Extent.ext_slice(ext, start, end)
+
+    return res_extents
+
+
+def parse_args() -> argparse.Namespace:
+    'Define program arguments and parse user input.'
+
+    parser = argparse.ArgumentParser(description='''
+Map file offset to physical offset on the block device
+
+With --size provided get a list of mappings for the chunk''',
+        formatter_class=argparse.RawTextHelpFormatter)
+
+    parser.add_argument('filename', type=str, help='filename to process')
+    parser.add_argument('offset', type=str,
+                        help='logical offset inside the file')
+    parser.add_argument('-s', '--size', required=False, type=str,
+                        help='size of the file chunk to get offsets for')
+    args = parser.parse_args()
+
+    args.offset = parse_size(args.offset)
+    if args.size:
+        args.size = parse_size(args.size)
+    else:
+        # When no chunk size is provided (only offset), it's equivalent to
+        # chunk size == 1
+        args.size = 1
+
+    return args
+
+
+def main() -> int:
+    args = parse_args()
+    preliminary_checks(args)
+    extents = get_extent_maps(args)
+    for ext in extents:
+        print(ext)
+
+
+if __name__ == '__main__':
+    sys.exit(main())
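
For readers following the chunk lookup in get_extent_maps(), here is a minimal
standalone sketch of the same idea: bisect_right() over the extents' logical
start offsets picks the first and last extents touched by the chunk, and each
of them is then clipped to the chunk boundaries. This is not part of the
patch. The map_chunk() helper, the plain-tuple extent layout and the sample
numbers are all made up for illustration, and the sketch assumes the extents
are sorted and cover the requested range without holes.

```python
from bisect import bisect_right

# Hypothetical extents as (log_start, log_end, phys_start) tuples, sorted by
# log_start. The numbers are invented for illustration only.
EXTENTS = [(0, 99, 1000), (100, 249, 5000), (250, 399, 9000)]


def map_chunk(extents, offset, size):
    """Return (log_start, log_end, phys_start, phys_end) pieces for a chunk.

    Assumes the extents are sorted by logical start and that the chunk
    [offset, offset + size - 1] lies fully inside them.
    """
    chunk_start = offset
    chunk_end = offset + size - 1

    # Index of the last extent whose log_start is <= the given offset
    starts = [ext[0] for ext in extents]
    first = bisect_right(starts, chunk_start) - 1
    last = bisect_right(starts, chunk_end) - 1

    pieces = []
    for log_start, log_end, phys_start in extents[first:last + 1]:
        # Clip the extent to the chunk, shifting physical offsets accordingly
        start = max(chunk_start, log_start)
        end = min(chunk_end, log_end)
        pieces.append((start, end,
                       phys_start + (start - log_start),
                       phys_start + (end - log_start)))
    return pieces


for piece in map_chunk(EXTENTS, 90, 30):
    print(piece)
```

With a chunk of 30 bytes at offset 90, the lookup spans the first two sample
extents, so two clipped pieces are printed. A single-byte lookup (size 1)
takes the same path and yields one piece whose start and end coincide.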