From patchwork Tue Jul 16 14:41:21 2024
X-Patchwork-Submitter: Andrey Drobyshev
X-Patchwork-Id: 13734563
From: Andrey Drobyshev
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
 vsementsov@yandex-team.ru, pbonzini@redhat.com, eesposit@redhat.com,
 andrey.drobyshev@virtuozzo.com, den@virtuozzo.com
Subject: [PATCH v3 1/3] block: fix zero data corruption when using
 prealloc-filter
Date: Tue, 16 Jul 2024 17:41:21 +0300
Message-Id: <20240716144123.651476-2-andrey.drobyshev@virtuozzo.com>
In-Reply-To: <20240716144123.651476-1-andrey.drobyshev@virtuozzo.com>
References: <20240716144123.651476-1-andrey.drobyshev@virtuozzo.com>

From: "Denis V. Lunev"

We have observed that some clusters in QCOW2 files get zeroed when the
preallocation filter is used.
We were able to trace down the following sequence when the prealloc-filter
is used:

    co=0x55e7cbed7680 qcow2_co_pwritev_task()
    co=0x55e7cbed7680 preallocate_co_pwritev_part()
    co=0x55e7cbed7680 handle_write()
    co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes()
    co=0x55e7cbed7680 raw_do_pwrite_zeroes()
    co=0x7f9edb7fe500 do_fallocate()

Here coroutine 0x55e7cbed7680 is blocked, waiting until coroutine
0x7f9edb7fe500 finishes the fallocate() of the file area. At this point the
next coroutine is scheduled:

    co=0x55e7cbee91b0 qcow2_co_pwritev_task()
    co=0x55e7cbee91b0 preallocate_co_pwritev_part()
    co=0x55e7cbee91b0 handle_write()
    co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes()
    co=0x55e7cbee91b0 raw_do_pwrite_zeroes()
    co=0x7f9edb7deb00 do_fallocate()

The trouble comes here: coroutine 0x55e7cbed7680 has not advanced file_end
yet, so coroutine 0x55e7cbee91b0 starts a fallocate() for the same area.
Once that second fallocate() is running inside 0x7f9edb7deb00, the original
fallocate() can complete and the real write is executed. As a result, the
write() request is handled at the same time as the second fallocate(), which
can zero the data that has just been written.

The patch moves the s->file_end assignment before the fallocate, and that is
crucial. The idea is that all subsequent requests into the area being
preallocated will be issued as plain writes, without a fallocate for this
area, and they will not proceed concurrently thanks to the overlapping
requests mechanics. If the preallocation fails, we simply switch back to the
normal expand-by-write behaviour, which costs nothing but performance.

Signed-off-by: Denis V. Lunev
Tested-by: Andrey Drobyshev
---
 block/preallocate.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
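To make the ordering concrete, here is a minimal toy model (plain Python
asyncio, not QEMU code; the class and its fields are hypothetical stand-ins
for the filter state) of why publishing file_end before the preallocation
starts keeps a concurrent request from re-zeroing freshly written data. The
real fix additionally relies on the overlapping-requests serialization
mentioned above; this sketch only illustrates the assignment ordering.

# Toy model of handle_write() ordering; 'file_end' stands in for s->file_end,
# the sleep(0) for the point where do_fallocate() yields to other coroutines.
import asyncio

class ToyPreallocFilter:
    def __init__(self):
        self.file_end = 0   # end of the preallocated area, like s->file_end
        self.data = {}      # cluster -> value actually present "on disk"

    async def write(self, cluster, value, assign_early):
        if cluster >= self.file_end:
            start, prealloc_end = self.file_end, cluster + 1
            if assign_early:
                self.file_end = prealloc_end       # fixed ordering
            await asyncio.sleep(0)                 # fallocate() yields here
            for c in range(start, prealloc_end):
                self.data[c] = 0                   # preallocation zeroes the range
            if not assign_early:
                self.file_end = prealloc_end       # buggy ordering: too late
        self.data[cluster] = value                 # the actual guest write

async def run(assign_early):
    f = ToyPreallocFilter()
    # two concurrent writes into the not-yet-preallocated area
    await asyncio.gather(f.write(0, 0xaa, assign_early),
                         f.write(1, 0xbb, assign_early))
    return f.data

print(asyncio.run(run(assign_early=False)))  # {0: 0, 1: 187}   -> 0xaa was zeroed
print(asyncio.run(run(assign_early=True)))   # {0: 170, 1: 187} -> data intact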

diff --git a/block/preallocate.c b/block/preallocate.c
index d215bc5d6d..ecf0aa4baa 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -383,6 +383,13 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
 
     want_merge_zero = want_merge_zero && (prealloc_start <= offset);
 
+    /*
+     * Assign file_end before making actual preallocation. This will ensure
+     * that next request performed while preallocation is in progress will
+     * be passed without preallocation.
+     */
+    s->file_end = prealloc_end;
+
     ret = bdrv_co_pwrite_zeroes(
             bs->file, prealloc_start, prealloc_end - prealloc_start,
             BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
@@ -391,7 +398,6 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
         return false;
     }
 
-    s->file_end = prealloc_end;
     return want_merge_zero;
 }
 

From patchwork Tue Jul 16 14:41:22 2024
X-Patchwork-Submitter: Andrey Drobyshev
X-Patchwork-Id: 13734565
From: Andrey Drobyshev
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
 vsementsov@yandex-team.ru, pbonzini@redhat.com, eesposit@redhat.com,
 andrey.drobyshev@virtuozzo.com, den@virtuozzo.com
Subject: [PATCH v3 2/3] iotests/298: add testcase for async writes with
 preallocation filter
Date: Tue, 16 Jul 2024 17:41:22 +0300
Message-Id: <20240716144123.651476-3-andrey.drobyshev@virtuozzo.com>
In-Reply-To: <20240716144123.651476-1-andrey.drobyshev@virtuozzo.com>
References: <20240716144123.651476-1-andrey.drobyshev@virtuozzo.com>
The testcase simply creates a 64G image with 1M clusters, generates a list
of 1M-aligned offsets and feeds aio_write commands with those offsets to
qemu-io run with '--aio native --nocache'. Then we check the data written
at each of the offsets. Before the previous commit this could result in a
race within the preallocation filter which would zero out some clusters
after data had actually been written to them.

Note: the test doesn't fail in 100% of the runs since a race is involved,
but the failures are consistent enough to reliably detect the problem.

Signed-off-by: Andrey Drobyshev
---
 tests/qemu-iotests/298     | 49 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/298.out |  4 ++--
 2 files changed, 51 insertions(+), 2 deletions(-)
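For illustration, a small standalone snippet (not part of the patch) that
mirrors the test's generators and shows the kind of qemu-io command stream
it produces; the cluster and request counts are scaled down here for
readability:

# Mirrors gen_write_pattern() and the offset sampling from the test.
import random

random.seed(42)
total_clusters = 16        # the test uses 64 * 1024 (64G image, 1M clusters)
requests = 8               # the test issues 2048

offsets = random.sample(range(total_clusters), requests)
# Requests alternate between zeroing (-z) and writing pattern 0xaa,
# exactly like gen_write_pattern() above.
cmds = [f'aio_write {"-P 0xaa" if i % 2 else "-z"} {off}M 1M'
        for i, off in enumerate(offsets)]
print('\n'.join(cmds))   # prints eight aio_write commands, -z and -P 0xaa alternating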

diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
index 09c9290711..b7126e9e15 100755
--- a/tests/qemu-iotests/298
+++ b/tests/qemu-iotests/298
@@ -20,8 +20,10 @@
 
 import os
 import iotests
+import random
 
 MiB = 1024 * 1024
+GiB = MiB * 1024
 disk = os.path.join(iotests.test_dir, 'disk')
 overlay = os.path.join(iotests.test_dir, 'overlay')
 refdisk = os.path.join(iotests.test_dir, 'refdisk')
@@ -176,5 +178,52 @@ class TestTruncate(iotests.QMPTestCase):
         self.do_test('off', '150M')
 
 
+class TestPreallocAsyncWrites(iotests.QMPTestCase):
+    def setUp(self):
+        # Make sure we get reproducible write patterns on each run
+        random.seed(42)
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, '-o',
+                                f'cluster_size={MiB},lazy_refcounts=on',
+                                str(64 * GiB))
+
+    def tearDown(self):
+        os.remove(disk)
+
+    def test_prealloc_async_writes(self):
+        def gen_write_pattern():
+            n = 0
+            while True:
+                yield '-P 0xaa' if n else '-z'
+                n = 1 - n
+
+        def gen_read_pattern():
+            n = 0
+            while True:
+                yield '-P 0xaa' if n else '-P 0x00'
+                n = 1 - n
+
+        requests = 2048  # Number of write/read requests to feed to qemu-io
+        total_clusters = 64 * 1024  # 64G / 1M
+
+        wpgen = gen_write_pattern()
+        rpgen = gen_read_pattern()
+
+        offsets = random.sample(range(0, total_clusters), requests)
+        aio_write_cmds = [f'aio_write {next(wpgen)} {off}M 1M' for off in offsets]
+        read_cmds = [f'read {next(rpgen)} {off}M 1M' for off in offsets]
+
+        proc = iotests.QemuIoInteractive('--aio', 'native', '--nocache',
+                                         '--image-opts', drive_opts)
+        for cmd in aio_write_cmds:
+            proc.cmd(cmd)
+        proc.close()
+
+        proc = iotests.QemuIoInteractive('-f', iotests.imgfmt, disk)
+        for cmd in read_cmds:
+            out = proc.cmd(cmd)
+            self.assertFalse('Pattern verification failed' in str(out))
+        proc.close()
+
+
 if __name__ == '__main__':
     iotests.main(supported_fmts=['qcow2'], required_fmts=['preallocate'])
diff --git a/tests/qemu-iotests/298.out b/tests/qemu-iotests/298.out
index fa16b5ccef..6323079e08 100644
--- a/tests/qemu-iotests/298.out
+++ b/tests/qemu-iotests/298.out
@@ -1,5 +1,5 @@
-.............
+..............
 ----------------------------------------------------------------------
-Ran 13 tests
+Ran 14 tests
 
 OK

From patchwork Tue Jul 16 14:41:23 2024
X-Patchwork-Submitter: Andrey Drobyshev
X-Patchwork-Id: 13734564
From: Andrey Drobyshev
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
 vsementsov@yandex-team.ru, pbonzini@redhat.com, eesposit@redhat.com,
 andrey.drobyshev@virtuozzo.com, den@virtuozzo.com
Subject: [PATCH v3 3/3] scripts: add filev2p.py script for mapping virtual
 file offsets
Date: Tue, 16 Jul 2024 17:41:23 +0300
Message-Id: <20240716144123.651476-4-andrey.drobyshev@virtuozzo.com>
In-Reply-To: <20240716144123.651476-1-andrey.drobyshev@virtuozzo.com>
References: <20240716144123.651476-1-andrey.drobyshev@virtuozzo.com>

The script is basically a wrapper around the "filefrag" utility. It can be
used to map virtual offsets within a file to offsets on the underlying block
device.
In addition, a chunk size might be specified, in which case a list of such mappings will be obtained: $ scripts/filev2p.py -s 100M /sparsefile 1768M 1853882368..1895825407 (file) -> 16332619776..16374562815 (/dev/sda4) -> 84492156928..84534099967 (/dev/sda) 1895825408..1958739967 (file) -> 17213591552..17276506111 (/dev/sda4) -> 85373128704..85436043263 (/dev/sda) This could come in handy when we need to map a certain piece of data within a file inside VM to the same data within the image on the host (e.g. physical offset on VM's /dev/sda would be the virtual offset within QCOW2 image). Note: as of now the script only works with the files located on plain partitions, i.e. it doesn't work with partitions built on top of LVM. Partitions on LVM would require another level of mapping. Signed-off-by: Andrey Drobyshev --- scripts/filev2p.py | 311 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 311 insertions(+) create mode 100755 scripts/filev2p.py diff --git a/scripts/filev2p.py b/scripts/filev2p.py new file mode 100755 index 0000000000..3bd7d18b5e --- /dev/null +++ b/scripts/filev2p.py @@ -0,0 +1,311 @@ +#!/usr/bin/env python3 +# +# Map file virtual offset to the offset on the underlying block device. +# Works by parsing 'filefrag' output. +# +# Copyright (c) 2024 Virtuozzo International GmbH. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# + +import argparse +import os +import subprocess +import re +import sys + +from bisect import bisect_right +from collections import namedtuple +from dataclasses import dataclass +from shutil import which +from stat import S_ISBLK + + +Partition = namedtuple('Partition', ['partpath', 'diskpath', 'part_offt']) + + +@dataclass +class Extent: + '''Class representing an individual file extent. + + This is basically a piece of data within the file which is located + consecutively (i.e. not sparsely) on the underlying block device. + ''' + + log_start: int + log_end: int + phys_start: int + phys_end: int + length: int + partition: Partition + + @property + def disk_start(self): + 'Number of the first byte of this extent on the whole disk (/dev/sda)' + return self.partition.part_offt + self.phys_start + + @property + def disk_end(self): + 'Number of the last byte of this extent on the whole disk (/dev/sda)' + return self.partition.part_offt + self.phys_end + + def __str__(self): + ischunk = self.log_end > self.log_start + maybe_end = lambda s: f'..{s}' if ischunk else '' + return '%s%s (file) -> %s%s (%s) -> %s%s (%s)' % ( + self.log_start, maybe_end(self.log_end), + self.phys_start, maybe_end(self.phys_end), self.partition.partpath, + self.disk_start, maybe_end(self.disk_end), self.partition.diskpath + ) + + @classmethod + def ext_slice(cls, bigger_ext, start, end): + '''Constructor for the Extent class from a bigger extent. + + Return Extent instance which is a slice of @bigger_ext contained + within the range [start, end]. 
+ ''' + + assert start >= bigger_ext.log_start + assert end <= bigger_ext.log_end + + if start == bigger_ext.log_start and end == bigger_ext.log_end: + return bigger_ext + + phys_start = bigger_ext.phys_start + (start - bigger_ext.log_start) + phys_end = bigger_ext.phys_end - (bigger_ext.log_end - end) + length = end - start + 1 + + return cls(start, end, phys_start, phys_end, length, + bigger_ext.partition) + + +def run_cmd(cmd: str) -> str: + '''Wrapper around subprocess.run. + + Returns stdout in case of success, emits en error and exits in case + of failure. + ''' + + proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, + check=False, shell=True) + if proc.stderr is not None: + stderr = f'\n{proc.stderr.decode().strip()}' + else: + stderr = '' + + if proc.returncode: + sys.exit(f'Error: Command "{cmd}" returned {proc.returncode}:{stderr}') + + return proc.stdout.decode().strip() + + +def parse_size(offset: str) -> int: + 'Convert human readable size to bytes' + + suffixes = { + **dict.fromkeys(['k', 'K', 'Kb', 'KB', 'KiB'], 2 ** 10), + **dict.fromkeys(['m', 'M', 'Mb', 'MB', 'MiB'], 2 ** 20), + **dict.fromkeys(['g', 'G', 'Gb', 'GB', 'GiB'], 2 ** 30), + **dict.fromkeys( ['T', 'Tb', 'TB', 'TiB'], 2 ** 40), + **dict.fromkeys([''], 1) + } + + sizematch = re.match(r'^([0-9]+)\s*([a-zA-Z]*)$', offset) + if not bool(sizematch): + sys.exit(f'Error: Couldn\'t parse size "{offset}". Pass offset ' + 'either in bytes or in format 1K, 2M, 3G') + + num, suff = sizematch.groups() + num = int(num) + + mult = suffixes.get(suff) + if mult is None: + sys.exit(f'Error: Couldn\'t parse size "{offset}": ' + f'unknown suffix {suff}') + + return num * mult + + +def fpath2part(filename: str) -> str: + 'Get partition on which @filename is located (i.e. /dev/sda1).' + + partpath = run_cmd(f'df --output=source {filename} | tail -n+2') + if not os.path.exists(partpath) or not S_ISBLK(os.stat(partpath).st_mode): + sys.exit(f'Error: file {filename} is located on {partpath} which ' + 'isn\'t a block device') + return partpath + + +def part2dev(partpath: str, filename: str) -> str: + 'Get block device on which @partpath is located (i.e. /dev/sda).' + dev = run_cmd(f'lsblk -no PKNAME {partpath}') + diskpath = f'/dev/{dev}' + if not os.path.exists(diskpath) or not S_ISBLK(os.stat(diskpath).st_mode): + sys.exit(f'Error: file {filename} is located on {diskpath} which ' + 'isn\'t a block device') + return diskpath + + +def part2disktype(partpath: str) -> str: + 'Parse /proc/devices and get block device type for @partpath' + + major = os.major(os.stat(partpath).st_rdev) + assert major + with open('/proc/devices', encoding='utf-8') as devf: + for line in reversed(list(devf)): + # Our major cannot be absent among block devs + if line.startswith('Block'): + break + devmajor, devtype = line.strip().split() + if int(devmajor) == major: + return devtype + + sys.exit('Error: We haven\'t found major {major} in /proc/devices, ' + 'and that can\'t be') + + +def get_part_offset(part: str, disk: str) -> int: + 'Get offset in bytes of the partition @part on the block device @disk.' 
+ + lines = run_cmd(f'fdisk -l {disk} | egrep "^(Units|{part})"').splitlines() + + unitmatch = re.match('^.* = ([0-9]+) bytes$', lines[0]) + if not bool(unitmatch): + sys.exit(f'Error: Couldn\'t parse "fdisk -l" output:\n{lines[0]}') + secsize = int(unitmatch.group(1)) + + part_offt = int(lines[1].split()[1]) + return part_offt * secsize + + +def parse_frag_line(line: str, partition: Partition) -> Extent: + 'Construct Extent instance from a "filefrag" output line.' + + nums = [int(n) for n in re.findall(r'[0-9]+', line)] + + log_start = nums[1] + log_end = nums[2] + phys_start = nums[3] + phys_end = nums[4] + length = nums[5] + + assert log_start < log_end + assert phys_start < phys_end + assert (log_end - log_start + 1) == (phys_end - phys_start + 1) == length + + return Extent(log_start, log_end, phys_start, phys_end, length, partition) + + +def preliminary_checks(args: argparse.Namespace) -> None: + 'A bunch of checks to emit an error and exit at the earlier stage.' + + if which('filefrag') is None: + sys.exit('Error: Program "filefrag" doesn\'t exist') + + if not os.path.exists(args.filename): + sys.exit(f'Error: File {args.filename} doesn\'t exist') + + args.filesize = os.path.getsize(args.filename) + if args.offset >= args.filesize: + sys.exit(f'Error: Specified offset {args.offset} exceeds ' + f'file size {args.filesize}') + if args.size and (args.offset + args.size > args.filesize): + sys.exit(f'Error: Chunk of size {args.size} at offset ' + f'{args.offset} exceeds file size {args.filesize}') + + args.partpath = fpath2part(args.filename) + args.disktype = part2disktype(args.partpath) + if args.disktype not in ('sd', 'virtblk'): + sys.exit(f'Error: Cannot analyze files on {args.disktype} disks') + args.diskpath = part2dev(args.partpath, args.filename) + args.part_offt = get_part_offset(args.partpath, args.diskpath) + + +def get_extent_maps(args: argparse.Namespace) -> list[Extent]: + 'Run "filefrag", parse its output and return a list of Extent instances.' + + lines = run_cmd(f'filefrag -b1 -v {args.filename}').splitlines() + + ffinfo_re = re.compile('.* is ([0-9]+) .*of ([0-9]+) bytes') + ff_size, ff_block = re.match(ffinfo_re, lines[1]).groups() + + # Paranoia checks + if int(ff_size) != args.filesize: + sys.exit('Error: filefrag and os.path.getsize() report different ' + f'sizes: {ff_size} and {args.filesize}') + if int(ff_block) != 1: + sys.exit(f'Error: "filefrag -b1" invoked, but block size is {ff_block}') + + partition = Partition(args.partpath, args.diskpath, args.part_offt) + + # Fill extents list from the output + extents = [] + for line in lines: + if not re.match(r'^\s*[0-9]+:', line): + continue + extents += [parse_frag_line(line, partition)] + + chunk_start = args.offset + chunk_end = args.offset + args.size - 1 + ext_offsets = [ext.log_start for ext in extents] + start_ind = bisect_right(ext_offsets, chunk_start) - 1 + end_ind = bisect_right(ext_offsets, chunk_end) - 1 + + res_extents = extents[start_ind : end_ind + 1] + for i, ext in enumerate(res_extents): + start = max(chunk_start, ext.log_start) + end = min(chunk_end, ext.log_end) + res_extents[i] = Extent.ext_slice(ext, start, end) + + return res_extents + + +def parse_args() -> argparse.Namespace: + 'Define program arguments and parse user input.' 
+ + parser = argparse.ArgumentParser(description=''' +Map file offset to physical offset on the block device + +With --size provided get a list of mappings for the chunk''', + formatter_class=argparse.RawTextHelpFormatter) + + parser.add_argument('filename', type=str, help='filename to process') + parser.add_argument('offset', type=str, + help='logical offset inside the file') + parser.add_argument('-s', '--size', required=False, type=str, + help='size of the file chunk to get offsets for') + args = parser.parse_args() + + args.offset = parse_size(args.offset) + if args.size: + args.size = parse_size(args.size) + else: + # When no chunk size is provided (only offset), it's equivalent to + # chunk size == 1 + args.size = 1 + + return args + + +def main() -> int: + args = parse_args() + preliminary_checks(args) + extents = get_extent_maps(args) + for ext in extents: + print(ext) + + +if __name__ == '__main__': + sys.exit(main())
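As a cross-check of the /sparsefile example in the commit message, here is a
minimal sketch of the arithmetic the script performs when translating a
partition-relative extent offset into a whole-disk offset; the partition
start offset below is derived from the example numbers rather than queried
via fdisk, so treat it as an assumption:

# Offset of /dev/sda4 on /dev/sda, derived from the example output above.
part_start = 84492156928 - 16332619776   # == 68159537152 bytes

def to_disk(phys_on_partition: int) -> int:
    '''Translate an extent offset on /dev/sda4 into an offset on /dev/sda,
    the same arithmetic as Extent.disk_start/disk_end (part_offt + offset).'''
    return part_start + phys_on_partition

# First extent: file bytes 1853882368..1895825407 sit at
# 16332619776..16374562815 on /dev/sda4, hence on the whole disk:
print(to_disk(16332619776), to_disk(16374562815))
# -> 84492156928 84534099967, matching the script's output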