[-next,0/2] block: fix scan partition for exclusively open device again

Message ID 20230217022200.3092987-1-yukuai1@huaweicloud.com (mailing list archive)

Message

Yu Kuai Feb. 17, 2023, 2:21 a.m. UTC
From: Yu Kuai <yukuai3@huawei.com>

Changes from RFC:
 - remove the patch to factor out GD_NEED_PART_SCAN

Yu Kuai (2):
  block: Revert "block: Do not reread partition table on exclusively
    open device"
  block: fix scan partition for exclusively open device again

 block/blk.h   |  2 +-
 block/genhd.c | 37 ++++++++++++++++++++++++++++---------
 block/ioctl.c | 13 ++++++-------
 3 files changed, 35 insertions(+), 17 deletions(-)

Comments

Jens Axboe Feb. 17, 2023, 1:16 p.m. UTC | #1
On Fri, 17 Feb 2023 10:21:58 +0800, Yu Kuai wrote:
> Changes from RFC:
>  - remove the patch to factor out GD_NEED_PART_SCAN
> 
> Yu Kuai (2):
>   block: Revert "block: Do not reread partition table on exclusively
>     open device"
>   block: fix scan partition for exclusively open device again
> 
> [...]

Applied, thanks!

[1/2] block: Revert "block: Do not reread partition table on exclusively open device"
      commit: 0f77b29ad14e34a89961f32edc87b92db623bb37
[2/2] block: fix scan partition for exclusively open device again
      commit: e5cfefa97bccf956ea0bb6464c1f6c84fd7a8d9f

Best regards,
Ming Lei March 21, 2023, 11:43 a.m. UTC | #2
On Fri, Feb 17, 2023 at 10:21:58AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes from RFC:
>  - remove the patch to factor out GD_NEED_PART_SCAN
> 
> Yu Kuai (2):
>   block: Revert "block: Do not reread partition table on exclusively
>     open device"
>   block: fix scan partition for exclusively open device again

Hi Yu Kuai,

It looks like the original issue has re-appeared with these two patches:

https://lore.kernel.org/linux-block/20221130135344.2ul4cyfstfs3znxg@quack3/

The underlying disk partitions and the raid partitions can be observed at
the same time.

Can you take a look?

The script follows. It does not trigger the issue every time, but it is
still easy to reproduce.

# create a RAID level 1 array with 2 devices, metadata version 1.0
mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0

# create a partition (0 = first free partition number, 0 = default start
# sector), size 100MiB
sgdisk -n 0:0:+100MiB /dev/md0

# observe partitions
cat /proc/partitions

# stop the array
mdadm -S /dev/md0

# re-assemble
mdadm -A /dev/md0 /dev/sda /dev/sdb
cat /proc/partitions
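
The "observe partitions" step can also be checked programmatically. The following is a minimal sketch, not part of the original report: it scans text laid out like /proc/partitions for an exact device name. The helper name partitions_text_has() is hypothetical, and the partition names one would look for (for example sda1 on the member disk and md0p1 on the array) are assumptions about how the kernel names them on a given system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `name` appears as a device name
 * (last column) in text formatted like /proc/partitions. Header and
 * blank lines do not start with a number, so they never match.
 */
static bool partitions_text_has(const char *text, const char *name)
{
    assert(text != NULL && name != NULL);

    const char *line = text;
    while (*line) {
        char dev[64];

        /* Skip major, minor and #blocks; keep only the name column. */
        if (sscanf(line, "%*u %*u %*u %63s", dev) == 1 &&
            strcmp(dev, name) == 0)
            return true;

        const char *nl = strchr(line, '\n');
        if (!nl)
            break;
        line = nl + 1;
    }
    return false;
}
```

With the contents of /proc/partitions read into a buffer after re-assembly, seeing both the array partition and a partition on a member disk at the same time would correspond to the symptom described above.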


Thanks,
Ming
Yu Kuai March 22, 2023, 1:26 a.m. UTC | #3
Hi,

On 2023/03/21 19:43, Ming Lei wrote:
> Hi Yu Kuai,
> 
> It looks like the original issue has re-appeared with these two patches:
> 
> https://lore.kernel.org/linux-block/20221130135344.2ul4cyfstfs3znxg@quack3/
> 
> The underlying disk partitions and the raid partitions can be observed at
> the same time.
> 
> Can you take a look?
Yes, thanks for the report. I realize that sda1 and sdb1 are created
while raid has sda and sdb open exclusively, and I think this problem
should have existed before this patchset.

I verified this with the following test:

1) mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
2) sgdisk -n 0:0:+100MiB /dev/md0
3) mdadm -S /dev/md0
# scan partitions of sda
4) blockdev --rereadpt /dev/sda

Then sda1 is created.
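
For readers who want to trigger the same re-read from a program: to my understanding, `blockdev --rereadpt` issues the BLKRRPART ioctl on the device. The sketch below is an illustration under that assumption, not something taken from this thread; the function name reread_partition_table() is hypothetical, and it needs a Linux system with the kernel UAPI headers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* BLKRRPART */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to re-read the partition table of
 * the block device at `path`. Returns 0 on success or a negative errno.
 */
static int reread_partition_table(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    /* Capture errno before close() has a chance to overwrite it. */
    int ret = ioctl(fd, BLKRRPART) < 0 ? -errno : 0;

    close(fd);
    return ret;
}
```

Calling reread_partition_table("/dev/sda") after step 3 would correspond to step 4 above. It requires sufficient privileges, and the return value says only whether the request was accepted, not which partitions were created.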

I'm not sure how to fix this yet.
Ming Lei March 22, 2023, 1:34 a.m. UTC | #4
On Wed, Mar 22, 2023 at 09:26:07AM +0800, Yu Kuai wrote:
> Yes, thanks for the report. I realize that sda1 and sdb1 are created
> while raid has sda and sdb open exclusively, and I think this problem
> should have existed before this patchset.

It does not reproduce before applying your two patches; that is exactly what Jan
tried to fix with 36369f46e917 ("block: Do not reread partition table on exclusively open device").

The issue was reported by Changhui's block regression test.


Thanks, 
Ming
Yu Kuai March 22, 2023, 2:02 a.m. UTC | #5
Hi,

On 2023/03/22 9:34, Ming Lei wrote:
> It does not reproduce before applying your two patches; that is exactly what Jan
> tried to fix with 36369f46e917 ("block: Do not reread partition table on exclusively open device").

Hi Ming,

I just tried your test with this patchset reverted, and I can still
reproduce the problem myself.

raid only opens this device exclusively, and disk_scan_partitions() is not
called:

md_import_device
  blkdev_get_by_dev

I need to add some debug info to figure out how GD_NEED_PART_SCAN is set
for sda after raid is stopped. That should explain why sda1 is
created.

Thanks,
Kuai
Yu Kuai March 22, 2023, 2:15 a.m. UTC | #6
Hi,

On 2023/03/22 10:02, Yu Kuai wrote:
> I just tried your test with this patchset reverted, and I can still
> reproduce the problem myself.

Oops, I forgot to revert the first patch. You are right, the problem can't
be reproduced.
> 
> raid only opens this device exclusively, and disk_scan_partitions() is not
> called:
> 
> md_import_device
>   blkdev_get_by_dev
> 
> I need to add some debug info to figure out how GD_NEED_PART_SCAN is set
> for sda after raid is stopped. That should explain why sda1 is
> created.

I have now found how GD_NEED_PART_SCAN gets set: in patch 2, it is set before
bd_prepare_to_claim(), so a partition scan that fails at that point still
sets this bit. The following patch should fix this problem:

diff --git a/block/genhd.c b/block/genhd.c
index 08bb1a9ec22c..2487c9452b94 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -368,7 +368,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
         if (disk->open_partitions)
                 return -EBUSY;

-       set_bit(GD_NEED_PART_SCAN, &disk->state);
         /*
          * If the device is opened exclusively by current thread already, it's
          * safe to scan partitons, otherwise, use bd_prepare_to_claim() to
@@ -381,6 +380,7 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
                         return ret;
         }

+       set_bit(GD_NEED_PART_SCAN, &disk->state);
         bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
         if (IS_ERR(bdev))
                 ret =  PTR_ERR(bdev);
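
The ordering problem can be illustrated outside the kernel. The following is a toy user-space model, not kernel code: struct toy_disk and every toy_* function are hypothetical stand-ins, and the assumption that a later plain open consumes a pending flag by scanning is a simplification of the behaviour discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these types or functions are the kernel's. */
struct toy_disk {
    bool need_part_scan;   /* stands in for GD_NEED_PART_SCAN */
    bool claimed_by_other; /* another exclusive opener, e.g. raid */
};

/* Stands in for bd_prepare_to_claim(): fails while someone else holds it. */
static int toy_prepare_to_claim(const struct toy_disk *d)
{
    return d->claimed_by_other ? -EBUSY : 0;
}

/* Stands in for an open: a pending flag triggers a scan. True if scanned. */
static bool toy_open(struct toy_disk *d)
{
    assert(d != NULL);
    if (!d->need_part_scan)
        return false;
    d->need_part_scan = false;
    return true;
}

/* Ordering before the fix: set the flag first, then try to claim. */
static int toy_scan_flag_first(struct toy_disk *d)
{
    d->need_part_scan = true;
    int ret = toy_prepare_to_claim(d);
    if (ret)
        return ret;        /* the flag stays set although the scan failed */
    toy_open(d);
    return 0;
}

/* Ordering after the fix: claim first, set the flag only on success. */
static int toy_scan_claim_first(struct toy_disk *d)
{
    int ret = toy_prepare_to_claim(d);
    if (ret)
        return ret;        /* the flag was never touched */
    d->need_part_scan = true;
    toy_open(d);
    return 0;
}
```

In this model, a failed toy_scan_flag_first() on a claimed disk leaves need_part_scan set, so the first toy_open() after the claim is released performs a scan. With toy_scan_claim_first() the flag is untouched after a failed attempt and the later open does not scan.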

Ming Lei March 22, 2023, 3:38 a.m. UTC | #7
On Wed, Mar 22, 2023 at 10:15:30AM +0800, Yu Kuai wrote:
> I have now found how GD_NEED_PART_SCAN gets set: in patch 2, it is set before
> bd_prepare_to_claim(), so a partition scan that fails at that point still
> sets this bit. The following patch should fix this problem:

I just ran a quick test: the issue is no longer reproduced with your patch, and
the change looks reasonable too.

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming
Yu Kuai March 22, 2023, 4 a.m. UTC | #8
Hi,

On 2023/03/22 11:38, Ming Lei wrote:
> I just ran a quick test: the issue is no longer reproduced with your patch, and
> the change looks reasonable too.
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> 

Thanks for the test and review. I just made an additional change to
clear GD_NEED_PART_SCAN; I will send a patch. Can you take a look?

Kuai