Message ID | 20221130175653.24299-1-jack@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: Do not reread partition table on exclusively open device | expand |
On 11/30/22 10:56 AM, Jan Kara wrote: > Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in > blk_drop_partitions") we allow rereading of partition table although > there are users of the block device. This has an undesirable consequence > that e.g. if sda and sdb are assembled to a RAID1 device md0 with > partitions, BLKRRPART ioctl on sda will rescan partition table and > create sda1 device. This partition device under a raid device confuses > some programs (such as libstorage-ng used for initial partitioning for > distribution installation) leading to failures. > > Fix the problem refusing to rescan partitions if there is another user > that has the block device exclusively open. This looks nice and clean to me. Was pondering whether to queue this up for 6.1, but it's old enough that I think we should just funnel it through 6.2 and mark it stable.
On Wed, 30 Nov 2022 18:56:53 +0100, Jan Kara wrote: > Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in > blk_drop_partitions") we allow rereading of partition table although > there are users of the block device. This has an undesirable consequence > that e.g. if sda and sdb are assembled to a RAID1 device md0 with > partitions, BLKRRPART ioctl on sda will rescan partition table and > create sda1 device. This partition device under a raid device confuses > some programs (such as libstorage-ng used for initial partitioning for > distribution installation) leading to failures. > > [...] Applied, thanks! [1/1] block: Do not reread partition table on exclusively open device commit: 8d67fc20caf8b08ef6c90a04970846a950d46dd1 Best regards,
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Hello! I'm sorry but this patch has a brownpaper bag bug because BLKRRPART now bails also if there is no exlusive opener (which didn't have any adverse effects on my test system because the BLKRRPART calls issued by systemd-udev are mostly a pointless exercise as I've found out). I'll send a fixup patch. Either fold it into this patch or just add on top. Thanks and I'm sorry for the trouble! Honza On Wed 30-11-22 23:10:02, Christoph Hellwig wrote: > Looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de>
Hi, Jan 在 2022/12/01 1:56, Jan Kara 写道: > -int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) > { > struct block_device *bdev; > > @@ -366,6 +366,9 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) > return -EINVAL; > if (disk->open_partitions) > return -EBUSY; > + /* Someone else has bdev exclusively open? */ > + if (disk->part0->bd_holder != owner) > + return -EBUSY; > > set_bit(GD_NEED_PART_SCAN, &disk->state); > bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL); It just by code review, but I think the above checking should be inside open_mutex, which is used to protect bdev claming. Otherwise there can be race that this check is pass while someone else exclusive open the disk before blkdev_get_by_dev(). How do you think about open coding blkdev_get_by_dev() here, something like: diff --git a/block/genhd.c b/block/genhd.c index 23cf83b3331c..341af4db7d54 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -358,7 +358,7 @@ EXPORT_SYMBOL_GPL(disk_uevent); int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) { - struct block_device *bdev; + int ret = 0; if (disk->flags & (GENHD_FL_NO_PART | GENHD_FL_HIDDEN)) return -EINVAL; @@ -366,16 +366,31 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) return -EINVAL; if (disk->open_partitions) return -EBUSY; + + disk_block_events(disk); + mutex_lock(&disk->open_mutex); + /* Someone else has bdev exclusively open? */ - if (disk->part0->bd_holder && disk->part0->bd_holder != owner) - return -EBUSY; + if (disk->part0->bd_holder && disk->part0->bd_holder != owner) { + ret = -EBUSY; + goto out; + } + + if (!disk_live(disk)) { + ret = -ENODEV; + goto out; + } set_bit(GD_NEED_PART_SCAN, &disk->state); - bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL); - if (IS_ERR(bdev)) - return PTR_ERR(bdev); - blkdev_put(bdev, mode); - return 0; + ret = blkdev_get_whole(disk->part0, mode); +out: + mutex_unlock(&disk->open_mutex); + disk_unblock_events(disk); + + if (!ret) + blkdev_put_whole(disk->part0, mode); + + return ret; }
It might be easier to just move the check into blkdev_get_whole, which also ensures that non-excluisve openers don't cause a partition scan while someone else has the device exclusively open. diff --git a/block/bdev.c b/block/bdev.c index edc110d90df404..a831b6c9c627d7 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -666,25 +666,28 @@ static void blkdev_flush_mapping(struct block_device *bdev) static int blkdev_get_whole(struct block_device *bdev, fmode_t mode) { struct gendisk *disk = bdev->bd_disk; - int ret; + int ret = 0; - if (disk->fops->open) { + if (disk->fops->open) ret = disk->fops->open(bdev, mode); - if (ret) { - /* avoid ghost partitions on a removed medium */ - if (ret == -ENOMEDIUM && - test_bit(GD_NEED_PART_SCAN, &disk->state)) - bdev_disk_changed(disk, true); + + if (ret) { + /* avoid ghost partitions on a removed medium */ + if (ret != -ENOMEDIUM) return ret; - } + } else { + if (!atomic_read(&bdev->bd_openers)) + set_init_blocksize(bdev); + atomic_inc(&bdev->bd_openers); } - if (!atomic_read(&bdev->bd_openers)) - set_init_blocksize(bdev); - if (test_bit(GD_NEED_PART_SCAN, &disk->state)) + /* + * Skip the partition scan if someone else has the device exclusively + * open. + */ + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) bdev_disk_changed(disk, false); - atomic_inc(&bdev->bd_openers); - return 0; + return ret; } static void blkdev_put_whole(struct block_device *bdev, fmode_t mode) diff --git a/block/genhd.c b/block/genhd.c index 23cf83b3331cde..4a7b1b8b9efdb5 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -366,9 +366,6 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) return -EINVAL; if (disk->open_partitions) return -EBUSY; - /* Someone else has bdev exclusively open? */ - if (disk->part0->bd_holder && disk->part0->bd_holder != owner) - return -EBUSY; set_bit(GD_NEED_PART_SCAN, &disk->state); bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL);
Hi, Christoph 在 2023/01/31 22:15, Christoph Hellwig 写道: > It might be easier to just move the check into blkdev_get_whole, > which also ensures that non-excluisve openers don't cause a partition > scan while someone else has the device exclusively open. > > diff --git a/block/bdev.c b/block/bdev.c > index edc110d90df404..a831b6c9c627d7 100644 > --- a/block/bdev.c > +++ b/block/bdev.c > @@ -666,25 +666,28 @@ static void blkdev_flush_mapping(struct block_device *bdev) > static int blkdev_get_whole(struct block_device *bdev, fmode_t mode) > { > struct gendisk *disk = bdev->bd_disk; > - int ret; > + int ret = 0; > > - if (disk->fops->open) { > + if (disk->fops->open) > ret = disk->fops->open(bdev, mode); > - if (ret) { > - /* avoid ghost partitions on a removed medium */ > - if (ret == -ENOMEDIUM && > - test_bit(GD_NEED_PART_SCAN, &disk->state)) > - bdev_disk_changed(disk, true); > + > + if (ret) { > + /* avoid ghost partitions on a removed medium */ > + if (ret != -ENOMEDIUM) > return ret; > - } > + } else { > + if (!atomic_read(&bdev->bd_openers)) > + set_init_blocksize(bdev); > + atomic_inc(&bdev->bd_openers); > } > > - if (!atomic_read(&bdev->bd_openers)) > - set_init_blocksize(bdev); > - if (test_bit(GD_NEED_PART_SCAN, &disk->state)) > + /* > + * Skip the partition scan if someone else has the device exclusively > + * open. > + */ > + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) > bdev_disk_changed(disk, false); I think this is wrong here... We should at least allow the exclusively opener to scan partition, right? Thanks, Kuai
On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote: > > + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) > > bdev_disk_changed(disk, false); > > I think this is wrong here... We should at least allow the exclusively > opener to scan partition, right? bd_holder is only set in bd_finish_claiming, which is called after the partition rescan.
Hi, 在 2023/02/01 14:22, Christoph Hellwig 写道: > On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote: >>> + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) >>> bdev_disk_changed(disk, false); >> >> I think this is wrong here... We should at least allow the exclusively >> opener to scan partition, right? > > bd_holder is only set in bd_finish_claiming, which is called after > the partition rescan. > . > I mean that someone open bdev exclusively first, and then call ioctl to rescan partition. By the way, disk_scan_partitions() won't claim bdev in the first place... Thanks, Kuai
Hi, Jan and Christoph 在 2022/12/01 1:56, Jan Kara 写道: > Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in > blk_drop_partitions") we allow rereading of partition table although > there are users of the block device. This has an undesirable consequence > that e.g. if sda and sdb are assembled to a RAID1 device md0 with > partitions, BLKRRPART ioctl on sda will rescan partition table and > create sda1 device. This partition device under a raid device confuses > some programs (such as libstorage-ng used for initial partitioning for > distribution installation) leading to failures. > > Fix the problem refusing to rescan partitions if there is another user > that has the block device exclusively open. Still by code review, I just found another race that cound cause disk can't scan partition while creating: t1: create device t2: open device exclusively device_add_disk() bdev_add insert_inode_hash ... bd_finish_claiming bdev->bd_holder = holder disk_scan_partitions // check holder and failed This race is artificial and unlikely to happend in practice, but this is easy to fix by following: diff --git a/block/genhd.c b/block/genhd.c index 09f2ac548832..23e87753313b 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -497,6 +497,11 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, if (ret) goto out_unregister_bdi; + /* + * In case user open device before disk_scan_partitions(), + * set state here, so that blkdev_open() can scan partitions. + */ + set_bit(GD_NEED_PART_SCAN, &disk->state); bdev_add(disk->part0, ddev->devt); if (get_capacity(disk)) disk_scan_partitions(disk, FMODE_READ, NULL);
Hi, Jan and Chirstoph 在 2023/02/01 15:20, Yu Kuai 写道: > Hi, > > 在 2023/02/01 14:22, Christoph Hellwig 写道: >> On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote: >>>> + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) >>>> bdev_disk_changed(disk, false); >>> >>> I think this is wrong here... We should at least allow the exclusively >>> opener to scan partition, right? >> >> bd_holder is only set in bd_finish_claiming, which is called after >> the partition rescan. >> . >> > > I mean that someone open bdev exclusively first, and then call ioctl to > rescan partition. Any suggestions? Thanks, Kuai
Hi! On Mon 06-02-23 11:24:10, Yu Kuai wrote: > 在 2023/02/01 15:20, Yu Kuai 写道: > > Hi, > > > > 在 2023/02/01 14:22, Christoph Hellwig 写道: > > > On Wed, Feb 01, 2023 at 09:04:12AM +0800, Yu Kuai wrote: > > > > > + if (test_bit(GD_NEED_PART_SCAN, &disk->state) && !bdev->bd_holder) > > > > > bdev_disk_changed(disk, false); > > > > > > > > I think this is wrong here... We should at least allow the exclusively > > > > opener to scan partition, right? > > > > > > bd_holder is only set in bd_finish_claiming, which is called after > > > the partition rescan. > > > . > > > > > > > I mean that someone open bdev exclusively first, and then call ioctl to > > rescan partition. > > Any suggestions? After some thought I don't like opencoding blkdev_get_by_dev() in disk_scan partitions. But I agree Christoph's approach with blkdev_get_whole() does not quite work either. We could propagate holder/owner into blkdev_get_whole() to fix Christoph's check but still we are left with a question what to do with GD_NEED_PART_SCAN set bit when we get into blkdev_get_whole() and find out we are not elligible to rescan partitions. Because then some exclusive opener later might be caught by surprise when the partition rescan happens due to this bit being set from the past failed attempt to rescan. So what we could do is play a similar trick as we do in the loop device and do in disk_scan_partitions(): /* * If we don't hold exclusive handle for the device, upgrade to it * here to avoid changing partitions under exclusive owner. */ if (!(mode & FMODE_EXCL)) { error = bd_prepare_to_claim(disk->part0, disk_scan_partitions); if (error) return error; } set_bit(GD_NEED_PART_SCAN, &disk->state); bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL); if (IS_ERR(bdev)) { error = PTR_ERR(bdev); goto abort; } blkdev_put(bdev, mode & ~FMODE_EXCL); error = 0; abort: if (!(mode & FMODE_EXCL)) bd_abort_claiming(disk->part0, disk_scan_partitions); return error; So esentially we'll temporarily block any exlusive openers by claiming the bdev while we set the GD_NEED_PART_SCAN and force partition rescan. What do you think? Honza
Hi, 在 2023/02/08 20:02, Jan Kara 写道: > > After some thought I don't like opencoding blkdev_get_by_dev() in disk_scan > partitions. But I agree Christoph's approach with blkdev_get_whole() does > not quite work either. We could propagate holder/owner into > blkdev_get_whole() to fix Christoph's check but still we are left with a > question what to do with GD_NEED_PART_SCAN set bit when we get into > blkdev_get_whole() and find out we are not elligible to rescan partitions. > Because then some exclusive opener later might be caught by surprise when > the partition rescan happens due to this bit being set from the past failed > attempt to rescan. > > So what we could do is play a similar trick as we do in the loop device and > do in disk_scan_partitions(): > > /* > * If we don't hold exclusive handle for the device, upgrade to it > * here to avoid changing partitions under exclusive owner. > */ > if (!(mode & FMODE_EXCL)) { This is not necessary, all the caller make sure FMODE_EXCL is not set. > error = bd_prepare_to_claim(disk->part0, disk_scan_partitions); > if (error) > return error; > } From what I see, if thread open device excl first, and then call ioctl() to reread partition, this will cause this ioctl() to fail? > set_bit(GD_NEED_PART_SCAN, &disk->state); > bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL); > if (IS_ERR(bdev)) { > error = PTR_ERR(bdev); > goto abort; > } > blkdev_put(bdev, mode & ~FMODE_EXCL); > error = 0; > abort: > if (!(mode & FMODE_EXCL)) > bd_abort_claiming(disk->part0, disk_scan_partitions); > return error; > > So esentially we'll temporarily block any exlusive openers by claiming the > bdev while we set the GD_NEED_PART_SCAN and force partition rescan. What do > you think? > > Honza >
On Wed 08-02-23 22:13:02, Yu Kuai wrote: > Hi, > > 在 2023/02/08 20:02, Jan Kara 写道: > > > > After some thought I don't like opencoding blkdev_get_by_dev() in disk_scan > > partitions. But I agree Christoph's approach with blkdev_get_whole() does > > not quite work either. We could propagate holder/owner into > > blkdev_get_whole() to fix Christoph's check but still we are left with a > > question what to do with GD_NEED_PART_SCAN set bit when we get into > > blkdev_get_whole() and find out we are not elligible to rescan partitions. > > Because then some exclusive opener later might be caught by surprise when > > the partition rescan happens due to this bit being set from the past failed > > attempt to rescan. > > > > So what we could do is play a similar trick as we do in the loop device and > > do in disk_scan_partitions(): > > > > /* > > * If we don't hold exclusive handle for the device, upgrade to it > > * here to avoid changing partitions under exclusive owner. > > */ > > if (!(mode & FMODE_EXCL)) { > This is not necessary, all the caller make sure FMODE_EXCL is not set. Yes, but we need to propagate it correctly from blkdev_common_ioctl() now, exactly so that ioctl does not fail if you exclusively opened the device as you realized below :) > > error = bd_prepare_to_claim(disk->part0, disk_scan_partitions); > > if (error) > > return error; > > } > From what I see, if thread open device excl first, and then call ioctl() > to reread partition, this will cause this ioctl() to fail? Honza
Hi, 在 2023/02/09 17:04, Jan Kara 写道: > On Wed 08-02-23 22:13:02, Yu Kuai wrote: >> Hi, >> >> 在 2023/02/08 20:02, Jan Kara 写道: >>> >>> After some thought I don't like opencoding blkdev_get_by_dev() in disk_scan >>> partitions. But I agree Christoph's approach with blkdev_get_whole() does >>> not quite work either. We could propagate holder/owner into >>> blkdev_get_whole() to fix Christoph's check but still we are left with a >>> question what to do with GD_NEED_PART_SCAN set bit when we get into >>> blkdev_get_whole() and find out we are not elligible to rescan partitions. >>> Because then some exclusive opener later might be caught by surprise when >>> the partition rescan happens due to this bit being set from the past failed >>> attempt to rescan. >>> >>> So what we could do is play a similar trick as we do in the loop device and >>> do in disk_scan_partitions(): >>> >>> /* >>> * If we don't hold exclusive handle for the device, upgrade to it >>> * here to avoid changing partitions under exclusive owner. >>> */ >>> if (!(mode & FMODE_EXCL)) { >> This is not necessary, all the caller make sure FMODE_EXCL is not set. > > Yes, but we need to propagate it correctly from blkdev_common_ioctl() now, > exactly so that ioctl does not fail if you exclusively opened the device as > you realized below :) Ok, I get it that you want to pass in FMODE_EXCL from ioctl(). But I'm not sure let others fail to open device extl is a good idea. I still prefer to open code blkdev_get_by_dev(), because many operations is not necessary here. And this way, we can clear GD_NEED_PART_SCAN inside open_mutex if rescan failed. Thanks, Kuai > >>> error = bd_prepare_to_claim(disk->part0, disk_scan_partitions); >>> if (error) >>> return error; >>> } >> From what I see, if thread open device excl first, and then call ioctl() >> to reread partition, this will cause this ioctl() to fail? > > Honza >
On Thu 09-02-23 17:32:45, Yu Kuai wrote: > 在 2023/02/09 17:04, Jan Kara 写道: > > On Wed 08-02-23 22:13:02, Yu Kuai wrote: > > > Hi, > > > > > > 在 2023/02/08 20:02, Jan Kara 写道: > > > > > > > > After some thought I don't like opencoding blkdev_get_by_dev() in disk_scan > > > > partitions. But I agree Christoph's approach with blkdev_get_whole() does > > > > not quite work either. We could propagate holder/owner into > > > > blkdev_get_whole() to fix Christoph's check but still we are left with a > > > > question what to do with GD_NEED_PART_SCAN set bit when we get into > > > > blkdev_get_whole() and find out we are not elligible to rescan partitions. > > > > Because then some exclusive opener later might be caught by surprise when > > > > the partition rescan happens due to this bit being set from the past failed > > > > attempt to rescan. > > > > > > > > So what we could do is play a similar trick as we do in the loop device and > > > > do in disk_scan_partitions(): > > > > > > > > /* > > > > * If we don't hold exclusive handle for the device, upgrade to it > > > > * here to avoid changing partitions under exclusive owner. > > > > */ > > > > if (!(mode & FMODE_EXCL)) { > > > This is not necessary, all the caller make sure FMODE_EXCL is not set. > > > > Yes, but we need to propagate it correctly from blkdev_common_ioctl() now, > > exactly so that ioctl does not fail if you exclusively opened the device as > > you realized below :) > > Ok, I get it that you want to pass in FMODE_EXCL from ioctl(). But I'm > not sure let others fail to open device extl is a good idea. Other exclusive openers will not fail. They will block until we call bd_abort_claiming() after the partitions have been reread. > I still prefer to open code blkdev_get_by_dev(), because many operations > is not necessary here. And this way, we can clear GD_NEED_PART_SCAN > inside open_mutex if rescan failed. I understand many operations are not needed there but this is not a hot path and leaking of bdev internal details into genhd.c is not a good practice either (e.g. you'd have to make blkdev_get_whole() extern). We could create a special helper for partition rescan in block/bdev.c to hide the details but think that bd_start_claiming() - bd_abort_claiming() trick is the simplest solution to temporarily grab exclusive ownership if we don't have it. Honza
Hi, 在 2023/02/09 17:57, Jan Kara 写道: > On Thu 09-02-23 17:32:45, Yu Kuai wrote: >> 在 2023/02/09 17:04, Jan Kara 写道: >>> On Wed 08-02-23 22:13:02, Yu Kuai wrote: >>>> Hi, >>>> >>>> 在 2023/02/08 20:02, Jan Kara 写道: >>>>> >>>>> After some thought I don't like opencoding blkdev_get_by_dev() in disk_scan >>>>> partitions. But I agree Christoph's approach with blkdev_get_whole() does >>>>> not quite work either. We could propagate holder/owner into >>>>> blkdev_get_whole() to fix Christoph's check but still we are left with a >>>>> question what to do with GD_NEED_PART_SCAN set bit when we get into >>>>> blkdev_get_whole() and find out we are not elligible to rescan partitions. >>>>> Because then some exclusive opener later might be caught by surprise when >>>>> the partition rescan happens due to this bit being set from the past failed >>>>> attempt to rescan. >>>>> >>>>> So what we could do is play a similar trick as we do in the loop device and >>>>> do in disk_scan_partitions(): >>>>> >>>>> /* >>>>> * If we don't hold exclusive handle for the device, upgrade to it >>>>> * here to avoid changing partitions under exclusive owner. >>>>> */ >>>>> if (!(mode & FMODE_EXCL)) { >>>> This is not necessary, all the caller make sure FMODE_EXCL is not set. >>> >>> Yes, but we need to propagate it correctly from blkdev_common_ioctl() now, >>> exactly so that ioctl does not fail if you exclusively opened the device as >>> you realized below :) >> >> Ok, I get it that you want to pass in FMODE_EXCL from ioctl(). But I'm >> not sure let others fail to open device extl is a good idea. > > Other exclusive openers will not fail. They will block until we call > bd_abort_claiming() after the partitions have been reread. Yes, you're right, rescan and other exclusively openers will be synchronized by bd_prepare_to_claim(). > >> I still prefer to open code blkdev_get_by_dev(), because many operations >> is not necessary here. And this way, we can clear GD_NEED_PART_SCAN >> inside open_mutex if rescan failed. > > I understand many operations are not needed there but this is not a hot > path and leaking of bdev internal details into genhd.c is not a good > practice either (e.g. you'd have to make blkdev_get_whole() extern). I was thinking that disk_scan_partitions() can be moved to bdev.c to avoid that... > > We could create a special helper for partition rescan in block/bdev.c to > hide the details but think that bd_start_claiming() - bd_abort_claiming() > trick is the simplest solution to temporarily grab exclusive ownership if > we don't have it. Now I'm good with this solution. By the way, do you think we must make sure the first partition scan will be proceed? Thanks, Kuai > > Honza >
On Thu 09-02-23 19:19:48, Yu Kuai wrote: > 在 2023/02/09 17:57, Jan Kara 写道: > > On Thu 09-02-23 17:32:45, Yu Kuai wrote: > > > 在 2023/02/09 17:04, Jan Kara 写道: > > > I still prefer to open code blkdev_get_by_dev(), because many operations > > > is not necessary here. And this way, we can clear GD_NEED_PART_SCAN > > > inside open_mutex if rescan failed. > > > > I understand many operations are not needed there but this is not a hot > > path and leaking of bdev internal details into genhd.c is not a good > > practice either (e.g. you'd have to make blkdev_get_whole() extern). > > I was thinking that disk_scan_partitions() can be moved to bdev.c to > avoid that... > > > > We could create a special helper for partition rescan in block/bdev.c to > > hide the details but think that bd_start_claiming() - bd_abort_claiming() > > trick is the simplest solution to temporarily grab exclusive ownership if > > we don't have it. > > Now I'm good with this solution. By the way, do you think we must make > sure the first partition scan will be proceed? Sorry, I'm not sure what your are asking about here :) Can you please elaborate more? Honza
Hi, 在 2023/02/09 21:58, Jan Kara 写道: > Sorry, I'm not sure what your are asking about here :) Can you please > elaborate more? It's another artificail race that will cause part scan fail in device_add_disk(). bdev_add() -> user can open the device now disk_scan_partitions -> will fail is the device is already opened exclusively I'm thinking about set disk state before bdev_add()... Thanks, Kuai
On Fri 10-02-23 09:58:36, Yu Kuai wrote: > Hi, > > 在 2023/02/09 21:58, Jan Kara 写道: > > > Sorry, I'm not sure what your are asking about here :) Can you please > > elaborate more? > > > It's another artificail race that will cause part scan fail in > device_add_disk(). > > bdev_add() -> user can open the device now > > disk_scan_partitions -> will fail is the device is already opened > exclusively > > I'm thinking about set disk state before bdev_add()... Oh, right. Yes, that should be a good fix to set GD_NEED_PART_SCAN before calling bdev_add(). Honza
Hi, 在 2023/02/10 18:31, Jan Kara 写道: > On Fri 10-02-23 09:58:36, Yu Kuai wrote: >> Hi, >> >> 在 2023/02/09 21:58, Jan Kara 写道: >> >>> Sorry, I'm not sure what your are asking about here :) Can you please >>> elaborate more? >> >> >> It's another artificail race that will cause part scan fail in >> device_add_disk(). >> >> bdev_add() -> user can open the device now >> >> disk_scan_partitions -> will fail is the device is already opened >> exclusively >> >> I'm thinking about set disk state before bdev_add()... > > Oh, right. Yes, that should be a good fix to set GD_NEED_PART_SCAN before > calling bdev_add(). Glad to here that
diff --git a/block/blk.h b/block/blk.h index a186ea20f39d..8b75a95b28d6 100644 --- a/block/blk.h +++ b/block/blk.h @@ -436,7 +436,7 @@ static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu) } struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu); -int disk_scan_partitions(struct gendisk *disk, fmode_t mode); +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner); int disk_alloc_events(struct gendisk *disk); void disk_add_events(struct gendisk *disk); diff --git a/block/genhd.c b/block/genhd.c index 0f9769db2de8..012529d36f5b 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -356,7 +356,7 @@ void disk_uevent(struct gendisk *disk, enum kobject_action action) } EXPORT_SYMBOL_GPL(disk_uevent); -int disk_scan_partitions(struct gendisk *disk, fmode_t mode) +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) { struct block_device *bdev; @@ -366,6 +366,9 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode) return -EINVAL; if (disk->open_partitions) return -EBUSY; + /* Someone else has bdev exclusively open? */ + if (disk->part0->bd_holder != owner) + return -EBUSY; set_bit(GD_NEED_PART_SCAN, &disk->state); bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL); @@ -500,7 +503,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, bdev_add(disk->part0, ddev->devt); if (get_capacity(disk)) - disk_scan_partitions(disk, FMODE_READ); + disk_scan_partitions(disk, FMODE_READ, NULL); /* * Announce the disk and partitions after all partitions are diff --git a/block/ioctl.c b/block/ioctl.c index 60121e89052b..96617512982e 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -467,9 +467,10 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode, * user space. Note the separate arg/argp parameters that are needed * to deal with the compat_ptr() conversion. */ -static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, - unsigned cmd, unsigned long arg, void __user *argp) +static int blkdev_common_ioctl(struct file *file, fmode_t mode, unsigned cmd, + unsigned long arg, void __user *argp) { + struct block_device *bdev = I_BDEV(file->f_mapping->host); unsigned int max_sectors; switch (cmd) { @@ -527,7 +528,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, return -EACCES; if (bdev_is_partition(bdev)) return -EINVAL; - return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL); + return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL, + file); case BLKTRACESTART: case BLKTRACESTOP: case BLKTRACETEARDOWN: @@ -605,7 +607,7 @@ long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) break; } - ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp); + ret = blkdev_common_ioctl(file, mode, cmd, arg, argp); if (ret != -ENOIOCTLCMD) return ret; @@ -674,7 +676,7 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg) break; } - ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp); + ret = blkdev_common_ioctl(file, mode, cmd, arg, argp); if (ret == -ENOIOCTLCMD && disk->fops->compat_ioctl) ret = disk->fops->compat_ioctl(bdev, mode, cmd, arg);
Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures. Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open. Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by: Jan Kara <jack@suse.cz> --- block/blk.h | 2 +- block/genhd.c | 7 +++++-- block/ioctl.c | 12 +++++++----- 3 files changed, 13 insertions(+), 8 deletions(-)