diff mbox series

md/md-bitmap: fix incorrect usage for sb_index

Message ID 20240223121128.28985-1-heming.zhao@suse.com (mailing list archive)
State Accepted, archived
Headers show
Series md/md-bitmap: fix incorrect usage for sb_index | expand

Commit Message

heming.zhao@suse.com Feb. 23, 2024, 12:11 p.m. UTC
Commit d7038f951828 ("md-bitmap: don't use ->index for pages backing the
bitmap file") removed page->index from bitmap code, but left wrong code
logic for clustered-md. current code never set slot offset for cluster
nodes, will sometimes cause crash in clustered env.

Call trace (partly):
 md_bitmap_file_set_bit+0x110/0x1d8 [md_mod]
 md_bitmap_startwrite+0x13c/0x240 [md_mod]
 raid1_make_request+0x6b0/0x1c08 [raid1]
 md_handle_request+0x1dc/0x368 [md_mod]
 md_submit_bio+0x80/0xf8 [md_mod]
 __submit_bio+0x178/0x300
 submit_bio_noacct_nocheck+0x11c/0x338
 submit_bio_noacct+0x134/0x614
 submit_bio+0x28/0xdc
 submit_bh_wbc+0x130/0x1cc
 submit_bh+0x1c/0x28

Fixes: d7038f951828 ("md-bitmap: don't use ->index for pages backing the bitmap file")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 drivers/md/md-bitmap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Feb. 23, 2024, 3:37 p.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

... but if you're actually still actively using the bitmap code,
any chance you could try to look into updating it so that it doesn't
rely on ->bmap?
heming.zhao@suse.com Feb. 24, 2024, 8:41 a.m. UTC | #2
On 2/23/24 23:37, Christoph Hellwig wrote:
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks for your view.

> 
> ... but if you're actually still actively using the bitmap code,
> any chance you could try to look into updating it so that it doesn't
> rely on ->bmap?
> 

maybe I misunderstood your meaning, do you mean: you don't like the design of the bitmap, and you hope/plan to drop it?
bitmap is a journal-like stuff. we could get an idea from filesystem journal, just write data status to the bitmap area, rather than managing it in kernel memory. with this idea, writing data protection may involve more simpler code, the replay will become more complex.

Thanks,
Heming
Christoph Hellwig Feb. 26, 2024, 11:04 a.m. UTC | #3
On Sat, Feb 24, 2024 at 04:41:43PM +0800, Heming Zhao wrote:
> maybe I misunderstood your meaning, do you mean: you don't like the design of the bitmap, and you hope/plan to drop it?
> bitmap is a journal-like stuff. we could get an idea from filesystem journal, just write data status to the bitmap area, rather than managing it in kernel memory. with this idea, writing data protection may involve more simpler code, the replay will become more complex.

The main problem with the dm-bitmap code is the way it does I/O.  Using
->bmap to try to map blocks ahead is cumbersome, and unsafe in many
ways.  If you want to keep the MD bitmap code alive someone needs to
rewrite it to go through the file systems read_iter/write_iter ops
and do direct I/O.
Song Liu Feb. 26, 2024, 9:43 p.m. UTC | #4
On Fri, Feb 23, 2024 at 4:11 AM Heming Zhao <heming.zhao@suse.com> wrote:
>
> Commit d7038f951828 ("md-bitmap: don't use ->index for pages backing the
> bitmap file") removed page->index from bitmap code, but left wrong code
> logic for clustered-md. current code never set slot offset for cluster
> nodes, will sometimes cause crash in clustered env.
>
> Call trace (partly):
>  md_bitmap_file_set_bit+0x110/0x1d8 [md_mod]
>  md_bitmap_startwrite+0x13c/0x240 [md_mod]
>  raid1_make_request+0x6b0/0x1c08 [raid1]
>  md_handle_request+0x1dc/0x368 [md_mod]
>  md_submit_bio+0x80/0xf8 [md_mod]
>  __submit_bio+0x178/0x300
>  submit_bio_noacct_nocheck+0x11c/0x338
>  submit_bio_noacct+0x134/0x614
>  submit_bio+0x28/0xdc
>  submit_bh_wbc+0x130/0x1cc
>  submit_bh+0x1c/0x28
>
> Fixes: d7038f951828 ("md-bitmap: don't use ->index for pages backing the bitmap file")
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>

Applied to md-6.9 branch.

Thanks,
Song
heming.zhao@suse.com Feb. 27, 2024, 12:29 p.m. UTC | #5
On 2/26/24 19:04, Christoph Hellwig wrote:
> On Sat, Feb 24, 2024 at 04:41:43PM +0800, Heming Zhao wrote:
>> maybe I misunderstood your meaning, do you mean: you don't like the design of the bitmap, and you hope/plan to drop it?
>> bitmap is a journal-like stuff. we could get an idea from filesystem journal, just write data status to the bitmap area, rather than managing it in kernel memory. with this idea, writing data protection may involve more simpler code, the replay will become more complex.
> 
> The main problem with the dm-bitmap code is the way it does I/O.  Using
> ->bmap to try to map blocks ahead is cumbersome, and unsafe in many
> ways.  If you want to keep the MD bitmap code alive someone needs to
> rewrite it to go through the file systems read_iter/write_iter ops
> and do direct I/O.
> 

Thank you for your explanation, I understand your meaning. It looks the
code changes include userspace (mdadm) & kernelspace (md module) changes.
mdadm should open a file with dio, kernel should call read_iter/write_iter
ops for IOs.
 From my experience, I have never seen any SUSE customer using kernel
to manager bitmap under external mode (intel VROC is another topic).
This job has a very low priority from my side.

Thanks,
Heming
diff mbox series

Patch

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 9672f75c3050..a4976ceae868 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -234,7 +234,8 @@  static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
 	sector_t doff;
 
 	bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev;
-	if (pg_index == store->file_pages - 1) {
+	/* we compare length (page numbers), not page offset. */
+	if ((pg_index - store->sb_index) == store->file_pages - 1) {
 		unsigned int last_page_size = store->bytes & (PAGE_SIZE - 1);
 
 		if (last_page_size == 0)
@@ -438,8 +439,8 @@  static void filemap_write_page(struct bitmap *bitmap, unsigned long pg_index,
 	struct page *page = store->filemap[pg_index];
 
 	if (mddev_is_clustered(bitmap->mddev)) {
-		pg_index += bitmap->cluster_slot *
-			DIV_ROUND_UP(store->bytes, PAGE_SIZE);
+		/* go to node bitmap area starting point */
+		pg_index += store->sb_index;
 	}
 
 	if (store->file)
@@ -952,6 +953,7 @@  static void md_bitmap_file_set_bit(struct bitmap *bitmap, sector_t block)
 	unsigned long index = file_page_index(store, chunk);
 	unsigned long node_offset = 0;
 
+	index += store->sb_index;
 	if (mddev_is_clustered(bitmap->mddev))
 		node_offset = bitmap->cluster_slot * store->file_pages;
 
@@ -982,6 +984,7 @@  static void md_bitmap_file_clear_bit(struct bitmap *bitmap, sector_t block)
 	unsigned long index = file_page_index(store, chunk);
 	unsigned long node_offset = 0;
 
+	index += store->sb_index;
 	if (mddev_is_clustered(bitmap->mddev))
 		node_offset = bitmap->cluster_slot * store->file_pages;