Message ID | 20180919184040.22540-2-kreijack@libero.it (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile. | expand |
On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote: > From: Goffredo Baroncelli <kreijack@inwind.it> > > Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> > --- > grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 66 insertions(+) > > diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c > index be195448d..56c42746d 100644 > --- a/grub-core/fs/btrfs.c > +++ b/grub-core/fs/btrfs.c > @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item > #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 > #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20 > #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40 > +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 > +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100 > grub_uint8_t dummy2[0xc]; > grub_uint16_t nstripes; > grub_uint16_t nsubstripes; > @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr, > stripe_offset = low + chunk_stripe_length > * high; > csize = chunk_stripe_length - low; > + break; > + } > + case GRUB_BTRFS_CHUNK_TYPE_RAID5: > + case GRUB_BTRFS_CHUNK_TYPE_RAID6: > + { > + grub_uint64_t nparities, block_nr, high, low; > + > + redundancy = 1; /* no redundancy for now */ > + > + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) > + { > + grub_dprintf ("btrfs", "RAID5\n"); > + nparities = 1; > + } > + else > + { > + grub_dprintf ("btrfs", "RAID6\n"); > + nparities = 2; > + } > + > + /* > + * A RAID 6 layout consists of several blocks spread on the disks. > + * The raid terminology is used to call all the blocks of a row > + * "stripe". Unfortunately the BTRFS terminology confuses block Stripe is data set or parity (parity stripe) on one disk. Block has different meaning. Please stick to btrfs terminology and say it clearly in the comment. And even add a link to btrfs wiki page to ease reading. I think about this one: https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID > + * and stripe. I do not think so. Or at least not so much... > + * > + * Disk0 Disk1 Disk2 Disk3 > + * > + * A1 B1 P1 Q1 > + * Q2 A2 B2 P2 > + * P3 Q3 A3 B3 > + * [...] > + * > + * Note that the placement of the parities depends on row index. > + * In the code below: > + * - block_nr is the block number without the parities Well, it seems to me that the btrfs code introduces confusion not the spec itself. I would leave code as is but s/block number/stripe number/. > + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...), > + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...), > + * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...), s/disk number/disk number in a row/ > + * - off is the logical address to read > + * - chunk_stripe_length is the size of a block (typically 64k), s/a block/a stripe/ > + * - nstripes is the number of disks, s/number of disks/number of disks in a row/ I miss the description of nparities here... > + * - low is the offset of the data inside a stripe, > + * - stripe_offset is the disk offset, s/the disk offset/the data offset in an array/? > + * - csize is the "potential" data to read. It will be reduced to > + * size if the latter is smaller. > + */ > + block_nr = grub_divmod64 (off, chunk_stripe_length, &low); > + > + /* > + * stripen is computed without the parities (0 for A1, A2, A3... > + * 1 for B1, B2...). > + */ > + high = grub_divmod64 (block_nr, nstripes - nparities, &stripen); This is clear... > + /* > + * stripen is recomputed considering the parities (0 for A1, 1 for > + * A2, 2 for A3....). > + */ > + grub_divmod64 (high + stripen, nstripes, &stripen); ... but this looks strange... You add disk number to row number. Hmmm... It looks that it works but this is not obvious at first sight. Could you explain that? > + stripe_offset = low + chunk_stripe_length * high; > + csize = chunk_stripe_length - low; > + > break; > } > default: Daniel
On 25/09/2018 17.31, Daniel Kiper wrote: > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote: >> From: Goffredo Baroncelli <kreijack@inwind.it> >> >> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> >> --- >> grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 66 insertions(+) >> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c >> index be195448d..56c42746d 100644 >> --- a/grub-core/fs/btrfs.c >> +++ b/grub-core/fs/btrfs.c >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item >> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 >> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20 >> #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40 >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100 >> grub_uint8_t dummy2[0xc]; >> grub_uint16_t nstripes; >> grub_uint16_t nsubstripes; >> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr, >> stripe_offset = low + chunk_stripe_length >> * high; >> csize = chunk_stripe_length - low; >> + break; >> + } >> + case GRUB_BTRFS_CHUNK_TYPE_RAID5: >> + case GRUB_BTRFS_CHUNK_TYPE_RAID6: >> + { >> + grub_uint64_t nparities, block_nr, high, low; >> + >> + redundancy = 1; /* no redundancy for now */ >> + >> + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) >> + { >> + grub_dprintf ("btrfs", "RAID5\n"); >> + nparities = 1; >> + } >> + else >> + { >> + grub_dprintf ("btrfs", "RAID6\n"); >> + nparities = 2; >> + } >> + >> + /* >> + * A RAID 6 layout consists of several blocks spread on the disks. >> + * The raid terminology is used to call all the blocks of a row >> + * "stripe". Unfortunately the BTRFS terminology confuses block > > Stripe is data set or parity (parity stripe) on one disk. Block has > different meaning. Please stick to btrfs terminology and say it clearly > in the comment. And even add a link to btrfs wiki page to ease reading. > > I think about this one: > https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID > >> + * and stripe. > > I do not think so. Or at least not so much... Trust me, generally speaking stripe is the "row" in the disks (without the parity); looking at the ext3 man page: .... stride=stride-size Configure the filesystem for a RAID array with stride-size filesystem blocks. This is the number of blocks read or written to disk before moving to the next disk, which is sometimes referred to as the chunk size. This mostly affects placement of filesystem metadata like bitmaps at mke2fs time to avoid placing them on a single disk, which can hurt performance. It may also be used by the block allo‐ cator. stripe_width=stripe-width Configure the filesystem for a RAID array with stripe-width filesystem blocks per stripe. This is typically stride-size * N, where N is the number of data-bearing disks in the RAID (e.g. for RAID 5 there is one parity disk, so N will be the number of disks in the array minus 1). This allows the block allocator to prevent read-modify-write of the parity in a RAID stripe if possible when the data is writ‐ ten. .... Looking at the RAID5 wikipedia page, it seems that the term "stripe" is coherent with the ext3 man page. I suspect that the confusion is due to the fact that in RAID1 a stripe is in ONE disk (because the others are like parities). In BTRFS the RAID5/6 code uses the structure of RAID1 with some minimal extensions... To be clear, I don't have problem to be adherent to the BTRFS terminology. However I found this very confusing because I was used to a different terminology. I am bit worried about the fact that grub uses both MD/DM code and BTRFS code; a quick look to the code (eg ./grub-core/disk/diskfilter.c) shows that the stripe_size field seems to be related to a disks row without parities. And... yes in BTRFS "chunk" is a completely different beast than what it is reported in the ext3 man page :-) > >> + * >> + * Disk0 Disk1 Disk2 Disk3 >> + * >> + * A1 B1 P1 Q1 >> + * Q2 A2 B2 P2 >> + * P3 Q3 A3 B3 >> + * [...] >> + * >> + * Note that the placement of the parities depends on row index. >> + * In the code below: >> + * - block_nr is the block number without the parities > > Well, it seems to me that the btrfs code introduces confusion not the > spec itself. I would leave code as is but s/block number/stripe number/. Ok I will replace the two terms. However I have to put a warning that this is a "BTRFS" terminology :-) > >> + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...), >> + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...), >> + * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...), > > s/disk number/disk number in a row/ This value doesn't depend by the row. So "number of disk" is more correct > >> + * - off is the logical address to read >> + * - chunk_stripe_length is the size of a block (typically 64k), > > s/a block/a stripe/ > >> + * - nstripes is the number of disks, > > s/number of disks/number of disks in a row/ ditto > > I miss the description of nparities here... Right: + * - nparities is the number of parities (1 for RAID5, 2 for RAID6); + * used only in RAID5/6 code. > >> + * - low is the offset of the data inside a stripe, >> + * - stripe_offset is the disk offset, > > s/the disk offset/the data offset in an array/? Yes > >> + * - csize is the "potential" data to read. It will be reduced to >> + * size if the latter is smaller. >> + */ >> + block_nr = grub_divmod64 (off, chunk_stripe_length, &low); >> + >> + /* >> + * stripen is computed without the parities (0 for A1, A2, A3... >> + * 1 for B1, B2...). >> + */ >> + high = grub_divmod64 (block_nr, nstripes - nparities, &stripen); > > This is clear... > >> + /* >> + * stripen is recomputed considering the parities (0 for A1, 1 for >> + * A2, 2 for A3....). >> + */ >> + grub_divmod64 (high + stripen, nstripes, &stripen); > > ... but this looks strange... You add disk number to row number. Hmmm... > It looks that it works but this is not obvious at first sight. Could you > explain that? What about + /* + * stripen is recomputed considering the parities: different row have + * a different offset, we have to add to stripen the number of row ("high") in + * modulo nstripes (0 for A1, 1 for A2, 2 for A3....). + */ > >> + stripe_offset = low + chunk_stripe_length * high; >> + csize = chunk_stripe_length - low; >> + >> break; >> } >> default: > > Daniel >
On Wed, Sep 26, 2018 at 10:40:32PM +0200, Goffredo Baroncelli wrote: > On 25/09/2018 17.31, Daniel Kiper wrote: > > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote: > >> From: Goffredo Baroncelli <kreijack@inwind.it> > >> > >> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> > >> --- > >> grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 66 insertions(+) > >> > >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c > >> index be195448d..56c42746d 100644 > >> --- a/grub-core/fs/btrfs.c > >> +++ b/grub-core/fs/btrfs.c > >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item > >> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 > >> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20 > >> #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40 > >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 > >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100 > >> grub_uint8_t dummy2[0xc]; > >> grub_uint16_t nstripes; > >> grub_uint16_t nsubstripes; > >> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr, > >> stripe_offset = low + chunk_stripe_length > >> * high; > >> csize = chunk_stripe_length - low; > >> + break; > >> + } > >> + case GRUB_BTRFS_CHUNK_TYPE_RAID5: > >> + case GRUB_BTRFS_CHUNK_TYPE_RAID6: > >> + { > >> + grub_uint64_t nparities, block_nr, high, low; > >> + > >> + redundancy = 1; /* no redundancy for now */ > >> + > >> + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) > >> + { > >> + grub_dprintf ("btrfs", "RAID5\n"); > >> + nparities = 1; > >> + } > >> + else > >> + { > >> + grub_dprintf ("btrfs", "RAID6\n"); > >> + nparities = 2; > >> + } > >> + > >> + /* > >> + * A RAID 6 layout consists of several blocks spread on the disks. > >> + * The raid terminology is used to call all the blocks of a row > >> + * "stripe". Unfortunately the BTRFS terminology confuses block > > > > Stripe is data set or parity (parity stripe) on one disk. Block has > > different meaning. Please stick to btrfs terminology and say it clearly > > in the comment. And even add a link to btrfs wiki page to ease reading. > > > > I think about this one: > > https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID > > > >> + * and stripe. > > > > I do not think so. Or at least not so much... > > Trust me, generally speaking stripe is the "row" in the disks (without the parity); looking at the ext3 man page: > > .... > stride=stride-size > Configure the filesystem for a RAID array with > stride-size filesystem blocks. This is the number of > blocks read or written to disk before moving to the > next disk, which is sometimes referred to as the > chunk size. This mostly affects placement of > filesystem metadata like bitmaps at mke2fs time to > avoid placing them on a single disk, which can hurt > performance. It may also be used by the block allo??? > cator. > > stripe_width=stripe-width > Configure the filesystem for a RAID array with > stripe-width filesystem blocks per stripe. This is > typically stride-size * N, where N is the number of > data-bearing disks in the RAID (e.g. for RAID 5 > there is one parity disk, so N will be the number of > disks in the array minus 1). This allows the block > allocator to prevent read-modify-write of the parity > in a RAID stripe if possible when the data is writ??? > ten. > > .... > Looking at the RAID5 wikipedia page, it seems that the term "stripe" > is coherent with the ext3 man page. Ugh... It looks that I have messed up things. Sorry about that. > I suspect that the confusion is due to the fact that in RAID1 a stripe > is in ONE disk (because the others are like parities). In BTRFS the > RAID5/6 code uses the structure of RAID1 with some minimal > extensions... > > To be clear, I don't have problem to be adherent to the BTRFS > terminology. However I found this very confusing because I was used to > a different terminology. I am bit worried about the fact that grub Yeah, I have the same feeling. However, I think that in btrfs code we should stay with btrfs terms. Though I think that it make sense to underline differences between btrfs and well known RAID here. > uses both MD/DM code and BTRFS code; a quick look to the code (eg > ./grub-core/disk/diskfilter.c) shows that the stripe_size field seems > to be related to a disks row without parities. > > And... yes in BTRFS "chunk" is a completely different beast than what > it is reported in the ext3 man page :-) As I said above, please say it in the comment. This will ease reading for people who are not used to btrfs terms. > >> + * > >> + * Disk0 Disk1 Disk2 Disk3 > >> + * > >> + * A1 B1 P1 Q1 > >> + * Q2 A2 B2 P2 > >> + * P3 Q3 A3 B3 > >> + * [...] > >> + * > >> + * Note that the placement of the parities depends on row index. > >> + * In the code below: > >> + * - block_nr is the block number without the parities > > > > Well, it seems to me that the btrfs code introduces confusion not the > > spec itself. I would leave code as is but s/block number/stripe number/. > > Ok I will replace the two terms. However I have to put a warning that this is a "BTRFS" terminology :-) Yep, and please explain the differences at the beginning of the comment. > >> + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...), > >> + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...), > >> + * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...), > > > > s/disk number/disk number in a row/ > > This value doesn't depend by the row. So "number of disk" is more correct Yes, but without "row" it is a bit confusing because it suggests that it is an arbitrary number. Even if you give an example next to the description. So, I am insisting on adding "in a row" here. > >> + * - off is the logical address to read > >> + * - chunk_stripe_length is the size of a block (typically 64k), > > > > s/a block/a stripe/ Taking into account discussion above I am not sure right now which one is correct. Please double check and fix it if it is needed. > >> + * - nstripes is the number of disks, > > > > s/number of disks/number of disks in a row/ > ditto As above, Is it total number of disks in array? I do not think so. Hence, I think that "in a row" helps a bit. Even if it is not very precise. However, if you come up with something better I am not against it. > > I miss the description of nparities here... > > Right: > + * - nparities is the number of parities (1 for RAID5, 2 for RAID6); > + * used only in RAID5/6 code. LGTM. > >> + * - low is the offset of the data inside a stripe, > >> + * - stripe_offset is the disk offset, > > > > s/the disk offset/the data offset in an array/? > > Yes > > > > >> + * - csize is the "potential" data to read. It will be reduced to > >> + * size if the latter is smaller. > >> + */ > >> + block_nr = grub_divmod64 (off, chunk_stripe_length, &low); > >> + > >> + /* > >> + * stripen is computed without the parities (0 for A1, A2, A3... > >> + * 1 for B1, B2...). > >> + */ > >> + high = grub_divmod64 (block_nr, nstripes - nparities, &stripen); > > > > This is clear... > > > >> + /* > >> + * stripen is recomputed considering the parities (0 for A1, 1 for > >> + * A2, 2 for A3....). > >> + */ > >> + grub_divmod64 (high + stripen, nstripes, &stripen); > > > > ... but this looks strange... You add disk number to row number. Hmmm... > > It looks that it works but this is not obvious at first sight. Could you > > explain that? > > What about > + /* > + * stripen is recomputed considering the parities: different row have > + * a different offset, we have to add to stripen the number of row ("high") in > + * modulo nstripes (0 for A1, 1 for A2, 2 for A3....). > + */ This is better but not much. You are repeating what code does. I am especially interested in why this math is correct. It is not obvious at first sight. If it is not it should be explained. Otherwise we will forget in a few months why it is correct. Daniel
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c index be195448d..56c42746d 100644 --- a/grub-core/fs/btrfs.c +++ b/grub-core/fs/btrfs.c @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10 #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20 #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40 +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80 +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100 grub_uint8_t dummy2[0xc]; grub_uint16_t nstripes; grub_uint16_t nsubstripes; @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr, stripe_offset = low + chunk_stripe_length * high; csize = chunk_stripe_length - low; + break; + } + case GRUB_BTRFS_CHUNK_TYPE_RAID5: + case GRUB_BTRFS_CHUNK_TYPE_RAID6: + { + grub_uint64_t nparities, block_nr, high, low; + + redundancy = 1; /* no redundancy for now */ + + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5) + { + grub_dprintf ("btrfs", "RAID5\n"); + nparities = 1; + } + else + { + grub_dprintf ("btrfs", "RAID6\n"); + nparities = 2; + } + + /* + * A RAID 6 layout consists of several blocks spread on the disks. + * The raid terminology is used to call all the blocks of a row + * "stripe". Unfortunately the BTRFS terminology confuses block + * and stripe. + * + * Disk0 Disk1 Disk2 Disk3 + * + * A1 B1 P1 Q1 + * Q2 A2 B2 P2 + * P3 Q3 A3 B3 + * [...] + * + * Note that the placement of the parities depends on row index. + * In the code below: + * - block_nr is the block number without the parities + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...), + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...), + * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...), + * - off is the logical address to read + * - chunk_stripe_length is the size of a block (typically 64k), + * - nstripes is the number of disks, + * - low is the offset of the data inside a stripe, + * - stripe_offset is the disk offset, + * - csize is the "potential" data to read. It will be reduced to + * size if the latter is smaller. + */ + block_nr = grub_divmod64 (off, chunk_stripe_length, &low); + + /* + * stripen is computed without the parities (0 for A1, A2, A3... + * 1 for B1, B2...). + */ + high = grub_divmod64 (block_nr, nstripes - nparities, &stripen); + + /* + * stripen is recomputed considering the parities (0 for A1, 1 for + * A2, 2 for A3....). + */ + grub_divmod64 (high + stripen, nstripes, &stripen); + + stripe_offset = low + chunk_stripe_length * high; + csize = chunk_stripe_length - low; + break; } default: