Message ID | 1450393200-6802-1-git-send-email-toshi.kani@hpe.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Thu, 2015-12-17 at 16:00 -0700, Toshi Kani wrote: > When user unbinds a BTT disk and binds again with a different > sector size without wiping out the disk, a BTT disk is created > with a wrong size. I think this is an incorrect usage model in the first place. You shouldn't expect to disable a BTT, change the sector size or uuid behind it, and expect it to work with the new sector size on re- enabling. While this patch makes the BTT see the right size, it is really just an illusion, because if you try to read the pre-sector-size-change data, it will be scrambled, and thus practically lost. Even with this patch, I can skip changing the UUID, just change the sector size, and re-enable it, and the total available size will appear to have changed. For the case of legacy (non-nfit) namespaces, the only way to change a BTT's properties is to recreate it using 'force_raw'. For non-legacy namespaces, the recommended way is to recreate the namespace with a new uuid, and this will cause BTT to react to the parent_uuid change and not try to bind itself to stale metadata. Both cases will lose any data on the old BTT. Ideally, changing BTT properties shouldn't be allowed till the parent namespaces is recreated, but I'm not sure there is an easy way to enforce this -- Dan? Also, I wonder if this problem is solved by using libndctl to manage BTTs. Thanks, -Vishal
On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: > On Thu, 2015-12-17 at 16:00 -0700, Toshi Kani wrote: > > When user unbinds a BTT disk and binds again with a different > > sector size without wiping out the disk, a BTT disk is created > > with a wrong size. > > I think this is an incorrect usage model in the first place. You > shouldn't expect to disable a BTT, change the sector size or uuid > behind it, and expect it to work with the new sector size on re- > enabling. Well, users do test with multiple sector sizes when they evaluate BTT. So, this is a legitimate use-case with a missing step to wipe out the data in between. I believe BTT needs to be protected from this case in one way or the other. > While this patch makes the BTT see the right size, it is really just an > illusion, because if you try to read the pre-sector-size-change data, > it will be scrambled, and thus practically lost. When user binds a new BTT, he/she does not expect any valid data left in BTT. mkfs is then performed before using the BTT disk. This use-case is similar with re-partitioning. > Even with this patch, I can skip changing the UUID, just change the > sector size, and re-enable it, and the total available size will appear > to have changed. Using the same UUID requires deliberate effort since uuidgen won't generate the same UUID again. We can check the sector size in addition to the UUID, if that makes it any safer. I am not sure how much we need to protect from such deliberate effort, though. > For the case of legacy (non-nfit) namespaces, the only way to change a > BTT's properties is to recreate it using 'force_raw'. For non-legacy > namespaces, the recommended way is to recreate the namespace with a new > uuid, and this will cause BTT to react to the parent_uuid change and > not try to bind itself to stale metadata. Both cases will lose any data > on the old BTT. Losing the data is expected after newly binding BTT, which is not an issue here. The issue is that BTT operates in split sector sizes when user missed the step to wipe out the data before binding. Yes, this patch follows the same approach of the parent_uuid check. > Ideally, changing BTT properties shouldn't be allowed till the parent > namespaces is recreated, but I'm not sure there is an easy way to > enforce this -- Dan? > > Also, I wonder if this problem is solved by using libndctl to manage > BTTs. I have not tested with libndctl yet, but I think our bind/unbind scripts do the same procedures. Thanks, -Toshi
On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: >> Also, I wonder if this problem is solved by using libndctl to manage >> BTTs. > > I have not tested with libndctl yet, but I think our bind/unbind scripts do > the same procedures. We loop through all combinations of sector size in our unit test. If you want to change the sector size the expectation is that the namespace is destroyed and fully re-created, especially due to the fact that changing sector size invalidates all data on the namespace. See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c
On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote: > On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > > On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: > > > Also, I wonder if this problem is solved by using libndctl to manage > > > BTTs. > > > > I have not tested with libndctl yet, but I think our bind/unbind > > scripts do the same procedures. > > We loop through all combinations of sector size in our unit test. If > you want to change the sector size the expectation is that the > namespace is destroyed and fully re-created, especially due to the > fact that changing sector size invalidates all data on the namespace. > > See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c The parent_uuid is not set on our NVDIMM-N systems. I do not see 'uuid' file under sysfs 'namespaceX.X' per namespace_visible(), either. This concept of creating/destroying a namespace is a bit foreign to us since we've never needed to do. Can you elaborate how it's supposed to work for NVDIMM-N? Thanks, -Toshi
On Fri, Dec 18, 2015 at 10:52 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote: >> On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <toshi.kani@hpe.com> wrote: >> > On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: >> > > Also, I wonder if this problem is solved by using libndctl to manage >> > > BTTs. >> > >> > I have not tested with libndctl yet, but I think our bind/unbind >> > scripts do the same procedures. >> >> We loop through all combinations of sector size in our unit test. If >> you want to change the sector size the expectation is that the >> namespace is destroyed and fully re-created, especially due to the >> fact that changing sector size invalidates all data on the namespace. >> >> See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c > > The parent_uuid is not set on our NVDIMM-N systems. I do not see 'uuid' > file under sysfs 'namespaceX.X' per namespace_visible(), either. This > concept of creating/destroying a namespace is a bit foreign to us since > we've never needed to do. Can you elaborate how it's supposed to work for > NVDIMM-N? > Ugh, yes. An oversight on my part, let me give this some thought. Whatever we decide, I want the libndctl api to be identical for the two cases. I expect the simplest option is to have ndctl_namespace_disable_invalidate() destroy the btt-info block in the NVDIMM-N case.
On Fri, 2015-12-18 at 14:23 -0800, Dan Williams wrote: > On Fri, Dec 18, 2015 at 10:52 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > > On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote: > > > On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <toshi.kani@hpe.com> > > > wrote: > > > > On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: > > > > > Also, I wonder if this problem is solved by using libndctl to > > > > > manage > > > > > BTTs. > > > > > > > > I have not tested with libndctl yet, but I think our bind/unbind > > > > scripts do the same procedures. > > > > > > We loop through all combinations of sector size in our unit test. If > > > you want to change the sector size the expectation is that the > > > namespace is destroyed and fully re-created, especially due to the > > > fact that changing sector size invalidates all data on the namespace. > > > > > > See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c > > > > The parent_uuid is not set on our NVDIMM-N systems. I do not see > > 'uuid' file under sysfs 'namespaceX.X' per namespace_visible(), either. > > This concept of creating/destroying a namespace is a bit foreign to us > > since we've never needed to do. Can you elaborate how it's supposed to > > work for NVDIMM-N? > > > > Ugh, yes. An oversight on my part, let me give this some thought. > Whatever we decide, I want the libndctl api to be identical for the > two cases. > > I expect the simplest option is to have > ndctl_namespace_disable_invalidate() destroy the btt-info block in the > NVDIMM-N case. Do you have any update on this? If we are going to use ndctl to address this issue, does it mean that we will have to use ndctl to manage BTT? Thanks, -Toshi
On Tue, Jan 12, 2016 at 7:02 AM, Toshi Kani <toshi.kani@hpe.com> wrote: > On Fri, 2015-12-18 at 14:23 -0800, Dan Williams wrote: >> On Fri, Dec 18, 2015 at 10:52 AM, Toshi Kani <toshi.kani@hpe.com> wrote: >> > On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote: >> > > On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <toshi.kani@hpe.com> >> > > wrote: >> > > > On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: >> > > > > Also, I wonder if this problem is solved by using libndctl to >> > > > > manage >> > > > > BTTs. >> > > > >> > > > I have not tested with libndctl yet, but I think our bind/unbind >> > > > scripts do the same procedures. >> > > >> > > We loop through all combinations of sector size in our unit test. If >> > > you want to change the sector size the expectation is that the >> > > namespace is destroyed and fully re-created, especially due to the >> > > fact that changing sector size invalidates all data on the namespace. >> > > >> > > See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c >> > >> > The parent_uuid is not set on our NVDIMM-N systems. I do not see >> > 'uuid' file under sysfs 'namespaceX.X' per namespace_visible(), either. >> > This concept of creating/destroying a namespace is a bit foreign to us >> > since we've never needed to do. Can you elaborate how it's supposed to >> > work for NVDIMM-N? >> > >> >> Ugh, yes. An oversight on my part, let me give this some thought. >> Whatever we decide, I want the libndctl api to be identical for the >> two cases. >> >> I expect the simplest option is to have >> ndctl_namespace_disable_invalidate() destroy the btt-info block in the >> NVDIMM-N case. > > Do you have any update on this? If we are going to use ndctl to address > this issue, does it mean that we will have to use ndctl to manage BTT? This is my current proposal: https://github.com/pmem/ndctl/blob/pending/Documentation/ndctl-create-namespace.txt
On 1/12/2016 11:38 AM, Dan Williams wrote: > On Tue, Jan 12, 2016 at 7:02 AM, Toshi Kani <toshi.kani@hpe.com> wrote: >> On Fri, 2015-12-18 at 14:23 -0800, Dan Williams wrote: >>> On Fri, Dec 18, 2015 at 10:52 AM, Toshi Kani <toshi.kani@hpe.com> wrote: >>>> On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote: >>>>> On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <toshi.kani@hpe.com> >>>>> wrote: >>>>>> On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: >>>>>>> Also, I wonder if this problem is solved by using libndctl to >>>>>>> manage >>>>>>> BTTs. >>>>>> >>>>>> I have not tested with libndctl yet, but I think our bind/unbind >>>>>> scripts do the same procedures. >>>>> >>>>> We loop through all combinations of sector size in our unit test. If >>>>> you want to change the sector size the expectation is that the >>>>> namespace is destroyed and fully re-created, especially due to the >>>>> fact that changing sector size invalidates all data on the namespace. >>>>> >>>>> See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c >>>> >>>> The parent_uuid is not set on our NVDIMM-N systems. I do not see >>>> 'uuid' file under sysfs 'namespaceX.X' per namespace_visible(), either. >>>> This concept of creating/destroying a namespace is a bit foreign to us >>>> since we've never needed to do. Can you elaborate how it's supposed to >>>> work for NVDIMM-N? >>>> >>> >>> Ugh, yes. An oversight on my part, let me give this some thought. >>> Whatever we decide, I want the libndctl api to be identical for the >>> two cases. >>> >>> I expect the simplest option is to have >>> ndctl_namespace_disable_invalidate() destroy the btt-info block in the >>> NVDIMM-N case. >> >> Do you have any update on this? If we are going to use ndctl to address >> this issue, does it mean that we will have to use ndctl to manage BTT? > > This is my current proposal: > > https://github.com/pmem/ndctl/blob/pending/Documentation/ndctl-create-namespace.txt I don't really like "safe" as a mode choice. The others are "memory" and "raw". Could the one using the BTT be "sector" or "atomic" or "btt" or something other than "safe"? Using "safe" implies that the others aren't safe, but they are if you use them for their intended purpose and know what you're doing. -- ljk > _______________________________________________ > Linux-nvdimm mailing list > Linux-nvdimm@lists.01.org > https://lists.01.org/mailman/listinfo/linux-nvdimm >
On Tue, Jan 12, 2016 at 8:47 AM, Linda Knippers <linda.knippers@hpe.com> wrote: > On 1/12/2016 11:38 AM, Dan Williams wrote: >> On Tue, Jan 12, 2016 at 7:02 AM, Toshi Kani <toshi.kani@hpe.com> wrote: >>> On Fri, 2015-12-18 at 14:23 -0800, Dan Williams wrote: >>>> On Fri, Dec 18, 2015 at 10:52 AM, Toshi Kani <toshi.kani@hpe.com> wrote: >>>>> On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote: >>>>>> On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <toshi.kani@hpe.com> >>>>>> wrote: >>>>>>> On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote: >>>>>>>> Also, I wonder if this problem is solved by using libndctl to >>>>>>>> manage >>>>>>>> BTTs. >>>>>>> >>>>>>> I have not tested with libndctl yet, but I think our bind/unbind >>>>>>> scripts do the same procedures. >>>>>> >>>>>> We loop through all combinations of sector size in our unit test. If >>>>>> you want to change the sector size the expectation is that the >>>>>> namespace is destroyed and fully re-created, especially due to the >>>>>> fact that changing sector size invalidates all data on the namespace. >>>>>> >>>>>> See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c >>>>> >>>>> The parent_uuid is not set on our NVDIMM-N systems. I do not see >>>>> 'uuid' file under sysfs 'namespaceX.X' per namespace_visible(), either. >>>>> This concept of creating/destroying a namespace is a bit foreign to us >>>>> since we've never needed to do. Can you elaborate how it's supposed to >>>>> work for NVDIMM-N? >>>>> >>>> >>>> Ugh, yes. An oversight on my part, let me give this some thought. >>>> Whatever we decide, I want the libndctl api to be identical for the >>>> two cases. >>>> >>>> I expect the simplest option is to have >>>> ndctl_namespace_disable_invalidate() destroy the btt-info block in the >>>> NVDIMM-N case. >>> >>> Do you have any update on this? If we are going to use ndctl to address >>> this issue, does it mean that we will have to use ndctl to manage BTT? >> >> This is my current proposal: >> >> https://github.com/pmem/ndctl/blob/pending/Documentation/ndctl-create-namespace.txt > > I don't really like "safe" as a mode choice. The others are "memory" and "raw". > Could the one using the BTT be "sector" or "atomic" or "btt" or something other > than "safe"? Using "safe" implies that the others aren't safe, but they are if > you use them for their intended purpose and know what you're doing. Of those alternatives I think "sector" mode is the most clear.
diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c index cb47751..176ea25 100644 --- a/drivers/nvdimm/btt_devs.c +++ b/drivers/nvdimm/btt_devs.c @@ -218,6 +218,8 @@ static bool uuid_is_null(u8 *uuid) * Check consistency of the btt info block with itself by validating * the checksum, and with the parent namespace by verifying the * parent_uuid contained in the info block with the one supplied in. + * When nd_btt->uuid is set for binding, verify if the metadata is + * stale. * * Returns: * false for an invalid info block, true for a valid one @@ -234,6 +236,10 @@ bool nd_btt_arena_is_valid(struct nd_btt *nd_btt, struct btt_sb *super) if (memcmp(super->parent_uuid, parent_uuid, 16) != 0) return false; + if (nd_btt->uuid) + if (memcmp(super->uuid, nd_btt->uuid, 16) != 0) + return false; + checksum = le64_to_cpu(super->checksum); super->checksum = 0; if (checksum != nd_sb_checksum((struct nd_gen_sb *) super))
When user unbinds a BTT disk and binds again with a different sector size without wiping out the disk, a BTT disk is created with a wrong size. This is because the bind operation keeps the previous metadata, which leads nd_btt->lbasize inconsistent with internal_lbasize and external_lbasize in the arena. A reboot also reattaches the BTT from the previous metadata. Change nd_btt_arena_is_valid() to check if nd_btt->uuid matches with super->uuid when a new UUID is set for binding. This assures the bind operation writes the metadata with the values specified by user. Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Micah Parrish <micah.parrish@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> --- drivers/nvdimm/btt_devs.c | 6 ++++++ 1 file changed, 6 insertions(+)