Message ID | 152694212460.5484.13180030631810166467.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 21 May 2018 15:35:24 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > The last step before devm_memremap_pages() returns success is to > allocate a release action to tear the entire setup down. However, the > result from devm_add_action() is not checked. > > Checking the error also means that we need to handle the fact that the > percpu_ref may not be killed by the time devm_memremap_pages_release() > runs. Add a new state flag for this case. > > Cc: <stable@vger.kernel.org> > Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") > Cc: Christoph Hellwig <hch@lst.de> > Cc: "Jérôme Glisse" <jglisse@redhat.com> > Cc: Logan Gunthorpe <logang@deltatee.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> Why the cc:stable? The changelog doesn't describe the end-user-visible impact of the bug (it always should, for this reason). AFAICT we only go wrong when a small GFP_KERNEL allocation fails (basically never happens), with undescribed results :(
[ resend as the last attempt dropped all the cc's ] On Mon, May 21, 2018 at 4:10 PM, Andrew Morton <akpm@linux-foundation.org> wrote: > On Mon, 21 May 2018 15:35:24 -0700 Dan Williams <dan.j.williams@intel.com> wrote: > >> The last step before devm_memremap_pages() returns success is to >> allocate a release action to tear the entire setup down. However, the >> result from devm_add_action() is not checked. >> >> Checking the error also means that we need to handle the fact that the >> percpu_ref may not be killed by the time devm_memremap_pages_release() >> runs. Add a new state flag for this case. >> >> Cc: <stable@vger.kernel.org> >> Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") >> Cc: Christoph Hellwig <hch@lst.de> >> Cc: "Jérôme Glisse" <jglisse@redhat.com> >> Cc: Logan Gunthorpe <logang@deltatee.com> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > Why the cc:stable? The changelog doesn't describe the end-user-visible > impact of the bug (it always should, for this reason). True, I should have included that, one of these years I'll stop making this mistake. > > AFAICT we only go wrong when a small GFP_KERNEL allocation fails > (basically never happens), with undescribed results :( > Here's a better changelog: --- The last step before devm_memremap_pages() returns success is to allocate a release action to tear the entire setup down. However, the result from devm_add_action() is not checked. Checking the error also means that we need to handle the fact that the percpu_ref may not be killed by the time devm_memremap_pages_release() runs. Add a new state flag for this case. Without this change we could fail to register the teardown of devm_memremap_pages(). The likelihood of hitting this failure is tiny as small memory allocations almost always succeed. However, the impact of the failure is large given any future reconfiguration, or disable/enable, of an nvdimm namespace will fail forever as subsequent calls to devm_memremap_pages() will fail to setup the pgmap_radix since there will be stale entries for the physical address range.
On Mon, May 21, 2018 at 03:35:24PM -0700, Dan Williams wrote: > The last step before devm_memremap_pages() returns success is to > allocate a release action to tear the entire setup down. However, the > result from devm_add_action() is not checked. > > Checking the error also means that we need to handle the fact that the > percpu_ref may not be killed by the time devm_memremap_pages_release() > runs. Add a new state flag for this case. Looks good (modulo any stable tag issues): Reviewed-by: Christoph Hellwig <hch@lst.de>
Hey Dan, On 21/05/18 06:07 PM, Dan Williams wrote: > Without this change we could fail to register the teardown of > devm_memremap_pages(). The likelihood of hitting this failure is tiny > as small memory allocations almost always succeed. However, the impact > of the failure is large given any future reconfiguration, or > disable/enable, of an nvdimm namespace will fail forever as subsequent > calls to devm_memremap_pages() will fail to setup the pgmap_radix > since there will be stale entries for the physical address range. Sorry, I don't follow this. The change only seems to prevent a warning from occurring in this situation. Won't pgmap_radix_release() still be called regardless of whether this patch is applied? But it looks to me like this patch doesn't quite solve the issue -- at least when looking at dax/pmem.c: If devm_add_action_or_reset() fails, then dax_pmem_percpu_kill() won't be registered as an action and the percpu_ref will never get killed. Thus, dax_pmem_percpu_release() would not get called and dax_pmem_percpu_exit() will hang waiting for a completion that will never occur. So we probably need to add a kill call somewhere on the failing path... Logan
On Tue, May 22, 2018 at 9:42 AM, Logan Gunthorpe <logang@deltatee.com> wrote: > Hey Dan, > > On 21/05/18 06:07 PM, Dan Williams wrote: >> Without this change we could fail to register the teardown of >> devm_memremap_pages(). The likelihood of hitting this failure is tiny >> as small memory allocations almost always succeed. However, the impact >> of the failure is large given any future reconfiguration, or >> disable/enable, of an nvdimm namespace will fail forever as subsequent >> calls to devm_memremap_pages() will fail to setup the pgmap_radix >> since there will be stale entries for the physical address range. > > Sorry, I don't follow this. The change only seems to prevent a warning > from occurring in this situation. Won't pgmap_radix_release() still be > called regardless of whether this patch is applied? devm_add_action() does not call the release function, devm_add_action_or_reset() does. > But it looks to me like this patch doesn't quite solve the issue -- at > least when looking at dax/pmem.c: If devm_add_action_or_reset() fails, > then dax_pmem_percpu_kill() won't be registered as an action and the > percpu_ref will never get killed. Thus, dax_pmem_percpu_release() would > not get called and dax_pmem_percpu_exit() will hang waiting for a > completion that will never occur. So we probably need to add a kill call > somewhere on the failing path... Ah, true, good catch! We should manually kill in the !registered case. I think this means we need to pass in the custom kill routine, because for the pmem driver it's blk_freeze_queue_start().
On 22/05/18 10:56 AM, Dan Williams wrote: > On Tue, May 22, 2018 at 9:42 AM, Logan Gunthorpe <logang@deltatee.com> wrote: >> Hey Dan, >> >> On 21/05/18 06:07 PM, Dan Williams wrote: >>> Without this change we could fail to register the teardown of >>> devm_memremap_pages(). The likelihood of hitting this failure is tiny >>> as small memory allocations almost always succeed. However, the impact >>> of the failure is large given any future reconfiguration, or >>> disable/enable, of an nvdimm namespace will fail forever as subsequent >>> calls to devm_memremap_pages() will fail to setup the pgmap_radix >>> since there will be stale entries for the physical address range. >> >> Sorry, I don't follow this. The change only seems to prevent a warning >> from occurring in this situation. Won't pgmap_radix_release() still be >> called regardless of whether this patch is applied? > > devm_add_action() does not call the release function, > devm_add_action_or_reset() does. Oh, yes. Thanks I see that now. > Ah, true, good catch! > > We should manually kill in the !registered case. I think this means we > need to pass in the custom kill routine, because for the pmem driver > it's blk_freeze_queue_start(). It may be cleaner to just have the caller call the specific kill function if devm_memremap_pages fails... Though, I don't fully understand how the nvdimm pmem driver cleans up the percpu counter. Logan
On Tue, May 22, 2018 at 10:03 AM, Logan Gunthorpe <logang@deltatee.com> wrote: > > > On 22/05/18 10:56 AM, Dan Williams wrote: >> On Tue, May 22, 2018 at 9:42 AM, Logan Gunthorpe <logang@deltatee.com> wrote: >>> Hey Dan, >>> >>> On 21/05/18 06:07 PM, Dan Williams wrote: >>>> Without this change we could fail to register the teardown of >>>> devm_memremap_pages(). The likelihood of hitting this failure is tiny >>>> as small memory allocations almost always succeed. However, the impact >>>> of the failure is large given any future reconfiguration, or >>>> disable/enable, of an nvdimm namespace will fail forever as subsequent >>>> calls to devm_memremap_pages() will fail to setup the pgmap_radix >>>> since there will be stale entries for the physical address range. >>> >>> Sorry, I don't follow this. The change only seems to prevent a warning >>> from occurring in this situation. Won't pgmap_radix_release() still be >>> called regardless of whether this patch is applied? >> >> devm_add_action() does not call the release function, >> devm_add_action_or_reset() does. > > Oh, yes. Thanks I see that now. > >> Ah, true, good catch! >> >> We should manually kill in the !registered case. I think this means we >> need to pass in the custom kill routine, because for the pmem driver >> it's blk_freeze_queue_start(). > > It may be cleaner to just have the caller call the specific kill > function if devm_memremap_pages fails... As far as I can see by then it's too late, or we need to expose release details to the caller which defeats the purpose of devm semantics. > Though, I don't fully > understand how the nvdimm pmem driver cleans up the percpu counter. The dev_pagemap setup for pmem is entirely too subtle and arguably a layering violation as it reuses the block layer q_usage_counter percpu_ref. We arrange for that counter to be shutdown before the blk_cleanup_queue() does the same.
On 22/05/18 11:25 AM, Dan Williams wrote: > As far as I can see by then it's too late, or we need to expose > release details to the caller which defeats the purpose of devm > semantics. In the dax/pmem case, I *think* it should be fine... devm_add_action_or_reset() only calls the action it is passed on failure not the entire devm chain. In which case, it should drop all the references to the percpu counter. Then, if on an error return from devm_memremap_pages() we call dax_pmem_percpu_kill(), the rest of the devm chain should be called when we return from a failed probe and it should proceed as usual. I think dax_pmem_percpu_kill() also must be called on any error return from devm_memremap_pages and it is currently not... Logan
diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 7b4899c06f49..44a7ee517513 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -115,6 +115,7 @@ struct dev_pagemap { dev_page_free_t page_free; struct vmem_altmap altmap; bool altmap_valid; + bool registered; struct resource res; struct percpu_ref *ref; struct device *dev; diff --git a/kernel/memremap.c b/kernel/memremap.c index c614645227a7..30d96be5a965 100644 --- a/kernel/memremap.c +++ b/kernel/memremap.c @@ -296,7 +296,7 @@ static void devm_memremap_pages_release(void *data) for_each_device_pfn(pfn, pgmap) put_page(pfn_to_page(pfn)); - if (percpu_ref_tryget_live(pgmap->ref)) { + if (pgmap->registered && percpu_ref_tryget_live(pgmap->ref)) { dev_WARN(dev, "%s: page mapping is still live!\n", __func__); percpu_ref_put(pgmap->ref); } @@ -418,7 +418,11 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) percpu_ref_get(pgmap->ref); } - devm_add_action(dev, devm_memremap_pages_release, pgmap); + error = devm_add_action_or_reset(dev, devm_memremap_pages_release, + pgmap); + if (error) + return ERR_PTR(error); + pgmap->registered = true; return __va(res->start);
The last step before devm_memremap_pages() returns success is to allocate a release action to tear the entire setup down. However, the result from devm_add_action() is not checked. Checking the error also means that we need to handle the fact that the percpu_ref may not be killed by the time devm_memremap_pages_release() runs. Add a new state flag for this case. Cc: <stable@vger.kernel.org> Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") Cc: Christoph Hellwig <hch@lst.de> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- include/linux/memremap.h | 1 + kernel/memremap.c | 8 ++++++-- 2 files changed, 7 insertions(+), 2 deletions(-)