Message ID | 20210416025745.8698-1-apopple@nvidia.com
---|---
State | New, archived
Series | [v4] kernel/resource: Fix locking in request_free_mem_region
On Thu, Apr 15, 2021 at 7:58 PM Alistair Popple <apopple@nvidia.com> wrote:
>
> request_free_mem_region() is used to find an empty range of physical
> addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
> over the range of possible addresses using region_intersects() to see if
> the range is free.
>
> region_intersects() obtains a read lock before walking the resource tree
> to protect against concurrent changes. However it drops the lock prior
> to returning. This means that by the time request_mem_region() is called
> in request_free_mem_region() another thread may have already reserved the
> requested region, resulting in unexpected failures and a message in the
> kernel log from hitting this condition:
>
>         /*
>          * mm/hmm.c reserves physical addresses which then
>          * become unavailable to other users. Conflicts are
>          * not expected. Warn to aid debugging if encountered.
>          */
>         if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
>                 pr_warn("Unaddressable device %s %pR conflicts with %pR",
>                         conflict->name, conflict, res);
>
> To fix this, create versions of region_intersects() and
> request_mem_region() that allow the caller to take the appropriate lock
> such that it may be held over the required calls.
>
> Instead of creating another version of devm_request_mem_region() that
> doesn't take the lock, open-code it to allow the caller to pre-allocate
> the required memory prior to taking the lock.
>
> On some architectures and kernel configurations revoke_iomem() also
> calls resource code, so it cannot be called with the resource lock held.
> Therefore call it only after dropping the lock.

The patch is difficult to read because too many things are being changed
at once, and the changelog seems to confirm that. Can you try breaking
this down into a set of incremental changes? Not only will this ease
review, it will distribute any regressions over multiple bisection
targets.

Something like:

* Refactor region_intersects() to allow external locking
* Refactor __request_region() to allow external locking
* Push revoke_iomem() down into...
* Fix resource_lock usage in [devm_]request_free_mem_region()

The revoke_iomem() change seems like something that should be moved into
a leaf helper and not called by __request_free_mem_region() directly.
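[Editor's note: to see the check-then-reserve race described above in
isolation, here is a minimal standalone C sketch. Userspace pthreads stand
in for resource_lock; the "reserved" flag and the claim() helper are
invented for illustration, not kernel API. The check and the reservation
are each correctly locked on their own, but the lock is dropped between
them, so two threads can both observe the range as free.]

    /* Build: cc -pthread race.c && ./a.out */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
    static bool reserved;   /* stand-in for a busy resource range */
    static int conflicts;   /* "unexpected" conflicts observed */

    static void *claim(void *arg)
    {
            bool saw_free;

            pthread_rwlock_rdlock(&lock);
            saw_free = !reserved;           /* the region_intersects() check */
            pthread_rwlock_unlock(&lock);   /* lock dropped: race window opens */

            if (saw_free) {
                    pthread_rwlock_wrlock(&lock);
                    if (reserved)
                            conflicts++;    /* another thread got here first */
                    reserved = true;        /* the request_mem_region() step */
                    pthread_rwlock_unlock(&lock);
            }
            return NULL;
    }

    int main(void)
    {
            for (int i = 0; i < 100000; i++) {
                    pthread_t a, b;

                    reserved = false;
                    pthread_create(&a, NULL, claim, NULL);
                    pthread_create(&b, NULL, claim, NULL);
                    pthread_join(a, NULL);
                    pthread_join(b, NULL);
            }
            printf("conflicts seen: %d\n", conflicts);
            return 0;
    }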
On 16.04.21 06:19, Dan Williams wrote:
> On Thu, Apr 15, 2021 at 7:58 PM Alistair Popple <apopple@nvidia.com> wrote:
>> [...]
>
> The patch is difficult to read because too many things are being changed
> at once, and the changelog seems to confirm that. Can you try breaking
> this down into a set of incremental changes? Not only will this ease
> review, it will distribute any regressions over multiple bisection
> targets.
>
> Something like:
>
> * Refactor region_intersects() to allow external locking
> * Refactor __request_region() to allow external locking
> * Push revoke_iomem() down into...
> * Fix resource_lock usage in [devm_]request_free_mem_region()

+1
On Friday, 16 April 2021 2:19:18 PM AEST Dan Williams wrote:
> The revoke_iomem() change seems like something that should be moved
> into a leaf helper and not called by __request_free_mem_region()
> directly.

Ok. I have split this up but left the call to revoke_iomem() in
__request_free_mem_region(). Perhaps I am missing something, but I wasn't
sure how moving it into a helper would be any better or different, as it
still has to be called after dropping the lock.

 - Alistair
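[Editor's note: a minimal runnable illustration of the constraint Alistair
mentions. resource_lock is not recursive, so if the revoke path re-enters
resource code (an architecture- and config-dependent possibility), taking
the lock again self-deadlocks; hence revoke must run after the unlock.
Userspace pthreads sketch; revoke() is an invented stand-in for
revoke_iomem().]

    /* Build: cc -pthread revoke.c && ./a.out */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_rwlock_t resource_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void revoke(void)
    {
            /* Stand-in for revoke_iomem() walking back into resource code. */
            pthread_rwlock_rdlock(&resource_lock);
            pthread_rwlock_unlock(&resource_lock);
    }

    int main(void)
    {
            pthread_rwlock_wrlock(&resource_lock);
            /*
             * revoke();  <- uncommenting this hangs (or fails with EDEADLK,
             * depending on the implementation): this thread already holds
             * the write lock and POSIX rwlocks are not recursive.
             */
            pthread_rwlock_unlock(&resource_lock);

            revoke();   /* the fixed ordering: drop the lock, then revoke */
            puts("no deadlock when revoke runs after unlock");
            return 0;
    }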
diff --git a/kernel/resource.c b/kernel/resource.c
index 7e00239a023a..f1f7fe089fc8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -502,6 +502,34 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);
 
+static int __region_intersects(resource_size_t start, size_t size,
+                       unsigned long flags, unsigned long desc)
+{
+       struct resource res;
+       int type = 0; int other = 0;
+       struct resource *p;
+
+       res.start = start;
+       res.end = start + size - 1;
+
+       for (p = iomem_resource.child; p ; p = p->sibling) {
+               bool is_type = (((p->flags & flags) == flags) &&
+                               ((desc == IORES_DESC_NONE) ||
+                                (desc == p->desc)));
+
+               if (resource_overlaps(p, &res))
+                       is_type ? type++ : other++;
+       }
+
+       if (type == 0)
+               return REGION_DISJOINT;
+
+       if (other == 0)
+               return REGION_INTERSECTS;
+
+       return REGION_MIXED;
+}
+
 /**
  * region_intersects() - determine intersection of region with known resources
  * @start: region start address
@@ -525,31 +553,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
 int region_intersects(resource_size_t start, size_t size, unsigned long flags,
                      unsigned long desc)
 {
-       struct resource res;
-       int type = 0; int other = 0;
-       struct resource *p;
-
-       res.start = start;
-       res.end = start + size - 1;
+       int rc;
 
        read_lock(&resource_lock);
-       for (p = iomem_resource.child; p ; p = p->sibling) {
-               bool is_type = (((p->flags & flags) == flags) &&
-                               ((desc == IORES_DESC_NONE) ||
-                                (desc == p->desc)));
-
-               if (resource_overlaps(p, &res))
-                       is_type ? type++ : other++;
-       }
+       rc = __region_intersects(start, size, flags, desc);
        read_unlock(&resource_lock);
-
-       if (type == 0)
-               return REGION_DISJOINT;
-
-       if (other == 0)
-               return REGION_INTERSECTS;
-
-       return REGION_MIXED;
+       return rc;
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
@@ -1150,31 +1159,16 @@ struct address_space *iomem_get_mapping(void)
        return smp_load_acquire(&iomem_inode)->i_mapping;
 }
 
-/**
- * __request_region - create a new busy resource region
- * @parent: parent resource descriptor
- * @start: resource start address
- * @n: resource region size
- * @name: reserving caller's ID string
- * @flags: IO resource flags
- */
-struct resource * __request_region(struct resource *parent,
-                                  resource_size_t start, resource_size_t n,
-                                  const char *name, int flags)
+static bool request_region_locked(struct resource *parent,
+                               struct resource *res, resource_size_t start,
+                               resource_size_t n, const char *name, int flags)
 {
        DECLARE_WAITQUEUE(wait, current);
-       struct resource *res = alloc_resource(GFP_KERNEL);
-       struct resource *orig_parent = parent;
-
-       if (!res)
-               return NULL;
 
        res->name = name;
        res->start = start;
        res->end = start + n - 1;
 
-       write_lock(&resource_lock);
-
        for (;;) {
                struct resource *conflict;
 
@@ -1209,14 +1203,37 @@ struct resource * __request_region(struct resource *parent,
                        write_lock(&resource_lock);
                        continue;
                }
+               return false;
+       }
+
+       return true;
+}
+
+/**
+ * __request_region - create a new busy resource region
+ * @parent: parent resource descriptor
+ * @start: resource start address
+ * @n: resource region size
+ * @name: reserving caller's ID string
+ * @flags: IO resource flags
+ */
+struct resource *__request_region(struct resource *parent,
+                                 resource_size_t start, resource_size_t n,
+                                 const char *name, int flags)
+{
+       struct resource *res = alloc_resource(GFP_KERNEL);
+
+       if (!res)
+               return NULL;
+
+       write_lock(&resource_lock);
+       if (!request_region_locked(parent, res, start, n, name, flags)) {
                /* Uhhuh, that didn't work out.. */
                free_resource(res);
                res = NULL;
-               break;
        }
        write_unlock(&resource_lock);
-
-       if (res && orig_parent == &iomem_resource)
+       if (res && parent == &iomem_resource)
                revoke_iomem(res);
 
        return res;
@@ -1758,26 +1775,53 @@ static struct resource *__request_free_mem_region(struct device *dev,
 {
        resource_size_t end, addr;
        struct resource *res;
+       struct region_devres *dr = NULL;
+
+       res = alloc_resource(GFP_KERNEL);
+       if (!res)
+               return ERR_PTR(-ENOMEM);
+
+       if (dev) {
+               dr = devres_alloc(devm_region_release, sizeof(struct region_devres),
+                                 GFP_KERNEL);
+               if (!dr) {
+                       free_resource(res);
+                       return ERR_PTR(-ENOMEM);
+               }
+       }
 
        size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
        end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
        addr = end - size + 1UL;
 
+       write_lock(&resource_lock);
        for (; addr > size && addr >= base->start; addr -= size) {
-               if (region_intersects(addr, size, 0, IORES_DESC_NONE) !=
+               if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
                                REGION_DISJOINT)
                        continue;
 
-               if (dev)
-                       res = devm_request_mem_region(dev, addr, size, name);
-               else
-                       res = request_mem_region(addr, size, name);
-               if (!res)
-                       return ERR_PTR(-ENOMEM);
+               if (!request_region_locked(&iomem_resource, res, addr,
+                                               size, name, 0))
+                       break;
+
+               write_unlock(&resource_lock);
+               revoke_iomem(res);
                res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
+
+               if (dev) {
+                       dr->parent = &iomem_resource;
+                       dr->start = addr;
+                       dr->n = size;
+                       devres_add(dev, dr);
+               }
+
                return res;
        }
+
+       write_unlock(&resource_lock);
+       free_resource(res);
+       if (dev)
+               devres_free(dr);
+
        return ERR_PTR(-ERANGE);
 }
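[Editor's note: for reference, the consumer side this series protects looks
roughly like the fragment below. devm_request_free_mem_region(),
memremap_pages() and the dev_pagemap fields are real kernel API of this era,
but hypothetical_probe() and the 64M size are invented, and the fragment only
builds inside a driver context.]

    #include <linux/device.h>
    #include <linux/err.h>
    #include <linux/ioport.h>
    #include <linux/memremap.h>
    #include <linux/sizes.h>

    static int hypothetical_probe(struct device *dev, struct dev_pagemap *pgmap)
    {
            struct resource *res;

            /*
             * With the fix, the free range is found and reserved under a
             * single resource_lock critical section, so a concurrent caller
             * can no longer grab the same range in between.
             */
            res = devm_request_free_mem_region(dev, &iomem_resource, SZ_64M);
            if (IS_ERR(res))
                    return PTR_ERR(res);

            pgmap->type = MEMORY_DEVICE_PRIVATE;
            pgmap->range.start = res->start;
            pgmap->range.end = res->end;
            pgmap->nr_range = 1;

            return PTR_ERR_OR_ZERO(memremap_pages(pgmap, dev_to_node(dev)));
    }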