Message ID | 1360890012-4684-1-git-send-email-chanho61.park@samsung.com (mailing list archive)
---|---
State | New, archived
Hi Chanho,

Apologies for the tardy response, this patch slipped past me.

On Fri, Feb 15, 2013 at 01:00:12AM +0000, Chanho Park wrote:
> This patch adds get_user_pages_fast (old name "fast_gup") for ARM.
> fast_gup can walk the page tables without taking mmap_sem or any other
> locks. If there is no pte with the correct permissions for the access,
> we fall back to the slow path (get_user_pages) to get the remaining
> pages. This patch is written with reference to the x86 gup
> implementation. Traversal of hugepages is excluded because ARM does not
> support hugepages yet[1]; there is only an RFC.
>

I've tested this patch out; unfortunately it treats huge pmds as regular
pmds and attempts to traverse them rather than falling back to a slow
path. The fix for this is very minor, please see my suggestion below.

As an aside, I would like to extend this fast_gup to include full huge
page support and a __get_user_pages_fast implementation. This will
hopefully fix a problem that was brought to my attention by Grazvydas
Ignotas, whereby a FUTEX_WAIT on a THP tail page will cause an infinite
loop due to the stock implementation of __get_user_pages_fast always
returning 0.

> diff --git a/arch/arm/mm/gup.c b/arch/arm/mm/gup.c
> new file mode 100644
> index 0000000..ed54fd8

...

> +static int gup_pmd_range(pud_t *pudp, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pmd_t *pmdp;
> +
> +	pmdp = pmd_offset(pudp, addr);
> +	do {
> +		next = pmd_addr_end(addr, end);
> +		if (pmd_none(*pmdp))
> +			return 0;

I would suggest:

		if (pmd_none(*pmdp) || pmd_bad(*pmdp))
			return 0;

as this will pick up pmds that can't be traversed, and fall back to the
slow path.

> +		else if (!gup_pte_range(pmdp, addr, next, write, pages, nr))
> +			return 0;
> +	} while (pmdp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +

Cheers,
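To make the huge page extension discussed above concrete, here is a rough
sketch of what a gup_huge_pmd helper could look like, modeled loosely on
the x86 and powerpc fast-gup code of the era. It is not the patch under
review, and the huge-pmd helpers it leans on (pmd_write, pmd_page) were
still being wired up by the ARM huge page series, so everything below is
an assumption.

/*
 * SKETCH ONLY -- not the patch under review. Modeled on the x86/powerpc
 * fast-gup code; assumes huge-pmd helpers (pmd_write, pmd_page) exist.
 * 'orig' is a snapshot of the huge pmd taken by the caller.
 */
static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
		unsigned long end, int write, struct page **pages, int *nr)
{
	struct page *head, *page;
	int refs;

	if (write && !pmd_write(orig))
		return 0;

	refs = 0;
	head = pmd_page(orig);
	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
	do {
		pages[*nr] = page;
		(*nr)++;
		page++;
		refs++;
	} while (addr += PAGE_SIZE, addr != end);

	/* One speculative reference on the head page covers all tails. */
	if (!page_cache_add_speculative(head, refs)) {
		*nr -= refs;
		return 0;
	}

	/* The huge pmd may have been split or cleared under us; re-check. */
	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
		*nr -= refs;
		while (refs--)
			put_page(head);
		return 0;
	}

	return 1;
}

gup_pmd_range would then dispatch to this helper when it sees a huge pmd
(falling back to the slow path for a splitting THP), instead of
unconditionally calling gup_pte_range.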
On Wed, Apr 10, 2013 at 08:30:54AM +0100, Chanho Park wrote:
> > Apologies for the tardy response, this patch slipped past me.
>
> Never mind.
>
> > I've tested this patch out; unfortunately it treats huge pmds as regular
> > pmds and attempts to traverse them rather than falling back to a slow
> > path. The fix for this is very minor, please see my suggestion below.
>
> OK. I'll fix it.
>
> > As an aside, I would like to extend this fast_gup to include full huge
> > page support and a __get_user_pages_fast implementation. This will
> > hopefully fix a problem that was brought to my attention by Grazvydas
> > Ignotas, whereby a FUTEX_WAIT on a THP tail page will cause an infinite
> > loop due to the stock implementation of __get_user_pages_fast always
> > returning 0.
>
> I'll add the __get_user_pages_fast implementation. By the way, HugeTLB is
> not supported on ARM yet. There is no problem with adding gup_huge_pmd,
> but I think it needs to be tested with hugepages.

Thanks, that would be helpful. My plan was to then put the huge page
specific bits in with another patch. That way I can test it all out here.

> > I would suggest:
> >
> > 		if (pmd_none(*pmdp) || pmd_bad(*pmdp))
> > 			return 0;
> >
> > as this will pick up pmds that can't be traversed, and fall back to the
> > slow path.
>
> Thanks for your suggestion.
> I'll prepare the v2 patch.

Also, just one more thing: in your gup_pte_range function there is an
smp_rmb() just after the pte is dereferenced. I don't understand why,
though?

> Best regards,
> Chanho Park

Thanks,
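For reference, here is a minimal sketch of what the promised
__get_user_pages_fast could look like, mirroring the x86 variant: the same
IRQ-disabled walk as get_user_pages_fast, but returning however many pages
were pinned (possibly zero) instead of falling back to the slow path. It
reuses gup_pud_range() from the patch below; this is an assumption about
the eventual v2, not code from this thread.

/*
 * SKETCH ONLY -- an assumption about the eventual v2, mirroring x86.
 * Unlike get_user_pages_fast(), this never takes mmap_sem; it simply
 * returns how many pages it managed to pin (possibly zero), which is
 * what callers like the futex code need to make progress on THP tails.
 */
int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
			  struct page **pages)
{
	struct mm_struct *mm = current->mm;
	unsigned long addr, len, end, next, flags;
	pgd_t *pgdp;
	int nr = 0;

	start &= PAGE_MASK;
	addr = start;
	len = (unsigned long) nr_pages << PAGE_SHIFT;
	end = start + len;

	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
					start, len)))
		return 0;

	/*
	 * Save/restore the IRQ flags: this variant may be called with
	 * interrupts already disabled (e.g. from the futex code).
	 */
	local_irq_save(flags);
	pgdp = pgd_offset(mm, addr);
	do {
		next = pgd_addr_end(addr, end);
		if (pgd_none(*pgdp))
			break;
		if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
			break;
	} while (pgdp++, addr = next, addr != end);
	local_irq_restore(flags);

	return nr;
}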
On Wed, Apr 10, 2013 at 10:10:18AM +0100, Chanho Park wrote:
> > From: Steve Capper [mailto:steve.capper@arm.com]
> > Sent: Wednesday, April 10, 2013 5:21 PM
> > To: Chanho Park
> > Cc: Steve Capper; 'Chanho Park'; linux@arm.linux.org.uk; Catalin Marinas;
> > 'Inki Dae'; linux-mm@kvack.org; 'Kyungmin Park'; 'Myungjoo Ham';
> > linux-arm-kernel@lists.infradead.org; 'Grazvydas Ignotas'
> > Subject: Re: [PATCH] arm: mm: lockless get_user_pages_fast
> >
> > On Wed, Apr 10, 2013 at 08:30:54AM +0100, Chanho Park wrote:
> > > > Apologies for the tardy response, this patch slipped past me.
> > >
> > > Never mind.
> > >
> > > > I've tested this patch out; unfortunately it treats huge pmds as
> > > > regular pmds and attempts to traverse them rather than falling back
> > > > to a slow path. The fix for this is very minor, please see my
> > > > suggestion below.
> > >
> > > OK. I'll fix it.
> > >
> > > > As an aside, I would like to extend this fast_gup to include full
> > > > huge page support and a __get_user_pages_fast implementation. This
> > > > will hopefully fix a problem that was brought to my attention by
> > > > Grazvydas Ignotas, whereby a FUTEX_WAIT on a THP tail page will
> > > > cause an infinite loop due to the stock implementation of
> > > > __get_user_pages_fast always returning 0.
> > >
> > > I'll add the __get_user_pages_fast implementation. By the way, HugeTLB
> > > is not supported on ARM yet. There is no problem with adding
> > > gup_huge_pmd, but I think it needs to be tested with hugepages.
> >
> > Thanks, that would be helpful. My plan was to then put the huge page
> > specific bits in with another patch. That way I can test it all out
> > here.
>
> Can I see the patch? I think it will be helpful for implementing
> gup_huge_pmd. Or what do you think about leaving gup_huge_pmd out of
> this patch? IMO it can be added easily after hugetlb on ARM is merged.

I think it would be better if this patch did not have gup_huge_pmd in it.
I am still working on my implementation and running it through a battery
of tests here. Also, I will likely change a few things in my huge page
patches to make a gup_huge_pmd easier to implement. I will be resending a
V2 of my huge pages soon.

It still would be helpful to have the pmd_bad(*pmdp) condition check in,
as I suggested before, though.

> > > > I would suggest:
> > > >
> > > > 		if (pmd_none(*pmdp) || pmd_bad(*pmdp))
> > > > 			return 0;
> > > >
> > > > as this will pick up pmds that can't be traversed, and fall back to
> > > > the slow path.
> > >
> > > Thanks for your suggestion.
> > > I'll prepare the v2 patch.
> >
> > Also, just one more thing: in your gup_pte_range function there is an
> > smp_rmb() just after the pte is dereferenced. I don't understand why,
> > though?
>
> I think it would be needed for a 64-bit pte. A 64-bit pte consists of a
> low word and a high word. In this version there is no need for it, so
> I'll remove it.

Thanks. The pte will only be 64 bit if LPAE is enabled, and LPAE support
mandates that 64 bit page table entries be read atomically. So we should
be ok without the read barrier.

> Best regards,
> Chanho Park

Cheers,
--
Steve
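The smp_rmb() question is easier to follow next to the pattern it was
presumably borrowed from. On x86 PAE, a 64-bit pte is stored as two 32-bit
words and cannot be read atomically, so the fast-gup path there reads it
with barriers and a retry loop, roughly as sketched below (the pte_low and
pte_high field names are from x86 PAE; ARM's pte_t has no such fields, so
this is purely illustrative). On ARM with LPAE, as noted above, 64-bit
descriptor reads are architecturally atomic, so a plain dereference is
enough and the barrier can go.

/*
 * ILLUSTRATION ONLY -- roughly the x86 PAE pattern, shown to explain
 * what a torn 64-bit pte read is. Not applicable to ARM LPAE, where
 * 64-bit descriptor reads are architecturally atomic.
 */
static inline pte_t gup_get_pte(pte_t *ptep)
{
	pte_t pte;

	do {
		pte.pte_low = ptep->pte_low;
		smp_rmb();		/* read the low word before the high */
		pte.pte_high = ptep->pte_high;
		smp_rmb();		/* read the high word before re-checking */
	} while (unlikely(pte.pte_low != ptep->pte_low));

	return pte;
}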
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..1f01c0b 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -6,7 +6,7 @@ obj-y			:= dma-mapping.o extable.o fault.o init.o \
 				   iomap.o
 
 obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
-				   mmap.o pgd.o mmu.o vmregion.o
+				   mmap.o pgd.o mmu.o vmregion.o gup.o
 
 ifneq ($(CONFIG_MMU),y)
 obj-y				+= nommu.o
diff --git a/arch/arm/mm/gup.c b/arch/arm/mm/gup.c
new file mode 100644
index 0000000..ed54fd8
--- /dev/null
+++ b/arch/arm/mm/gup.c
@@ -0,0 +1,160 @@
+/*
+ * linux/arch/arm/mm/gup.c - Lockless get_user_pages_fast for arm
+ *
+ * Copyright (c) 2013 Samsung Electronics Co., Ltd.
+ *		http://www.samsung.com
+ * Author : Chanho Park <chanho61.park@samsung.com>
+ *
+ * This code is written with reference to the x86 and PowerPC versions, by:
+ *
+ * Copyright (C) 2008 Nick Piggin
+ * Copyright (C) 2008 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/rwsem.h>
+#include <asm/pgtable.h>
+
+/*
+ * The performance critical leaf functions are made noinline otherwise gcc
+ * inlines everything into a single function which results in too much
+ * register pressure.
+ */
+static noinline int gup_pte_range(pmd_t *pmdp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	pte_t *ptep, pte;
+
+	ptep = pte_offset_kernel(pmdp, addr);
+	do {
+		struct page *page;
+
+		pte = *ptep;
+		smp_rmb();
+
+		if (!pte_present_user(pte) || (write && !pte_write(pte)))
+			return 0;
+
+		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+		page = pte_page(pte);
+
+		if (!page_cache_get_speculative(page))
+			return 0;
+
+		pages[*nr] = page;
+		(*nr)++;
+
+	} while (ptep++, addr += PAGE_SIZE, addr != end);
+
+	return 1;
+}
+
+static int gup_pmd_range(pud_t *pudp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pmd_t *pmdp;
+
+	pmdp = pmd_offset(pudp, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(*pmdp))
+			return 0;
+		else if (!gup_pte_range(pmdp, addr, next, write, pages, nr))
+			return 0;
+	} while (pmdp++, addr = next, addr != end);
+
+	return 1;
+}
+
+static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pud_t *pudp;
+
+	pudp = pud_offset(pgdp, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none(*pudp))
+			return 0;
+		else if (!gup_pmd_range(pudp, addr, next, write, pages, nr))
+			return 0;
+	} while (pudp++, addr = next, addr != end);
+
+	return 1;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, len, end;
+	unsigned long next;
+	pgd_t *pgdp;
+	int nr = 0;
+
+	start &= PAGE_MASK;
+	addr = start;
+	len = (unsigned long) nr_pages << PAGE_SHIFT;
+	end = start + len;
+
+	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+					start, len)))
+		goto slow_irqon;
+
+	/*
+	 * This doesn't prevent pagetable teardown, but does prevent
+	 * the pagetables from being freed on arm.
+	 *
+	 * So long as we atomically load page table pointers versus teardown,
+	 * we can follow the address down to the page and take a ref on it.
+	 */
+	local_irq_disable();
+
+	pgdp = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp))
+			goto slow;
+		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
+			goto slow;
+	} while (pgdp++, addr = next, addr != end);
+
+	local_irq_enable();
+
+	VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
+	return nr;
+
+	{
+		int ret;
+
+slow:
+		local_irq_enable();
+slow_irqon:
+		/* Try to get the remaining pages with get_user_pages */
+		start += nr << PAGE_SHIFT;
+		pages += nr;
+
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
+		up_read(&mm->mmap_sem);
+
+		/* Have to be a bit careful with return values */
+		if (nr > 0) {
+			if (ret < 0)
+				ret = nr;
+			else
+				ret += nr;
+		}
+
+		return ret;
+	}
+}
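Finally, a short hypothetical caller to show the contract: the function
returns the number of pages pinned, which can be fewer than requested (or
negative if the slow path failed outright), and every page pinned must
eventually be released with put_page(). pin_user_buffer and its error
handling below are illustrative, not part of the patch.

/*
 * HYPOTHETICAL CALLER -- e.g. pinning a user buffer before setting up
 * DMA to it. On partial success we drop the refs we did take and fail.
 */
static int pin_user_buffer(unsigned long uaddr, int nr_pages,
			   struct page **pages)
{
	int i, pinned;

	pinned = get_user_pages_fast(uaddr, nr_pages, 1 /* write */, pages);
	if (pinned < nr_pages) {
		/* A negative 'pinned' skips this loop entirely. */
		for (i = 0; i < pinned; i++)
			put_page(pages[i]);
		return -EFAULT;
	}
	return 0;
}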