Message ID | 1360890012-4684-1-git-send-email-chanho61.park@samsung.com (mailing list archive)
---|---
State | New, archived
Hi Chanho,

Apologies for the tardy response, this patch slipped past me.

On Fri, Feb 15, 2013 at 01:00:12AM +0000, Chanho Park wrote:
> This patch adds get_user_pages_fast (old name "fast_gup") for ARM.
> fast_gup can walk the page tables without taking mmap_sem or any other
> locks. If there is no pte with the correct permissions for the access,
> we fall back to the slow path (get_user_pages) to get the remaining
> pages. This patch is written with reference to the x86 gup
> implementation. Traversal of hugepages is excluded because ARM does not
> support hugepages yet[1]; there is only an RFC.
>

I've tested this patch out; unfortunately it treats huge pmds as regular
pmds and attempts to traverse them rather than falling back to a slow
path. The fix for this is very minor, please see my suggestion below.

As an aside, I would like to extend this fast_gup to include full huge
page support and a __get_user_pages_fast implementation. This will
hopefully fix a problem that was brought to my attention by Grazvydas
Ignotas, whereby a FUTEX_WAIT on a THP tail page will cause an infinite
loop due to the stock implementation of __get_user_pages_fast always
returning 0.

> diff --git a/arch/arm/mm/gup.c b/arch/arm/mm/gup.c
> new file mode 100644
> index 0000000..ed54fd8

...

> +static int gup_pmd_range(pud_t *pudp, unsigned long addr, unsigned long end,
> +		int write, struct page **pages, int *nr)
> +{
> +	unsigned long next;
> +	pmd_t *pmdp;
> +
> +	pmdp = pmd_offset(pudp, addr);
> +	do {
> +		next = pmd_addr_end(addr, end);
> +		if (pmd_none(*pmdp))
> +			return 0;

I would suggest:

		if (pmd_none(*pmdp) || pmd_bad(*pmdp))
			return 0;

as this will pick up pmds that can't be traversed, and fall back to the
slow path.

> +		else if (!gup_pte_range(pmdp, addr, next, write, pages, nr))
> +			return 0;
> +	} while (pmdp++, addr = next, addr != end);
> +
> +	return 1;
> +}
> +

Cheers,
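To make the huge page extension discussed above concrete, here is a rough
sketch of what a gup_huge_pmd helper could look like, modeled loosely on
the x86 and powerpc fast-gup code of the era. It is not the patch under
review, and the huge-pmd helpers it leans on (pmd_write, pmd_page) were
still being wired up by the ARM huge page series, so everything below is
an assumption.

/*
 * SKETCH ONLY -- not the patch under review. Modeled on the x86/powerpc
 * fast-gup code; assumes huge-pmd helpers (pmd_write, pmd_page) exist.
 * 'orig' is a snapshot of the huge pmd taken by the caller.
 */
static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
		unsigned long end, int write, struct page **pages, int *nr)
{
	struct page *head, *page;
	int refs;

	if (write && !pmd_write(orig))
		return 0;

	refs = 0;
	head = pmd_page(orig);
	page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
	do {
		pages[*nr] = page;
		(*nr)++;
		page++;
		refs++;
	} while (addr += PAGE_SIZE, addr != end);

	/* One speculative reference on the head page covers all tails. */
	if (!page_cache_add_speculative(head, refs)) {
		*nr -= refs;
		return 0;
	}

	/* The huge pmd may have been split or cleared under us; re-check. */
	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
		*nr -= refs;
		while (refs--)
			put_page(head);
		return 0;
	}

	return 1;
}

gup_pmd_range would then dispatch to this helper when it sees a huge pmd
(falling back to the slow path for a splitting THP), instead of
unconditionally calling gup_pte_range.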
On Wed, Apr 10, 2013 at 08:30:54AM +0100, Chanho Park wrote:
> > Apologies for the tardy response, this patch slipped past me.
>
> Never mind.
>
> > I've tested this patch out; unfortunately it treats huge pmds as regular
> > pmds and attempts to traverse them rather than falling back to a slow
> > path. The fix for this is very minor, please see my suggestion below.
>
> OK. I'll fix it.
>
> > As an aside, I would like to extend this fast_gup to include full huge
> > page support and a __get_user_pages_fast implementation. This will
> > hopefully fix a problem that was brought to my attention by Grazvydas
> > Ignotas, whereby a FUTEX_WAIT on a THP tail page will cause an infinite
> > loop due to the stock implementation of __get_user_pages_fast always
> > returning 0.
>
> I'll add the __get_user_pages_fast implementation. By the way, HugeTLB is
> not supported on ARM yet. There is no problem with adding gup_huge_pmd,
> but I think it needs to be tested with hugepages.

Thanks, that would be helpful. My plan was to then put the huge page
specific bits in with another patch. That way I can test it all out here.

> > I would suggest:
> >
> > 		if (pmd_none(*pmdp) || pmd_bad(*pmdp))
> > 			return 0;
> >
> > as this will pick up pmds that can't be traversed, and fall back to the
> > slow path.
>
> Thanks for your suggestion.
> I'll prepare the v2 patch.

Also, just one more thing: in your gup_pte_range function there is an
smp_rmb() just after the pte is dereferenced. I don't understand why,
though?

> Best regards,
> Chanho Park

Thanks,
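For reference, here is a minimal sketch of what the promised
__get_user_pages_fast could look like, mirroring the x86 variant: the same
IRQ-disabled walk as get_user_pages_fast, but returning however many pages
were pinned (possibly zero) instead of falling back to the slow path. It
reuses gup_pud_range() from the patch below; this is an assumption about
the eventual v2, not code from this thread.

/*
 * SKETCH ONLY -- an assumption about the eventual v2, mirroring x86.
 * Unlike get_user_pages_fast(), this never takes mmap_sem; it simply
 * returns how many pages it managed to pin (possibly zero), which is
 * what callers like the futex code need to make progress on THP tails.
 */
int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
			  struct page **pages)
{
	struct mm_struct *mm = current->mm;
	unsigned long addr, len, end, next, flags;
	pgd_t *pgdp;
	int nr = 0;

	start &= PAGE_MASK;
	addr = start;
	len = (unsigned long) nr_pages << PAGE_SHIFT;
	end = start + len;

	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
					start, len)))
		return 0;

	/*
	 * Save/restore the IRQ flags: this variant may be called with
	 * interrupts already disabled (e.g. from the futex code).
	 */
	local_irq_save(flags);
	pgdp = pgd_offset(mm, addr);
	do {
		next = pgd_addr_end(addr, end);
		if (pgd_none(*pgdp))
			break;
		if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
			break;
	} while (pgdp++, addr = next, addr != end);
	local_irq_restore(flags);

	return nr;
}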
On Wed, Apr 10, 2013 at 10:10:18AM +0100, Chanho Park wrote:
> > From: Steve Capper [mailto:steve.capper@arm.com]
> > Sent: Wednesday, April 10, 2013 5:21 PM
> > To: Chanho Park
> > Cc: Steve Capper; 'Chanho Park'; linux@arm.linux.org.uk; Catalin Marinas;
> > 'Inki Dae'; linux-mm@kvack.org; 'Kyungmin Park'; 'Myungjoo Ham';
> > linux-arm-kernel@lists.infradead.org; 'Grazvydas Ignotas'
> > Subject: Re: [PATCH] arm: mm: lockless get_user_pages_fast
> >
> > On Wed, Apr 10, 2013 at 08:30:54AM +0100, Chanho Park wrote:
> > > > Apologies for the tardy response, this patch slipped past me.
> > >
> > > Never mind.
> > >
> > > > I've tested this patch out; unfortunately it treats huge pmds as
> > > > regular pmds and attempts to traverse them rather than falling back
> > > > to a slow path. The fix for this is very minor, please see my
> > > > suggestion below.
> > >
> > > OK. I'll fix it.
> > >
> > > > As an aside, I would like to extend this fast_gup to include full
> > > > huge page support and a __get_user_pages_fast implementation. This
> > > > will hopefully fix a problem that was brought to my attention by
> > > > Grazvydas Ignotas, whereby a FUTEX_WAIT on a THP tail page will
> > > > cause an infinite loop due to the stock implementation of
> > > > __get_user_pages_fast always returning 0.
> > >
> > > I'll add the __get_user_pages_fast implementation. By the way, HugeTLB
> > > is not supported on ARM yet. There is no problem with adding
> > > gup_huge_pmd, but I think it needs to be tested with hugepages.
> >
> > Thanks, that would be helpful. My plan was to then put the huge page
> > specific bits in with another patch. That way I can test it all out
> > here.
>
> Can I see the patch? I think it will be helpful for implementing
> gup_huge_pmd. Or what do you think about leaving gup_huge_pmd out of
> this patch? IMO it can be added easily after hugetlb on ARM is merged.

I think it would be better if this patch did not have gup_huge_pmd in it.
I am still working on my implementation and running it through a battery
of tests here. Also, I will likely change a few things in my huge page
patches to make a gup_huge_pmd easier to implement. I will be resending a
V2 of my huge pages soon.

It still would be helpful to have the pmd_bad(*pmdp) condition check in,
as I suggested before, though.

> > > > I would suggest:
> > > >
> > > > 		if (pmd_none(*pmdp) || pmd_bad(*pmdp))
> > > > 			return 0;
> > > >
> > > > as this will pick up pmds that can't be traversed, and fall back to
> > > > the slow path.
> > >
> > > Thanks for your suggestion.
> > > I'll prepare the v2 patch.
> >
> > Also, just one more thing: in your gup_pte_range function there is an
> > smp_rmb() just after the pte is dereferenced. I don't understand why,
> > though?
>
> I think it would be needed for a 64-bit pte. A 64-bit pte consists of a
> low word and a high word. In this version there is no need for it, so
> I'll remove it.

Thanks. The pte will only be 64 bit if LPAE is enabled, and LPAE support
mandates that 64 bit page table entries be read atomically. So we should
be ok without the read barrier.

> Best regards,
> Chanho Park

Cheers,
--
Steve
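The smp_rmb() question is easier to follow next to the pattern it was
presumably borrowed from. On x86 PAE, a 64-bit pte is stored as two 32-bit
words and cannot be read atomically, so the fast-gup path there reads it
with barriers and a retry loop, roughly as sketched below (the pte_low and
pte_high field names are from x86 PAE; ARM's pte_t has no such fields, so
this is purely illustrative). On ARM with LPAE, as noted above, 64-bit
descriptor reads are architecturally atomic, so a plain dereference is
enough and the barrier can go.

/*
 * ILLUSTRATION ONLY -- roughly the x86 PAE pattern, shown to explain
 * what a torn 64-bit pte read is. Not applicable to ARM LPAE, where
 * 64-bit descriptor reads are architecturally atomic.
 */
static inline pte_t gup_get_pte(pte_t *ptep)
{
	pte_t pte;

	do {
		pte.pte_low = ptep->pte_low;
		smp_rmb();		/* read the low word before the high */
		pte.pte_high = ptep->pte_high;
		smp_rmb();		/* read the high word before re-checking */
	} while (unlikely(pte.pte_low != ptep->pte_low));

	return pte;
}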
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 8a9c4cb..1f01c0b 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -6,7 +6,7 @@ obj-y			:= dma-mapping.o extable.o fault.o init.o \
 				   iomap.o
 
 obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
-				   mmap.o pgd.o mmu.o vmregion.o
+				   mmap.o pgd.o mmu.o vmregion.o gup.o
 
 ifneq ($(CONFIG_MMU),y)
 obj-y				+= nommu.o
diff --git a/arch/arm/mm/gup.c b/arch/arm/mm/gup.c
new file mode 100644
index 0000000..ed54fd8
--- /dev/null
+++ b/arch/arm/mm/gup.c
@@ -0,0 +1,160 @@
+/*
+ * linux/arch/arm/mm/gup.c - Lockless get_user_pages_fast for arm
+ *
+ * Copyright (c) 2013 Samsung Electronics Co., Ltd.
+ *		http://www.samsung.com
+ * Author : Chanho Park <chanho61.park@samsung.com>
+ *
+ * This code is written with reference to the x86 and PowerPC versions, by:
+ *
+ * Copyright (C) 2008 Nick Piggin
+ * Copyright (C) 2008 Novell Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/pagemap.h>
+#include <linux/rwsem.h>
+#include <asm/pgtable.h>
+
+/*
+ * The performance critical leaf functions are made noinline otherwise gcc
+ * inlines everything into a single function which results in too much
+ * register pressure.
+ */
+static noinline int gup_pte_range(pmd_t *pmdp, unsigned long addr,
+		unsigned long end, int write, struct page **pages, int *nr)
+{
+	pte_t *ptep, pte;
+
+	ptep = pte_offset_kernel(pmdp, addr);
+	do {
+		struct page *page;
+
+		pte = *ptep;
+		smp_rmb();
+
+		if (!pte_present_user(pte) || (write && !pte_write(pte)))
+			return 0;
+
+		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
+		page = pte_page(pte);
+
+		if (!page_cache_get_speculative(page))
+			return 0;
+
+		pages[*nr] = page;
+		(*nr)++;
+
+	} while (ptep++, addr += PAGE_SIZE, addr != end);
+
+	return 1;
+}
+
+static int gup_pmd_range(pud_t *pudp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pmd_t *pmdp;
+
+	pmdp = pmd_offset(pudp, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none(*pmdp))
+			return 0;
+		else if (!gup_pte_range(pmdp, addr, next, write, pages, nr))
+			return 0;
+	} while (pmdp++, addr = next, addr != end);
+
+	return 1;
+}
+
+static int gup_pud_range(pgd_t *pgdp, unsigned long addr, unsigned long end,
+		int write, struct page **pages, int *nr)
+{
+	unsigned long next;
+	pud_t *pudp;
+
+	pudp = pud_offset(pgdp, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none(*pudp))
+			return 0;
+		else if (!gup_pmd_range(pudp, addr, next, write, pages, nr))
+			return 0;
+	} while (pudp++, addr = next, addr != end);
+
+	return 1;
+}
+
+int get_user_pages_fast(unsigned long start, int nr_pages, int write,
+			struct page **pages)
+{
+	struct mm_struct *mm = current->mm;
+	unsigned long addr, len, end;
+	unsigned long next;
+	pgd_t *pgdp;
+	int nr = 0;
+
+	start &= PAGE_MASK;
+	addr = start;
+	len = (unsigned long) nr_pages << PAGE_SHIFT;
+	end = start + len;
+
+	if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
+					start, len)))
+		goto slow_irqon;
+
+	/*
+	 * This doesn't prevent pagetable teardown, but does prevent
+	 * the pagetables from being freed on arm.
+	 *
+	 * So long as we atomically load page table pointers versus teardown,
+	 * we can follow the address down to the page and take a ref on it.
+	 */
+	local_irq_disable();
+
+	pgdp = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none(*pgdp))
+			goto slow;
+		else if (!gup_pud_range(pgdp, addr, next, write, pages, &nr))
+			goto slow;
+	} while (pgdp++, addr = next, addr != end);
+
+	local_irq_enable();
+
+	VM_BUG_ON(nr != (end - start) >> PAGE_SHIFT);
+	return nr;
+
+	{
+		int ret;
+
+slow:
+		local_irq_enable();
+slow_irqon:
+		/* Try to get the remaining pages with get_user_pages */
+		start += nr << PAGE_SHIFT;
+		pages += nr;
+
+		down_read(&mm->mmap_sem);
+		ret = get_user_pages(current, mm, start,
+			(end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
+		up_read(&mm->mmap_sem);
+
+		/* Have to be a bit careful with return values */
+		if (nr > 0) {
+			if (ret < 0)
+				ret = nr;
+			else
+				ret += nr;
+		}
+
+		return ret;
+	}
+}
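Finally, a short hypothetical caller to show the contract: the function
returns the number of pages pinned, which can be fewer than requested (or
negative if the slow path failed outright), and every page pinned must
eventually be released with put_page(). pin_user_buffer and its error
handling below are illustrative, not part of the patch.

/*
 * HYPOTHETICAL CALLER -- e.g. pinning a user buffer before setting up
 * DMA to it. On partial success we drop the refs we did take and fail.
 */
static int pin_user_buffer(unsigned long uaddr, int nr_pages,
			   struct page **pages)
{
	int i, pinned;

	pinned = get_user_pages_fast(uaddr, nr_pages, 1 /* write */, pages);
	if (pinned < nr_pages) {
		/* A negative 'pinned' skips this loop entirely. */
		for (i = 0; i < pinned; i++)
			put_page(pages[i]);
		return -EFAULT;
	}
	return 0;
}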