[v4,2/9] mm: pagewalk: Take the pagetable lock in walk_pte_range()

Message ID 20191008091508.2682-3-thomas_os@shipmail.org (mailing list archive)
State New, archived
Series Emulated coherent graphics memory take 2

Commit Message

Thomas Hellström (Intel) Oct. 8, 2019, 9:15 a.m. UTC
From: Thomas Hellstrom <thellstrom@vmware.com>

Without the lock, anybody modifying a pte from within this function might
have it concurrently modified by someone else.

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 mm/pagewalk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Kirill A. Shutemov Oct. 9, 2019, 3:14 p.m. UTC | #1
On Tue, Oct 08, 2019 at 11:15:01AM +0200, Thomas Hellström (VMware) wrote:
> From: Thomas Hellstrom <thellstrom@vmware.com>
> 
> Without the lock, anybody modifying a pte from within this function might
> have it concurrently modified by someone else.
> 
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> ---
>  mm/pagewalk.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index d48c2a986ea3..83c0b78363b4 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -10,8 +10,9 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	pte_t *pte;
>  	int err = 0;
>  	const struct mm_walk_ops *ops = walk->ops;
> +	spinlock_t *ptl;
>  
> -	pte = pte_offset_map(pmd, addr);
> +	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
>  	for (;;) {
>  		err = ops->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
>  		if (err)
> @@ -22,7 +23,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  		pte++;
>  	}
>  
> -	pte_unmap(pte);
> +	pte_unmap_unlock(pte - 1, ptl);

NAK.

If ->pte_entry() fails on the first entry of the page table, pte - 1 will
point outside the page table.

And the '- 1' is totally unnecessary as we break the loop before pte++ on
the last iteration.
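
To make the exit positions concrete, here is a minimal userspace sketch of
the loop shape, not the kernel code itself. It assumes the lines elided
between the two hunks advance addr and break once it reaches end, which is
what the remark about breaking before pte++ implies. The names walk_model,
entry_ok, entry_fail and PTRS_PER_TABLE are hypothetical, and positions are
reported as array indices so that no out-of-bounds pointer is ever formed.

```c
#include <assert.h>
#include <stddef.h>

#define PTRS_PER_TABLE 8	/* hypothetical table size for this model */

/* Hypothetical stand-in for ops->pte_entry(): 0 on success, nonzero on error. */
typedef int (*entry_fn)(int *entry);

static int entry_ok(int *entry)   { (void)entry; return 0; }
static int entry_fail(int *entry) { (void)entry; return -1; }

/*
 * Model of the loop in walk_pte_range(): call the callback, break on
 * error, break after the last entry, otherwise advance the cursor.
 * The cursor's index at loop exit is stored in *exit_idx.
 */
static int walk_model(int *table, size_t first, size_t last,
		      entry_fn fn, ptrdiff_t *exit_idx)
{
	int *cur = &table[first];
	size_t i = first;
	int err;

	assert(first <= last && last < PTRS_PER_TABLE);

	for (;;) {
		err = fn(cur);
		if (err)
			break;		/* cursor still on the failing entry */
		if (i == last)
			break;		/* assumed: models "addr == end" check */
		i++;
		cur++;
	}

	*exit_idx = cur - table;	/* always within [first, last] */
	return err;
}
```

Calling walk_model() with entry_fail on a walk that starts at index 0 leaves
the cursor at index 0, so subtracting one would give index -1, before the
start of the table. With entry_ok over the whole table, the cursor stops on
the last entry, which is already inside the table without any adjustment.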

Linus Torvalds Oct. 9, 2019, 4:07 p.m. UTC | #2
On Wed, Oct 9, 2019 at 8:14 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> If ->pte_entry() fails on the first entry of the page table, pte - 1 will
> point outside the page table.
>
> And the '- 1' is totally unnecessary as we break the loop before pte++ on
> the last iteration.

Good catch. Too much copying the wrong pattern from other sources.

I do wish we didn't have this pattern of "update pte, then do
pte_unmap as long as it's in the same page". Yeah, it avoids a
variable, but still... But it is what it is, and we just need to be
careful.

             Linus
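
The alternative hinted at above, spending one extra variable so that the
pointer handed to the unmap call never depends on how far the cursor got,
could look roughly like the userspace sketch below. It is only an
illustration under stated assumptions: model_table, model_map_lock,
model_unmap_unlock and walk_with_saved_start are hypothetical stand-ins, not
kernel APIs, and the "lock" is a plain flag.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ENTRIES 8		/* hypothetical table size for this model */

struct model_table {
	int entries[MODEL_ENTRIES];
	int locked;		/* stand-in for the page table lock */
	const int *released;	/* pointer last passed to the unmap helper */
};

/* Hypothetical stand-in for mapping and locking a table at index idx. */
static int *model_map_lock(struct model_table *t, size_t idx)
{
	t->locked = 1;
	t->released = NULL;
	return &t->entries[idx];
}

/* Hypothetical stand-in for unmap and unlock; records what it was given. */
static void model_unmap_unlock(struct model_table *t, const int *p)
{
	t->released = p;
	t->locked = 0;
}

/*
 * Visit entries [first, last], failing at index fail_at (pass
 * MODEL_ENTRIES for "never fail"). The mapped start is kept in its own
 * variable, so cleanup is the same on every exit path.
 */
static int walk_with_saved_start(struct model_table *t, size_t first,
				 size_t last, size_t fail_at)
{
	int *start, *cur;
	size_t i;
	int err = 0;

	assert(first <= last && last < MODEL_ENTRIES);

	start = model_map_lock(t, first);
	cur = start;
	for (i = first; i <= last; i++, cur++) {
		if (i == fail_at) {
			err = -1;
			break;
		}
		*cur = 1;	/* mark the entry as visited */
	}

	model_unmap_unlock(t, start);	/* independent of where cur ended */
	return err;
}
```

Whether the walk fails on its first entry or runs to completion, the helper
receives the pointer that was originally mapped, so no "- 1" style fixup is
needed at the cost of one more local variable.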

Patch

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index d48c2a986ea3..83c0b78363b4 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -10,8 +10,9 @@  static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pte_t *pte;
 	int err = 0;
 	const struct mm_walk_ops *ops = walk->ops;
+	spinlock_t *ptl;
 
-	pte = pte_offset_map(pmd, addr);
+	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	for (;;) {
 		err = ops->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
 		if (err)
@@ -22,7 +23,7 @@  static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		pte++;
 	}
 
-	pte_unmap(pte);
+	pte_unmap_unlock(pte - 1, ptl);
 	return err;
 }