Message ID | 20210119043920.155044-9-pasha.tatashin@soleen.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | prohibit pinning pages in ZONE_MOVABLE | expand |
On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote: > Zero page should not be used for long term pinned pages. Once pages > are pinned their physical addresses cannot changed until they are unpinned. > > Guarantee to always return real pages when they are pinned by adding > FOLL_WRITE. > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > --- > mm/gup.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) No, this will definitely break things Why does the zero page have to be movable? Jason
On Tue, Jan 19, 2021 at 1:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote: > > Zero page should not be used for long term pinned pages. Once pages > > are pinned their physical addresses cannot changed until they are unpinned. > > > > Guarantee to always return real pages when they are pinned by adding > > FOLL_WRITE. > > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > --- > > mm/gup.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > No, this will definitely break things What will break > > Why does the zero page have to be movable? It is not even about being movable, we can't cow pinned pages returned by GUP call, how can we use zero page for that? > > Jason
On Tue, Jan 19, 2021 at 01:34:26PM -0500, Pavel Tatashin wrote: > On Tue, Jan 19, 2021 at 1:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote: > > > Zero page should not be used for long term pinned pages. Once pages > > > are pinned their physical addresses cannot changed until they are unpinned. > > > > > > Guarantee to always return real pages when they are pinned by adding > > > FOLL_WRITE. > > > > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > > mm/gup.c | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > No, this will definitely break things > > What will break Things assuming GUP doesn't break COW, making all GUP WRITE was already tried and revered for some other reason > > Why does the zero page have to be movable? > > It is not even about being movable, we can't cow pinned pages returned > by GUP call, how can we use zero page for that? The zero page is always zero, it is never written to. What does cow matter? Jason
On Tue, Jan 19, 2021 at 1:47 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, Jan 19, 2021 at 01:34:26PM -0500, Pavel Tatashin wrote: > > On Tue, Jan 19, 2021 at 1:30 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > > > On Mon, Jan 18, 2021 at 11:39:14PM -0500, Pavel Tatashin wrote: > > > > Zero page should not be used for long term pinned pages. Once pages > > > > are pinned their physical addresses cannot changed until they are unpinned. > > > > > > > > Guarantee to always return real pages when they are pinned by adding > > > > FOLL_WRITE. > > > > > > > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > > > > mm/gup.c | 10 +++++++++- > > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > No, this will definitely break things > > > > What will break > > Things assuming GUP doesn't break COW, making all GUP WRITE was > already tried and revered for some other reason > > > > Why does the zero page have to be movable? > > > > It is not even about being movable, we can't cow pinned pages returned > > by GUP call, how can we use zero page for that? > > The zero page is always zero, it is never written to. What does cow > matter? Hi Jason, I was thinking about a use case where userland would pin an address without FOLL_WRITE, because the PTE for that address is not going to be writable, but some device via DMA will write to it. Now, if we got a zero page we have a problem... If this usecase is not valid then the fix for movable zero page is make the zero page always come from a non-movable zone so we do not need to isolate it during migration, and so the memory can be offlined later. Pasha > > Jason
> I was thinking about a use case where userland would pin an address > without FOLL_WRITE, because the PTE for that address is not going to > be writable, but some device via DMA will write to it. Now, if we got > a zero page we have a problem... If this usecase is not valid then the > fix for movable zero page is make the zero page always come from a > non-movable zone so we do not need to isolate it during migration, and > so the memory can be offlined later. I looked into making zero_page non-movable, and I am confused here. huge zero page is already not movable: get_huge_zero_page() zero_page = alloc_pages((GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE, ... Base zero page can be in a movable zone, which is a bug: if there are references to zero page, that page cannot be migrated, and we won't be hot-remove memory area where that page is located. On x86, zero page should always come from the bottom 4G of physical memory / DMA32 ZONE. However, I see that sometimes it is not (I reproduce in QEMU emulator): QEMU instance with 16G of memory and kernelcore=5G Boot#1: zero_pfn 48a8d zero_pfn zone: ZONE_DMA32 Boot#2: zero_pfn 20168d zero_pfn zone: ZONE_MOVABLE (???) The problem is that the x86 zero page comes from the .bss segment: https://soleen.com/source/xref/linux/arch/x86/kernel/head_64.S?r=31d85460#583 Which, I thought would always be set within the first 4G of physical memory. What is going on here? Pasha
On Tue, Jan 19, 2021 at 03:14:04PM -0500, Pavel Tatashin wrote: > I was thinking about a use case where userland would pin an address > without FOLL_WRITE, because the PTE for that address is not going to > be writable, but some device via DMA will write to it. That would be a serious bug in the get_user_pages caller to write to a page without using FOLL_WRITE Jason
diff --git a/mm/gup.c b/mm/gup.c index 857b273e32ac..9a817652f501 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1668,8 +1668,16 @@ static long __gup_longterm_locked(struct mm_struct *mm, unsigned long flags = 0; long rc; - if (gup_flags & FOLL_LONGTERM) + if (gup_flags & FOLL_LONGTERM) { + /* + * We are long term pinning pages and their PA's should not + * change until unpinned. Without FOLL_WRITE we might get zero + * page which we do not want. Force creating normal + * pages by adding FOLL_WRITE. + */ + gup_flags |= FOLL_WRITE; flags = memalloc_pin_save(); + } rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas, NULL, gup_flags);
Zero page should not be used for long term pinned pages. Once pages are pinned their physical addresses cannot changed until they are unpinned. Guarantee to always return real pages when they are pinned by adding FOLL_WRITE. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> --- mm/gup.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)