Message ID: 1559725820-26138-1-git-send-email-kernelfans@gmail.com (mailing list archive)
State:      New, archived
Series:     [PATCHv3,1/2] mm/gup: fix omission of check on FOLL_LONGTERM in get_user_pages_fast()
On Wed, 5 Jun 2019 17:10:19 +0800 Pingfan Liu <kernelfans@gmail.com> wrote:

> As for FOLL_LONGTERM, it is checked in the slow path
> __gup_longterm_unlocked(). But it is not checked in the fast path, which
> means a possible leak of CMA pages to a long-term pinning requirement
> through this crack.
>
> Place a check in the fast path.

I'm not actually seeing (in the existing code, this changelog, or the
patch) an explanation of *why* we wish to exclude CMA pages from
long-term pinning.

> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2196,6 +2196,26 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
>  	return ret;
>  }
>
> +#ifdef CONFIG_CMA
> +static inline int reject_cma_pages(int nr_pinned, struct page **pages)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr_pinned; i++)
> +		if (is_migrate_cma_page(pages[i])) {
> +			put_user_pages(pages + i, nr_pinned - i);
> +			return i;
> +		}
> +
> +	return nr_pinned;
> +}

There's no point in inlining this.

The code seems inefficient. If it encounters a single CMA page it can
end up discarding a possibly significant number of non-CMA pages. I
guess that doesn't matter much, as get_user_pages(FOLL_LONGTERM) is
rare. But could we avoid this (and the second pass across pages[]) by
checking for a CMA page within gup_pte_range()?

> +#else
> +static inline int reject_cma_pages(int nr_pinned, struct page **pages)
> +{
> +	return nr_pinned;
> +}
> +#endif
> +
>  /**
>   * get_user_pages_fast() - pin user pages in memory
>   * @start:	starting user address
> @@ -2236,6 +2256,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
>  		ret = nr;
>  	}
>
> +	if (unlikely(gup_flags & FOLL_LONGTERM) && nr)
> +		nr = reject_cma_pages(nr, pages);
> +

This would be a suitable place to add a comment explaining why we're
doing this...
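For readers not steeped in mm/: the is_migrate_cma_page() test used in
the patch is cheap because CMA membership is a property of the enclosing
pageblock, not of the individual page. A minimal sketch of the idea (the
helper name page_in_cma_area is made up here; in the kernel of this era
is_migrate_cma_page() is essentially this comparison and compiles away to
false when CONFIG_CMA is off):

    #include <linux/mm.h>
    #include <linux/mmzone.h>	/* get_pageblock_migratetype(), MIGRATE_CMA */

    /*
     * Sketch only: a page is a CMA page iff the pageblock it belongs to
     * carries the MIGRATE_CMA migratetype. No per-page state is
     * consulted, so scanning a pinned array is a tight loop.
     */
    static bool page_in_cma_area(struct page *page)
    {
    	return get_pageblock_migratetype(page) == MIGRATE_CMA;
    }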
On Thu, Jun 6, 2019 at 5:49 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 5 Jun 2019 17:10:19 +0800 Pingfan Liu <kernelfans@gmail.com> wrote:
>
> > As for FOLL_LONGTERM, it is checked in the slow path
> > __gup_longterm_unlocked(). But it is not checked in the fast path, which
> > means a possible leak of CMA pages to a long-term pinning requirement
> > through this crack.
> >
> > Place a check in the fast path.
>
> I'm not actually seeing (in the existing code, this changelog, or the
> patch) an explanation of *why* we wish to exclude CMA pages from
> long-term pinning.

What about a short description like this: FOLL_LONGTERM suggests a pin
which is going to be given to hardware and can't move. It would truncate
CMA permanently and should be excluded.

> [... reject_cma_pages() hunk snipped ...]
>
> There's no point in inlining this.

OK, will drop it in V4.

> The code seems inefficient. If it encounters a single CMA page it can
> end up discarding a possibly significant number of non-CMA pages. I

The trick is that the pages are not discarded; in fact, they are still
referenced by the PTEs. We just leave the slow path to pick up the
non-CMA pages again.

> guess that doesn't matter much, as get_user_pages(FOLL_LONGTERM) is
> rare. But could we avoid this (and the second pass across pages[]) by
> checking for a CMA page within gup_pte_range()?

That would spread the same logic across the hugetlb and normal PTE
paths, with no performance improvement since we fall back to the slow
path anyway. So I think it may not be worth it.

> [... get_user_pages_fast() hunk snipped ...]
>
> This would be a suitable place to add a comment explaining why we're
> doing this...

I would add a comment like "FOLL_LONGTERM suggests a pin given to
hardware and rarely returned."

Thanks for your kind review.

Regards,
Pingfan
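To make the "nothing is lost" point concrete, here is a sketch of the
tail of get_user_pages_fast() as it would look with the v3 check applied.
This is a paraphrase with comments added, not a verbatim kernel excerpt;
the return-value merging follows the 5.2-era code, so treat the details
as approximate:

	if (unlikely(gup_flags & FOLL_LONGTERM) && nr)
		/* Returns the count of leading non-CMA pages; the first
		 * CMA page and everything after it are put_user_pages()'d,
		 * but they remain mapped by the user's page tables. */
		nr = reject_cma_pages(nr, pages);

	if (nr < nr_pages) {
		/* Retry the remainder via the slow path; for
		 * FOLL_LONGTERM this migrates CMA pages out of the CMA
		 * area before pinning, so the caller still gets every
		 * page, just never a CMA one. */
		start += nr << PAGE_SHIFT;
		pages += nr;

		ret = __gup_longterm_unlocked(start, nr_pages - nr,
					      gup_flags, pages);

		/* Merge the fast-path and slow-path return values */
		if (nr > 0)
			ret = (ret < 0) ? nr : ret + nr;
	}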
On 6/5/19 7:19 PM, Pingfan Liu wrote:
> On Thu, Jun 6, 2019 at 5:49 AM Andrew Morton <akpm@linux-foundation.org> wrote:
...
>>> [... reject_cma_pages() hunk snipped ...]
>>
>> There's no point in inlining this.
> OK, will drop it in V4.
>
>> The code seems inefficient. If it encounters a single CMA page it can
>> end up discarding a possibly significant number of non-CMA pages. I
> The trick is that the pages are not discarded; in fact, they are still
> referenced by the PTEs. We just leave the slow path to pick up the
> non-CMA pages again.
>
>> guess that doesn't matter much, as get_user_pages(FOLL_LONGTERM) is
>> rare. But could we avoid this (and the second pass across pages[]) by
>> checking for a CMA page within gup_pte_range()?
> That would spread the same logic across the hugetlb and normal PTE
> paths, with no performance improvement since we fall back to the slow
> path anyway. So I think it may not be worth it.
>

I think the concern is: for the successful gup_fast case with no CMA
pages, this patch is adding another complete loop through all the
pages, in the fast case.

If the check were instead done as part of gup_pte_range(), then it
would be a little more efficient for that case.

As for whether it's worth it, *probably* this is too small an effect to
measure. But in order to attempt a measurement: running fio
(https://github.com/axboe/fio) with O_DIRECT on an NVMe drive might shed
some light. Here's an fio.conf file that Jan Kara and Tom Talpey helped
me come up with, for related testing:

[reader]
direct=1
ioengine=libaio
blocksize=4096
size=1g
numjobs=1
rw=read
iodepth=64

thanks,
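A note on running that job file: fio takes the job file directly, so
after adding a filename= line to the [reader] section pointing at the
target device (for example filename=/dev/nvme0n1, an assumed device
node), a plain `fio fio.conf` run reports bandwidth and latency for the
O_DIRECT reads. The "libaio not loadable" error Pingfan hits below simply
means the libaio shared library was not installed on the test box.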
On Fri, Jun 7, 2019 at 5:17 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 6/5/19 7:19 PM, Pingfan Liu wrote:
> [... earlier quoting snipped ...]
>
> I think the concern is: for the successful gup_fast case with no CMA
> pages, this patch is adding another complete loop through all the
> pages, in the fast case.
>
> If the check were instead done as part of gup_pte_range(), then it
> would be a little more efficient for that case.
>
> As for whether it's worth it, *probably* this is too small an effect to
> measure. But in order to attempt a measurement: running fio
> (https://github.com/axboe/fio) with O_DIRECT on an NVMe drive might
> shed some light. Here's an fio.conf file that Jan Kara and Tom Talpey
> helped me come up with, for related testing:
>
> [reader]
> direct=1
> ioengine=libaio
> blocksize=4096
> size=1g
> numjobs=1
> rw=read
> iodepth=64
>

Yeah, agreed. Data is more persuasive. Thanks for your suggestion. I
will try to bring out the results.

Thanks,
Pingfan
On Fri, Jun 07, 2019 at 02:10:15PM +0800, Pingfan Liu wrote:
> On Fri, Jun 7, 2019 at 5:17 AM John Hubbard <jhubbard@nvidia.com> wrote:
> [... earlier quoting snipped ...]
> >
> > As for whether it's worth it, *probably* this is too small an effect to
> > measure. But in order to attempt a measurement: running fio
> > (https://github.com/axboe/fio) with O_DIRECT on an NVMe drive might
> > shed some light. Here's an fio.conf file that Jan Kara and Tom Talpey
> > helped me come up with, for related testing:
> >
> > [reader]
> > direct=1
> > ioengine=libaio
> > blocksize=4096
> > size=1g
> > numjobs=1
> > rw=read
> > iodepth=64

I was unable to get an NVMe device to test with. And when testing fio on a
traditional disk, I got the error:
  "fio: engine libaio not loadable
   fio: failed to load engine
   fio: file:ioengines.c:89, func=dlopen, error=libaio: cannot open shared object file: No such file or directory"

But I found a test case which can be slightly adjusted to meet the aim.
It is tools/testing/selftests/vm/gup_benchmark.c

Test environment:
  MemTotal:  264079324 kB
  MemFree:   262306788 kB
  CmaTotal:  0 kB
  CmaFree:   0 kB
  on AMD EPYC 7601

Test commands:
  gup_benchmark -r 100 -n 64
  gup_benchmark -r 100 -n 64 -l
where -r stands for repeat times, -n is the nr_pages param for
get_user_pages_fast(), and -l is a new option to test FOLL_LONGTERM in the
fast path; see the patch at the tail.

Test results:
  w/o    477.800000
  w/o-l  481.070000
  a      481.800000
  a-l    640.410000
  b      466.240000   (question a: b outperforms w/o?)
  b-l    529.740000

Here w/o is the baseline without any patch, using v5.2-rc2; a is this
series; b does the check in gup_pte_range(). '-l' means FOLL_LONGTERM.

I am surprised that b-l shows about a 17% improvement over a-l:
(640.41 - 529.74) / 640.41.

As for "question a: b outperforms w/o?", I cannot figure out why; maybe it
can be considered variance.

Based on the above results, I think it is better to do the check inside
gup_pte_range().

Any comment?

Thanks,

> Yeah, agreed. Data is more persuasive. Thanks for your suggestion. I
> will try to bring out the results.
>
> Thanks,
> Pingfan

---
Patch to do the check inside gup_pte_range():

diff --git a/mm/gup.c b/mm/gup.c
index 2ce3091..ba213a0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1757,6 +1757,10 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);

+		if (unlikely(flags & FOLL_LONGTERM) &&
+			is_migrate_cma_page(page))
+			goto pte_unmap;
+
 		head = try_get_compound_head(page, 1);
 		if (!head)
 			goto pte_unmap;
@@ -1900,6 +1904,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);

+	if (unlikely(flags & FOLL_LONGTERM) &&
+		is_migrate_cma_page(page)) {
+		*nr -= refs;
+		return 0;
+	}
+
 	head = try_get_compound_head(pmd_page(orig), refs);
 	if (!head) {
 		*nr -= refs;
@@ -1941,6 +1951,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);

+	if (unlikely(flags & FOLL_LONGTERM) &&
+		is_migrate_cma_page(page)) {
+		*nr -= refs;
+		return 0;
+	}
+
 	head = try_get_compound_head(pud_page(orig), refs);
 	if (!head) {
 		*nr -= refs;
@@ -1978,6 +1994,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		refs++;
 	} while (addr += PAGE_SIZE, addr != end);

+	if (unlikely(flags & FOLL_LONGTERM) &&
+		is_migrate_cma_page(page)) {
+		*nr -= refs;
+		return 0;
+	}
+
 	head = try_get_compound_head(pgd_page(orig), refs);
 	if (!head) {
 		*nr -= refs;

---
Patch for testing:

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index 7dd602d..61dec5f 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -6,8 +6,9 @@
 #include <linux/debugfs.h>

 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
-#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
-#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
+#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
+#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)

 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -53,6 +54,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
 			nr = get_user_pages_fast(addr, nr, gup->flags & 1,
 						 pages + i);
 			break;
+		case GUP_FAST_LONGTERM_BENCHMARK:
+			nr = get_user_pages_fast(addr, nr,
+						 (gup->flags & 1) | FOLL_LONGTERM,
+						 pages + i);
+			break;
 		case GUP_LONGTERM_BENCHMARK:
 			nr = get_user_pages(addr, nr,
 					    (gup->flags & 1) | FOLL_LONGTERM,
@@ -96,6 +102,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,

 	switch (cmd) {
 	case GUP_FAST_BENCHMARK:
+	case GUP_FAST_LONGTERM_BENCHMARK:
 	case GUP_LONGTERM_BENCHMARK:
 	case GUP_BENCHMARK:
 		break;
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index c0534e2..ade8acb 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -15,8 +15,9 @@
 #define PAGE_SIZE sysconf(_SC_PAGESIZE)

 #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
-#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
-#define GUP_BENCHMARK		_IOWR('g', 3, struct gup_benchmark)
+#define GUP_FAST_LONGTERM_BENCHMARK	_IOWR('g', 2, struct gup_benchmark)
+#define GUP_LONGTERM_BENCHMARK	_IOWR('g', 3, struct gup_benchmark)
+#define GUP_BENCHMARK		_IOWR('g', 4, struct gup_benchmark)

 struct gup_benchmark {
 	__u64 get_delta_usec;
@@ -37,7 +38,7 @@ int main(int argc, char **argv)
 	char *file = "/dev/zero";
 	char *p;

-	while ((opt = getopt(argc, argv, "m:r:n:f:tTLUSH")) != -1) {
+	while ((opt = getopt(argc, argv, "m:r:n:f:tTlLUSH")) != -1) {
 		switch (opt) {
 		case 'm':
 			size = atoi(optarg) * MB;
@@ -54,6 +55,9 @@ int main(int argc, char **argv)
 		case 'T':
 			thp = 0;
 			break;
+		case 'l':
+			cmd = GUP_FAST_LONGTERM_BENCHMARK;
+			break;
 		case 'L':
 			cmd = GUP_LONGTERM_BENCHMARK;
 			break;
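For anyone wanting to reproduce these numbers without the selftest
wrapper, below is a sketch of driving the benchmark device directly from
user space. The debugfs path, ioctl number, default 128 MB size, and the
struct layout (including the expansion[] padding) are assumptions drawn
from the 5.2-era mm/gup_benchmark.c and its selftest; verify them against
your tree before relying on this:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/types.h>

    #define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)

    struct gup_benchmark {		/* must match mm/gup_benchmark.c */
    	__u64 get_delta_usec;
    	__u64 put_delta_usec;
    	__u64 addr;
    	__u64 size;
    	__u32 nr_pages_per_call;
    	__u32 flags;
    	__u64 expansion[10];
    };

    int main(void)
    {
    	struct gup_benchmark gup = { 0 };
    	unsigned long size = 128UL << 20;	/* 128 MB, the selftest default */
    	long page = sysconf(_SC_PAGESIZE);
    	char *p, *q;
    	int fd;

    	fd = open("/sys/kernel/debug/gup_benchmark", O_RDWR);
    	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
    		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    	if (fd < 0 || p == MAP_FAILED) {
    		perror("setup");
    		return 1;
    	}
    	for (q = p; q < p + size; q += page)	/* fault the pages in first */
    		q[0] = 0;

    	gup.addr = (unsigned long)p;
    	gup.size = size;
    	gup.nr_pages_per_call = 64;		/* the -n 64 case above */
    	gup.flags = 1;				/* bit 0 = write */

    	if (ioctl(fd, GUP_FAST_BENCHMARK, &gup)) {	/* kernel times the gup loop */
    		perror("ioctl");
    		return 1;
    	}
    	printf("get: %llu us, put: %llu us\n",
    	       (unsigned long long)gup.get_delta_usec,
    	       (unsigned long long)gup.put_delta_usec);
    	return 0;
    }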
On Tue, Jun 11, 2019 at 08:29:35PM +0800, Pingfan Liu wrote:
> I was unable to get an NVMe device to test with. And when testing fio on a

How would an NVMe test help? FOLL_LONGTERM isn't used by any
performance-critical path to start with, so I don't see how this patch
could be a problem.
Pingfan Liu <kernelfans@gmail.com> writes:

> As for FOLL_LONGTERM, it is checked in the slow path
> __gup_longterm_unlocked(). But it is not checked in the fast path, which
> means a possible leak of CMA pages to a long-term pinning requirement
> through this crack.

Shouldn't we disallow FOLL_LONGTERM with the get_user_pages fast path
altogether? W.r.t. the DAX check, we need the vma to determine whether a
long-term pin is allowed or not. If FOLL_LONGTERM is specified, we should
fall back to the slow path.

-aneesh
> Pingfan Liu <kernelfans@gmail.com> writes:
>
> > As for FOLL_LONGTERM, it is checked in the slow path
> > __gup_longterm_unlocked(). But it is not checked in the fast path,
> > which means a possible leak of CMA pages to a long-term pinning
> > requirement through this crack.
>
> Shouldn't we disallow FOLL_LONGTERM with the get_user_pages fast path
> altogether? W.r.t. the DAX check, we need the vma to determine whether a
> long-term pin is allowed or not. If FOLL_LONGTERM is specified, we
> should fall back to the slow path.

Yes, the fast path bails to the slow path if FOLL_LONGTERM _and_ DAX,
but it does this while walking the page tables. I missed the CMA case,
and Pingfan's patch fixes this. We could check for CMA pages while
walking the page tables, but most agreed that it was not worth it. For
DAX we already had checks for *_devmap(), so it was easier to put the
FOLL_LONGTERM checks there.

Ira
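For context on "checks for *_devmap()": the 5.2-era gup_pte_range()
already contained a bail-out that sends FOLL_LONGTERM + DAX requests back
to the slow path while the PTEs are being walked. Approximately as
follows (paraphrased from that era's mm/gup.c, so verify against your
tree):

		if (pte_devmap(pte)) {
			/* DAX page: whether a long-term pin is allowed
			 * depends on vma-level state the fast path
			 * cannot see, so bail to the slow path. */
			if (unlikely(flags & FOLL_LONGTERM))
				goto pte_unmap;

			pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
			if (unlikely(!pgmap)) {
				undo_dev_pagemap(nr, nr_start, pages);
				goto pte_unmap;
			}
		}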
On Tue, Jun 11, 2019 at 08:29:35PM +0800, Pingfan Liu wrote:
> [... earlier quoting snipped ...]
>
> But I found a test case which can be slightly adjusted to meet the aim.
> It is tools/testing/selftests/vm/gup_benchmark.c
>
> Test environment:
>   MemTotal:  264079324 kB
>   MemFree:   262306788 kB
>   CmaTotal:  0 kB
>   CmaFree:   0 kB
>   on AMD EPYC 7601
>
> Test commands:
>   gup_benchmark -r 100 -n 64
>   gup_benchmark -r 100 -n 64 -l
> where -r stands for repeat times, -n is the nr_pages param for
> get_user_pages_fast(), and -l is a new option to test FOLL_LONGTERM in
> the fast path; see the patch at the tail.

Thanks! That is a good test to add. You should add the patch to the
series.

> Test results:
>   w/o    477.800000
>   w/o-l  481.070000
>   a      481.800000
>   a-l    640.410000
>   b      466.240000   (question a: b outperforms w/o?)
>   b-l    529.740000
>
> Here w/o is the baseline without any patch, using v5.2-rc2; a is this
> series; b does the check in gup_pte_range(). '-l' means FOLL_LONGTERM.
>
> I am surprised that b-l shows about a 17% improvement over a-l:
> (640.41 - 529.74) / 640.41.

Wow, that is bigger than I would have thought. I suspect it gets worse
as -n increases?

> As for "question a: b outperforms w/o?", I cannot figure out why; maybe
> it can be considered variance.

:-/

Does this change with larger -r or -n values?

> Based on the above results, I think it is better to do the check inside
> gup_pte_range().
>
> Any comment?

I agree.

Ira

> [... quoted patches snipped; see Pingfan's previous mail ...]
On 6/11/19 6:52 AM, Christoph Hellwig wrote:
> On Tue, Jun 11, 2019 at 08:29:35PM +0800, Pingfan Liu wrote:
>> I was unable to get an NVMe device to test with. And when testing fio on a
>
> How would an NVMe test help? FOLL_LONGTERM isn't used by any
> performance-critical path to start with, so I don't see how this patch
> could be a problem.

Yes, you're right, of course. The loop only runs at all for
FOLL_LONGTERM, and I forgot for the moment that the direct IO paths are
never going to set that flag. :)

thanks,
On Tue, Jun 11, 2019 at 04:29:11PM +0000, Weiny, Ira wrote:
> > Pingfan Liu <kernelfans@gmail.com> writes:
> >
> > > As for FOLL_LONGTERM, it is checked in the slow path
> > > __gup_longterm_unlocked(). But it is not checked in the fast path,
> > > which means a possible leak of CMA pages to a long-term pinning
> > > requirement through this crack.
> >
> > Shouldn't we disallow FOLL_LONGTERM with the get_user_pages fast path
> > altogether? W.r.t. the DAX check, we need the vma to determine whether
> > a long-term pin is allowed or not. If FOLL_LONGTERM is specified, we
> > should fall back to the slow path.
>
> Yes, the fast path bails to the slow path if FOLL_LONGTERM _and_ DAX,
> but it does this while walking the page tables. I missed the CMA case,
> and Pingfan's patch fixes this. We could check for CMA pages while
> walking the page tables, but most agreed that it was not worth it. For
> DAX we already had checks for *_devmap(), so it was easier to put the
> FOLL_LONGTERM checks there.

Then for CMA pages, are you suggesting something like:

diff --git a/mm/gup.c b/mm/gup.c
index 42a47c0..8bf3cc3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2251,6 +2251,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 	if (unlikely(!access_ok((void __user *)start, len)))
 		return -EFAULT;

+	if (unlikely(gup_flags & FOLL_LONGTERM))
+		goto slow;
 	if (gup_fast_permitted(start, nr_pages)) {
 		local_irq_disable();
 		gup_pgd_range(addr, end, gup_flags, pages, &nr);
@@ -2258,6 +2260,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 		ret = nr;
 	}

+slow:
 	if (nr < nr_pages) {
 		/* Try to get the remaining pages with get_user_pages */
 		start += nr << PAGE_SHIFT;

Thanks,
Pingfan
On Wed, Jun 12, 2019 at 12:46 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Tue, Jun 11, 2019 at 08:29:35PM +0800, Pingfan Liu wrote:
> [... earlier quoting snipped ...]
>
> Thanks! That is a good test to add. You should add the patch to the
> series.

OK.

> > I am surprised that b-l shows about a 17% improvement over a-l:
> > (640.41 - 529.74) / 640.41.
>
> Wow, that is bigger than I would have thought. I suspect it gets worse
> as -n increases?

Yes. I tested with -n 64/128/256/512, and it shows that trend. See the
data below.

> > As for "question a: b outperforms w/o?", I cannot figure out why;
> > maybe it can be considered variance.
>
> :-/
>
> Does this change with larger -r or -n values?

-r should have no effect on this. And varying -n over 64/128/256/512,
the data always shows b outperforming w/o by a bit.

         64      128     256     512
  a-l    633.23  676.83  747.14  683.19
  b-l    528.32  529.10  523.95  512.88
  w/o    479.73  473.87  477.67  488.70
  b      470.13  467.11  463.06  469.62

(The a-l figure at n=256 was probably disturbed by something, but the
overall trend keeps going up.)

Thanks,
Pingfan

> > Based on the above results, I think it is better to do the check
> > inside gup_pte_range().
>
> I agree.
>
> Ira
>
> [... quoted patches snipped; see the earlier mail ...]
On Wed, Jun 12, 2019 at 09:54:58PM +0800, Pingfan Liu wrote:
> On Tue, Jun 11, 2019 at 04:29:11PM +0000, Weiny, Ira wrote:
> [... earlier quoting snipped ...]
>
> Then for CMA pages, are you suggesting something like:

I'm not suggesting this.

Sorry, I wrote this prior to seeing the numbers in your other email.
Given the numbers, it looks like performing the check whilst walking the
tables is worth the extra complexity. I was just trying to summarize the
thread.

I don't think we should disallow FOLL_LONGTERM in the fast path, because
it only affects CMA and DAX; other pages will be fine with
FOLL_LONGTERM. Why penalize every call if we don't have to? Also, in the
case of DAX, the use of the vma will be going away... [1] Eventually...
;-)

Ira

[1] https://lkml.org/lkml/2019/6/5/1049

> [... suggested early-bail diff snipped; see the previous mail ...]
On Thu, Jun 13, 2019 at 7:49 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> On Wed, Jun 12, 2019 at 09:54:58PM +0800, Pingfan Liu wrote:
> [... earlier quoting snipped ...]
> >
> > Then for CMA pages, are you suggesting something like:
>
> I'm not suggesting this.

OK, then I will send out v4.

> Sorry, I wrote this prior to seeing the numbers in your other email.
> Given the numbers, it looks like performing the check whilst walking
> the tables is worth the extra complexity. I was just trying to
> summarize the thread. I don't think we should disallow FOLL_LONGTERM in
> the fast path, because it only affects CMA and DAX; other pages will be
> fine with FOLL_LONGTERM. Why penalize every call if we don't have to?
> Also, in the case of DAX, the use of the vma will be going away... [1]
> Eventually... ;-)

A good feature. Trying to catch up.

Thanks,
Pingfan
diff --git a/mm/gup.c b/mm/gup.c
index f173fcb..0e59af9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2196,6 +2196,26 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
 	return ret;
 }

+#ifdef CONFIG_CMA
+static inline int reject_cma_pages(int nr_pinned, struct page **pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pinned; i++)
+		if (is_migrate_cma_page(pages[i])) {
+			put_user_pages(pages + i, nr_pinned - i);
+			return i;
+		}
+
+	return nr_pinned;
+}
+#else
+static inline int reject_cma_pages(int nr_pinned, struct page **pages)
+{
+	return nr_pinned;
+}
+#endif
+
 /**
  * get_user_pages_fast() - pin user pages in memory
  * @start:	starting user address
@@ -2236,6 +2256,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 		ret = nr;
 	}

+	if (unlikely(gup_flags & FOLL_LONGTERM) && nr)
+		nr = reject_cma_pages(nr, pages);
+
 	if (nr < nr_pages) {
 		/* Try to get the remaining pages with get_user_pages */
 		start += nr << PAGE_SHIFT;
As for FOLL_LONGTERM, it is checked in the slow path
__gup_longterm_unlocked(). But it is not checked in the fast path, which
means a possible leak of CMA pages to a long-term pinning requirement
through this crack.

Place a check in the fast path.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: linux-kernel@vger.kernel.org
---
 mm/gup.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)