Message ID: 20200402020031.1611223-1-ying.huang@intel.com (mailing list archive)
State:      New, archived
Series:     [-V2] /proc/PID/smaps: Add PMD migration entry parsing
On Thu 02-04-20 10:00:31, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
>
> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
> is added.
>
> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
> pages under migration may be lost as follows.

Interesting. How did you reproduce this?
[...]

> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 8d382d4ec067..9c72f9ce2dd8 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
> 	struct mem_size_stats *mss = walk->private;
> 	struct vm_area_struct *vma = walk->vma;
> 	bool locked = !!(vma->vm_flags & VM_LOCKED);
> -	struct page *page;
> +	struct page *page = NULL;
>
> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +	if (pmd_present(*pmd)) {
> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
> +
> +		if (is_migration_entry(entry))
> +			page = migration_entry_to_page(entry);
> +		else
> +			VM_WARN_ON_ONCE(1);

Could you explain why do we need this WARN_ON? I haven't really checked
the swap support for THP but cannot we have normal swap pmd entries?

> +	}
> 	if (IS_ERR_OR_NULL(page))
> 		return;
> 	if (PageAnon(page))
> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
> 	ptl = pmd_trans_huge_lock(pmd, vma);
> 	if (ptl) {
> -		if (pmd_present(*pmd))
> -			smaps_pmd_entry(pmd, addr, walk);
> +		smaps_pmd_entry(pmd, addr, walk);
> 		spin_unlock(ptl);
> 		goto out;
> 	}
> --
> 2.25.0
Michal Hocko <mhocko@kernel.org> writes:

> On Thu 02-04-20 10:00:31, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@intel.com>
>>
>> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
>> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
>> is added.
>>
>> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
>> pages under migration may be lost as follows.
>
> Interesting. How did you reproduce this?
> [...]

I run the pmbench in background to eat memory, then run
`/usr/bin/migratepages` and `cat /proc/PID/smaps` every second. The
issue can be reproduced within 60 seconds.

>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 8d382d4ec067..9c72f9ce2dd8 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>> 	struct mem_size_stats *mss = walk->private;
>> 	struct vm_area_struct *vma = walk->vma;
>> 	bool locked = !!(vma->vm_flags & VM_LOCKED);
>> -	struct page *page;
>> +	struct page *page = NULL;
>>
>> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
>> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +	if (pmd_present(*pmd)) {
>> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
>> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> +
>> +		if (is_migration_entry(entry))
>> +			page = migration_entry_to_page(entry);
>> +		else
>> +			VM_WARN_ON_ONCE(1);
>
> Could you explain why do we need this WARN_ON? I haven't really checked
> the swap support for THP but cannot we have normal swap pmd entries?

I have some patches to add the swap pmd entry support, but they haven't
been merged yet.

Similar checks are for all THP migration code paths, so I follow the
same style.

Best Regards,
Huang, Ying
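The reproduction described above amounts to racing the smaps walk against THP migration and watching the reported AnonHugePages shrink. The observation half can be sketched in userspace C; the function name and the idea of summing the per-VMA counters are illustrative additions, not part of the patch, and pmbench/migratepages remain external tools:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical sketch: sum the per-VMA AnonHugePages counters in a
 * smaps-format file. In the failure mode described in the changelog,
 * this sum briefly drops by the size of the THPs under migration.
 * Returns the total in kB, or -1 if the file cannot be opened.
 */
static long sum_anon_huge_pages_kb(const char *smaps_path)
{
	FILE *f = fopen(smaps_path, "r");
	char line[256];
	long total_kb = 0, val;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* sscanf's whitespace handling tolerates smaps' padding */
		if (sscanf(line, "AnonHugePages: %ld kB", &val) == 1)
			total_kb += val;
	}
	fclose(f);
	return total_kb;
}
```

Running this against `/proc/PID/smaps` of the pmbench process once per second, while `migratepages` runs, would expose the dip that the patch fixes.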
On Thu 02-04-20 15:03:23, Huang, Ying wrote:
> Michal Hocko <mhocko@kernel.org> writes:
>
> > On Thu 02-04-20 10:00:31, Huang, Ying wrote:
> >> From: Huang Ying <ying.huang@intel.com>
> >>
> >> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
> >> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
> >> is added.
> >>
> >> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
> >> pages under migration may be lost as follows.
> >
> > Interesting. How did you reproduce this?
> > [...]
>
> I run the pmbench in background to eat memory, then run
> `/usr/bin/migratepages` and `cat /proc/PID/smaps` every second. The
> issue can be reproduced within 60 seconds.

Please add that information to the changelog. I was probably too
optimistic about the migration duration because I found it highly
unlikely to be visible. I was clearly wrong here.

> >> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >> index 8d382d4ec067..9c72f9ce2dd8 100644
> >> --- a/fs/proc/task_mmu.c
> >> +++ b/fs/proc/task_mmu.c
> >> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
> >> 	struct mem_size_stats *mss = walk->private;
> >> 	struct vm_area_struct *vma = walk->vma;
> >> 	bool locked = !!(vma->vm_flags & VM_LOCKED);
> >> -	struct page *page;
> >> +	struct page *page = NULL;
> >>
> >> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
> >> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> >> +	if (pmd_present(*pmd)) {
> >> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
> >> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> >> +	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
> >> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
> >> +
> >> +		if (is_migration_entry(entry))
> >> +			page = migration_entry_to_page(entry);
> >> +		else
> >> +			VM_WARN_ON_ONCE(1);
> >
> > Could you explain why do we need this WARN_ON? I haven't really checked
> > the swap support for THP but cannot we have normal swap pmd entries?
>
> I have some patches to add the swap pmd entry support, but they haven't
> been merged yet.
>
> Similar checks are for all THP migration code paths, so I follow the
> same style.

I haven't checked other migration code paths but what is the reason to
add the warning here? Even if this shouldn't happen, smaps is perfectly
fine to ignore that situation, no?
Michal Hocko <mhocko@kernel.org> writes:

> On Thu 02-04-20 15:03:23, Huang, Ying wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>> > On Thu 02-04-20 10:00:31, Huang, Ying wrote:
>> >> From: Huang Ying <ying.huang@intel.com>
>> >>
>> >> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
>> >> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
>> >> is added.
>> >>
>> >> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
>> >> pages under migration may be lost as follows.
>> >
>> > Interesting. How did you reproduce this?
>> > [...]
>>
>> I run the pmbench in background to eat memory, then run
>> `/usr/bin/migratepages` and `cat /proc/PID/smaps` every second. The
>> issue can be reproduced within 60 seconds.
>
> Please add that information to the changelog. I was probably too
> optimistic about the migration duration because I found it highly
> unlikely to be visible. I was clearly wrong here.

Sure. Will add that in the next version.

>> >> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> >> index 8d382d4ec067..9c72f9ce2dd8 100644
>> >> --- a/fs/proc/task_mmu.c
>> >> +++ b/fs/proc/task_mmu.c
>> >> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>> >> 	struct mem_size_stats *mss = walk->private;
>> >> 	struct vm_area_struct *vma = walk->vma;
>> >> 	bool locked = !!(vma->vm_flags & VM_LOCKED);
>> >> -	struct page *page;
>> >> +	struct page *page = NULL;
>> >>
>> >> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
>> >> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> >> +	if (pmd_present(*pmd)) {
>> >> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
>> >> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> >> +	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>> >> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> >> +
>> >> +		if (is_migration_entry(entry))
>> >> +			page = migration_entry_to_page(entry);
>> >> +		else
>> >> +			VM_WARN_ON_ONCE(1);
>> >
>> > Could you explain why do we need this WARN_ON? I haven't really checked
>> > the swap support for THP but cannot we have normal swap pmd entries?
>>
>> I have some patches to add the swap pmd entry support, but they haven't
>> been merged yet.
>>
>> Similar checks are for all THP migration code paths, so I follow the
>> same style.
>
> I haven't checked other migration code paths but what is the reason to
> add the warning here? Even if this shouldn't happen, smaps is perfectly
> fine to ignore that situation, no?

Yes. smaps itself is perfectly fine to ignore it. I think this is used
to find bugs in other code paths such as THP migration related.

Best Regards,
Huang, Ying
On Thu 02-04-20 16:10:29, Huang, Ying wrote:
> Michal Hocko <mhocko@kernel.org> writes:
>
> > On Thu 02-04-20 15:03:23, Huang, Ying wrote:
[...]
> >> > Could you explain why do we need this WARN_ON? I haven't really checked
> >> > the swap support for THP but cannot we have normal swap pmd entries?
> >>
> >> I have some patches to add the swap pmd entry support, but they haven't
> >> been merged yet.
> >>
> >> Similar checks are for all THP migration code paths, so I follow the
> >> same style.
> >
> > I haven't checked other migration code paths but what is the reason to
> > add the warning here? Even if this shouldn't happen, smaps is perfectly
> > fine to ignore that situation, no?
>
> Yes. smaps itself is perfectly fine to ignore it. I think this is used
> to find bugs in other code paths such as THP migration related.

Please do not add new warnings without a good an strong reasons. As a
matter of fact there are people running with panic_on_warn and each
warning is fatal for them. Please also note that this is a user trigable
path and that requires even more care.
Michal Hocko <mhocko@kernel.org> writes:

> On Thu 02-04-20 16:10:29, Huang, Ying wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>> > On Thu 02-04-20 15:03:23, Huang, Ying wrote:
> [...]
>> >> > Could you explain why do we need this WARN_ON? I haven't really checked
>> >> > the swap support for THP but cannot we have normal swap pmd entries?
>> >>
>> >> I have some patches to add the swap pmd entry support, but they haven't
>> >> been merged yet.
>> >>
>> >> Similar checks are for all THP migration code paths, so I follow the
>> >> same style.
>> >
>> > I haven't checked other migration code paths but what is the reason to
>> > add the warning here? Even if this shouldn't happen, smaps is perfectly
>> > fine to ignore that situation, no?
>>
>> Yes. smaps itself is perfectly fine to ignore it. I think this is used
>> to find bugs in other code paths such as THP migration related.
>
> Please do not add new warnings without a good an strong reasons. As a
> matter of fact there are people running with panic_on_warn and each
> warning is fatal for them. Please also note that this is a user trigable
> path and that requires even more care.

OK for me.

Best Regards,
Huang, Ying
On 02/04/2020 11.21, Michal Hocko wrote:
> On Thu 02-04-20 16:10:29, Huang, Ying wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>>> On Thu 02-04-20 15:03:23, Huang, Ying wrote:
> [...]
>>>>> Could you explain why do we need this WARN_ON? I haven't really checked
>>>>> the swap support for THP but cannot we have normal swap pmd entries?
>>>>
>>>> I have some patches to add the swap pmd entry support, but they haven't
>>>> been merged yet.
>>>>
>>>> Similar checks are for all THP migration code paths, so I follow the
>>>> same style.
>>>
>>> I haven't checked other migration code paths but what is the reason to
>>> add the warning here? Even if this shouldn't happen, smaps is perfectly
>>> fine to ignore that situation, no?
>>
>> Yes. smaps itself is perfectly fine to ignore it. I think this is used
>> to find bugs in other code paths such as THP migration related.
>
> Please do not add new warnings without a good an strong reasons. As a
> matter of fact there are people running with panic_on_warn and each
> warning is fatal for them. Please also note that this is a user trigable
> path and that requires even more care.
>

But this should not happen and if it does we'll never know without debug.

VM_WARN_ON checks something only if build with CONFIG_DEBUG_VM=y.

Anybody who runs debug kernels with panic_on_warn shouldn't expect much
stability =)
On Thu 02-04-20 11:29:09, Konstantin Khlebnikov wrote:
>
> On 02/04/2020 11.21, Michal Hocko wrote:
> > On Thu 02-04-20 16:10:29, Huang, Ying wrote:
> > > Michal Hocko <mhocko@kernel.org> writes:
> > >
> > > > On Thu 02-04-20 15:03:23, Huang, Ying wrote:
> > [...]
> > > > > > Could you explain why do we need this WARN_ON? I haven't really checked
> > > > > > the swap support for THP but cannot we have normal swap pmd entries?
> > > > >
> > > > > I have some patches to add the swap pmd entry support, but they haven't
> > > > > been merged yet.
> > > > >
> > > > > Similar checks are for all THP migration code paths, so I follow the
> > > > > same style.
> > > >
> > > > I haven't checked other migration code paths but what is the reason to
> > > > add the warning here? Even if this shouldn't happen, smaps is perfectly
> > > > fine to ignore that situation, no?
> > >
> > > Yes. smaps itself is perfectly fine to ignore it. I think this is used
> > > to find bugs in other code paths such as THP migration related.
> >
> > Please do not add new warnings without a good an strong reasons. As a
> > matter of fact there are people running with panic_on_warn and each
> > warning is fatal for them. Please also note that this is a user trigable
> > path and that requires even more care.
>
> But this should not happen and if it does we'll never know without debug.

The migration path which already deals with this will notice, right?
Those are paths which really care about consistency.

> VM_WARN_ON checks something only if build with CONFIG_DEBUG_VM=y.
>
> Anybody who runs debug kernels with panic_on_warn shouldn't expect much stability =)

That doesn't mean we should be adding warnings here and there nilly
willy.
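For readers following the CONFIG_DEBUG_VM point above: the behavior being debated can be illustrated with a simplified userspace model of how `VM_WARN_ON_ONCE()` degrades to a no-op on non-debug builds. The real definitions live in `include/linux/mmdebug.h`; the counter and helper below are stand-ins introduced purely for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tally of warnings fired; stands in for the kernel's log output. */
static int vm_warnings;

/*
 * Simplified model of the kernel macro: with CONFIG_DEBUG_VM it reports
 * the first violation only; without it, the check costs nothing at
 * runtime (the real macro still type-checks the condition at build time).
 */
#ifdef CONFIG_DEBUG_VM
#define VM_WARN_ON_ONCE(cond) do {		\
	static bool warned;			\
	if ((cond) && !warned) {		\
		warned = true;			\
		vm_warnings++;			\
	}					\
} while (0)
#else
#define VM_WARN_ON_ONCE(cond) do { } while (0)
#endif

/*
 * Mimics the patch's fall-through: an unexpected swap pmd warns on
 * debug builds, but the smaps walk itself just skips the entry.
 */
static const char *handle_unexpected_pmd(void)
{
	VM_WARN_ON_ONCE(1);
	return "ignored";
}
```

Built without `-DCONFIG_DEBUG_VM`, the warning never fires, which is Konstantin's point; Michal's counterpoint is that debug builds with panic_on_warn still turn it into a crash on a user-triggerable path.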
On Thu, Apr 02, 2020 at 10:00:31AM +0800, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
>
> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
> is added.
>
> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
> pages under migration may be lost as follows.
>
> 7f3f6a7e5000-7f3f837e5000 rw-p 00000000 00:00 0
> Size:             409600 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Rss:              407552 kB
> Pss:              407552 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:    407552 kB
> Referenced:       301056 kB
> Anonymous:        407552 kB
> LazyFree:              0 kB
> AnonHugePages:    405504 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB

The alignment makes me triggered.

Andrew, could you please apply this patch:

http://lore.kernel.org/r/20191230084125.267040-1-samuel.williams@oriontransfer.co.nz
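The alignment complaint refers to smaps' fixed-width value formatting, where a counter that outgrows its field pushes the "kB" column out of line. A hedged sketch of the padding arithmetic (the widths and format string are illustrative, not the kernel's exact seq_printf() usage):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative model of smaps-style field formatting: a left-justified
 * 16-column name plus a right-justified 8-column value keeps the "kB"
 * suffix aligned only while the value fits in its field.
 * Returns the number of characters snprintf would have written.
 */
static int format_smaps_field(char *buf, size_t n,
			      const char *name, unsigned long kb)
{
	return snprintf(buf, n, "%-16s%8lu kB", name, kb);
}
```

A value wider than eight digits (or a name wider than sixteen characters) breaks the column, which is the class of misalignment the linked patch addresses.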
On 4/2/20 12:44 AM, Michal Hocko wrote:
> On Thu 02-04-20 15:03:23, Huang, Ying wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>>
>>> On Thu 02-04-20 10:00:31, Huang, Ying wrote:
>>>> From: Huang Ying <ying.huang@intel.com>
>>>>
>>>> Now, when read /proc/PID/smaps, the PMD migration entry in page table is simply
>>>> ignored. To improve the accuracy of /proc/PID/smaps, its parsing and processing
>>>> is added.
>>>>
>>>> Before the patch, for a fully populated 400 MB anonymous VMA, sometimes some THP
>>>> pages under migration may be lost as follows.
>>> Interesting. How did you reproduce this?
>>> [...]
>> I run the pmbench in background to eat memory, then run
>> `/usr/bin/migratepages` and `cat /proc/PID/smaps` every second. The
>> issue can be reproduced within 60 seconds.
> Please add that information to the changelog. I was probably too
> optimistic about the migration duration because I found it highly
> unlikely to be visible. I was clearly wrong here.

I believe that depends on the page is shared by how many processes. If
it is not shared then it should just take dozens micro seconds in my
test FYI.

>
>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>>> index 8d382d4ec067..9c72f9ce2dd8 100644
>>>> --- a/fs/proc/task_mmu.c
>>>> +++ b/fs/proc/task_mmu.c
>>>> @@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>>> 	struct mem_size_stats *mss = walk->private;
>>>> 	struct vm_area_struct *vma = walk->vma;
>>>> 	bool locked = !!(vma->vm_flags & VM_LOCKED);
>>>> -	struct page *page;
>>>> +	struct page *page = NULL;
>>>>
>>>> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
>>>> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>>> +	if (pmd_present(*pmd)) {
>>>> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
>>>> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>>> +	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>>>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>>> +
>>>> +		if (is_migration_entry(entry))
>>>> +			page = migration_entry_to_page(entry);
>>>> +		else
>>>> +			VM_WARN_ON_ONCE(1);
>>> Could you explain why do we need this WARN_ON? I haven't really checked
>>> the swap support for THP but cannot we have normal swap pmd entries?
>> I have some patches to add the swap pmd entry support, but they haven't
>> been merged yet.
>>
>> Similar checks are for all THP migration code paths, so I follow the
>> same style.
> I haven't checked other migration code paths but what is the reason to
> add the warning here? Even if this shouldn't happen, smaps is perfectly
> fine to ignore that situation, no?
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8d382d4ec067..9c72f9ce2dd8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -546,10 +546,19 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 	struct mem_size_stats *mss = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	bool locked = !!(vma->vm_flags & VM_LOCKED);
-	struct page *page;
+	struct page *page = NULL;
 
-	/* FOLL_DUMP will return -EFAULT on huge zero page */
-	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+	if (pmd_present(*pmd)) {
+		/* FOLL_DUMP will return -EFAULT on huge zero page */
+		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
+		swp_entry_t entry = pmd_to_swp_entry(*pmd);
+
+		if (is_migration_entry(entry))
+			page = migration_entry_to_page(entry);
+		else
+			VM_WARN_ON_ONCE(1);
+	}
 	if (IS_ERR_OR_NULL(page))
 		return;
 	if (PageAnon(page))
@@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
-		if (pmd_present(*pmd))
-			smaps_pmd_entry(pmd, addr, walk);
+		smaps_pmd_entry(pmd, addr, walk);
 		spin_unlock(ptl);
 		goto out;
 	}
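As a reading aid, the control flow the patch introduces can be modeled in plain userspace C. Every type and helper below is a simplified stand-in for kernel internals (follow_trans_huge_pmd(), migration_entry_to_page(), and so on); only the three-way dispatch mirrors the diff:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The pmd states the patched smaps_pmd_entry() distinguishes. */
enum pmd_state {
	PMD_MAPPED,		/* pmd_present(): a mapped huge page     */
	PMD_UNDER_MIGRATION,	/* non-present pmd, migration swap entry */
	PMD_OTHER_SWAP,		/* any other non-present swap pmd        */
};

/* Stand-in for struct page; the pfn value is arbitrary. */
struct page { unsigned long pfn; };

static struct page thp = { .pfn = 0x1234 };

/*
 * Returns the page smaps should account for this pmd, or NULL to skip.
 * Before the patch, only the PMD_MAPPED branch existed, so a THP under
 * migration was silently dropped from the accounting.
 */
static struct page *pmd_page_for_smaps(enum pmd_state state,
				       bool thp_migration_supported)
{
	switch (state) {
	case PMD_MAPPED:
		return &thp;		/* follow_trans_huge_pmd() path */
	case PMD_UNDER_MIGRATION:
		if (thp_migration_supported)
			return &thp;	/* the case the patch stops losing */
		return NULL;
	default:
		return NULL;		/* ignored (modulo the warning) */
	}
}
```

The NULL returns correspond to the walk simply skipping the entry, which is why Michal argues the unexpected-entry warning is unnecessary for smaps itself.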