
/proc/PID/smaps: Add PMD migration entry parsing

Message ID: 20200331085604.1260162-1-ying.huang@intel.com
State: New, archived
Series: /proc/PID/smaps: Add PMD migration entry parsing

Commit Message

Huang, Ying March 31, 2020, 8:56 a.m. UTC
From: Huang Ying <ying.huang@intel.com>

Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
and processing of such entries is added.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
---
 fs/proc/task_mmu.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Comments

Konstantin Khlebnikov March 31, 2020, 9:51 a.m. UTC | #1
On 31/03/2020 11.56, Huang, Ying wrote:
> From: Huang Ying <ying.huang@intel.com>
> 
> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
> and processing of such entries is added.
> 
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> ---
>   fs/proc/task_mmu.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 8d382d4ec067..b5b3aef8cb3b 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>   	bool locked = !!(vma->vm_flags & VM_LOCKED);
>   	struct page *page;

         struct page *page = NULL;

>   
> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +	if (pmd_present(*pmd)) {
> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +	} else if (unlikely(is_swap_pmd(*pmd))) {
> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
> +
> +		VM_BUG_ON(!is_migration_entry(entry));
> +		page = migration_entry_to_page(entry);

                 if (is_migration_entry(entry))
                         page = migration_entry_to_page(entry);

Seems safer and doesn't add much code.
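
For reference, with both suggestions above applied (page initialized to
NULL plus the guarded lookup), the else-if branch would read roughly as
below.  This is a sketch of the suggestion, not the posted patch:

        } else if (unlikely(is_swap_pmd(*pmd))) {
                swp_entry_t entry = pmd_to_swp_entry(*pmd);

                /* A non-present, non-none PMD is expected to hold a
                 * migration entry here; quietly skip anything else. */
                if (is_migration_entry(entry))
                        page = migration_entry_to_page(entry);
        }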

> +	} else {
> +		return;
> +	}
>   	if (IS_ERR_OR_NULL(page))
>   		return;
>   	if (PageAnon(page))
> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>   
>   	ptl = pmd_trans_huge_lock(pmd, vma);
>   	if (ptl) {
> -		if (pmd_present(*pmd))
> -			smaps_pmd_entry(pmd, addr, walk);
> +		smaps_pmd_entry(pmd, addr, walk);
>   		spin_unlock(ptl);
>   		goto out;
>   	}
>
Zi Yan March 31, 2020, 12:24 p.m. UTC | #2
On 31 Mar 2020, at 4:56, Huang, Ying wrote:
>
> From: Huang Ying <ying.huang@intel.com>
>
> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
> and processing of such entries is added.
>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Yang Shi <yang.shi@linux.alibaba.com>
> ---
>  fs/proc/task_mmu.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 8d382d4ec067..b5b3aef8cb3b 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>         bool locked = !!(vma->vm_flags & VM_LOCKED);
>         struct page *page;

Like Konstantin pointed out in another email, you could initialize page to NULL here.
Plus you do not need the “else-return” below, if you do that.

>
> -       /* FOLL_DUMP will return -EFAULT on huge zero page */
> -       page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +       if (pmd_present(*pmd)) {
> +               /* FOLL_DUMP will return -EFAULT on huge zero page */
> +               page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +       } else if (unlikely(is_swap_pmd(*pmd))) {

Should be:
          } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {

Otherwise, when THP migration is disabled and the PMD is being split, the
VM_BUG_ON will be triggered.
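
For context, thp_migration_supported() reduces to a compile-time constant,
so the extra check is free at runtime and lets the compiler drop the whole
branch when THP migration is configured out.  Its definition (from
include/linux/huge_mm.h of this era; double-check against your tree) is
roughly:

        static inline bool thp_migration_supported(void)
        {
                return IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION);
        }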

> +               swp_entry_t entry = pmd_to_swp_entry(*pmd);
> +
> +               VM_BUG_ON(!is_migration_entry(entry));
> +               page = migration_entry_to_page(entry);
> +       } else {
> +               return;
> +       }
>         if (IS_ERR_OR_NULL(page))
>                 return;
>         if (PageAnon(page))
> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>
>         ptl = pmd_trans_huge_lock(pmd, vma);
>         if (ptl) {
> -               if (pmd_present(*pmd))
> -                       smaps_pmd_entry(pmd, addr, walk);
> +               smaps_pmd_entry(pmd, addr, walk);
>                 spin_unlock(ptl);
>                 goto out;
>         }
> --
> 2.25.0

Everything else looks good to me. Thanks.

With the fixes mentioned above, you can add
Reviewed-by: Zi Yan <ziy@nvidia.com>


—
Best Regards,
Yan Zi
Huang, Ying April 1, 2020, 2:24 a.m. UTC | #3
Zi Yan <ziy@nvidia.com> writes:

> On 31 Mar 2020, at 4:56, Huang, Ying wrote:
>>
>> From: Huang Ying <ying.huang@intel.com>
>>
>> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
>> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
>> and processing of such entries is added.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>> Cc: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>>  fs/proc/task_mmu.c | 16 ++++++++++++----
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 8d382d4ec067..b5b3aef8cb3b 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>         bool locked = !!(vma->vm_flags & VM_LOCKED);
>>         struct page *page;
>
> Like Konstantin pointed out in another email, you could initialize page to NULL here.
> Plus you do not need the “else-return” below, if you do that.

Yes.  That looks better.  Will change this in the next version.

>>
>> -       /* FOLL_DUMP will return -EFAULT on huge zero page */
>> -       page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +       if (pmd_present(*pmd)) {
>> +               /* FOLL_DUMP will return -EFAULT on huge zero page */
>> +               page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +       } else if (unlikely(is_swap_pmd(*pmd))) {
>
> Should be:
>           } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>
> Otherwise, when THP migration is disabled and the PMD is being split, the
> VM_BUG_ON will be triggered.

We hold the PMD page table lock when calling smaps_pmd_entry().  How could
PMD splitting trigger the VM_BUG_ON()?

>> +               swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> +
>> +               VM_BUG_ON(!is_migration_entry(entry));
>> +               page = migration_entry_to_page(entry);
>> +       } else {
>> +               return;
>> +       }
>>         if (IS_ERR_OR_NULL(page))
>>                 return;
>>         if (PageAnon(page))
>> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>
>>         ptl = pmd_trans_huge_lock(pmd, vma);
>>         if (ptl) {
>> -               if (pmd_present(*pmd))
>> -                       smaps_pmd_entry(pmd, addr, walk);
>> +               smaps_pmd_entry(pmd, addr, walk);
>>                 spin_unlock(ptl);
>>                 goto out;
>>         }
>> --
>> 2.25.0
>
> Everything else looks good to me. Thanks.
>
> With the fixes mentioned above, you can add
> Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks!

Best Regards,
Huang, Ying
Huang, Ying April 1, 2020, 2:31 a.m. UTC | #4
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:

> On 31/03/2020 11.56, Huang, Ying wrote:
>> From: Huang Ying <ying.huang@intel.com>
>>
>> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
>> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
>> and processing of such entries is added.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>> Cc: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>>   fs/proc/task_mmu.c | 16 ++++++++++++----
>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 8d382d4ec067..b5b3aef8cb3b 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>   	bool locked = !!(vma->vm_flags & VM_LOCKED);
>>   	struct page *page;
>
>         struct page *page = NULL;

Looks good.  Will do this in the next version.

>>
>> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
>> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +	if (pmd_present(*pmd)) {
>> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
>> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +	} else if (unlikely(is_swap_pmd(*pmd))) {
>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> +
>> +		VM_BUG_ON(!is_migration_entry(entry));
>> +		page = migration_entry_to_page(entry);
>
>                 if (is_migration_entry(entry))
>                         page = migration_entry_to_page(entry);
>
> Seems safer and doesn't add much code.

With this, we lose an opportunity to capture some bugs during debugging.
Right?

Best Regards,
Huang, Ying

>> +	} else {
>> +		return;
>> +	}
>>   	if (IS_ERR_OR_NULL(page))
>>   		return;
>>   	if (PageAnon(page))
>> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>
>>   	ptl = pmd_trans_huge_lock(pmd, vma);
>>   	if (ptl) {
>> -		if (pmd_present(*pmd))
>> -			smaps_pmd_entry(pmd, addr, walk);
>> +		smaps_pmd_entry(pmd, addr, walk);
>>   		spin_unlock(ptl);
>>   		goto out;
>>   	}
>>
Zi Yan April 1, 2020, 2:42 a.m. UTC | #5
On 31 Mar 2020, at 22:24, Huang, Ying wrote:

>
> Zi Yan <ziy@nvidia.com> writes:
>
>> On 31 Mar 2020, at 4:56, Huang, Ying wrote:
>>>
>>> From: Huang Ying <ying.huang@intel.com>
>>>
>>> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
>>> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
>>> and processing of such entries is added.
>>>
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>>> Cc: Yang Shi <yang.shi@linux.alibaba.com>
>>> ---
>>>  fs/proc/task_mmu.c | 16 ++++++++++++----
>>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>> index 8d382d4ec067..b5b3aef8cb3b 100644
>>> --- a/fs/proc/task_mmu.c
>>> +++ b/fs/proc/task_mmu.c
>>> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>>         bool locked = !!(vma->vm_flags & VM_LOCKED);
>>>         struct page *page;
>>
>> Like Konstantin pointed out in another email, you could initialize page to NULL here.
>> Plus you do not need the “else-return” below, if you do that.
>
> Yes.  That looks better.  Will change this in the next version.

Thanks.

>
>>>
>>> -       /* FOLL_DUMP will return -EFAULT on huge zero page */
>>> -       page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>> +       if (pmd_present(*pmd)) {
>>> +               /* FOLL_DUMP will return -EFAULT on huge zero page */
>>> +               page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>> +       } else if (unlikely(is_swap_pmd(*pmd))) {
>>
>> Should be:
>>           } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
>>
>> Otherwise, when THP migration is disabled and the PMD is being split, the
>> VM_BUG_ON will be triggered.
>
> We hold the PMD page table lock when calling smaps_pmd_entry().  How could
> PMD splitting trigger the VM_BUG_ON()?

Oh, I missed that. Your original code is right. Thank you for the explanation.
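
For readers following along: smaps_pte_range() calls smaps_pmd_entry() only
after pmd_trans_huge_lock() has returned with the PMD lock held, and
__split_huge_pmd() takes the same lock, so the entry cannot start splitting
underneath.  A minimal sketch of that calling pattern (the same one visible
in the patch hunk):

        ptl = pmd_trans_huge_lock(pmd, vma);    /* takes the PMD lock */
        if (ptl) {
                /* *pmd is stable until spin_unlock(): splitting
                 * serializes on the same lock. */
                smaps_pmd_entry(pmd, addr, walk);
                spin_unlock(ptl);
        }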



—
Best Regards,
Yan Zi
Konstantin Khlebnikov April 1, 2020, 6:03 a.m. UTC | #6
On 01/04/2020 05.31, Huang, Ying wrote:
> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> 
>> On 31/03/2020 11.56, Huang, Ying wrote:
>>> From: Huang Ying <ying.huang@intel.com>
>>>
>>> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
>>> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
>>> and processing of such entries is added.
>>>
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>>> Cc: Yang Shi <yang.shi@linux.alibaba.com>
>>> ---
>>>    fs/proc/task_mmu.c | 16 ++++++++++++----
>>>    1 file changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>> index 8d382d4ec067..b5b3aef8cb3b 100644
>>> --- a/fs/proc/task_mmu.c
>>> +++ b/fs/proc/task_mmu.c
>>> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>>    	bool locked = !!(vma->vm_flags & VM_LOCKED);
>>>    	struct page *page;
>>
>>          struct page *page = NULL;
> 
> Looks good.  Will do this in the next version.
> 
>>>
>>> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
>>> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>> +	if (pmd_present(*pmd)) {
>>> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
>>> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>> +	} else if (unlikely(is_swap_pmd(*pmd))) {
>>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>> +
>>> +		VM_BUG_ON(!is_migration_entry(entry));
>>> +		page = migration_entry_to_page(entry);
>>
>>                  if (is_migration_entry(entry))
>>                          page = migration_entry_to_page(entry);
>>
>> Seems safer and doesn't add much code.
> 
> With this, we lose an opportunity to capture some bugs during debugging.
> Right?

You can keep the VM_BUG_ON, or use VM_WARN_ON_ONCE.

Being off by a page in the statistics isn't a big deal and not a good reason
to crash the kernel (even a debug one).  But a normal build should use the
safe behaviour if that isn't hard.
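
A sketch of that combination -- keep a debug-build warning but recover on a
normal build (whether VM_WARN_ON_ONCE or something else is used is up to
the next version):

        } else if (unlikely(is_swap_pmd(*pmd))) {
                swp_entry_t entry = pmd_to_swp_entry(*pmd);

                /* Warn in debug builds, but skip the unexpected entry
                 * instead of crashing; being wrong here only skews the
                 * statistics. */
                VM_WARN_ON_ONCE(!is_migration_entry(entry));
                if (is_migration_entry(entry))
                        page = migration_entry_to_page(entry);
        }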

> 
> Best Regards,
> Huang, Ying
> 
>>> +	} else {
>>> +		return;
>>> +	}
>>>    	if (IS_ERR_OR_NULL(page))
>>>    		return;
>>>    	if (PageAnon(page))
>>> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>>
>>>    	ptl = pmd_trans_huge_lock(pmd, vma);
>>>    	if (ptl) {
>>> -		if (pmd_present(*pmd))
>>> -			smaps_pmd_entry(pmd, addr, walk);
>>> +		smaps_pmd_entry(pmd, addr, walk);
>>>    		spin_unlock(ptl);
>>>    		goto out;
>>>    	}
>>>
Huang, Ying April 1, 2020, 6:20 a.m. UTC | #7
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:

> On 01/04/2020 05.31, Huang, Ying wrote:
>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>
>>> On 31/03/2020 11.56, Huang, Ying wrote:
>>>> From: Huang Ying <ying.huang@intel.com>
>>>>
>>>> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
>>>> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
>>>> and processing of such entries is added.
>>>>
>>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>>>> Cc: Yang Shi <yang.shi@linux.alibaba.com>
>>>> ---
>>>>    fs/proc/task_mmu.c | 16 ++++++++++++----
>>>>    1 file changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>>> index 8d382d4ec067..b5b3aef8cb3b 100644
>>>> --- a/fs/proc/task_mmu.c
>>>> +++ b/fs/proc/task_mmu.c
>>>> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>>>    	bool locked = !!(vma->vm_flags & VM_LOCKED);
>>>>    	struct page *page;
>>>
>>>          struct page *page = NULL;
>>
>> Looks good.  Will do this in the next version.
>>
>>>>
>>>> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
>>>> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>>> +	if (pmd_present(*pmd)) {
>>>> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
>>>> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>>>> +	} else if (unlikely(is_swap_pmd(*pmd))) {
>>>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>>>> +
>>>> +		VM_BUG_ON(!is_migration_entry(entry));
>>>> +		page = migration_entry_to_page(entry);
>>>
>>>                  if (is_migration_entry(entry))
>>>                          page = migration_entry_to_page(entry);
>>>
>>> Seems safer and doesn't add much code.
>>
>> With this, we lose an opportunity to capture some bugs during debugging.
>> Right?
>
> You can keep the VM_BUG_ON, or use VM_WARN_ON_ONCE.
>
> Being off by a page in the statistics isn't a big deal and not a good reason
> to crash the kernel (even a debug one).  But a normal build should use the
> safe behaviour if that isn't hard.

Sounds reasonable!  Will revise the code.  Thanks!

Best Regards,
Huang, Ying

>>
>> Best Regards,
>> Huang, Ying
>>
>>>> +	} else {
>>>> +		return;
>>>> +	}
>>>>    	if (IS_ERR_OR_NULL(page))
>>>>    		return;
>>>>    	if (PageAnon(page))
>>>> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>>>
>>>>    	ptl = pmd_trans_huge_lock(pmd, vma);
>>>>    	if (ptl) {
>>>> -		if (pmd_present(*pmd))
>>>> -			smaps_pmd_entry(pmd, addr, walk);
>>>> +		smaps_pmd_entry(pmd, addr, walk);
>>>>    		spin_unlock(ptl);
>>>>    		goto out;
>>>>    	}
>>>>
Andrew Morton April 1, 2020, 11:04 p.m. UTC | #8
On Tue, 31 Mar 2020 16:56:04 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

> From: Huang Ying <ying.huang@intel.com>
> 
> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
> and processing of such entries is added.

It would be helpful to show the before-and-after output in the changelog.

> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>  	bool locked = !!(vma->vm_flags & VM_LOCKED);
>  	struct page *page;
>  
> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +	if (pmd_present(*pmd)) {
> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
> +	} else if (unlikely(is_swap_pmd(*pmd))) {
> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
> +
> +		VM_BUG_ON(!is_migration_entry(entry));

I don't think this justifies nuking the kernel.  A
WARN()-and-recover would be better.

> +		page = migration_entry_to_page(entry);
> +	} else {
> +		return;
> +	}
>  	if (IS_ERR_OR_NULL(page))
>  		return;
>  	if (PageAnon(page))
> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  
>  	ptl = pmd_trans_huge_lock(pmd, vma);
>  	if (ptl) {
> -		if (pmd_present(*pmd))
> -			smaps_pmd_entry(pmd, addr, walk);
> +		smaps_pmd_entry(pmd, addr, walk);
>  		spin_unlock(ptl);
>  		goto out;
>  	}
Huang, Ying April 2, 2020, 1:42 a.m. UTC | #9
Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 31 Mar 2020 16:56:04 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>
>> From: Huang Ying <ying.huang@intel.com>
>> 
>> Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
>> table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
>> and processing of such entries is added.
>
> It would be helpful to show the before-and-after output in the changelog.

OK.

>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -548,8 +548,17 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
>>  	bool locked = !!(vma->vm_flags & VM_LOCKED);
>>  	struct page *page;
>>  
>> -	/* FOLL_DUMP will return -EFAULT on huge zero page */
>> -	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +	if (pmd_present(*pmd)) {
>> +		/* FOLL_DUMP will return -EFAULT on huge zero page */
>> +		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +	} else if (unlikely(is_swap_pmd(*pmd))) {
>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> +
>> +		VM_BUG_ON(!is_migration_entry(entry));
>
> I don't think this justifies nuking the kernel.  A
> WARN()-and-recover would be better.

Yes.  Will change this in the next version.

Best Regards,
Huang, Ying

>> +		page = migration_entry_to_page(entry);
>> +	} else {
>> +		return;
>> +	}
>>  	if (IS_ERR_OR_NULL(page))
>>  		return;
>>  	if (PageAnon(page))
>> @@ -578,8 +587,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  
>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>>  	if (ptl) {
>> -		if (pmd_present(*pmd))
>> -			smaps_pmd_entry(pmd, addr, walk);
>> +		smaps_pmd_entry(pmd, addr, walk);
>>  		spin_unlock(ptl);
>>  		goto out;
>>  	}
Huang, Ying April 2, 2020, 1:49 a.m. UTC | #10
Zi Yan <ziy@nvidia.com> writes:

> On 31 Mar 2020, at 4:56, Huang, Ying wrote:
>>
>> From: Huang Ying <ying.huang@intel.com>
>> -       /* FOLL_DUMP will return -EFAULT on huge zero page */
>> -       page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +       if (pmd_present(*pmd)) {
>> +               /* FOLL_DUMP will return -EFAULT on huge zero page */
>> +               page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
>> +       } else if (unlikely(is_swap_pmd(*pmd))) {
>
> Should be:
>           } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {

I thought about this again.  This can reduce the code size when THP
migration isn't enabled, since thp_migration_supported() is a compile-time
constant in that case.  I will do this in the next version.

Best Regards,
Huang, Ying
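
For completeness, folding the review feedback together (page initialized to
NULL, the thp_migration_supported() guard, and warn-and-recover instead of
VM_BUG_ON), the reworked hunk for a next version might look roughly like
the sketch below.  It is assembled from the thread, not taken from an
actual v2:

        struct page *page = NULL;

        if (pmd_present(*pmd)) {
                /* FOLL_DUMP will return -EFAULT on huge zero page */
                page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
        } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
                swp_entry_t entry = pmd_to_swp_entry(*pmd);

                /* Only migration entries are expected here; warn in
                 * debug builds and skip anything else. */
                VM_WARN_ON_ONCE(!is_migration_entry(entry));
                if (is_migration_entry(entry))
                        page = migration_entry_to_page(entry);
        }
        if (IS_ERR_OR_NULL(page))
                return;
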
Michal Hocko April 2, 2020, 6:27 a.m. UTC | #11
On Wed 01-04-20 16:04:32, Andrew Morton wrote:
> On Tue, 31 Mar 2020 16:56:04 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
> 
> > From: Huang Ying <ying.huang@intel.com>
> > 
> > Currently, when /proc/PID/smaps is read, any PMD migration entry in the page
> > table is simply ignored.  To improve the accuracy of /proc/PID/smaps, parsing
> > and processing of such entries is added.
> 
> It would be helpful to show the before-and-after output in the changelog.

Migration entries are ephemeral.  Is this observable in practice?  I suspect
this is primarily motivated by reading the code rather than by hitting the
actual problem.

Patch

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8d382d4ec067..b5b3aef8cb3b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -548,8 +548,17 @@  static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 	bool locked = !!(vma->vm_flags & VM_LOCKED);
 	struct page *page;
 
-	/* FOLL_DUMP will return -EFAULT on huge zero page */
-	page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+	if (pmd_present(*pmd)) {
+		/* FOLL_DUMP will return -EFAULT on huge zero page */
+		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+	} else if (unlikely(is_swap_pmd(*pmd))) {
+		swp_entry_t entry = pmd_to_swp_entry(*pmd);
+
+		VM_BUG_ON(!is_migration_entry(entry));
+		page = migration_entry_to_page(entry);
+	} else {
+		return;
+	}
 	if (IS_ERR_OR_NULL(page))
 		return;
 	if (PageAnon(page))
@@ -578,8 +587,7 @@  static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
-		if (pmd_present(*pmd))
-			smaps_pmd_entry(pmd, addr, walk);
+		smaps_pmd_entry(pmd, addr, walk);
 		spin_unlock(ptl);
 		goto out;
 	}