Message ID | 1584885427-4952-1-git-send-email-kernelfans@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/rmap: fix the handling of device private page in try_to_unmap_one() | expand |
On Sun 22-03-20 21:57:07, Pingfan Liu wrote: > For zone_device, migration can only happen on is_device_private_page(page). > Correct the logic in try_to_unmap_one(). Maybe it is just me lacking knowledge in the zone_device ZOO. But this really deserves a much more detailed explanation IMHO. It seems a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration") deliberately made the decision to allow unmapping these pages? Is the check just wrong, inncomplete? Why? What is the real user visible problem here? > Signed-off-by: Pingfan Liu <kernelfans@gmail.com> > Cc: Jérôme Glisse <jglisse@redhat.com> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > To: linux-mm@kvack.org > --- > mm/rmap.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index b838647..ffadf3e 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > > if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && > is_zone_device_page(page) && !is_device_private_page(page)) > - return true; > + return false; > > if (flags & TTU_SPLIT_HUGE_PMD) { > split_huge_pmd_address(vma, address, > @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > > if (IS_ENABLED(CONFIG_MIGRATION) && > (flags & TTU_MIGRATION) && > - is_zone_device_page(page)) { > + is_device_private_page(page)) { > swp_entry_t entry; > pte_t swp_pte; > > -- > 2.7.5
On 3/23/20 12:34 AM, Michal Hocko wrote: > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: >> For zone_device, migration can only happen on is_device_private_page(page). >> Correct the logic in try_to_unmap_one(). > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > this really deserves a much more detailed explanation IMHO. It seems > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > migration") deliberately made the decision to allow unmapping these > pages? Is the check just wrong, inncomplete? Why? > > What is the real user visible problem here? > I was going to guess that someone was having trouble with the behavior on non-device-private ZONE_DEVICE pages, but...I am also at a loss as to what triggered this patch. So, I have the exact same questions as Michal, plus really wondering what tests you are running, and on what hardware config. It's hard to get the right CC's, but probably Dan Williams and Ralph Campbell should also be added, if you are sure you need this. (I'm adding them now.) thanks,
On 23/3/20 12:57 am, Pingfan Liu wrote: > For zone_device, migration can only happen on is_device_private_page(page). > Correct the logic in try_to_unmap_one(). !DEVICE_PRIVATE implies it's a cache coherent page and we shouldn't bail out - Could you clarify what issue your fixing? Balbir Singh.
On 3/23/20 12:34 AM, Michal Hocko wrote: > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: >> For zone_device, migration can only happen on is_device_private_page(page). >> Correct the logic in try_to_unmap_one(). > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > this really deserves a much more detailed explanation IMHO. It seems > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > migration") deliberately made the decision to allow unmapping these > pages? Is the check just wrong, inncomplete? Why? > > What is the real user visible problem here? > >> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> >> Cc: Jérôme Glisse <jglisse@redhat.com> >> Cc: Michal Hocko <mhocko@kernel.org> >> Cc: John Hubbard <jhubbard@nvidia.com> >> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> >> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> To: linux-mm@kvack.org >> --- >> mm/rmap.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/mm/rmap.c b/mm/rmap.c >> index b838647..ffadf3e 100644 >> --- a/mm/rmap.c >> +++ b/mm/rmap.c >> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, >> >> if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && >> is_zone_device_page(page) && !is_device_private_page(page)) >> - return true; >> + return false; Pages can be mapped in multiple vmas. Returning false here will only break out of the loop and skip any other vmas mapping this page which is a minor optimization but shouldn't really affect what try_to_unmap_one() does. >> if (flags & TTU_SPLIT_HUGE_PMD) { >> split_huge_pmd_address(vma, address, >> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, >> >> if (IS_ENABLED(CONFIG_MIGRATION) && >> (flags & TTU_MIGRATION) && >> - is_zone_device_page(page)) { >> + is_device_private_page(page)) { >> swp_entry_t entry; >> pte_t swp_pte; Since the page was checked for !device private, this is more clear but shouldn't change anything.
On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: > > For zone_device, migration can only happen on is_device_private_page(page). > > Correct the logic in try_to_unmap_one(). > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > this really deserves a much more detailed explanation IMHO. It seems > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > migration") deliberately made the decision to allow unmapping these > pages? Is the check just wrong, inncomplete? Why? I am not quite sure about zone_device, but I will try to explain it later. But first of all, I think the code conflicts with the logic behind it. If try_to_unmap_one() success to unmap a page, then it should kill the pte, and return true. But the original code return true before the code like "ptep_clear_flush()" Now, I try to say about !device_private zone device. (Please pardon and correct me if I make a mistake) memmap_init_zone_device() raises an extra _refcount on all zone device. And private-device should lifts the count later, otherwise it can not migrate. But I did not find the exact place yet. While this extra _refcount will block migration, it is not the whole reason if a zone device page is mapped. If a zone device page is mapped, then I think the original code happen to work due to it skip the call of page_remove_rmap(), and in try_to_unmap(){ return !page_mapcount(page) ? true : false;}. > > What is the real user visible problem here? As explained, the original code happens to work, but it conflicts with the logic. Thanks, Pingfan
On Tue, Mar 24, 2020 at 7:32 AM John Hubbard <jhubbard@nvidia.com> wrote: > > On 3/23/20 12:34 AM, Michal Hocko wrote: > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: > >> For zone_device, migration can only happen on is_device_private_page(page). > >> Correct the logic in try_to_unmap_one(). > > > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > > this really deserves a much more detailed explanation IMHO. It seems > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > > migration") deliberately made the decision to allow unmapping these > > pages? Is the check just wrong, inncomplete? Why? > > > > What is the real user visible problem here? > > > > I was going to guess that someone was having trouble with the behavior on > non-device-private ZONE_DEVICE pages, but...I am also at a loss as to > what triggered this patch. So, I have the exact same questions as Michal, > plus really wondering what tests you are running, and on what hardware config. :), please see my reply to Michal. > > It's hard to get the right CC's, but probably Dan Williams and Ralph Campbell > should also be added, if you are sure you need this. (I'm adding them now.) Appreciate you to do that. In fact, I am not sure about some detail, and hope someone can correct me if I make mistake. Thanks, Pingfan
On Tue, Mar 24, 2020 at 8:04 AM Balbir Singh <bsingharora@gmail.com> wrote: > > On 23/3/20 12:57 am, Pingfan Liu wrote: > > For zone_device, migration can only happen on is_device_private_page(page). > > Correct the logic in try_to_unmap_one(). > > !DEVICE_PRIVATE implies it's a cache coherent page and we shouldn't bail > out - Could you clarify what issue your fixing? As I replied to Michal, I tried to fix the code logic, but not really hit a bug. Thanks, Pingfan
Thanks for your explanation. On Tue, Mar 24, 2020 at 8:20 AM Ralph Campbell <rcampbell@nvidia.com> wrote: > > > On 3/23/20 12:34 AM, Michal Hocko wrote: > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: > >> For zone_device, migration can only happen on is_device_private_page(page). > >> Correct the logic in try_to_unmap_one(). > > > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > > this really deserves a much more detailed explanation IMHO. It seems > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > > migration") deliberately made the decision to allow unmapping these > > pages? Is the check just wrong, inncomplete? Why? > > > > What is the real user visible problem here? > > > >> Signed-off-by: Pingfan Liu <kernelfans@gmail.com> > >> Cc: Jérôme Glisse <jglisse@redhat.com> > >> Cc: Michal Hocko <mhocko@kernel.org> > >> Cc: John Hubbard <jhubbard@nvidia.com> > >> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > >> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> > >> Cc: Andrew Morton <akpm@linux-foundation.org> > >> To: linux-mm@kvack.org > >> --- > >> mm/rmap.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/mm/rmap.c b/mm/rmap.c > >> index b838647..ffadf3e 100644 > >> --- a/mm/rmap.c > >> +++ b/mm/rmap.c > >> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > >> > >> if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && > >> is_zone_device_page(page) && !is_device_private_page(page)) > >> - return true; > >> + return false; > > Pages can be mapped in multiple vmas. Returning false here will only break out > of the loop and skip any other vmas mapping this page which is a minor optimization > but shouldn't really affect what try_to_unmap_one() does. Yes, it returns false to terminate further iteration. And I think fs-dax page would not go through this path. The unmap of fs-dax should go through: umount fs, and arch_remove_memory(). > > >> if (flags & TTU_SPLIT_HUGE_PMD) { > >> split_huge_pmd_address(vma, address, > >> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, > >> > >> if (IS_ENABLED(CONFIG_MIGRATION) && > >> (flags & TTU_MIGRATION) && > >> - is_zone_device_page(page)) { > >> + is_device_private_page(page)) { > >> swp_entry_t entry; > >> pte_t swp_pte; > > Since the page was checked for !device private, this is more clear but shouldn't > change anything. Yes. It just makes things clear. Thanks, Pingfan
On Tue 24-03-20 11:47:20, Pingfan Liu wrote: > On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: > > > For zone_device, migration can only happen on is_device_private_page(page). > > > Correct the logic in try_to_unmap_one(). > > > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > > this really deserves a much more detailed explanation IMHO. It seems > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > > migration") deliberately made the decision to allow unmapping these > > pages? Is the check just wrong, inncomplete? Why? > I am not quite sure about zone_device, but I will try to explain it later. > > But first of all, I think the code conflicts with the logic behind it. > If try_to_unmap_one() success to unmap a page, then it should kill the > pte, and return true. But the original code return true before the > code like "ptep_clear_flush()" > > Now, I try to say about !device_private zone device. (Please pardon > and correct me if I make a mistake) > memmap_init_zone_device() raises an extra _refcount on all zone > device. And private-device should lifts the count later, otherwise it > can not migrate. But I did not find the exact place yet. > > While this extra _refcount will block migration, it is not the whole > reason if a zone device page is mapped. > > If a zone device page is mapped, then I think the original code > happen to work due to it skip the call of page_remove_rmap(), and in > try_to_unmap(){ return !page_mapcount(page) ? true : false;}. OK, you made me look more closely. The lack of documentation and therefore the expected semantic doesn't really help. So we can only deduce from the existing code which is a recipe for cargo cult programming :/ The only difference betweena rmap_one returning true and false is that the VMA walk stops for the later and done() callback is not called. Does rmap_one success means the mapping for the vma has been torn down? No. As we can see for the munlock case which just shows hot vague the semantic of this callback might be. I believe the zone device path was just copying it. Is that wrong? Well, it is less optimal than necessary because the property we are checking is not VMA specific so all other VMAs (if there are any at all) will have the same to say. So the only last remaining point is the done() callback. That one is documented as a check. There is no note about potential side effects but the current implementation is really only a check so skipping it doesn't make any real difference. > > What is the real user visible problem here? > As explained, the original code happens to work, but it conflicts with > the logic. Your changelog should be explicit about this being a pure code refinement/cleanup without any functional changes. The rmap walk and callbacks would benefit from a proper documentation. Hint...
On Tue, Mar 24, 2020 at 5:14 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Tue 24-03-20 11:47:20, Pingfan Liu wrote: > > On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: > > > > For zone_device, migration can only happen on is_device_private_page(page). > > > > Correct the logic in try_to_unmap_one(). > > > > > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > > > this really deserves a much more detailed explanation IMHO. It seems > > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > > > migration") deliberately made the decision to allow unmapping these > > > pages? Is the check just wrong, inncomplete? Why? > > I am not quite sure about zone_device, but I will try to explain it later. > > > > But first of all, I think the code conflicts with the logic behind it. > > If try_to_unmap_one() success to unmap a page, then it should kill the > > pte, and return true. But the original code return true before the > > code like "ptep_clear_flush()" > > > > Now, I try to say about !device_private zone device. (Please pardon > > and correct me if I make a mistake) > > memmap_init_zone_device() raises an extra _refcount on all zone > > device. And private-device should lifts the count later, otherwise it > > can not migrate. But I did not find the exact place yet. > > > > While this extra _refcount will block migration, it is not the whole > > reason if a zone device page is mapped. > > > > If a zone device page is mapped, then I think the original code > > happen to work due to it skip the call of page_remove_rmap(), and in > > try_to_unmap(){ return !page_mapcount(page) ? true : false;}. > > OK, you made me look more closely. > > The lack of documentation and therefore the expected semantic doesn't > really help. So we can only deduce from the existing code which is a > recipe for cargo cult programming :/ > > The only difference betweena rmap_one returning true and false is that > the VMA walk stops for the later and done() callback is not called. > Does rmap_one success means the mapping for the vma has been torn down? > No. As we can see for the munlock case which just shows hot vague the > semantic of this callback might be. > > I believe the zone device path was just copying it. Is that wrong? > Well, it is less optimal than necessary because the property we are > checking is not VMA specific so all other VMAs (if there are any at all) > will have the same to say. > > So the only last remaining point is the done() callback. That one is > documented as a check. There is no note about potential side effects but > the current implementation is really only a check so skipping it doesn't > make any real difference. > > > > What is the real user visible problem here? > > As explained, the original code happens to work, but it conflicts with > > the logic. > > Your changelog should be explicit about this being a pure code > refinement/cleanup without any functional changes. OK, I will do that. > > The rmap walk and callbacks would benefit from a proper documentation. > Hint... I will add some note around try_to_unmap_one(), and update the commit log. (Hope my understanding of zone device is right, and will cc more people for comment) Thanks, Pingfan
On Wed, Mar 25, 2020 at 6:54 PM Pingfan Liu <kernelfans@gmail.com> wrote: > > On Tue, Mar 24, 2020 at 5:14 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Tue 24-03-20 11:47:20, Pingfan Liu wrote: > > > On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote: > > > > > For zone_device, migration can only happen on is_device_private_page(page). > > > > > Correct the logic in try_to_unmap_one(). > > > > > > > > Maybe it is just me lacking knowledge in the zone_device ZOO. But > > > > this really deserves a much more detailed explanation IMHO. It seems > > > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in > > > > migration") deliberately made the decision to allow unmapping these > > > > pages? Is the check just wrong, inncomplete? Why? > > > I am not quite sure about zone_device, but I will try to explain it later. > > > > > > But first of all, I think the code conflicts with the logic behind it. > > > If try_to_unmap_one() success to unmap a page, then it should kill the > > > pte, and return true. But the original code return true before the > > > code like "ptep_clear_flush()" > > > > > > Now, I try to say about !device_private zone device. (Please pardon > > > and correct me if I make a mistake) > > > memmap_init_zone_device() raises an extra _refcount on all zone > > > device. And private-device should lifts the count later, otherwise it > > > can not migrate. But I did not find the exact place yet. > > > > > > While this extra _refcount will block migration, it is not the whole > > > reason if a zone device page is mapped. > > > > > > If a zone device page is mapped, then I think the original code > > > happen to work due to it skip the call of page_remove_rmap(), and in > > > try_to_unmap(){ return !page_mapcount(page) ? true : false;}. > > > > OK, you made me look more closely. > > > > The lack of documentation and therefore the expected semantic doesn't > > really help. So we can only deduce from the existing code which is a > > recipe for cargo cult programming :/ > > > > The only difference betweena rmap_one returning true and false is that > > the VMA walk stops for the later and done() callback is not called. > > Does rmap_one success means the mapping for the vma has been torn down? > > No. As we can see for the munlock case which just shows hot vague the > > semantic of this callback might be. > > > > I believe the zone device path was just copying it. Is that wrong? > > Well, it is less optimal than necessary because the property we are > > checking is not VMA specific so all other VMAs (if there are any at all) > > will have the same to say. > > > > So the only last remaining point is the done() callback. That one is > > documented as a check. There is no note about potential side effects but > > the current implementation is really only a check so skipping it doesn't > > make any real difference. > > > > > > What is the real user visible problem here? > > > As explained, the original code happens to work, but it conflicts with > > > the logic. > > > > Your changelog should be explicit about this being a pure code > > refinement/cleanup without any functional changes. > OK, I will do that. It took me some time to make clear try_to_munlock(). And now I can make some notes. I will send out V2 soon. Thanks, Pingfan
diff --git a/mm/rmap.c b/mm/rmap.c index b838647..ffadf3e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && is_zone_device_page(page) && !is_device_private_page(page)) - return true; + return false; if (flags & TTU_SPLIT_HUGE_PMD) { split_huge_pmd_address(vma, address, @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) && - is_zone_device_page(page)) { + is_device_private_page(page)) { swp_entry_t entry; pte_t swp_pte;
For zone_device, migration can only happen on is_device_private_page(page). Correct the logic in try_to_unmap_one(). Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> To: linux-mm@kvack.org --- mm/rmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- 2.7.5