Message ID | 20201009215952.2726-1-rcampbell@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/memcg: fix device private memcg accounting | expand |
On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <rcampbell@nvidia.com> wrote: > The code in mc_handle_swap_pte() checks for non_swap_entry() and returns > NULL before checking is_device_private_entry() so device private pages > are never handled. > Fix this by checking for non_swap_entry() after handling device private > swap PTEs. > > Cc: stable@vger.kernel.org I was going to ask "what are the end-user visible effects of the bug". This is important information with a cc:stable. > > I'm not sure exactly how to test this. I ran the HMM self tests but > that is a minimal sanity check. I think moving the self test from one > memory cgroup to another while it is running would exercise this patch. > I'm looking at how the test could move itself to another group after > migrating some anonymous memory to the test driver. > But this makes me suspect the answer is "there aren't any that we know of". Are you sure a cc:stable is warranted?
On 10/9/20 3:50 PM, Andrew Morton wrote: > On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <rcampbell@nvidia.com> wrote: > >> The code in mc_handle_swap_pte() checks for non_swap_entry() and returns >> NULL before checking is_device_private_entry() so device private pages >> are never handled. >> Fix this by checking for non_swap_entry() after handling device private >> swap PTEs. >> >> Cc: stable@vger.kernel.org > > I was going to ask "what are the end-user visible effects of the bug". > This is important information with a cc:stable. > >> >> I'm not sure exactly how to test this. I ran the HMM self tests but >> that is a minimal sanity check. I think moving the self test from one >> memory cgroup to another while it is running would exercise this patch. >> I'm looking at how the test could move itself to another group after >> migrating some anonymous memory to the test driver. >> > > But this makes me suspect the answer is "there aren't any that we know > of". Are you sure a cc:stable is warranted? > I assume the memory cgroup accounting would be off somehow when moving a process to another memory cgroup. Currently, the device private page is charged like a normal anonymous page when allocated and is uncharged when the page is freed so I think that path is OK. Maybe someone who knows more about memory cgroup accounting can comment?
On Fri, Oct 09, 2020 at 05:00:37PM -0700, Ralph Campbell wrote: > > On 10/9/20 3:50 PM, Andrew Morton wrote: > > On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <rcampbell@nvidia.com> wrote: > > > > > The code in mc_handle_swap_pte() checks for non_swap_entry() and returns > > > NULL before checking is_device_private_entry() so device private pages > > > are never handled. > > > Fix this by checking for non_swap_entry() after handling device private > > > swap PTEs. The fix looks good to me. Acked-by: Johannes Weiner <hannes@cmpxchg.org> > > But this makes me suspect the answer is "there aren't any that we know > > of". Are you sure a cc:stable is warranted? > > > > I assume the memory cgroup accounting would be off somehow when moving > a process to another memory cgroup. > Currently, the device private page is charged like a normal anonymous page > when allocated and is uncharged when the page is freed so I think that path is OK. > Maybe someone who knows more about memory cgroup accounting can comment? As for whether to CC stable, I'm leaning toward no: - When moving tasks, we'd leave their device pages behind in the old cgroup. This isn't great, but it doesn't cause counter imbalances or corruption or anything - we also skip locked pages, we used to skip pages mapped by more than one pte, the user can select whether to move pages along tasks at all, and if so, whether only anon or file. - Charge moving itself is a bit of a questionable feature, and users have been moving away from it. Leaving tasks in a cgroup and changing the configuration is a heck of a lot cheaper than moving potentially gigabytes of pages to another configuration domain. - According to the Fixes tag, this isn't a regression, either. Since their inception, we have never migrated device pages.
On 10/12/20 6:28 AM, Johannes Weiner wrote: > On Fri, Oct 09, 2020 at 05:00:37PM -0700, Ralph Campbell wrote: >> >> On 10/9/20 3:50 PM, Andrew Morton wrote: >>> On Fri, 9 Oct 2020 14:59:52 -0700 Ralph Campbell <rcampbell@nvidia.com> wrote: >>> >>>> The code in mc_handle_swap_pte() checks for non_swap_entry() and returns >>>> NULL before checking is_device_private_entry() so device private pages >>>> are never handled. >>>> Fix this by checking for non_swap_entry() after handling device private >>>> swap PTEs. > > The fix looks good to me. > > Acked-by: Johannes Weiner <hannes@cmpxchg.org> > >>> But this makes me suspect the answer is "there aren't any that we know >>> of". Are you sure a cc:stable is warranted? >>> >> >> I assume the memory cgroup accounting would be off somehow when moving >> a process to another memory cgroup. >> Currently, the device private page is charged like a normal anonymous page >> when allocated and is uncharged when the page is freed so I think that path is OK. >> Maybe someone who knows more about memory cgroup accounting can comment? > > As for whether to CC stable, I'm leaning toward no: > > - When moving tasks, we'd leave their device pages behind in the old > cgroup. This isn't great, but it doesn't cause counter imbalances or > corruption or anything - we also skip locked pages, we used to skip > pages mapped by more than one pte, the user can select whether to > move pages along tasks at all, and if so, whether only anon or file. > > - Charge moving itself is a bit of a questionable feature, and users > have been moving away from it. Leaving tasks in a cgroup and > changing the configuration is a heck of a lot cheaper than moving > potentially gigabytes of pages to another configuration domain. > > - According to the Fixes tag, this isn't a regression, either. Since > their inception, we have never migrated device pages. Thanks for the Acked-by and the comments. I assume Andrew will update the tags when queuing this up unless he wants me to resend.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2636f8bad908..3a24e3b619f5 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5549,7 +5549,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, struct page *page = NULL; swp_entry_t ent = pte_to_swp_entry(ptent); - if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent)) + if (!(mc.flags & MOVE_ANON)) return NULL; /* @@ -5568,6 +5568,9 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma, return page; } + if (non_swap_entry(ent)) + return NULL; + /* * Because lookup_swap_cache() updates some statistics counter, * we call find_get_page() with swapper_space directly.
The code in mc_handle_swap_pte() checks for non_swap_entry() and returns NULL before checking is_device_private_entry() so device private pages are never handled. Fix this by checking for non_swap_entry() after handling device private swap PTEs. Cc: stable@vger.kernel.org Fixes: c733a82874a7 ("mm/memcontrol: support MEMORY_DEVICE_PRIVATE") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> --- I'm not sure exactly how to test this. I ran the HMM self tests but that is a minimal sanity check. I think moving the self test from one memory cgroup to another while it is running would exercise this patch. I'm looking at how the test could move itself to another group after migrating some anonymous memory to the test driver. This applies cleanly to linux-5.9.0-rc8-mm1 and is for Andrew Morton's tree. mm/memcontrol.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)