Message ID | 20181119010924.177177-1-yuzhao@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] mm: fix swap offset when replacing shmem page |
On Sun, 18 Nov 2018, Yu Zhao wrote:

> We used to have a single swap address space with swp_entry_t.val
> as its radix tree index. This is not the case anymore. Now Each
> swp_type() has its own address space and should use swp_offset()
> as radix tree index.
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>

This fix is a great find, thank you! But completely mis-described!
And could you do a smaller patch, keeping swap_index, that can go to
stable without getting into trouble with the recent xarrifications?

Fixes: bde05d1ccd51 ("shmem: replace page if mapping excludes its zone")
Cc: stable@vger.kernel.org # 3.5+

Seems shmem_replace_page() has been wrong since the day I wrote it:
good enough to work on swap "type" 0, which is all most people ever use
(especially those few who need shmem_replace_page() at all), but broken
once there are any non-0 swp_type bits set in the higher order bits.

> ---
>  mm/shmem.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d44991ea5ed4..685faa3e0191 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1509,11 +1509,13 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>  {
>  	struct page *oldpage, *newpage;
>  	struct address_space *swap_mapping;
> -	pgoff_t swap_index;
> +	swp_entry_t entry;

Please keep swap_index as well as adding entry.

>  	int error;
>
> +	VM_BUG_ON(!PageSwapCache(*pagep));
> +

I'd prefer you to drop that, it has no bearing on this patch;
we used to have it, along with lots of other VM_BUG_ONs in here,
but they outlived their usefulness, and don't need reintroducing -
they didn't help at all to prevent the actual bug you've found.

>  	oldpage = *pagep;
> -	swap_index = page_private(oldpage);
> +	entry.val = page_private(oldpage);

	entry.val = page_private(oldpage);
	swap_index = swp_offset(entry);

>  	swap_mapping = page_mapping(oldpage);
>
>  	/*
> @@ -1532,7 +1534,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>  	__SetPageLocked(newpage);
>  	__SetPageSwapBacked(newpage);
>  	SetPageUptodate(newpage);
> -	set_page_private(newpage, swap_index);
> +	set_page_private(newpage, entry.val);

Yes.

>  	SetPageSwapCache(newpage);
>
>  	/*
> @@ -1540,7 +1542,8 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>  	 * a nice clean interface for us to replace oldpage by newpage there.
>  	 */
>  	xa_lock_irq(&swap_mapping->i_pages);
> -	error = shmem_replace_entry(swap_mapping, swap_index, oldpage, newpage);
> +	error = shmem_replace_entry(swap_mapping, swp_offset(entry),
> +				    oldpage, newpage);

I'd prefer to omit that hunk, to avoid the xa_lock_irq() in the context;
the patch is just as good if we keep the swap_index variable.

>  	if (!error) {
>  		__inc_node_page_state(newpage, NR_FILE_PAGES);
>  		__dec_node_page_state(oldpage, NR_FILE_PAGES);
> --
> 2.19.1.1215.g8438c0b245-goog

Thanks,
Hugh
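To make the review comments above concrete, here is a small user-space sketch of the situation being discussed. It is not kernel code: every toy_* name, the table sizes, and the position of the type bits are assumptions made up for illustration. It models one lookup table per swap type, keyed by offset, and shows a replacement failing when the full entry value is used as the index and succeeding when the offset is used, while the page's private field keeps the full value either way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: toy sizes and bit layout, chosen only for illustration. */
#define TOY_NR_TYPES    4
#define TOY_NR_SLOTS    16
#define TOY_TYPE_SHIFT  58
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for struct page; priv plays the role of page_private(). */
struct toy_page {
	uint64_t priv;
};

/* One table per swap type, each indexed by offset within that type. */
static struct toy_page *toy_swap_cache[TOY_NR_TYPES][TOY_NR_SLOTS];

static inline unsigned toy_type(uint64_t val)
{
	return (unsigned)(val >> TOY_TYPE_SHIFT);
}

static inline uint64_t toy_offset(uint64_t val)
{
	return val & TOY_OFFSET_MASK;
}

/* Swap oldpage for newpage at index; -1 stands in for an error return. */
static int toy_replace_entry(struct toy_page **mapping, uint64_t index,
			     struct toy_page *oldpage, struct toy_page *newpage)
{
	if (index >= TOY_NR_SLOTS || mapping[index] != oldpage)
		return -1;
	mapping[index] = newpage;
	return 0;
}

/*
 * use_offset == 0 models the old behaviour (index = full entry value),
 * use_offset != 0 models the suggested one (index = offset only).
 */
static int toy_replace_page(struct toy_page *oldpage, struct toy_page *newpage,
			    int use_offset)
{
	uint64_t val = oldpage->priv;
	uint64_t swap_index = use_offset ? toy_offset(val) : val;

	assert(toy_type(val) < TOY_NR_TYPES);

	/* The new page must carry the full entry, type bits included. */
	newpage->priv = val;
	return toy_replace_entry(toy_swap_cache[toy_type(val)], swap_index,
				 oldpage, newpage);
}
```

With a page stored at type 1, offset 3, calling toy_replace_page() with use_offset set to 0 returns -1 and leaves the table unchanged, while use_offset set to 1 returns 0. For type 0 both variants behave the same, which matches the remark that the function was good enough for swap "type" 0.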
On Mon, Nov 19, 2018 at 02:11:27PM -0800, Hugh Dickins wrote:
> On Sun, 18 Nov 2018, Yu Zhao wrote:
>
> > We used to have a single swap address space with swp_entry_t.val
> > as its radix tree index. This is not the case anymore. Now Each
> > swp_type() has its own address space and should use swp_offset()
> > as radix tree index.
> >
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
>
> This fix is a great find, thank you! But completely mis-described!

Yes, now I remember making swap offset as key was done long after per
swap device radix tree.

> And could you do a smaller patch, keeping swap_index, that can go to
> stable without getting into trouble with the recent xarrifications?
>
> Fixes: bde05d1ccd51 ("shmem: replace page if mapping excludes its zone")
> Cc: stable@vger.kernel.org # 3.5+
>
> Seems shmem_replace_page() has been wrong since the day I wrote it:
> good enough to work on swap "type" 0, which is all most people ever use
> (especially those few who need shmem_replace_page() at all), but broken
> once there are any non-0 swp_type bits set in the higher order bits.

But you did get it right when you wrote the function, which was before
the per swap device radix tree. so
Fixes: f6ab1f7f6b2d ("mm, swap: use offset of swap entry as key of swap cache")
looks good?

> > ---
> >  mm/shmem.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index d44991ea5ed4..685faa3e0191 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1509,11 +1509,13 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
> >  {
> >  	struct page *oldpage, *newpage;
> >  	struct address_space *swap_mapping;
> > -	pgoff_t swap_index;
> > +	swp_entry_t entry;
>
> Please keep swap_index as well as adding entry.

Ack.

> >  	int error;
> >
> > +	VM_BUG_ON(!PageSwapCache(*pagep));
> > +
>
> I'd prefer you to drop that, it has no bearing on this patch;
> we used to have it, along with lots of other VM_BUG_ONs in here,
> but they outlived their usefulness, and don't need reintroducing -
> they didn't help at all to prevent the actual bug you've found.
>
> >  	oldpage = *pagep;
> > -	swap_index = page_private(oldpage);
> > +	entry.val = page_private(oldpage);
>
> 	entry.val = page_private(oldpage);
> 	swap_index = swp_offset(entry);
>
> >  	swap_mapping = page_mapping(oldpage);
> >
> >  	/*
> > @@ -1532,7 +1534,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
> >  	__SetPageLocked(newpage);
> >  	__SetPageSwapBacked(newpage);
> >  	SetPageUptodate(newpage);
> > -	set_page_private(newpage, swap_index);
> > +	set_page_private(newpage, entry.val);
>
> Yes.
>
> >  	SetPageSwapCache(newpage);
> >
> >  	/*
> > @@ -1540,7 +1542,8 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
> >  	 * a nice clean interface for us to replace oldpage by newpage there.
> >  	 */
> >  	xa_lock_irq(&swap_mapping->i_pages);
> > -	error = shmem_replace_entry(swap_mapping, swap_index, oldpage, newpage);
> > +	error = shmem_replace_entry(swap_mapping, swp_offset(entry),
> > +				    oldpage, newpage);
>
> I'd prefer to omit that hunk, to avoid the xa_lock_irq() in the context;
> the patch is just as good if we keep the swap_index variable.
>
> >  	if (!error) {
> >  		__inc_node_page_state(newpage, NR_FILE_PAGES);
> >  		__dec_node_page_state(oldpage, NR_FILE_PAGES);
> > --
> > 2.19.1.1215.g8438c0b245-goog
>
> Thanks,
> Hugh
On Mon, 19 Nov 2018, Yu Zhao wrote:
> On Mon, Nov 19, 2018 at 02:11:27PM -0800, Hugh Dickins wrote:
> > On Sun, 18 Nov 2018, Yu Zhao wrote:
> >
> > > We used to have a single swap address space with swp_entry_t.val
> > > as its radix tree index. This is not the case anymore. Now Each
> > > swp_type() has its own address space and should use swp_offset()
> > > as radix tree index.
> > >
> > > Signed-off-by: Yu Zhao <yuzhao@google.com>
> >
> > This fix is a great find, thank you! But completely mis-described!
>
> Yes, now I remember making swap offset as key was done long after per
> swap device radix tree.
>
> > And could you do a smaller patch, keeping swap_index, that can go to
> > stable without getting into trouble with the recent xarrifications?
> >
> > Fixes: bde05d1ccd51 ("shmem: replace page if mapping excludes its zone")
> > Cc: stable@vger.kernel.org # 3.5+
> >
> > Seems shmem_replace_page() has been wrong since the day I wrote it:
> > good enough to work on swap "type" 0, which is all most people ever use
> > (especially those few who need shmem_replace_page() at all), but broken
> > once there are any non-0 swp_type bits set in the higher order bits.
>
> But you did get it right when you wrote the function, which was before
> the per swap device radix tree. so
> Fixes: f6ab1f7f6b2d ("mm, swap: use offset of swap entry as key of swap cache")
> looks good?

Oh, you're right, thank you. Yes, the fix is to that one, in 4.9 onwards.

I don't much like my original use of the name "swap_index", when it was
not the index in a swapfile (though it was the index in the radix tree);
but it will become a correct name with your patch.

Though Matthew Wilcox seems to want us to avoid saying "radix tree"...

Hugh
On Mon, Nov 19, 2018 at 09:07:27PM -0800, Hugh Dickins wrote:
> I don't much like my original use of the name "swap_index", when it was
> not the index in a swapfile (though it was the index in the radix tree);
> but it will become a correct name with your patch.
>
> Though Matthew Wilcox seems to want us to avoid saying "radix tree"...

Naming is hard ... but the Linux radix tree looks almost nothing like
a classic computer science radix tree. If you try to reconcile our
implementation with the wikipedia article on radix trees, you'll get
very confused.

A lot of places where we were saying 'radix tree' in comments should
really have said 'page cache'. So is this a swap cache index? I'm not
really familiar enough with the swapping code to say.
diff --git a/mm/shmem.c b/mm/shmem.c
index d44991ea5ed4..685faa3e0191 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1509,11 +1509,13 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 {
 	struct page *oldpage, *newpage;
 	struct address_space *swap_mapping;
-	pgoff_t swap_index;
+	swp_entry_t entry;
 	int error;
 
+	VM_BUG_ON(!PageSwapCache(*pagep));
+
 	oldpage = *pagep;
-	swap_index = page_private(oldpage);
+	entry.val = page_private(oldpage);
 	swap_mapping = page_mapping(oldpage);
 
 	/*
@@ -1532,7 +1534,7 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	__SetPageLocked(newpage);
 	__SetPageSwapBacked(newpage);
 	SetPageUptodate(newpage);
-	set_page_private(newpage, swap_index);
+	set_page_private(newpage, entry.val);
 	SetPageSwapCache(newpage);
 
 	/*
@@ -1540,7 +1542,8 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	 * a nice clean interface for us to replace oldpage by newpage there.
 	 */
 	xa_lock_irq(&swap_mapping->i_pages);
-	error = shmem_replace_entry(swap_mapping, swap_index, oldpage, newpage);
+	error = shmem_replace_entry(swap_mapping, swp_offset(entry),
+				    oldpage, newpage);
 	if (!error) {
 		__inc_node_page_state(newpage, NR_FILE_PAGES);
 		__dec_node_page_state(oldpage, NR_FILE_PAGES);
We used to have a single swap address space with swp_entry_t.val
as its radix tree index. This is not the case anymore. Now each
swp_type() has its own address space and should use swp_offset()
as radix tree index.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/shmem.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
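
As a rough illustration of the type/offset split this description relies on, the sketch below packs a swap type and an offset into one word and unpacks them again. It is a simplified user-space model, not the kernel's definitions: the toy_* names are hypothetical, and placing five type bits at the top of a 64-bit value is an assumption made for the example (the real layout depends on architecture and configuration).

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: illustrative layout only, not the kernel's actual shift. */
#define TOY_TYPE_BITS   5
#define TOY_TYPE_SHIFT  (64 - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for swp_entry_t. */
typedef struct {
	uint64_t val;
} toy_swp_entry_t;

/* Pack a type and an offset into a single value. */
static inline toy_swp_entry_t toy_swp_entry(unsigned type, uint64_t offset)
{
	toy_swp_entry_t e;

	assert(type < (1u << TOY_TYPE_BITS));
	e.val = ((uint64_t)type << TOY_TYPE_SHIFT) | (offset & TOY_OFFSET_MASK);
	return e;
}

/* Which swap area the entry belongs to. */
static inline unsigned toy_swp_type(toy_swp_entry_t e)
{
	return (unsigned)(e.val >> TOY_TYPE_SHIFT);
}

/* Position within that swap area; the per-type index. */
static inline uint64_t toy_swp_offset(toy_swp_entry_t e)
{
	return e.val & TOY_OFFSET_MASK;
}

/* Nonzero only when the raw value happens to equal the offset. */
static inline int toy_val_equals_offset(toy_swp_entry_t e)
{
	return e.val == toy_swp_offset(e);
}
```

In this model toy_val_equals_offset() holds for every entry of type 0 and for no entry of any other type, which is why indexing a per-type address space by the raw value only goes wrong once a second swap area is in use.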