Message ID | 20240607145902.1137853-5-kernel@pankajraghav.com (mailing list archive)
---|---
State | Superseded, archived
Series | enable bs > ps in XFS
On Fri, Jun 07, 2024 at 02:58:55PM +0000, Pankaj Raghav (Samsung) wrote:
> @@ -230,7 +247,9 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		struct folio *folio = xa_load(&mapping->i_pages, index + i);
>  		int ret;
>  
> +

Spurious newline

>  		if (folio && !xa_is_value(folio)) {
> +			long nr_pages = folio_nr_pages(folio);

Hm, but we don't have a reference on this folio.  So this isn't safe.

> @@ -240,12 +259,24 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  			 * not worth getting one just for that.
>  			 */
>  			read_pages(ractl);
> -			ractl->_index += folio_nr_pages(folio);
> +
> +			/*
> +			 * Move the ractl->_index by at least min_pages
> +			 * if the folio got truncated to respect the
> +			 * alignment constraint in the page cache.
> +			 *
> +			 */
> +			if (mapping != folio->mapping)
> +				nr_pages = min_nrpages;
> +
> +			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
> +			ractl->_index += nr_pages;

Why not just:
	ractl->_index += min_nrpages;
like you do below?
On Wed, Jun 12, 2024 at 07:50:49PM +0100, Matthew Wilcox wrote:
> On Fri, Jun 07, 2024 at 02:58:55PM +0000, Pankaj Raghav (Samsung) wrote:
> > @@ -230,7 +247,9 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  		struct folio *folio = xa_load(&mapping->i_pages, index + i);
> >  		int ret;
> >  
> > +
>
> Spurious newline

Oops.

> >  		if (folio && !xa_is_value(folio)) {
> > +			long nr_pages = folio_nr_pages(folio);
>
> Hm, but we don't have a reference on this folio.  So this isn't safe.

That is why I added a check for mapping after read_pages(). You are
right, we can make it better.

> > @@ -240,12 +259,24 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  			 * not worth getting one just for that.
> >  			 */
> >  			read_pages(ractl);
> > -			ractl->_index += folio_nr_pages(folio);
> > +
> > +			/*
> > +			 * Move the ractl->_index by at least min_pages
> > +			 * if the folio got truncated to respect the
> > +			 * alignment constraint in the page cache.
> > +			 *
> > +			 */
> > +			if (mapping != folio->mapping)
> > +				nr_pages = min_nrpages;
> > +
> > +			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
> > +			ractl->_index += nr_pages;
>
> Why not just:
> 	ractl->_index += min_nrpages;

Then we will only move min_nrpages even if the folio we found had a
bigger order. Hannes' patches (first patch) made sure we move the
ractl->_index by folio_nr_pages() instead of 1, and making this change
will defeat the purpose because without a mapping order set,
min_nrpages will be 1.

What I could do is the following:

diff --git a/mm/readahead.c b/mm/readahead.c
index 389cd802da63..92cf45cdb4d3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -249,7 +249,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 
 		if (folio && !xa_is_value(folio)) {
-			long nr_pages = folio_nr_pages(folio);
+			long nr_pages;
 			/*
 			 * Page already present? Kick off the current batch
 			 * of contiguous pages before continuing with the
@@ -266,10 +266,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * alignment constraint in the page cache.
 			 *
 			 */
-			if (mapping != folio->mapping)
-				nr_pages = min_nrpages;
+			nr_pages = max(folio_nr_pages(folio), (long)min_nrpages);
 
-			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
 			ractl->_index += nr_pages;
 			i = ractl->_index + ractl->_nr_pages - index;
 			continue;

Now we will still move respecting the min order constraint, but if we
had a bigger folio and we do have a reference, then we move by
folio_nr_pages().

> like you do below?

Below we add a folio of min_order, so if that fails for some reason, we
can unconditionally move by min_nrpages.
On Fri, Jun 14, 2024 at 09:26:02AM +0000, Pankaj Raghav (Samsung) wrote:
> > Hm, but we don't have a reference on this folio.  So this isn't safe.
>
> That is why I added a check for mapping after read_pages(). You are
> right, we can make it better.

That's not enough.

> > > +			if (mapping != folio->mapping)
> > > +				nr_pages = min_nrpages;
> > > +
> > > +			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
> > > +			ractl->_index += nr_pages;
> >
> > Why not just:
> > 	ractl->_index += min_nrpages;
>
> Then we will only move min_nrpages even if the folio we found had a
> bigger order. Hannes' patches (first patch) made sure we move the
> ractl->_index by folio_nr_pages() instead of 1, and making this change
> will defeat the purpose because without a mapping order set,
> min_nrpages will be 1.

Hannes' patch is wrong.  It's not safe to call folio_nr_pages() unless
you have a reference to the folio.

> @@ -266,10 +266,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  			 * alignment constraint in the page cache.
>  			 *
>  			 */
> -			if (mapping != folio->mapping)
> -				nr_pages = min_nrpages;
> +			nr_pages = max(folio_nr_pages(folio), (long)min_nrpages);

No.

> Now we will still move respecting the min order constraint, but if we
> had a bigger folio and we do have a reference, then we move by
> folio_nr_pages().

You don't have a reference, so it's never safe.
On Mon, Jun 17, 2024 at 01:32:42PM +0100, Matthew Wilcox wrote:
> On Fri, Jun 14, 2024 at 09:26:02AM +0000, Pankaj Raghav (Samsung) wrote:
> > > Hm, but we don't have a reference on this folio.  So this isn't safe.
> >
> > That is why I added a check for mapping after read_pages(). You are
> > right, we can make it better.
>
> That's not enough.
>
> > > > +			if (mapping != folio->mapping)
> > > > +				nr_pages = min_nrpages;
> > > > +
> > > > +			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
> > > > +			ractl->_index += nr_pages;
> > >
> > > Why not just:
> > > 	ractl->_index += min_nrpages;
> >
> > Then we will only move min_nrpages even if the folio we found had a
> > bigger order. Hannes' patches (first patch) made sure we move the
> > ractl->_index by folio_nr_pages() instead of 1, and making this change
> > will defeat the purpose because without a mapping order set,
> > min_nrpages will be 1.
>
> Hannes' patch is wrong.  It's not safe to call folio_nr_pages() unless
> you have a reference to the folio.
>
> > @@ -266,10 +266,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  			 * alignment constraint in the page cache.
> >  			 *
> >  			 */
> > -			if (mapping != folio->mapping)
> > -				nr_pages = min_nrpages;
> > +			nr_pages = max(folio_nr_pages(folio), (long)min_nrpages);
>
> No.
>
> > Now we will still move respecting the min order constraint, but if we
> > had a bigger folio and we do have a reference, then we move by
> > folio_nr_pages().
>
> You don't have a reference, so it's never safe.

I am hitting my head now because you have literally mentioned that in
the comment:

	 * next batch.  This page may be the one we would
	 * have intended to mark as Readahead, but we don't
	 * have a stable reference to this page, and it's
	 * not worth getting one just for that.

I will move it by min_nrpages as follows:

> 	ractl->_index += min_nrpages;

So the following can still be there from Hannes' patch as we have a
stable reference:

 		ractl->_workingset |= folio_test_workingset(folio);
-		ractl->_nr_pages++;
+		ractl->_nr_pages += folio_nr_pages(folio);
+		i += folio_nr_pages(folio);
 	}

Thanks for the clarification.

--
Pankaj
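To make the race concrete, here is a minimal sketch, not part of the patch, of what it would take to read a folio's size safely; the helper name is invented for illustration, and the folio_try_get()/re-validate pattern is the usual page-cache idiom rather than anything this series adds:

```c
/* Illustrative only; assumes <linux/pagemap.h> and <linux/xarray.h>. */
static long stable_folio_nr_pages(struct address_space *mapping,
				  pgoff_t index, long min_nrpages)
{
	struct folio *folio = xa_load(&mapping->i_pages, index);

	if (!folio || xa_is_value(folio))
		return 0;

	/*
	 * UNSAFE here: no reference is held, so the folio can be freed
	 * and reallocated at any point after xa_load(); calling
	 * folio_nr_pages() now may report an unrelated folio's order.
	 */

	if (!folio_try_get(folio))		/* pin it first ... */
		return min_nrpages;
	if (folio->mapping != mapping) {	/* ... then re-validate */
		folio_put(folio);		/* truncated under us */
		return min_nrpages;
	}
	/* The size is now stable until folio_put(). */
	min_nrpages = max(folio_nr_pages(folio), min_nrpages);
	folio_put(folio);
	return min_nrpages;
}
```

The thread's conclusion is that readahead should not pay for this pin-and-revalidate dance on a folio it is about to skip anyway, so advancing by the always-safe min_nrpages is the right call.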
On Mon, Jun 17, 2024 at 04:04:20PM +0000, Pankaj Raghav (Samsung) wrote:
> On Mon, Jun 17, 2024 at 01:32:42PM +0100, Matthew Wilcox wrote:
> So the following can still be there from Hannes' patch as we have a
> stable reference:
>
> 		ractl->_workingset |= folio_test_workingset(folio);
> -		ractl->_nr_pages++;
> +		ractl->_nr_pages += folio_nr_pages(folio);
> +		i += folio_nr_pages(folio);
> 	}

We _can_, but we just allocated it, so we know what size it is already.

I'm starting to feel that Hannes' patch should be combined with this
one.
On Mon, Jun 17, 2024 at 05:10:15PM +0100, Matthew Wilcox wrote:
> On Mon, Jun 17, 2024 at 04:04:20PM +0000, Pankaj Raghav (Samsung) wrote:
> > On Mon, Jun 17, 2024 at 01:32:42PM +0100, Matthew Wilcox wrote:
> > So the following can still be there from Hannes' patch as we have a
> > stable reference:
> >
> > 		ractl->_workingset |= folio_test_workingset(folio);
> > -		ractl->_nr_pages++;
> > +		ractl->_nr_pages += folio_nr_pages(folio);
> > +		i += folio_nr_pages(folio);
> > 	}
>
> We _can_, but we just allocated it, so we know what size it is already.

Yes.

> I'm starting to feel that Hannes' patch should be combined with this
> one.

Fine by me. @Hannes, is that ok with you?
On 6/17/24 18:10, Matthew Wilcox wrote:
> On Mon, Jun 17, 2024 at 04:04:20PM +0000, Pankaj Raghav (Samsung) wrote:
>> On Mon, Jun 17, 2024 at 01:32:42PM +0100, Matthew Wilcox wrote:
>> So the following can still be there from Hannes' patch as we have a
>> stable reference:
>>
>> 		ractl->_workingset |= folio_test_workingset(folio);
>> -		ractl->_nr_pages++;
>> +		ractl->_nr_pages += folio_nr_pages(folio);
>> +		i += folio_nr_pages(folio);
>> 	}
>
> We _can_, but we just allocated it, so we know what size it is already.
> I'm starting to feel that Hannes' patch should be combined with this
> one.

And we could even make it conditional; on recent devices, allocating 64k
(or even 2M) worth of zero pages is not a big deal. And if you have
machines where this is an issue, maybe using large folios isn't the best
of ideas to start with.

Cheers,

Hannes
On 6/17/24 18:39, Pankaj Raghav (Samsung) wrote:
> On Mon, Jun 17, 2024 at 05:10:15PM +0100, Matthew Wilcox wrote:
>> On Mon, Jun 17, 2024 at 04:04:20PM +0000, Pankaj Raghav (Samsung) wrote:
>>> On Mon, Jun 17, 2024 at 01:32:42PM +0100, Matthew Wilcox wrote:
>>> So the following can still be there from Hannes' patch as we have a
>>> stable reference:
>>>
>>> 		ractl->_workingset |= folio_test_workingset(folio);
>>> -		ractl->_nr_pages++;
>>> +		ractl->_nr_pages += folio_nr_pages(folio);
>>> +		i += folio_nr_pages(folio);
>>> 	}
>>
>> We _can_, but we just allocated it, so we know what size it is already.
> Yes.
>
>> I'm starting to feel that Hannes' patch should be combined with this
>> one.
>
> Fine by me. @Hannes, is that ok with you?

Sure. I was about to re-send my patchset anyway, so feel free to wrap
it in.

Cheers,

Hannes
On Tue, Jun 18, 2024 at 08:56:53AM +0200, Hannes Reinecke wrote:
> On 6/17/24 18:39, Pankaj Raghav (Samsung) wrote:
> > On Mon, Jun 17, 2024 at 05:10:15PM +0100, Matthew Wilcox wrote:
> > > On Mon, Jun 17, 2024 at 04:04:20PM +0000, Pankaj Raghav (Samsung) wrote:
> > > > On Mon, Jun 17, 2024 at 01:32:42PM +0100, Matthew Wilcox wrote:
> > > > So the following can still be there from Hannes' patch as we have a
> > > > stable reference:
> > > >
> > > > 		ractl->_workingset |= folio_test_workingset(folio);
> > > > -		ractl->_nr_pages++;
> > > > +		ractl->_nr_pages += folio_nr_pages(folio);
> > > > +		i += folio_nr_pages(folio);
> > > > 	}
> > >
> > > We _can_, but we just allocated it, so we know what size it is already.
> > Yes.
> >
> > > I'm starting to feel that Hannes' patch should be combined with this
> > > one.
> >
> > Fine by me. @Hannes, is that ok with you?
>
> Sure. I was about to re-send my patchset anyway, so feel free to wrap it in.

Is it ok if I add your Co-developed-by and Signed-off-by tags?

This is what I have after combining your patch with mine and making
willy's changes:

diff --git a/mm/readahead.c b/mm/readahead.c
index 389cd802da63..f56da953c130 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -247,9 +247,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
 		int ret;
 
-
 		if (folio && !xa_is_value(folio)) {
-			long nr_pages = folio_nr_pages(folio);
 			/*
 			 * Page already present? Kick off the current batch
 			 * of contiguous pages before continuing with the
@@ -259,18 +257,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl);
-
-			/*
-			 * Move the ractl->_index by at least min_pages
-			 * if the folio got truncated to respect the
-			 * alignment constraint in the page cache.
-			 *
-			 */
-			if (mapping != folio->mapping)
-				nr_pages = min_nrpages;
-
-			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
-			ractl->_index += nr_pages;
+			ractl->_index += min_nrpages;
 			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
@@ -293,8 +280,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		if (i == mark)
 			folio_set_readahead(folio);
 		ractl->_workingset |= folio_test_workingset(folio);
-		ractl->_nr_pages += folio_nr_pages(folio);
-		i += folio_nr_pages(folio);
+		ractl->_nr_pages += min_nrpages;
+		i += min_nrpages;
 	}
 
 	/*
On 6/21/24 14:19, Pankaj Raghav (Samsung) wrote:
> On Tue, Jun 18, 2024 at 08:56:53AM +0200, Hannes Reinecke wrote:
>> On 6/17/24 18:39, Pankaj Raghav (Samsung) wrote:
>>> On Mon, Jun 17, 2024 at 05:10:15PM +0100, Matthew Wilcox wrote:
>>>> On Mon, Jun 17, 2024 at 04:04:20PM +0000, Pankaj Raghav (Samsung) wrote:
>>>>> On Mon, Jun 17, 2024 at 01:32:42PM +0100, Matthew Wilcox wrote:
>>>>> So the following can still be there from Hannes' patch as we have a
>>>>> stable reference:
>>>>>
>>>>> 		ractl->_workingset |= folio_test_workingset(folio);
>>>>> -		ractl->_nr_pages++;
>>>>> +		ractl->_nr_pages += folio_nr_pages(folio);
>>>>> +		i += folio_nr_pages(folio);
>>>>> 	}
>>>>
>>>> We _can_, but we just allocated it, so we know what size it is already.
>>> Yes.
>>>
>>>> I'm starting to feel that Hannes' patch should be combined with this
>>>> one.
>>>
>>> Fine by me. @Hannes, is that ok with you?
>>
>> Sure. I was about to re-send my patchset anyway, so feel free to wrap it in.
> Is it ok if I add your Co-developed-by and Signed-off-by tags?
>
> This is what I have after combining your patch with mine and making
> willy's changes:
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 389cd802da63..f56da953c130 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -247,9 +247,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		struct folio *folio = xa_load(&mapping->i_pages, index + i);
>  		int ret;
>  
> -
>  		if (folio && !xa_is_value(folio)) {
> -			long nr_pages = folio_nr_pages(folio);
>  			/*
>  			 * Page already present? Kick off the current batch
>  			 * of contiguous pages before continuing with the
> @@ -259,18 +257,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  			 * not worth getting one just for that.
>  			 */
>  			read_pages(ractl);
> -
> -			/*
> -			 * Move the ractl->_index by at least min_pages
> -			 * if the folio got truncated to respect the
> -			 * alignment constraint in the page cache.
> -			 *
> -			 */
> -			if (mapping != folio->mapping)
> -				nr_pages = min_nrpages;
> -
> -			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
> -			ractl->_index += nr_pages;
> +			ractl->_index += min_nrpages;
>  			i = ractl->_index + ractl->_nr_pages - index;
>  			continue;
>  		}
> @@ -293,8 +280,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
>  		if (i == mark)
>  			folio_set_readahead(folio);
>  		ractl->_workingset |= folio_test_workingset(folio);
> -		ractl->_nr_pages += folio_nr_pages(folio);
> -		i += folio_nr_pages(folio);
> +		ractl->_nr_pages += min_nrpages;
> +		i += min_nrpages;
>  	}
>  
>  	/*

Yes, that looks fine. Go.

Cheers,

Hannes
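For the second hunk of the combined patch, Matthew's "we just allocated it" observation is what makes `+= min_nrpages` exact rather than merely safe. A sketch of the reasoning, written as comments on the patched lines (kernel fragment, illustrative only):

```c
folio = filemap_alloc_folio(gfp_mask, mapping_min_folio_order(mapping));
/* ... filemap_add_folio() succeeded, so we still hold a reference ... */

/*
 * The folio was allocated at exactly min_order, so by construction
 * folio_nr_pages(folio) == min_nrpages.  We also hold the allocation
 * reference, so there is no race; advancing by min_nrpages is both
 * safe and exact, with no need to query the folio at all.
 */
ractl->_workingset |= folio_test_workingset(folio);
ractl->_nr_pages += min_nrpages;
i += min_nrpages;
```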
diff --git a/mm/readahead.c b/mm/readahead.c
index da34b28da02c..389cd802da63 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -206,9 +206,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct address_space *mapping = ractl->mapping;
-	unsigned long index = readahead_index(ractl);
+	unsigned long ra_folio_index, index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	unsigned long i = 0;
+	unsigned long mark, i = 0;
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
 
 	/*
 	 * Partway through the readahead operation, we will have added
@@ -223,6 +224,22 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	unsigned int nofs = memalloc_nofs_save();
 
 	filemap_invalidate_lock_shared(mapping);
+	index = mapping_align_start_index(mapping, index);
+
+	/*
+	 * As iterator `i` is aligned to min_nrpages, round_up the
+	 * difference between nr_to_read and lookahead_size to mark the
+	 * index that only has lookahead or "async_region" to set the
+	 * readahead flag.
+	 */
+	ra_folio_index = round_up(readahead_index(ractl) + nr_to_read - lookahead_size,
+				  min_nrpages);
+	mark = ra_folio_index - index;
+	if (index != readahead_index(ractl)) {
+		nr_to_read += readahead_index(ractl) - index;
+		ractl->_index = index;
+	}
+
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
@@ -230,7 +247,9 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
 		int ret;
 
+
 		if (folio && !xa_is_value(folio)) {
+			long nr_pages = folio_nr_pages(folio);
 			/*
 			 * Page already present? Kick off the current batch
 			 * of contiguous pages before continuing with the
@@ -240,12 +259,24 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl);
-			ractl->_index += folio_nr_pages(folio);
+
+			/*
+			 * Move the ractl->_index by at least min_pages
+			 * if the folio got truncated to respect the
+			 * alignment constraint in the page cache.
+			 *
+			 */
+			if (mapping != folio->mapping)
+				nr_pages = min_nrpages;
+
+			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
+			ractl->_index += nr_pages;
 			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			break;
 
@@ -255,11 +286,11 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			if (ret == -ENOMEM)
 				break;
 			read_pages(ractl);
-			ractl->_index++;
+			ractl->_index += min_nrpages;
 			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
-		if (i == nr_to_read - lookahead_size)
+		if (i == mark)
 			folio_set_readahead(folio);
 		ractl->_workingset |= folio_test_workingset(folio);
 		ractl->_nr_pages += folio_nr_pages(folio);
@@ -493,13 +524,19 @@ void page_cache_ra_order(struct readahead_control *ractl,
 {
 	struct address_space *mapping = ractl->mapping;
 	pgoff_t index = readahead_index(ractl);
+	unsigned int min_order = mapping_min_folio_order(mapping);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
 	unsigned int nofs;
 	int err = 0;
 	gfp_t gfp = readahead_gfp_mask(mapping);
+	unsigned int min_ra_size = max(4, mapping_min_folio_nrpages(mapping));
 
-	if (!mapping_large_folio_support(mapping) || ra->size < 4)
+	/*
+	 * Fallback when size < min_nrpages as each folio should be
+	 * at least min_nrpages anyway.
+	 */
+	if (!mapping_large_folio_support(mapping) || ra->size < min_ra_size)
 		goto fallback;
 
 	limit = min(limit, index + ra->size - 1);
@@ -508,11 +545,20 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		new_order += 2;
 		new_order = min(mapping_max_folio_order(mapping), new_order);
 		new_order = min_t(unsigned int, new_order, ilog2(ra->size));
+		new_order = max(new_order, min_order);
 	}
 
 	/* See comment in page_cache_ra_unbounded() */
 	nofs = memalloc_nofs_save();
 	filemap_invalidate_lock_shared(mapping);
+	/*
+	 * If the new_order is greater than min_order and index is
+	 * already aligned to new_order, then this will be noop as index
+	 * aligned to new_order should also be aligned to min_order.
+	 */
+	ractl->_index = mapping_align_start_index(mapping, index);
+	index = readahead_index(ractl);
+
 	while (index <= limit) {
 		unsigned int order = new_order;
 
@@ -520,7 +566,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
 		/* Don't allocate pages past EOF */
-		while (index + (1UL << order) - 1 > limit)
+		while (order > min_order && index + (1UL << order) - 1 > limit)
 			order--;
 		err = ra_alloc_folio(ractl, index, mark, order, gfp);
 		if (err)
@@ -784,8 +830,15 @@ void readahead_expand(struct readahead_control *ractl,
 	struct file_ra_state *ra = ractl->ra;
 	pgoff_t new_index, new_nr_pages;
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
+	unsigned long min_nrpages = mapping_min_folio_nrpages(mapping);
+	unsigned int min_order = mapping_min_folio_order(mapping);
 
 	new_index = new_start / PAGE_SIZE;
+	/*
+	 * Readahead code should have aligned the ractl->_index to
+	 * min_nrpages before calling readahead aops.
+	 */
+	VM_BUG_ON(!IS_ALIGNED(ractl->_index, min_nrpages));
 
 	/* Expand the leading edge downwards */
 	while (ractl->_index > new_index) {
@@ -795,9 +848,11 @@ void readahead_expand(struct readahead_control *ractl,
 		if (folio && !xa_is_value(folio))
 			return; /* Folio apparently present */
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, min_order);
 		if (!folio)
 			return;
+
+		index = mapping_align_start_index(mapping, index);
 		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {
 			folio_put(folio);
 			return;
@@ -807,7 +862,7 @@ void readahead_expand(struct readahead_control *ractl,
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
 		}
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
 		ractl->_index = folio->index;
 	}
 
@@ -822,9 +877,11 @@ void readahead_expand(struct readahead_control *ractl,
 		if (folio && !xa_is_value(folio))
 			return; /* Folio apparently present */
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask, min_order);
 		if (!folio)
 			return;
+
+		index = mapping_align_start_index(mapping, index);
 		if (filemap_add_folio(mapping, folio, index, gfp_mask) < 0) {
 			folio_put(folio);
 			return;
@@ -834,10 +891,10 @@ void readahead_expand(struct readahead_control *ractl,
 			ractl->_workingset = true;
 			psi_memstall_enter(&ractl->_pflags);
 		}
-		ractl->_nr_pages++;
+		ractl->_nr_pages += min_nrpages;
 		if (ra) {
-			ra->size++;
-			ra->async_size++;
+			ra->size += min_nrpages;
+			ra->async_size += min_nrpages;
 		}
 	}
 }
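For reference, the alignment helpers the patch relies on (mapping_min_folio_order(), mapping_min_folio_nrpages(), mapping_align_start_index()) are introduced earlier in the series. A sketch of their assumed semantics, with bodies paraphrased from context rather than quoted, followed by a worked example of the new `mark` computation:

```c
/*
 * Paraphrased sketches; the real definitions live in an earlier patch
 * of the "enable bs > ps in XFS" series.
 */
static inline unsigned long
mapping_min_folio_nrpages(struct address_space *mapping)
{
	/* mapping_min_folio_order() is set per-mapping by the filesystem. */
	return 1UL << mapping_min_folio_order(mapping);
}

/* Round a readahead start index down to a min-order folio boundary. */
static inline pgoff_t
mapping_align_start_index(struct address_space *mapping, pgoff_t index)
{
	return round_down(index, mapping_min_folio_nrpages(mapping));
}

/*
 * Worked example (illustrative numbers): min_order = 2 on 4k pages
 * gives min_nrpages = 4.  A call with index = 5, nr_to_read = 8 and
 * lookahead_size = 2 then becomes:
 *
 *	index          = round_down(5, 4)        = 4
 *	nr_to_read     = 8 + (5 - 4)             = 9
 *	ra_folio_index = round_up(5 + 8 - 2, 4)  = 12
 *	mark           = 12 - 4                  = 8
 *
 * so `i == mark` lands on page index 4 + 8 = 12: the readahead flag is
 * set on the first min-order folio that lies entirely in the lookahead
 * region, instead of on the unaligned index 11 it would hit before the
 * patch.
 */
```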