Message ID | 20230915095042.1320180-7-da.gomez@samsung.com (mailing list archive)
---|---
State | Superseded, archived
Series | [1/6] filemap: make the folio order calculation shareable
On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
>
> Add large folio support for shmem write path matching the same high
> order preference mechanism used for iomap buffered IO path as used in
> __filemap_get_folio().
>
> Use the __folio_get_max_order to get a hint for the order of the folio
> based on file size which takes care of the mapping requirements.
>
> Swap does not support high order folios for now, so make it order 0 in
> case swap is enabled.

I didn't take a close look at the series, but I am not sure I
understand the rationale here. Reclaim will split high order shmem
folios anyway, right?

It seems like we only enable high order folios if the "noswap" mount
option is used, which is fairly recent. I doubt it is widely used.

>
> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> ---
>  mm/shmem.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index adff74751065..26ca555b1669 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
>  }
>
>  static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
> -		pgoff_t index, bool huge, unsigned int *order)
> +		pgoff_t index, bool huge, unsigned int *order,
> +		struct shmem_sb_info *sbinfo)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct folio *folio;
>  	int nr;
>  	int err;
>
> +	if (!sbinfo->noswap)
> +		*order = 0;
> +	else
> +		*order = (*order == 1) ? 0 : *order;
> +
>  	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>  		huge = false;
>  	nr = huge ? HPAGE_PMD_NR : 1U << *order;
> @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  		return 0;
>  	}
>
> +	order = mapping_size_order(inode->i_mapping, index, len);
> +
>  	if (!shmem_is_huge(inode, index, false,
> 			   vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
>  		goto alloc_nohuge;
> @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  	huge_gfp = vma_thp_gfp_mask(vma);
>  	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
>  	folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
> -					   &order);
> +					   &order, sbinfo);
>  	if (IS_ERR(folio)) {
>  alloc_nohuge:
>  		folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
> -						   &order);
> +						   &order, sbinfo);
>  	}
>  	if (IS_ERR(folio)) {
>  		int retry = 5;
> @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  	if (folio_test_large(folio)) {
>  		folio_unlock(folio);
>  		folio_put(folio);
> +		if (order > 0)
> +			order--;
>  		goto alloc_nohuge;
>  	}
>  unlock:
> --
> 2.39.2
>
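The gating being questioned here is the small hunk at the top of shmem_alloc_and_acct_folio(). Pulled out of the patch above and annotated (the comments are an editorial reading of the intent, not part of the patch), it reads roughly as:

	/*
	 * From the patch above: unless the filesystem is mounted with
	 * "noswap", fall back to order-0 folios, since the swap path cannot
	 * yet handle high-order (non-THP) shmem folios. Order-1 is also
	 * rounded down to order-0, presumably because the page cache did not
	 * support order-1 large folios at the time.
	 */
	if (!sbinfo->noswap)
		*order = 0;
	else
		*order = (*order == 1) ? 0 : *order;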
On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote: > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > Add large folio support for shmem write path matching the same high > > order preference mechanism used for iomap buffered IO path as used in > > __filemap_get_folio(). > > > > Use the __folio_get_max_order to get a hint for the order of the folio > > based on file size which takes care of the mapping requirements. > > > > Swap does not support high order folios for now, so make it order 0 in > > case swap is enabled. > > I didn't take a close look at the series, but I am not sure I > understand the rationale here. Reclaim will split high order shmem > folios anyway, right? For context, this is part of the enablement of large block sizes (LBS) effort [1][2][3], so the assumption here is that the kernel will reclaim memory with the same (large) block sizes that were written to the device. I'll add more context in the V2. [1] https://kernelnewbies.org/KernelProjects/large-block-size [2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml# [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/ > > It seems like we only enable high order folios if the "noswap" mount > option is used, which is fairly recent. I doubt it is widely used. For now, I skipped the swap path as it currently lacks support for high order folios. But I'm currently looking into it as part of the LBS effort (please check spreadsheet at [2] for that). > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > --- > > mm/shmem.c | 16 +++++++++++++--- > > 1 file changed, 13 insertions(+), 3 deletions(-) > > > > diff --git a/mm/shmem.c b/mm/shmem.c > > index adff74751065..26ca555b1669 100644 > > --- a/mm/shmem.c > > +++ b/mm/shmem.c > > @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, > > } > > > > static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode, > > - pgoff_t index, bool huge, unsigned int *order) > > + pgoff_t index, bool huge, unsigned int *order, > > + struct shmem_sb_info *sbinfo) > > { > > struct shmem_inode_info *info = SHMEM_I(inode); > > struct folio *folio; > > int nr; > > int err; > > > > + if (!sbinfo->noswap) > > + *order = 0; > > + else > > + *order = (*order == 1) ? 0 : *order; > > + > > if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) > > huge = false; > > nr = huge ? HPAGE_PMD_NR : 1U << *order; > > @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > return 0; > > } > > > > + order = mapping_size_order(inode->i_mapping, index, len); > > + > > if (!shmem_is_huge(inode, index, false, > > vma ? vma->vm_mm : NULL, vma ? 
vma->vm_flags : 0)) > > goto alloc_nohuge; > > @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > huge_gfp = vma_thp_gfp_mask(vma); > > huge_gfp = limit_gfp_mask(huge_gfp, gfp); > > folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true, > > - &order); > > + &order, sbinfo); > > if (IS_ERR(folio)) { > > alloc_nohuge: > > folio = shmem_alloc_and_acct_folio(gfp, inode, index, false, > > - &order); > > + &order, sbinfo); > > } > > if (IS_ERR(folio)) { > > int retry = 5; > > @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > if (folio_test_large(folio)) { > > folio_unlock(folio); > > folio_put(folio); > > + if (order > 0) > > + order--; > > goto alloc_nohuge; > > } > > unlock: > > -- > > 2.39.2 > >
On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote: > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > Add large folio support for shmem write path matching the same high > > > order preference mechanism used for iomap buffered IO path as used in > > > __filemap_get_folio(). > > > > > > Use the __folio_get_max_order to get a hint for the order of the folio > > > based on file size which takes care of the mapping requirements. > > > > > > Swap does not support high order folios for now, so make it order 0 in > > > case swap is enabled. > > > > I didn't take a close look at the series, but I am not sure I > > understand the rationale here. Reclaim will split high order shmem > > folios anyway, right? > > For context, this is part of the enablement of large block sizes (LBS) > effort [1][2][3], so the assumption here is that the kernel will > reclaim memory with the same (large) block sizes that were written to > the device. > > I'll add more context in the V2. > > [1] https://kernelnewbies.org/KernelProjects/large-block-size > [2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml# > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/ > > > > It seems like we only enable high order folios if the "noswap" mount > > option is used, which is fairly recent. I doubt it is widely used. > > For now, I skipped the swap path as it currently lacks support for > high order folios. But I'm currently looking into it as part of the LBS > effort (please check spreadsheet at [2] for that). Thanks for the context, but I am not sure I understand. IIUC we are skipping allocating large folios in shmem if swap is enabled in this patch. Swap does not support swapping out large folios as a whole (except THPs), but page reclaim will split those large folios and swap them out as order-0 pages anyway. So I am not sure I understand why we need to skip allocating large folios if swap is enabled. > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > --- > > > mm/shmem.c | 16 +++++++++++++--- > > > 1 file changed, 13 insertions(+), 3 deletions(-) > > > > > > diff --git a/mm/shmem.c b/mm/shmem.c > > > index adff74751065..26ca555b1669 100644 > > > --- a/mm/shmem.c > > > +++ b/mm/shmem.c > > > @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, > > > } > > > > > > static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode, > > > - pgoff_t index, bool huge, unsigned int *order) > > > + pgoff_t index, bool huge, unsigned int *order, > > > + struct shmem_sb_info *sbinfo) > > > { > > > struct shmem_inode_info *info = SHMEM_I(inode); > > > struct folio *folio; > > > int nr; > > > int err; > > > > > > + if (!sbinfo->noswap) > > > + *order = 0; > > > + else > > > + *order = (*order == 1) ? 0 : *order; > > > + > > > if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) > > > huge = false; > > > nr = huge ? HPAGE_PMD_NR : 1U << *order; > > > @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > > return 0; > > > } > > > > > > + order = mapping_size_order(inode->i_mapping, index, len); > > > + > > > if (!shmem_is_huge(inode, index, false, > > > vma ? vma->vm_mm : NULL, vma ? 
vma->vm_flags : 0)) > > > goto alloc_nohuge; > > > @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > > huge_gfp = vma_thp_gfp_mask(vma); > > > huge_gfp = limit_gfp_mask(huge_gfp, gfp); > > > folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true, > > > - &order); > > > + &order, sbinfo); > > > if (IS_ERR(folio)) { > > > alloc_nohuge: > > > folio = shmem_alloc_and_acct_folio(gfp, inode, index, false, > > > - &order); > > > + &order, sbinfo); > > > } > > > if (IS_ERR(folio)) { > > > int retry = 5; > > > @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > > if (folio_test_large(folio)) { > > > folio_unlock(folio); > > > folio_put(folio); > > > + if (order > 0) > > > + order--; > > > goto alloc_nohuge; > > > } > > > unlock: > > > -- > > > 2.39.2 > > >
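For readers without the mm/shmem.c context: the split-on-reclaim behaviour referred to above happens in shmem_writepage(). A simplified, from-memory excerpt of roughly that era's code (not part of this series; elided where marked, so treat it as a sketch rather than a verbatim quote) looks like this:

static int shmem_writepage(struct page *page, struct writeback_control *wbc)
{
	struct folio *folio = page_folio(page);
	...
	/*
	 * Any large folio that reaches writeback is split here, so reclaim
	 * ends up swapping shmem out as order-0 pages even when the folios
	 * were allocated at a higher order.
	 */
	if (folio_test_large(folio)) {
		/* Ensure the subpages are still dirty */
		folio_test_set_dirty(folio);
		if (split_huge_page(page) < 0)
			goto redirty;
		folio = page_folio(page);
		folio_clear_dirty(folio);
	}
	...
}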
On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote: > On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote: > > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > Add large folio support for shmem write path matching the same high > > > > order preference mechanism used for iomap buffered IO path as used in > > > > __filemap_get_folio(). > > > > > > > > Use the __folio_get_max_order to get a hint for the order of the folio > > > > based on file size which takes care of the mapping requirements. > > > > > > > > Swap does not support high order folios for now, so make it order 0 in > > > > case swap is enabled. > > > > > > I didn't take a close look at the series, but I am not sure I > > > understand the rationale here. Reclaim will split high order shmem > > > folios anyway, right? > > > > For context, this is part of the enablement of large block sizes (LBS) > > effort [1][2][3], so the assumption here is that the kernel will > > reclaim memory with the same (large) block sizes that were written to > > the device. > > > > I'll add more context in the V2. > > > > [1] https://protect2.fireeye.com/v1/url?k=a80aab33-c981be05-a80b207c-000babff9b5d-b656d8860b04562f&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fkernelnewbies.org%2FKernelProjects%2Flarge-block-size > > [2] https://protect2.fireeye.com/v1/url?k=3f753ca2-5efe2994-3f74b7ed-000babff9b5d-e678f885471555e3&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8%2Fpubhtml%23 > > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/ > > > > > > It seems like we only enable high order folios if the "noswap" mount > > > option is used, which is fairly recent. I doubt it is widely used. > > > > For now, I skipped the swap path as it currently lacks support for > > high order folios. But I'm currently looking into it as part of the LBS > > effort (please check spreadsheet at [2] for that). > > Thanks for the context, but I am not sure I understand. > > IIUC we are skipping allocating large folios in shmem if swap is > enabled in this patch. Swap does not support swapping out large folios > as a whole (except THPs), but page reclaim will split those large > folios and swap them out as order-0 pages anyway. So I am not sure I > understand why we need to skip allocating large folios if swap is > enabled. I lifted noswap condition and retested it again on top of 230918 and there is some regression. So, based on the results I guess the initial requirement may be the way to go. But what do you think? 
Here the logs: * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360 * shmem-baseline-swap : https://gitlab.com/-/snippets/3600362 -Failures: generic/080 generic/126 generic/193 generic/633 generic/689 -Failed 5 of 730 tests \ No newline at end of file +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689 +Failed 9 of 730 tests \ No newline at end of file > > > > > > > > > > > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com> > > > > --- > > > > mm/shmem.c | 16 +++++++++++++--- > > > > 1 file changed, 13 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/mm/shmem.c b/mm/shmem.c > > > > index adff74751065..26ca555b1669 100644 > > > > --- a/mm/shmem.c > > > > +++ b/mm/shmem.c > > > > @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, > > > > } > > > > > > > > static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode, > > > > - pgoff_t index, bool huge, unsigned int *order) > > > > + pgoff_t index, bool huge, unsigned int *order, > > > > + struct shmem_sb_info *sbinfo) > > > > { > > > > struct shmem_inode_info *info = SHMEM_I(inode); > > > > struct folio *folio; > > > > int nr; > > > > int err; > > > > > > > > + if (!sbinfo->noswap) > > > > + *order = 0; > > > > + else > > > > + *order = (*order == 1) ? 0 : *order; > > > > + > > > > if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) > > > > huge = false; > > > > nr = huge ? HPAGE_PMD_NR : 1U << *order; > > > > @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > > > return 0; > > > > } > > > > > > > > + order = mapping_size_order(inode->i_mapping, index, len); > > > > + > > > > if (!shmem_is_huge(inode, index, false, > > > > vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0)) > > > > goto alloc_nohuge; > > > > @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > > > huge_gfp = vma_thp_gfp_mask(vma); > > > > huge_gfp = limit_gfp_mask(huge_gfp, gfp); > > > > folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true, > > > > - &order); > > > > + &order, sbinfo); > > > > if (IS_ERR(folio)) { > > > > alloc_nohuge: > > > > folio = shmem_alloc_and_acct_folio(gfp, inode, index, false, > > > > - &order); > > > > + &order, sbinfo); > > > > } > > > > if (IS_ERR(folio)) { > > > > int retry = 5; > > > > @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index, > > > > if (folio_test_large(folio)) { > > > > folio_unlock(folio); > > > > folio_put(folio); > > > > + if (order > 0) > > > > + order--; > > > > goto alloc_nohuge; > > > > } > > > > unlock: > > > > -- > > > > 2.39.2 > > > >
On Tue, Sep 19, 2023 at 6:27 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote: > > On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote: > > > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > Add large folio support for shmem write path matching the same high > > > > > order preference mechanism used for iomap buffered IO path as used in > > > > > __filemap_get_folio(). > > > > > > > > > > Use the __folio_get_max_order to get a hint for the order of the folio > > > > > based on file size which takes care of the mapping requirements. > > > > > > > > > > Swap does not support high order folios for now, so make it order 0 in > > > > > case swap is enabled. > > > > > > > > I didn't take a close look at the series, but I am not sure I > > > > understand the rationale here. Reclaim will split high order shmem > > > > folios anyway, right? > > > > > > For context, this is part of the enablement of large block sizes (LBS) > > > effort [1][2][3], so the assumption here is that the kernel will > > > reclaim memory with the same (large) block sizes that were written to > > > the device. > > > > > > I'll add more context in the V2. > > > > > > [1] https://protect2.fireeye.com/v1/url?k=a80aab33-c981be05-a80b207c-000babff9b5d-b656d8860b04562f&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fkernelnewbies.org%2FKernelProjects%2Flarge-block-size > > > [2] https://protect2.fireeye.com/v1/url?k=3f753ca2-5efe2994-3f74b7ed-000babff9b5d-e678f885471555e3&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8%2Fpubhtml%23 > > > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/ > > > > > > > > It seems like we only enable high order folios if the "noswap" mount > > > > option is used, which is fairly recent. I doubt it is widely used. > > > > > > For now, I skipped the swap path as it currently lacks support for > > > high order folios. But I'm currently looking into it as part of the LBS > > > effort (please check spreadsheet at [2] for that). > > > > Thanks for the context, but I am not sure I understand. > > > > IIUC we are skipping allocating large folios in shmem if swap is > > enabled in this patch. Swap does not support swapping out large folios > > as a whole (except THPs), but page reclaim will split those large > > folios and swap them out as order-0 pages anyway. So I am not sure I > > understand why we need to skip allocating large folios if swap is > > enabled. > > I lifted noswap condition and retested it again on top of 230918 and > there is some regression. So, based on the results I guess the initial > requirement may be the way to go. But what do you think? > > Here the logs: > * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360 > * shmem-baseline-swap : https://gitlab.com/-/snippets/3600362 > > -Failures: generic/080 generic/126 generic/193 generic/633 generic/689 > -Failed 5 of 730 tests > \ No newline at end of file > +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689 > +Failed 9 of 730 tests > \ No newline at end of file > > I am not really familiar with these tests so I cannot really tell what's going on. 
I can see "swapfiles are not supported" in the logs though, so it seems like we are seeing extra failures by just lifting "noswap" even without actually swapping. I am curious if this is just hiding a different issue, I would at least try to understand what's happening. Anyway, I don't have enough context here to be useful. I was just making an observation about reclaim splitting shmem folios to swap them out as order-0 pages, and asking why this is needed based on that. I will leave it up to you and the reviewers to decide if there's anything interesting here.
On Tue, Sep 19, 2023 at 09:00:16AM -0700, Yosry Ahmed wrote: > On Tue, Sep 19, 2023 at 6:27 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote: > > > On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote: > > > > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > > > Add large folio support for shmem write path matching the same high > > > > > > order preference mechanism used for iomap buffered IO path as used in > > > > > > __filemap_get_folio(). > > > > > > > > > > > > Use the __folio_get_max_order to get a hint for the order of the folio > > > > > > based on file size which takes care of the mapping requirements. > > > > > > > > > > > > Swap does not support high order folios for now, so make it order 0 in > > > > > > case swap is enabled. > > > > > > > > > > I didn't take a close look at the series, but I am not sure I > > > > > understand the rationale here. Reclaim will split high order shmem > > > > > folios anyway, right? > > > > > > > > For context, this is part of the enablement of large block sizes (LBS) > > > > effort [1][2][3], so the assumption here is that the kernel will > > > > reclaim memory with the same (large) block sizes that were written to > > > > the device. > > > > > > > > I'll add more context in the V2. > > > > > > > > [1] https://protect2.fireeye.com/v1/url?k=a80aab33-c981be05-a80b207c-000babff9b5d-b656d8860b04562f&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fkernelnewbies.org%2FKernelProjects%2Flarge-block-size > > > > [2] https://protect2.fireeye.com/v1/url?k=3f753ca2-5efe2994-3f74b7ed-000babff9b5d-e678f885471555e3&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8%2Fpubhtml%23 > > > > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/ > > > > > > > > > > It seems like we only enable high order folios if the "noswap" mount > > > > > option is used, which is fairly recent. I doubt it is widely used. > > > > > > > > For now, I skipped the swap path as it currently lacks support for > > > > high order folios. But I'm currently looking into it as part of the LBS > > > > effort (please check spreadsheet at [2] for that). > > > > > > Thanks for the context, but I am not sure I understand. > > > > > > IIUC we are skipping allocating large folios in shmem if swap is > > > enabled in this patch. Swap does not support swapping out large folios > > > as a whole (except THPs), but page reclaim will split those large > > > folios and swap them out as order-0 pages anyway. So I am not sure I > > > understand why we need to skip allocating large folios if swap is > > > enabled. > > > > I lifted noswap condition and retested it again on top of 230918 and > > there is some regression. So, based on the results I guess the initial > > requirement may be the way to go. But what do you think? 
> >
> > Here the logs:
> > * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360
> > * shmem-baseline-swap : https://gitlab.com/-/snippets/3600362
> >
> > -Failures: generic/080 generic/126 generic/193 generic/633 generic/689
> > -Failed 5 of 730 tests
> > \ No newline at end of file
> > +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689
> > +Failed 9 of 730 tests
> > \ No newline at end of file
> >
> >
> I am not really familiar with these tests so I cannot really tell
> what's going on. I can see "swapfiles are not supported" in the logs
> though, so it seems like we are seeing extra failures by just lifting
> "noswap" even without actually swapping. I am curious if this is just
> hiding a different issue, I would at least try to understand what's
> happening.
>
> Anyway, I don't have enough context here to be useful. I was just
> making an observation about reclaim splitting shmem folios to swap
> them out as order-0 pages, and asking why this is needed based on
> that. I will leave it up to you and the reviewers to decide if there's
> anything interesting here.

The tests which are failing seem to be related to permissions; I could
not immediately decipher why, because as you suggest we'd just be doing
the silly thing of splitting large folios on writepage.

I'd prefer we don't require swap until those regressions are fixed.

Note that part of the rationale to enable this work is to eventually
also extend swap code to support large order folios, so it is not like
this would be left as-is. It is just that it may take time to resolve
the kinks with swap.

So I'd stick to noswap for now.

The above tests also don't stress swap much, and if we do that I would
imagine we might see some other undesirable failures.

  Luis
On Tue, Sep 19, 2023 at 2:47 PM Luis Chamberlain <mcgrof@kernel.org> wrote: > > On Tue, Sep 19, 2023 at 09:00:16AM -0700, Yosry Ahmed wrote: > > On Tue, Sep 19, 2023 at 6:27 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote: > > > > On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote: > > > > > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote: > > > > > > > > > > > > > > Add large folio support for shmem write path matching the same high > > > > > > > order preference mechanism used for iomap buffered IO path as used in > > > > > > > __filemap_get_folio(). > > > > > > > > > > > > > > Use the __folio_get_max_order to get a hint for the order of the folio > > > > > > > based on file size which takes care of the mapping requirements. > > > > > > > > > > > > > > Swap does not support high order folios for now, so make it order 0 in > > > > > > > case swap is enabled. > > > > > > > > > > > > I didn't take a close look at the series, but I am not sure I > > > > > > understand the rationale here. Reclaim will split high order shmem > > > > > > folios anyway, right? > > > > > > > > > > For context, this is part of the enablement of large block sizes (LBS) > > > > > effort [1][2][3], so the assumption here is that the kernel will > > > > > reclaim memory with the same (large) block sizes that were written to > > > > > the device. > > > > > > > > > > I'll add more context in the V2. > > > > > > > > > > [1] https://protect2.fireeye.com/v1/url?k=a80aab33-c981be05-a80b207c-000babff9b5d-b656d8860b04562f&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fkernelnewbies.org%2FKernelProjects%2Flarge-block-size > > > > > [2] https://protect2.fireeye.com/v1/url?k=3f753ca2-5efe2994-3f74b7ed-000babff9b5d-e678f885471555e3&q=1&e=46666acf-d70d-4e8d-8d00-b027808ae400&u=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8%2Fpubhtml%23 > > > > > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/ > > > > > > > > > > > > It seems like we only enable high order folios if the "noswap" mount > > > > > > option is used, which is fairly recent. I doubt it is widely used. > > > > > > > > > > For now, I skipped the swap path as it currently lacks support for > > > > > high order folios. But I'm currently looking into it as part of the LBS > > > > > effort (please check spreadsheet at [2] for that). > > > > > > > > Thanks for the context, but I am not sure I understand. > > > > > > > > IIUC we are skipping allocating large folios in shmem if swap is > > > > enabled in this patch. Swap does not support swapping out large folios > > > > as a whole (except THPs), but page reclaim will split those large > > > > folios and swap them out as order-0 pages anyway. So I am not sure I > > > > understand why we need to skip allocating large folios if swap is > > > > enabled. > > > > > > I lifted noswap condition and retested it again on top of 230918 and > > > there is some regression. So, based on the results I guess the initial > > > requirement may be the way to go. But what do you think? 
> > > > > > Here the logs: > > > * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360 > > > * shmem-baseline-swap : https://gitlab.com/-/snippets/3600362 > > > > > > -Failures: generic/080 generic/126 generic/193 generic/633 generic/689 > > > -Failed 5 of 730 tests > > > \ No newline at end of file > > > +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689 > > > +Failed 9 of 730 tests > > > \ No newline at end of file > > > > > > > > I am not really familiar with these tests so I cannot really tell > > what's going on. I can see "swapfiles are not supported" in the logs > > though, so it seems like we are seeing extra failures by just lifting > > "noswap" even without actually swapping. I am curious if this is just > > hiding a different issue, I would at least try to understand what's > > happening. > > > > Anyway, I don't have enough context here to be useful. I was just > > making an observation about reclaim splitting shmem folios to swap > > them out as order-0 pages, and asking why this is needed based on > > that. I will leave it up to you and the reviewers to decide if there's > > anything interesting here. > > The tests which are failing seem be related to permissions, I could not > immediate decipher why, because as you suggest we'd just be doing the > silly thing of splitting large folios on writepage. > > I'd prefer we don't require swap until those regressions would be fixed. > > Note that part of the rationale to enable this work is to eventually > also extend swap code to support large order folios, so it is not like > this would be left as-is. It is just that it may take time to resolve > the kinks with swap. > > So I'd stick to nowap for now. > > The above tests also don't stress swap too, and if we do that I would > imagine we might see some other undesirable failures. > > Luis I thought we already have some notion of exercising swap with large shmem folios from THPs, so this shouldn't be new, but perhaps I am missing something.
diff --git a/mm/shmem.c b/mm/shmem.c
index adff74751065..26ca555b1669 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
 }
 
 static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
-		pgoff_t index, bool huge, unsigned int *order)
+		pgoff_t index, bool huge, unsigned int *order,
+		struct shmem_sb_info *sbinfo)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct folio *folio;
 	int nr;
 	int err;
 
+	if (!sbinfo->noswap)
+		*order = 0;
+	else
+		*order = (*order == 1) ? 0 : *order;
+
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		huge = false;
 	nr = huge ? HPAGE_PMD_NR : 1U << *order;
@@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		return 0;
 	}
 
+	order = mapping_size_order(inode->i_mapping, index, len);
+
 	if (!shmem_is_huge(inode, index, false,
 			   vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
 		goto alloc_nohuge;
@@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	huge_gfp = vma_thp_gfp_mask(vma);
 	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
-					   &order);
+					   &order, sbinfo);
 	if (IS_ERR(folio)) {
 alloc_nohuge:
 		folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
-						   &order);
+						   &order, sbinfo);
 	}
 	if (IS_ERR(folio)) {
 		int retry = 5;
@@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	if (folio_test_large(folio)) {
 		folio_unlock(folio);
 		folio_put(folio);
+		if (order > 0)
+			order--;
 		goto alloc_nohuge;
 	}
 unlock:
Add large folio support for shmem write path matching the same high
order preference mechanism used for iomap buffered IO path as used in
__filemap_get_folio().

Use the __folio_get_max_order to get a hint for the order of the folio
based on file size which takes care of the mapping requirements.

Swap does not support high order folios for now, so make it order 0 in
case swap is enabled.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
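The order hint fed into the hunks above comes from patch 1/6 of the series ("filemap: make the folio order calculation shareable"), which is not included in this message. As a rough illustration only, assuming it mirrors what __filemap_get_folio() does for iomap, a file-size-based hint could look like the sketch below; the function name, signature and details here are guesses, not the series' actual mapping_size_order() / __folio_get_max_order() code:

/*
 * Illustrative sketch only: the real helper is introduced earlier in the
 * series and is not shown in this thread, so its shape is assumed here.
 */
static unsigned int mapping_size_order_sketch(struct address_space *mapping,
					      pgoff_t index, size_t len)
{
	unsigned int order;

	if (!mapping_large_folio_support(mapping))
		return 0;

	/* Derive the order from the write length, capped at the page cache max. */
	order = len > PAGE_SIZE ? ilog2(len) - PAGE_SHIFT : 0;
	order = min_t(unsigned int, order, MAX_PAGECACHE_ORDER);

	/* An unaligned index limits how large a folio can be placed there. */
	if (index & ((1UL << order) - 1))
		order = __ffs(index);

	return order;
}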