Message ID | 20231213-zswap-dstmem-v1-2-896763369d04@bytedance.com (mailing list archive) |
---|---|
State | New |
Series | mm/zswap: dstmem reuse optimizations and cleanups |
On Tue, Dec 12, 2023 at 8:18 PM Chengming Zhou <zhouchengming@bytedance.com> wrote:
>
> Change the dstmem size from 2 * PAGE_SIZE to only one page, since
> we need at most one page when compressing, and the "dlen" is also
> PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE,
> we don't want to store the output in zswap anyway.
>
> So change it to one page, and delete the stale comment.

I couldn't find the history of why we needed 2 * PAGE_SIZE; it would be nice if someone has the context, perhaps one of the maintainers.

One potential reason is that we used to store a zswap header containing the swap entry in the compressed page for writeback purposes, but we don't do that anymore. Maybe we wanted to be able to handle the case where an incompressible page would exceed PAGE_SIZE because of that?
On Wed, Dec 13, 2023 at 3:34 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> [..]
>
> I couldn't find the history of why we needed 2 * PAGE_SIZE; it would
> be nice if someone has the context, perhaps one of the maintainers.

It'd be very nice indeed.

> One potential reason is that we used to store a zswap header
> containing the swap entry in the compressed page for writeback
> purposes, but we don't do that anymore. Maybe we wanted to be able to
> handle the case where an incompressible page would exceed PAGE_SIZE
> because of that?

It could be, hmm. I didn't study the old zswap architecture too much, but it has been 2 * PAGE_SIZE since the time zswap was first merged, last I checked.

I'm not 100% comfortable ACK-ing the undoing of something that looks so intentional, but FTR, AFAICT, this looks correct to me.
On 2023/12/14 08:18, Nhat Pham wrote:
> [..]
>
> It could be, hmm. I didn't study the old zswap architecture too much,
> but it has been 2 * PAGE_SIZE since the time zswap was first merged,
> last I checked.
>
> I'm not 100% comfortable ACK-ing the undoing of something that looks
> so intentional, but FTR, AFAICT, this looks correct to me.

Right, there is no history explaining why we needed 2 pages. But judging from the current code, only one page is needed, and no problem was found in the kernel build stress testing.

Thanks!
On Thu, Dec 14, 2023 at 5:33 AM Chengming Zhou <zhouchengming@bytedance.com> wrote:
> [..]
>
> Right, there is no history explaining why we needed 2 pages. But
> judging from the current code, only one page is needed, and no
> problem was found in the kernel build stress testing.

Could you try manually stressing the compression with data that doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure that this case is specifically handled. I think using data from /dev/random will do that, but please double check that dlen == PAGE_SIZE.
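To make the concern above concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy run-length encoder (`toy_rle_compress`, a hypothetical name) standing in for the real compressor, and a 4 KiB page (`TOY_PAGE_SIZE`, also an assumption). It only illustrates the general idea that a compressor given a destination capacity of one page can report failure for data that expands, instead of writing past the buffer. How a given kernel compressor actually behaves is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Toy run-length encoder (hypothetical, not a kernel API). It emits
 * (count, byte) pairs and never writes past dst[dst_cap - 1].
 * Returns the compressed length, or -1 if the output does not fit.
 */
static long toy_rle_compress(const uint8_t *src, size_t slen,
                             uint8_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    assert(src && dst);
    while (in < slen) {
        uint8_t byte = src[in];
        size_t run = 1;

        /* Extend the run while the byte repeats (max 255 per pair). */
        while (in + run < slen && src[in + run] == byte && run < 255)
            run++;
        if (out + 2 > dst_cap)
            return -1; /* output would exceed the cap: reject */
        dst[out++] = (uint8_t)run;
        dst[out++] = byte;
        in += run;
    }
    return (long)out;
}

/* Fill a page with one repeated byte: compresses very well. */
static void fill_constant(uint8_t *page, uint8_t value)
{
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        page[i] = value;
}

/* Fill a page so no two neighbouring bytes match: this encoder doubles it. */
static void fill_alternating(uint8_t *page)
{
    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        page[i] = (uint8_t)(i & 1);
}
```

With a one-page destination, a constant page encodes to 34 bytes, while the alternating page makes `toy_rle_compress()` return -1 rather than spill into a second page. Given a two-page destination, the same alternating input would encode to 8192 bytes, which is larger than the page itself and therefore not worth storing.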
On 2023/12/14 21:37, Yosry Ahmed wrote:
> [..]
>
> Could you try manually stressing the compression with data that
> doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
> that this case is specifically handled. I think using data from
> /dev/random will do that, but please double check that dlen ==
> PAGE_SIZE.

I just did the same kernel build testing; indeed, there are a few cases that output dlen == PAGE_SIZE.

bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'

@[1]: 2
@[0]: 12011430
On 2023/12/14 21:57, Chengming Zhou wrote:
> [..]
>
> I just did the same kernel build testing; indeed, there are a few cases
> that output dlen == PAGE_SIZE.
>
> bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
>
> @[1]: 2
> @[0]: 12011430

I think we shouldn't put such poorly compressed output into zswap. Maybe it's better to return early in these cases, when the compression ratio < threshold ratio, which could be tuned by the user?

e.g. in the same kernel build testing:

bpftrace -e 'k:zpool_malloc {@[(uint32)arg1>2048]=count()}'

@[1]: 1597706
@[0]: 10886138
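The threshold idea floated above can be sketched as follows. This is purely illustrative: `toy_worth_storing()`, `toy_share_bp()` and the `max_percent` parameter are hypothetical names, and no such tunable is claimed to exist in zswap. The second helper just does the arithmetic on the two bpftrace buckets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Hypothetical policy check: accept a compressed page only if it shrank
 * to at most max_percent of the page size. max_percent stands in for the
 * user-tunable threshold suggested in the thread.
 */
static bool toy_worth_storing(size_t dlen, unsigned int max_percent)
{
    assert(max_percent <= 100);
    return dlen * 100 <= (size_t)TOY_PAGE_SIZE * max_percent;
}

/*
 * Share of events that landed in the "true" bucket of a two-bucket
 * bpftrace map, in hundredths of a percent (integer division).
 */
static unsigned long long toy_share_bp(unsigned long long hits,
                                       unsigned long long misses)
{
    assert(hits + misses > 0);
    return hits * 10000ULL / (hits + misses);
}
```

With the counts reported above, `toy_share_bp(1597706, 10886138)` gives 1279, i.e. roughly 12.8% of the allocations in that run were larger than 2048 bytes. A 50% threshold, `toy_worth_storing(dlen, 50)`, would reject exactly those.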
On Thu, Dec 14, 2023 at 5:57 AM Chengming Zhou <zhouchengming@bytedance.com> wrote:
> [..]
>
> I just did the same kernel build testing; indeed, there are a few cases
> that output dlen == PAGE_SIZE.
>
> bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
>
> @[1]: 2
> @[0]: 12011430

That's very useful information, thanks for testing that. Please include this in the commit log. Please also include the fact that we used to store a zswap header with the compressed page but don't do that anymore, which *may* be the reason why this was needed back then.

I still want someone who knows the history to Ack this, but FWIW it looks correct to me, so low-key:
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
[..]
>
> I think we shouldn't put such poorly compressed output into zswap.
> Maybe it's better to return early in these cases, when the compression
> ratio < threshold ratio, which could be tuned by the user?

We have something similar at Google, but because we use zswap without a backing swapfile, we make those pages unevictable. For the upstream code, the pages will go to a backing swapfile, which arguably violates the LRU ordering, but may be the correct thing to do. There was a recent upstream attempt to solidify storing those incompressible pages in zswap in their uncompressed form to retain the LRU ordering.

If you want, feel free to start a discussion about this separately; it's out of scope for this patch series.

Thanks!
On Thu, Dec 14, 2023 at 10:30 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Thu, Dec 14, 2023 at 5:57 AM Chengming Zhou
> <zhouchengming@bytedance.com> wrote:
> >
> > On 2023/12/14 21:37, Yosry Ahmed wrote:
> > > [..]
> > >
> > > Could you try manually stressing the compression with data that
> > > doesn't compress at all (i.e. dlen == PAGE_SIZE)? I want to make sure
> > > that this case is specifically handled. I think using data from
> > > /dev/random will do that, but please double check that dlen ==
> > > PAGE_SIZE.

FWIW, zsmalloc supports the storing of pages that are PAGE_SIZE in length, so a use case is probably there (although it could be for ZRAM). We tested it during the storing-uncompressed-pages patch.

Architecturally, it seems that zswap just lets the backend allocator handle the rejection of compressed objects that are too large, and lets the compressor reject pages that are too poorly compressed.

> > I just did the same kernel build testing; indeed, there are a few cases
> > that output dlen == PAGE_SIZE.
> >
> > bpftrace -e 'k:zpool_malloc {@[(uint32)arg1==4096]=count()}'
> >
> > @[1]: 2
> > @[0]: 12011430
>
> That's very useful information, thanks for testing that. Please
> include this in the commit log. Please also include the fact that we
> used to store a zswap header with the compressed page but don't do
> that anymore, which *may* be the reason why this was needed back then.
>
> I still want someone who knows the history to Ack this, but FWIW it
> looks correct to me, so low-key:
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>

Anyway:
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
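The layering described in this reply (compressor rejects what does not fit, allocator rejects what is too large) can be sketched in userspace as below. Everything here is hypothetical: `struct toy_pool`, its `max_object_size` field and `toy_store()` are invented for illustration and do not mirror the zpool or zsmalloc interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

enum toy_store_result {
    TOY_STORED,
    TOY_REJECT_COMPRESS, /* compressor could not fit the output */
    TOY_REJECT_ALLOC     /* backend refused an object of this size */
};

/* Hypothetical backend: the largest object size it is willing to take. */
struct toy_pool {
    size_t max_object_size;
};

static int toy_pool_accepts(const struct toy_pool *pool, size_t size)
{
    return size > 0 && size <= pool->max_object_size;
}

/*
 * compressed_len < 0 models a compressor that failed because its output
 * did not fit in the destination buffer; otherwise it is the output size.
 * The store path itself applies no size policy of its own.
 */
static enum toy_store_result toy_store(const struct toy_pool *pool,
                                       long compressed_len)
{
    assert(pool);
    if (compressed_len < 0)
        return TOY_REJECT_COMPRESS;
    if (!toy_pool_accepts(pool, (size_t)compressed_len))
        return TOY_REJECT_ALLOC;
    return TOY_STORED;
}
```

A pool whose limit equals `TOY_PAGE_SIZE` accepts a full-page object, matching the remark that a PAGE_SIZE-long object can be stored; a pool with a smaller, made-up limit would turn the same object away at the allocation step.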
diff --git a/mm/zswap.c b/mm/zswap.c
index edb8b45ed5a1..fa186945010d 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -707,7 +707,7 @@ static int zswap_dstmem_prepare(unsigned int cpu)
 	struct mutex *mutex;
 	u8 *dst;
 
-	dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
+	dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
 	if (!dst)
 		return -ENOMEM;
 
@@ -1662,8 +1662,7 @@ bool zswap_store(struct folio *folio)
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
 
-	/* zswap_dstmem is of size (PAGE_SIZE * 2). Reflect same in sg_list */
-	sg_init_one(&output, dst, PAGE_SIZE * 2);
+	sg_init_one(&output, dst, PAGE_SIZE);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, PAGE_SIZE, dlen);
 	/*
 	 * it maybe looks a little bit silly that we send an asynchronous request,
Change the dstmem size from 2 * PAGE_SIZE to only one page, since
we need at most one page when compressing, and the "dlen" is also
PAGE_SIZE in acomp_request_set_params(). If the output size > PAGE_SIZE,
we don't want to store the output in zswap anyway.

So change it to one page, and delete the stale comment.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

---
 mm/zswap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)