Message ID | 20241001053222.6944-7-kanchana.p.sridhar@intel.com
---|---
State | New |
Series | mm: zswap swap-out of large folios
On Mon, Sep 30, 2024 at 10:32:21PM -0700, Kanchana P Sridhar wrote:
> zswap_store() will store large folios by compressing them page by page.
>
> [...]
>
> This change reuses and adapts the functionality in Ryan Roberts' RFC
> patch [1]:
>
> "[RFC,v1] mm: zswap: Store large folios without splitting"
>
> [1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
>
> Co-developed-by: Ryan Roberts

I would change that to

Originally-by: Ryan Roberts <ryan.roberts@arm.com>

> Signed-off-by:

and drop this for now.

> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Mon, Sep 30, 2024 at 10:37 PM Kanchana P Sridhar
<kanchana.p.sridhar@intel.com> wrote:
>
> zswap_store() will store large folios by compressing them page by page.
>
> [...]
>
> Co-developed-by: Ryan Roberts
> Signed-off-by:
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>

Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Tuesday, October 1, 2024 4:11 AM
> Subject: Re: [PATCH v10 6/7] mm: zswap: Support large folios in
> zswap_store().
>
> On Mon, Sep 30, 2024 at 10:32:21PM -0700, Kanchana P Sridhar wrote:
> > zswap_store() will store large folios by compressing them page by page.
> >
> > [...]
> >
> > Co-developed-by: Ryan Roberts
>
> I would change that to
>
> Originally-by: Ryan Roberts <ryan.roberts@arm.com>
>
> > Signed-off-by:
>
> and drop this for now.

Thanks Johannes. Sure, this sounds good. Should I post a v11 for just this
specific patch with this change, or a v11 for the entire series?

> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Many thanks! Really appreciate the code review and comments.

Thanks,
Kanchana
> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Tuesday, October 1, 2024 9:58 AM
> Subject: Re: [PATCH v10 6/7] mm: zswap: Support large folios in
> zswap_store().
>
> On Mon, Sep 30, 2024 at 10:37 PM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > zswap_store() will store large folios by compressing them page by page.
> >
> > [...]
> >
> > Co-developed-by: Ryan Roberts
> > Signed-off-by:
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
>
> Reviewed-by: Nhat Pham <nphamcs@gmail.com>

Thanks Nhat!
On Tue, Oct 1, 2024 at 10:01 AM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
> > -----Original Message-----
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Sent: Tuesday, October 1, 2024 4:11 AM
> > Subject: Re: [PATCH v10 6/7] mm: zswap: Support large folios in
> > zswap_store().
> >
> > On Mon, Sep 30, 2024 at 10:32:21PM -0700, Kanchana P Sridhar wrote:
> > > zswap_store() will store large folios by compressing them page by page.
> > >
> > > [...]
> > >
> > > Co-developed-by: Ryan Roberts
> >
> > I would change that to
> >
> > Originally-by: Ryan Roberts <ryan.roberts@arm.com>
> >
> > > Signed-off-by:
> >
> > and drop this for now.
>
> Thanks Johannes. Sure, this sounds good. Should I post a v11 for just this
> specific patch with this change, or a v11 for the entire series?

Andrew could probably make the change for you while applying the
patches to mm-unstable if you ask nicely :)

Also since we agreed further cleanup can be done as a followup:
Acked-by: Yosry Ahmed <yosryahmed@google.com>
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Tuesday, October 1, 2024 10:04 AM
> Subject: Re: [PATCH v10 6/7] mm: zswap: Support large folios in
> zswap_store().
>
> On Tue, Oct 1, 2024 at 10:01 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > [...]
> >
> > Thanks Johannes. Sure, this sounds good. Should I post a v11 for just this
> > specific patch with this change, or a v11 for the entire series?
>
> Andrew could probably make the change for you while applying the
> patches to mm-unstable if you ask nicely :)
>
> Also since we agreed further cleanup can be done as a followup:
> Acked-by: Yosry Ahmed <yosryahmed@google.com>

Thanks Yosry!
> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Tuesday, October 1, 2024 4:11 AM
> Subject: Re: [PATCH v10 6/7] mm: zswap: Support large folios in
> zswap_store().
>
> On Mon, Sep 30, 2024 at 10:32:21PM -0700, Kanchana P Sridhar wrote:
> > zswap_store() will store large folios by compressing them page by page.
> >
> > [...]
> >
> > Co-developed-by: Ryan Roberts
>
> I would change that to
>
> Originally-by: Ryan Roberts <ryan.roberts@arm.com>
>
> > Signed-off-by:
>
> and drop this for now.

Hi Andrew,

Just wanted to check if you can make the change from
"Co-developed-by/Signed-off-by:" to "Originally-by:" to acknowledge
Ryan Roberts' contribution, when this patch is included in mm-unstable?

Please do let me know if it is simpler if I submit a v11 for just this
specific patch or for the entire series with this change. I will proceed
based on your recommendation.

Thanks,
Kanchana

> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> -----Original Message-----
> From: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Sent: Tuesday, October 1, 2024 10:34 AM
> To: Johannes Weiner <hannes@cmpxchg.org>
> Subject: RE: [PATCH v10 6/7] mm: zswap: Support large folios in
> zswap_store().
>
> > -----Original Message-----
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Sent: Tuesday, October 1, 2024 4:11 AM
> > Subject: Re: [PATCH v10 6/7] mm: zswap: Support large folios in
> > zswap_store().
> >
> > On Mon, Sep 30, 2024 at 10:32:21PM -0700, Kanchana P Sridhar wrote:
> > > zswap_store() will store large folios by compressing them page by page.
> > >
> > > [...]
> > >
> > > Co-developed-by: Ryan Roberts
> >
> > I would change that to
> >
> > Originally-by: Ryan Roberts <ryan.roberts@arm.com>
> >
> > > Signed-off-by:
> >
> > and drop this for now.
>
> Hi Andrew,
>
> Just wanted to check if you can make the change from
> "Co-developed-by/Signed-off-by:" to "Originally-by:" to acknowledge
> Ryan Roberts' contribution, when this patch is included in mm-unstable?
>
> Please do let me know if it is simpler if I submit a v11 for just this
> specific patch or for the entire series with this change. I will proceed
> based on your recommendation.

Many thanks for adding the "Originally-by:", Andrew! Really appreciate it.

Thanks,
Kanchana
diff --git a/mm/zswap.c b/mm/zswap.c
index 2b8da50f6322..09aaf70f95c6 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -411,6 +411,12 @@ static int __must_check zswap_pool_tryget(struct zswap_pool *pool)
 	return percpu_ref_tryget(&pool->ref);
 }
 
+/* The caller must already have a reference. */
+static void zswap_pool_get(struct zswap_pool *pool)
+{
+	percpu_ref_get(&pool->ref);
+}
+
 static void zswap_pool_put(struct zswap_pool *pool)
 {
 	percpu_ref_put(&pool->ref);
@@ -1402,68 +1408,38 @@ static void shrink_worker(struct work_struct *w)
 /*********************************
 * main API
 **********************************/
-bool zswap_store(struct folio *folio)
+
+static ssize_t zswap_store_page(struct page *page,
+				struct obj_cgroup *objcg,
+				struct zswap_pool *pool)
 {
-	swp_entry_t swp = folio->swap;
-	pgoff_t offset = swp_offset(swp);
-	struct xarray *tree = swap_zswap_tree(swp);
 	struct zswap_entry *entry, *old;
-	struct obj_cgroup *objcg = NULL;
-	struct mem_cgroup *memcg = NULL;
-
-	VM_WARN_ON_ONCE(!folio_test_locked(folio));
-	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
-
-	/* Large folios aren't supported */
-	if (folio_test_large(folio))
-		return false;
-
-	if (!zswap_enabled)
-		goto check_old;
-
-	/* Check cgroup limits */
-	objcg = get_obj_cgroup_from_folio(folio);
-	if (objcg && !obj_cgroup_may_zswap(objcg)) {
-		memcg = get_mem_cgroup_from_objcg(objcg);
-		if (shrink_memcg(memcg)) {
-			mem_cgroup_put(memcg);
-			goto reject;
-		}
-		mem_cgroup_put(memcg);
-	}
-
-	if (zswap_check_limits())
-		goto reject;
 
 	/* allocate entry */
-	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(folio));
+	entry = zswap_entry_cache_alloc(GFP_KERNEL, page_to_nid(page));
 	if (!entry) {
 		zswap_reject_kmemcache_fail++;
 		goto reject;
 	}
 
-	/* if entry is successfully added, it keeps the reference */
-	entry->pool = zswap_pool_current_get();
-	if (!entry->pool)
-		goto freepage;
+	/* zswap_store() already holds a ref on 'objcg' and 'pool' */
+	if (objcg)
+		obj_cgroup_get(objcg);
+	zswap_pool_get(pool);
 
-	if (objcg) {
-		memcg = get_mem_cgroup_from_objcg(objcg);
-		if (memcg_list_lru_alloc(memcg, &zswap_list_lru, GFP_KERNEL)) {
-			mem_cgroup_put(memcg);
-			goto put_pool;
-		}
-		mem_cgroup_put(memcg);
-	}
+	/* if entry is successfully added, it keeps the reference */
+	entry->pool = pool;
 
-	if (!zswap_compress(&folio->page, entry))
-		goto put_pool;
+	if (!zswap_compress(page, entry))
+		goto put_pool_objcg;
 
-	entry->swpentry = swp;
+	entry->swpentry = page_swap_entry(page);
 	entry->objcg = objcg;
 	entry->referenced = true;
 
-	old = xa_store(tree, offset, entry, GFP_KERNEL);
+	old = xa_store(swap_zswap_tree(entry->swpentry),
+		       swp_offset(entry->swpentry),
+		       entry, GFP_KERNEL);
 	if (xa_is_err(old)) {
 		int err = xa_err(old);
 
@@ -1480,11 +1456,6 @@ bool zswap_store(struct folio *folio)
 	if (old)
 		zswap_entry_free(old);
 
-	if (objcg) {
-		obj_cgroup_charge_zswap(objcg, entry->length);
-		count_objcg_events(objcg, ZSWPOUT, 1);
-	}
-
 	/*
 	 * We finish initializing the entry while it's already in xarray.
 	 * This is safe because:
@@ -1500,32 +1471,114 @@ bool zswap_store(struct folio *folio)
 		zswap_lru_add(&zswap_list_lru, entry);
 	}
 
-	/* update stats */
-	atomic_long_inc(&zswap_stored_pages);
-	count_vm_event(ZSWPOUT);
-
-	return true;
+	/*
+	 * We shouldn't have any possibility of failure after the entry is
+	 * added in the xarray. The pool/objcg refs obtained here will only
+	 * be dropped if/when zswap_entry_free() gets called.
+	 */
+	return entry->length;
 
 store_failed:
 	zpool_free(entry->pool->zpool, entry->handle);
-put_pool:
-	zswap_pool_put(entry->pool);
-freepage:
+put_pool_objcg:
+	zswap_pool_put(pool);
+	obj_cgroup_put(objcg);
 	zswap_entry_cache_free(entry);
 reject:
+	return -EINVAL;
+}
+
+bool zswap_store(struct folio *folio)
+{
+	long nr_pages = folio_nr_pages(folio);
+	swp_entry_t swp = folio->swap;
+	struct obj_cgroup *objcg = NULL;
+	struct mem_cgroup *memcg = NULL;
+	struct zswap_pool *pool;
+	size_t compressed_bytes = 0;
+	bool ret = false;
+	long index;
+
+	VM_WARN_ON_ONCE(!folio_test_locked(folio));
+	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
+
+	if (!zswap_enabled)
+		goto check_old;
+
+	objcg = get_obj_cgroup_from_folio(folio);
+	if (objcg && !obj_cgroup_may_zswap(objcg)) {
+		memcg = get_mem_cgroup_from_objcg(objcg);
+		if (shrink_memcg(memcg)) {
+			mem_cgroup_put(memcg);
+			goto put_objcg;
+		}
+		mem_cgroup_put(memcg);
+	}
+
+	if (zswap_check_limits())
+		goto put_objcg;
+
+	pool = zswap_pool_current_get();
+	if (!pool)
+		goto put_objcg;
+
+	if (objcg) {
+		memcg = get_mem_cgroup_from_objcg(objcg);
+		if (memcg_list_lru_alloc(memcg, &zswap_list_lru, GFP_KERNEL)) {
+			mem_cgroup_put(memcg);
+			goto put_pool;
+		}
+		mem_cgroup_put(memcg);
+	}
+
+	for (index = 0; index < nr_pages; ++index) {
+		struct page *page = folio_page(folio, index);
+		ssize_t bytes;
+
+		bytes = zswap_store_page(page, objcg, pool);
+		if (bytes < 0)
+			goto put_pool;
+		compressed_bytes += bytes;
+	}
+
+	if (objcg) {
+		obj_cgroup_charge_zswap(objcg, compressed_bytes);
+		count_objcg_events(objcg, ZSWPOUT, nr_pages);
+	}
+
+	atomic_long_add(nr_pages, &zswap_stored_pages);
+	count_vm_events(ZSWPOUT, nr_pages);
+
+	ret = true;
+
+put_pool:
+	zswap_pool_put(pool);
+put_objcg:
 	obj_cgroup_put(objcg);
-	if (zswap_pool_reached_full)
+	if (!ret && zswap_pool_reached_full)
 		queue_work(shrink_wq, &zswap_shrink_work);
check_old:
 	/*
-	 * If the zswap store fails or zswap is disabled, we must invalidate the
-	 * possibly stale entry which was previously stored at this offset.
-	 * Otherwise, writeback could overwrite the new data in the swapfile.
+	 * If the zswap store fails or zswap is disabled, we must invalidate
+	 * the possibly stale entries which were previously stored at the
+	 * offsets corresponding to each page of the folio. Otherwise,
+	 * writeback could overwrite the new data in the swapfile.
	 */
-	entry = xa_erase(tree, offset);
-	if (entry)
-		zswap_entry_free(entry);
-	return false;
+	if (!ret) {
+		unsigned type = swp_type(swp);
+		pgoff_t offset = swp_offset(swp);
+		struct zswap_entry *entry;
+		struct xarray *tree;
+
+		for (index = 0; index < nr_pages; ++index) {
+			tree = swap_zswap_tree(swp_entry(type, offset + index));
+			entry = xa_erase(tree, offset + index);
+			if (entry)
+				zswap_entry_free(entry);
+		}
+	}
+
+	return ret;
 }
 
 bool zswap_load(struct folio *folio)
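A side note on the refcounting convention in the hunks above: the first reference to a pool must be taken with zswap_pool_current_get(), which uses percpu_ref_tryget() and can fail if the pool is being torn down, while zswap_pool_get() may only be called by a caller that already holds a reference. Below is a minimal, illustrative sketch of that split (not part of the patch); 'example_pool' and the example_* helpers are placeholders built only on the generic percpu_ref primitives that the zswap wrappers use.

#include <linux/percpu-refcount.h>

/* Placeholder type: stands in for a zswap_pool with an embedded percpu_ref. */
struct example_pool {
	struct percpu_ref ref;
};

/* First reference: must use tryget, since the refcount may already be dying. */
static struct example_pool *example_pool_tryget(struct example_pool *pool)
{
	return percpu_ref_tryget(&pool->ref) ? pool : NULL;
}

/* Additional references: only valid while the caller already holds one. */
static void example_pool_get(struct example_pool *pool)
{
	percpu_ref_get(&pool->ref);
}

/* Every reference, initial or additional, is dropped independently. */
static void example_pool_put(struct example_pool *pool)
{
	percpu_ref_put(&pool->ref);
}

In the patch, zswap_store() takes the initial pool and objcg references once per folio, and zswap_store_page() takes one more of each per stored entry, which are later dropped from zswap_entry_free() (or on the entry's error path).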
zswap_store() will store large folios by compressing them page by page.

This patch provides a sequential implementation of storing a large folio
in zswap_store() by iterating through each page in the folio to compress
and store it in the zswap zpool.

zswap_store() calls the newly added zswap_store_page() function for each
page in the folio. zswap_store_page() handles compressing and storing each
page.

We check the global and per-cgroup limits once at the beginning of
zswap_store(), and only check that the limit is not reached yet. This is
racy and inaccurate, but it should be sufficient for now. We also obtain
initial references to the relevant objcg and pool to guarantee that
subsequent references can be acquired by zswap_store_page(). A new function
zswap_pool_get() is added to facilitate this.

If these one-time checks pass, we compress the pages of the folio, while
maintaining a running count of compressed bytes for all the folio's pages.
If all pages are successfully compressed and stored, we do the cgroup
zswap charging with the total compressed bytes, and batch update the
zswap_stored_pages atomic/zswpout event stats with folio_nr_pages() once,
before returning from zswap_store().

If an error is encountered during the store of any page in the folio,
all pages in that folio currently stored in zswap will be invalidated.
Thus, a folio is either entirely stored in zswap, or entirely not stored
in zswap.

The most important value provided by this patch is that it enables
swapping out large folios to zswap without splitting them. Furthermore,
it batches some operations while doing so (cgroup charging, stats
updates).

This patch also forms the basis for building compress batching of pages
in a large folio in zswap_store() by compressing up to, say, 8 pages of
the folio in parallel in hardware using the Intel In-Memory Analytics
Accelerator (Intel IAA).

This change reuses and adapts the functionality in Ryan Roberts' RFC
patch [1]:

"[RFC,v1] mm: zswap: Store large folios without splitting"

[1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u

Co-developed-by: Ryan Roberts
Signed-off-by:
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
---
 mm/zswap.c | 189 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 121 insertions(+), 68 deletions(-)
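For readers who want the shape of the flow described above without reading the full diff, here is a condensed, non-literal sketch of the zswap_store()/zswap_store_page() split. It is illustrative only: zswap_store_sketch(), one_time_checks() and invalidate_stale_entries() are placeholders standing in for the corresponding blocks of the real zswap_store() (limit checks, objcg/pool reference acquisition, and the per-offset xarray erase loop), and the reference drops on the exit paths are elided.

/*
 * Condensed sketch, not the actual implementation.
 * one_time_checks() and invalidate_stale_entries() are placeholders for
 * blocks in the real zswap_store(); error-path ref drops are elided.
 */
static bool zswap_store_sketch(struct folio *folio)
{
	long nr_pages = folio_nr_pages(folio);
	struct obj_cgroup *objcg = NULL;
	struct zswap_pool *pool = NULL;
	size_t compressed_bytes = 0;
	long i;

	/* Folio-wide checks and initial objcg/pool references, done once. */
	if (!zswap_enabled || !one_time_checks(folio, &objcg, &pool))
		goto fail;

	/* Compress and store each page; any failure aborts the whole folio. */
	for (i = 0; i < nr_pages; i++) {
		ssize_t bytes = zswap_store_page(folio_page(folio, i),
						 objcg, pool);

		if (bytes < 0)
			goto fail;
		compressed_bytes += bytes;
	}

	/* Batched accounting: one charge and one stats update per folio. */
	if (objcg) {
		obj_cgroup_charge_zswap(objcg, compressed_bytes);
		count_objcg_events(objcg, ZSWPOUT, nr_pages);
	}
	atomic_long_add(nr_pages, &zswap_stored_pages);
	count_vm_events(ZSWPOUT, nr_pages);
	return true;

fail:
	/* All-or-nothing: erase any entries already stored for this folio. */
	invalidate_stale_entries(folio);
	return false;
}

The point of the split is that the per-page work (entry allocation, compression, xarray insertion) stays in zswap_store_page(), while everything that can be amortized across the folio (limit checks, initial reference acquisition, cgroup charging, and stats updates) happens exactly once per folio.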