
[v1] mm: zswap: Fix a potential memory leak in zswap_decompress().

Message ID 20241113052413.157039-1-kanchana.p.sridhar@intel.com (mailing list archive)
State New
Series [v1] mm: zswap: Fix a potential memory leak in zswap_decompress().

Commit Message

Sridhar, Kanchana P Nov. 13, 2024, 5:24 a.m. UTC
This is a hotfix for a potential zpool memory leak that could result in
the existing zswap_decompress():

        mutex_unlock(&acomp_ctx->mutex);

        if (src != acomp_ctx->buffer)
                zpool_unmap_handle(zpool, entry->handle);

Releasing the lock before the conditional does not protect the integrity of
"src", which is set earlier under the acomp_ctx mutex lock. This poses a
risk for the conditional behaving as intended, and consequently not
unmapping the zpool handle, which could cause a zswap zpool memory leak.

This patch moves the mutex_unlock() to occur after the conditional and
subsequent zpool_unmap_handle(). This ensures that the value of "src"
obtained earlier, with the mutex locked, does not change.

Even though an actual memory leak was not observed, this fix seems like a
cleaner implementation.

Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
---
 mm/zswap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c
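
With the patch applied, the tail of zswap_decompress() looks roughly as
follows (a sketch reconstructed from the hunk quoted in the replies below;
only the placement of mutex_unlock() changes):

        acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
        BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
        BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);

        /* unmap the zpool handle while still holding the acomp_ctx mutex */
        if (src != acomp_ctx->buffer)
                zpool_unmap_handle(zpool, entry->handle);

        mutex_unlock(&acomp_ctx->mutex);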

Comments

Yosry Ahmed Nov. 13, 2024, 5:34 a.m. UTC | #1
On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar
<kanchana.p.sridhar@intel.com> wrote:
>
> This is a hotfix for a potential zpool memory leak that could result in
> the existing zswap_decompress():
>
>         mutex_unlock(&acomp_ctx->mutex);
>
>         if (src != acomp_ctx->buffer)
>                 zpool_unmap_handle(zpool, entry->handle);
>
> Releasing the lock before the conditional does not protect the integrity of
> "src", which is set earlier under the acomp_ctx mutex lock. This poses a
> risk for the conditional behaving as intended, and consequently not
> unmapping the zpool handle, which could cause a zswap zpool memory leak.
>
> This patch moves the mutex_unlock() to occur after the conditional and
> subsequent zpool_unmap_handle(). This ensures that the value of "src"
> obtained earlier, with the mutex locked, does not change.

The commit log is too complicated and incorrect. It is talking about
the stability of 'src', but that's a local variable on the stack
anyway. It doesn't need protection.

The problem is 'acomp_ctx->buffer' being reused and changed after the
mutex is released. Leading to the check not being reliable. Please
simplify this.

>
> Even though an actual memory leak was not observed, this fix seems like a
> cleaner implementation.
>
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
> ---
>  mm/zswap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index f6316b66fb23..58810fa8ff23 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -986,10 +986,11 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
>         BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
>         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> -       mutex_unlock(&acomp_ctx->mutex);
>
>         if (src != acomp_ctx->buffer)
>                 zpool_unmap_handle(zpool, entry->handle);

Actually now that I think more about it, I think this check isn't
entirely safe, even under the lock. Is it possible that
'acomp_ctx->buffer' just happens to be equal to 'src' from a previous
decompression at the same handle? In this case, we will also
mistakenly skip the unmap.

It would be more reliable to set a boolean variable if we copy to
acomp_ctx->buffer and do the unmap, and check that flag here to check
if the unmap was done or not.
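
Something along these lines (just a sketch; "need_copy" stands in for the
existing sleepability check in zswap_decompress(), and the variable name is
arbitrary):

        bool copied_to_buffer = false;

        src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
        if (need_copy) {
                memcpy(acomp_ctx->buffer, src, entry->length);
                src = acomp_ctx->buffer;
                zpool_unmap_handle(zpool, entry->handle);
                copied_to_buffer = true;
        }

        /* ... set up the request and decompress ... */

        mutex_unlock(&acomp_ctx->mutex);

        /* the handle was already unmapped if we copied out of it */
        if (!copied_to_buffer)
                zpool_unmap_handle(zpool, entry->handle);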

> +
> +       mutex_unlock(&acomp_ctx->mutex);
>  }
>
>  /*********************************
>
> base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c
> --
> 2.27.0
>
Sridhar, Kanchana P Nov. 13, 2024, 5:58 a.m. UTC | #2
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Tuesday, November 12, 2024 9:35 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > This is a hotfix for a potential zpool memory leak that could result in
> > the existing zswap_decompress():
> >
> >         mutex_unlock(&acomp_ctx->mutex);
> >
> >         if (src != acomp_ctx->buffer)
> >                 zpool_unmap_handle(zpool, entry->handle);
> >
> > Releasing the lock before the conditional does not protect the integrity of
> > "src", which is set earlier under the acomp_ctx mutex lock. This poses a
> > risk for the conditional behaving as intended, and consequently not
> > unmapping the zpool handle, which could cause a zswap zpool memory
> leak.
> >
> > This patch moves the mutex_unlock() to occur after the conditional and
> > subsequent zpool_unmap_handle(). This ensures that the value of "src"
> > obtained earlier, with the mutex locked, does not change.
> 
> The commit log is too complicated and incorrect. It is talking about
> the stability of 'src', but that's a local variable on the stack
> anyway. It doesn't need protection.
> 
> The problem is 'acomp_ctx->buffer' being reused and changed after the
> mutex is released. Leading to the check not being reliable. Please
> simplify this.

Thanks Yosry. That's exactly what I meant, but I think the wording got
confusing. The problem I was trying to fix is the acomp_ctx->buffer
value changing after the lock is released. This could happen as a result of any
other compress or decompress that acquires the lock. I will simplify and
clarify accordingly.

> 
> >
> > Even though an actual memory leak was not observed, this fix seems like a
> > cleaner implementation.
> >
> > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
> > ---
> >  mm/zswap.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb23..58810fa8ff23 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -986,10 +986,11 @@ static void zswap_decompress(struct
> zswap_entry *entry, struct folio *folio)
> >         acomp_request_set_params(acomp_ctx->req, &input, &output, entry-
> >length, PAGE_SIZE);
> >         BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx-
> >req), &acomp_ctx->wait));
> >         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > -       mutex_unlock(&acomp_ctx->mutex);
> >
> >         if (src != acomp_ctx->buffer)
> >                 zpool_unmap_handle(zpool, entry->handle);
> 
> Actually now that I think more about it, I think this check isn't
> entirely safe, even under the lock. Is it possible that
> 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous
> decompression at the same handle? In this case, we will also
> mistakenly skip the unmap.

If we move the mutex_unlock to happen after the conditional and unmap,
shouldn't that be sufficient under all conditions? With the fix, "src" can
take on only 2 values in this procedure: the mapped handle or
acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case
if the mapped zpool handle happens to be equal to acomp_ctx->buffer.

Please let me know if I am missing anything.

> 
> It would be more reliable to set a boolean variable if we copy to
> acomp_ctx->buffer and do the unmap, and check that flag here to check
> if the unmap was done or not.

Sure, this could be done, but not sure if it is required. Please let me know
if we still need the boolean variable in addition to moving the mutex_unlock().

Thanks,
Kanchana

> 
> > +
> > +       mutex_unlock(&acomp_ctx->mutex);
> >  }
> >
> >  /*********************************
> >
> > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c
> > --
> > 2.27.0
> >
Yosry Ahmed Nov. 13, 2024, 6:21 a.m. UTC | #3
On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Tuesday, November 12, 2024 9:35 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > zswap_decompress().
> >
> > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > > This is a hotfix for a potential zpool memory leak that could result in
> > > the existing zswap_decompress():
> > >
> > >         mutex_unlock(&acomp_ctx->mutex);
> > >
> > >         if (src != acomp_ctx->buffer)
> > >                 zpool_unmap_handle(zpool, entry->handle);
> > >
> > > Releasing the lock before the conditional does not protect the integrity of
> > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a
> > > risk for the conditional behaving as intended, and consequently not
> > > unmapping the zpool handle, which could cause a zswap zpool memory
> > leak.
> > >
> > > This patch moves the mutex_unlock() to occur after the conditional and
> > > subsequent zpool_unmap_handle(). This ensures that the value of "src"
> > > obtained earlier, with the mutex locked, does not change.
> >
> > The commit log is too complicated and incorrect. It is talking about
> > the stability of 'src', but that's a local variable on the stack
> > anyway. It doesn't need protection.
> >
> > The problem is 'acomp_ctx->buffer' being reused and changed after the
> > mutex is released. Leading to the check not being reliable. Please
> > simplify this.
>
> Thanks Yosry. That's exactly what I meant, but I think the wording got
> confusing. The problem I was trying to fix is the acomp_ctx->buffer
> value changing after the lock is released. This could happen as a result of any
> other compress or decompress that acquires the lock. I will simplify and
> clarify accordingly.
>
> >
> > >
> > > Even though an actual memory leak was not observed, this fix seems like a
> > > cleaner implementation.
> > >
> > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
> > > ---
> > >  mm/zswap.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index f6316b66fb23..58810fa8ff23 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct
> > zswap_entry *entry, struct folio *folio)
> > >         acomp_request_set_params(acomp_ctx->req, &input, &output, entry-
> > >length, PAGE_SIZE);
> > >         BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx-
> > >req), &acomp_ctx->wait));
> > >         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > > -       mutex_unlock(&acomp_ctx->mutex);
> > >
> > >         if (src != acomp_ctx->buffer)
> > >                 zpool_unmap_handle(zpool, entry->handle);
> >
> > Actually now that I think more about it, I think this check isn't
> > entirely safe, even under the lock. Is it possible that
> > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous
> > decompression at the same handle? In this case, we will also
> > mistakenly skip the unmap.
>
> If we move the mutex_unlock to happen after the conditional and unmap,
> shouldn't that be sufficient under all conditions? With the fix, "src" can
> take on only 2 values in this procedure: the mapped handle or
> acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case
> if the mapped zpool handle happens to be equal to acomp_ctx->buffer.

Yes, that's the scenario I mean.

Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer'
so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer'
happens to be equal to the same handle from a previous operation as we
don't clear it.

In this case, the 'src != acomp_ctx->buffer' check will be false, even
though it should be true. This will result in an extra
zpool_unmap_handle() call. I didn't look closely, but this seems like
it can have a worse effect than leaking memory (e.g. there will be an
extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting
a random handle).

>
> Please let me know if I am missing anything.
>
> >
> > It would be more reliable to set a boolean variable if we copy to
> > acomp_ctx->buffer and do the unmap, and check that flag here to check
> > if the unmap was done or not.
>
> Sure, this could be done, but not sure if it is required. Please let me know
> if we still need the boolean variable in addition to moving the mutex_unlock().

If we use a boolean, there is no need to move mutex_unlock(). The
boolean will be a local variable on the stack that doesn't need
protection.

>
> Thanks,
> Kanchana
>
> >
> > > +
> > > +       mutex_unlock(&acomp_ctx->mutex);
> > >  }
> > >
> > >  /*********************************
> > >
> > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c
> > > --
> > > 2.27.0
> > >
Sridhar, Kanchana P Nov. 13, 2024, 7:12 p.m. UTC | #4
Hi Yosry,

> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Tuesday, November 12, 2024 10:22 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@google.com>
> > > Sent: Tuesday, November 12, 2024 9:35 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; nphamcs@gmail.com;
> chengming.zhou@linux.dev;
> > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> foundation.org;
> > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > zswap_decompress().
> > >
> > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > > This is a hotfix for a potential zpool memory leak that could result in
> > > > the existing zswap_decompress():
> > > >
> > > >         mutex_unlock(&acomp_ctx->mutex);
> > > >
> > > >         if (src != acomp_ctx->buffer)
> > > >                 zpool_unmap_handle(zpool, entry->handle);
> > > >
> > > > Releasing the lock before the conditional does not protect the integrity
> of
> > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a
> > > > risk for the conditional behaving as intended, and consequently not
> > > > unmapping the zpool handle, which could cause a zswap zpool memory
> > > leak.
> > > >
> > > > This patch moves the mutex_unlock() to occur after the conditional and
> > > > subsequent zpool_unmap_handle(). This ensures that the value of "src"
> > > > obtained earlier, with the mutex locked, does not change.
> > >
> > > The commit log is too complicated and incorrect. It is talking about
> > > the stability of 'src', but that's a local variable on the stack
> > > anyway. It doesn't need protection.
> > >
> > > The problem is 'acomp_ctx->buffer' being reused and changed after the
> > > mutex is released. Leading to the check not being reliable. Please
> > > simplify this.
> >
> > Thanks Yosry. That's exactly what I meant, but I think the wording got
> > confusing. The problem I was trying to fix is the acomp_ctx->buffer
> > value changing after the lock is released. This could happen as a result of
> any
> > other compress or decompress that acquires the lock. I will simplify and
> > clarify accordingly.
> >
> > >
> > > >
> > > > Even though an actual memory leak was not observed, this fix seems
> like a
> > > > cleaner implementation.
> > > >
> > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
> > > > ---
> > > >  mm/zswap.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > index f6316b66fb23..58810fa8ff23 100644
> > > > --- a/mm/zswap.c
> > > > +++ b/mm/zswap.c
> > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct
> > > zswap_entry *entry, struct folio *folio)
> > > >         acomp_request_set_params(acomp_ctx->req, &input, &output,
> entry-
> > > >length, PAGE_SIZE);
> > > >         BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx-
> > > >req), &acomp_ctx->wait));
> > > >         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > > > -       mutex_unlock(&acomp_ctx->mutex);
> > > >
> > > >         if (src != acomp_ctx->buffer)
> > > >                 zpool_unmap_handle(zpool, entry->handle);
> > >
> > > Actually now that I think more about it, I think this check isn't
> > > entirely safe, even under the lock. Is it possible that
> > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous
> > > decompression at the same handle? In this case, we will also
> > > mistakenly skip the unmap.
> >
> > If we move the mutex_unlock to happen after the conditional and unmap,
> > shouldn't that be sufficient under all conditions? With the fix, "src" can
> > take on only 2 values in this procedure: the mapped handle or
> > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case
> > if the mapped zpool handle happens to be equal to acomp_ctx->buffer.
> 
> Yes, that's the scenario I mean.
> 
> Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer'
> so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer'
> happens to be equal to the same handle from a previous operation as we
> don't clear it.

Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(),
we only copy to it.

> 
> In this case, the 'src != acomp_ctx->buffer' check will be false, even
> though it should be true. This will result in an extra
> zpool_unmap_handle() call. I didn't look closely, but this seems like
> it can have a worse effect than leaking memory (e.g. there will be an
> extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting
> a random handle).
> 
> >
> > Please let me know if I am missing anything.
> >
> > >
> > > It would be more reliable to set a boolean variable if we copy to
> > > acomp_ctx->buffer and do the unmap, and check that flag here to check
> > > if the unmap was done or not.
> >
> > Sure, this could be done, but not sure if it is required. Please let me know
> > if we still need the boolean variable in addition to moving the
> mutex_unlock().
> 
> If we use a boolean, there is no need to move mutex_unlock(). The
> boolean will be a local variable on the stack that doesn't need
> protection.

I agree, using the boolean variable to do the unmap rather than the check
for (src != acomp_ctx->buffer) is more fail-safe.

I am still thinking moving the mutex_unlock() could help, or at least have
no downside. The acomp_ctx is per-cpu and its mutex_lock/unlock
safeguards the interaction between the decompress operation, the
sg_*() API calls inside zswap_decompress() and the shared zpool.

If we release the per-cpu acomp_ctx's mutex lock before the
zpool_unmap_handle(), is it possible that another cpu could acquire
its acomp_ctx's lock and map the same zpool handle (that the earlier
cpu has yet to unmap or is concurrently unmapping) for a write?
If this could happen, would it result in undefined state for both
these zpool ops on different cpu's?

Would keeping the per-cpu mutex locked through the
zpool_unmap_handle() create a non-preemptible state that would
avoid this? IOW, if the above scenario is possible, does the per-cpu
acomp_ctx's mutex help, or is it a no-op? Or, is the above scenario somehow
prevented by the implementation of the zpools?

Just thought I would bring up these open questions. Please do share
your thoughts and advise.

Thanks,
Kanchana

> 
> >
> > Thanks,
> > Kanchana
> >
> > >
> > > > +
> > > > +       mutex_unlock(&acomp_ctx->mutex);
> > > >  }
> > > >
> > > >  /*********************************
> > > >
> > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c
> > > > --
> > > > 2.27.0
> > > >
Yosry Ahmed Nov. 13, 2024, 8:11 p.m. UTC | #5
On Wed, Nov 13, 2024 at 11:12 AM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
> Hi Yosry,
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Tuesday, November 12, 2024 10:22 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > zswap_decompress().
> >
> > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > Sent: Tuesday, November 12, 2024 9:35 PM
> > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > > hannes@cmpxchg.org; nphamcs@gmail.com;
> > chengming.zhou@linux.dev;
> > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > foundation.org;
> > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > > <vinodh.gopal@intel.com>
> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > zswap_decompress().
> > > >
> > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar
> > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > >
> > > > > This is a hotfix for a potential zpool memory leak that could result in
> > > > > the existing zswap_decompress():
> > > > >
> > > > >         mutex_unlock(&acomp_ctx->mutex);
> > > > >
> > > > >         if (src != acomp_ctx->buffer)
> > > > >                 zpool_unmap_handle(zpool, entry->handle);
> > > > >
> > > > > Releasing the lock before the conditional does not protect the integrity
> > of
> > > > > "src", which is set earlier under the acomp_ctx mutex lock. This poses a
> > > > > risk for the conditional behaving as intended, and consequently not
> > > > > unmapping the zpool handle, which could cause a zswap zpool memory
> > > > leak.
> > > > >
> > > > > This patch moves the mutex_unlock() to occur after the conditional and
> > > > > subsequent zpool_unmap_handle(). This ensures that the value of "src"
> > > > > obtained earlier, with the mutex locked, does not change.
> > > >
> > > > The commit log is too complicated and incorrect. It is talking about
> > > > the stability of 'src', but that's a local variable on the stack
> > > > anyway. It doesn't need protection.
> > > >
> > > > The problem is 'acomp_ctx->buffer' being reused and changed after the
> > > > mutex is released. Leading to the check not being reliable. Please
> > > > simplify this.
> > >
> > > Thanks Yosry. That's exactly what I meant, but I think the wording got
> > > confusing. The problem I was trying to fix is the acomp_ctx->buffer
> > > value changing after the lock is released. This could happen as a result of
> > any
> > > other compress or decompress that acquires the lock. I will simplify and
> > > clarify accordingly.
> > >
> > > >
> > > > >
> > > > > Even though an actual memory leak was not observed, this fix seems
> > like a
> > > > > cleaner implementation.
> > > > >
> > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
> > > > > ---
> > > > >  mm/zswap.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > > index f6316b66fb23..58810fa8ff23 100644
> > > > > --- a/mm/zswap.c
> > > > > +++ b/mm/zswap.c
> > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct
> > > > zswap_entry *entry, struct folio *folio)
> > > > >         acomp_request_set_params(acomp_ctx->req, &input, &output,
> > entry-
> > > > >length, PAGE_SIZE);
> > > > >         BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx-
> > > > >req), &acomp_ctx->wait));
> > > > >         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > > > > -       mutex_unlock(&acomp_ctx->mutex);
> > > > >
> > > > >         if (src != acomp_ctx->buffer)
> > > > >                 zpool_unmap_handle(zpool, entry->handle);
> > > >
> > > > Actually now that I think more about it, I think this check isn't
> > > > entirely safe, even under the lock. Is it possible that
> > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous
> > > > decompression at the same handle? In this case, we will also
> > > > mistakenly skip the unmap.
> > >
> > > If we move the mutex_unlock to happen after the conditional and unmap,
> > > shouldn't that be sufficient under all conditions? With the fix, "src" can
> > > take on only 2 values in this procedure: the mapped handle or
> > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case
> > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer.
> >
> > Yes, that's the scenario I mean.
> >
> > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer'
> > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer'
> > happens to be equal to the same handle from a previous operation as we
> > don't clear it.
>
> Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(),
> we only copy to it.

Duh, yes. I confused myself, sorry for the noise.

>
> >
> > In this case, the 'src != acomp_ctx->buffer' check will be false, even
> > though it should be true. This will result in an extra
> > zpool_unmap_handle() call. I didn't look closely, but this seems like
> > it can have a worse effect than leaking memory (e.g. there will be an
> > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting
> > a random handle).
> >
> > >
> > > Please let me know if I am missing anything.
> > >
> > > >
> > > > It would be more reliable to set a boolean variable if we copy to
> > > > acomp_ctx->buffer and do the unmap, and check that flag here to check
> > > > if the unmap was done or not.
> > >
> > > Sure, this could be done, but not sure if it is required. Please let me know
> > > if we still need the boolean variable in addition to moving the
> > mutex_unlock().
> >
> > If we use a boolean, there is no need to move mutex_unlock(). The
> > boolean will be a local variable on the stack that doesn't need
> > protection.
>
> I agree, using the boolean variable to do the unmap rather than the check
> for (src != acomp_ctx->buffer) is more fail-safe.
>
> I am still thinking moving the mutex_unlock() could help, or at least have
> no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock
> safeguards the interaction between the decompress operation, the
> sg_*() API calls inside zswap_decompress() and the shared zpool.
>
> If we release the per-cpu acomp_ctx's mutex lock before the
> zpool_unmap_handle(), is it possible that another cpu could acquire
> it's acomp_ctx's lock and map the same zpool handle (that the earlier
> cpu has yet to unmap or is concurrently unmapping) for a write?
> If this could happen, would it result in undefined state for both
> these zpool ops on different cpu's?

Why would this result in an undefined state? For zsmalloc, mapping
uses a per-CPU buffer and preemption is disabled between mapping and
unmapping anyway. If another CPU maps the object it should be fine.

>
> Would keeping the per-cpu mutex locked through the
> zpool_unmap_handle() create a non-preemptible state that would
> avoid this? IOW, if the above scenario is possible, does the per-cpu
> acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow
> prevented by the implementation of the zpools?

At least for zsmalloc, I think it is.

>
> Just thought I would bring up these open questions. Please do share
> your thoughts and advise.

I think moving the mutex unlock after the unmap won't make much of a
difference from a performance side, at least for zsmalloc. Preemption
will be disabled until the unmap is done anyway, so even after we
release the per-CPU mutex it cannot be acquired by anyone else until
the unmap is done.

Anyway, I think the fix you have right now is fine, if you prefer not
adding a boolean. If you do add a boolean, whether you move the mutex
unlock or not should not make a difference.

Please just rewrite the commit log and CC stable (in the commit log,
not in the email CC list).

Thanks and sorry for the confusion!

>
> Thanks,
> Kanchana
>
> >
> > >
> > > Thanks,
> > > Kanchana
> > >
> > > >
> > > > > +
> > > > > +       mutex_unlock(&acomp_ctx->mutex);
> > > > >  }
> > > > >
> > > > >  /*********************************
> > > > >
> > > > > base-commit: 0e5bdedb39ded767bff4c6184225578595cee98c
> > > > > --
> > > > > 2.27.0
> > > > >
Sridhar, Kanchana P Nov. 13, 2024, 8:59 p.m. UTC | #6
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Wednesday, November 13, 2024 12:11 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Wed, Nov 13, 2024 at 11:12 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > Hi Yosry,
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@google.com>
> > > Sent: Tuesday, November 12, 2024 10:22 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > hannes@cmpxchg.org; nphamcs@gmail.com;
> chengming.zhou@linux.dev;
> > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> foundation.org;
> > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > zswap_decompress().
> > >
> > > On Tue, Nov 12, 2024 at 9:59 PM Sridhar, Kanchana P
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > > Sent: Tuesday, November 12, 2024 9:35 PM
> > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> > > > > hannes@cmpxchg.org; nphamcs@gmail.com;
> > > chengming.zhou@linux.dev;
> > > > > usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> > > > > <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > > foundation.org;
> > > > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > > > <vinodh.gopal@intel.com>
> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > zswap_decompress().
> > > > >
> > > > > On Tue, Nov 12, 2024 at 9:24 PM Kanchana P Sridhar
> > > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > > >
> > > > > > This is a hotfix for a potential zpool memory leak that could result in
> > > > > > the existing zswap_decompress():
> > > > > >
> > > > > >         mutex_unlock(&acomp_ctx->mutex);
> > > > > >
> > > > > >         if (src != acomp_ctx->buffer)
> > > > > >                 zpool_unmap_handle(zpool, entry->handle);
> > > > > >
> > > > > > Releasing the lock before the conditional does not protect the
> integrity
> > > of
> > > > > > "src", which is set earlier under the acomp_ctx mutex lock. This
> poses a
> > > > > > risk for the conditional behaving as intended, and consequently not
> > > > > > unmapping the zpool handle, which could cause a zswap zpool
> memory
> > > > > leak.
> > > > > >
> > > > > > This patch moves the mutex_unlock() to occur after the conditional
> and
> > > > > > subsequent zpool_unmap_handle(). This ensures that the value of
> "src"
> > > > > > obtained earlier, with the mutex locked, does not change.
> > > > >
> > > > > The commit log is too complicated and incorrect. It is talking about
> > > > > the stability of 'src', but that's a local variable on the stack
> > > > > anyway. It doesn't need protection.
> > > > >
> > > > > The problem is 'acomp_ctx->buffer' being reused and changed after
> the
> > > > > mutex is released. Leading to the check not being reliable. Please
> > > > > simplify this.
> > > >
> > > > Thanks Yosry. That's exactly what I meant, but I think the wording got
> > > > confusing. The problem I was trying to fix is the acomp_ctx->buffer
> > > > value changing after the lock is released. This could happen as a result
> of
> > > any
> > > > other compress or decompress that acquires the lock. I will simplify and
> > > > clarify accordingly.
> > > >
> > > > >
> > > > > >
> > > > > > Even though an actual memory leak was not observed, this fix seems
> > > like a
> > > > > > cleaner implementation.
> > > > > >
> > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
> > > > > > Fixes: 9c500835f279 ("mm: zswap: fix kernel BUG in sg_init_one")
> > > > > > ---
> > > > > >  mm/zswap.c | 3 ++-
> > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > > > index f6316b66fb23..58810fa8ff23 100644
> > > > > > --- a/mm/zswap.c
> > > > > > +++ b/mm/zswap.c
> > > > > > @@ -986,10 +986,11 @@ static void zswap_decompress(struct
> > > > > zswap_entry *entry, struct folio *folio)
> > > > > >         acomp_request_set_params(acomp_ctx->req, &input, &output,
> > > entry-
> > > > > >length, PAGE_SIZE);
> > > > > >
> BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx-
> > > > > >req), &acomp_ctx->wait));
> > > > > >         BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > > > > > -       mutex_unlock(&acomp_ctx->mutex);
> > > > > >
> > > > > >         if (src != acomp_ctx->buffer)
> > > > > >                 zpool_unmap_handle(zpool, entry->handle);
> > > > >
> > > > > Actually now that I think more about it, I think this check isn't
> > > > > entirely safe, even under the lock. Is it possible that
> > > > > 'acomp_ctx->buffer' just happens to be equal to 'src' from a previous
> > > > > decompression at the same handle? In this case, we will also
> > > > > mistakenly skip the unmap.
> > > >
> > > > If we move the mutex_unlock to happen after the conditional and
> unmap,
> > > > shouldn't that be sufficient under all conditions? With the fix, "src" can
> > > > take on only 2 values in this procedure: the mapped handle or
> > > > acomp_ctx->buffer. The only ambiguity would be in the (unlikely?) case
> > > > if the mapped zpool handle happens to be equal to acomp_ctx->buffer.
> > >
> > > Yes, that's the scenario I mean.
> > >
> > > Specifically, in zswap_decompress(), we do not use 'acomp_ctx->buffer'
> > > so 'src' is equal to the mapped handle. But, 'acomp_ctx->buffer'
> > > happens to be equal to the same handle from a previous operation as we
> > > don't clear it.
> >
> > Although, we never modify 'acomp_ctx->buffer' in zswap_decompress(),
> > we only copy to it.
> 
> Duh, yes. I confused myself, sorry for the noise.
> 
> >
> > >
> > > In this case, the 'src != acomp_ctx->buffer' check will be false, even
> > > though it should be true. This will result in an extra
> > > zpool_unmap_handle() call. I didn't look closely, but this seems like
> > > it can have a worse effect than leaking memory (e.g. there will be an
> > > extra __kunmap_atomic() call in zsmalloc, and we may end up corrupting
> > > a random handle).
> > >
> > > >
> > > > Please let me know if I am missing anything.
> > > >
> > > > >
> > > > > It would be more reliable to set a boolean variable if we copy to
> > > > > acomp_ctx->buffer and do the unmap, and check that flag here to
> check
> > > > > if the unmap was done or not.
> > > >
> > > > Sure, this could be done, but not sure if it is required. Please let me know
> > > > if we still need the boolean variable in addition to moving the
> > > mutex_unlock().
> > >
> > > If we use a boolean, there is no need to move mutex_unlock(). The
> > > boolean will be a local variable on the stack that doesn't need
> > > protection.
> >
> > I agree, using the boolean variable to do the unmap rather than the check
> > for (src != acomp_ctx->buffer) is more fail-safe.
> >
> > I am still thinking moving the mutex_unlock() could help, or at least have
> > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock
> > safeguards the interaction between the decompress operation, the
> > sg_*() API calls inside zswap_decompress() and the shared zpool.
> >
> > If we release the per-cpu acomp_ctx's mutex lock before the
> > zpool_unmap_handle(), is it possible that another cpu could acquire
> > it's acomp_ctx's lock and map the same zpool handle (that the earlier
> > cpu has yet to unmap or is concurrently unmapping) for a write?
> > If this could happen, would it result in undefined state for both
> > these zpool ops on different cpu's?
> 
> Why would this result in an undefined state? For zsmalloc, mapping
> uses a per-CPU buffer and preemption is disabled between mapping and
> unmapping anyway. If another CPU maps the object it should be fine.
> 
> >
> > Would keeping the per-cpu mutex locked through the
> > zpool_unmap_handle() create a non-preemptible state that would
> > avoid this? IOW, if the above scenario is possible, does the per-cpu
> > acomp_ctx's mutex help/is a no-op? Or, is the above scenario somehow
> > prevented by the implementation of the zpools?
> 
> At least for zsmalloc, I think it is.
> 
> >
> > Just thought I would bring up these open questions. Please do share
> > your thoughts and advise.
> 
> I think moving the mutex unlock after the unmap won't make much of a
> difference from a performance side, at least for zsmalloc. Preemption
> will be disabled until the unmap is done anyway, so even after we
> release the per-CPU mutex it cannot be acquired by anyone else until
> the unmap is done.
> 
> Anyway, I think the fix you have right now is fine, if you prefer not
> adding a boolean. If you do add a boolean, whether you move the mutex
> unlock or not should not make a difference.

Thanks for the guidance on next steps! I will add the boolean and move
the mutex.

> 
> Please just rewrite the commit log and CC stable (in the commit log,
> not in the email CC list).

I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log
(as found from the latest mainline git log) ?

Thanks again,
Kanchana
Yosry Ahmed Nov. 13, 2024, 8:59 p.m. UTC | #7
[..]
> >
> > Please just rewrite the commit log and CC stable (in the commit log,
> > not in the email CC list).
>
> I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log
> (as found from the latest mainline git log) ?

Yes, thanks!

>
> Thanks again,
> Kanchana
>
Sridhar, Kanchana P Nov. 13, 2024, 9:12 p.m. UTC | #8
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Wednesday, November 13, 2024 1:00 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> hannes@cmpxchg.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> [..]
> > >
> > > Please just rewrite the commit log and CC stable (in the commit log,
> > > not in the email CC list).
> >
> > I guess this refers to adding "Cc: stable@vger.kernel.org" in the commit log
> > (as found from the latest mainline git log) ?
> 
> Yes, thanks!

Sounds good, thanks Yosry!

> 
> >
> > Thanks again,
> > Kanchana
> >
Johannes Weiner Nov. 13, 2024, 9:30 p.m. UTC | #9
On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote:
> I am still thinking moving the mutex_unlock() could help, or at least have
> no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock
> safeguards the interaction between the decompress operation, the
> sg_*() API calls inside zswap_decompress() and the shared zpool.
> 
> If we release the per-cpu acomp_ctx's mutex lock before the
> zpool_unmap_handle(), is it possible that another cpu could acquire
> it's acomp_ctx's lock and map the same zpool handle (that the earlier
> cpu has yet to unmap or is concurrently unmapping) for a write?
> If this could happen, would it result in undefined state for both
> these zpool ops on different cpu's?

The code is fine as is.

Like you said, acomp_ctx->buffer (the pointer) doesn't change. It
points to whatever was kmalloced in zswap_cpu_comp_prepare(). The
handle points to backend memory. Neither of those addresses can change
under us. There is no confusing them, and they cannot coincide.

The mutex guards the *memory* behind the buffer, so that we don't have
multiple (de)compressors stepping on each others' toes. But it's fine
to drop the mutex once we're done working with the memory. We don't
need the mutex to check whether src holds the acomp buffer address.

That being said, I do think there is a UAF bug in CPU hotplugging.

There is an acomp_ctx for each cpu, but note that this is best effort
parallelism, not a guarantee that we always have the context of the
local CPU. Look closely: we pick the "local" CPU with preemption
enabled, then contend for the mutex. This may well put us to sleep and
get us migrated, so we could be using the context of a CPU we are no
longer running on. This is fine because we hold the mutex - if that
other CPU tries to use the acomp_ctx, it'll wait for us.

However, if we get migrated and vacate the CPU whose context we have
locked, the CPU might get offlined and zswap_cpu_comp_dead() can free
the context underneath us. I think we need to refcount the acomp_ctx.
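
Roughly something like this (purely illustrative; the refcount field and the
teardown helper are hypothetical, and the real fix may well look different):

        acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
        if (!refcount_inc_not_zero(&acomp_ctx->refcnt))
                return false;   /* CPU went offline, context already torn down */

        mutex_lock(&acomp_ctx->mutex);
        /* ... map the handle, (de)compress, unmap ... */
        mutex_unlock(&acomp_ctx->mutex);

        if (refcount_dec_and_test(&acomp_ctx->refcnt))
                acomp_ctx_free(acomp_ctx);      /* hypothetical teardown helper */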
Yosry Ahmed Nov. 13, 2024, 10:01 p.m. UTC | #10
On Wed, Nov 13, 2024 at 1:30 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote:
> > I am still thinking moving the mutex_unlock() could help, or at least have
> > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock
> > safeguards the interaction between the decompress operation, the
> > sg_*() API calls inside zswap_decompress() and the shared zpool.
> >
> > If we release the per-cpu acomp_ctx's mutex lock before the
> > zpool_unmap_handle(), is it possible that another cpu could acquire
> > it's acomp_ctx's lock and map the same zpool handle (that the earlier
> > cpu has yet to unmap or is concurrently unmapping) for a write?
> > If this could happen, would it result in undefined state for both
> > these zpool ops on different cpu's?
>
> The code is fine as is.
>
> Like you said, acomp_ctx->buffer (the pointer) doesn't change. It
> points to whatever was kmalloced in zswap_cpu_comp_prepare(). The
> handle points to backend memory. Neither of those addresses can change
> under us. There is no confusing them, and they cannot coincide.
>
> The mutex guards the *memory* behind the buffer, so that we don't have
> multiple (de)compressors stepping on each others' toes. But it's fine
> to drop the mutex once we're done working with the memory. We don't
> need the mutex to check whether src holds the acomp buffer address.

I have to admit that I confused myself with this alleged bug more than
I like to admit :)

I initially thought acomp_ctx->buffer can be changed, then when I
realized it cannot be changed I did not tie that back to the 'fix' not
being needed at all. I need more coffee.

>
> That being said, I do think there is a UAF bug in CPU hotplugging.
>
> There is an acomp_ctx for each cpu, but note that this is best effort
> parallelism, not a guarantee that we always have the context of the
> local CPU. Look closely: we pick the "local" CPU with preemption
> enabled, then contend for the mutex. This may well put us to sleep and
> get us migrated, so we could be using the context of a CPU we are no
> longer running on. This is fine because we hold the mutex - if that
> other CPU tries to use the acomp_ctx, it'll wait for us.
>
> However, if we get migrated and vacate the CPU whose context we have
> locked, the CPU might get offlined and zswap_cpu_comp_dead() can free
> the context underneath us. I think we need to refcount the acomp_ctx.
Sridhar, Kanchana P Nov. 13, 2024, 10:13 p.m. UTC | #11
> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Wednesday, November 13, 2024 1:30 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>; linux-kernel@vger.kernel.org;
> linux-mm@kvack.org; nphamcs@gmail.com; chengming.zhou@linux.dev;
> usamaarif642@gmail.com; ryan.roberts@arm.com; Huang, Ying
> <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Wed, Nov 13, 2024 at 07:12:18PM +0000, Sridhar, Kanchana P wrote:
> > I am still thinking moving the mutex_unlock() could help, or at least have
> > no downside. The acomp_ctx is per-cpu and it's mutex_lock/unlock
> > safeguards the interaction between the decompress operation, the
> > sg_*() API calls inside zswap_decompress() and the shared zpool.
> >
> > If we release the per-cpu acomp_ctx's mutex lock before the
> > zpool_unmap_handle(), is it possible that another cpu could acquire
> > it's acomp_ctx's lock and map the same zpool handle (that the earlier
> > cpu has yet to unmap or is concurrently unmapping) for a write?
> > If this could happen, would it result in undefined state for both
> > these zpool ops on different cpu's?
> 
> The code is fine as is.
> 
> Like you said, acomp_ctx->buffer (the pointer) doesn't change. It
> points to whatever was kmalloced in zswap_cpu_comp_prepare(). The
> handle points to backend memory. Neither of those addresses can change
> under us. There is no confusing them, and they cannot coincide.
> 
> The mutex guards the *memory* behind the buffer, so that we don't have
> multiple (de)compressors stepping on each others' toes. But it's fine
> to drop the mutex once we're done working with the memory. We don't
> need the mutex to check whether src holds the acomp buffer address.

Thanks Johannes, for these insights. I was thinking of the following
in zswap_decompress() as creating a non-preemptible context because
of the call to raw_cpu_ptr() at the start; with this context extending
until the mutex_unlock():

	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
	mutex_lock(&acomp_ctx->mutex);

	[...]

	mutex_unlock(&acomp_ctx->mutex);

	if (src != acomp_ctx->buffer)
		zpool_unmap_handle(zpool, entry->handle);

Based on this understanding, I was a bit worried about the
"acomp_ctx->buffer" in the conditional that gates the
zpool_unmap_handle() not being the same acomp_ctx as the one
at the beginning. I may have been confusing myself, since the acomp_ctx
is not re-evaluated before the conditional, just reused from the
start. My apologies to you and Yosry!

> 
> That being said, I do think there is a UAF bug in CPU hotplugging.
> 
> There is an acomp_ctx for each cpu, but note that this is best effort
> parallelism, not a guarantee that we always have the context of the
> local CPU. Look closely: we pick the "local" CPU with preemption
> enabled, then contend for the mutex. This may well put us to sleep and
> get us migrated, so we could be using the context of a CPU we are no
> longer running on. This is fine because we hold the mutex - if that
> other CPU tries to use the acomp_ctx, it'll wait for us.
> 
> However, if we get migrated and vacate the CPU whose context we have
> locked, the CPU might get offlined and zswap_cpu_comp_dead() can free
> the context underneath us. I think we need to refcount the acomp_ctx.

I see. Wouldn't it then seem to make the code more fail-safe to not allow
the migration to happen until after the check for (src != acomp_ctx->buffer), by
moving the mutex_unlock() after this check? Or, use a boolean to determine
if the unmap_handle needs to be done as Yosry suggested?

Thanks,
Kanchana
Nhat Pham Nov. 14, 2024, 12:28 a.m. UTC | #12
On Wed, Nov 13, 2024 at 2:13 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
>
> Thanks Johannes, for these insights. I was thinking of the following
> in zswap_decompress() as creating a non-preemptible context because
> of the call to raw_cpu_ptr() at the start; with this context extending
> until the mutex_unlock():
>
>         acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>         mutex_lock(&acomp_ctx->mutex);
>
>         [...]
>
>         mutex_unlock(&acomp_ctx->mutex);
>
>         if (src != acomp_ctx->buffer)
>                 zpool_unmap_handle(zpool, entry->handle);
>
> Based on this understanding, I was a bit worried about the
> "acomp_ctx->buffer" in the conditional that gates the
> zpool_unmap_handle() not being the same acomp_ctx as the one
> at the beginning. I may have been confusing myself, since the acomp_ctx
> is not re-evaluated before the conditional, just reused from the
> start. My apologies to you and Yosry!
>
> >
> > That being said, I do think there is a UAF bug in CPU hotplugging.
> >
> > There is an acomp_ctx for each cpu, but note that this is best effort
> > parallelism, not a guarantee that we always have the context of the
> > local CPU. Look closely: we pick the "local" CPU with preemption
> > enabled, then contend for the mutex. This may well put us to sleep and
> > get us migrated, so we could be using the context of a CPU we are no
> > longer running on. This is fine because we hold the mutex - if that
> > other CPU tries to use the acomp_ctx, it'll wait for us.
> >
> > However, if we get migrated and vacate the CPU whose context we have
> > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free
> > the context underneath us. I think we need to refcount the acomp_ctx.
>
> I see. Wouldn't it then seem to make the code more fail-safe to not allow
> the migration to happen until after the check for (src != acomp_ctx->buffer), by
> moving the mutex_unlock() after this check? Or, use a boolean to determine
> if the unmap_handle needs to be done as Yosry suggested?

Hmm does it make it safe? It is mutex_lock() that puts the task to
sleep, after which it can get migrated to a different CPU. Moving
mutex_unlock() to below or not doesn't really matter, no? Or am I
missing something here...

I think Johannes' proposal is the safest :)
Sridhar, Kanchana P Nov. 14, 2024, 1:56 a.m. UTC | #13
> -----Original Message-----
> From: Nhat Pham <nphamcs@gmail.com>
> Sent: Wednesday, November 13, 2024 4:29 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>; Yosry Ahmed
> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com;
> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Wed, Nov 13, 2024 at 2:13 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> >
> > Thanks Johannes, for these insights. I was thinking of the following
> > in zswap_decompress() as creating a non-preemptible context because
> > of the call to raw_cpu_ptr() at the start; with this context extending
> > until the mutex_unlock():
> >
> >         acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> >         mutex_lock(&acomp_ctx->mutex);
> >
> >         [...]
> >
> >         mutex_unlock(&acomp_ctx->mutex);
> >
> >         if (src != acomp_ctx->buffer)
> >                 zpool_unmap_handle(zpool, entry->handle);
> >
> > Based on this understanding, I was a bit worried about the
> > "acomp_ctx->buffer" in the conditional that gates the
> > zpool_unmap_handle() not being the same acomp_ctx as the one
> > at the beginning. I may have been confusing myself, since the acomp_ctx
> > is not re-evaluated before the conditional, just reused from the
> > start. My apologies to you and Yosry!
> >
> > >
> > > That being said, I do think there is a UAF bug in CPU hotplugging.
> > >
> > > There is an acomp_ctx for each cpu, but note that this is best effort
> > > parallelism, not a guarantee that we always have the context of the
> > > local CPU. Look closely: we pick the "local" CPU with preemption
> > > enabled, then contend for the mutex. This may well put us to sleep and
> > > get us migrated, so we could be using the context of a CPU we are no
> > > longer running on. This is fine because we hold the mutex - if that
> > > other CPU tries to use the acomp_ctx, it'll wait for us.
> > >
> > > However, if we get migrated and vacate the CPU whose context we have
> > > locked, the CPU might get offlined and zswap_cpu_comp_dead() can free
> > > the context underneath us. I think we need to refcount the acomp_ctx.
> >
> > I see. Wouldn't it then seem to make the code more fail-safe to not allow
> > the migration to happen until after the check for (src != acomp_ctx->buffer),
> by
> > moving the mutex_unlock() after this check? Or, use a boolean to determine
> > if the unmap_handle needs to be done as Yosry suggested?
> 
> Hmm does it make it safe? It is mutex_lock() that puts the task to
> sleep, after which it can get migrated to a different CPU. Moving
> mutex_unlock() to below or not doesn't really matter, no? Or am I
> missing something here...

Thanks for the comments, Nhat. I guess my last comment in response
to what Johannes mentioned about the UAF situation, is the one that's
still an open from my perspective. If the process can get migrated after 
the mutex unlock, and if the acomp_ctx obtained earlier gets deleted 
(as Johannes was mentioning), then we could be accessing
an invalid pointer in the "if (src != acomp_ctx->buffer)". So my question
was, can we prevent the migration to a different cpu by relinquishing
the mutex lock after this conditional; or, make the conditional be determined
by a boolean that gets set earlier to decide whether or not to unmap
the zpool handle (as against using the acomp_ctx to decide this).

Thanks,
Kanchana

> 
> I think Johannes' proposal is the safest :)
Johannes Weiner Nov. 14, 2024, 5:11 a.m. UTC | #14
On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote:
> So my question was, can we prevent the migration to a different cpu
> by relinquishing the mutex lock after this conditional

Holding the mutex doesn't prevent preemption/migration.
Sridhar, Kanchana P Nov. 14, 2024, 6:37 a.m. UTC | #15
> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Wednesday, November 13, 2024 9:12 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com;
> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote:
> > So my question was, can we prevent the migration to a different cpu
> > by relinquishing the mutex lock after this conditional
> 
> Holding the mutex doesn't prevent preemption/migration.

Sure, however, is this also applicable to holding the mutex of a per-cpu
structure obtained via raw_cpu_ptr()?

Would holding the mutex prevent the acomp_ctx of the cpu prior to
the migration (in the UAF scenario you described) from being deleted?

If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
UAF, I agree we might need a way to prevent the acomp_ctx from being
deleted, e.g. with refcounts as you've suggested, or to not use the
acomp_ctx at all for the check and instead use a boolean.

Thanks,
Kanchana
Chengming Zhou Nov. 14, 2024, 7:24 a.m. UTC | #16
Hello,

On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> 
>> -----Original Message-----
>> From: Johannes Weiner <hannes@cmpxchg.org>
>> Sent: Wednesday, November 13, 2024 9:12 PM
>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
>> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
>> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
>> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com;
>> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
>> zswap_decompress().
>>
>> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote:
>>> So my question was, can we prevent the migration to a different cpu
>>> by relinquishing the mutex lock after this conditional
>>
>> Holding the mutex doesn't prevent preemption/migration.
> 
> Sure, however, is this also applicable to holding the mutex of a per-cpu
> structure obtained via raw_cpu_ptr()?

Yes, unless you use migrate_disable() or cpus_read_lock() to protect
this section.
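For example, the protected section could look roughly like this (illustrative
sketch only, reusing the names from zswap_decompress()):

        cpus_read_lock();       /* blocks CPU offlining; or migrate_disable() to pin the task */
        acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
        mutex_lock(&acomp_ctx->mutex);
        /* ... use acomp_ctx->req and acomp_ctx->buffer ... */
        mutex_unlock(&acomp_ctx->mutex);
        cpus_read_unlock();     /* or migrate_enable() */

Both primitives allow the section to sleep, unlike disabling preemption.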

> 
> Would holding the mutex prevent the acomp_ctx of the cpu prior to
> the migration (in the UAF scenario you described) from being deleted?

No, cpu offline can kick in anytime to free the acomp_ctx->buffer.

> 
> If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
> UAF, I agree, we might need a way to prevent the acomp_ctx from being
> deleted, e.g. with refcounts as you've suggested, or to not use the

Right, refcount solution from Johannes is very good IMHO.

> acomp_ctx at all for the check, instead use a boolean.

But just avoiding the acomp_ctx for the check is not enough; the use of
the acomp_ctx inside the mutex section is also a UAF, since cpu offline
can kick in at any time and free the acomp_ctx->buffer.

Thanks.
Sridhar, Kanchana P Nov. 15, 2024, 9:12 p.m. UTC | #17
Hi Chengming,

> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@linux.dev>
> Sent: Wednesday, November 13, 2024 11:24 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes Weiner
> <hannes@cmpxchg.org>
> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; Huang,
> Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> Hello,
> 
> On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> >
> >> -----Original Message-----
> >> From: Johannes Weiner <hannes@cmpxchg.org>
> >> Sent: Wednesday, November 13, 2024 9:12 PM
> >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> >> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com;
> >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> >> zswap_decompress().
> >>
> >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote:
> >>> So my question was, can we prevent the migration to a different cpu
> >>> by relinquishing the mutex lock after this conditional
> >>
> >> Holding the mutex doesn't prevent preemption/migration.
> >
> > Sure, however, is this also applicable to holding the mutex of a per-cpu
> > structure obtained via raw_cpu_ptr()?
> 
> Yes, unless you use migration_disable() or cpus_read_lock() to protect
> this section.

Ok.

> 
> >
> > Would holding the mutex prevent the acomp_ctx of the cpu prior to
> > the migration (in the UAF scenario you described) from being deleted?
> 
> No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> 
> >
> > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
> > UAF, I agree, we might need a way to prevent the acomp_ctx from being
> > deleted, e.g. with refcounts as you've suggested, or to not use the
> 
> Right, refcount solution from Johannes is very good IMHO.
> 
> > acomp_ctx at all for the check, instead use a boolean.
> 
> But this is not enough to just avoid using acomp_ctx for the check,
> the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
> can kick in anytime to free the acomp_ctx->buffer.

I see. How would the refcounts work? Would this add latency to zswap
ops? In low memory situations, could the cpu offlining code override
the refcounts?

Based on Johannes' earlier comments, I don't think it makes sense for
me to submit a v2.

Thanks,
Kanchana

> 
> Thanks.
Yosry Ahmed Nov. 15, 2024, 9:49 p.m. UTC | #18
On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
> Hi Chengming,
>
> > -----Original Message-----
> > From: Chengming Zhou <chengming.zhou@linux.dev>
> > Sent: Wednesday, November 13, 2024 11:24 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes Weiner
> > <hannes@cmpxchg.org>
> > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com; Huang,
> > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > zswap_decompress().
> >
> > Hello,
> >
> > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > >
> > >> -----Original Message-----
> > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > >> mm@kvack.org; chengming.zhou@linux.dev; usamaarif642@gmail.com;
> > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > >> zswap_decompress().
> > >>
> > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote:
> > >>> So my question was, can we prevent the migration to a different cpu
> > >>> by relinquishing the mutex lock after this conditional
> > >>
> > >> Holding the mutex doesn't prevent preemption/migration.
> > >
> > > Sure, however, is this also applicable to holding the mutex of a per-cpu
> > > structure obtained via raw_cpu_ptr()?
> >
> > Yes, unless you use migration_disable() or cpus_read_lock() to protect
> > this section.
>
> Ok.
>
> >
> > >
> > > Would holding the mutex prevent the acomp_ctx of the cpu prior to
> > > the migration (in the UAF scenario you described) from being deleted?
> >
> > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> >
> > >
> > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
> > > UAF, I agree, we might need a way to prevent the acomp_ctx from being
> > > deleted, e.g. with refcounts as you've suggested, or to not use the
> >
> > Right, refcount solution from Johannes is very good IMHO.
> >
> > > acomp_ctx at all for the check, instead use a boolean.
> >
> > But this is not enough to just avoid using acomp_ctx for the check,
> > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
> > can kick in anytime to free the acomp_ctx->buffer.
>
> I see. How would the refcounts work? Would this add latency to zswap
> ops? In low memory situations, could the cpu offlining code over-ride
> the refcounts?

I think what Johannes meant is that the zswap compress/decompress
paths grab a ref on the acomp_ctx before using it, and the CPU
offlining code only drops the initial ref, and does not free the
buffer directly. The buffer is only freed when the ref drops to zero.

I am not familiar with CPU hotplug, but would it be simpler if we had a
wrapper like get_acomp_ctx() that disables migration or calls
cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
wrapper, put_acomp_ctx(), would be used after we are done using the
acomp_ctx.
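A rough sketch of what such wrappers could look like (hypothetical; these
helpers do not exist in zswap today, and cpus_read_lock() is just one of the
two options mentioned above):

        static struct crypto_acomp_ctx *get_acomp_ctx(struct zswap_pool *pool)
        {
                /* Keep CPU offlining (and zswap_cpu_comp_dead()) away while in use. */
                cpus_read_lock();
                return raw_cpu_ptr(pool->acomp_ctx);
        }

        static void put_acomp_ctx(struct crypto_acomp_ctx *acomp_ctx)
        {
                cpus_read_unlock();
        }

The compress/decompress paths would then call these instead of using
raw_cpu_ptr() directly, and keep the context pinned until they are done
with acomp_ctx->buffer.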

>
> Based on Johannes' earlier comments, I don't think it makes sense for
> me to submit a v2.
>
> Thanks,
> Kanchana
>
> >
> > Thanks.
Sridhar, Kanchana P Nov. 19, 2024, 7:22 p.m. UTC | #19
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Friday, November 15, 2024 1:49 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> > Hi Chengming,
> >
> > > -----Original Message-----
> > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > Sent: Wednesday, November 13, 2024 11:24 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes
> Weiner
> > > <hannes@cmpxchg.org>
> > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com;
> Huang,
> > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> Vinodh
> > > <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > zswap_decompress().
> > >
> > > Hello,
> > >
> > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > > >
> > > >> -----Original Message-----
> > > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > >> mm@kvack.org; chengming.zhou@linux.dev;
> usamaarif642@gmail.com;
> > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > >> zswap_decompress().
> > > >>
> > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P
> wrote:
> > > >>> So my question was, can we prevent the migration to a different cpu
> > > >>> by relinquishing the mutex lock after this conditional
> > > >>
> > > >> Holding the mutex doesn't prevent preemption/migration.
> > > >
> > > > Sure, however, is this also applicable to holding the mutex of a per-cpu
> > > > structure obtained via raw_cpu_ptr()?
> > >
> > > Yes, unless you use migration_disable() or cpus_read_lock() to protect
> > > this section.
> >
> > Ok.
> >
> > >
> > > >
> > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to
> > > > the migration (in the UAF scenario you described) from being deleted?
> > >
> > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> > >
> > > >
> > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
> > > > UAF, I agree, we might need a way to prevent the acomp_ctx from being
> > > > deleted, e.g. with refcounts as you've suggested, or to not use the
> > >
> > > Right, refcount solution from Johannes is very good IMHO.
> > >
> > > > acomp_ctx at all for the check, instead use a boolean.
> > >
> > > But this is not enough to just avoid using acomp_ctx for the check,
> > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
> > > can kick in anytime to free the acomp_ctx->buffer.
> >
> > I see. How would the refcounts work? Would this add latency to zswap
> > ops? In low memory situations, could the cpu offlining code over-ride
> > the refcounts?
> 
> I think what Johannes meant is that the zswap compress/decompress
> paths grab a ref on the acomp_ctx before using it, and the CPU
> offlining code only drops the initial ref, and does not free the
> buffer directly. The buffer is only freed when the ref drops to zero.
> 
> I am not familiar with CPU hotplug, would it be simpler if we have a
> wrapper like get_acomp_ctx() that disables migration or calls
> cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
> wrapper, put_acompt_ctx() will be used after we are done using the
> acomp_ctx.

Would it be sufficient to add a check for mutex_is_locked() in
zswap_cpu_comp_dead(), and if it returns true, to exit without deleting
the acomp_ctx? If this is an acceptable solution, it would also require us
to move the mutex_unlock() to occur after the "if (src != acomp_ctx->buffer)"
check in zswap_decompress(). This would ensure that all existing zswap code
within the mutex_lock()/mutex_unlock() section works correctly without
worrying about the acomp_ctx being deleted by cpu offlining.

Not sure if this would be a comprehensive solution, or if it would have
unintended consequences for the cpu offlining code. Would appreciate
comments.
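To make the idea concrete, roughly (hypothetical sketch only; as the replies
below point out, the mutex_is_locked() check is racy and can leak resources):

        static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
        {
                struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
                struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);

                if (mutex_is_locked(&acomp_ctx->mutex))
                        return 0;       /* assumes offlining would retry the teardown */

                /* ... existing teardown: free buffer, req and acomp ... */
                return 0;
        }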

Thanks,
Kanchana

> 
> >
> > Based on Johannes' earlier comments, I don't think it makes sense for
> > me to submit a v2.
> >
> > Thanks,
> > Kanchana
> >
> > >
> > > Thanks.
Yosry Ahmed Nov. 19, 2024, 7:27 p.m. UTC | #20
On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Friday, November 15, 2024 1:49 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > zswap_decompress().
> >
> > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > > Hi Chengming,
> > >
> > > > -----Original Message-----
> > > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > > Sent: Wednesday, November 13, 2024 11:24 PM
> > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes
> > Weiner
> > > > <hannes@cmpxchg.org>
> > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com;
> > Huang,
> > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> > Vinodh
> > > > <vinodh.gopal@intel.com>
> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > zswap_decompress().
> > > >
> > > > Hello,
> > > >
> > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > > > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > >> mm@kvack.org; chengming.zhou@linux.dev;
> > usamaarif642@gmail.com;
> > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > >> zswap_decompress().
> > > > >>
> > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P
> > wrote:
> > > > >>> So my question was, can we prevent the migration to a different cpu
> > > > >>> by relinquishing the mutex lock after this conditional
> > > > >>
> > > > >> Holding the mutex doesn't prevent preemption/migration.
> > > > >
> > > > > Sure, however, is this also applicable to holding the mutex of a per-cpu
> > > > > structure obtained via raw_cpu_ptr()?
> > > >
> > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect
> > > > this section.
> > >
> > > Ok.
> > >
> > > >
> > > > >
> > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to
> > > > > the migration (in the UAF scenario you described) from being deleted?
> > > >
> > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> > > >
> > > > >
> > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
> > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from being
> > > > > deleted, e.g. with refcounts as you've suggested, or to not use the
> > > >
> > > > Right, refcount solution from Johannes is very good IMHO.
> > > >
> > > > > acomp_ctx at all for the check, instead use a boolean.
> > > >
> > > > But this is not enough to just avoid using acomp_ctx for the check,
> > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
> > > > can kick in anytime to free the acomp_ctx->buffer.
> > >
> > > I see. How would the refcounts work? Would this add latency to zswap
> > > ops? In low memory situations, could the cpu offlining code over-ride
> > > the refcounts?
> >
> > I think what Johannes meant is that the zswap compress/decompress
> > paths grab a ref on the acomp_ctx before using it, and the CPU
> > offlining code only drops the initial ref, and does not free the
> > buffer directly. The buffer is only freed when the ref drops to zero.
> >
> > I am not familiar with CPU hotplug, would it be simpler if we have a
> > wrapper like get_acomp_ctx() that disables migration or calls
> > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
> > wrapper, put_acompt_ctx() will be used after we are done using the
> > acomp_ctx.
>
> Would it be sufficient to add a check for mutex_is_locked() in
> zswap_cpu_comp_dead() and if this returns true, to exit without deleting
> the acomp?

I don't think this works. First of all, it's racy. It's possible the
mutex gets locked after we check mutex_is_locked() but before we
delete the acomp_ctx. Also, if we find that the mutex is locked, then
we do nothing and essentially leak the memory.

Second, and probably more important, this only checks if anyone is
currently holding the mutex. What about tasks that may be sleeping
waiting for the mutex to be unlocked? The mutex will be deleted from
under them as well.

> If this is an acceptable solution, it would also require us
> to move the mutex_unlock() to occur after the "if (src != acomp_ctx->buffer)"
> in zswap_decompress(). This would ensure all existing zswap code that's
> within the mutex_lock()-mutex_unlock() will work correctly without
> worrying about the acomp_ctx being deleted by cpu offlining.
>
> Not sure if this would be a comprehensive solution, or if it would have
> unintended consequences to the cpu offlining code. Would appreciate
> comments.
>
> Thanks,
> Kanchana
Sridhar, Kanchana P Nov. 19, 2024, 7:41 p.m. UTC | #21
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Tuesday, November 19, 2024 11:27 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@google.com>
> > > Sent: Friday, November 15, 2024 1:49 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > zswap_decompress().
> > >
> > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > > Hi Chengming,
> > > >
> > > > > -----Original Message-----
> > > > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > > > Sent: Wednesday, November 13, 2024 11:24 PM
> > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes
> > > Weiner
> > > > > <hannes@cmpxchg.org>
> > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com;
> > > Huang,
> > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> > > Vinodh
> > > > > <vinodh.gopal@intel.com>
> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > zswap_decompress().
> > > > >
> > > > > Hello,
> > > > >
> > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > > > > >
> > > > > >> -----Original Message-----
> > > > > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > > >> mm@kvack.org; chengming.zhou@linux.dev;
> > > usamaarif642@gmail.com;
> > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > >> zswap_decompress().
> > > > > >>
> > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P
> > > wrote:
> > > > > >>> So my question was, can we prevent the migration to a different
> cpu
> > > > > >>> by relinquishing the mutex lock after this conditional
> > > > > >>
> > > > > >> Holding the mutex doesn't prevent preemption/migration.
> > > > > >
> > > > > > Sure, however, is this also applicable to holding the mutex of a per-
> cpu
> > > > > > structure obtained via raw_cpu_ptr()?
> > > > >
> > > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect
> > > > > this section.
> > > >
> > > > Ok.
> > > >
> > > > >
> > > > > >
> > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to
> > > > > > the migration (in the UAF scenario you described) from being
> deleted?
> > > > >
> > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> > > > >
> > > > > >
> > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent
> the
> > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from
> being
> > > > > > deleted, e.g. with refcounts as you've suggested, or to not use the
> > > > >
> > > > > Right, refcount solution from Johannes is very good IMHO.
> > > > >
> > > > > > acomp_ctx at all for the check, instead use a boolean.
> > > > >
> > > > > But this is not enough to just avoid using acomp_ctx for the check,
> > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
> > > > > can kick in anytime to free the acomp_ctx->buffer.
> > > >
> > > > I see. How would the refcounts work? Would this add latency to zswap
> > > > ops? In low memory situations, could the cpu offlining code over-ride
> > > > the refcounts?
> > >
> > > I think what Johannes meant is that the zswap compress/decompress
> > > paths grab a ref on the acomp_ctx before using it, and the CPU
> > > offlining code only drops the initial ref, and does not free the
> > > buffer directly. The buffer is only freed when the ref drops to zero.
> > >
> > > I am not familiar with CPU hotplug, would it be simpler if we have a
> > > wrapper like get_acomp_ctx() that disables migration or calls
> > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
> > > wrapper, put_acompt_ctx() will be used after we are done using the
> > > acomp_ctx.
> >
> > Would it be sufficient to add a check for mutex_is_locked() in
> > zswap_cpu_comp_dead() and if this returns true, to exit without deleting
> > the acomp?
> 
> I don't think this works. First of all, it's racy. It's possible the
> mutex gets locked after we check mutex_is_locked() but before we
> delete the acomp_ctx. Also, if we find that the mutex is locked, then
> we do nothing and essentially leak the memory.

Yes, this would assume the cpu offlining code retries at some interval,
which could prevent the memory leak.

> 
> Second, and probably more important, this only checks if anyone is
> currently holding the mutex. What about tasks that may be sleeping
> waiting for the mutex to be unlocked? The mutex will be deleted from
> under them as well.

Wouldn't this, and the race described above, also be issues for the
refcount-based approach?

Also, I am wondering if the mutex design already handles cases where
tasks are sleeping, waiting for a mutex that disappears?

Thanks,
Kanchana

> 
> > If this is an acceptable solution, it would also require us
> > to move the mutex_unlock() to occur after the "if (src != acomp_ctx-
> >buffer)"
> > in zswap_decompress(). This would ensure all existing zswap code that's
> > within the mutex_lock()-mutex_unlock() will work correctly without
> > worrying about the acomp_ctx being deleted by cpu offlining.
> >
> > Not sure if this would be a comprehensive solution, or if it would have
> > unintended consequences to the cpu offlining code. Would appreciate
> > comments.
> >
> > Thanks,
> > Kanchana
Yosry Ahmed Nov. 19, 2024, 7:51 p.m. UTC | #22
On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Tuesday, November 19, 2024 11:27 AM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > zswap_decompress().
> >
> > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > Sent: Friday, November 15, 2024 1:49 PM
> > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > zswap_decompress().
> > > >
> > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
> > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > >
> > > > > Hi Chengming,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > > > > Sent: Wednesday, November 13, 2024 11:24 PM
> > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes
> > > > Weiner
> > > > > > <hannes@cmpxchg.org>
> > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > > > mm@kvack.org; usamaarif642@gmail.com; ryan.roberts@arm.com;
> > > > Huang,
> > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal,
> > > > Vinodh
> > > > > > <vinodh.gopal@intel.com>
> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > > zswap_decompress().
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > > > > > >
> > > > > > >> -----Original Message-----
> > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > > > >> mm@kvack.org; chengming.zhou@linux.dev;
> > > > usamaarif642@gmail.com;
> > > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > > >> zswap_decompress().
> > > > > > >>
> > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P
> > > > wrote:
> > > > > > >>> So my question was, can we prevent the migration to a different
> > cpu
> > > > > > >>> by relinquishing the mutex lock after this conditional
> > > > > > >>
> > > > > > >> Holding the mutex doesn't prevent preemption/migration.
> > > > > > >
> > > > > > > Sure, however, is this also applicable to holding the mutex of a per-
> > cpu
> > > > > > > structure obtained via raw_cpu_ptr()?
> > > > > >
> > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to protect
> > > > > > this section.
> > > > >
> > > > > Ok.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to
> > > > > > > the migration (in the UAF scenario you described) from being
> > deleted?
> > > > > >
> > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> > > > > >
> > > > > > >
> > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent
> > the
> > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from
> > being
> > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use the
> > > > > >
> > > > > > Right, refcount solution from Johannes is very good IMHO.
> > > > > >
> > > > > > > acomp_ctx at all for the check, instead use a boolean.
> > > > > >
> > > > > > But this is not enough to just avoid using acomp_ctx for the check,
> > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
> > > > > > can kick in anytime to free the acomp_ctx->buffer.
> > > > >
> > > > > I see. How would the refcounts work? Would this add latency to zswap
> > > > > ops? In low memory situations, could the cpu offlining code over-ride
> > > > > the refcounts?
> > > >
> > > > I think what Johannes meant is that the zswap compress/decompress
> > > > paths grab a ref on the acomp_ctx before using it, and the CPU
> > > > offlining code only drops the initial ref, and does not free the
> > > > buffer directly. The buffer is only freed when the ref drops to zero.
> > > >
> > > > I am not familiar with CPU hotplug, would it be simpler if we have a
> > > > wrapper like get_acomp_ctx() that disables migration or calls
> > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
> > > > wrapper, put_acompt_ctx() will be used after we are done using the
> > > > acomp_ctx.
> > >
> > > Would it be sufficient to add a check for mutex_is_locked() in
> > > zswap_cpu_comp_dead() and if this returns true, to exit without deleting
> > > the acomp?
> >
> > I don't think this works. First of all, it's racy. It's possible the
> > mutex gets locked after we check mutex_is_locked() but before we
> > delete the acomp_ctx. Also, if we find that the mutex is locked, then
> > we do nothing and essentially leak the memory.
>
> Yes, this would assume the cpu offlining code retries at some interval,
> which could prevent the memory leak.

I am not sure about that, but even so, it wouldn't handle the first
scenario where the mutex gets locked after we check mutex_is_locked().

>
> >
> > Second, and probably more important, this only checks if anyone is
> > currently holding the mutex. What about tasks that may be sleeping
> > waiting for the mutex to be unlocked? The mutex will be deleted from
> > under them as well.
>
> Wouldn't this and the race described above, also be issues for the
> refcount based approach?

I don't think so, at least if implemented correctly. There are a lot
of examples around the kernel that use RCU + refcounts for such use
cases. I think there are also some examples in kernel docs.

That being said, I am wondering if we can get away with something
simpler like holding the cpus read lock or disabling migration as I
suggested earlier, but I am not quite sure.
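As a rough illustration of that pattern applied here (hypothetical: the
refcnt field, acomp_ctx_get()/acomp_ctx_put() and acomp_ctx_free() are all
made up for this sketch):

        static struct crypto_acomp_ctx *acomp_ctx_get(struct zswap_pool *pool)
        {
                struct crypto_acomp_ctx *acomp_ctx;

                rcu_read_lock();
                acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
                if (!refcount_inc_not_zero(&acomp_ctx->refcnt))
                        acomp_ctx = NULL;       /* CPU went offline; caller falls back */
                rcu_read_unlock();
                return acomp_ctx;
        }

        static void acomp_ctx_put(struct crypto_acomp_ctx *acomp_ctx)
        {
                if (refcount_dec_and_test(&acomp_ctx->refcnt))
                        acomp_ctx_free(acomp_ctx);      /* hypothetical teardown helper */
        }

zswap_cpu_comp_dead() would then only drop the initial reference instead of
freeing the resources directly, so the last user does the actual freeing.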

>
> Also, I am wondering if the mutex design already handles cases where
> tasks are sleeping, waiting for a mutex that disappears?

I don't believe so. It doesn't make sense for someone to free a mutex
while someone is waiting for it. How would the waiter know if the
memory backing the mutex was freed?
Sridhar, Kanchana P Nov. 19, 2024, 10:35 p.m. UTC | #23
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Tuesday, November 19, 2024 11:51 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@google.com>
> > > Sent: Tuesday, November 19, 2024 11:27 AM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > zswap_decompress().
> > >
> > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > > Sent: Friday, November 15, 2024 1:49 PM
> > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > > > kernel@vger.kernel.org; linux-mm@kvack.org;
> usamaarif642@gmail.com;
> > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > zswap_decompress().
> > > > >
> > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
> > > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > > >
> > > > > > Hi Chengming,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM
> > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>;
> Johannes
> > > > > Weiner
> > > > > > > <hannes@cmpxchg.org>
> > > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > > > > mm@kvack.org; usamaarif642@gmail.com;
> ryan.roberts@arm.com;
> > > > > Huang,
> > > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> Gopal,
> > > > > Vinodh
> > > > > > > <vinodh.gopal@intel.com>
> > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > > > zswap_decompress().
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > > > > > > >
> > > > > > > >> -----Original Message-----
> > > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org;
> linux-
> > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev;
> > > > > usamaarif642@gmail.com;
> > > > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali,
> Wajdi K
> > > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory
> leak in
> > > > > > > >> zswap_decompress().
> > > > > > > >>
> > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana
> P
> > > > > wrote:
> > > > > > > >>> So my question was, can we prevent the migration to a
> different
> > > cpu
> > > > > > > >>> by relinquishing the mutex lock after this conditional
> > > > > > > >>
> > > > > > > >> Holding the mutex doesn't prevent preemption/migration.
> > > > > > > >
> > > > > > > > Sure, however, is this also applicable to holding the mutex of a
> per-
> > > cpu
> > > > > > > > structure obtained via raw_cpu_ptr()?
> > > > > > >
> > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to
> protect
> > > > > > > this section.
> > > > > >
> > > > > > Ok.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior
> to
> > > > > > > > the migration (in the UAF scenario you described) from being
> > > deleted?
> > > > > > >
> > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> > > > > > >
> > > > > > > >
> > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to
> prevent
> > > the
> > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx
> from
> > > being
> > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use
> the
> > > > > > >
> > > > > > > Right, refcount solution from Johannes is very good IMHO.
> > > > > > >
> > > > > > > > acomp_ctx at all for the check, instead use a boolean.
> > > > > > >
> > > > > > > But this is not enough to just avoid using acomp_ctx for the check,
> > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu
> offline
> > > > > > > can kick in anytime to free the acomp_ctx->buffer.
> > > > > >
> > > > > > I see. How would the refcounts work? Would this add latency to
> zswap
> > > > > > ops? In low memory situations, could the cpu offlining code over-ride
> > > > > > the refcounts?
> > > > >
> > > > > I think what Johannes meant is that the zswap compress/decompress
> > > > > paths grab a ref on the acomp_ctx before using it, and the CPU
> > > > > offlining code only drops the initial ref, and does not free the
> > > > > buffer directly. The buffer is only freed when the ref drops to zero.
> > > > >
> > > > > I am not familiar with CPU hotplug, would it be simpler if we have a
> > > > > wrapper like get_acomp_ctx() that disables migration or calls
> > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
> > > > > wrapper, put_acompt_ctx() will be used after we are done using the
> > > > > acomp_ctx.
> > > >
> > > > Would it be sufficient to add a check for mutex_is_locked() in
> > > > zswap_cpu_comp_dead() and if this returns true, to exit without
> deleting
> > > > the acomp?
> > >
> > > I don't think this works. First of all, it's racy. It's possible the
> > > mutex gets locked after we check mutex_is_locked() but before we
> > > delete the acomp_ctx. Also, if we find that the mutex is locked, then
> > > we do nothing and essentially leak the memory.
> >
> > Yes, this would assume the cpu offlining code retries at some interval,
> > which could prevent the memory leak.
> 
> I am not sure about that, but even so, it wouldn't handle the first
> scenario where the mutex gets locked after we check mutex_is_locked().
> 
> >
> > >
> > > Second, and probably more important, this only checks if anyone is
> > > currently holding the mutex. What about tasks that may be sleeping
> > > waiting for the mutex to be unlocked? The mutex will be deleted from
> > > under them as well.
> >
> > Wouldn't this and the race described above, also be issues for the
> > refcount based approach?
> 
> I don't think so, at least if implemented correctly. There are a lot
> of examples around the kernel that use RCU + refcounts for such use
> cases. I think there are also some examples in kernel docs.
> 
> That being said, I am wondering if we can get away with something
> simpler like holding the cpus read lock or disabling migration as I
> suggested earlier, but I am not quite sure.

Another idea to consider is how zsmalloc avoids this issue through
its use of local_lock() on the per-cpu mapping area. This disables
preemption from zs_map_object() through zs_unmap_object().
Would changing the acomp_ctx's mutex to a local_lock solve the
problem?
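For reference, the zsmalloc pattern being referred to looks roughly like this
(simplified from mm/zsmalloc.c):

        static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
                .lock   = INIT_LOCAL_LOCK(lock),
        };

        /* zs_map_object() */
        local_lock(&zs_map_area.lock);
        area = this_cpu_ptr(&zs_map_area);
        /* ... use the per-cpu mapping area ... */

        /* zs_unmap_object() */
        local_unlock(&zs_map_area.lock);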

> 
> >
> > Also, I am wondering if the mutex design already handles cases where
> > tasks are sleeping, waiting for a mutex that disappears?
> 
> I don't believe so. It doesn't make sense for someone to free a mutex
> while someone is waiting for it. How would the waiter know if the
> memory backing the mutex was freed?

Thanks Yosry, all good points. There would need to be some sort of
arbiter (e.g., the cpu offlining code) that reschedules tasks
running on a cpu before shutting it down, which could address
this specific issue. I was thinking these are not problems unique to
zswap's per-cpu acomp_ctx->mutex with respect to offlining?

Thanks,
Kanchana
Yosry Ahmed Nov. 19, 2024, 11:44 p.m. UTC | #24
On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P
<kanchana.p.sridhar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@google.com>
> > Sent: Tuesday, November 19, 2024 11:51 AM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > zswap_decompress().
> >
> > On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P
> > <kanchana.p.sridhar@intel.com> wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > Sent: Tuesday, November 19, 2024 11:27 AM
> > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
> > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > zswap_decompress().
> > > >
> > > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P
> > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > > > Sent: Friday, November 15, 2024 1:49 PM
> > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > > > > kernel@vger.kernel.org; linux-mm@kvack.org;
> > usamaarif642@gmail.com;
> > > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > <vinodh.gopal@intel.com>
> > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > > zswap_decompress().
> > > > > >
> > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
> > > > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > > > >
> > > > > > > Hi Chengming,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM
> > > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>;
> > Johannes
> > > > > > Weiner
> > > > > > > > <hannes@cmpxchg.org>
> > > > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
> > > > > > > > mm@kvack.org; usamaarif642@gmail.com;
> > ryan.roberts@arm.com;
> > > > > > Huang,
> > > > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
> > > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> > Gopal,
> > > > > > Vinodh
> > > > > > > > <vinodh.gopal@intel.com>
> > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > > > > zswap_decompress().
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > > > > > > > >
> > > > > > > > >> -----Original Message-----
> > > > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org;
> > linux-
> > > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev;
> > > > > > usamaarif642@gmail.com;
> > > > > > > > >> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali,
> > Wajdi K
> > > > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > > <vinodh.gopal@intel.com>
> > > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory
> > leak in
> > > > > > > > >> zswap_decompress().
> > > > > > > > >>
> > > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana
> > P
> > > > > > wrote:
> > > > > > > > >>> So my question was, can we prevent the migration to a
> > different
> > > > cpu
> > > > > > > > >>> by relinquishing the mutex lock after this conditional
> > > > > > > > >>
> > > > > > > > >> Holding the mutex doesn't prevent preemption/migration.
> > > > > > > > >
> > > > > > > > > Sure, however, is this also applicable to holding the mutex of a
> > per-
> > > > cpu
> > > > > > > > > structure obtained via raw_cpu_ptr()?
> > > > > > > >
> > > > > > > > Yes, unless you use migration_disable() or cpus_read_lock() to
> > protect
> > > > > > > > this section.
> > > > > > >
> > > > > > > Ok.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior
> > to
> > > > > > > > > the migration (in the UAF scenario you described) from being
> > > > deleted?
> > > > > > > >
> > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to
> > prevent
> > > > the
> > > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx
> > from
> > > > being
> > > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use
> > the
> > > > > > > >
> > > > > > > > Right, refcount solution from Johannes is very good IMHO.
> > > > > > > >
> > > > > > > > > acomp_ctx at all for the check, instead use a boolean.
> > > > > > > >
> > > > > > > > But this is not enough to just avoid using acomp_ctx for the check,
> > > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu
> > offline
> > > > > > > > can kick in anytime to free the acomp_ctx->buffer.
> > > > > > >
> > > > > > > I see. How would the refcounts work? Would this add latency to
> > zswap
> > > > > > > ops? In low memory situations, could the cpu offlining code over-ride
> > > > > > > the refcounts?
> > > > > >
> > > > > > I think what Johannes meant is that the zswap compress/decompress
> > > > > > paths grab a ref on the acomp_ctx before using it, and the CPU
> > > > > > offlining code only drops the initial ref, and does not free the
> > > > > > buffer directly. The buffer is only freed when the ref drops to zero.
> > > > > >
> > > > > > I am not familiar with CPU hotplug, would it be simpler if we have a
> > > > > > wrapper like get_acomp_ctx() that disables migration or calls
> > > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
> > > > > > wrapper, put_acompt_ctx() will be used after we are done using the
> > > > > > acomp_ctx.
> > > > >
> > > > > Would it be sufficient to add a check for mutex_is_locked() in
> > > > > zswap_cpu_comp_dead() and if this returns true, to exit without
> > deleting
> > > > > the acomp?
> > > >
> > > > I don't think this works. First of all, it's racy. It's possible the
> > > > mutex gets locked after we check mutex_is_locked() but before we
> > > > delete the acomp_ctx. Also, if we find that the mutex is locked, then
> > > > we do nothing and essentially leak the memory.
> > >
> > > Yes, this would assume the cpu offlining code retries at some interval,
> > > which could prevent the memory leak.
> >
> > I am not sure about that, but even so, it wouldn't handle the first
> > scenario where the mutex gets locked after we check mutex_is_locked().
> >
> > >
> > > >
> > > > Second, and probably more important, this only checks if anyone is
> > > > currently holding the mutex. What about tasks that may be sleeping
> > > > waiting for the mutex to be unlocked? The mutex will be deleted from
> > > > under them as well.
> > >
> > > Wouldn't this and the race described above, also be issues for the
> > > refcount based approach?
> >
> > I don't think so, at least if implemented correctly. There are a lot
> > of examples around the kernel that use RCU + refcounts for such use
> > cases. I think there are also some examples in kernel docs.
> >
> > That being said, I am wondering if we can get away with something
> > simpler like holding the cpus read lock or disabling migration as I
> > suggested earlier, but I am not quite sure.
>
> Another idea to consider is how zsmalloc avoids this issue through
> its use of the local_lock() on the per-cpu mapping area. This disables
> preemption from zs_map_object() through zs_unmap_object().
> Would changing the acomp_ctx's mutex to a local_lock solve the
> problem?

This is similar to disabling migration as I suggested, but disabling
preemption means that we cannot sleep; we would have to spin on a lock instead.

In zswap_compress(), we send the compression request and may sleep
waiting for it. We also make a non-atomic allocation later that may
also sleep but that's less of a problem.

In zswap_decompress() we may also sleep, which is why we sometimes
copy the data into acomp_ctx->buffer and unmap the handle to begin
with.

So I don't think we can just replace the mutex with a local_lock.

>
> >
> > >
> > > Also, I am wondering if the mutex design already handles cases where
> > > tasks are sleeping, waiting for a mutex that disappears?
> >
> > I don't believe so. It doesn't make sense for someone to free a mutex
> > while someone is waiting for it. How would the waiter know if the
> > memory backing the mutex was freed?
>
> Thanks Yosry, all good points. There would need to be some sort of
> arbiter (for e.g., the cpu offlining code) that would reschedule tasks
> running on a cpu before shutting it down, which could address
> this specific issue. I was thinking these are not problems unique to
> zswap's per-cpu acomp_ctx->mutex wrt the offlining?

There are a few reasons why zswap has this problem while other code may
not have it. For example, the data structure is dynamically allocated
and is freed during offlining; it wouldn't be a problem if it were
static. There is also the fact that we don't disable preemption when
accessing the per-CPU data, as I mentioned earlier, which would otherwise
prevent the CPU we are running on from going offline while we access the
per-CPU data.

I think we should either:
(a) Use refcounts.
(b) Disable migration.
(c) Hold the CPUs read lock.

I was hoping someone with more knowledge about CPU offlining would
confirm (b) and (c) would work, but I am pretty confident they would.
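
For concreteness, (c) could be hidden behind a pair of helpers along these
lines (a sketch only; get_acomp_ctx()/put_acomp_ctx() are the hypothetical
names from the earlier suggestion, and this assumes zswap's current
struct zswap_pool / struct crypto_acomp_ctx layout):

static struct crypto_acomp_ctx *get_acomp_ctx(struct zswap_pool *pool)
{
        struct crypto_acomp_ctx *acomp_ctx;

        /*
         * Holding the CPU hotplug read lock keeps CPU offlining, and therefore
         * zswap_cpu_comp_dead(), from running until put_acomp_ctx(). It is a
         * percpu rwsem, so sleeping in crypto_wait_req() is still fine.
         */
        cpus_read_lock();
        acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
        mutex_lock(&acomp_ctx->mutex);
        return acomp_ctx;
}

static void put_acomp_ctx(struct crypto_acomp_ctx *acomp_ctx)
{
        mutex_unlock(&acomp_ctx->mutex);
        cpus_read_unlock();
}

(b) would have the same shape with migrate_disable()/migrate_enable() instead
of the cpus_read_lock()/cpus_read_unlock() pair, assuming the hotplug code
indeed waits for such sections before tearing the CPU down.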
Sridhar, Kanchana P Nov. 20, 2024, midnight UTC | #25
> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@google.com>
> Sent: Tuesday, November 19, 2024 3:45 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> zswap_decompress().
> 
> On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@google.com>
> > > Sent: Tuesday, November 19, 2024 11:51 AM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
> > > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-
> foundation.org;
> > > Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > zswap_decompress().
> > >
> > > On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P
> > > <kanchana.p.sridhar@intel.com> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > > Sent: Tuesday, November 19, 2024 11:27 AM
> > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
> > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
> > > > > kernel@vger.kernel.org; linux-mm@kvack.org;
> usamaarif642@gmail.com;
> > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
> > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> <vinodh.gopal@intel.com>
> > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > zswap_decompress().
> > > > >
> > > > > On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P
> > > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Yosry Ahmed <yosryahmed@google.com>
> > > > > > > Sent: Friday, November 15, 2024 1:49 PM
> > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > > > Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes
> Weiner
> > > > > > > <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>;
> linux-
> > > > > > > kernel@vger.kernel.org; linux-mm@kvack.org;
> > > usamaarif642@gmail.com;
> > > > > > > ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
> > > > > > > 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi
> K
> > > > > > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > <vinodh.gopal@intel.com>
> > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
> > > > > > > zswap_decompress().
> > > > > > >
> > > > > > > On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
> > > > > > > <kanchana.p.sridhar@intel.com> wrote:
> > > > > > > >
> > > > > > > > Hi Chengming,
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Chengming Zhou <chengming.zhou@linux.dev>
> > > > > > > > > Sent: Wednesday, November 13, 2024 11:24 PM
> > > > > > > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>;
> > > Johannes
> > > > > > > Weiner
> > > > > > > > > <hannes@cmpxchg.org>
> > > > > > > > > Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > > > > <yosryahmed@google.com>; linux-kernel@vger.kernel.org;
> linux-
> > > > > > > > > mm@kvack.org; usamaarif642@gmail.com;
> > > ryan.roberts@arm.com;
> > > > > > > Huang,
> > > > > > > > > Ying <ying.huang@intel.com>; 21cnbao@gmail.com;
> akpm@linux-
> > > > > > > > > foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
> > > Gopal,
> > > > > > > Vinodh
> > > > > > > > > <vinodh.gopal@intel.com>
> > > > > > > > > Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory
> leak in
> > > > > > > > > zswap_decompress().
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
> > > > > > > > > >
> > > > > > > > > >> -----Original Message-----
> > > > > > > > > >> From: Johannes Weiner <hannes@cmpxchg.org>
> > > > > > > > > >> Sent: Wednesday, November 13, 2024 9:12 PM
> > > > > > > > > >> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
> > > > > > > > > >> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
> > > > > > > > > >> <yosryahmed@google.com>; linux-kernel@vger.kernel.org;
> > > linux-
> > > > > > > > > >> mm@kvack.org; chengming.zhou@linux.dev;
> > > > > > > usamaarif642@gmail.com;
> > > > > > > > > >> ryan.roberts@arm.com; Huang, Ying
> <ying.huang@intel.com>;
> > > > > > > > > >> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali,
> > > Wajdi K
> > > > > > > > > >> <wajdi.k.feghali@intel.com>; Gopal, Vinodh
> > > > > <vinodh.gopal@intel.com>
> > > > > > > > > >> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory
> > > leak in
> > > > > > > > > >> zswap_decompress().
> > > > > > > > > >>
> > > > > > > > > >> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote:
> > > > > > > > > >>> So my question was, can we prevent the migration to a different cpu
> > > > > > > > > >>> by relinquishing the mutex lock after this conditional
> > > > > > > > > >>
> > > > > > > > > >> Holding the mutex doesn't prevent preemption/migration.
> > > > > > > > > >
> > > > > > > > > > Sure, however, is this also applicable to holding the mutex of a per-cpu
> > > > > > > > > > structure obtained via raw_cpu_ptr()?
> > > > > > > > >
> > > > > > > > > Yes, unless you use migrate_disable() or cpus_read_lock() to protect
> > > > > > > > > this section.
> > > > > > > >
> > > > > > > > Ok.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Would holding the mutex prevent the acomp_ctx of the cpu prior to
> > > > > > > > > > the migration (in the UAF scenario you described) from being deleted?
> > > > > > > > >
> > > > > > > > > No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
> > > > > > > > > > UAF, I agree, we might need a way to prevent the acomp_ctx from being
> > > > > > > > > > deleted, e.g. with refcounts as you've suggested, or to not use the
> > > > > > > > >
> > > > > > > > > Right, refcount solution from Johannes is very good IMHO.
> > > > > > > > >
> > > > > > > > > > acomp_ctx at all for the check, instead use a boolean.
> > > > > > > > >
> > > > > > > > > But this is not enough to just avoid using acomp_ctx for the check,
> > > > > > > > > the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
> > > > > > > > > can kick in anytime to free the acomp_ctx->buffer.
> > > > > > > >
> > > > > > > > I see. How would the refcounts work? Would this add latency to zswap
> > > > > > > > ops? In low memory situations, could the cpu offlining code over-ride
> > > > > > > > the refcounts?
> > > > > > >
> > > > > > > I think what Johannes meant is that the zswap compress/decompress
> > > > > > > paths grab a ref on the acomp_ctx before using it, and the CPU
> > > > > > > offlining code only drops the initial ref, and does not free the
> > > > > > > buffer directly. The buffer is only freed when the ref drops to zero.
> > > > > > >
> > > > > > > I am not familiar with CPU hotplug, would it be simpler if we have a
> > > > > > > wrapper like get_acomp_ctx() that disables migration or calls
> > > > > > > cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
> > > > > > > wrapper, put_acomp_ctx() will be used after we are done using the
> > > > > > > acomp_ctx.
> > > > > >
> > > > > > Would it be sufficient to add a check for mutex_is_locked() in
> > > > > > zswap_cpu_comp_dead() and if this returns true, to exit without deleting
> > > > > > the acomp?
> > > > >
> > > > > I don't think this works. First of all, it's racy. It's possible the
> > > > > mutex gets locked after we check mutex_is_locked() but before we
> > > > > delete the acomp_ctx. Also, if we find that the mutex is locked, then
> > > > > we do nothing and essentially leak the memory.
> > > >
> > > > Yes, this would assume the cpu offlining code retries at some interval,
> > > > which could prevent the memory leak.
> > >
> > > I am not sure about that, but even so, it wouldn't handle the first
> > > scenario where the mutex gets locked after we check mutex_is_locked().
> > >
> > > >
> > > > >
> > > > > Second, and probably more important, this only checks if anyone is
> > > > > currently holding the mutex. What about tasks that may be sleeping
> > > > > waiting for the mutex to be unlocked? The mutex will be deleted from
> > > > > under them as well.
> > > >
> > > > Wouldn't this and the race described above, also be issues for the
> > > > refcount based approach?
> > >
> > > I don't think so, at least if implemented correctly. There are a lot
> > > of examples around the kernel that use RCU + refcounts for such use
> > > cases. I think there are also some examples in kernel docs.
> > >
> > > That being said, I am wondering if we can get away with something
> > > simpler like holding the cpus read lock or disabling migration as I
> > > suggested earlier, but I am not quite sure.
> >
> > Another idea to consider is how zsmalloc avoids this issue through
> > its use of the local_lock() on the per-cpu mapping area. This disables
> > preemption from zs_map_object() through zs_unmap_object().
> > Would changing the acomp_ctx's mutex to a local_lock solve the
> > problem?
> 
> This is similar to disabling migration as I suggested, but disabling
> preemption means that we cannot sleep, we spin on a lock instead.
> 
> In zswap_compress(), we send the compression request and may sleep
> waiting for it. We also make a non-atomic allocation later that may
> also sleep but that's less of a problem.
> 
> In zswap_decompress() we may also sleep, which is why we sometimes
> copy the data into acomp_ctx->buffer and unmap the handle to begin
> with.
> 
> So I don't think we can just replace the mutex with a lock.

Thanks Yosry, for the explanations. Yes, I understand and agree.

> 
> >
> > >
> > > >
> > > > Also, I am wondering if the mutex design already handles cases where
> > > > tasks are sleeping, waiting for a mutex that disappears?
> > >
> > > I don't believe so. It doesn't make sense for someone to free a mutex
> > > while someone is waiting for it. How would the waiter know if the
> > > memory backing the mutex was freed?
> >
> > Thanks Yosry, all good points. There would need to be some sort of
> > arbiter (for e.g., the cpu offlining code) that would reschedule tasks
> > running on a cpu before shutting it down, which could address
> > this specific issue. I was thinking these are not problems unique to
> > zswap's per-cpu acomp_ctx->mutex wrt the offlining?
> 
> There are a few reasons why zswap has this problem and other code may
> not have it. For example the data structure is dynamically allocated
> and is freed during offlining, it wouldn't be a problem if it was
> static. Also the fact that we don't disable preemption when accessing
> the per-CPU data, as I mentioned earlier, which would prevent the CPU
> we are running on from going offline while we access the per-CPU data.
> 
> I think we should either:
> (a) Use refcounts.
> (b) Disable migration.
> (c) Hold the CPUs read lock.
> 
> I was hoping someone with more knowledge about CPU offlining would
> confirm (b) and (c) would work, but I am pretty confident they would.

Ok. Thanks again for the explanations.

Thanks,
Kanchana
Chengming Zhou Nov. 20, 2024, 2:31 a.m. UTC | #26
On 2024/11/20 07:44, Yosry Ahmed wrote:
> On Tue, Nov 19, 2024 at 2:35 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@intel.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: Yosry Ahmed <yosryahmed@google.com>
>>> Sent: Tuesday, November 19, 2024 11:51 AM
>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
>>> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
>>> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
>>> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
>>> ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org;
>>> Feghali, Wajdi K <wajdi.k.feghali@intel.com>; Gopal, Vinodh
>>> <vinodh.gopal@intel.com>
>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
>>> zswap_decompress().
>>>
>>> On Tue, Nov 19, 2024 at 11:42 AM Sridhar, Kanchana P
>>> <kanchana.p.sridhar@intel.com> wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Yosry Ahmed <yosryahmed@google.com>
>>>>> Sent: Tuesday, November 19, 2024 11:27 AM
>>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
>>>>> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
>>>>> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
>>>>> kernel@vger.kernel.org; linux-mm@kvack.org; usamaarif642@gmail.com;
>>>>> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
>>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
>>>>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>
>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
>>>>> zswap_decompress().
>>>>>
>>>>> On Tue, Nov 19, 2024 at 11:22 AM Sridhar, Kanchana P
>>>>> <kanchana.p.sridhar@intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Yosry Ahmed <yosryahmed@google.com>
>>>>>>> Sent: Friday, November 15, 2024 1:49 PM
>>>>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
>>>>>>> Cc: Chengming Zhou <chengming.zhou@linux.dev>; Johannes Weiner
>>>>>>> <hannes@cmpxchg.org>; Nhat Pham <nphamcs@gmail.com>; linux-
>>>>>>> kernel@vger.kernel.org; linux-mm@kvack.org;
>>> usamaarif642@gmail.com;
>>>>>>> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
>>>>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali, Wajdi K
>>>>>>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh
>>> <vinodh.gopal@intel.com>
>>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
>>>>>>> zswap_decompress().
>>>>>>>
>>>>>>> On Fri, Nov 15, 2024 at 1:14 PM Sridhar, Kanchana P
>>>>>>> <kanchana.p.sridhar@intel.com> wrote:
>>>>>>>>
>>>>>>>> Hi Chengming,
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Chengming Zhou <chengming.zhou@linux.dev>
>>>>>>>>> Sent: Wednesday, November 13, 2024 11:24 PM
>>>>>>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>;
>>> Johannes
>>>>>>> Weiner
>>>>>>>>> <hannes@cmpxchg.org>
>>>>>>>>> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
>>>>>>>>> <yosryahmed@google.com>; linux-kernel@vger.kernel.org; linux-
>>>>>>>>> mm@kvack.org; usamaarif642@gmail.com;
>>> ryan.roberts@arm.com;
>>>>>>> Huang,
>>>>>>>>> Ying <ying.huang@intel.com>; 21cnbao@gmail.com; akpm@linux-
>>>>>>>>> foundation.org; Feghali, Wajdi K <wajdi.k.feghali@intel.com>;
>>> Gopal,
>>>>>>> Vinodh
>>>>>>>>> <vinodh.gopal@intel.com>
>>>>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory leak in
>>>>>>>>> zswap_decompress().
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> On 2024/11/14 14:37, Sridhar, Kanchana P wrote:
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Johannes Weiner <hannes@cmpxchg.org>
>>>>>>>>>>> Sent: Wednesday, November 13, 2024 9:12 PM
>>>>>>>>>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>
>>>>>>>>>>> Cc: Nhat Pham <nphamcs@gmail.com>; Yosry Ahmed
>>>>>>>>>>> <yosryahmed@google.com>; linux-kernel@vger.kernel.org;
>>> linux-
>>>>>>>>>>> mm@kvack.org; chengming.zhou@linux.dev;
>>>>>>> usamaarif642@gmail.com;
>>>>>>>>>>> ryan.roberts@arm.com; Huang, Ying <ying.huang@intel.com>;
>>>>>>>>>>> 21cnbao@gmail.com; akpm@linux-foundation.org; Feghali,
>>> Wajdi K
>>>>>>>>>>> <wajdi.k.feghali@intel.com>; Gopal, Vinodh
>>>>> <vinodh.gopal@intel.com>
>>>>>>>>>>> Subject: Re: [PATCH v1] mm: zswap: Fix a potential memory
>>> leak in
>>>>>>>>>>> zswap_decompress().
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 14, 2024 at 01:56:16AM +0000, Sridhar, Kanchana P wrote:
>>>>>>>>>>>> So my question was, can we prevent the migration to a different cpu
>>>>>>>>>>>> by relinquishing the mutex lock after this conditional
>>>>>>>>>>>
>>>>>>>>>>> Holding the mutex doesn't prevent preemption/migration.
>>>>>>>>>>
>>>>>>>>>> Sure, however, is this also applicable to holding the mutex of a per-cpu
>>>>>>>>>> structure obtained via raw_cpu_ptr()?
>>>>>>>>>
>>>>>>>>> Yes, unless you use migrate_disable() or cpus_read_lock() to protect
>>>>>>>>> this section.
>>>>>>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Would holding the mutex prevent the acomp_ctx of the cpu prior to
>>>>>>>>>> the migration (in the UAF scenario you described) from being deleted?
>>>>>>>>>
>>>>>>>>> No, cpu offline can kick in anytime to free the acomp_ctx->buffer.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If holding the per-cpu acomp_ctx's mutex isn't sufficient to prevent the
>>>>>>>>>> UAF, I agree, we might need a way to prevent the acomp_ctx from being
>>>>>>>>>> deleted, e.g. with refcounts as you've suggested, or to not use the
>>>>>>>>>
>>>>>>>>> Right, refcount solution from Johannes is very good IMHO.
>>>>>>>>>
>>>>>>>>>> acomp_ctx at all for the check, instead use a boolean.
>>>>>>>>>
>>>>>>>>> But this is not enough to just avoid using acomp_ctx for the check,
>>>>>>>>> the usage of acomp_ctx inside the mutex is also UAF, since cpu offline
>>>>>>>>> can kick in anytime to free the acomp_ctx->buffer.
>>>>>>>>
>>>>>>>> I see. How would the refcounts work? Would this add latency to zswap
>>>>>>>> ops? In low memory situations, could the cpu offlining code over-ride
>>>>>>>> the refcounts?
>>>>>>>
>>>>>>> I think what Johannes meant is that the zswap compress/decompress
>>>>>>> paths grab a ref on the acomp_ctx before using it, and the CPU
>>>>>>> offlining code only drops the initial ref, and does not free the
>>>>>>> buffer directly. The buffer is only freed when the ref drops to zero.
>>>>>>>
>>>>>>> I am not familiar with CPU hotplug, would it be simpler if we have a
>>>>>>> wrapper like get_acomp_ctx() that disables migration or calls
>>>>>>> cpus_read_lock() before grabbing the per-CPU acomp_ctx? A similar
>>>>>>> wrapper, put_acomp_ctx() will be used after we are done using the
>>>>>>> acomp_ctx.
>>>>>>
>>>>>> Would it be sufficient to add a check for mutex_is_locked() in
>>>>>> zswap_cpu_comp_dead() and if this returns true, to exit without deleting
>>>>>> the acomp?
>>>>>
>>>>> I don't think this works. First of all, it's racy. It's possible the
>>>>> mutex gets locked after we check mutex_is_locked() but before we
>>>>> delete the acomp_ctx. Also, if we find that the mutex is locked, then
>>>>> we do nothing and essentially leak the memory.
>>>>
>>>> Yes, this would assume the cpu offlining code retries at some interval,
>>>> which could prevent the memory leak.
>>>
>>> I am not sure about that, but even so, it wouldn't handle the first
>>> scenario where the mutex gets locked after we check mutex_is_locked().
>>>
>>>>
>>>>>
>>>>> Second, and probably more important, this only checks if anyone is
>>>>> currently holding the mutex. What about tasks that may be sleeping
>>>>> waiting for the mutex to be unlocked? The mutex will be deleted from
>>>>> under them as well.
>>>>
>>>> Wouldn't this and the race described above, also be issues for the
>>>> refcount based approach?
>>>
>>> I don't think so, at least if implemented correctly. There are a lot
>>> of examples around the kernel that use RCU + refcounts for such use
>>> cases. I think there are also some examples in kernel docs.
>>>
>>> That being said, I am wondering if we can get away with something
>>> simpler like holding the cpus read lock or disabling migration as I
>>> suggested earlier, but I am not quite sure.
>>
>> Another idea to consider is how zsmalloc avoids this issue through
>> its use of the local_lock() on the per-cpu mapping area. This disables
>> preemption from zs_map_object() through zs_unmap_object().
>> Would changing the acomp_ctx's mutex to a local_lock solve the
>> problem?
> 
> This is similar to disabling migration as I suggested, but disabling
> preemption means that we cannot sleep, we spin on a lock instead.
> 
> In zswap_compress(), we send the compression request and may sleep
> waiting for it. We also make a non-atomic allocation later that may
> also sleep but that's less of a problem.
> 
> In zswap_decompress() we may also sleep, which is why we sometimes
> copy the data into acomp_ctx->buffer and unmap the handle to begin
> with.
> 
> So I don't think we can just replace the mutex with a lock.
> 
>>
>>>
>>>>
>>>> Also, I am wondering if the mutex design already handles cases where
>>>> tasks are sleeping, waiting for a mutex that disappears?
>>>
>>> I don't believe so. It doesn't make sense for someone to free a mutex
>>> while someone is waiting for it. How would the waiter know if the
>>> memory backing the mutex was freed?
>>
>> Thanks Yosry, all good points. There would need to be some sort of
>> arbiter (for e.g., the cpu offlining code) that would reschedule tasks
>> running on a cpu before shutting it down, which could address
>> this specific issue. I was thinking these are not problems unique to
>> zswap's per-cpu acomp_ctx->mutex wrt the offlining?
> 
> There are a few reasons why zswap has this problem and other code may
> not have it. For example the data structure is dynamically allocated
> and is freed during offlining, it wouldn't be a problem if it was
> static. Also the fact that we don't disable preemption when accessing
> the per-CPU data, as I mentioned earlier, which would prevent the CPU
> we are running on from going offline while we access the per-CPU data.
> 
> I think we should either:
> (a) Use refcounts.
> (b) Disable migration.
> (c) Hold the CPUs read lock.
> 
> I was hoping someone with more knowledge about CPU offlining would
> confirm (b) and (c) would work, but I am pretty confident they would.

+1, I also think (b) or (c) should work with my limited knowledge about
CPU offlining :-) And they seem simpler than refcounts implementation.

Thanks.
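
For comparison, the refcount route (a) would look roughly like the sketch
below. The refcnt field, acomp_ctx_get()/acomp_ctx_put() and acomp_ctx_free()
are all hypothetical, and it assumes the count is initialised to 1 in
zswap_cpu_comp_prepare():

static struct crypto_acomp_ctx *acomp_ctx_get(struct zswap_pool *pool)
{
        struct crypto_acomp_ctx *acomp_ctx;

        for (;;) {
                acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
                /* Fails only after the dead callback dropped the initial ref. */
                if (refcount_inc_not_zero(&acomp_ctx->refcnt))
                        return acomp_ctx;
                /* We were migrated off a CPU that went down; retry on the new one. */
                cond_resched();
        }
}

static void acomp_ctx_put(struct crypto_acomp_ctx *acomp_ctx)
{
        /* Whoever drops the last ref frees the buffer/req/acomp. */
        if (refcount_dec_and_test(&acomp_ctx->refcnt))
                acomp_ctx_free(acomp_ctx);      /* hypothetical helper */
}

/* The hotplug dead callback only drops the initial reference: */
static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
{
        struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);

        acomp_ctx_put(per_cpu_ptr(pool->acomp_ctx, cpu));
        return 0;
}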
diff mbox series

Patch

diff --git a/mm/zswap.c b/mm/zswap.c
index f6316b66fb23..58810fa8ff23 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -986,10 +986,11 @@  static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
 	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
 	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
-	mutex_unlock(&acomp_ctx->mutex);
 
 	if (src != acomp_ctx->buffer)
 		zpool_unmap_handle(zpool, entry->handle);
+
+	mutex_unlock(&acomp_ctx->mutex);
 }
 
 /*********************************