
[RFC,1/8] mm: add overflow and underflow checks for page->_refcount

Message ID 20211026173822.502506-2-pasha.tatashin@soleen.com (mailing list archive)
State New
Series Hardening page _refcount

Commit Message

Pasha Tatashin Oct. 26, 2021, 5:38 p.m. UTC
The problems with page->_refcount are hard to debug, because by the time
they are detected, the damage has usually been done long before. Yet,
problems with an invalid page refcount may be catastrophic and lead to
memory corruption.

Reduce the window in which _refcount problems can go undetected by
adding underflow and overflow checks to the functions that modify
_refcount.

Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
 include/linux/page_ref.h | 61 ++++++++++++++++++++++++++++++++--------
 1 file changed, 49 insertions(+), 12 deletions(-)

Comments

Matthew Wilcox Oct. 26, 2021, 7:48 p.m. UTC | #1
On Tue, Oct 26, 2021 at 05:38:15PM +0000, Pasha Tatashin wrote:
>  static inline void page_ref_add(struct page *page, int nr)
>  {
> -	atomic_add(nr, &page->_refcount);
> +	int ret;
> +
> +	VM_BUG_ON(nr <= 0);
> +	ret = atomic_add_return(nr, &page->_refcount);
> +	VM_BUG_ON_PAGE(ret <= 0, page);

This isn't right.  _refcount is allowed to overflow into the negatives.
See page_ref_zero_or_close_to_overflow() and the conversations that led
to it being added.
Pasha Tatashin Oct. 26, 2021, 9:34 p.m. UTC | #2
On Tue, Oct 26, 2021 at 3:50 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Oct 26, 2021 at 05:38:15PM +0000, Pasha Tatashin wrote:
> >  static inline void page_ref_add(struct page *page, int nr)
> >  {
> > -     atomic_add(nr, &page->_refcount);
> > +     int ret;
> > +
> > +     VM_BUG_ON(nr <= 0);
> > +     ret = atomic_add_return(nr, &page->_refcount);
> > +     VM_BUG_ON_PAGE(ret <= 0, page);
>
> This isn't right.  _refcount is allowed to overflow into the negatives.
> See page_ref_zero_or_close_to_overflow() and the conversations that led
> to it being added.

#define page_ref_zero_or_close_to_overflow(page) \
    ((unsigned int) page_ref_count(page) + 127u <= 127u)


Uh, right, I saw the macro but did not realize there was an (unsigned int) cast.
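
To spell out what that cast does, here is a stand-alone userspace sketch of
the same arithmetic (the helper name below is made up; only the expression
mirrors the kernel macro):

#include <stdio.h>

/*
 * Userspace stand-in for the expression in
 * page_ref_zero_or_close_to_overflow(): the unsigned cast maps counts in
 * [-127, -1] -- a counter that has wrapped past INT_MAX and is within 127
 * steps of reaching zero again -- onto values that wrap back under 127u
 * once 127u is added, so they are flagged together with a count of zero.
 */
static int zero_or_close_to_overflow(int refcount)
{
	return (unsigned int)refcount + 127u <= 127u;
}

int main(void)
{
	const int samples[] = { -200, -127, -3, 0, 1, 127, 1000 };
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("refcount %5d -> %s\n", samples[i],
		       zero_or_close_to_overflow(samples[i]) ? "flagged" : "ok");
	/* Flags -127, -3 and 0; leaves -200, 1, 127 and 1000 alone. */
	return 0;
}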

OK, I think we can move this macro inside:
include/linux/page_ref.h

modify it to something like this:
#define page_ref_zero_or_close_to_overflow(page) \
   ((unsigned int) page_ref_count(page) + v + 127u <= v + 127u)

The sub/dec variants can also be fixed to ensure that we do not underflow
while still working with the fact that we use all 32 bits of _refcount.
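
Roughly, a generalized check could look like the sketch below (names are
placeholders, not from the posted series; note that the read-then-add is
still racy, which the -fetch approach discussed later avoids):

/*
 * Sketch only. Flag a _refcount that is zero or within "nr + 127" of
 * wrapping the full 32-bit counter, so that an addition of "nr" cannot
 * silently wrap past zero.
 */
#define page_ref_zero_or_close_to_overflow_nr(page, nr) \
	((unsigned int) page_ref_count(page) + (nr) + 127u <= (nr) + 127u)

static inline void page_ref_add(struct page *page, int nr)
{
	VM_BUG_ON(nr <= 0);
	VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow_nr(page, nr), page);
	atomic_add(nr, &page->_refcount);

	if (page_ref_tracepoint_active(page_ref_mod))
		__page_ref_mod(page, nr);
}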

Pasha
Pasha Tatashin Oct. 27, 2021, 1:21 a.m. UTC | #3
On Tue, Oct 26, 2021 at 5:34 PM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> On Tue, Oct 26, 2021 at 3:50 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Oct 26, 2021 at 05:38:15PM +0000, Pasha Tatashin wrote:
> > >  static inline void page_ref_add(struct page *page, int nr)
> > >  {
> > > -     atomic_add(nr, &page->_refcount);
> > > +     int ret;
> > > +
> > > +     VM_BUG_ON(nr <= 0);
> > > +     ret = atomic_add_return(nr, &page->_refcount);
> > > +     VM_BUG_ON_PAGE(ret <= 0, page);
> >
> > This isn't right.  _refcount is allowed to overflow into the negatives.
> > See page_ref_zero_or_close_to_overflow() and the conversations that led
> > to it being added.
>
> #define page_ref_zero_or_close_to_overflow(page) \
>     ((unsigned int) page_ref_count(page) + 127u <= 127u)
>
>
> Uh, right, I saw the macro but did not realize there was an (unsigned int) cast.
>
> OK, I think we can move this macro inside:
> include/linux/page_ref.h
>
> modify it to something like this:
> #define page_ref_zero_or_close_to_overflow(page) \
>    ((unsigned int) page_ref_count(page) + v + 127u <= v + 127u)
>
> The sub/dec variants can also be fixed to ensure that we do not underflow
> while still working with the fact that we use all 32 bits of _refcount.

I think we can do that by using
atomic_fetch_*() and checking for overflow/underflow after the operation.
I will send the updated series soon.
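
As a sketch of that direction (not the actual updated series), the checks
would treat the counter as a full 32-bit unsigned value, so a legitimately
"negative" count is not flagged by itself:

static inline void page_ref_add(struct page *page, int nr)
{
	unsigned int old;

	VM_BUG_ON(nr <= 0);
	old = (unsigned int)atomic_fetch_add(nr, &page->_refcount);
	/* Flag only an unsigned wrap past zero, not a "negative" count. */
	VM_BUG_ON_PAGE(old + (unsigned int)nr < old, page);

	if (page_ref_tracepoint_active(page_ref_mod))
		__page_ref_mod(page, nr);
}

static inline void page_ref_sub(struct page *page, int nr)
{
	unsigned int old;

	VM_BUG_ON(nr <= 0);
	old = (unsigned int)atomic_fetch_sub(nr, &page->_refcount);
	/* Underflow: dropping more references than were held. */
	VM_BUG_ON_PAGE(old < (unsigned int)nr, page);

	if (page_ref_tracepoint_active(page_ref_mod))
		__page_ref_mod(page, -nr);
}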

Pasha
Matthew Wilcox Oct. 27, 2021, 3:04 a.m. UTC | #4
On Tue, Oct 26, 2021 at 09:21:29PM -0400, Pasha Tatashin wrote:
> I think we can do that by using
> atomic_fetch_*() and checking for overflow/underflow after the operation.
> I will send the updated series soon.

Please include performance measurements.  You're adding code to some hot
paths.
Muchun Song Oct. 27, 2021, 7:46 a.m. UTC | #5
On Wed, Oct 27, 2021 at 1:38 AM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> The problems with page->_refcount are hard to debug, because by the time
> they are detected, the damage has usually been done long before. Yet,
> problems with an invalid page refcount may be catastrophic and lead to
> memory corruption.
>
> Reduce the window in which _refcount problems can go undetected by
> adding underflow and overflow checks to the functions that modify
> _refcount.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>

I found that some atomic_add/dec calls are replaced with atomic_add/dec_return;
the helpers that return a value imply a full memory barrier, but the
ones without a return value do not. Do you have any numbers showing
the impact? Maybe atomic_add/dec_return_relaxed can help here.

Thanks.
Pasha Tatashin Oct. 27, 2021, 6:22 p.m. UTC | #6
> I found that some atomic_add/dec calls are replaced with atomic_add/dec_return;

I am going to replace the -return variants with -fetch variants, potentially the relaxed ones.

> the helpers that return a value imply a full memory barrier, but the
> ones without a return value do not. Do you have any numbers showing
> the impact? Maybe atomic_add/dec_return_relaxed can help here.

The generic variant uses  arch_cmpxchg() for all atomic variants
without any extra barriers. Therefore, on platforms that use generic
implementations there won't be performance differences except for an
extra branch that checks results when VM_BUG_ON is enabled.
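
For reference, VM_BUG_ON() only expands to a real check when CONFIG_DEBUG_VM
is set; roughly, from include/linux/mmdebug.h:

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) BUILD_BUG_ON_INVALID(cond)	/* type-checks cond, emits no runtime code */
#endif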

On x86 the difference between the two is the following

atomic_add:
   lock add %eax,(%rsi)

atomic_fetch_add:
   lock xadd %eax,(%rsi)

atomic_fetch_add_relaxed:
   lock xadd %eax,(%rsi)

There are no differences between the relaxed and non-relaxed variants. However, we
now use lock xadd instead of lock add. I am not sure whether the performance
is going to be different.
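
One rough way to get a first number from userspace, assuming GCC on x86,
which emits lock add when the fetched value is discarded and lock xadd when
it is used (single-threaded and uncontended, so only a ballpark):

/* gcc -O2 xadd_vs_add.c && ./a.out  -- sketch only, file name made up */
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

static double since(const struct timespec *t0, const struct timespec *t1)
{
	return (t1->tv_sec - t0->tv_sec) + (t1->tv_nsec - t0->tv_nsec) / 1e9;
}

int main(void)
{
	unsigned int counter = 0, sink = 0;
	struct timespec t0, t1;
	unsigned long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++)		/* result unused: lock add */
		__atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("lock add : %.3fs\n", since(&t0, &t1));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++)		/* result used: lock xadd */
		sink += __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	printf("lock xadd: %.3fs (sink %u)\n", since(&t0, &t1), sink);

	return 0;
}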

Pasha
Pasha Tatashin Oct. 27, 2021, 6:22 p.m. UTC | #7
On Tue, Oct 26, 2021 at 11:05 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Oct 26, 2021 at 09:21:29PM -0400, Pasha Tatashin wrote:
> > I think we can do that by using
> > atomic_fetch_*() and checking for overflow/underflow after the operation.
> > I will send the updated series soon.
>
> Please include performance measurements.  You're adding code to some hot
> paths.

I will do boot performance tests to ensure we do not regress
initializing "struct pages" on larger machines. Do you have a
suggestion for another perf test?

Thanks,
Pasha
Muchun Song Oct. 28, 2021, 4:08 a.m. UTC | #8
On Thu, Oct 28, 2021 at 2:22 AM Pasha Tatashin
<pasha.tatashin@soleen.com> wrote:
>
> > I found that some atomic_add/dec calls are replaced with atomic_add/dec_return;
>
> I am going to replace the -return variants with -fetch variants, potentially the relaxed ones.
>
> > the helpers that return a value imply a full memory barrier, but the
> > ones without a return value do not. Do you have any numbers showing
> > the impact? Maybe atomic_add/dec_return_relaxed can help here.
>
> The generic variant uses  arch_cmpxchg() for all atomic variants
> without any extra barriers. Therefore, on platforms that use generic
> implementations there won't be performance differences except for an
> extra branch that checks results when VM_BUG_ON is enabled.
>
> On x86 the difference between the two is the following
>
> atomic_add:
>    lock add %eax,(%rsi)
>
> atomic_fetch_add:
>    lock xadd %eax,(%rsi)
>
> atomic_fetch_add_relaxed:
>    lock xadd %eax,(%rsi)
>
> There are no differences between the relaxed and non-relaxed variants. However, we

Right. There is no difference on x86. Maybe there are differences on
other architectures.

> now use lock xadd instead of lock add. I am not sure whether the performance
> is going to be different.
>
> Pasha

Patch

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 7ad46f45df39..b3ec2b231fc7 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -90,21 +90,35 @@  static inline void init_page_count(struct page *page)
 
 static inline void page_ref_add(struct page *page, int nr)
 {
-	atomic_add(nr, &page->_refcount);
+	int ret;
+
+	VM_BUG_ON(nr <= 0);
+	ret = atomic_add_return(nr, &page->_refcount);
+	VM_BUG_ON_PAGE(ret <= 0, page);
+
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, nr);
 }
 
 static inline void page_ref_sub(struct page *page, int nr)
 {
-	atomic_sub(nr, &page->_refcount);
+	int ret;
+
+	VM_BUG_ON(nr <= 0);
+	ret = atomic_sub_return(nr, &page->_refcount);
+	VM_BUG_ON_PAGE(ret < 0, page);
+
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, -nr);
 }
 
 static inline int page_ref_sub_return(struct page *page, int nr)
 {
-	int ret = atomic_sub_return(nr, &page->_refcount);
+	int ret;
+
+	VM_BUG_ON(nr <= 0);
+	ret = atomic_sub_return(nr, &page->_refcount);
+	VM_BUG_ON_PAGE(ret < 0, page);
 
 	if (page_ref_tracepoint_active(page_ref_mod_and_return))
 		__page_ref_mod_and_return(page, -nr, ret);
@@ -113,31 +127,43 @@  static inline int page_ref_sub_return(struct page *page, int nr)
 
 static inline void page_ref_inc(struct page *page)
 {
-	atomic_inc(&page->_refcount);
+	int ret = atomic_inc_return(&page->_refcount);
+
+	VM_BUG_ON_PAGE(ret <= 0, page);
+
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, 1);
 }
 
 static inline void page_ref_dec(struct page *page)
 {
-	atomic_dec(&page->_refcount);
+	int ret = atomic_dec_return(&page->_refcount);
+
+	VM_BUG_ON_PAGE(ret < 0, page);
+
 	if (page_ref_tracepoint_active(page_ref_mod))
 		__page_ref_mod(page, -1);
 }
 
 static inline int page_ref_sub_and_test(struct page *page, int nr)
 {
-	int ret = atomic_sub_and_test(nr, &page->_refcount);
+	int ret;
+
+	VM_BUG_ON(nr <= 0);
+	ret = atomic_sub_return(nr, &page->_refcount);
+	VM_BUG_ON_PAGE(ret < 0, page);
 
 	if (page_ref_tracepoint_active(page_ref_mod_and_test))
 		__page_ref_mod_and_test(page, -nr, ret);
-	return ret;
+	return ret == 0;
 }
 
 static inline int page_ref_inc_return(struct page *page)
 {
 	int ret = atomic_inc_return(&page->_refcount);
 
+	VM_BUG_ON_PAGE(ret <= 0, page);
+
 	if (page_ref_tracepoint_active(page_ref_mod_and_return))
 		__page_ref_mod_and_return(page, 1, ret);
 	return ret;
@@ -145,17 +171,21 @@  static inline int page_ref_inc_return(struct page *page)
 
 static inline int page_ref_dec_and_test(struct page *page)
 {
-	int ret = atomic_dec_and_test(&page->_refcount);
+	int ret = atomic_dec_return(&page->_refcount);
+
+	VM_BUG_ON_PAGE(ret < 0, page);
 
 	if (page_ref_tracepoint_active(page_ref_mod_and_test))
 		__page_ref_mod_and_test(page, -1, ret);
-	return ret;
+	return ret == 0;
 }
 
 static inline int page_ref_dec_return(struct page *page)
 {
 	int ret = atomic_dec_return(&page->_refcount);
 
+	VM_BUG_ON_PAGE(ret < 0, page);
+
 	if (page_ref_tracepoint_active(page_ref_mod_and_return))
 		__page_ref_mod_and_return(page, -1, ret);
 	return ret;
@@ -163,16 +193,23 @@  static inline int page_ref_dec_return(struct page *page)
 
 static inline int page_ref_add_unless(struct page *page, int nr, int u)
 {
-	int ret = atomic_add_unless(&page->_refcount, nr, u);
+	int ret;
+
+	VM_BUG_ON(nr <= 0 || u < 0);
+	ret = atomic_fetch_add_unless(&page->_refcount, nr, u);
+	VM_BUG_ON_PAGE(ret < 0, page);
 
 	if (page_ref_tracepoint_active(page_ref_mod_unless))
 		__page_ref_mod_unless(page, nr, ret);
-	return ret;
+	return ret != u;
 }
 
 static inline int page_ref_freeze(struct page *page, int count)
 {
-	int ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
+	int ret;
+
+	VM_BUG_ON(count <= 0);
+	ret = likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
 
 	if (page_ref_tracepoint_active(page_ref_freeze))
 		__page_ref_freeze(page, count, ret);