diff mbox series

[RFC,2/2] mm: readahead: handle LARGE input to get_init_ra_size()

Message ID 20201220211051.1416-1-rdunlap@infradead.org (mailing list archive)
State New, archived
Headers show
Series [RFC,1/2] log2: handle LARGE input to __roundup_pow_of_two() | expand

Commit Message

Randy Dunlap Dec. 20, 2020, 9:10 p.m. UTC
Add a test to detect if the input ra request size has its high order
bit set (is negative when tested as a signed long). This would be a
really Huge readahead.

If so, WARN() with the value and a stack trace so that we can see
where this is happening and then make further corrections later.
Then adjust the size value so that it is not so Huge (although
this may not be needed).

Also correct a comment: the return value is not squared for
small size.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Toralf Förster <toralf.foerster@gmx.de>
Cc: linux-mm@kvack.org
---
Notes:
- Look for "WARNING:.*get_init_ra_size"
- If panic_on_warn is set, this will cause a kernel panic().

 mm/readahead.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Andrew Morton Dec. 23, 2020, 1:35 a.m. UTC | #1
On Sun, 20 Dec 2020 13:10:51 -0800 Randy Dunlap <rdunlap@infradead.org> wrote:

> Add a test to detect if the input ra request size has its high order
> bit set (is negative when tested as a signed long). This would be a
> really Huge readahead.
> 
> If so, WARN() with the value and a stack trace so that we can see
> where this is happening and then make further corrections later.
> Then adjust the size value so that it is not so Huge (although
> this may not be needed).

What motivates this change?  Is there any reason to think this can
happen?

Also, everything in there *should* be unsigned, because a negative
readahead is semantically nonsensical.  Is our handling of this
inherently unsigned quantity incorrect somewhere?

> --- linux-5.10.1.orig/mm/readahead.c
> +++ linux-5.10.1/mm/readahead.c
> 
> ...
>
> @@ -303,14 +304,21 @@ void force_page_cache_ra(struct readahea
>  }
>  
>  /*
> - * Set the initial window size, round to next power of 2 and square
> + * Set the initial window size, round to next power of 2
>   * for small size, x 4 for medium, and x 2 for large
>   * for 128k (32 page) max ra
>   * 1-8 page = 32k initial, > 8 page = 128k initial
>   */
>  static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
>  {
> -	unsigned long newsize = roundup_pow_of_two(size);
> +	unsigned long newsize;
> +
> +	if ((signed long)size < 0) { /* high bit is set: ultra-large ra req */
> +		WARN_ONCE(1, "%s: size=0x%lx\n", __func__, size);
> +		size = -size;	/* really only need to flip the high/sign bit */
> +	}
> +
> +	newsize = roundup_pow_of_two(size);

Is there any way in which userspace can deliberately trigger warning?
Via sys_readadhead() or procfs tuning or whatever?

I guess that permitting a user-triggerable WARN_ONCE() isn't a huuuuge
problem - it isn't a DoS if it only triggers a single time.  It does
permit the malicious user to disable future valid warnings, but I don't
see what incentive there would be for this.  But still, it seems
desirable to avoid it.
Randy Dunlap Dec. 23, 2020, 1:50 a.m. UTC | #2
On 12/22/20 5:35 PM, Andrew Morton wrote:
> On Sun, 20 Dec 2020 13:10:51 -0800 Randy Dunlap <rdunlap@infradead.org> wrote:
> 
>> Add a test to detect if the input ra request size has its high order
>> bit set (is negative when tested as a signed long). This would be a
>> really Huge readahead.
>>
>> If so, WARN() with the value and a stack trace so that we can see
>> where this is happening and then make further corrections later.
>> Then adjust the size value so that it is not so Huge (although
>> this may not be needed).
> 
> What motivates this change?  Is there any reason to think this can
> happen?

Spotted in the wild:

mr-fox kernel: [ 1974.206977] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
mr-fox kernel: [ 1974.206980] shift exponent 64 is too large for 64-bit type 'long unsigned int'

Original report:
https://lore.kernel.org/lkml/c6e5eb81-680f-dd5c-8a81-62041a5ce50c@gmx.de/


Willy suggested that get_init_ra_size() was being called with a size of 0,
which would cause this (instead of some Huge value), so I made a follow-up
patch that only checks size for 0 and if 0, defaults it to 32 (pages).

---
 mm/readahead.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- linux-5.10.1.orig/mm/readahead.c
+++ linux-5.10.1/mm/readahead.c
@@ -310,7 +310,11 @@ void force_page_cache_ra(struct readahea
  */
 static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
 {
-	unsigned long newsize = roundup_pow_of_two(size);
+	unsigned long newsize;
+
+	if (!size)
+		size = 32;
+	newsize = roundup_pow_of_two(size);
 
 	if (newsize <= max / 32)
 		newsize = newsize * 4;


Toralf has only seen this problem one time.


> Also, everything in there *should* be unsigned, because a negative
> readahead is semantically nonsensical.  Is our handling of this
> inherently unsigned quantity incorrect somewhere?
> 
>> --- linux-5.10.1.orig/mm/readahead.c
>> +++ linux-5.10.1/mm/readahead.c
>>
>> ...
>>
>> @@ -303,14 +304,21 @@ void force_page_cache_ra(struct readahea
>>  }
>>  
>>  /*
>> - * Set the initial window size, round to next power of 2 and square
>> + * Set the initial window size, round to next power of 2
>>   * for small size, x 4 for medium, and x 2 for large
>>   * for 128k (32 page) max ra
>>   * 1-8 page = 32k initial, > 8 page = 128k initial
>>   */
>>  static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
>>  {
>> -	unsigned long newsize = roundup_pow_of_two(size);
>> +	unsigned long newsize;
>> +
>> +	if ((signed long)size < 0) { /* high bit is set: ultra-large ra req */
>> +		WARN_ONCE(1, "%s: size=0x%lx\n", __func__, size);
>> +		size = -size;	/* really only need to flip the high/sign bit */
>> +	}
>> +
>> +	newsize = roundup_pow_of_two(size);
> 
> Is there any way in which userspace can deliberately trigger warning?
> Via sys_readadhead() or procfs tuning or whatever?
> 
> I guess that permitting a user-triggerable WARN_ONCE() isn't a huuuuge
> problem - it isn't a DoS if it only triggers a single time.  It does
> permit the malicious user to disable future valid warnings, but I don't
> see what incentive there would be for this.  But still, it seems
> desirable to avoid it.

Sure. I think that we can drop RFC patches 1/2 and 2/2 and just consider the
other one above.
Toralf Förster Dec. 29, 2020, 6:01 p.m. UTC | #3
On 12/23/20 2:50 AM, Randy Dunlap wrote:
>> What motivates this change?  Is there any reason to think this can
>> happen?
> Spotted in the wild:
I run 2 hardened Gentoo systems, a server and a desktop.

I patched the server with this:

mr-fox ~ # cat ubsan.patch
--- linux-5.10.1.orig/mm/readahead.c
+++ linux-5.10.1/mm/readahead.c
@@ -310,7 +310,11 @@ void force_page_cache_ra(struct readahea
   */
  static unsigned long get_init_ra_size(unsigned long size, unsigned
long max)
  {
-       unsigned long newsize = roundup_pow_of_two(size);
+       unsigned long newsize;
+
+       if (!size)
+               size = 32;
+       newsize = roundup_pow_of_two(size);

         if (newsize <= max / 32)
                 newsize = newsize * 4;



and the issue did no longer occurred at the server (5.10.2).

I did not patched the desktop system and the issue occurred still 3
times since 21th of december (5.10.2/3)

--
Toralf
Randy Dunlap Dec. 29, 2020, 6:11 p.m. UTC | #4
On 12/29/20 10:01 AM, Toralf Förster wrote:
> On 12/23/20 2:50 AM, Randy Dunlap wrote:
>>> What motivates this change?  Is there any reason to think this can
>>> happen?
>> Spotted in the wild:
> I run 2 hardened Gentoo systems, a server and a desktop.
> 
> I patched the server with this:
> 
> mr-fox ~ # cat ubsan.patch
> --- linux-5.10.1.orig/mm/readahead.c
> +++ linux-5.10.1/mm/readahead.c
> @@ -310,7 +310,11 @@ void force_page_cache_ra(struct readahea
>   */
>  static unsigned long get_init_ra_size(unsigned long size, unsigned
> long max)
>  {
> -       unsigned long newsize = roundup_pow_of_two(size);
> +       unsigned long newsize;
> +
> +       if (!size)
> +               size = 32;
> +       newsize = roundup_pow_of_two(size);
> 
>         if (newsize <= max / 32)
>                 newsize = newsize * 4;
> 
> 
> 
> and the issue did no longer occurred at the server (5.10.2).
> 
> I did not patched the desktop system and the issue occurred still 3
> times since 21th of december (5.10.2/3)

Yes, that's the patch that I posted on 2020-DEC-22.

Looks like I should submit a real patch for that.

thanks.
Randy Dunlap Dec. 29, 2020, 8 p.m. UTC | #5
On 12/29/20 10:11 AM, Randy Dunlap wrote:
> On 12/29/20 10:01 AM, Toralf Förster wrote:
>> On 12/23/20 2:50 AM, Randy Dunlap wrote:
>>>> What motivates this change?  Is there any reason to think this can
>>>> happen?
>>> Spotted in the wild:
>> I run 2 hardened Gentoo systems, a server and a desktop.
>>
>> I patched the server with this:
>>
>> mr-fox ~ # cat ubsan.patch
>> --- linux-5.10.1.orig/mm/readahead.c
>> +++ linux-5.10.1/mm/readahead.c
>> @@ -310,7 +310,11 @@ void force_page_cache_ra(struct readahea
>>   */
>>  static unsigned long get_init_ra_size(unsigned long size, unsigned
>> long max)
>>  {
>> -       unsigned long newsize = roundup_pow_of_two(size);
>> +       unsigned long newsize;
>> +
>> +       if (!size)
>> +               size = 32;
>> +       newsize = roundup_pow_of_two(size);
>>
>>         if (newsize <= max / 32)
>>                 newsize = newsize * 4;
>>
>>
>>
>> and the issue did no longer occurred at the server (5.10.2).
>>
>> I did not patched the desktop system and the issue occurred still 3
>> times since 21th of december (5.10.2/3)
> 
> Yes, that's the patch that I posted on 2020-DEC-22.
> 
> Looks like I should submit a real patch for that.
> 
> thanks.
> 

Hi Toralf,

Do you want either or both of your
Reported-by: and Tested-by: on the patch?

thanks.
Toralf Förster Dec. 29, 2020, 8:46 p.m. UTC | #6
On 12/29/20 9:00 PM, Randy Dunlap wrote:
> Hi Toralf,
>
> Do you want either or both of your
> Reported-by: and Tested-by: on the patch?
>
> thanks.
> -- ~Randy
The first is enough, for a Tested-by: was the time frame too short IMO.

--
Toralf
diff mbox series

Patch

--- linux-5.10.1.orig/mm/readahead.c
+++ linux-5.10.1/mm/readahead.c
@@ -14,6 +14,7 @@ 
 #include <linux/export.h>
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
+#include <linux/bug.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/pagevec.h>
 #include <linux/pagemap.h>
@@ -303,14 +304,21 @@  void force_page_cache_ra(struct readahea
 }
 
 /*
- * Set the initial window size, round to next power of 2 and square
+ * Set the initial window size, round to next power of 2
  * for small size, x 4 for medium, and x 2 for large
  * for 128k (32 page) max ra
  * 1-8 page = 32k initial, > 8 page = 128k initial
  */
 static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
 {
-	unsigned long newsize = roundup_pow_of_two(size);
+	unsigned long newsize;
+
+	if ((signed long)size < 0) { /* high bit is set: ultra-large ra req */
+		WARN_ONCE(1, "%s: size=0x%lx\n", __func__, size);
+		size = -size;	/* really only need to flip the high/sign bit */
+	}
+
+	newsize = roundup_pow_of_two(size);
 
 	if (newsize <= max / 32)
 		newsize = newsize * 4;