Message ID | 20250120082908.4162780-1-hyesoo.yu@samsung.com (mailing list archive) |
---|---|
State | New |
Series | mm: slub: Panic if the object corruption is checked. |
On Mon, Jan 20, 2025 at 05:28:21PM +0900, Hyesoo Yu wrote:
> If a slab object is corrupted or an error occurs in its internal
> value, continuing after restoration may cause other side effects.
> At this point, it is difficult to debug because the problem occurred
> in the past. A flag has been added that can cause a panic when there
> is a problem with the object.
>
> Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
> Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493

Linux does not use Change IDs. Please omit these from future patches.

Panicking is a very unfriendly approach. I think a better approach would
be to freeze the slab where corruption is detected. That is, no future
objects are allocated from that slab, and attempts to free objects from
that slab become no-ops. I don't think that should be hard to implement.
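The freezing idea suggested above can be illustrated with a small userspace sketch. This is not kernel code and not how SLUB is implemented: `toy_slab`, `toy_alloc()`, `toy_free()` and `toy_report_corruption()` are hypothetical names, and real slab bookkeeping is considerably more involved. It only shows the intended behaviour, assuming a per-slab "frozen" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_OBJECTS  4
#define TOY_OBJ_SIZE 32

/* Hypothetical toy slab: a fixed array of objects plus an in-use map. */
struct toy_slab {
	unsigned char mem[TOY_OBJECTS][TOY_OBJ_SIZE];
	bool in_use[TOY_OBJECTS];
	bool frozen;	/* set once corruption has been detected */
};

/* Freeze the slab: nothing is handed out or taken back afterwards. */
static void toy_report_corruption(struct toy_slab *slab)
{
	slab->frozen = true;
}

/* Return a free object, or NULL if the slab is frozen or full. */
static void *toy_alloc(struct toy_slab *slab)
{
	if (slab->frozen)
		return NULL;

	for (size_t i = 0; i < TOY_OBJECTS; i++) {
		if (!slab->in_use[i]) {
			slab->in_use[i] = true;
			return slab->mem[i];
		}
	}
	return NULL;
}

/* Release an object; on a frozen slab this is a no-op returning false. */
static bool toy_free(struct toy_slab *slab, void *obj)
{
	size_t idx;

	if (slab->frozen)
		return false;

	idx = (size_t)(((unsigned char *)obj - &slab->mem[0][0]) / TOY_OBJ_SIZE);
	assert(idx < TOY_OBJECTS);
	slab->in_use[idx] = false;
	return true;
}
```

After `toy_report_corruption()` is called, `toy_alloc()` returns NULL and `toy_free()` leaves the in-use map untouched, so the contents of the damaged slab stay as they were for later inspection.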
On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>
> If a slab object is corrupted or an error occurs in its internal
> value, continuing after restoration may cause other side effects.
> At this point, it is difficult to debug because the problem occurred
> in the past. A flag has been added that can cause a panic when there
> is a problem with the object.

Hi Hyesoo,

I'm concerned about this because it goes against the effort to avoid
introducing new BUG() calls [1].

And I think it would be more appropriate to use existing panic_on_warn
functionality [2] which causes a panic on WARN(), rather than introducing
a SLUB-specific knob to do the same thing.

However SLUB does not call WARN() and uses pr_err() instead when
reporting an error. Vlastimil and I talked about changing it to use
WARN() a while ago [3], but neither of us have done that yet.

Probably you may want to look at it, as it also aligns with your purpose?
FYI, if you would like to work on it, please make sure that WARN()
is suppressed during kunit test.

[1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on
[2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn
[3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz

Best,
Hyeonggon
On Mon, Jan 20, 2025 at 03:36:08PM +0000, Matthew Wilcox wrote:
> On Mon, Jan 20, 2025 at 05:28:21PM +0900, Hyesoo Yu wrote:
> > If a slab object is corrupted or an error occurs in its internal
> > value, continuing after restoration may cause other side effects.
> > At this point, it is difficult to debug because the problem occurred
> > in the past. A flag has been added that can cause a panic when there
> > is a problem with the object.
> >
> > Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
> > Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493
>
> Linux does not use Change IDs. Please omit these from future patches.
>
> Panicking is a very unfriendly approach. I think a better approach would
> be to freeze the slab where corruption is detected. That is, no future
> objects are allocated from that slab, and attempts to free objects from
> that slab become no-ops. I don't think that should be hard to implement.
>

Thank you for your response. That is my mistake. I will remove the Change-Id.

I agree that freezing is better than recovery or panic for the system's stability.
However, what I want from the patch is not just to make the system run stably.
I need to immediately trigger a panic to investigate the slub.
I would like to analyze the corrupted data at that moment to check for issues
such as cache problems, user errors, system clock frequency and similar problems,
rather than letting the corruption pass by unnoticed.

However, I agree that panic is not a friendly approach.
I will modify the patch to report the problem using WARN() and then use
panic_on_warn to trigger the panic.

Thanks,
Regards.
On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote:
> On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >
> > If a slab object is corrupted or an error occurs in its internal
> > value, continuing after restoration may cause other side effects.
> > At this point, it is difficult to debug because the problem occurred
> > in the past. A flag has been added that can cause a panic when there
> > is a problem with the object.
>
> Hi Hyesoo,
>
> I'm concerned about this because it goes against the effort to avoid
> introducing new BUG() calls [1].
>
> And I think it would be more appropriate to use existing panic_on_warn
> functionality [2] which causes a panic on WARN(), rather than introducing
> a SLUB-specific knob to do the same thing.
>
> However SLUB does not call WARN() and uses pr_err() instead when
> reporting an error. Vlastimil and I talked about changing it to use
> WARN() a while ago [3], but neither of us have done that yet.
>
> Probably you may want to look at it, as it also aligns with your purpose?
> FYI, if you would like to work on it, please make sure that WARN()
> is suppressed during kunit test.
>
> [1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on
> [2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn
> [3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz
>
> Best,
> Hyeonggon

Thanks for the response.

Using WARN() instead of panic is a great idea.
Thanks for pointing out what I missed.

The next patch will be changed to use WARN().

Thanks.
On 1/21/2025 9:54 AM, Hyesoo Yu wrote:
> On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote:
>> On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>
>>> If a slab object is corrupted or an error occurs in its internal
>>> value, continuing after restoration may cause other side effects.
>>> At this point, it is difficult to debug because the problem occurred
>>> in the past. A flag has been added that can cause a panic when there
>>> is a problem with the object.
>>
>> Hi Hyesoo,
>>
>> I'm concerned about this because it goes against the effort to avoid
>> introducing new BUG() calls [1].
>>
>> And I think it would be more appropriate to use existing panic_on_warn
>> functionality [2] which causes a panic on WARN(), rather than introducing
>> a SLUB-specific knob to do the same thing.
>>
>> However SLUB does not call WARN() and uses pr_err() instead when
>> reporting an error. Vlastimil and I talked about changing it to use
>> WARN() a while ago [3], but neither of us have done that yet.
>>
>> Probably you may want to look at it, as it also aligns with your purpose?
>> FYI, if you would like to work on it, please make sure that WARN()
>> is suppressed during kunit test.
>>
>> [1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on
>> [2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn
>> [3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz
>>
>> Best,
>> Hyeonggon
>
> Thanks for the response.
>
> Using WARN() instead of panic is a great idea.
> Thanks for pointing out what I missed.

Just for clarification, I think changing the common error reporting
logic (like, slab_bug()) to use WARN() will be preferable to inserting
new WARN()s at random points, which is what this patch does now.

Best,
Hyeonggon
On Tue, Jan 21, 2025 at 10:48:08AM +0900, Hyeonggon Yoo wrote:
>
>
> On 1/21/2025 9:54 AM, Hyesoo Yu wrote:
> > On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote:
> > > On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> > > >
> > > > If a slab object is corrupted or an error occurs in its internal
> > > > value, continuing after restoration may cause other side effects.
> > > > At this point, it is difficult to debug because the problem occurred
> > > > in the past. A flag has been added that can cause a panic when there
> > > > is a problem with the object.
> > >
> > > Hi Hyesoo,
> > >
> > > I'm concerned about this because it goes against the effort to avoid
> > > introducing new BUG() calls [1].
> > >
> > > And I think it would be more appropriate to use existing panic_on_warn
> > > functionality [2] which causes a panic on WARN(), rather than introducing
> > > a SLUB-specific knob to do the same thing.
> > >
> > > However SLUB does not call WARN() and uses pr_err() instead when
> > > reporting an error. Vlastimil and I talked about changing it to use
> > > WARN() a while ago [3], but neither of us have done that yet.
> > >
> > > Probably you may want to look at it, as it also aligns with your purpose?
> > > FYI, if you would like to work on it, please make sure that WARN()
> > > is suppressed during kunit test.
> > >
> > > [1] https://docs.kernel.org/process/deprecated.html#bug-and-bug-on
> > > [2] https://www.kernel.org/doc/html/v6.9/admin-guide/sysctl/kernel.html#panic-on-warn
> > > [3] https://lore.kernel.org/linux-mm/d4219cd9-32d3-4697-93b9-6a44bf77d50c@suse.cz
> > >
> > > Best,
> > > Hyeonggon
> >
> > Thanks for the response.
> >
> > Using WARN() instead of panic is a great idea.
> > Thanks for pointing out what I missed.
>
> Just for clarification, I think changing the common error reporting
> logic (like, slab_bug()) to use WARN() will be preferable to inserting
> new WARN()s at random points, which is what this patch does now.
>
> Best,
> Hyeonggon
>

Thank you for the clarification.

Actually, I considered adding BUG_ON() to slab_bug. However, if we add BUG_ON() to slab_bug,
it will prevent many meaningful error logs from being printed subsequently.
As you know, slab_bug is the log that is usually printed at the beginning of a bug
in the slab. As a result, it would be difficult to figure out the problems based on
the logs in our large-scale test bed.

Similarly, even if I use WARN() in slab_bug, we won't be able to obtain the logs
when panic_on_warn is enabled. I don't think it is useful to include WARN() in slab_bug.

Instead, I will implement a solution where WARN() is only used in slab_fix, before
the slab object is restored. If I add it to slab_fix, I think the warning will be
suppressed in kunit tests by the slab_add_kunit_errors handling.

Thanks,
Regards.
On 1/21/25 1:40 AM, Hyesoo Yu wrote:
> On Mon, Jan 20, 2025 at 03:36:08PM +0000, Matthew Wilcox wrote:
>> On Mon, Jan 20, 2025 at 05:28:21PM +0900, Hyesoo Yu wrote:
>>> If a slab object is corrupted or an error occurs in its internal
>>> value, continuing after restoration may cause other side effects.
>>> At this point, it is difficult to debug because the problem occurred
>>> in the past. A flag has been added that can cause a panic when there
>>> is a problem with the object.
>>>
>>> Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
>>> Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493
>>
>> Linux does not use Change IDs. Please omit these from future patches.
>>
>> Panicking is a very unfriendly approach. I think a better approach would
>> be to freeze the slab where corruption is detected. That is, no future
>> objects are allocated from that slab, and attempts to free objects from
>> that slab become no-ops. I don't think that should be hard to implement.

Freezing of slab is already done in some cases when corruption is
detected - all objects are marked as used, and further freeing attempts on
the slab are discarded. Perhaps not all cases, which could be improved.

> Thank you for your response. That is my mistake. I will remove the Change-Id.
>
> I agree that freezing is better than recovery or panic for the system's stability.
> However, what I want from the patch is not just to make the system run stably.
> I need to immediately trigger a panic to investigate the slub.

IMHO it's a valid goal to panic more quickly when debugging, and enabling
slub_debug means debugging is in progress (as opposed to normal production
when we try to avoid panic). But making it possible to reuse the general
panic_on_warn mechanism (which can be also expected to be enabled when
debugging) is indeed preferable to introducing a new slab-specific flag.

> I would like to analyze the corrupted data at that moment to check for issues
> such as cache problems, user errors, system clock frequency and similar problems,
> rather than letting the corruption pass by unnoticed.
>
> However, I agree that panic is not a friendly approach.
> I will modify the patch to report the problem using WARN() and then use
> panic_on_warn to trigger the panic.
>
> Thanks,
> Regards.
>
>
On 1/21/25 3:32 AM, Hyesoo Yu wrote:
> On Tue, Jan 21, 2025 at 10:48:08AM +0900, Hyeonggon Yoo wrote:
>>
>>
>> On 1/21/2025 9:54 AM, Hyesoo Yu wrote:
>>> On Tue, Jan 21, 2025 at 12:41:01AM +0900, Hyeonggon Yoo wrote:
>>>> On Mon, Jan 20, 2025 at 5:30 PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>>>
>>> Thanks for the response.
>>>
>>> Using WARN() instead of panic is a great idea.
>>> Thanks for pointing out what I missed.
>>
>> Just for clarification, I think changing the common error reporting
>> logic (like, slab_bug()) to use WARN() will be preferable to inserting
>> new WARN()s at random points, which is what this patch does now.
>>
>> Best,
>> Hyeonggon
>>
>
> Thank you for the clarification.
>
> Actually, I considered adding BUG_ON() to slab_bug. However, if we add BUG_ON() to slab_bug,
> it will prevent many meaningful error logs from being printed subsequently.
> As you know, slab_bug is the log that is usually printed at the beginning of a bug
> in the slab. As a result, it would be difficult to figure out the problems based on
> the logs in our large-scale test bed.
>
> Similarly, even if I use WARN() in slab_bug, we won't be able to obtain the logs
> when panic_on_warn is enabled. I don't think it is useful to include WARN() in slab_bug.
>
> Instead, I will implement a solution where WARN() is only used in slab_fix, before
> the slab object is restored. If I add it to slab_fix, I think the warning will be
> suppressed in kunit tests by the slab_add_kunit_errors handling.

Right, makes sense to only do the WARN() after printing the debugging logs.
Thanks.

> Thanks,
> Regards.
>
>
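The ordering concern discussed above (emit the debugging logs first, warn last) can be modelled in userspace. The sketch below is not kernel code: `toy_system`, `toy_log()` and `toy_warn()` are hypothetical stand-ins for the kernel log, pr_err() and WARN() with panic_on_warn, and the "panic" is modelled as a flag after which nothing more is logged, rather than a real halt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel log and the panic_on_warn setting. */
struct toy_system {
	char log[256];
	bool panic_on_warn;
	bool panicked;
};

/* Append one line to the log; a panicked system records nothing more. */
static void toy_log(struct toy_system *sys, const char *line)
{
	size_t used = strlen(sys->log);

	if (sys->panicked)
		return;

	assert(used < sizeof(sys->log));
	snprintf(sys->log + used, sizeof(sys->log) - used, "%s\n", line);
}

/* Model WARN(): log the warning, then stop if panic_on_warn is set. */
static void toy_warn(struct toy_system *sys, const char *what)
{
	toy_log(sys, what);
	if (sys->panic_on_warn)
		sys->panicked = true;
}

/* Warn first: with panic_on_warn, the details below are never logged. */
static void report_warn_first(struct toy_system *sys)
{
	toy_warn(sys, "WARNING: slab corruption");
	toy_log(sys, "object dump");
	toy_log(sys, "alloc/free tracks");
}

/* Log first, warn last: the full report exists before the panic. */
static void report_warn_last(struct toy_system *sys)
{
	toy_log(sys, "object dump");
	toy_log(sys, "alloc/free tracks");
	toy_warn(sys, "WARNING: slab corruption");
}
```

With `panic_on_warn` set, `report_warn_first()` leaves only the warning line in the log, while `report_warn_last()` leaves the object dump and tracking lines followed by the warning, which is the behaviour the thread converges on.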
Hi Hyesoo,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Hyesoo-Yu/mm-slub-Panic-if-the-object-corruption-is-checked/20250120-163233
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20250120082908.4162780-1-hyesoo.yu%40samsung.com
patch subject: [PATCH] mm: slub: Panic if the object corruption is checked.
config: mips-randconfig-r111-20250121 (https://download.01.org/0day-ci/archive/20250121/202501212026.lUnLNhv6-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 14.2.0
reproduce: (https://download.01.org/0day-ci/archive/20250121/202501212026.lUnLNhv6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501212026.lUnLNhv6-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> mm/slub.c:1308:9: sparse: sparse: cast from restricted slab_flags_t
   mm/slub.c:1394:17: sparse: sparse: cast from restricted slab_flags_t
   mm/slub.c:4440:47: sparse: sparse: context imbalance in '__slab_free' - unexpected unlock

vim +1308 mm/slub.c

  1273	
  1274	/* Check the pad bytes at the end of a slab page */
  1275	static pad_check_attributes void
  1276	slab_pad_check(struct kmem_cache *s, struct slab *slab)
  1277	{
  1278		u8 *start;
  1279		u8 *fault;
  1280		u8 *end;
  1281		u8 *pad;
  1282		int length;
  1283		int remainder;
  1284	
  1285		if (!(s->flags & SLAB_POISON))
  1286			return;
  1287	
  1288		start = slab_address(slab);
  1289		length = slab_size(slab);
  1290		end = start + length;
  1291		remainder = length % s->size;
  1292		if (!remainder)
  1293			return;
  1294	
  1295		pad = end - remainder;
  1296		metadata_access_enable();
  1297		fault = memchr_inv(kasan_reset_tag(pad), POISON_INUSE, remainder);
  1298		metadata_access_disable();
  1299		if (!fault)
  1300			return;
  1301		while (end > fault && end[-1] == POISON_INUSE)
  1302			end--;
  1303	
  1304		slab_err(s, slab, "Padding overwritten. 0x%p-0x%p @offset=%tu",
  1305				fault, end - 1, fault - start);
  1306		print_section(KERN_ERR, "Padding ", pad, remainder);
  1307	
> 1308		BUG_ON(s->flags & SLAB_CORRUPTION_PANIC);
  1309	
  1310		restore_bytes(s, "slab padding", POISON_INUSE, fault, end);
  1311	}
  1312	
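One plausible reading of the "cast from restricted slab_flags_t" warnings above, offered as an assumption rather than something the report states: slab_flags_t is a sparse "bitwise" type, and a BUG_ON() implementation that casts its argument to a plain integer (which the MIPS configuration in this report is believed to do) makes sparse complain when the raw flag expression is passed in. Normalising the expression with `!!`, as the patch's own sysfs show function does, yields a plain int before any cast. The userspace sketch below uses hypothetical `TOY_*` names to show the idiom; the annotations only take effect when the file is checked with sparse, and both spellings behave identically at run time.

```c
#include <assert.h>

/* Sparse-only annotations; for an ordinary compiler they expand to nothing. */
#ifdef __CHECKER__
#define TOY_BITWISE __attribute__((bitwise))
#define TOY_FORCE   __attribute__((force))
#else
#define TOY_BITWISE
#define TOY_FORCE
#endif

/* Hypothetical restricted flag type, modelled on slab_flags_t. */
typedef unsigned int TOY_BITWISE toy_flags_t;

#define TOY_POISON           ((TOY_FORCE toy_flags_t)(1U << 0))
#define TOY_CORRUPTION_PANIC ((TOY_FORCE toy_flags_t)(1U << 1))

/* Stand-in for the trap: reports whether a BUG would have fired. */
static int toy_would_trap(unsigned long condition)
{
	return condition != 0;
}

/* Hypothetical BUG_ON() variant that casts its argument to unsigned long. */
#define TOY_BUG_ON(cond) toy_would_trap((unsigned long)(cond))

/*
 * TOY_BUG_ON(flags & TOY_CORRUPTION_PANIC) casts a restricted value and is
 * the form sparse would flag; TOY_BUG_ON(!!(flags & TOY_CORRUPTION_PANIC))
 * casts a plain int instead. Both give the same result at run time.
 */
static void toy_selfcheck(toy_flags_t flags)
{
	assert(TOY_BUG_ON(flags & TOY_CORRUPTION_PANIC) ==
	       TOY_BUG_ON(!!(flags & TOY_CORRUPTION_PANIC)));
}
```

Whether this is the right fix for the patch is moot if the next revision drops BUG_ON() in favour of WARN(), as agreed elsewhere in the thread.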
diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index 84ca1dc94e5e..ce58525db93d 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -53,6 +53,7 @@ Possible debug options are::
 	U		User tracking (free and alloc)
 	T		Trace (please only use on single slabs)
 	A		Enable failslab filter mark for the cache
+	C		Panic if object corruption is checked.
 	O		Switch debugging off for caches that would have
 			caused higher minimum slab orders
 	-		Switch all debugging off (useful if the kernel is
@@ -113,6 +114,7 @@ options from the ``slab_debug`` parameter translate to the following files::
 	U	store_user
 	T	trace
 	A	failslab
+	C	corruption_panic
 
 failslab file is writable, so writing 1 or 0 will enable or disable
 the option at runtime. Write returns -EINVAL if cache is an alias.
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 10a971c2bde3..4391c30564d6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -31,6 +31,7 @@ enum _slab_flag_bits {
 	_SLAB_CACHE_DMA32,
 	_SLAB_STORE_USER,
 	_SLAB_PANIC,
+	_SLAB_CORRUPTION_PANIC,
 	_SLAB_TYPESAFE_BY_RCU,
 	_SLAB_TRACE,
 #ifdef CONFIG_DEBUG_OBJECTS
@@ -97,6 +98,9 @@ enum _slab_flag_bits {
 #define SLAB_STORE_USER		__SLAB_FLAG_BIT(_SLAB_STORE_USER)
 /* Panic if kmem_cache_create() fails */
 #define SLAB_PANIC		__SLAB_FLAG_BIT(_SLAB_PANIC)
+/* Panic if object corruption is checked */
+#define SLAB_CORRUPTION_PANIC	__SLAB_FLAG_BIT(_SLAB_CORRUPTION_PANIC)
+
 /**
  * define SLAB_TYPESAFE_BY_RCU - **WARNING** READ THIS!
  *
diff --git a/mm/slub.c b/mm/slub.c
index 48cefc969480..36a8dabf1349 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1306,6 +1306,8 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab)
 			fault, end - 1, fault - start);
 	print_section(KERN_ERR, "Padding ", pad, remainder);
 
+	BUG_ON(s->flags & SLAB_CORRUPTION_PANIC);
+
 	restore_bytes(s, "slab padding", POISON_INUSE, fault, end);
 }
 
@@ -1389,6 +1391,8 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
 	if (!ret && !slab_in_kunit_test()) {
 		print_trailer(s, slab, object);
 		add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+
+		BUG_ON(s->flags & SLAB_CORRUPTION_PANIC);
 	}
 
 	return ret;
@@ -1689,6 +1693,9 @@ parse_slub_debug_flags(char *str, slab_flags_t *flags, char **slabs, bool init)
 		case 'a':
 			*flags |= SLAB_FAILSLAB;
 			break;
+		case 'c':
+			*flags |= SLAB_CORRUPTION_PANIC;
+			break;
 		case 'o':
 			/*
 			 * Avoid enabling debugging on caches if its minimum
@@ -6874,6 +6881,12 @@ static ssize_t store_user_show(struct kmem_cache *s, char *buf)
 
 SLAB_ATTR_RO(store_user);
 
+static ssize_t corruption_panic_show(struct kmem_cache *s, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", !!(s->flags & SLAB_CORRUPTION_PANIC));
+}
+SLAB_ATTR_RO(corruption_panic);
+
 static ssize_t validate_show(struct kmem_cache *s, char *buf)
 {
 	return 0;
@@ -7092,6 +7105,7 @@ static struct attribute *slab_attrs[] = {
 	&red_zone_attr.attr,
 	&poison_attr.attr,
 	&store_user_attr.attr,
+	&corruption_panic_attr.attr,
 	&validate_attr.attr,
 #endif
 #ifdef CONFIG_ZONE_DMA
If a slab object is corrupted or an error occurs in its internal
value, continuing after restoration may cause other side effects.
At this point, it is difficult to debug because the problem occurred
in the past. A flag has been added that can cause a panic when there
is a problem with the object.

Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Change-Id: I4e7e5e0ec3421a7f6c84d591db052f79d3775493
---
 Documentation/mm/slub.rst |  2 ++
 include/linux/slab.h      |  4 ++++
 mm/slub.c                 | 14 ++++++++++++++
 3 files changed, 20 insertions(+)