Message ID | 26518.1565273511@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add a slab corruption tracepoint | expand |
On 8/8/19 4:11 PM, David Howells wrote: > > Add a tracepoint to log slab corruption messages to the trace log also so > that it's easier to correlate with other trace messages that are being used > to track refcounting. > > Signed-off-by: David Howells <dhowells@redhat.com> Shouldn't that include SLUB? I'm surprised to see SLAB used for debugging refcounting these days, as the SLUB debugging features are vastly superior, while SLAB ones are being sometimes found to be broken for years and removed. > --- > include/trace/events/kmem.h | 23 +++++++++++++++++++++++ > mm/slab.c | 2 ++ > 2 files changed, 25 insertions(+) > > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h > index eb57e3037deb..c96f3b03a6e2 100644 > --- a/include/trace/events/kmem.h > +++ b/include/trace/events/kmem.h > @@ -315,6 +315,29 @@ TRACE_EVENT(mm_page_alloc_extfrag, > __entry->change_ownership) > ); > > +TRACE_EVENT(slab_corruption, > + TP_PROTO(const char *slab, void *object, unsigned int size, unsigned int offset), > + > + TP_ARGS(slab, object, size, offset), > + > + TP_STRUCT__entry( > + __field( void *, object ) > + __field( unsigned int, size ) > + __field( unsigned int, offset ) > + __array( char, slab, 16 ) > + ), > + > + TP_fast_assign( > + strlcpy(__entry->slab, slab, sizeof(__entry->slab)); > + __entry->object = object; > + __entry->size = size; > + __entry->offset = offset; > + ), > + > + TP_printk("slab=%s obj=%px size=%x off=%x", > + __entry->slab, __entry->object, __entry->size, __entry->offset) > +); > + > #endif /* _TRACE_KMEM_H */ > > /* This part must be outside protection */ > diff --git a/mm/slab.c b/mm/slab.c > index 9df370558e5d..47c5a86e39be 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -1527,6 +1527,8 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp) > print_tainted(), cachep->name, > realobj, size); > print_objinfo(cachep, objp, 0); > + trace_slab_corruption(cachep->name, realobj, > + size, i); > } > /* Hexdump the affected line */ > i = (i / 16) * 16; >
Vlastimil Babka <vbabka@suse.cz> wrote: > Shouldn't that include SLUB? I've been using SLAB. Looking in SLUB it's much less obvious where to insert the tracepoint. check_bytes_and_report() maybe? > I'm surprised to see SLAB used for debugging refcounting these days, The refcount debugging in question is not in SLAB, but rather in rxrpc; it's just SLAB detected the resulting memory corruption. rxrpc has tracepoints that track the refcounting, but SLAB printks a message to indicate the corruption and it's a bit tricky to work out where the printk happened with respect to the trace. > as the SLUB debugging features are vastly superior, while SLAB ones are > being sometimes found to be broken for years and removed. If SLUB is better than SLAB, shouldn't SLAB be removed? David
On 8/19/19 11:03 AM, David Howells wrote: > Vlastimil Babka <vbabka@suse.cz> wrote: > >> Shouldn't that include SLUB? > > I've been using SLAB. Looking in SLUB it's much less obvious where to insert > the tracepoint. check_bytes_and_report() maybe? Perhaps object_err() and slab_err(). >> I'm surprised to see SLAB used for debugging refcounting these days, > > The refcount debugging in question is not in SLAB, but rather in rxrpc; it's > just SLAB detected the resulting memory corruption. I understand, and it's what I meant - SLUB is better suited to debug wrong uses (use-after-free, double free) that may be e.g. result of a refcounting bug. You can configure it to perform the extra checks only on the single cache you care about, and when it detects inconsistency it will also print stacktraces of who last allocated and freed the object. > rxrpc has tracepoints > that track the refcounting, but SLAB printks a message to indicate the > corruption and it's a bit tricky to work out where the printk happened with > respect to the trace. Could you correlate by timestamps? >> as the SLUB debugging features are vastly superior, while SLAB ones are >> being sometimes found to be broken for years and removed. > > If SLUB is better than SLAB, shouldn't SLAB be removed? It has been proposed before, but SLAB still has its users, who don't care about the debugging scenarios that much. > David >
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index eb57e3037deb..c96f3b03a6e2 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -315,6 +315,29 @@ TRACE_EVENT(mm_page_alloc_extfrag, __entry->change_ownership) ); +TRACE_EVENT(slab_corruption, + TP_PROTO(const char *slab, void *object, unsigned int size, unsigned int offset), + + TP_ARGS(slab, object, size, offset), + + TP_STRUCT__entry( + __field( void *, object ) + __field( unsigned int, size ) + __field( unsigned int, offset ) + __array( char, slab, 16 ) + ), + + TP_fast_assign( + strlcpy(__entry->slab, slab, sizeof(__entry->slab)); + __entry->object = object; + __entry->size = size; + __entry->offset = offset; + ), + + TP_printk("slab=%s obj=%px size=%x off=%x", + __entry->slab, __entry->object, __entry->size, __entry->offset) +); + #endif /* _TRACE_KMEM_H */ /* This part must be outside protection */ diff --git a/mm/slab.c b/mm/slab.c index 9df370558e5d..47c5a86e39be 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1527,6 +1527,8 @@ static void check_poison_obj(struct kmem_cache *cachep, void *objp) print_tainted(), cachep->name, realobj, size); print_objinfo(cachep, objp, 0); + trace_slab_corruption(cachep->name, realobj, + size, i); } /* Hexdump the affected line */ i = (i / 16) * 16;
Add a tracepoint to log slab corruption messages to the trace log also so that it's easier to correlate with other trace messages that are being used to track refcounting. Signed-off-by: David Howells <dhowells@redhat.com> --- include/trace/events/kmem.h | 23 +++++++++++++++++++++++ mm/slab.c | 2 ++ 2 files changed, 25 insertions(+)